I’m a postdoc at TakeLab, University of Zagreb. My research interests within natural language processing include faithful explainability, safety, and controllability of language models.
Previously, I did a postdoc at the Technion, working with Yonatan Belinkov on unlearning and faithful explainability of language models. Before that, I did a postdoc at the UKP Lab at TU Darmstadt, working with Iryna Gurevych on the InterText initiative. I obtained my PhD at the University of Zagreb under the supervision of Jan Šnajder. Before my PhD, I worked at the European Commission’s Joint Research Centre in Ispra on using NLP to update the Sendai Framework for Disaster Risk Reduction 2015-2030.
I am on the job market for academic opportunities. Check my CV and reach out if you think I would be a good fit.
News
May 2025.
- REVS, our gradient-free method for erasing sensitive information from language models, has been accepted to the Findings of ACL 2025! [paper & code]
April 2025.
- We released the Mechanistic Interpretability Benchmark, a step toward standardizing evaluation in mechanistic interpretability! The paper describing our effort has been accepted to ICML 2025.
February 2025.
- We released a new preprint describing a parametric method for estimating the faithfulness of CoTs. [paper]
- I am substituting Jan Šnajder at the University of Zagreb during the summer semester, teaching Introduction to AI.
September 2024.
- Our paper on prompting models with pseudocode to improve their conditional reasoning capabilities has been accepted to EMNLP 2024 (main)! [paper]
February 2024.
- I started as a postdoc at the Technion, working with Yonatan Belinkov.