I’m a postdoc at TakeLab, University of Zagreb. My research interests within natural language processing are faithful explainability, safety and controllability of language models.
Previously, I did a postdoc at the Technion, working with Yonatan Belinkov on unlearning and faithful explainability of language models. Before that, I did a postdoc at the UKP Lab at TU Darmstadt, working with Iryna Gurevych on the InterText initiative. I obtained my PhD at the University of Zagreb under the supervision of Jan Šnajder. Before my PhD, I worked at the European Commission’s Joint Research Centre in Ispra on using NLP to update the Sendai Framework for Disaster Risk Reduction 2015-2030.
I am on the job market for academic opportunities. Check out my CV and reach out if you think I would be a good fit.
News
January 2026.
- Our benchmark evaluating LLM agent safety for use in realistic managerial decisions has been accepted to ICLR 2026! Check out the [paper & data]!
- Our work showing that repeating input sequences multiple times improves model performance on sequence labeling has been accepted to the Findings of EACL! Check out the paper!
November 2025.
- FUR has received an Outstanding Paper Award at EMNLP 2025! If you haven’t yet, read our paper on using parametric interventions to measure CoT faithfulness!
- Our work presenting a benchmark that investigates the capacity of LLMs to track and model local world states in conversations has been accepted to AAAI 2026 as an oral! Check out the paper!
September 2025.
- We released a new preprint on directly encoding contextual information into adapter parameters in a compositional manner! Check out the [paper]!
August 2025.
- We released a new preprint on using SAEs to precisely & permanently erase harmful concepts from LMs! Check out the preprint: [paper].
July 2025.
- Our paper Predicting Success of Model Editing via Intrinsic Features has been accepted to the Interplay workshop at COLM 2025!
June 2025.
- Our paper studying diachronic word embeddings trained on Croatian has been accepted to the Slavic NLP workshop at ACL 2025! Check out our [paper].
May 2025.
- REVS, our gradient-free method for erasing sensitive information from language models, has been accepted to the Findings of ACL 2025! Check out the [paper & code].
April 2025.
- We released a Mechanistic Interpretability Benchmark, a step towards standardizing evaluation in mechanistic interpretability! The paper describing our effort has been accepted to ICML 2025.
