I’m a postdoc at TakeLab, University of Zagreb. My research interests within natural language processing include faithful explainability, safety, and controllability of language models.
Previously, I did a postdoc at the Technion, working with Yonatan Belinkov on unlearning and faithful explainability of language models. Before that, I did a postdoc at the UKP Lab at TU Darmstadt, working with Iryna Gurevych on the InterText initiative. I obtained my PhD at the University of Zagreb under the supervision of Jan Šnajder. Before my PhD, I worked at the European Commission’s Joint Research Centre in Ispra on using NLP to update the Sendai Framework for Disaster Risk Reduction 2015-2030.
I am on the job market for academic opportunities. Check my CV and reach out if you think I would be a good fit.
News
May 2025.
- REVS, our gradient-free method for erasing sensitive information from language models, has been accepted to the Findings of ACL 2025! [paper & code]
April 2025.
- We released the Mechanistic Interpretability Benchmark, a step toward standardizing evaluation in mechanistic interpretability! The paper describing our effort has been accepted to ICML 2025.
February 2025.
- We released a new preprint describing a parametric method for estimating the faithfulness of CoTs. [paper]
- I am substituting Jan Šnajder at the University of Zagreb during the summer semester, teaching Introduction to AI.
September 2024.
- Our paper on prompting models with pseudocode to improve their conditional reasoning capabilities has been accepted to EMNLP 2024 (main)! [paper]
February 2024.
- I started as a postdoc at the Technion, working with Yonatan Belinkov.