I’m a postdoc at TakeLab, University of Zagreb. My research interests within natural language processing are faithful explainability, safety and controllability of language models.
Previously, I did a postdoc at the Technion, working with Yonatan Belinkov on unlearning and faithful explainability of language models. Before that, I did a postdoc at the UKP Lab at TU Darmstadt, working with Iryna Gurevych on the InterText initiative. I obtained my PhD at the University of Zagreb under the supervision of Jan Šnajder. Before my PhD, I worked at the European Commission’s Joint Research Centre in Ispra on using NLP to update the Sendai Framework for Disaster Risk Reduction 2015-2030.
I am on the job market for academic opportunities. Check out my CV and reach out if you think I would be a good fit.
News
January 2026.
- Our benchmark evaluating LLM agent safety for use in realistic managerial decisions has been accepted to ICLR 2026! Check out the [paper & data]!
- Our work showing that repeating input sequences multiple times improves model performance on sequence labeling has been accepted to the Findings of EACL! Check out the paper!
November 2025.
- FUR has received an Outstanding Paper Award at EMNLP 2025! If you haven’t yet, read our paper on using parametric interventions to measure CoT faithfulness!
- Our work presenting a benchmark that investigates the capacity of LLMs to track and model local world states in conversations has been accepted to AAAI 2026 as an oral! Check out the paper!
September 2025.
- We released a new preprint on directly encoding contextual information into adapter parameters in a compositional manner! Check out the [paper]!
August 2025.
- We released a new preprint on using SAEs to precisely & permanently erase harmful concepts from LMs! Check out the preprint: [paper].
July 2025.
- Our paper Predicting Success of Model Editing via Intrinsic Features has been accepted to the Interplay workshop at COLM 2025!
June 2025.
- Our paper studying diachronic word embeddings trained on Croatian has been accepted to the Slavic NLP workshop at ACL 2025! Check out our [paper].
May 2025.
- REVS, our gradient-free method for erasing sensitive information from language models, has been accepted to the Findings of ACL 2025! Check out the [paper & code].
April 2025.
- We released a Mechanistic Interpretability Benchmark, a step towards standardizing evaluation in mechanistic interpretability! The paper describing our effort has been accepted to ICML 2025.
