Leonard F. Bereska
AI Safety Researcher | Mechanistic Interpretability Enthusiast | PhD Candidate at the University of Amsterdam.
4.125, LAB42
Science Park 900
1098XH Amsterdam
I’m Leonard, a PhD Candidate at the University of Amsterdam, dedicated to enhancing AI safety through mechanistic interpretability. My research aims to make transformer models more transparent and understandable, contributing to the broader goal of AI alignment.
research focus
My work revolves around reverse-engineering neural networks into human-interpretable algorithms. I’m particularly interested in:
- Engineering monosemanticity and implementing sparse distillation techniques in transformer models.
- Investigating the relationship between mechanistic interpretability and adversarial robustness.
- Analyzing truth representations and simulacra in large language models.
- Applying singular learning theory to examine phase transitions in algorithmic tasks.
- Mechanistically interpreting prior-fitted tabular transformers.
- Creating sparse boolean circuits (inspired by computation in superposition) as testbeds and benchmarks for interpretability methods.
If you find any of these topics interesting, please reach out.
As part of the AI Safety Initiative Amsterdam, I’m actively involved in promoting AI safety research and awareness. We organize events, facilitate reading groups, and foster discussions on crucial AI safety topics.
I’m also passionate about nurturing the next generation of AI safety researchers. I’ve taught courses and supervised numerous Master’s students on projects ranging from detecting bias and eliciting truth in LLMs to interpretability in medical AI applications.
beyond research
When I’m not diving into the intricacies of neural networks, you might find me:
- Practicing yoga or meditation to maintain balance.
- Reading science-fiction novels (I recently discovered Vernor Vinge’s work, which offers a plausible treatment of the AI singularity).
- Playing with my two chihuahuas, Cicchetti and Pancetta.
- Exploring Amsterdam’s culinary scene (always on the lookout for the best vegan spots!).
- Brushing up on my Mandarin or picking up Dutch phrases.
- Engaging in discussions about the future of AI and its implications for society.