Leonard F. Bereska

AI Safety Researcher | Mechanistic Interpretability Enthusiast | PhD Candidate at the University of Amsterdam.


4.125, LAB42

Science Park 900

1098 XH Amsterdam

I’m Leonard, a PhD Candidate at the University of Amsterdam, dedicated to enhancing AI safety through mechanistic interpretability. My research aims to make transformer models more transparent and understandable, contributing to the broader goal of AI alignment.

research focus

My work revolves around reverse-engineering neural networks into human-interpretable algorithms, with a particular focus on making transformer models transparent.

If you find any of these topics interesting, please reach out.

As part of the AI Safety Initiative Amsterdam, I’m actively involved in promoting AI safety research and awareness. We organize events, facilitate reading groups, and foster discussions on crucial AI safety topics.

I’m also passionate about nurturing the next generation of AI safety researchers. I’ve been involved in teaching courses and have supervised numerous Master’s students on projects ranging from bias detection and truth elicitation in LLMs to interpretability in medical AI applications.

beyond research

When I’m not diving into the intricacies of neural networks, you might find me:

  • Practicing yoga or meditation to maintain balance.
  • Reading science-fiction novels (I recently discovered Vernor Vinge’s work and its plausible treatment of the AI singularity).
  • Playing with my two chihuahuas, Cicchetti and Pancetta.
  • Exploring Amsterdam’s culinary scene (always on the lookout for the best vegan spots!).
  • Brushing up on my Mandarin or picking up Dutch phrases.
  • Engaging in discussions about the future of AI and its implications for society.

selected publications

  1. Mechanistic Interpretability for AI Safety — A Review
    Leonard F. Bereska and Efstratios Gavves
    TMLR, Apr 2024
  2. Taming Simulators: Challenges, Pathways and Vision for the Alignment of Large Language Models
    Leonard F. Bereska and Efstratios Gavves
    AAAI-SS, Oct 2023