cv
Basics
Name | Leonard Friedrich Bereska |
Label | AI Safety Researcher |
Email | leonard [dot] bereska [at] uva [dot] nl |
Url | https://leonardbereska.github.io/ |
Summary | PhD Candidate at the University of Amsterdam, specializing in AI Safety and Mechanistic Interpretability. Focused on making AI systems more transparent, interpretable, and aligned with human values. |
Work
-
2021.10 - Present PhD Candidate
University of Amsterdam
Researching the interpretability of transformer models through monosemanticity engineering, with the aim of improving AI safety. Also working on AI alignment strategies intended to ensure long-term value preservation.
- AI Safety
- Mechanistic Interpretability
- Transformer Models
- Monosemanticity Engineering
-
2019.02 - 2021.09 Research Assistant
University of Heidelberg
Incorporated principles of dendritic computation into neural networks. Explored novel optimization criteria for dynamical systems.
- Dendritic Computation
- Neural Networks
- Dynamical Systems
-
2017.08 - 2017.10 Research Intern
Central Institute of Mental Health
Investigated initialization schemes for a piecewise-linear recurrent neural network using expectation-maximization.
- Recurrent Neural Networks
- Initialization Schemes
- Expectation-Maximization
Volunteer
-
2023.09 - Present Amsterdam, Netherlands
Co-founder and Core Team Member
AI Safety Initiative Amsterdam
Co-founded and actively contribute to a group dedicated to promoting AI safety research and awareness in Amsterdam.
- Organized OpenAI Talk and Q&A on AI and Existential Risk
- Coordinated Panel Discussion on AI Risks: From Today to Doomsday
- Facilitated reading groups on AGI Safety Fundamentals
Education
-
2021.10 - Present Amsterdam, Netherlands
-
2016.09 - 2019.02 Heidelberg, Germany
MSc
University of Heidelberg
Computational Physics
- Visual Learning and Computer Vision
- Machine Learning
- Artificial Intelligence
- Time Series Analysis
-
2014.09 - 2015.07 Taipei, Taiwan
-
2012.09 - 2016.07 Heidelberg, Germany
-
2006.09 - 2012.07 Celle, Germany
Awards
- 2012
Prize of the German Mathematical, Physical, and Chemical Societies
German Mathematical, Physical, and Chemical Societies
Recognized for outstanding performance in mathematics, physics, and chemistry in the Abitur.
- 2023.12
2nd Place, AI Safety Hackathon
Entrepreneur First and Apart Research
Achieved second place in the AI Safety Hackathon held in Delft.
Certificates
ML Safety Course | Dan Hendrycks, Center for AI Safety | 2023-08 |
Publications
-
2024.04 Mechanistic Interpretability for AI Safety - A Review
CoRR
Comprehensive review of mechanistic interpretability in AI, focusing on its applications in AI safety and potential to prevent catastrophic outcomes.
-
2023.10 Taming Simulators: Challenges, Pathways and Vision for the Alignment of Large Language Models
AAAI Inaugural Summer Symposium Series
Explored challenges and strategies for aligning large language models, emphasizing the concept of simulators and simulacra in AI alignment.
-
2022.11 Continual Learning of Dynamical Systems With Competitive Federated Reservoir Computing
Conference on Lifelong Learning Agents
Proposed a novel approach to continual learning of dynamical systems using competitive federated reservoir computing.
-
2022.06 Tractable Dendritic RNNs for Reconstructing Nonlinear Dynamical Systems
International Conference on Machine Learning (ICML)
Developed an interpretable and tractable piecewise-linear RNN augmented with a linear spline basis expansion for inferring nonlinear dynamical systems.
-
2019.06 Unsupervised Part-Based Disentangling of Object Shape and Appearance
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Presented a novel approach for unsupervised disentanglement of object shape and appearance in computer vision (oral presentation, best paper finalist).
Skills
AI Safety Research | |
Mechanistic Interpretability | |
AI Alignment | |
Transformer Models | |
Monosemanticity Engineering |
Programming | |
Python | |
JAX | |
PyTorch | |
Functional Programming | |
Git | |
Bash | |
Linux | |
LaTeX |
Machine Learning | |
Deep Learning | |
Reinforcement Learning | |
Computer Vision | |
Natural Language Processing | |
Dynamical Systems |
Languages
German | |
Native |
English | |
Fluent |
Dutch | |
Conversational |
Mandarin | |
Conversational |
French | |
Basic |
Italian | |
Basic |
Latin | |
Advanced Latinum |
Ancient Greek | |
Graecum |
Old Hebrew | |
Hebraicum |
Interests
AI Safety | |
Alignment | |
Robustness | |
Transparency | |
Value Learning |
Mechanistic Interpretability | |
Neural Networks | |
Feature Visualization | |
Circuit Analysis |
Dynamical Systems | |
Nonlinear Dynamics | |
Chaos Theory | |
Time Series Analysis |