academic
Courses, student supervision, and miscellaneous academic activities.
talks, presentations and figures
I have given several talks on AI Safety and Mechanistic Interpretability:
- The AI Alignment Problem (slides) (keynote) (March 2023, University of Amsterdam)
- Taming Simulators (slides) (keynote) (June 2023, AAAI Symposium, Singapore)
- AI Safety for LLMs (slides) (keynote) (November 2023, keynote at ELLIS BeNeLux, Delft)
- Mechanistic Interpretability (Google Slides) (May 2024, University of Amsterdam)
I have also created figures for research papers in Keynote.
All slides and Keynote files are licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0).
teaching
I have been involved in teaching several courses in the Master’s program in Artificial Intelligence at the University of Amsterdam. Each is briefly described below.
deep learning I & II (teaching assistant: WS 21/22, WS 22/23, SS 22, SS 23)
These courses cover both the fundamentals and advanced topics of deep learning. For Deep Learning I, I was responsible for the following topics: recurrent neural networks, dynamical systems, energy-based models, and graph neural networks. For Deep Learning II, I supervised student projects on Bayesian deep learning and on reward hacking in reinforcement learning.
foundation models (lecturer)
In this course, I co-lectured a section on AI safety for large language models, covering:
- reinforcement learning from human feedback (RLHF), prepared by Leon Lang;
- developmental interpretability, prepared by Tim Bakker;
- mechanistic interpretability, prepared by me.
Our lecture slides are available on Google Slides.
conferences
- Reviewer for NeurIPS 2023, CVPR 2021, CVPR 2022, and ICCV.
- Presented work at CoLLAs 2022 and AAAI Summer Symposium 2023.
summer schools and courses
I have participated in several summer schools and workshops to enhance my knowledge in AI and AI Safety:
- Mediterranean Machine Learning Summer School, Milan, Italy, September 2022.
- Nordic Probabilistic AI Summer School, Helsinki, Finland, June 2022.
- Dan Hendrycks’ Course on AI Safety, online, August 2023.
- AI Safety Hackathon, Delft, December 2023, finished in 2nd place (slides).
- Human-Aligned AI Summer School, Prague, Czech Republic, July 2024.
extracurricular
As a co-founder and core team member of the AI Safety Initiative Amsterdam, I have helped organize several events:
- OpenAI Talk and Q&A on AI and Existential Risk (September 2023; I managed the local Amsterdam event).
- Panel Discussion on AI Risks: From Today to Doomsday (slides) (October 2023).
- Reading groups on AGI Safety Fundamentals (January to March 2024).
student supervision
I have supervised and co-supervised several Master’s students in their thesis projects and research:
student name | project title | program | my role | period |
---|---|---|---|---|
Jochem Hoelscher | Feature-Conditional Diffusion: A Novel Approach to Neural Network Interpretation | MSc Artificial Intelligence | Co-supervisor with Tim Bakker | July 2024 - Present |
Kieron Kretschmar | The Whole Truth and Nothing but the Truth? On Representations of Truth in Language Models | MSc Artificial Intelligence | Daily Supervisor with Cadenza Labs | January 2024 - Present |
Benjamin Shaffrey | Phase Transitions In Algorithmic Tasks | MSc Artificial Intelligence | Daily Supervisor | April 2024 - Present |
Derck Prinzhorn | Mechanistic Interpretability of Adversarially Robust Models | Project AI (MSc Artificial Intelligence) | Daily Supervisor | July 2024 - Present |
Gijs de Jong | Mechanistic Interpretability of Adversarially Robust Models | Project AI (MSc Artificial Intelligence) | Daily Supervisor | July 2024 - Present |
Marius Strampel | Tracking Pre- And Post-operative Tumor Lesions | MSc Artificial Intelligence | Co-supervisor with Jacqueline Bereska | November 2023 - Present |
Thijmen Nijdam | Sparse Autoencoder Representations in OthelloGPT | Project AI (MSc Artificial Intelligence) | Daily Supervisor | June 2024 - July 2024 |
Amir Sahrani | Interpreting Transformers on a Sorting Task | Project AI (MSc Artificial Intelligence) | Daily Supervisor | March 2024 - April 2024 |
Tarmo Pungas | Unveiling the Mechanisms of Bias in Large Language Models by Eliciting Latent Knowledge | MSc Artificial Intelligence | Daily Supervisor with Rhite | November 2023 - June 2024 |
Alexia Mureșan | AI-Based Hiring and the Appeal of Novelty: Do LLMs Solve or Exacerbate the Problem of Discrimination? | MSc Artificial Intelligence | Daily Supervisor with Rhite | November 2023 - June 2024 |
Anya Nikiforova | Interpretable Tabular Deep Learning in PDAC Resectability Prediction | MSc Information Systems | Daily Supervisor | March - June 2024 |
Muhammad Amir Bin Mohd Azman | Investigating the Application of A Transformer-Based Model in Predicting Pancreatic Ductal Adenocarcinoma Resectability | MSc Information Systems | Daily Supervisor | March - June 2024 |
Mund Vetter | Neural Fields for Irregularly Sampled Data | MSc Artificial Intelligence | Daily Supervisor with David Knigge | November 2022 - June 2023 |
Mattia Cintioli | Deep Learning Radiomic Features for Survival Estimation for Patients with Pancreatic Ductal Adenocarcinoma | MSc Information Systems | Co-supervisor with Jacqueline Bereska | March - June 2023 |
Matej Lončarić | Predicting Post-Operative Pancreatic Fistula in Patients with Intraductal Papillary Mucinous Neoplasm of the Pancreas on Magnetic Resonance Imaging Using Machine Learning Models | MSc Information Systems | Co-supervisor with Jacqueline Bereska | March - June 2023 |
Raiz Fatehmahomed | Enhancing VasQNet: Implementing a Centerline Approach for Assessment of Vascular Involvement in Pancreatic Ductal Adenocarcinoma Resectability | MSc Information Systems | Co-supervisor with Jacqueline Bereska | March - June 2023 |
Selina Palić | Implementing an extended Self-Learning-Based AI model for Segmenting Locally Advanced Pancreatic Ductal Adenocarcinoma | MSc Information Systems | Co-supervisor with Jacqueline Bereska | March - June 2023 |
Luuk Wagenaar | Development of an auto-segmentation deep learning model for patients with colorectal liver metastases | MSc Information Systems | Co-supervisor with Jacqueline Bereska | March - June 2023 |
Danial Iqbal | Advancing IPMN Classification: A Comprehensive Study on Deep Learning Segmentation of Intraductal Papillary Mucinous Neoplasm on MRI | MSc Information Systems | Co-supervisor with Jacqueline Bereska | March - June 2023 |
If you are a student interested in AI Safety, Mechanistic Interpretability, or related fields, feel free to reach out to discuss potential supervision or collaboration.