academic

Courses, student supervision, and miscellaneous academic activities.

talks, presentations and figures

I have given several talks on AI Safety and Mechanistic Interpretability:

  • The AI Alignment Problem (slides) (keynote) (March, 2023, University of Amsterdam)
  • Taming Simulators (slides) (keynote) (June, 2023, AAAI Symposium, Singapore)
  • AI Safety for LLMs (slides) (keynote) (Nov. 2023, Keynote ELLIS BeNeLux, Delft)
  • Mechanistic Interpretability (Google Slides) (May 2024, University of Amsterdam)

I have also created figures for research papers in Keynote.

All slides and Keynote files are licensed under the Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0) license.


teaching

I have been involved in teaching several courses for the Master’s programs in Artificial Intelligence at the University of Amsterdam. Below is a brief description of each.

deep learning I & II (teaching assistant: WS 21/22, WS 22/23, SS 22, SS 23)

These courses cover the fundamentals of deep learning and advanced topics in the field. For deep learning I, I was responsible for the following topics: recurrent neural networks, dynamical systems, energy-based models, and graph neural networks. For deep learning II, I supervised student projects on Bayesian deep learning and on reward hacking in reinforcement learning.

foundation models (lecturer)

In this course, I co-lectured a section on AI safety for large language models, covering:

  • reinforcement learning from human feedback (RLHF), prepared by Leon Lang,
  • developmental interpretability, prepared by Tim Bakker,
  • mechanistic interpretability, prepared by me.

Our lecture slides are available on Google Slides.

conferences


summer schools and courses

I have participated in several summer schools and workshops to deepen my knowledge of AI and AI Safety.


extracurricular

As co-founder and part of the core team of the AI Safety Initiative Amsterdam, I have been involved in organizing various events:

  • OpenAI Talk and Q&A on AI and Existential Risk (I managed the local Amsterdam event) (September 2023),
  • Panel Discussion on AI Risks: From Today to Doomsday (slides) (October 2023),
  • Reading groups on AGI Safety Fundamentals (Jan. - March 2024).

student supervision

I have supervised and co-supervised several Master’s students in their thesis projects and research:

student name | project title | program | my role | period
Jochem Hoelscher | Feature-Conditional Diffusion: A Novel Approach to Neural Network Interpretation | MSc Artificial Intelligence | Co-Supervisor with Tim Bakker | July 2024 - Present
Kieron Kretschmar | The Whole Truth and Nothing but the Truth? On Representations of Truth in Language Models | MSc Artificial Intelligence | Daily Supervisor with Cadenza Labs | January 2024 - Present
Benjamin Shaffrey | Phase Transitions In Algorithmic Tasks | MSc Artificial Intelligence | Daily Supervisor | April 2024 - Present
Derck Prinzhorn | Mechanistic Interpretability of Adversarially Robust Models | Project AI (MSc Artificial Intelligence) | Daily Supervisor | July 2024 - Present
Gijs de Jong | Mechanistic Interpretability of Adversarially Robust Models | Project AI (MSc Artificial Intelligence) | Daily Supervisor | July 2024 - Present
Marius Strampel | Tracking Pre- And Post-operative Tumor Lesions | MSc Artificial Intelligence | Co-supervisor with Jacqueline Bereska | November 2023 - Present
Thijmen Nijdam | Sparse Autoencoder Representations in OthelloGPT | Project AI (MSc Artificial Intelligence) | Daily Supervisor | June 2024 - July 2024
Amir Sahrani | Interpreting Transformers on a Sorting Task | Project AI (MSc Artificial Intelligence) | Daily Supervisor | March 2024 - April 2024
Tarmo Pungas | Unveiling the Mechanisms of Bias in Large Language Models by Eliciting Latent Knowledge | MSc Artificial Intelligence | Daily Supervisor with Rhite | November 2023 - June 2024
Alexia Mureșan | AI-Based Hiring and the Appeal of Novelty: Do LLMs Solve or Exacerbate the Problem of Discrimination? | MSc Artificial Intelligence | Daily Supervisor with Rhite | November 2023 - June 2024
Anya Nikiforova | Interpretable Tabular Deep Learning in PDAC Resectability Prediction | MSc Information Systems | Daily Supervisor | March - June 2024
Muhammad Amir Bin Mohd Azman | Investigating the Application of A Transformer-Based Model in Predicting Pancreatic Ductal Adenocarcinoma Resectability | MSc Information Systems | Daily Supervisor | March - June 2024
Mund Vetter | Neural Fields for Irregularly Sampled Data | MSc Artificial Intelligence | Daily Supervisor with David Knigge | November 2022 - June 2023
Mattia Cintioli | Deep Learning Radiomic Features for Survival Estimation for Patients with Pancreatic Ductal Adenocarcinoma | MSc Information Systems | Co-supervisor with Jacqueline Bereska | March - June 2023
Matej Lončarić | Predicting Post-Operative Pancreatic Fistula in Patients with Intraductal Papillary Mucinous Neoplasm of the Pancreas on Magnetic Resonance Imaging Using Machine Learning Models | MSc Information Systems | Co-supervisor with Jacqueline Bereska | March - June 2023
Raiz Fatehmahomed | Enhancing VasQNet: Implementing a Centerline Approach for Assessment of Vascular Involvement in Pancreatic Ductal Adenocarcinoma Resectability | MSc Information Systems | Co-supervisor with Jacqueline Bereska | March - June 2023
Selina Palić | Implementing an extended Self-Learning-Based AI model for Segmenting Locally Advanced Pancreatic Ductal Adenocarcinoma | MSc Information Systems | Co-supervisor with Jacqueline Bereska | March - June 2023
Luuk Wagenaar | Development of an auto-segmentation deep learning model for patients with colorectal liver metastases | MSc Information Systems | Co-supervisor with Jacqueline Bereska | March - June 2023
Danial Iqbal | Advancing IPMN Classification: A Comprehensive Study on Deep Learning Segmentation of Intraductal Papillary Mucinous Neoplasm on MRI | MSc Information Systems | Co-supervisor with Jacqueline Bereska | March - June 2023

 

If you are a student interested in AI Safety, Mechanistic Interpretability, or related fields, feel free to reach out to discuss potential supervision or collaboration.