Johannes von Oswald
I am a research scientist at Google in Zurich,
working with Blaise Agüera y Arcas and
João Sacramento in the
Paradigms of Intelligence Team.
Until 2023, I was a PhD student supervised by Angelika Steger and
João Sacramento at the Institute of Theoretical Computer Science, ETH Zurich.
My research focuses on how and what machines, in particular neural networks, learn from data. One important goal is to allow these learning algorithms to generalize broadly and solve novel complex tasks.
Therefore, I am heavily inspired by (meta-) learning within a large, possibly open-ended, enviroment.
Currently, I investigate how state-of-the-art neural network models, in particular transformer-based large language models, can move beyond pattern matching to learn and implement algorithmic-like solutions.
This led me to work on mesa-optimization: the emergence of optimization algorithms within neural networks!
In recent work, we showed that gradient descent-based algorithms
can be implemented within the activations of transformers by simple autoregressive outer-optimization. This allows transformers to learn and
generalize at test time to novel data provided in-context. This led to the development of a novel recurrent neural
network architecture, the MesaNet, which we tested on language modeling at scale.
Recently, I became more intrigued by hardware, and I started working with Charlotte Frenkel on exciting ideas to run neural networks on the hardware substrates they deserve!
News
- December, 2025 - Tristan Torchet is joining Google PI as a student researcher and working with Charlotte and me on new exiting hardware ideas.
- October, 2025 - Marwa Abdulhai is joining Google PI as a student researcher and working with Nino and me on dynamic evaluations.
- October, 2025 - Our work on gradient descent within Transformers was discussed by Andrej Karpathy on the Dwarkesh Patel Podcast https://youtu.be/lXUZvyajciY?si=jeoFgabWE-IkMh1N!
- July, 2025 - Nino and me are hosting a student researcher for the end of this year at Google - please apply here .
- June, 2025 - Nino and me presented the MesaNet at the ASAP seminar. Please see the slides here.
- June, 2025 - João gave a talk at the Kempner Institute at Harvard University about our work on mesa-optimization and the MesaNet.
- April, 2025 - Yassir gave an excellent oral presentation of our work at ICLR 2025 on how to learn randomized algorithms within transformers.
- March, 2024 - Our work on gradient descent within Transformers was discussed by Sholto Douglas & Trenton Bricken on the Dwarkesh Patel Podcast https://youtu.be/UTuuTTnjxMQ?si=6HV5ahGM3YXY1K50&t=264!
- December, 2023 - I succesfully defended my PhD thesis! I am particularly happy about the front and back cover :).
- December, 2023 - I gave a short talk at Alignment Workshop in New Orleans about our mechanistic interpretability work of in-context learning.
Research Highlights
-
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
Johannes von Oswald*, Nino Scherrer*, Seijin Kobayashi, Luca Versari, Songlin Yang, Maximilian Schlegel, Kaitlin Maile, Yanick Schimpf, Oliver Sieberling, Alexander Meulemans, Rif A. Saurous, Guillaume Lajoie, Charlotte Frenkel, Razvan Pascanu, Blaise Agüera y Arcas and João Sacramento
ArXiv 2025
Paper
Code
Colab Tutorial
-
Uncovering Mesa-Optimization Algorithms in Transformers
Johannes von Oswald*, Maximilian Schlegel*, Alexander Meulemans, Seijin Kobayashi, Eyvind Niklasson, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu and João Sacramento
ICLR 2024, Workshop on Understanding of Foundation Models (Oral)
Paper
Code
-
-
-