Johannes von Oswald

I am a research scientist at Google in Zurich, working with Blaise Agüera y Arcas and João Sacramento in the Paradigms of Intelligence Team. Until 2023, I was a PhD student supervised by Angelika Steger and João Sacramento at the Institute of Theoretical Computer Science, ETH Zurich.

My research focuses on how and what machines, in particular neural networks, learn from data. One important goal is to allow these learning algorithms to generalize broadly and solve novel, complex tasks. Therefore, I am heavily inspired by (meta-)learning within a large, possibly open-ended, environment.

Currently, I investigate how state-of-the-art neural network models, in particular transformer-based large language models, can move beyond pattern matching to learn and implement algorithm-like solutions. This led me to work on mesa-optimization: the emergence of optimization algorithms within neural networks! In recent work, we showed that gradient-descent-based algorithms can be implemented within the activations of transformers by simple autoregressive outer-optimization, which allows transformers to learn and generalize at test time to novel data provided in context. These findings motivated the development of a novel recurrent neural network architecture, the MesaNet, which we tested on language modeling at scale.

I am always interested in scientific collaborations, particularly in helping students with their research. If you want to collaborate with me and think that your research fits my interests, please reach out via one of the social media channels below or by email at jvoswald[at]google.com.

News

Research Highlights