I am a research scientist at Google in Zurich,
working with Blaise Agüera y Arcas and
João Sacramento in the
Paradigms of Intelligence Team.
Until 2023, I was a PhD student supervised by Angelika Steger and
João Sacramento at the Institute of Theoretical Computer Science, ETH Zurich.
My research focuses on how and what machines, in particular neural networks, learn from data. One important goal is to enable these learning algorithms to generalize broadly and to solve novel, complex tasks.
To this end, I am heavily inspired by (meta-)learning within large, possibly open-ended, environments.
Currently, I investigate how state-of-the-art neural network models, in particular transformer-based large language models, can move beyond pattern matching to learn and implement algorithm-like solutions.
This led me to work on mesa-optimization: the emergence of optimization algorithms within neural networks!
In recent work, we showed that gradient-descent-based algorithms can be implemented within the activations of transformers through simple autoregressive (outer) optimization. This allows transformers to learn from, and generalize to, novel data provided in-context at test time. This line of work led to the MesaNet, a novel recurrent neural network architecture that we evaluated on language modeling at scale.
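
If you are curious what this looks like concretely, here is a minimal, purely illustrative sketch (a toy setup of my own, not the MesaNet or the implementation from our papers): in-context linear regression, where answering a query amounts to running a few gradient descent steps on a least-squares loss over the context pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: in-context linear regression. A "teacher"
# generates context pairs (x_i, y_i); the model must predict y for a
# new query x using only this context, i.e. learn at test time.
d_in, d_out, n_ctx = 4, 2, 32
W_teacher = rng.normal(size=(d_out, d_in))
X = rng.normal(size=(n_ctx, d_in))   # context inputs
Y = X @ W_teacher.T                  # context targets
x_query = rng.normal(size=(d_in,))

# The mesa-optimization picture: the forward pass itself behaves like a
# few steps of gradient descent on the in-context least-squares loss
#   L(W) = (1/n) * sum_i ||W x_i - y_i||^2.
W = np.zeros((d_out, d_in))
lr = 0.1
for _ in range(50):
    grad = (2.0 / n_ctx) * (W @ X.T - Y.T) @ X  # dL/dW
    W -= lr * grad

y_pred = W @ x_query
print("query error:", np.linalg.norm(y_pred - W_teacher @ x_query))
```

The point of the sketch is only the shape of the computation: the "learning" happens inside a single forward computation over the context, with no change to any outer, trained parameters.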