About — Jiafan Yu

We are at an inflection point in AI. Today's LLMs reason over text, write code, and perform well on benchmarks. They are remarkable reasoning engines. But they remain fundamentally incomplete. They cannot perceive the physical world, act within it, or learn from experience after deployment. The capabilities that have advanced most quickly in language have not yet carried over to embodied AI, where intelligence must operate under uncertainty, in real time, and with safety-critical consequences.

That is the problem I work on. At Waymo, I build multimodal foundation models that bring vision, language, and action into a single system. The challenge extends beyond scale. It is making these models reliable and responsive enough for the real world, where failure is not an abstract metric but a safety issue.

The deeper question is what comes after the current paradigm. Pretraining on static datasets has taken us far, but it is unlikely to be sufficient for the systems we ultimately need. More capable AI will need to learn continuously, adapting to new environments, accumulating knowledge over time, and improving through interaction with the world. The goal is not savant-like memorization of internet text, but genuine lifelong learning in which competence compounds with experience.

My thinking is increasingly shaped by three frontiers: self-evolving AI systems that can discover and adopt their own improvements; connected architectures where agents learn from one another rather than only from fixed human-curated corpora; and the search for post-transformer foundations that can support the kind of robust generalization and adaptation current models lack.

My path into this work was indirect. I began in quantitative trading, where latency and reliability were not secondary concerns but core constraints. My doctoral research at Stanford applied machine learning to physical infrastructure, including power grids, solar energy, and utility systems. Across those settings, the same question kept resurfacing: how do you build intelligence that works under real constraints, in real environments, where mistakes matter?

The next paradigm shift will not come from scaling existing recipes. It will come from building systems that learn continuously, engage with the physical world, and evolve beyond the architectures we have today.

Now

Building vision-language-action (VLA) foundation models for autonomous driving at Waymo
Exploring lifelong learning and self-evolving AI systems
Thinking about post-transformer architectures for embodied intelligence
Writing about multimodal systems and AI infrastructure
Based in the Bay Area