I build frontier multimodal models — systems that see, reason, and act. The goal is general-purpose AI that works in the real world, not just on benchmarks.

Selected Work

Waymo Foundation Model (WaymoFM)
Architected a 20B-parameter Vision-Language-Action model trained on 8K+ TPUs — the first foundation-model-based autonomous driving stack. Designed the multimodal interface connecting lidar and camera encoders to reactive and deliberative reasoning decoders.
Blog post
TPU Inference at Scale
Led Waymo's first TPU inference deployment, which grew to 15+ models company-wide: ~6x latency reduction for VLMs (145 ms → 25 ms), >50% HBM savings, and >$150M in annual cost reduction. Custom kernels written in JAX/XLA/Pallas.
DeepSolar
CNN-based framework for identifying solar installations from satellite imagery. Used to construct the first detailed map of solar adoption in the United States.
Paper

Background

Waymo Research (Alphabet)
The Voleon Group & Virtu Financial
Stanford University
Tsinghua University