Waymo’s AI Strategy: Powering Demonstrably Safe Autonomous Driving at Scale

Autonomous driving represents the pinnacle of artificial intelligence challenges in the physical realm. At Waymo, we are tackling this by embedding safety as a fundamental principle in our AI model engineering and ecosystem development. This commitment has led to the creation of a sophisticated AI system that operates safely at scale in the real world. With over 100 million fully autonomous miles logged, we are enhancing street safety in our operating areas, achieving a significant reduction in serious injury crashes compared to human drivers.

This deep dive explores Waymo’s AI strategy and how it propels our progress, enabling us to expand our service to more riders, faster than ever before. We will examine our comprehensive AI approach, centered on the Waymo Foundation Model, which underpins a unified, demonstrably safe AI ecosystem that drives continuous learning and improvement.

Waymo’s Holistic Approach to AI

In contrast to AI applications that may prioritize capability first and then add safety layers, autonomous driving demands that safety be the foundational element, not an afterthought. At Waymo, safety is the non-negotiable bedrock of our AI ecosystem.

Achieving demonstrably safe AI—where safety is proven, not merely stated—necessitates a holistic strategy. This involves not only an intelligent and capable Driver but also a realistic, closed-loop Simulator for extensive training and testing across diverse scenarios, and a discerning Critic to assess the Driver’s performance and pinpoint areas for enhancement.

The integration of these components is key. Developed collaboratively with safety as their core focus, our Driver, Simulator, and Critic are all powered by the same underlying AI: the Waymo Foundation Model. This synergy creates a continuous virtuous cycle of improvement.

Waymo Foundation Model: The Cornerstone of Waymo AI

The Waymo Foundation Model is a sophisticated, state-of-the-art world model that serves as the engine for our AI ecosystem. Its innovative architecture offers distinct advantages over purely end-to-end or modular approaches.

Specifically, the model utilizes the full expressive power of learned embeddings as a rich interface between its components, enabling end-to-end signal backpropagation during training. Concurrently, its inclusion of compact, materialized structured representations—such as objects, semantic attributes, and roadgraph elements—facilitates:

  • Robust correctness and safety validation at inference time within the Driver.
  • Highly efficient, physically accurate, and realistic closed-loop Simulation at an immense scale.
  • Strong, verifiable feedback signals for evaluation by the Critic and for reinforcement learning during the training process.

The Waymo Foundation Model incorporates a “Think Fast and Think Slow” (System 1 and System 2) architecture, comprising two distinct model components:

  • Sensor Fusion Encoder for rapid responses. This perceptual component integrates inputs from cameras, lidar, and radar over time, generating objects, semantic information, and rich embeddings for downstream tasks. These inputs are crucial for our system to make swift and safe driving decisions.
  • Driving Vision-Language Model (VLM) for complex semantic reasoning. This component leverages rich camera data and is fine-tuned on Waymo’s driving data and tasks. Trained using Gemini, it harnesses Gemini’s extensive world knowledge to better comprehend rare, novel, and complex semantic scenarios encountered on the road. For example, in an exceptionally rare situation involving a burning vehicle ahead, even if the physical path and drivable lanes appear clear, the VLM can provide a semantic signal prompting the Waymo Driver to reroute or turn back.

Both encoders feed into Waymo’s World Decoder, which utilizes these inputs to predict the behavior of other road users, generate high-definition maps, create vehicle trajectories, and produce signals for trajectory validation.
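The data flow described above can be pictured with a small toy sketch. This is an illustration only, not Waymo's implementation: every class, function, and field name below (`FastPerception`, `SlowSemantics`, `world_decoder`, and so on) is hypothetical, and the "models" are placeholder rules standing in for learned networks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FastPerception:
    """Hypothetical output of the Sensor Fusion Encoder (fast path)."""
    objects: List[str]      # materialized, structured representation
    embedding: List[float]  # stand-in for a learned embedding


@dataclass
class SlowSemantics:
    """Hypothetical output of the Driving VLM (slow path)."""
    hazard: Optional[str]   # semantic signal, e.g. "burning_vehicle"
    embedding: List[float]


@dataclass
class DecoderOutput:
    trajectory: List[str]
    reroute_advised: bool


def sensor_fusion_encoder(camera: List[str], lidar: List[str],
                          radar: List[str]) -> FastPerception:
    # Placeholder: a real encoder fuses raw sensor streams over time.
    objects = sorted(set(camera) | set(lidar) | set(radar))
    return FastPerception(objects=objects, embedding=[float(len(objects))])


def driving_vlm(camera: List[str]) -> SlowSemantics:
    # Placeholder for semantic reasoning over camera data.
    hazard = "burning_vehicle" if "vehicle_on_fire" in camera else None
    return SlowSemantics(hazard=hazard, embedding=[1.0 if hazard else 0.0])


def world_decoder(fast: FastPerception, slow: SlowSemantics) -> DecoderOutput:
    # Both paths feed the decoder; a semantic hazard can override a
    # physically clear path, as in the burning-vehicle example.
    reroute = slow.hazard is not None
    return DecoderOutput(trajectory=["reroute"] if reroute else ["proceed"],
                         reroute_advised=reroute)


camera = ["car", "vehicle_on_fire"]
output = world_decoder(sensor_fusion_encoder(camera, ["car"], []),
                       driving_vlm(camera))
```

In this toy run the path is physically clear, yet the semantic signal from the slow path still leads the decoder to advise a reroute.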

Waymo’s AI Ecosystem: Distilling Knowledge from Teacher to Student Models

Guided by our holistic approach, the Waymo Foundation Model powers the Driver, Simulator, and Critic. We first adapt the model to each of these three functions, producing large, high-quality “Teacher” models that excel in their specific roles. However, these Teacher models are too large to run on vehicles for real-time decision-making, or in the cloud for simulating and evaluating millions of miles. Consequently, we safely distill them into smaller “Student” models. Distillation is a critical process that allows us to retain the superior performance of large models within more compact and efficient versions. By first training powerful, high-capacity Teacher models and then distilling them efficiently, we achieve significantly better scaling laws for the resulting Student models, mirroring trends observed in other AI domains.
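For readers unfamiliar with distillation, one widely used generic formulation trains the Student to match the Teacher's temperature-softened output distribution. The sketch below shows that textbook loss; it is an assumption for illustration, since the text does not say which distillation objective Waymo uses, and the function names and logits are made up.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature parameter."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is the usual scaling that keeps gradient magnitudes
    comparable across temperatures. Assumes student probabilities do not
    underflow to zero.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits over three candidate actions.
teacher = [4.0, 1.0, 0.5]
loss_far = distillation_loss(teacher, [0.0, 0.0, 3.0])
loss_near = distillation_loss(teacher, [3.5, 1.0, 0.5])
```

A Student whose outputs are closer to the Teacher's receives a smaller loss, so minimizing this quantity pulls the compact model toward the large model's behavior.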

  • Driver: Our Teacher Driver models are trained to generate safe, comfortable, and compliant action sequences. Through distillation, we transfer their extensive world understanding and reasoning capabilities to more efficient Student models, optimized for real-time onboard deployment. To maximize the benefits of distillation, our onboard architecture is designed to mirror the Waymo Foundation Model’s structure. Crucially, the Waymo Driver incorporates a separate, rigorous onboard validation layer that verifies the trajectories generated by the Driver’s ML model.

  • Simulation: Simulation serves as an essential tool for closed-loop training and testing of our Driver across a spectrum of diverse and challenging scenarios. These include potential collisions, adverse weather conditions, complex intersections, and unusual road user behaviors. The Simulator Teacher models are capable of generating high-fidelity, multi-modal dynamic worlds to evaluate our Driver. The Student models are computationally efficient versions of these larger models, designed to execute the massive scale of simulations required for the Driver’s robust evaluation. The Waymo Foundation Model’s architecture enables us to seamlessly integrate compact, materialized world-state representations with sensor simulation, unlocking large-scale, hyper-realistic, physically correct, yet computationally efficient virtual environments.

By employing text-based prompts for global scene elements (such as weather conditions and time of day) and semantic conditioning for dynamic scene elements (like other road users and traffic lights), we can transform real-world scenes into highly realistic simulations.

[Figure: camera simulation (left) and lidar simulation (right). The sensor data is purely synthetic, generated from the underlying compact structured world representation by our generative sensor-simulation models.]

  • Critic: Our world-class evaluation system is engineered to rigorously stress-test the Waymo Driver, proactively identify subtle edge cases, and facilitate rapid, targeted improvements. The Critic Teacher models can analyze driving behavior and generate high-quality signals used for training Student models and for automatically building rich evaluation datasets. Subsequently, the Critic Student models analyze driving logs, pinpoint interesting or problematic scenarios, and provide nuanced feedback on driving quality.
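The idea of a separate validation layer that checks trajectories produced by the Driver's ML model can be sketched as a set of explicit, rule-based checks that run independently of the model. The checks, thresholds, and names below (`Limits`, `validate_trajectory`) are hypothetical and chosen only for illustration; the text does not describe what Waymo's validation layer actually verifies.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position in meters


@dataclass(frozen=True)
class Limits:
    max_speed: float = 20.0     # m/s, illustrative threshold
    min_clearance: float = 1.0  # m, illustrative threshold


def validate_trajectory(waypoints: List[Point], dt: float,
                        obstacles: List[Point],
                        limits: Limits = Limits()) -> List[str]:
    """Return the names of violated checks; an empty list means accepted."""
    violations = []
    # Check 1: speed implied by consecutive waypoints spaced dt seconds apart.
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        if math.hypot(x1 - x0, y1 - y0) / dt > limits.max_speed:
            violations.append("speed")
            break
    # Check 2: minimum distance from every waypoint to every known obstacle.
    for wx, wy in waypoints:
        if any(math.hypot(wx - ox, wy - oy) < limits.min_clearance
               for ox, oy in obstacles):
            violations.append("clearance")
            break
    return violations


proposed = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
accepted = validate_trajectory(proposed, dt=0.1, obstacles=[(5.0, 5.0)])
rejected = validate_trajectory(proposed, dt=0.1, obstacles=[(1.0, 0.5)])
```

Because the checks operate on structured quantities such as positions and obstacles rather than on opaque embeddings, a rejection can name the specific rule that failed.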

Fueled by the Waymo Foundation Model, these components collectively form a seamless AI ecosystem, establishing a flywheel for continuous learning and enhancement.

Creating Flywheels for Continuous Improvement

An exceptional Driver is not static; it is the outcome of relentless learning and refinement. Several mechanisms contribute to the evolution of the Waymo Driver. Our inner learning loop, powered by the Simulator and Critic, employs Reinforcement Learning to train the Driver. Within this secure and controlled simulated environment, the Driver gains experience and receives rewards or penalties for its actions, which enables learning at large scale.
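The reward-and-penalty mechanic of the inner loop can be illustrated with the simplest reinforcement learning setting, an epsilon-greedy bandit. This is a deliberately tiny stand-in, not the training method described here: the candidate actions, the toy simulator, and the reward function playing the role of the Critic are all assumptions made up for the example.

```python
import random
from typing import List

GAPS: List[float] = [0.5, 1.5, 3.0]  # hypothetical following gaps, seconds


def simulate(gap: float) -> float:
    """Toy simulator: returns the gap actually held (deterministic here)."""
    return gap


def critic_reward(held_gap: float) -> float:
    """Toy critic: heavy penalty for tailgating, mild penalty for lagging."""
    if held_gap < 1.0:
        return -10.0
    return -abs(held_gap - 1.5)


def train(episodes: int = 200, epsilon: float = 0.2, seed: int = 0) -> float:
    """Epsilon-greedy bandit: learn which gap earns the highest reward."""
    rng = random.Random(seed)
    values = [0.0] * len(GAPS)  # running mean reward per action
    counts = [0] * len(GAPS)
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.randrange(len(GAPS))  # explore
        else:
            action = max(range(len(GAPS)), key=lambda i: values[i])  # exploit
        reward = critic_reward(simulate(GAPS[action]))
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    best = max(range(len(GAPS)), key=lambda i: values[i])
    return GAPS[best]


learned_gap = train()
```

The policy ends up preferring the gap the toy critic scores highest, without that gap ever being supplied as a label.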

Our outer learning loop, informed by Waymo’s real-world driving data, creates an even more potent learning flywheel. The cycle commences when our Critic automatically flags any suboptimal driving behavior observed during our extensive fully autonomous operations. Subsequently, we generate improved, alternative behaviors from these instances to serve as training data for the Driver. These enhancements undergo rigorous testing in our Simulator, with the Critic verifying the fixes. Finally, only after our safety framework confirms the absence of unreasonable risk is the enhanced Driver deployed to the real world.
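The stages of this outer loop (flag, generate alternatives, retrain, verify in simulation, gate on safety, deploy) can be laid out as a short pipeline sketch. It is a schematic under stated assumptions: the `Driver` record and the four callables are hypothetical placeholders, and "training" is reduced to appending examples.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Driver:
    version: int = 0
    training_examples: List[str] = field(default_factory=list)


def outer_loop_step(driver: Driver,
                    logs: List[str],
                    critic_flags: Callable[[str], bool],
                    propose_fix: Callable[[str], str],
                    passes_simulation: Callable[[Driver, str], bool],
                    safety_gate: Callable[[Driver], bool]) -> Driver:
    """Run one flywheel cycle; return the deployed Driver."""
    # 1. The Critic flags suboptimal behavior in real-world logs.
    flagged = [log for log in logs if critic_flags(log)]
    if not flagged:
        return driver
    # 2. Improved alternative behaviors become new training data.
    fixes = [propose_fix(log) for log in flagged]
    candidate = Driver(driver.version + 1, driver.training_examples + fixes)
    # 3. Re-test flagged scenarios in simulation, then apply the safety gate.
    verified = all(passes_simulation(candidate, log) for log in flagged)
    if verified and safety_gate(candidate):
        return candidate  # 4. Deploy only after every check passes.
    return driver         # Otherwise keep the current Driver.


deployed = outer_loop_step(
    Driver(),
    logs=["ok", "hard_brake"],
    critic_flags=lambda log: log != "ok",
    propose_fix=lambda log: "smooth_" + log,
    passes_simulation=lambda d, log: "smooth_" + log in d.training_examples,
    safety_gate=lambda d: True,
)
```

Note that the candidate replaces the current Driver only when both the simulation check and the safety gate pass; any failure leaves the deployed version unchanged.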

This flywheel is made possible by the unprecedented volume of fully autonomous data we have accumulated over the years and continue to gather at an exponentially increasing rate. Historically, we relied heavily on high-quality manual driving data for training and refining the Waymo Driver. Today, our fully autonomous mileage far surpasses manual data. There is simply no substitute for this volume of real-world fully autonomous experience; no amount of simulation, manually driven data collection, or testing with a safety driver can replicate the full spectrum of situations and reactions the Waymo Driver encounters when it is fully in command. Integrating this rich, real-world fully autonomous data directly into our unique flywheel empowers the Waymo Driver to learn from its vast experience and continuously improve.

By embracing this holistic AI approach and constructing learning flywheels, we are not only advancing the Waymo Driver but also setting the standard for safe, scaled autonomous driving. We are committed to continuous innovation and pushing the boundaries of what is achievable, with significant advancements in AI still on the horizon.
