Count the methods and the story changes: DeepMind's 2025 robot-learning patent runs two at once. US12343874B2, "Reinforcement and imitation learning for a task," combines learning from demonstration with learning from reward.

The B25J filing tells a different story than the keynote. Classified under B25J 9/163 (learning and adaptive control of manipulators) with G06N 3/08 (neural-network learning methods) and G06N 3/045 (combinations of networks), the patent blends two approaches that are usually presented as rivals. Imitation learning copies human demonstrations; reinforcement learning optimizes a reward through trial and error. The patent uses both in one scheme.

Here is why combining them is the pragmatic move. Imitation learning is sample-efficient, since a few demonstrations get you started fast, but a policy that only copies rarely exceeds its demonstrator and tends to fail in situations the demos never showed. Reinforcement learning can discover better-than-human behavior and handle novel states, but it is data-hungry and its early exploration can be unsafe on real hardware. Each method's weakness is the other's strength.

The blend, stated plainly: use demonstrations to bootstrap a competent starting policy, then use reinforcement to refine and extend it beyond what was demonstrated. The robot learns the safe basics from humans and the hard edges from reward. It is the obvious-in-hindsight synthesis, and DeepMind patented a concrete way to do it.
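To make the two-stage recipe concrete, here is a minimal toy sketch in Python. It illustrates the general bootstrap-then-refine idea described above and is not the patent's claimed method. Everything in it is an assumption made for illustration: the four-state, three-action task, the `REWARD` table, the `DEMOS` list, and the function names. Stage one is plain behavior cloning on the demonstrations; stage two is a tabular policy-gradient update that uses the exact expected gradient rather than sampled rollouts, so the run is deterministic. The demonstrator is deliberately suboptimal in state 2 and never visits state 3.

```python
import math

N_STATES, N_ACTIONS = 4, 3

# Hypothetical task reward, indexed as REWARD[state][action].
REWARD = [
    [1.0, 0.0, 0.0],  # state 0: action 0 is best
    [0.0, 1.0, 0.0],  # state 1: action 1 is best
    [0.0, 0.5, 1.0],  # state 2: action 2 is best, action 1 is merely decent
    [0.0, 0.0, 1.0],  # state 3: action 2 is best
]

# Hypothetical demonstrations as (state, action) pairs.
# The demonstrator picks the decent action in state 2 and never shows state 3.
DEMOS = [(0, 0), (1, 1), (2, 1)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def behavior_cloning(theta, demos, lr=0.5, epochs=100):
    """Stage 1: raise the log-likelihood of the demonstrated actions."""
    for _ in range(epochs):
        for s, a in demos:
            probs = softmax(theta[s])
            for b in range(N_ACTIONS):
                # d log pi(a|s) / d theta[s][b] = 1[a == b] - pi(b|s)
                theta[s][b] += lr * ((1.0 if b == a else 0.0) - probs[b])
    return theta


def policy_gradient(theta, reward, lr=0.5, steps=2000):
    """Stage 2: ascend the exact gradient of expected reward in every state."""
    for _ in range(steps):
        for s in range(N_STATES):
            probs = softmax(theta[s])
            baseline = sum(p * r for p, r in zip(probs, reward[s]))
            for a in range(N_ACTIONS):
                # d E[r|s] / d theta[s][a] = pi(a|s) * (r(s,a) - E[r|s])
                theta[s][a] += lr * probs[a] * (reward[s][a] - baseline)
    return theta


def greedy(theta):
    """Most likely action per state (ties resolve to the lowest index)."""
    return [max(range(N_ACTIONS), key=lambda a: theta[s][a])
            for s in range(N_STATES)]


def expected_return(theta, reward):
    """Expected reward of the stochastic policy, averaged over states."""
    per_state = [sum(p * r for p, r in zip(softmax(theta[s]), reward[s]))
                 for s in range(N_STATES)]
    return sum(per_state) / N_STATES


theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

theta = behavior_cloning(theta, DEMOS)
bc_policy, bc_return = greedy(theta), expected_return(theta, REWARD)

theta = policy_gradient(theta, REWARD)
rl_policy, rl_return = greedy(theta), expected_return(theta, REWARD)

print("after imitation:    ", bc_policy, round(bc_return, 3))
print("after reinforcement:", rl_policy, round(rl_return, 3))
```

In this toy, the cloned policy reproduces the demonstrator, including its second-best choice in state 2, and has no preference at all in the undemonstrated state 3. The reward-driven stage then corrects both. Another common way to combine the two signals, not shown here, is to mix an imitation term and a task-reward term in a single objective instead of running them in sequence.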

The honest limit is reward design. Reinforcement learning is only as good as the reward function, and specifying a reward that produces the behavior you actually want — without the reward-hacking pathologies the field is famous for — is its own hard problem. The patent combines the methods; it does not abolish the difficulty of telling a robot what 'good' means.
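A toy example shows how easily a plausible reward goes wrong. The sketch below is hypothetical and unrelated to the patent's text: an agent on a one-dimensional track should reach `GOAL`, and the designer rewards every forward step as a proxy for progress. The track, the horizon, and both hand-written policies are assumptions chosen for illustration.

```python
GOAL = 5       # hypothetical target position on a 1-D track
HORIZON = 20   # maximum number of steps per episode


def rollout(policy):
    """Run one episode; return (proxy_return, reached_goal)."""
    pos, proxy_return = 0, 0.0
    for t in range(HORIZON):
        move = policy(t, pos)      # +1 (forward) or -1 (backward)
        if move > 0:
            proxy_return += 1.0    # proxy reward: pay for each forward step
        pos += move
        if pos == GOAL:
            return proxy_return, True  # episode ends at the goal
    return proxy_return, False


def go_to_goal(t, pos):
    """Intended behavior: walk straight to the goal."""
    return 1


def oscillate(t, pos):
    """Reward-hacking behavior: step forward, step back, repeat."""
    return 1 if t % 2 == 0 else -1


intended = rollout(go_to_goal)
hacked = rollout(oscillate)
print("go_to_goal:", intended)
print("oscillate: ", hacked)
```

The oscillating policy collects twice the proxy reward of the direct one and never reaches the goal, so an optimizer that sees only the proxy would prefer it. Paying for reaching the goal rather than for forward steps removes this particular exploit, but each such patch is a design decision that someone has to get right.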

For readers tracking embodied AI, this grant is a marker of consolidation. The early debate was imitation versus reinforcement; the mature answer, as DeepMind's filing shows, is both, staged so each covers the other. That is what a maturing field looks like — not a winner, but a recipe.