
Unscarcity Research

How Do You Train a Robot? The Data Problem Nobody Solved Yet

Tesla used millions of drivers to train self-driving AI. Training robots requires someone to fold your laundry first. The data bottleneck that will define the humanoid era.


Note: This is a research note supplementing the book Unscarcity, now available for purchase. These notes expand on concepts from the main text. Start here or get the book.


The Smartest Trick Elon Musk Ever Pulled

In 2016, Tesla started shipping cars with two NVIDIA GPUs. One ran the Autopilot system that helped you stay in your lane. The other one did nothing except watch. It recorded everything: how you turned the wheel, how hard you braked, when you hesitated at a yellow light, how you swerved around a pothole the GPS didn’t know about.

You thought you were buying a car. You were actually a data laborer.

By 2024, Tesla’s fleet had logged over 35 billion miles of real-world driving data. Not in simulation. Not on controlled test tracks. On actual roads, in actual weather, with actual idiots cutting you off on the freeway. Millions of customers who just wanted heated seats and zero-to-sixty in 3.1 seconds were simultaneously generating the largest driving dataset ever assembled.

The genius wasn’t the hardware. The genius was the incentive alignment. People wanted to drive. Tesla wanted driving data. The customer paid Tesla to produce the product Tesla needed most. It’s the most elegant data collection scheme since Google figured out that people would happily type their secrets into a search bar.

This trick changed the self-driving industry. And now the entire robotics industry is trying to figure out how to pull it off again. For robots.

They can’t.


The Problem: There Is No Internet for Physical Tasks

Here’s why the humanoid robot revolution is stuck in a paradox.

ChatGPT trained on the internet. Trillions of words of text, scraped from blogs, books, forums, Wikipedia, Reddit arguments about whether a hot dog is a sandwich. The data was just there, lying around, produced by billions of humans who were writing for their own reasons. OpenAI didn’t have to convince anyone to generate training data. People had been doing it for twenty years already.

Now try that for physical tasks.

How many hours of video exist showing someone properly folding a fitted sheet? (And no, the ones where the person gives up and balls it into a drawer don’t count as training data.) How many hours capture the exact finger pressure, grip angle, and wrist rotation needed to crack an egg one-handed? Where is the Reddit of dexterous manipulation?

It doesn’t exist. Physical skill has always been transmitted through apprenticeship, not publication. A baker doesn’t blog about how hard to knead dough — they show you with their hands. A surgeon doesn’t upload their finger positions to GitHub. The entire corpus of human physical expertise lives in human muscle memory, untranscribed, undigitized, invisible to machine learning.

Language AI had the internet. Robotics AI has… nothing. That’s the bottleneck.


Tesla’s Playbook: Why It Worked for Cars

Let’s be precise about why Tesla’s approach was so effective, because the contrast with robotics is instructive.

The feedback loop was passive. Drivers didn’t need to do anything special. They drove normally. The shadow GPU captured their behavior without changing it. No extra steps. No training protocol. No consent form for each data point.

The task was narrow. Driving, for all its complexity, is fundamentally one activity in one context: navigating a vehicle on roads. Roads follow rules. They have lanes, signs, traffic lights. The search space is large but structured.

The fleet was massive. By 2025, Tesla had over 7 million vehicles on the road worldwide. Even if only a fraction contributed Autopilot data at any given moment, the sheer fleet size generated data volumes that no competitor — not Waymo with its taxi fleets, not Cruise with its test vehicles — could approach.

The economics aligned perfectly. Customers paid $35,000-$100,000 for the privilege of generating data. Tesla didn’t pay them. They paid Tesla. The data was a byproduct of a product people actually wanted.

Now try to replicate each of those four properties for a home robot.


The Robot Training Disaster

A startup called 1X Technologies — the company behind the Neo humanoid at $499/month — has a clever approach to the data problem. Their “human-in-the-loop” strategy works like this: during the early deployment phase, 1X employees remotely operate the robots, performing tasks in real homes and workplaces. The robot records everything, and that demonstration data trains the AI to eventually do the tasks autonomously.

It’s smart. It also has a problem the size of a house.

You can’t passively collect robot training data. Unlike Tesla’s shadow GPU, someone has to actively perform each task through the robot. That means paying human operators. That means throughput is limited by the number of operators you employ. That means cost scales linearly with data volume — the exact opposite of Tesla’s flywheel, where each additional mile of data cost the company essentially nothing.
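To make that scaling contrast concrete, here’s a back-of-envelope sketch in Python. Every number in it (operator wage, fixed infrastructure cost) is an illustrative assumption of mine, not a figure from 1X or Tesla:

```python
# Illustrative cost model: active teleoperation vs passive fleet collection.
# All parameters are made-up assumptions for the sake of the comparison.

def teleop_cost(hours_of_data, wage_per_hour=30.0):
    """Active collection: every hour of data requires a paid operator hour,
    so total cost grows linearly with data volume."""
    return hours_of_data * wage_per_hour

def fleet_cost(hours_of_data, fixed_infra=1_000_000.0):
    """Passive collection: customers generate data as a byproduct of use,
    so marginal cost per hour is ~0 and only a fixed cost remains."""
    return fixed_infra  # independent of data volume

for hours in (10_000, 1_000_000, 100_000_000):
    print(f"{hours:>11,} h  teleop ${teleop_cost(hours):>13,.0f}"
          f"  fleet ${fleet_cost(hours):>11,.0f}")
```

Under these toy numbers, teleoperation costs cross the fixed fleet cost at roughly 33,000 hours of data and keep climbing, which is the structural problem the paragraph above describes.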

The task space is enormous. Driving is one task. Home robotics is thousands of tasks: folding clothes, loading dishwashers, making beds, organizing shelves, cooking meals, cleaning bathrooms. Each task has hundreds of variants. Folding a T-shirt is different from folding a button-down. Loading a top-rack dishwasher is different from a pull-out drawer. Every home is slightly different. Every towel has a different texture.

Privacy is a nightmare. 1X employees are remotely operating robots inside your home. They can see your living room, your bedroom, your kitchen counter with the medication bottles you forgot to put away. The company says they handle this with care, but “trust us” is not a privacy policy. When Tesla records your driving, the camera sees public roads. When a robot operator records your home, the camera sees everything.

The fleet doesn’t exist yet. Tesla had 7 million data-generating cars before it needed the data. 1X has pre-orders. Figure AI has a few hundred factory deployments. Even the most bullish projections show 50,000 humanoid shipments in 2026. That’s seven million versus fifty thousand — a 140x gap in fleet size, before accounting for the fact that each car generates data passively and each robot requires active demonstration.


Enter the Gloves

On April 2, 2026, Forbes reported on a startup called Generalist that thinks gloves are the answer.

The pitch: sensor-packed gloves that capture human hand movements — sub-millimeter finger positioning, grip force, contact pressure, release timing — and translate them into robot training data. No robot in the loop. No teleoperation rig. No VR headset. You just put on the gloves and do the task.

A hundred warehouse workers wearing gloves for a month could theoretically generate more manipulation data than a fleet of teleoperated robots produces in a year.
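That claim is easy to sanity-check with rough numbers. The figures below (shift length, workdays, size and utilization of a hypothetical teleoperated fleet) are my own illustrative assumptions, not Generalist’s:

```python
# Back-of-envelope: glove-wearing workers vs a teleoperated robot fleet.
# Every parameter here is an assumption for illustration only.

glove_workers = 100
glove_hours = glove_workers * 8 * 22    # 8 h/day, ~22 workdays in a month

teleop_robots = 10                      # a small teleoperated fleet
teleop_hours = teleop_robots * 4 * 250  # ~4 usable h/day, ~250 days in a year

print(f"gloves, 1 month: {glove_hours:,} h of manipulation data")
print(f"teleop, 1 year:  {teleop_hours:,} h of demonstration data")
```

With these assumptions, one month of glove wear (17,600 hours) beats a year of teleoperation (10,000 hours), and that’s before counting the richer per-hour signal the gloves claim to capture. Scale the teleoperated fleet up and the comparison flips, which is exactly why fleet size is the variable everyone is fighting over.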

The idea is architecturally different from everything else being tried. Google DeepMind uses teleoperation rigs — someone remote-controls the robot while it records. Toyota Research Institute uses VR interfaces. Meta has worked on tactile sensing. But all of those approaches still require a robot on the receiving end during data collection. The gloves remove the robot from the training loop entirely.

Generalist calls this robotics’ “ChatGPT moment.” They’re comparing their approach to the internet-scale data collection that made language AI possible. The comparison flatters them, but the logic holds in one specific way: the bottleneck for language AI was never the model architecture — it was having enough data. The same is true for robotics. Physical Intelligence raised $1 billion to build robot foundation models. They have the compute. They’re starving for demonstration data.

If the gloves work at the claimed fidelity, Generalist becomes the data infrastructure layer for the entire robotics industry.

If they don’t, it’s another conference demo that doesn’t survive real-world noise.


The Privacy Question Nobody Wants to Answer

Here’s where things get personal. Literally.

1X’s model requires employees to remotely perform your tasks so the robot can learn. That means a stranger — operating from a control center in Moss, Norway, or wherever 1X routes its teleoperation — is virtually inside your home, seeing your stuff, watching how you live.

I don’t want that.

I’d rather put on the gloves myself and fold the laundry. I’d rather teach my robot how I like my espresso by making it myself a few times while wearing sensors. I’d rather rope my teenager into wearing the gloves and doing the dishes (naturally, the first time they voluntarily do the dishes would involve wearable technology).

This isn’t just a personal preference. It’s an insight about the data collection model.

The Tesla analogy reveals the answer. Tesla didn’t hire professional drivers to generate training data. It sold cars to regular people and let them drive. The training happened as a byproduct of normal use. The privacy issue was manageable because the data came from public roads.

The robot equivalent isn’t a teleoperator in your kitchen. It’s you in your kitchen, wearing gloves or a sensor suit, doing the things you were going to do anyway. The training happens as a byproduct of normal life. The privacy issue disappears because the data never leaves your home — or at least, you control whether it does.

This is where Generalist’s glove approach converges with 1X’s consumer strategy. The end state isn’t companies collecting your physical data. It’s individuals generating it, owning it, and choosing whether to share it.

The question nobody is litigating yet: when you teach a robot how to fold laundry and that technique gets incorporated into a foundation model used by millions of robots, who owns that data? Who gets paid? If your proprietary espresso technique — the specific pressure, the exact tamp angle, the precise extraction timing — gets absorbed into a model that makes every coffee robot better, are you entitled to compensation?

This is the intellectual property fight of the 2030s, and it makes the text/image training data lawsuits of 2024 look like small claims court.


The Three Paths Forward

The robot training data problem will be solved. The question is which approach wins.

Path 1: The Tesla Playbook (Fleet Learning)

Ship robots that are useful enough to justify purchase, even with limited autonomy. Use onboard sensors to collect data from the robot’s environment. Let the fleet teach itself over time.

Who’s doing it: Tesla Optimus (1,000+ units in Tesla factories), 1X Neo (consumer deployments starting 2026), Figure AI (BMW factory deployment).

The catch: You need a large fleet first. And the fleet needs to do something useful enough that people keep the robots around while they’re still learning. It’s a chicken-and-egg problem that Tesla solved for cars by shipping a product people wanted regardless of Autopilot (a fast, cool electric car). Humanoid robots need to find their “heated seats” equivalent — the core value proposition that justifies purchase even before the AI is good.

Path 2: The Glove Pipeline (Decoupled Data Collection)

Separate data collection from robot deployment entirely. Use wearable sensors to capture human expertise, then transfer it to any robot platform.

Who’s doing it: Generalist (gloves), various research labs working on motion capture for manipulation.

The catch: The data needs to transfer. Human hands and robot grippers have different kinematics, different force profiles, different degrees of freedom. Capturing a human folding a towel in exquisite detail is worthless if the mapping to a Figure 03’s actuators introduces too much noise. The translation layer between “human motion” and “robot motion” is a hard problem that glove-based approaches must solve and teleoperation approaches sidestep.

Path 3: The Foundation Model Play (Simulation + Transfer)

Train in simulation at massive scale, then transfer to physical robots. Use a small amount of real-world data to calibrate the sim-to-real gap.

Who’s doing it: Physical Intelligence ($1B raised at $11B valuation), Google DeepMind Gemini Robotics, NVIDIA Isaac Sim.

The catch: Simulation doesn’t capture the full messiness of reality. The “sim-to-real gap” — the difference between how physics works in a game engine and how it works in your kitchen — has humbled every robotics team that’s attempted pure simulation training. Physical Intelligence’s approach is promising because their foundation model combines simulation with real-world demonstration data, but the real-world data problem remains the limiting factor.


Why This Matters for Everyone (Not Just Roboticists)

In the Unscarcity framework, the robot training data bottleneck isn’t a technical curiosity — it’s the pacing constraint on the Labor Cliff.

The hardware exists. 50,000 humanoid robots are shipping in 2026. Unitree sells one for $16,000. 1X offers one at $499/month. The mechanical bodies are ready.

The AI architecture exists. Physical Intelligence, Google DeepMind, and a dozen other labs have foundation models that can process visual input, understand natural language commands, and generate motor plans.

What’s missing is the training data that connects those models to the physical world. Without it, the robots can walk and talk but can’t reliably fold your laundry, sort your recycling, or make your coffee. They’re athletes who’ve never played a game.

This is actually good news — and here’s why.

The data bottleneck buys us time. Time to build the institutional infrastructure — the Foundation, the transition systems, the new social contract — that the book argues we need before robots replace human labor at scale. If the training data problem were already solved, we’d be staring down mass unemployment today, with no systems in place. Instead, we have a window. Maybe five years. Maybe ten.

But that window is closing. Generalist’s gloves, 1X’s teleoperation, Physical Intelligence’s foundation models — these are all serious attempts to crack the bottleneck. When one of them succeeds (and one will), the floodgates open. Robots that can learn any physical task as easily as ChatGPT learned to write emails. Robots that improve every day, fed by a growing pool of human demonstration data.

The countdown to the Labor Cliff isn’t measured in hardware shipments. It’s measured in training data. And the race to collect it has begun.


What You Can Do Right Now

Here’s a thought experiment. Imagine you could train your home robot yourself. Not through a PhD-level robotics interface. Through gloves, or a VR headset, or just by doing the task while the robot watches.

Would you do it?

I would. I’d teach it how I like my espresso. I’d show it where the dishes go. I’d have my kids demonstrate their elaborate system for “organizing” their rooms (the system is called “shove everything under the bed,” but the robot doesn’t need to know that).

And here’s the thing: if the training data you generate makes the robot smarter, and that intelligence gets shared with other robots, you’ve contributed to something larger. You’ve added to a collective pool of human physical knowledge that was previously trapped in individual muscle memory, passed down through apprenticeship, or lost when someone retired.

That’s the vision the book calls the Ascent — humans contributing to civilization not through menial labor, but through teaching, creating, and sharing what they know. Training your robot to fold laundry is a small act. But it’s an act of contribution, not consumption. And in the emerging economy, that distinction matters.

The robots have bodies. They need teachers. And those teachers should be us — not because we owe the machines our knowledge, but because teaching them is how we free ourselves.

