NVIDIA’s Physical AI Dataset Aims to Boost Robot and AV Model Training

NVIDIA has released a large, open-source dataset to support the development of physical AI systems, including robotics and autonomous vehicles (AVs). Announced at the NVIDIA GTC conference in San Jose, the Physical AI Dataset is available on Hugging Face and provides 15 terabytes of data, featuring more than 320,000 robotics trajectories and up to 1,000 Universal Scene Description (OpenUSD) assets. NVIDIA plans to expand the dataset to include data supporting end-to-end AV development, with 20-second traffic scenario clips from over 1,000 U.S. cities and multiple European countries.

The dataset is intended to accelerate pretraining and post-training for AI models used in applications like warehouse robotics, humanoid surgical assistants, and AVs navigating complex traffic conditions. NVIDIA said the dataset will also feed into its existing platforms, including Cosmos, DRIVE AV, Isaac, and Metropolis. Research institutions such as the Berkeley DeepDrive Center, Carnegie Mellon Safe AI Lab, and UC San Diego’s Contextual Robotics Institute are early adopters.

NVIDIA noted that training robust physical AI models requires large, diverse datasets, and that collecting and curating such data can be cost-prohibitive, especially for smaller organizations. The company said the dataset’s scale is intended to support safety research. Tools such as NVIDIA NeMo Curator speed up the processing of large video datasets, and developers can use NVIDIA’s Isaac GR00T workflow to generate synthetic robot manipulation data.

• NVIDIA launched a 15TB open-source dataset for physical AI development.

• Dataset includes 320,000+ robotics training trajectories and up to 1,000 OpenUSD assets.

• Planned autonomous vehicle expansion will add 20-second traffic clips from 1,000+ U.S. cities and multiple European countries.

• Supports NVIDIA Cosmos, DRIVE AV, Isaac, and Metropolis platforms.

• Early adopters: UC Berkeley, Carnegie Mellon, UC San Diego labs.

• Enables faster AI model training for safety-critical applications.

• NVIDIA says NeMo Curator can process 20 million hours of video in two weeks on Blackwell GPUs.

• Dataset is available now on Hugging Face.
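
To put the NeMo Curator figure in the list above in perspective, the short sketch below converts "20 million hours of video in two weeks" into an average throughput. It is plain arithmetic on the two numbers quoted in the article. The function name is illustrative, and the calculation assumes continuous, round-the-clock processing over exactly 14 days, which the article does not state.

```python
def video_hours_per_wall_clock_hour(video_hours: float, wall_clock_days: float) -> float:
    """Average hours of video processed per hour of elapsed time.

    Assumes processing runs continuously for the whole period
    (an assumption; the article gives only the two totals).
    """
    return video_hours / (wall_clock_days * 24)


# Figures quoted in the article: 20 million hours of video, two weeks.
rate = video_hours_per_wall_clock_hour(20_000_000, 14)
print(f"{rate:,.0f} hours of video per wall-clock hour")
```

Under those assumptions the quoted figure works out to roughly 59,500 hours of video processed for every hour of elapsed time. The article does not say how many GPUs are involved, so no per-GPU rate can be derived from it.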

“We can do a lot of things with this dataset, such as training predictive AI models that help autonomous vehicles better track the movements of vulnerable road users like pedestrians to improve safety,” said Henrik Christensen, director of robotics and AV labs at UC San Diego.


Jim Carroll
