Spatial intelligence is emerging as a transformative frontier in AI, yet it remains constrained by the scarcity of large-scale 3D datasets. While 2D imagery is abundant, acquiring 3D data typically requires specialized sensors and laborious annotation.

In this work, we present a scalable pipeline that converts single-view images into comprehensive, scale- and appearance-realistic 3D representations, including point clouds, camera poses, depth maps, and pseudo-RGBD, via integrated depth estimation, camera calibration, and scale calibration.

Our method bridges the gap between the vast repository of imagery and the increasing demand for spatial scene understanding. By automatically generating authentic, scale-aware 3D data from images, we significantly reduce data-collection costs and open new avenues for advancing spatial intelligence.

We release two generated spatial datasets, COCO-3D and Objects365-v2-3D, and demonstrate through extensive experiments that the generated data benefits a wide range of 3D tasks, from fundamental perception to MLLM-based reasoning. These results validate our pipeline as an effective solution for developing AI systems capable of perceiving, understanding, and interacting with physical environments.
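At its core, the 2D-to-3D lifting back-projects each pixel through its estimated metric depth and the calibrated camera intrinsics. Below is a minimal sketch of that step, assuming a pinhole camera model; the function and variable names are illustrative, not the repository's API:

```python
import numpy as np

def lift_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project a metric depth map (H, W) into an (N, 3) point cloud
    under a pinhole camera model with intrinsics (fx, fy, cx, cy)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx  # camera-frame X
    y = (v - cy) * depth / fy  # camera-frame Y
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep only pixels with valid depth

```

The scale calibration described in the abstract is what makes the resulting coordinates metric rather than relative; the released pipeline performs that step before export.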
- Python 3.8+
- PyTorch >= 2.0 with CUDA support
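A quick sanity check of these prerequisites before installing (a minimal sketch; adjust the version bounds if your setup differs):

```python
import sys
import torch

assert sys.version_info >= (3, 8), "Python 3.8+ is required"
major, minor = (int(v) for v in torch.__version__.split(".")[:2])
assert (major, minor) >= (2, 0), f"PyTorch >= 2.0 required, found {torch.__version__}"
assert torch.cuda.is_available(), "a CUDA-enabled PyTorch build is required"
print(f"OK: PyTorch {torch.__version__}, CUDA {torch.version.cuda}")
```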
```bash
# Update system packages
sudo apt-get update

# Install essential libraries
sudo apt-get install -y \
    libgl1-mesa-dev \
    libglib2.0-0 \
    ffmpeg libsm6 libxext6 \
    aria2 \
    git-lfs \
    vim tmux wget unzip htop rsync

# Configure HuggingFace endpoint (only for China mainland users)
echo 'export HF_ENDPOINT=https://hf-mirror.com' >> ~/.bashrc
source ~/.bashrc

# Install core dependencies
pip install -r requirements.txt

# Install flash-attention (requires special build configuration)
pip install flash-attn==2.5.8 --no-build-isolation

# Only for H20 platform compatibility (run at the end if needed)
pip install nvidia-cublas-cu12==12.4.5.8

# Install PerspectiveFields from source
cd ./PerspectiveFields
pip install -r requirements.txt
python setup.py install
cd ..
```

Generate spatial data for the COCO training set:

```bash
# Single GPU version
# Note: First run will download pre-trained checkpoints (~10 minutes)
python generate_spatial_img_coco.py \
-i /path/to/Datasets/coco/train2017 \
-a /path/to/Datasets/coco/annotations/instances_train2017.json \
-o ./path/to/output/
```

```bash
# Process COCO validation set
python generate_spatial_img_coco.py \
-i ./data/coco/val2017 \
-a ./data/coco/annotations/instances_val2017.json \
-o ./output/coco_val_3d/
```
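Before validating, it can help to confirm that the output directory is populated. The sketch below counts files per subfolder; the subfolder names (rgb, depth, camera_parameters, point, json) are inferred from the validation commands that follow and may not match the actual layout exactly:

```python
from pathlib import Path

# Subfolder names are inferred from the validation commands below and
# may not match the actual output layout exactly.
out = Path("./output/coco_val_3d")
for sub in ["rgb", "depth", "camera_parameters", "point", "json"]:
    n = len(list((out / sub).glob("*")))
    print(f"{sub:20s} {n:6d} files")
```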
Verify that spatial images are correctly generated by reconstructing the 3D scene (a manual cross-check sketch follows the expected results):

```bash
python validate_spatial_img_coco.py \
--rgb_image_path ./demo_output/rgb/000000000632.png \
--depth_image_path ./demo_output/depth/000000000632_remove_edges.png \
--camera_params_path ./demo_output/camera_parameters/000000000632.json \
--output_ply_path ./validation/output.ply \
--visualize  # Optional: directly visualize the point cloud
```

Expected Result:
- ✅ Semantically meaningful point cloud
- ✅ Correct scale representation
- ✅ Z-axis pointing upward
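The validation script performs this reconstruction for you. For an independent cross-check, the sketch below does the equivalent back-projection with open3d, mirroring the numpy sketch earlier. The JSON field names (fx, fy, cx, cy) and the millimetre depth encoding are assumptions; inspect a generated camera_parameters file for the actual schema:

```python
import json
import numpy as np
import open3d as o3d

# Assumed schema: {"fx": ..., "fy": ..., "cx": ..., "cy": ...} -- verify
# against an actual generated camera_parameters JSON before relying on this.
with open("./demo_output/camera_parameters/000000000632.json") as f:
    params = json.load(f)

color = o3d.io.read_image("./demo_output/rgb/000000000632.png")
depth = o3d.io.read_image("./demo_output/depth/000000000632_remove_edges.png")
h, w = np.asarray(color).shape[:2]

intrinsic = o3d.camera.PinholeCameraIntrinsic(
    w, h, params["fx"], params["fy"], params["cx"], params["cy"])

# depth_scale=1000.0 assumes a 16-bit millimetre depth PNG (an assumption).
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
    color, depth, depth_scale=1000.0, convert_rgb_to_intensity=False)
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)

# Note: open3d uses a Y-down camera frame; the repository's script also
# rotates the cloud into the Z-up convention checked above.
o3d.io.write_point_cloud("./validation/manual_check.ply", pcd)
```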
Ensure that point clouds and annotations are correctly aligned (a manual coloring sketch follows the expected results):

```bash
python validate_pointcloud_and_anno_coco.py \
--point_cloud_path ./demo_output/point/000000000632.ply \
--json_path ./demo_output/json/000000000632.json \
--output_ply_path ./validation/annotated.ply \
--visualize  # Optional: directly visualize
```

Expected Result:
- ✅ Instances marked with distinct colors
- ✅ Correct spatial boundaries
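For a manual spot-check of the alignment, the sketch below colors each annotated instance in the point cloud. The keys "instances" and "point_indices" are assumptions about the annotation schema, not the repository's documented format; inspect a generated JSON file first:

```python
import json
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("./demo_output/point/000000000632.ply")
with open("./demo_output/json/000000000632.json") as f:
    anno = json.load(f)

colors = np.tile([0.7, 0.7, 0.7], (len(pcd.points), 1))  # grey background
rng = np.random.default_rng(0)
# "instances" and "point_indices" are assumed keys -- check the actual
# annotation JSON for the real field names before running this.
for inst in anno["instances"]:
    colors[np.asarray(inst["point_indices"])] = rng.random(3)

pcd.colors = o3d.utility.Vector3dVector(colors)
o3d.io.write_point_cloud("./validation/manual_annotated.ply", pcd)
```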
If you find our work useful in your research, please consider citing:
```bibtex
@inproceedings{miao2025towards,
  title     = {Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting},
  author    = {Miao, Xingyu and Duan, Haoran and Qian, Quanhao and
               Wang, Jiuniu and Long, Yang and Shao, Ling and
               Zhao, Deli and Xu, Ran and Zhang, Gongjie},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2025}
}
```