haoranD/2D-3D-Lifting

Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting

ICCV 2025 | Paper | Project Page

Pipeline


📝 Abstract


Spatial intelligence is emerging as a transformative frontier in AI, yet it remains constrained by the scarcity of large-scale 3D datasets. Unlike the abundant 2D imagery, acquiring 3D data typically requires specialized sensors and laborious annotation.

In this work, we present a scalable pipeline that converts single-view images into comprehensive, scale- and appearance-realistic 3D representations, including point clouds, camera poses, depth maps, and pseudo-RGBD, via integrated depth estimation, camera calibration, and scale calibration.

Our method bridges the gap between the vast repository of imagery and the increasing demand for spatial scene understanding. By automatically generating authentic, scale-aware 3D data from images, we significantly reduce data collection costs and open new avenues for advancing spatial intelligence.

We release two generated spatial datasets, i.e., COCO-3D and Objects365-v2-3D, and demonstrate through extensive experiments that our generated data can benefit various 3D tasks, ranging from fundamental perception to MLLM-based reasoning. These results validate our pipeline as an effective solution for developing AI systems capable of perceiving, understanding, and interacting with physical environments.

🔧 Installation

Prerequisites

  • Python 3.8+
  • PyTorch 2.0+ with CUDA support
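
A quick way to sanity-check these prerequisites before installing anything else (a minimal sketch, not part of the repository):

```python
import sys

# The pipeline requires Python 3.8 or newer.
assert sys.version_info >= (3, 8), "Python 3.8+ is required"

# Report PyTorch/CUDA status if torch is already installed.
try:
    import torch
    print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
except ImportError:
    print("PyTorch not installed yet; see Step 2 below")
```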

Step 1: System Dependencies

# Update system packages
sudo apt-get update

# Install essential libraries
sudo apt-get install -y \
    libgl1-mesa-dev \
    libglib2.0-0 \
    ffmpeg libsm6 libxext6 \
    aria2 \
    git-lfs \
    vim tmux wget unzip htop rsync

# Configure HuggingFace endpoint (only for China mainland users)
echo 'export HF_ENDPOINT=https://hf-mirror.com' >> ~/.bashrc
source ~/.bashrc

Step 2: Python Dependencies

# Install core dependencies
pip install -r requirements.txt

# Install flash-attention (requires special build configuration)
pip install flash-attn==2.5.8 --no-build-isolation

# Only for H20 platform compatibility (run at the end if needed)
pip install nvidia-cublas-cu12==12.4.5.8

Step 3: PerspectiveFields Setup

cd ./PerspectiveFields
pip install -r requirements.txt
python setup.py install  # on newer setuptools, `pip install .` also works
cd ..

🚀 Quick Start

Generate Spatial Images from MS-COCO

# Single GPU version
# Note: First run will download pre-trained checkpoints (~10 minutes)
python generate_spatial_img_coco.py \
    -i /path/to/Datasets/coco/train2017 \
    -a /path/to/Datasets/coco/annotations/instances_train2017.json \
    -o ./path/to/output/

💡 Example Usage

# Process COCO validation set
python generate_spatial_img_coco.py \
    -i ./data/coco/val2017 \
    -a ./data/coco/annotations/instances_val2017.json \
    -o ./output/coco_val_3d/
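
Before launching a long run, it can help to confirm that the annotation file (the `-a` argument) parses and has the expected COCO structure. The snippet below is a generic sketch against a tiny fabricated annotation file; it is not part of the repository's scripts:

```python
import json
import os
import tempfile

# A tiny fabricated COCO-style annotation file standing in for
# instances_val2017.json (the image id, category, and bbox are made up).
coco = {
    "images": [{"id": 632, "file_name": "000000000632.jpg", "height": 480, "width": 640}],
    "annotations": [{"id": 1, "image_id": 632, "category_id": 18, "bbox": [10, 20, 100, 150]}],
    "categories": [{"id": 18, "name": "dog"}],
}
path = os.path.join(tempfile.mkdtemp(), "instances.json")
with open(path, "w") as f:
    json.dump(coco, f)

# The check itself: the file loads and the expected top-level keys exist.
with open(path) as f:
    data = json.load(f)
print(len(data["images"]), len(data["annotations"]))  # 1 1
```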

✅ Validation

1️⃣ Validate Spatial Image Generation

Verify that spatial images are correctly generated by reconstructing the 3D scene:

python validate_spatial_img_coco.py \
    --rgb_image_path ./demo_output/rgb/000000000632.png \
    --depth_image_path ./demo_output/depth/000000000632_remove_edges.png \
    --camera_params_path ./demo_output/camera_parameters/000000000632.json \
    --output_ply_path ./validation/output.ply \
    --visualize  # Optional: directly visualize the point cloud

Expected Results:

  • ✅ Semantically meaningful point cloud
  • ✅ Correct scale representation
  • ✅ Z-axis pointing upward
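
Conceptually, the reconstruction this step performs is a pinhole back-projection of the depth map using the stored camera parameters. Below is a minimal sketch of that operation, assuming simple pinhole intrinsics `fx, fy, cx, cy`; the actual script's interface may differ, and the real pipeline additionally orients the cloud so the Z-axis points up, per the expected results above:

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project a metric depth map (H, W) into an (N, 3) point cloud
    in the camera frame using pinhole intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

# Tiny synthetic example: a flat plane 2 m in front of the camera.
depth = np.full((4, 4), 2.0)
cloud = unproject_depth(depth, fx=4.0, fy=4.0, cx=2.0, cy=2.0)
print(cloud.shape)  # (16, 3)
```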

2️⃣ Validate Point Cloud Annotations

Ensure point clouds and annotations are correctly aligned:

python validate_pointcloud_and_anno_coco.py \
    --point_cloud_path ./demo_output/point/000000000632.ply \
    --json_path ./demo_output/json/000000000632.json \
    --output_ply_path ./validation/annotated.ply \
    --visualize  # Optional: directly visualize

Expected Results:

  • ✅ Instances marked with distinct colors
  • ✅ Correct spatial boundaries
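
The instance-coloring check can be sketched generically: give each instance id a distinct color so that misaligned annotations stand out visually. The helper below is hypothetical, with fabricated labels; the real script reads per-point labels from the JSON sidecar:

```python
import numpy as np

def colorize_instances(labels, seed=0):
    """Map integer instance ids (N,) to RGB colors (N, 3) in [0, 255],
    one random color per unique id (deterministic for a fixed seed)."""
    rng = np.random.default_rng(seed)
    palette = {i: rng.integers(0, 256, size=3) for i in np.unique(labels)}
    return np.stack([palette[l] for l in labels])

# Fabricated per-point instance labels for illustration.
labels = np.array([0, 0, 1, 2, 1])
colors = colorize_instances(labels)
print(colors.shape)  # (5, 3)
```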

📖 Citation

If you find our work useful in your research, please consider citing:

@inproceedings{miao2025towards,
  title={Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting},
  author={Miao, Xingyu and Duan, Haoran and Qian, Quanhao and 
          Wang, Jiuniu and Long, Yang and Shao, Ling and 
          Zhao, Deli and Xu, Ran and Zhang, Gongjie},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
