[AAAI 2026] The code repository for "ReaSon: Reinforced Causal Search with Information Bottleneck for Video Understanding" in PyTorch.

robin-hlt/AAAI26-ReaSon

[AAAI 2026] ReaSon: Reinforced Causal Search with Information Bottleneck for Video Understanding

The official PyTorch implementation of "ReaSon: Reinforced Causal Search with Information Bottleneck for Video Understanding" (AAAI 2026).

📢 News

  • [2025.11.16] We released the paper on arXiv.
  • [2025.11.13] We released the inference demo code.
  • [2025.11.08] 🎉🎉 Our paper "ReaSon: Reinforced Causal Search with Information Bottleneck for Video Understanding" has been accepted to AAAI 2026!

🧩 To-Do List

  • 📄 Release the paper (arXiv preprint & project page)
  • 🚀 Release checkpoint of ReaSon policy
  • 💻 Release the full code, including training and inference

🚀 Quick Start

🔧 Environment Setup

We provide a one-click installation script:

bash install.sh

Or install manually:

conda create -n reason python=3.9 -y
conda activate reason
git clone https://github.com/robin-hlt/AAAI26-ReaSon.git
cd AAAI26-ReaSon

# Install LLaVA-Video (optional)
git clone https://github.com/LLaVA-VL/LLaVA-NeXT
cd LLaVA-NeXT && pip install -e . && cd ..

# Install YOLO-World
git clone --recursive https://github.com/AILab-CVC/YOLO-World.git
cd YOLO-World && pip install -e . && cd ..

# Install ReaSon dependencies
pip install -r requirements_basic.txt
pip install "flash-attn==2.6.3" --no-build-isolation

# Fix mmdet/mmyolo related issues
sed -i "s/mmcv_maximum_version = '2.1.0'/mmcv_maximum_version = '2.3.0'/g" $(python -c "import importlib.util; filename=importlib.util.find_spec('mmdet').origin;print(filename)")
sed -i "s/mmcv_maximum_version = '2.1.0'/mmcv_maximum_version = '2.3.0'/g" $(python -c "import importlib.util; filename=importlib.util.find_spec('mmyolo').origin;print(filename)")
# pip install --upgrade setuptools

# Download model
mkdir pretrained && cd pretrained
mkdir YOLO-World && cd YOLO-World
wget https://huggingface.co/wondervictor/YOLO-World/resolve/main/yolo_world_v2_xl_obj365v1_goldg_cc3mlite_pretrain-5daf1395.pth && cd ../..

# Download data
mkdir -p data/coco/lvis
wget -O data/coco/lvis/lvis_v1_minival_inserted_image_name.json https://huggingface.co/GLIPModel/GLIP/resolve/main/lvis_v1_minival_inserted_image_name.json
mkdir -p data/texts
wget -O data/texts/lvis_v1_class_texts.json https://github.com/AILab-CVC/YOLO-World/raw/refs/heads/master/data/texts/lvis_v1_class_texts.json

# Fix YOLO-World small bug
sed -i "s/self.text_feats, None/self.text_feats, _/g" YOLO-World/yolo_world/models/detectors/yolo_world.py
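After the manual steps above, a quick sanity check can confirm that the packages resolve and the downloads landed where expected. The following is a minimal sketch, not part of the repository: the function name `check_environment` is hypothetical, `mmdet` and `mmyolo` are the modules patched by the `sed` commands, `torch` and `flash_attn` are assumed to be the import names for PyTorch and flash-attn, and the file paths mirror the `wget` commands.

```python
import importlib.util
from pathlib import Path

# mmdet and mmyolo appear in the sed commands above; torch and flash_attn are
# assumed import names for PyTorch and the flash-attn package.
REQUIRED_MODULES = ["torch", "mmdet", "mmyolo", "flash_attn"]

# Files fetched by the wget commands above, relative to the repository root.
REQUIRED_FILES = [
    "pretrained/YOLO-World/yolo_world_v2_xl_obj365v1_goldg_cc3mlite_pretrain-5daf1395.pth",
    "data/coco/lvis/lvis_v1_minival_inserted_image_name.json",
    "data/texts/lvis_v1_class_texts.json",
]


def check_environment(modules=REQUIRED_MODULES, files=REQUIRED_FILES, root="."):
    """Return (missing_modules, missing_files) without importing the modules."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_files = [f for f in files if not (Path(root) / f).is_file()]
    return missing_modules, missing_files


if __name__ == "__main__":
    mods, paths = check_environment()
    for name in mods:
        print(f"missing module: {name}")
    for path in paths:
        print(f"missing file:   {path}")
    if not mods and not paths:
        print("environment looks complete")
```

Run it from the repository root (AAAI26-ReaSon/) so that the relative paths resolve.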

📁 Project Structure

AAAI26-ReaSon/
├── LLaVA-NeXT/                     # LLaVA-Video (or Qwen, if you use Qwen)
├── checkpoints/                    # Save checkpoints
├── ReaSon/                         # Core implementation of the ReaSon framework
│   ├── interface_grounding.py      # Video–language grounding (LLaVA-Video/Qwen/GPT)
│   ├── interface_heuristic.py      # YOLO-World heuristic object extraction
│   ├── interface_searcher.py       # Detection for candidate pool
│   ├── policy_core.py              # Policy network and trainer
│   ├── ReaSonFramework.py          # Reinforced causal search pipeline
│   └── utilites.py                 # Helper and shared utilities
├── YOLO-World/                     # YOLO-World detector repo
├── test_video/                     # Example videos for demo
├── ann_for_test.json               # Annotation JSON for inference demo
├── demo_reason.py                  # Inference demo script
├── train.py                        # ReaSon training script
├── install.sh                      # Environment setup
├── requirements_basic.txt          # Basic dependencies
└── README.md                       # Documentation

🤗 Policy Checkpoints

| Model | Description | Link |
|---|---|---|
| ReaSon-Policy | Selection policy checkpoint | 🤗 Hugging Face |

🎬 Inference Demo

Download the policy checkpoint and place it in checkpoints/. Then run demo_reason.py to perform reinforced causal search and answer video questions:

python demo_reason.py \
   --ann ann_for_test.json \
   --video-id 0074f737-11cb-497d-8d07-77c3a8127391
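To run the demo over every video in an annotation file rather than a single ID, the command above can be generated per sample. This is a hedged sketch: `build_demo_commands` is a hypothetical helper, it uses only the `--ann` and `--video-id` flags shown above, and it assumes the annotation file is a JSON list of objects with a `video_id` key.

```python
import json
import shlex
import sys
from pathlib import Path


def build_demo_commands(ann_path, python=sys.executable):
    """Build one demo_reason.py command per distinct video_id in the annotation file."""
    with open(ann_path, encoding="utf-8") as f:
        samples = json.load(f)  # assumed: a list of dicts with a "video_id" key
    commands = []
    seen = set()
    for sample in samples:
        video_id = sample["video_id"]
        if video_id in seen:  # skip repeated IDs so each video runs once
            continue
        seen.add(video_id)
        commands.append(
            [python, "demo_reason.py", "--ann", str(ann_path), "--video-id", video_id]
        )
    return commands


if __name__ == "__main__":
    ann = Path("ann_for_test.json")
    if ann.is_file():
        for cmd in build_demo_commands(ann):
            print(shlex.join(cmd))  # print only; nothing is executed here
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to execute the runs one after another.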

📦 Train on Your Own Dataset

📘 Dataset Preparation

To train ReaSon on your own data, prepare a JSON file where each element represents one video–question sample.

Each item requires the following keys:

  • video_id: unique identifier
  • video_path: path to the video file
  • question: natural language question
  • options: the multiple-choice options as text (single line or multi-line)
  • answer: ground-truth answer label (A/B/C/…)

Example:

[
  {
    "video_id": "0074f737-11cb-497d-8d07-77c3a8127391",
    "video_path": "/path/to/videos/0074f737-11cb-497d-8d07-77c3a8127391.mp4",
    "question": "Taking into account all the actions performed by C, what can you deduce about the primary objective and focus within the video content?",
    "options": "A) C is cooking. B) C is doing laundry. C) C is cleaning the kitchen. D) C is cleaning dishes. E) C is cleaning the bathroom.",
    "answer": "D"
  },
  {
    "video_id": "00b9a0de-c59e-49cb-a127-6081e2fb8c8e",
    "video_path": "/path/to/videos/00b9a0de-c59e-49cb-a127-6081e2fb8c8e.mp4",
    "question": "What was the primary purpose of the cup of water in this video, and how did it contribute to the overall painting process?",
    "options": "A) To provide a source of water for the paintbrush. B) To provide a place to store the paintbrush. C) To provide a place to dispose of the paintbrush. D) To provide a place to rest the paintbrush. E) To clean the paintbrush.",
    "answer": "E"
  }
]
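Before training, it may help to check a dataset file against the format described above. The sketch below is illustrative only: the function names are hypothetical, and the "A) ..." label pattern is an assumption inferred from the example, not a documented requirement of train.py.

```python
import json
import re

REQUIRED_KEYS = ("video_id", "video_path", "question", "options", "answer")

# Assumed label style: a capital letter followed by ")" at the start of the
# text or after whitespace, as in the example above.
LABEL_PATTERN = re.compile(r"(?:^|\s)([A-Z])\)\s*")


def parse_options(options):
    """Split 'A) ... B) ...' (single- or multi-line) into {label: text}."""
    parts = LABEL_PATTERN.split(options.strip())
    # With one capture group, split yields [prefix, label, text, label, text, ...].
    return {label: text.strip() for label, text in zip(parts[1::2], parts[2::2])}


def validate_samples(samples):
    """Return a list of problems found; an empty list means every item passed."""
    problems = []
    for index, sample in enumerate(samples):
        missing = [key for key in REQUIRED_KEYS if key not in sample]
        if missing:
            problems.append(f"item {index}: missing keys {missing}")
            continue
        labels = parse_options(sample["options"])
        if not labels:
            problems.append(f"item {index}: no 'A) ...' style options found")
        elif sample["answer"] not in labels:
            problems.append(
                f"item {index}: answer {sample['answer']!r} not among {sorted(labels)}"
            )
    return problems


def validate_file(path):
    """Load a dataset JSON file and validate its items."""
    with open(path, encoding="utf-8") as f:
        return validate_samples(json.load(f))
```

Calling `validate_file("your_dataset.json")` returns the list of problems. The label pattern is deliberately simple, so option text that itself contains something like " B)" would be split incorrectly.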

The original datasets used in our experiments can be obtained from the following sources:

🛠️ Training Script

Run the following command to train ReaSon:

python train.py \
    --data-json your_dataset.json \
    --save-dir checkpoints/
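If some `video_path` entries may point to files that are not on disk, one option is to write a filtered copy of the dataset and pass that to `--data-json`. This is a minimal sketch with a hypothetical helper name, and whether train.py already tolerates missing files is not stated here.

```python
import json
from pathlib import Path


def filter_existing_videos(data_json, output_json):
    """Write a copy of the dataset keeping only samples whose video file exists.

    Returns (kept, dropped) counts.
    """
    samples = json.loads(Path(data_json).read_text(encoding="utf-8"))
    kept = [s for s in samples if Path(s["video_path"]).is_file()]
    Path(output_json).write_text(
        json.dumps(kept, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return len(kept), len(samples) - len(kept)
```

For example, `filter_existing_videos("your_dataset.json", "your_dataset.filtered.json")` writes the filtered file, which can then replace `your_dataset.json` in the training command.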

🙏 Acknowledgements

We sincerely thank the following open-source projects for providing essential components that contributed to our work
