model April 02, 2026

TRIBE v2: Multimodal Brain‑Encoding Model for fMRI Prediction

TRIBE v2 is a multimodal foundation model released by Facebook Research that predicts functional MRI (fMRI) brain responses to naturalistic stimuli across vision, audition, and language. It combines three pretrained encoders (LLaMA 3.2 for text, V‑JEPA2 for video, and Wav2Vec‑BERT for audio) in a unified Transformer that maps their representations onto the cortical surface (the fsaverage5 mesh, about 20,000 vertices). It is designed for in‑silico neuroscience: researchers can simulate how the human brain would respond to complex, multimodal inputs.

The repository provides an inference API through the `TribeModel` class. Users load the pretrained weights, supply a video, audio, or text file, and obtain a time‑by‑vertex prediction matrix (one row per time point, one column per cortical vertex). The predictions correspond to an "average" subject rather than any individual, and can be visualized on the cortical mesh using optional plotting dependencies. A Colab notebook demonstrates a full workflow, from data loading to brain‑surface visualization. Three installation options are available: inference‑only, visualization, and full training. The training scripts support Slurm‑based grid searches.
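To make the shape of that output concrete, here is a minimal sketch of how a time‑by‑vertex matrix might be handled once it has been obtained. It does not call `TribeModel`, because the exact method names and arguments are not given here; a random array stands in for the prediction. The vertex count per hemisphere (10,242) is the standard fsaverage5 resolution. The assumption that left‑hemisphere vertices come first is illustrative and should be checked against the repository, and the helper name `split_hemispheres` is hypothetical.

```python
import numpy as np

# fsaverage5 has 10,242 vertices per hemisphere (20,484 in total).
N_VERTICES_PER_HEMI = 10242


def split_hemispheres(pred):
    """Split a (time, vertex) matrix into left and right hemisphere halves.

    ASSUMPTION: left-hemisphere vertices are stored first. Verify the
    ordering against the model's documentation before relying on it.
    """
    pred = np.asarray(pred)
    expected = 2 * N_VERTICES_PER_HEMI
    if pred.ndim != 2 or pred.shape[1] != expected:
        raise ValueError(f"expected shape (time, {expected}), got {pred.shape}")
    return pred[:, :N_VERTICES_PER_HEMI], pred[:, N_VERTICES_PER_HEMI:]


# Stand-in for a real prediction: 30 time points of random values.
rng = np.random.default_rng(0)
pred = rng.standard_normal((30, 2 * N_VERTICES_PER_HEMI))

left, right = split_hemispheres(pred)

# Average over time to get one value per vertex, the kind of map that
# could then be drawn on the cortical surface.
mean_map = pred.mean(axis=0)

# For each vertex, the index of the time point with the largest response.
peak_time = pred.argmax(axis=0)
```

Keeping time on the first axis means that reductions over `axis=0` always yield one value per vertex, which is the form surface‑plotting tools generally expect.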

TRIBE v2 is licensed under CC‑BY‑NC‑4.0, which permits non‑commercial reuse with attribution and prohibits commercial use. The release includes links to the accompanying paper, demo site, and pretrained weights. Researchers are encouraged to cite the work and to contribute through the repository's contribution guidelines.

Project Ideas

  1. Generate predicted fMRI responses for a library of video clips to study visual processing patterns across cortical regions.
  2. Create an interactive web app that lets users upload an audio file and visualizes the model's estimated brain activation on the fsaverage5 surface.
  3. Compare simulated brain responses to natural language sentences versus their spoken equivalents to investigate multimodal integration.
  4. Use the model as a benchmark for evaluating new multimodal encoders by measuring how much each one improves fMRI prediction accuracy.
  5. Fine‑tune TRIBE v2 on a custom dataset of stimulus‑response pairs from a specific experimental cohort to personalize the average‑subject predictions.
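
For idea 4, "prediction accuracy" needs a concrete metric. A common choice in encoding‑model work is the Pearson correlation between predicted and measured time courses, computed separately for each vertex. The sketch below assumes that metric; which score TRIBE v2 itself reports is not stated here. The function name `vertexwise_pearson` and the synthetic data are illustrative only.

```python
import numpy as np


def vertexwise_pearson(pred, measured, eps=1e-12):
    """Pearson correlation per vertex between two (time, vertex) matrices.

    Vertices whose time course is constant get a correlation of 0
    instead of NaN, because the denominator is clamped to `eps`.
    """
    pred = np.asarray(pred, dtype=float)
    measured = np.asarray(measured, dtype=float)
    if pred.shape != measured.shape or pred.ndim != 2:
        raise ValueError("inputs must be 2-D arrays of identical shape")

    # Center each vertex's time course.
    pred_c = pred - pred.mean(axis=0)
    meas_c = measured - measured.mean(axis=0)

    numerator = (pred_c * meas_c).sum(axis=0)
    denominator = np.sqrt((pred_c ** 2).sum(axis=0) * (meas_c ** 2).sum(axis=0))
    return numerator / np.maximum(denominator, eps)


# Synthetic demonstration: 200 time points, 50 vertices.
rng = np.random.default_rng(0)
measured = rng.standard_normal((200, 50))

# A "good" encoder tracks the measured signal with some noise;
# a "bad" encoder produces unrelated output.
pred_good = measured + 0.5 * rng.standard_normal(measured.shape)
pred_bad = rng.standard_normal(measured.shape)

r_good = vertexwise_pearson(pred_good, measured)
r_bad = vertexwise_pearson(pred_bad, measured)

print(f"mean r (good encoder): {r_good.mean():.3f}")
print(f"mean r (bad encoder):  {r_bad.mean():.3f}")
```

Comparing two encoders then reduces to comparing their per‑vertex correlation maps on the same held‑out stimuli, either as a mean score or region by region.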