Project
Autoregressive Video Models
Driving large video models with next token prediction. In-context learning for vision data has been underexplored compared with that in natural language. Previous works studied image in-context learning, urging models to generate a single image…
Publication
Video In-context Learning
Project
A Dynamic Benchmark for Image Understanding
We have created a procedurally generatable, synthetic dataset for testing spatial reasoning, visual prompting, object recognition and detection. A key question for understanding multimodal model performance is how well the model can understand images, in particular…
Project
Flood Forecasting in Critical Flood Prone Areas of Rwanda
Developing short-term flood predictions for timely warnings and damage minimization in highly urbanized regions in Rwanda