MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
The video introduces MindJourney, a framework that enhances Vision-Language Models (VLMs), which excel at interpreting single images but struggle to infer the underlying three-dimensional world. By allowing the VLM to “imagine” moving through the scene…
MindJourney enables AI to explore simulated 3D worlds to improve spatial interpretation
MindJourney can enable AI to navigate and interpret 3D environments from limited visual input, potentially improving performance in navigation, planning, and safety-critical tasks.
MindJourney
MindJourney is a framework that equips AI agents with a “simulation loop” to explore hypothetical 3D viewpoints before answering spatial reasoning questions, tackling a core limitation of vision-language models (VLMs), which recognize objects well in 2D…
VoluMe: Authentic 3D Video Calls from Live Gaussian Splat Prediction
Virtual 3D meetings offer the potential to enhance copresence, increase engagement, and thus improve the effectiveness of remote meetings compared to standard 2D video calls. However, representing people in 3D meetings remains a challenge; existing solutions…