Microsoft Research Blog

ADeLe: Predicting and explaining AI performance across tasks

April 1, 2026 | Lexin Zhou and Xing Xie

AI benchmarks report how large language models (LLMs) perform on specific tasks but provide little insight into the underlying capabilities that drive that performance. They do not explain failures or reliably predict outcomes on new tasks. To address this, Microsoft researchers in collaboration with Princeton…

Recent posts

AsgardBench: A benchmark for visually grounded interactive planning

March 26, 2026

Imagine a robot tasked with cleaning a kitchen. It needs to observe its environment, decide what to do, and adjust when things don't go as expected, for example, when the mug it was tasked to wash is already clean, or the sink is full of…
GroundedPlanBench: Spatially grounded long-horizon task planning for robot manipulation

March 26, 2026

Vision-language models (VLMs) use images and text to plan robot actions, but they still struggle to decide what actions to take and where to take them. Most systems split these decisions into two steps: a VLM generates a plan in natural language, and a separate…
Systematic debugging for AI agents: Introducing the AgentRx framework

March 12, 2026 | Shraddha Barke, Arnav Goyal, Alind Khare, and Chetan Bansal

As AI agents transition from simple chatbots to autonomous systems capable of managing cloud incidents, navigating complex web interfaces, and executing multi-step API workflows, a new challenge has emerged: transparency. When a human makes a mistake, we can usually trace the logic. But when an…
PlugMem: Transforming raw agent interactions into reusable knowledge

March 10, 2026

It seems counterintuitive: giving AI agents more memory can make them less effective. As interaction logs accumulate, they grow large, fill with irrelevant content, and become increasingly difficult to use. More memory means that agents must search through larger volumes of past interactions to find information…
Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model

March 4, 2026

We are pleased to announce Phi-4-reasoning-vision-15B, a 15-billion-parameter open-weight multimodal reasoning model, available through Microsoft Foundry, HuggingFace, and GitHub. Phi-4-reasoning-vision-15B is a broadly capable model that can be used for a…
CORPGEN advances AI agents for real work

February 26, 2026

By mid-morning, a typical knowledge worker is already juggling a client report, a budget spreadsheet, a slide deck, and an email backlog, all interdependent and all demanding attention at once. For AI agents to be genuinely useful in that environment, they will need to operate…
Media Authenticity Methods in Practice: Capabilities, Limitations, and Directions

February 19, 2026 | Eric Horvitz, Andrew Jenks, and Jessica Young

As synthetic media grows, verifying what is real and where content originated matters more than ever. Our latest report explores media integrity and authentication methods, their limits, and practical paths toward trustworthy provenance across images, audio, and video.
Project Silica’s advances in glass storage technology

February 18, 2026 | Richard Black

Project Silica introduces new techniques for encoding data in borosilicate glass, as described in the journal Nature. These advances lower media cost and simplify writing and reading systems while supporting 10,000-year data preservation.
Rethinking imitation learning with Predictive Inverse Dynamics Models

February 5, 2026

This research looks at why Predictive Inverse Dynamics Models often outperform standard Behavior Cloning in imitation learning. By using simple predictions of what happens next, PIDMs reduce ambiguity and learn from far fewer demonstrations.
Paza: Introducing automatic speech recognition benchmarks and models for low resource languages

February 4, 2026 | Mercy Muchai, Kevin Chege, Nick Mumero, and Stephanie Nyairo

Microsoft Research unveils Paza, a human-centered speech pipeline, and PazaBench, the first leaderboard for low-resource languages. It covers 39 African languages and 52 models and is tested with communities in real settings.
UniRG: Scaling medical imaging report generation with multimodal reinforcement learning

January 27, 2026

AI can help generate medical image reports, but today’s models struggle with varying reporting schemes. Learn how UniRG uses reinforcement learning to boost performance of medical vision-language models.

Explore More

Events & conferences

Meet our community of researchers, learn about exciting research topics, and grow your network
Podcasts

Ongoing conversations at the cutting edge of research
Microsoft Research Forum

Join us for a continuous exchange of ideas about research in the era of general AI
