Learning outcomes with GenAI in the classroom: A review of empirical evidence

MSR-TR-2025-42

Published by Microsoft

This report presents a review of recent empirical evidence of generative AI (GenAI) impact on learning outcomes in formal education. It gives educators an overview of the top concerns for ensuring students’ learning gains when using LLM-based learning tools, and it concludes with research-derived guidance for deciding when and how to use these tools in the classroom. The report unfolds as follows:

Section 1 distinguishes between the needs of education and industry, where the benefits of LLMs were first explored, primarily for productivity gains. Educators’ priorities are different. Pedagogical concerns include consideration of inequities in education, developing students’ critical thinking skills, and the potential for GenAI to inhibit social development. These concerns extend beyond technologists’ focus on mitigating technical harms such as toxic content, bias, or accuracy in system outputs.

Section 2 presents several key variables that affect learning with GenAI: (1) AI literacy, meaning an understanding of the capabilities and limitations of an AI system, is a critical new variable for student success when using GenAI. (2) Educational equity is a variable where GenAI renders mixed experiences for marginalized groups. Studies show how GenAI can be an effective resource for students with disabilities. In other contexts, it entrenches existing patterns in academic performance of the weakest students and can exacerbate inequities for economically marginalized students. (3) GenAI can impact psychological and social conditions long recognized to facilitate learning: self-efficacy, individual pace, and human connection. On self-efficacy, studies show that students can be overconfident about their skill mastery when using GenAI and need help calibrating their mental model of learning gains. For self-paced learning, GenAI introduces both efficiencies and pitfalls depending on learning domain and context, including whether AI tools are general purpose chatbots or scaffolded tutors. Studies also highlight GenAI impact on human connection, the foundation for developing higher-order skills of critical thinking and creativity. GenAI’s on-demand availability but lack of social presence can present opportunities and disadvantages, from providing a nonjudgmental environment for exploring topics to reducing collaboration with peers in group projects. Yet, studies show that human tutors remain students’ preferred source for trusted information.

Section 3 examines how GenAI usage aligns with learning objectives in Bloom’s taxonomy. Basic cognitive skills (Bloom’s remembering and understanding) are fundamental to success across academic domains. Studies show that there can be an overdependence and lack of engagement that result in impaired memory formation when using LLM chatbots. Development of higher-order thinking (analysis, reasoning, and creativity) can be compromised if GenAI is used in ways that bypass the necessary struggle that is integral to acquiring skills. Studies illustrate how use of general-purpose GenAI tools such as ChatGPT, without scaffolding or other pedagogical guardrails, can be detrimental to critical thinking. GenAI can also impact creativity. Students using GenAI for creative problem-solving can benefit from fast prototype iteration and greater project completeness or detail but can also tend toward idea fixation and less originality and complexity in their work.

Section 4 highlights how GenAI learning tools need greater pedagogical complexity. Up to now, state-of-the-art tools have been ChatGPT or similar, with prompt engineering for the model to assume an instructor role or restrain its outputs. However, modified general-purpose chatbots cannot address the broad range of pedagogical considerations involved in learning success. New types of experimental AI tutors with embedded proven pedagogical strategies (for example, capable of detecting and effectively responding to a range of student cognitive states) show promise. Consulting educators in the design is key for success of systems like these that are on the horizon.

A concluding synthesis of the empirical evidence offers four guidelines for integrating GenAI in learning environments: (1) Ensure student readiness: avoid introducing GenAI too early, before students master domain basics. (2) Teach AI literacy: build an awareness of GenAI capabilities and limitations so students can assess system outputs and learn domain-specific techniques for optimal results. (3) Use GenAI as a supplement to traditional learning methods: GenAI explanations and examples are capabilities that students value, but teacher guidance with these explanations remains necessary. (4) Promote design interventions that foster student engagement: limiting copy-paste functionality, supporting students’ metacognitive calibration to reduce overestimation of their learning progress, nudging learners towards critical thinking, and evaluating GenAI tools for proven engagement strategies.

 

Cite as:
Walker, K. and Vorvoreanu, M. 2025. Learning outcomes with GenAI in the classroom: A review of empirical evidence. Microsoft Technical Report MSR-TR-2025-42 October 2025.