EmoCtrl-TTS
Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech. EmoCtrl-TTS is an emotion-controllable zero-shot TTS that can generate highly emotional speech with non-verbal vocalizations such as laughter and crying for any speaker. EmoCtrl-TTS is purely a…
Research Focus: Week of June 24, 2024
In this issue: RENC makes 5G vRAN servers more energy efficient; CoExplorer uses AI to keep video meetings on track; Automatic bug detection in LLM-powered text-based games; MAIRA-2: Grounded radiology report generation.
E2 TTS
Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS. E2 TTS (Embarrassingly Easy TTS) is a fully non-autoregressive zero-shot text-to-speech (TTS) system capable of generating the voice of any speaker. Despite its extremely simple model architecture and training…
Making Sentence Embeddings Robust to User-Generated Content
This seminar was hosted by Microsoft Research Africa, Nairobi together with the Microsoft AI for Good team in May 2024. User-generated content (UGC), e.g. social media posts written in “Internet language”, presents a lot of…
Insights into the Challenges and Opportunities of Large Multi-Modal Models for Blind and Low Vision Users: CLIP
Daniela Massiceti delves into the transformative potential of multimodal models such as CLIP for assistive technologies. Focusing on the blind/low-vision community, the talk explores how far current models are from realizing this potential and the advancements…
Panel: Generative AI for Global Impact: Challenges and Opportunities
Microsoft researchers discuss the challenges and opportunities of making AI more inclusive and impactful for everyone—from data that represents a broader range of communities and cultures to novel use cases for AI that are globally…
DOSA
A dataset of social artifacts from different Indian geographical subcultures. This repo hosts the code to run experiments on the DOSA dataset.