Microsoft launched three Artificial Intelligence (AI) models last Thursday, focused on image generation, voice generation and speech-to-text transcription. The Redmond-based tech company claims that the models outperform specific models from Google, OpenAI and other companies. The models, MAI-Transcribe-1, MAI-Voice-1 and MAI-Image-2, are also said to be designed primarily to enable faster content creation at affordable prices. They are currently available through Microsoft Foundry and are also being integrated into various consumer products.
Microsoft introduces three new AI models
In a news post, the tech company unveiled the three new AI models. All of them are accessible now through Microsoft Foundry and MAI Playground. The most notable is MAI-Transcribe-1, which the company claims provides state-of-the-art (SOTA) speech-to-text transcription in the 25 most commonly spoken languages.
These claims are based on Microsoft’s internal tests using the FLEURS benchmark, in which the model reportedly achieved a lower error rate than Gemini 3.1 Flash and GPT-Transcribe. Additionally, the company says that for Foundry users, this model will offer “the best price-performance ratio of any major cloud provider.”
Sample of an image made with AI
As for MAI-Voice-1, this model is described as “capable of creating natural, life-like voices that clearly convey nuance, a full range of emotions, and distinctive style.” It is said to keep voice and delivery style consistent even across long content. Within Foundry, the model will also allow users to create their own custom voices using just a few seconds of audio input.
Microsoft insists that the entire process is completely safe and secure. According to the available information, the model can generate a 60-second audio clip in just one second. It will also power Copilot Audio Expressions and Copilot Podcasts.
Finally, the MAI-Image-2 model builds on its predecessor and is said to produce output faster and at higher quality. Microsoft said the model was developed in collaboration with photographers, designers, and visual storytellers, with a primary focus on natural lighting, accurate textures, and clear text within images.
WPP is one of the first enterprise partners to adopt the model. Like the other two models, it is available through Microsoft Foundry and MAI Playground. It is also being rolled out to Copilot, Bing and PowerPoint.