{"id":69051,"date":"2026-02-05T11:18:25","date_gmt":"2026-02-05T11:18:25","guid":{"rendered":"https:\/\/youzum.net\/mistral-ai-launches-voxtral-transcribe-2-pairing-batch-diarization-and-open-realtime-asr-for-multilingual-production-workloads-at-scale\/"},"modified":"2026-02-05T11:18:25","modified_gmt":"2026-02-05T11:18:25","slug":"mistral-ai-launches-voxtral-transcribe-2-pairing-batch-diarization-and-open-realtime-asr-for-multilingual-production-workloads-at-scale","status":"publish","type":"post","link":"https:\/\/youzum.net\/de\/mistral-ai-launches-voxtral-transcribe-2-pairing-batch-diarization-and-open-realtime-asr-for-multilingual-production-workloads-at-scale\/","title":{"rendered":"Mistral AI Launches Voxtral Transcribe 2: Pairing Batch Diarization And Open Realtime ASR For Multilingual Production Workloads At Scale"},"content":{"rendered":"<p>Automatic speech recognition (ASR) is becoming a core building block for AI products, from meeting tools to voice agents. Mistral\u2019s new <strong>Voxtral Transcribe 2<\/strong> family targets this space with 2 models that split cleanly into batch and realtime use cases, while keeping cost, latency, and deployment constraints in focus.<\/p>\n<p><strong>The release includes:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Voxtral Mini Transcribe V2<\/strong> for batch transcription with diarization.<\/li>\n<li><strong>Voxtral Realtime (Voxtral Mini 4B Realtime 2602)<\/strong> for low-latency streaming transcription, released as open weights. <\/li>\n<\/ul>\n<p>Both models are designed for <strong>13 languages<\/strong>: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. 
<\/p>\n<h3 class=\"wp-block-heading\"><strong>Model family: batch and streaming, with clear roles<\/strong><\/h3>\n<p>Mistral positions Voxtral Transcribe 2 as \u2018two next-generation speech-to-text models\u2019 with <strong>state-of-the-art transcription quality, diarization, and ultra-low latency<\/strong>. <\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Voxtral Mini Transcribe V2<\/strong> is the <strong>batch model<\/strong>. It is optimized for transcription quality and diarization across domains and languages and exposed as an efficient audio input model in the Mistral API. <\/li>\n<li><strong>Voxtral Realtime<\/strong> is the <strong>streaming model<\/strong>. It is built with a dedicated streaming architecture and is released as an open-weights model under <strong>Apache 2.0<\/strong> on Hugging Face, with a recommended vLLM runtime. <\/li>\n<\/ul>\n<p>A key detail: <strong>speaker diarization is provided by Voxtral Mini Transcribe V2<\/strong>, not by Voxtral Realtime. Realtime focuses strictly on fast, accurate streaming transcription.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Voxtral Realtime: 4B-parameter streaming ASR with configurable delay<\/strong><\/h3>\n<p><strong>Voxtral Mini 4B Realtime 2602<\/strong> is a <strong>4B-parameter multilingual realtime speech-transcription model<\/strong>. 
It is among the first open-weights models to reach accuracy comparable to offline systems with a delay under 500 ms.<\/p>\n<p><strong>Architecture:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>\u22483.4B-parameter <strong>language model<\/strong>.<\/li>\n<li>\u22480.6B-parameter <strong>audio encoder<\/strong>.<\/li>\n<li>The audio encoder is trained from scratch with <strong>causal attention<\/strong>.<\/li>\n<li>Both encoder and LM use <strong>sliding-window attention<\/strong>, enabling effectively \u201cinfinite\u201d streaming.<\/li>\n<\/ul>\n<p><strong>Latency vs accuracy is explicitly configurable:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Transcription delay is tunable from 80 ms to 2.4 s<\/strong> via a <code>transcription_delay_ms<\/code> parameter. <\/li>\n<li>Mistral describes latency as <strong>\u201cconfigurable down to sub-200 ms\u201d<\/strong> for live applications. <\/li>\n<li>At <strong>480 ms delay<\/strong>, Realtime matches leading offline open-source transcription models and realtime APIs on benchmarks such as FLEURS and long-form English. <\/li>\n<li>At <strong>2.4 s delay<\/strong>, Realtime matches <strong>Voxtral Mini Transcribe V2<\/strong> on FLEURS, which is appropriate for subtitling tasks where slightly higher latency is acceptable. 
<\/li>\n<\/ul>\n<p><strong>From a deployment standpoint:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>The model is released in <strong>BF16<\/strong> and is designed for <strong>on-device or edge deployment<\/strong>.<\/li>\n<li>It can run in realtime on a <strong>single GPU with \u226516 GB memory<\/strong>, according to the vLLM serving instructions in the model card.<\/li>\n<\/ul>\n<p><strong>The main control knob is the delay setting:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Lower delays (\u224880\u2013200 ms) for interactive agents where responsiveness dominates.<\/li>\n<li>Around <strong>480 ms<\/strong> as the recommended \u201csweet spot\u201d between latency and accuracy.<\/li>\n<li>Higher delays (up to 2.4 s) when you need accuracy as close as possible to the batch model.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Voxtral Mini Transcribe V2: batch ASR with diarization and context biasing<\/strong><\/h3>\n<p><strong>Voxtral Mini Transcribe V2<\/strong> is a closed-weights <strong>audio input model<\/strong> optimized only for transcription. 
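<\/p>\n<p>As an illustration, the batch model can in principle be reached over plain HTTP. The sketch below assembles such a request in Python. The route (<code>\/v1\/audio\/transcriptions<\/code>) and model ID (<code>voxtral-mini-2602<\/code>) are the ones Mistral documents; the diarization field name and multipart layout are assumptions modeled on common transcription APIs, not confirmed against Mistral\u2019s API reference.<\/p>

```python
# Sketch: assembling a batch transcription request for Voxtral Mini
# Transcribe V2. Route and model ID come from the announcement; the
# "diarize" field name is an assumption, not a documented parameter.
import os

API_URL = "https://api.mistral.ai/v1/audio/transcriptions"

def build_request(audio_path: str, diarize: bool = True) -> dict:
    """Assemble URL, headers, and multipart form fields for an HTTP client."""
    return {
        "url": API_URL,
        "headers": {"Authorization": f"Bearer {os.environ.get('MISTRAL_API_KEY', '')}"},
        # Form fields; "model" is the documented batch model ID.
        "data": {"model": "voxtral-mini-2602", "diarize": str(diarize).lower()},
        # The audio file would be opened and attached by the HTTP client.
        "files": {"file": audio_path},
    }

req = build_request("meeting.wav")
print(req["data"]["model"])  # -> voxtral-mini-2602
```

<p>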
It is exposed in the Mistral API as <code>voxtral-mini-2602<\/code> at <strong>$0.003 per minute<\/strong>.<\/p>\n<p><strong>On benchmarks and pricing:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Around <strong>4% word error rate (WER)<\/strong> on the FLEURS transcription benchmark, averaged over the top 10 languages.<\/li>\n<li><strong>\u201cBest price-performance of any transcription API\u201d<\/strong> at $0.003\/min.<\/li>\n<li>Outperforms <strong>GPT-4o mini Transcribe<\/strong>, <strong>Gemini 2.5 Flash<\/strong>, <strong>Assembly Universal<\/strong>, and <strong>Deepgram Nova<\/strong> on accuracy in their comparisons.<\/li>\n<li>Processes audio <strong>\u22483\u00d7 faster than ElevenLabs\u2019 Scribe v2<\/strong> while matching quality at <strong>one-fifth the cost<\/strong>.<\/li>\n<\/ul>\n<p><strong>Enterprise-oriented features are concentrated in this model:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Speaker diarization<\/strong>\n<ul class=\"wp-block-list\">\n<li>Outputs speaker labels with precise start and end times.<\/li>\n<li>Designed for meetings, interviews, and multi-party calls.<\/li>\n<li>For overlapping speech, the model typically emits a single speaker label.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Context biasing<\/strong>\n<ul class=\"wp-block-list\">\n<li>Accepts up to <strong>100 words or phrases<\/strong> to bias transcription toward specific names or domain terms.<\/li>\n<li>Optimized for English, with <strong>experimental support<\/strong> for other languages.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Word-level timestamps<\/strong>\n<ul class=\"wp-block-list\">\n<li>Per-word start and end timestamps for subtitles, alignment, and searchable audio workflows.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Noise robustness<\/strong>\n<ul class=\"wp-block-list\">\n<li>Maintains accuracy in noisy environments such as factory floors, call centers, and field recordings.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Longer audio support<\/strong>\n<ul 
class=\"wp-block-list\">\n<li>Handles up to <strong>3 hours<\/strong> of audio in a single request.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Language coverage mirrors Realtime: 13 languages, with Mistral noting that non-English performance \u201csignificantly outpaces competitors\u201d in their evaluation. <\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1956\" height=\"1024\" data-attachment-id=\"77750\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/02\/04\/mistral-ai-launches-voxtral-transcribe-2-pairing-batch-diarization-and-open-realtime-asr-for-multilingual-production-workloads-at-scale\/screenshot-2026-02-04-at-11-28-56-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-04-at-11.28.56-PM-1.png\" data-orig-size=\"1956,1024\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-02-04 at 11.28.56\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-04-at-11.28.56-PM-1-300x157.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-04-at-11.28.56-PM-1-1024x536.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-04-at-11.28.56-PM-1.png\" alt=\"\" class=\"wp-image-77750\" \/><figcaption class=\"wp-element-caption\">https:\/\/mistral.ai\/news\/voxtral-transcribe-2<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>APIs, tooling, and deployment options<\/strong><\/h3>\n<p><strong>The integration paths are straightforward and differ slightly between the two 
models:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Voxtral Mini Transcribe V2<\/strong>\n<ul class=\"wp-block-list\">\n<li>Served via the Mistral <strong>audio transcription API<\/strong> (<code>\/v1\/audio\/transcriptions<\/code>) as an efficient transcription-only service. <\/li>\n<li>Priced at <strong>$0.003\/min<\/strong>. (<a href=\"https:\/\/mistral.ai\/news\/voxtral-transcribe-2\">Mistral AI<\/a>)<\/li>\n<li>Available in <strong>Mistral Studio\u2019s audio playground<\/strong> and in <strong>Le Chat<\/strong> for interactive testing.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Voxtral Realtime<\/strong>\n<ul class=\"wp-block-list\">\n<li>Available via the Mistral API at <strong>$0.006\/min<\/strong>. <\/li>\n<li>Released as <strong>open weights<\/strong> on Hugging Face (<code>mistralai\/Voxtral-Mini-4B-Realtime-2602<\/code>) under Apache 2.0, with official vLLM Realtime support.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><strong>The audio playground in Mistral Studio lets users: <\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Upload up to <strong>10 audio files<\/strong> (.mp3, .wav, .m4a, .flac, .ogg) up to <strong>1 GB<\/strong> each.<\/li>\n<li>Toggle diarization, choose timestamp granularity, and configure context bias terms.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ol class=\"wp-block-list\">\n<li><strong>Two-model family with clear roles<\/strong>: Voxtral Mini Transcribe V2 targets batch transcription and diarization, while Voxtral Realtime targets low-latency streaming ASR, both across 13 languages.<\/li>\n<li><strong>Realtime model with 4B parameters and tunable delay<\/strong>: Voxtral Realtime uses a 4B architecture (\u22483.4B LM + \u22480.6B encoder) with sliding-window and causal attention, and supports configurable transcription delay from 80 ms to 2.4 s.<\/li>\n<li><strong>Latency vs accuracy trade-off is explicit<\/strong>: Around 480 ms delay, Voxtral Realtime reaches accuracy comparable to strong 
offline and realtime systems, and at 2.4 s it matches Voxtral Mini Transcribe V2 on FLEURS.<\/li>\n<li><strong>Batch model adds diarization and enterprise features<\/strong>: Voxtral Mini Transcribe V2 provides diarization, context biasing with up to 100 phrases, word-level timestamps, noise robustness, and supports up to 3 hours of audio per request at $0.003\/min.<\/li>\n<li><strong>Closed batch API, open realtime weights<\/strong>: Mini Transcribe V2 is served via Mistral\u2019s audio transcription API and playground, while Voxtral Realtime is priced at $0.006\/min and also available as Apache 2.0 open weights with official vLLM Realtime support.<\/li>\n<\/ol>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/mistral.ai\/news\/voxtral-transcribe-2\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a> and <a href=\"https:\/\/huggingface.co\/mistralai\/Voxtral-Mini-4B-Realtime-2602\" target=\"_blank\" rel=\"noreferrer noopener\">Model Weights<\/a><\/strong>.\u00a0Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">100k+ ML SubReddit<\/a><\/strong>\u00a0and subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! 
Are you on Telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">Now you can join us on Telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/02\/04\/mistral-ai-launches-voxtral-transcribe-2-pairing-batch-diarization-and-open-realtime-asr-for-multilingual-production-workloads-at-scale\/\">Mistral AI Launches Voxtral Transcribe 2: Pairing Batch Diarization And Open Realtime ASR For Multilingual Production Workloads At Scale<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Automatic speech recognition (ASR) is becoming a core building block for AI products, from meeting tools to voice agents. Mistral\u2019s new Voxtral Transcribe 2 family targets this space with two models that split cleanly into batch and realtime use cases, while keeping cost, latency, and deployment constraints in focus. [&hellip;] 
The post Mistral AI Launches Voxtral Transcribe 2: Pairing Batch Diarization And<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-69051","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Mistral AI Launches Voxtral Transcribe 2: Pairing Batch Diarization And Open Realtime ASR For Multilingual Production Workloads At Scale - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, 
max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/de\/mistral-ai-launches-voxtral-transcribe-2-pairing-batch-diarization-and-open-realtime-asr-for-multilingual-production-workloads-at-scale\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Mistral AI Launches Voxtral Transcribe 2: Pairing Batch Diarization And Open Realtime ASR For Multilingual Production Workloads At Scale - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/de\/mistral-ai-launches-voxtral-transcribe-2-pairing-batch-diarization-and-open-realtime-asr-for-multilingual-production-workloads-at-scale\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-02-05T11:18:25+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"5\u00a0Minuten\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/mistral-ai-launches-voxtral-transcribe-2-pairing-batch-diarization-and-open-realtime-asr-for-multilingual-production-workloads-at-scale\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/mistral-ai-launches-voxtral-transcribe-2-pairing-batch-diarization-and-open-realtime-asr-for-multilingual-production-workloads-at-scale\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Mistral AI Launches Voxtral Transcribe 2: Pairing Batch Diarization And Open Realtime ASR For Multilingual Production Workloads At Scale\",\"datePublished\":\"2026-02-05T11:18:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/mistral-ai-launches-voxtral-transcribe-2-pairing-batch-diarization-and-open-realtime-asr-for-multilingual-production-workloads-at-scale\/\"},\"wordCount\":1008,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/mistral-ai-launches-voxtral-transcribe-2-pairing-batch-diarization-and-open-realtime-asr-for-multilingual-production-workloads-at-scale\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-04-at-11.28.56-PM-1.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/mistral-ai-launches-voxtral-transcribe-2-pairing-batch-diarization-and-open-realtime-asr-for-multilingual-production-workloads-at-scale\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/mistral-ai-launches-voxtral-transcribe-2-pairing-batch-diarization-and-open-realtime-asr-for-multilingual-production-workloads-at-scale\/\",\"url\":\"https:\/\/youzum.