{"id":91690,"date":"2026-05-20T16:48:51","date_gmt":"2026-05-20T16:48:51","guid":{"rendered":"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/"},"modified":"2026-05-20T16:48:51","modified_gmt":"2026-05-20T16:48:51","slug":"alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency","status":"publish","type":"post","link":"https:\/\/youzum.net\/th\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/","title":{"rendered":"Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency"},"content":{"rendered":"<p>Simultaneous interpretation is one of the harder problems in applied AI. You\u2019re asking a model to translate speech before the speaker has finished a sentence. Every extra second of delay breaks the illusion of real-time communication. Alibaba\u2019s Qwen team has been chipping away at this with each release. Their latest model, <strong>Qwen3.5-LiveTranslate-Flash<\/strong>, brings that<strong> latency down to 2.8 seconds<\/strong> and expands input language coverage to 60 languages. 
<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"662\" data-attachment-id=\"79980\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/05\/20\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/screenshot-2026-05-20-at-1-09-04-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1.png\" data-orig-size=\"1906,1232\" data-comments-opened=\"0\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-05-20 at 1.09.04\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-1024x662.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-1024x662.png\" alt=\"\" class=\"wp-image-79980\" \/><figcaption class=\"wp-element-caption\">https:\/\/qwen.ai\/blog?id=qwen3.5-livetranslate<\/figcaption><\/figure>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>A Meaningful Jump From the Previous Release<\/strong><\/h2>\n<p>The Qwen3-LiveTranslate-Flash handled 18 input languages at roughly three seconds of latency. <strong>Qwen3.5-LiveTranslate-Flash<\/strong> brings that down to <strong>2.8 seconds<\/strong>, expands input coverage to 60 languages, and adds speech output in 29 languages. That\u2019s more than a 3\u00d7 expansion in language coverage on the input side. 
For devs building multilingual products, this reduces the need for per-language model switching in most global enterprise scenarios.<\/p>\n<p>The latency improvement comes from a technique for processing what the team calls \u2018reading units.\u2019 Rather than waiting for a full sentence to arrive before producing output, the model decides when enough meaning has accumulated in a segment to commit to a translation. It streams output continuously while the speaker is still talking. This is the same underlying logic as semantic unit prediction but with a tighter implementation that trims roughly 200 milliseconds off the previous release\u2019s latency.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Vision Is Now a First-Class Input<\/strong><\/h2>\n<p>Most translation systems treat audio as the only input signal. That works fine in clean studio conditions. It breaks down in a crowded conference room, a noisy trade floor, or anywhere with overlapping voices and bad acoustics.<\/p>\n<p>Qwen3.5-LiveTranslate-Flash takes a different approach. It analyzes visual information in parallel with audio: on-screen text, physically shown objects, lip movements, and gestures. When a word is phonetically ambiguous or the audio stream degrades, the visual context fills the gap and sharpens the translation decision. This is not a minor feature. In real-world deployment, audio quality is rarely guaranteed. Having a vision channel means the model handles the messy reality of live interpretation more gracefully than audio-only systems.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Voice Cloning Happens in Real Time<\/strong><\/h2>\n<p>This is the part that stands out most in the Qwen3.5 release. Standard translation systems replace the speaker\u2019s voice with a generic synthesis voice. Qwen3.5-LiveTranslate-Flash instead clones the characteristic voice features of the original speaker during the translation itself. 
A single spoken sentence is enough for the model to perform this acoustic adaptation.<\/p>\n<p>For listeners on the receiving end, the translated output sounds like the same person speaking the target language and not a robotic substitute. In live conference interpretation, multilingual livestreams, or international customer calls, this is important. The experience feels noticeably more human than what current systems deliver.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Configure Domain-Specific Keywords<\/strong><\/h2>\n<p>One persistent failure mode for translation models in professional settings is proper nouns and specialized vocabulary. A model translating a medical briefing might consistently mistranslate a drug name. A legal interpretation session breaks down over a technical statute term.<\/p>\n<p>Qwen3.5-LiveTranslate-Flash addresses this with dynamic keyword configuration at runtime. Developers can inject a glossary of brand names, medical terms, legal terminology, or technical vocabulary, and the model handles those terms significantly more reliably. This isn\u2019t available in most general-purpose translation APIs and it closes a real gap for domain-specific enterprise deployments.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Benchmark Performance<\/strong><\/h2>\n<p>On FLEURS and CoVoST2 \u2014 two established benchmarks for multilingual speech translation \u2014 Qwen3.5-LiveTranslate-Flash outperforms major commercial alternatives. FLEURS tests translation quality across a wide variety of language pairs under real acoustic conditions. CoVoST2 covers 21 translation directions from speech, making it a practical proxy for multilingual pipeline performance. 
<\/p>\n<h2 class=\"wp-block-heading\"><strong>Marktechpost\u2019s Visual Explainer<\/strong><\/h2>\n<div class=\"qlt-wrap\">\n<div class=\"qlt-header\">\n<div class=\"qlt-header-top\">\n    <span class=\"qlt-badge\">\u2713 Developer Guide<\/span>\n  <\/div>\n<div class=\"qlt-title\">How to Use Qwen3.5-LiveTranslate-Flash<\/div>\n<div class=\"qlt-subtitle\">A step-by-step integration guide \u2014 from setup to production-ready real-time translation<\/div>\n<\/div>\n<div class=\"qlt-steps-nav\">\n  <button class=\"qlt-step-btn active\"><span class=\"qlt-step-num\">1<\/span>Overview<\/button><br \/>\n  <button class=\"qlt-step-btn\"><span class=\"qlt-step-num\">2<\/span>Prerequisites<\/button><br \/>\n  <button class=\"qlt-step-btn\"><span class=\"qlt-step-num\">3<\/span>Connect<\/button><br \/>\n  <button class=\"qlt-step-btn\"><span class=\"qlt-step-num\">4<\/span>Send Audio<\/button><br \/>\n  <button class=\"qlt-step-btn\"><span class=\"qlt-step-num\">5<\/span>Visual Input<\/button><br \/>\n  <button class=\"qlt-step-btn\"><span class=\"qlt-step-num\">6<\/span>Keywords<\/button><br \/>\n  <button class=\"qlt-step-btn\"><span class=\"qlt-step-num\">7<\/span>Languages<\/button>\n<\/div>\n<div class=\"qlt-body\">\n<div class=\"qlt-panel active\">\n<div class=\"qlt-section-label\">What it does<\/div>\n<div class=\"qlt-panel-title\">Qwen3.5-LiveTranslate-Flash at a glance<\/div>\n<div class=\"qlt-panel-desc\">Qwen3.5-LiveTranslate-Flash is an API-only, closed-weight real-time translation model from Alibaba\u2019s Qwen team. It takes audio and video frames as simultaneous inputs and outputs translated text and speech. 
The model uses a WebSocket-based protocol over Alibaba Cloud Model Studio.<\/div>\n<div class=\"qlt-info-grid\">\n<div class=\"qlt-info-card\">\n<div class=\"qlt-info-card-label\">Latency<\/div>\n<div class=\"qlt-info-card-value\">2.8s<\/div>\n<div class=\"qlt-info-card-sub\">Per token to audio out<\/div>\n<\/div>\n<div class=\"qlt-info-card\">\n<div class=\"qlt-info-card-label\">Input languages<\/div>\n<div class=\"qlt-info-card-value\">60<\/div>\n<div class=\"qlt-info-card-sub\">Speech + visual input<\/div>\n<\/div>\n<div class=\"qlt-info-card\">\n<div class=\"qlt-info-card-label\">Speech output<\/div>\n<div class=\"qlt-info-card-value\">29<\/div>\n<div class=\"qlt-info-card-sub\">Languages with voice<\/div>\n<\/div>\n<div class=\"qlt-info-card\">\n<div class=\"qlt-info-card-label\">Protocol<\/div>\n<div class=\"qlt-info-card-value\">WebSocket<\/div>\n<div class=\"qlt-info-card-sub\">Persistent connection<\/div>\n<\/div>\n<\/div>\n<ul class=\"qlt-feat-list\">\n<li><span class=\"qlt-feat-icon green\">\u2713<\/span>\n<div><strong>Vision-enhanced comprehension<\/strong> \u2014 lip movements, gestures, and on-screen text all feed into the translation decision alongside audio<\/div>\n<\/li>\n<li><span class=\"qlt-feat-icon blue\">\u25c6<\/span>\n<div><strong>Real-time voice cloning<\/strong> \u2014 clones the original speaker\u2019s voice profile in the translated output from a single spoken sentence<\/div>\n<\/li>\n<li><span class=\"qlt-feat-icon amber\">\u25c6<\/span>\n<div><strong>Semantic unit prediction<\/strong> \u2014 commits to output segments before a full sentence ends, enabling continuous streaming without waiting for complete utterances<\/div>\n<\/li>\n<li><span class=\"qlt-feat-icon coral\">\u25c6<\/span>\n<div><strong>Dynamic keyword configuration<\/strong> \u2014 inject domain-specific glossaries at runtime for technical, medical, or legal terminology<\/div>\n<\/li>\n<\/ul><\/div>\n<div class=\"qlt-panel\">\n<div class=\"qlt-section-label\">Before you 
start<\/div>\n<div class=\"qlt-panel-title\">Prerequisites<\/div>\n<div class=\"qlt-panel-desc\">You need an Alibaba Cloud account with Model Studio access and a valid DashScope API key. The model is available through the <code>qwen3-livetranslate-flash-realtime<\/code> model ID.<\/div>\n<div class=\"qlt-step-flow\">\n<div class=\"qlt-flow-item\">\n<div class=\"qlt-flow-spine\">\n<div class=\"qlt-flow-dot\">1<\/div>\n<div class=\"qlt-flow-line\"><\/div>\n<\/div>\n<div class=\"qlt-flow-content\">\n<h4>Create an Alibaba Cloud account<\/h4>\n<p>Sign up at <code>alibabacloud.com<\/code> and activate Alibaba Cloud Model Studio in your account dashboard.<\/p>\n<\/div><\/div>\n<div class=\"qlt-flow-item\">\n<div class=\"qlt-flow-spine\">\n<div class=\"qlt-flow-dot\">2<\/div>\n<div class=\"qlt-flow-line\"><\/div>\n<\/div>\n<div class=\"qlt-flow-content\">\n<h4>Get your DashScope API key<\/h4>\n<p>Navigate to Model Studio \u2192 API Keys. Generate a key and store it as the environment variable <code>DASHSCOPE_API_KEY<\/code>. Never hardcode it in source files.<\/p>\n<\/div><\/div>\n<div class=\"qlt-flow-item\">\n<div class=\"qlt-flow-spine\">\n<div class=\"qlt-flow-dot\">3<\/div>\n<div class=\"qlt-flow-line\"><\/div>\n<\/div>\n<div class=\"qlt-flow-content\">\n<h4>Install the Python dependency<\/h4>\n<p>Install the <code>websocket-client<\/code> package for the WebSocket connection. For audio capture, also install <code>pyaudio<\/code>.<\/p>\n<\/div><\/div>\n<div class=\"qlt-flow-item\">\n<div class=\"qlt-flow-spine\">\n<div class=\"qlt-flow-dot\">4<\/div>\n<\/div>\n<div class=\"qlt-flow-content\">\n<h4>Check your audio setup<\/h4>\n<p>The model accepts 16kHz, 16-bit PCM mono audio on input. 
Confirm your microphone or audio source can output in this format before connecting.<\/p>\n<\/div><\/div>\n<\/div>\n<div class=\"qlt-code-block\">\n<div class=\"qlt-code-top\"><span class=\"qlt-code-lang\">BASH<\/span><button class=\"qlt-code-copy\">Copy<\/button><\/div>\n<pre><code><span class=\"cm\"># Install dependencies<\/span>\n<span class=\"fn\">pip<\/span> install websocket-client pyaudio\n\n<span class=\"cm\"># Set your API key as an environment variable<\/span>\n<span class=\"kw\">export<\/span> <span class=\"nm\">DASHSCOPE_API_KEY<\/span>=<span class=\"st\">\"your_key_here\"<\/span><\/code><\/pre>\n<\/div>\n<\/div>\n<div class=\"qlt-panel\">\n<div class=\"qlt-section-label\">Step 3 \u2014 Connection<\/div>\n<div class=\"qlt-panel-title\">Establish the WebSocket connection<\/div>\n<div class=\"qlt-panel-desc\">The model uses the WebSocket protocol for a persistent, bidirectional connection. You authenticate via a Bearer token in the connection header using your DashScope API key.<\/div>\n<div class=\"qlt-code-block\">\n<div class=\"qlt-code-top\"><span class=\"qlt-code-lang\">PYTHON<\/span><button class=\"qlt-code-copy\">Copy<\/button><\/div>\n<pre><code><span class=\"kw\">import<\/span> json, websocket, os\n\n<span class=\"nm\">API_KEY<\/span> = os.<span class=\"fn\">getenv<\/span>(<span class=\"st\">\"DASHSCOPE_API_KEY\"<\/span>)\n<span class=\"nm\">API_URL<\/span> = (\n    <span class=\"st\">\"wss:\/\/dashscope-intl.aliyuncs.com\"<\/span>\n    <span class=\"st\">\"\/api-ws\/v1\/realtime\"<\/span>\n    <span class=\"st\">\"?model=qwen3-livetranslate-flash-realtime\"<\/span>\n)\n\n<span class=\"kw\">def<\/span> <span class=\"fn\">on_open<\/span>(ws):\n    <span class=\"fn\">print<\/span>(<span class=\"st\">\"Connected to Qwen3.5-LiveTranslate-Flash\"<\/span>)\n\n<span class=\"kw\">def<\/span> <span class=\"fn\">on_message<\/span>(ws, message):\n    data = json.<span class=\"fn\">loads<\/span>(message)\n    <span class=\"fn\">print<\/span>(<span 
class=\"st\">\"Translation event:\"<\/span>, data)\n\n<span class=\"kw\">def<\/span> <span class=\"fn\">on_error<\/span>(ws, error):\n    <span class=\"fn\">print<\/span>(<span class=\"st\">\"Error:\"<\/span>, error)\n\n<span class=\"nm\">ws<\/span> = websocket.<span class=\"cn\">WebSocketApp<\/span>(\n    <span class=\"nm\">API_URL<\/span>,\n    header=[<span class=\"st\">\"Authorization: Bearer \"<\/span> + <span class=\"nm\">API_KEY<\/span>],\n    on_open=<span class=\"nm\">on_open<\/span>,\n    on_message=<span class=\"nm\">on_message<\/span>,\n    on_error=<span class=\"nm\">on_error<\/span>\n)\nws.<span class=\"fn\">run_forever<\/span>()<\/code><\/pre>\n<\/div>\n<div class=\"qlt-tip\"><span class=\"qlt-tip-icon\">\u24d8<\/span>\n<div>The connection stays open for the full session. You do not reconnect per utterance. Send audio chunks and image frames continuously over the same socket.<\/div>\n<\/div><\/div>\n<div class=\"qlt-panel\">\n<div class=\"qlt-section-label\">Step 4 \u2014 Audio streaming<\/div>\n<div class=\"qlt-panel-title\">Configure and stream audio input<\/div>\n<div class=\"qlt-panel-desc\">After connecting, send a session configuration event to set the source and target languages. Then stream PCM audio chunks continuously. 
The model uses <code>session.input_audio_transcription.language<\/code> to identify the input language.<\/div>\n<div class=\"qlt-code-block\">\n<div class=\"qlt-code-top\"><span class=\"qlt-code-lang\">PYTHON<\/span><button class=\"qlt-code-copy\">Copy<\/button><\/div>\n<pre><code><span class=\"kw\">import<\/span> base64, json, threading, pyaudio\n\n<span class=\"cm\"># Audio input config: 16kHz, 16-bit PCM mono<\/span>\n<span class=\"nm\">INPUT_RATE<\/span>    = <span class=\"st\">16000<\/span>\n<span class=\"nm\">INPUT_CHUNK<\/span>   = <span class=\"st\">1600<\/span>  <span class=\"cm\"># 100ms per chunk<\/span>\n<span class=\"nm\">INPUT_FORMAT<\/span>  = pyaudio.<span class=\"nm\">paInt16<\/span>\n<span class=\"nm\">INPUT_CHANNELS<\/span> = <span class=\"st\">1<\/span>\n\n<span class=\"kw\">def<\/span> <span class=\"fn\">on_open<\/span>(ws):\n    <span class=\"cm\"># 1. Send session config first<\/span>\n    session_cfg = {\n        <span class=\"st\">\"type\"<\/span>: <span class=\"st\">\"session.update\"<\/span>,\n        <span class=\"st\">\"session\"<\/span>: {\n            <span class=\"st\">\"input_audio_transcription\"<\/span>: {\n                <span class=\"st\">\"language\"<\/span>: <span class=\"st\">\"zh\"<\/span>  <span class=\"cm\"># source: Chinese<\/span>\n            },\n            <span class=\"st\">\"translation\"<\/span>: {\n                <span class=\"st\">\"target_language\"<\/span>: <span class=\"st\">\"en\"<\/span>  <span class=\"cm\"># target: English<\/span>\n            }\n        }\n    }\n    ws.<span class=\"fn\">send<\/span>(json.<span class=\"fn\">dumps<\/span>(session_cfg))\n\n    <span class=\"cm\"># 2. 
Stream microphone audio from a background thread so on_open returns<\/span>\n    <span class=\"cm\"># and run_forever() can keep dispatching on_message events<\/span>\n    threading.<span class=\"cn\">Thread<\/span>(\n        target=<span class=\"nm\">stream_audio<\/span>,\n        args=(ws,), daemon=<span class=\"kw\">True<\/span>\n    ).<span class=\"fn\">start<\/span>()\n\n<span class=\"kw\">def<\/span> <span class=\"fn\">stream_audio<\/span>(ws):\n    <span class=\"cm\"># Capture PCM chunks from the microphone and send them base64-encoded<\/span>\n    pa = pyaudio.<span class=\"cn\">PyAudio<\/span>()\n    stream = pa.<span class=\"fn\">open<\/span>(\n        rate=<span class=\"nm\">INPUT_RATE<\/span>, channels=<span class=\"nm\">INPUT_CHANNELS<\/span>,\n        format=<span class=\"nm\">INPUT_FORMAT<\/span>, input=<span class=\"kw\">True<\/span>,\n        frames_per_buffer=<span class=\"nm\">INPUT_CHUNK<\/span>\n    )\n    <span class=\"kw\">while True<\/span>:\n        chunk = stream.<span class=\"fn\">read<\/span>(<span class=\"nm\">INPUT_CHUNK<\/span>)\n        audio_b64 = base64.<span class=\"fn\">b64encode<\/span>(chunk).<span class=\"fn\">decode<\/span>()\n        ws.<span class=\"fn\">send<\/span>(json.<span class=\"fn\">dumps<\/span>({\n            <span class=\"st\">\"type\"<\/span>: <span class=\"st\">\"input_audio_buffer.append\"<\/span>,\n            <span class=\"st\">\"audio\"<\/span>: audio_b64\n        }))<\/code><\/pre>\n<\/div>\n<div class=\"qlt-warning\"><span class=\"qlt-warning-icon\">\u26a0<\/span>\n<div>Do not send audio before the <code>session.update<\/code> event is acknowledged. Wait for the server\u2019s session confirmation event before streaming audio chunks.<\/div>\n<\/div><\/div>\n<div class=\"qlt-panel\">\n<div class=\"qlt-section-label\">Step 5 \u2014 Vision input<\/div>\n<div class=\"qlt-panel-title\">Send video frames for vision-enhanced comprehension<\/div>\n<div class=\"qlt-panel-desc\">Qwen3.5-LiveTranslate-Flash reads lip movements, gestures, and on-screen text from video frames alongside audio. Send base64-encoded JPEG frames at a regular interval during the session. 
Even a low frame rate significantly improves accuracy in noisy audio conditions.<\/div>\n<div class=\"qlt-code-block\">\n<div class=\"qlt-code-top\"><span class=\"qlt-code-lang\">PYTHON<\/span><button class=\"qlt-code-copy\">Copy<\/button><\/div>\n<pre><code><span class=\"kw\">import<\/span> cv2, base64, threading, time\n\n<span class=\"kw\">def<\/span> <span class=\"fn\">stream_video_frames<\/span>(ws):\n    cap = cv2.<span class=\"fn\">VideoCapture<\/span>(<span class=\"st\">0<\/span>)  <span class=\"cm\"># 0 = default camera<\/span>\n    <span class=\"kw\">while True<\/span>:\n        ret, frame = cap.<span class=\"fn\">read<\/span>()\n        <span class=\"kw\">if not<\/span> ret:\n            <span class=\"kw\">break<\/span>\n        <span class=\"cm\"># Encode frame as JPEG \u2192 base64<\/span>\n        _, buf = cv2.<span class=\"fn\">imencode<\/span>(<span class=\"st\">\".jpg\"<\/span>, frame)\n        img_b64 = base64.<span class=\"fn\">b64encode<\/span>(buf).<span class=\"fn\">decode<\/span>()\n        ws.<span class=\"fn\">send<\/span>(json.<span class=\"fn\">dumps<\/span>({\n            <span class=\"st\">\"type\"<\/span>: <span class=\"st\">\"input_image_buffer.append\"<\/span>,\n            <span class=\"st\">\"image\"<\/span>: img_b64\n        }))\n        time.<span class=\"fn\">sleep<\/span>(<span class=\"st\">0.5<\/span>)  <span class=\"cm\"># ~2fps is sufficient<\/span>\n\n<span class=\"cm\"># Run video streaming in a separate thread<\/span>\nthreading.<span class=\"cn\">Thread<\/span>(\n    target=<span class=\"nm\">stream_video_frames<\/span>,\n    args=(ws,), daemon=<span class=\"kw\">True<\/span>\n).<span class=\"fn\">start<\/span>()<\/code><\/pre>\n<\/div>\n<div class=\"qlt-tip\"><span class=\"qlt-tip-icon\">\u24d8<\/span>\n<div>Vision input is optional but recommended for live human speech scenarios. 
For pre-recorded audio files without a camera feed, you can omit image frames entirely and rely on audio alone.<\/div>\n<\/div><\/div>\n<div class=\"qlt-panel\">\n<div class=\"qlt-section-label\">Step 6 \u2014 Domain accuracy<\/div>\n<div class=\"qlt-panel-title\">Dynamic keyword configuration<\/div>\n<div class=\"qlt-panel-desc\">For technical, medical, legal, or brand-specific vocabulary, you can inject a keyword glossary at session start. The model uses this list to significantly improve translation reliability for terms that standard training data may handle inconsistently.<\/div>\n<div class=\"qlt-code-block\">\n<div class=\"qlt-code-top\"><span class=\"qlt-code-lang\">PYTHON<\/span><button class=\"qlt-code-copy\">Copy<\/button><\/div>\n<pre><code><span class=\"cm\"># Add to your session.update payload<\/span>\nsession_cfg = {\n    <span class=\"st\">\"type\"<\/span>: <span class=\"st\">\"session.update\"<\/span>,\n    <span class=\"st\">\"session\"<\/span>: {\n        <span class=\"st\">\"input_audio_transcription\"<\/span>: {\n            <span class=\"st\">\"language\"<\/span>: <span class=\"st\">\"zh\"<\/span>\n        },\n        <span class=\"st\">\"translation\"<\/span>: {\n            <span class=\"st\">\"target_language\"<\/span>: <span class=\"st\">\"en\"<\/span>\n        },\n        <span class=\"cm\"># Inject domain keywords here<\/span>\n        <span class=\"st\">\"keywords\"<\/span>: [\n            {<span class=\"st\">\"source\"<\/span>: <span class=\"st\">\"\u8fbe\u82ac\u5947\u673a\u5668\u4eba\"<\/span>,  <span class=\"st\">\"target\"<\/span>: <span class=\"st\">\"da Vinci Surgical System\"<\/span>},\n            {<span class=\"st\">\"source\"<\/span>: <span class=\"st\">\"\u8179\u8154\u955c\"<\/span>,      <span class=\"st\">\"target\"<\/span>: <span class=\"st\">\"laparoscope\"<\/span>},\n            {<span class=\"st\">\"source\"<\/span>: <span class=\"st\">\"\u5b9e\u4f53\u7624\"<\/span>,      <span class=\"st\">\"target\"<\/span>: <span 
class=\"st\">\"solid tumor\"<\/span>}\n        ]\n    }\n}\nws.<span class=\"fn\">send<\/span>(json.<span class=\"fn\">dumps<\/span>(session_cfg))<\/code><\/pre>\n<\/div>\n<ul class=\"qlt-feat-list\">\n<li><span class=\"qlt-feat-icon green\">\u2713<\/span>Works for brand names, drug names, legal statutes, and technical model numbers<\/li>\n<li><span class=\"qlt-feat-icon green\">\u2713<\/span>Keywords are scoped to the session and do not persist across connections<\/li>\n<li><span class=\"qlt-feat-icon amber\">\u25c6<\/span>Keep the list focused \u2014 only terms where mistranslation would cause real errors<\/li>\n<\/ul><\/div>\n<div class=\"qlt-panel\">\n<div class=\"qlt-section-label\">Reference<\/div>\n<div class=\"qlt-panel-title\">Supported languages<\/div>\n<div class=\"qlt-panel-desc\">Qwen3.5-LiveTranslate-Flash understands 60 input languages and can produce speech output in 29 languages. The highlighted pills below are confirmed speech output languages. All pills represent supported input.<\/div>\n<div class=\"qlt-lang-grid\">\n<div class=\"qlt-lang-pill highlight\">Chinese<\/div>\n<div class=\"qlt-lang-pill highlight\">English<\/div>\n<div class=\"qlt-lang-pill highlight\">French<\/div>\n<div class=\"qlt-lang-pill highlight\">German<\/div>\n<div class=\"qlt-lang-pill highlight\">Spanish<\/div>\n<div class=\"qlt-lang-pill highlight\">Japanese<\/div>\n<div class=\"qlt-lang-pill highlight\">Korean<\/div>\n<div class=\"qlt-lang-pill highlight\">Russian<\/div>\n<div class=\"qlt-lang-pill highlight\">Portuguese<\/div>\n<div class=\"qlt-lang-pill highlight\">Italian<\/div>\n<div class=\"qlt-lang-pill highlight\">Arabic<\/div>\n<div class=\"qlt-lang-pill highlight\">Hindi<\/div>\n<div class=\"qlt-lang-pill highlight\">Turkish<\/div>\n<div class=\"qlt-lang-pill highlight\">Indonesian<\/div>\n<div class=\"qlt-lang-pill highlight\">Thai<\/div>\n<div class=\"qlt-lang-pill highlight\">Vietnamese<\/div>\n<div class=\"qlt-lang-pill highlight\">Greek<\/div>\n<div 
class=\"qlt-lang-pill\">Mandarin<\/div>\n<div class=\"qlt-lang-pill\">Cantonese<\/div>\n<div class=\"qlt-lang-pill\">Wu dialect<\/div>\n<div class=\"qlt-lang-pill\">Sichuanese<\/div>\n<div class=\"qlt-lang-pill\">Tianjin dialect<\/div>\n<div class=\"qlt-lang-pill\">Beijing dialect<\/div>\n<div class=\"qlt-lang-pill\">+ 37 more<\/div>\n<\/div>\n<div class=\"qlt-tip\"><span class=\"qlt-tip-icon\">\u24d8<\/span>\n<div>Highlighted pills have confirmed speech (audio) output support. Plain pills are input-only or unconfirmed for voice output. Verify your specific target language pair in the Alibaba Cloud Model Studio documentation before building audio-output pipelines.<\/div>\n<\/div>\n<div class=\"qlt-warning\"><span class=\"qlt-warning-icon\">\u26a0<\/span>\n<div>The model supports text output for all 60 input languages. Speech output is available for 29 languages only. If your pipeline requires audio delivery and your target language is not in the confirmed list, plan for a fallback TTS step.<\/div>\n<\/div><\/div>\n<\/div>\n<div class=\"qlt-footer\">\n  <span class=\"qlt-footer-text\">Step 1 of 7<\/span>\n<div class=\"qlt-nav-btns\">\n    <button class=\"qlt-nav-btn\" disabled>\u2190 Prev<\/button><br \/>\n    <button class=\"qlt-nav-btn primary\">Next \u2192<\/button>\n  <\/div>\n<\/div>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h2>\n<ul class=\"wp-block-list\">\n<li>Qwen3.5-LiveTranslate-Flash delivers real-time multimodal interpretation across 60 input languages and 29 speech output languages at 2.8 seconds of latency.<\/li>\n<li>The model uses vision-enhanced comprehension \u2014 reading lip movements, gestures, and on-screen text \u2014 to maintain accuracy in noisy or degraded audio environments.<\/li>\n<li>Real-time voice cloning replicates the original speaker\u2019s voice profile in the translated output using just a single spoken sentence.<\/li>\n<li>Semantic unit prediction via \u201creading units\u201d processing enables 
continuous streaming output without waiting for full sentences, reducing latency to 2.8 seconds.<\/li>\n<li>Dynamic keyword configuration allows developers to inject domain-specific glossaries at runtime, improving translation reliability for technical, medical, and legal terminology.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out\u00a0the\u00a0<strong><a href=\"https:\/\/qwen.ai\/blog?id=qwen3.5-livetranslate\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a>.<\/strong><\/p>
\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/05\/20\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/\">Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Simultaneous interpretation is one of the harder problems in applied AI. You\u2019re asking a model to translate speech before the speaker has finished a sentence. Every extra second of delay breaks the illusion of real-time communication. Alibaba\u2019s Qwen team has been chipping away at this with each release. Their latest model, Qwen3.5-LiveTranslate-Flash, brings that latency down to 2.8 seconds and expands input language coverage to 60 languages. https:\/\/qwen.ai\/blog?id=qwen3.5-livetranslate A Meaningful Jump From the Previous Release The Qwen3-LiveTranslate-Flash handled 18 input languages at roughly three seconds of latency. Qwen3.5-LiveTranslate-Flash brings that down to 2.8 seconds, expands input coverage to 60 languages, and adds speech output in 29 languages. That\u2019s more than a 3\u00d7 expansion in language coverage on the input side. For devs building multilingual products, this reduces the need for per-language model switching in most global enterprise scenarios. 
The latency improvement comes from a technique for processing what the team calls \u2018reading units.\u2019 Rather than waiting for a full sentence to arrive before producing output, the model decides when enough meaning has accumulated in a segment to commit to a translation. It streams output continuously while the speaker is still talking. This is the same underlying logic as semantic unit prediction but with a tighter implementation that trims roughly 200 milliseconds off the previous release\u2019s latency. Vision Is Now a First-Class Input Most translation systems treat audio as the only input signal. That works fine in clean studio conditions. It breaks down in a crowded conference room, a noisy trade floor, or anywhere with overlapping voices and bad acoustics. Qwen3.5-LiveTranslate-Flash takes a different approach. It analyzes visual information in parallel with audio: on-screen text, physically shown objects, lip movements, and gestures. When a word is phonetically ambiguous or the audio stream degrades, the visual context fills the gap and sharpens the translation decision. This is not a minor feature. In real-world deployment, audio quality is rarely guaranteed. Having a vision channel means the model handles the messy reality of live interpretation more gracefully than audio-only systems. Voice Cloning Happens in Real Time This is the part that stands out most in the Qwen3.5 release. Standard translation systems replace the speaker\u2019s voice with a generic synthesis voice. Qwen3.5-LiveTranslate-Flash instead clones the characteristic voice features of the original speaker during the translation itself. A single spoken sentence is enough for the model to perform this acoustic adaptation. For listeners on the receiving end, the translated output sounds like the same person speaking the target language and not a robotic substitute. In live conference interpretation, multilingual livestreams, or international customer calls, this is important. 
The experience feels noticeably more human than what current systems deliver. Configure Domain-Specific Keywords One persistent failure mode for translation models in professional settings is proper nouns and specialized vocabulary. A model translating a medical briefing might consistently mistranslate a drug name. A legal interpretation session breaks down over a technical statute term. Qwen3.5-LiveTranslate-Flash addresses this with dynamic keyword configuration at runtime. Developers can inject a glossary of brand names, medical terms, legal terminology, or technical vocabulary, and the model handles those terms significantly more reliably. This isn\u2019t available in most general-purpose translation APIs and it closes a real gap for domain-specific enterprise deployments. Benchmark Performance On FLEURS and CoVoST2 \u2014 two established benchmarks for multilingual speech translation \u2014 Qwen3.5-LiveTranslate-Flash outperforms major commercial alternatives. FLEURS tests translation quality across a wide variety of language pairs under real acoustic conditions. CoVoST2 covers 21 translation directions from speech, making it a practical proxy for multilingual pipeline performance. Marktechpost\u2019s Visual Explainer \u2713 Developer Guide How to Use Qwen3.5-LiveTranslate-Flash A step-by-step integration guide \u2014 from setup to production-ready real-time translation 1Overview 2Prerequisites 3Connect 4Send Audio 5Visual Input 6Keywords 7Languages What it does Qwen3.5-LiveTranslate-Flash at a glance Qwen3.5-LiveTranslate-Flash is an API-only, closed-weight real-time translation model from Alibaba\u2019s Qwen team. It takes audio and video frames as simultaneous inputs and outputs translated text and speech. The model uses a WebSocket-based protocol over Alibaba Cloud Model Studio. 
<\/p>\n<ul>\n<li><strong>Latency:<\/strong> 2.8s (per token to audio out)<\/li>\n<li><strong>Input languages:<\/strong> 60 (speech + visual input)<\/li>\n<li><strong>Speech output:<\/strong> 29 languages with voice<\/li>\n<li><strong>Protocol:<\/strong> WebSocket (persistent connection)<\/li>\n<\/ul>\n<ul>\n<li><strong>Vision-enhanced comprehension:<\/strong> lip movements, gestures, and on-screen text all feed into the translation decision alongside audio<\/li>\n<li><strong>Real-time voice cloning:<\/strong> clones the original speaker\u2019s voice profile in the translated output from a single spoken sentence<\/li>\n<li><strong>Semantic unit prediction:<\/strong> commits to output segments before a full sentence ends, enabling continuous streaming without waiting for complete utterances<\/li>\n<li><strong>Dynamic keyword configuration:<\/strong> inject domain-specific glossaries at runtime for technical, medical, or legal terminology<\/li>\n<\/ul>\n<p><strong>Before you start: Prerequisites<\/strong><\/p>\n<p>You need an Alibaba Cloud account with Model Studio access and a valid DashScope API key. The model is available through the qwen3-livetranslate-flash-realtime model ID.<\/p>\n<ol>\n<li><strong>Create an Alibaba Cloud account.<\/strong> Sign up at alibabacloud.com and activate Alibaba Cloud Model Studio in your account dashboard.<\/li>\n<li><strong>Get your DashScope API key.<\/strong> Navigate to Model Studio \u2192 API Keys. Generate a key and store it as the environment variable DASHSCOPE_API_KEY. Never hardcode it in source files.<\/li>\n<li><strong>Install the Python dependency.<\/strong> Install the websocket-client package for the WebSocket connection. For audio capture, also install pyaudio.<\/li>\n<li><strong>Check your audio setup.<\/strong> The model accepts 16 kHz, 16-bit PCM mono audio on input. Confirm your microphone or audio source can output in this format before connecting.<\/li>\n<\/ol>\n<pre><code># Install dependencies\npip install websocket-client pyaudio\n\n# Set your API key as an environment variable\nexport DASHSCOPE_API_KEY='your_key_here'<\/code><\/pre>\n<p><strong>Step 3: Establish the WebSocket connection<\/strong><\/p>\n<p>The model uses the WebSocket protocol for a persistent, bidirectional connection. You authenticate via a Bearer token in the connection header using your DashScope API key. 
<\/p>\n<pre><code>import json\nimport os\n\nimport websocket\n\n# Read the key from the environment; raises KeyError if it is not set\nAPI_KEY = os.environ['DASHSCOPE_API_KEY']\nAPI_URL = (\n    'wss:\/\/dashscope-intl.aliyuncs.com'\n    '\/api-ws\/v1\/realtime'\n    '?model=qwen3-livetranslate-flash-realtime'\n)\n\ndef on_open(ws):\n    print('Connected to Qwen3.5-LiveTranslate-Flash')\n\ndef on_message(ws, message):\n    # Parse the incoming message as JSON and print it\n    data = json.loads(message)\n    print('Translation event:', data)\n\ndef on_error(ws, error):\n    print('Error:', error)\n\nws = websocket.WebSocketApp(\n    API_URL,\n    header=['Authorization: Bearer ' + API_KEY],\n    on_open=on_open,\n    on_message=on_message,\n    on_error=on_error,\n)\n# Blocks and keeps the connection open until it is closed\nws.run_forever()<\/code><\/pre>\n<p>\u24d8 The connection stays open for the full session. You do not reconnect per<\/p>","protected":false,"author":2,"featured_media":91691,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-91690","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/th\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/\" \/>\n<meta property=\"og:locale\" content=\"th_TH\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/th\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" 
\/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-20T16:48:51+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 \u0e19\u0e32\u0e17\u0e35\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second 
Latency\",\"datePublished\":\"2026-05-20T16:48:51+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/\"},\"wordCount\":1483,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/\",\"url\":\"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/\",\"name\":\"Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY.png\",\"datePublished\":\"2026-05-20T16:48:51+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#breadcrumb\"},\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY.png\",\"width\":1906,\"height\":1232},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"pos
ition\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"th\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin 
NU\"},\"url\":\"https:\/\/youzum.net\/th\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/th\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/","og_locale":"th_TH","og_type":"article","og_title":"Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/th\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-05-20T16:48:51+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin NU","Est. 
reading time":"9 \u0e19\u0e32\u0e17\u0e35"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency","datePublished":"2026-05-20T16:48:51+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/"},"wordCount":1483,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"th","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/","url":"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpr
etation-across-60-languages-at-2-8-second-latency\/","name":"Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY.png","datePublished":"2026-05-20T16:48:51+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#breadcrumb"},"inLanguage":"th","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/"]}]},{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY.png","width":1906,"height":1232},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-inter
pretation-across-60-languages-at-2-8-second-latency\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"th"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/th\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY.png",1906,1232,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY.png",1906,1232,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY.png",1906,1232,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY-300x194.png",300,194,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY-1024x662.png",1024,662,true],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY-1536x993.png",1536,993,true],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY.png",1906,1232,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY-18x12.png",18,12,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY-600x388.png",600,388,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-20-at-1.09.04-AM-1-p1HQyY-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/th\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/th\/category\/ai-club\/\" rel=\"category 
tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Simultaneous interpretation is one of the harder problems in applied AI. You\u2019re asking a model to translate speech before the speaker has finished a sentence. Every extra second of delay breaks the illusion of real-time communication. Alibaba\u2019s Qwen team has been chipping away at this with each release. Their latest model, Qwen3.5-LiveTranslate-Flash, brings that latency&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/91690","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/comments?post=91690"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/91690\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media\/91691"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media?parent=91690"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/categories?post=91690"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/tags?post=91690"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}