{"id":90575,"date":"2026-05-15T16:34:01","date_gmt":"2026-05-15T16:34:01","guid":{"rendered":"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/"},"modified":"2026-05-15T16:34:01","modified_gmt":"2026-05-15T16:34:01","slug":"supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags","status":"publish","type":"post","link":"https:\/\/youzum.net\/es\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/","title":{"rendered":"Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags"},"content":{"rendered":"<p>Supertone has released Supertonic 3, the third generation of its on-device, ONNX-based text-to-speech system. Supertonic 3 ships with 31-language support, improved reading accuracy, fewer repeat and skip failures, and v2-compatible public ONNX assets. It targets fast, accurate, multilingual speech synthesis that runs entirely on-device.<\/p>\n<h2 class=\"wp-block-heading\"><strong>What Changed from v2 to v3<\/strong><\/h2>\n<p>Compared with Supertonic 2, Supertonic 3 reduces repeat and skip failures, improves speaker similarity across the shared-language set, and expands language coverage from 5 to 31 languages. Version 2 supported English, Korean, Spanish, Portuguese, and French. Version 3 adds Japanese, Arabic, Bulgarian, Czech, Danish, German, Greek, Estonian, Finnish, Hindi, Croatian, Hungarian, Indonesian, Italian, Lithuanian, Latvian, Dutch, Polish, Romanian, Russian, Slovak, Slovenian, Swedish, Turkish, Ukrainian, and Vietnamese, bringing the total to 31 ISO language codes. There is also a special <code>na<\/code> fallback for text whose language is unknown or outside the supported set.<\/p>\n<p>The model grows modestly to accommodate the added languages. 
At about 99M parameters across the public ONNX assets (up from 66M in v2), Supertonic 3 is much smaller than open TTS systems in the 0.7B to 2B parameter class. The small size reduces download size and startup time and makes on-device inference practical. The update also brings the total disk footprint of the public ONNX assets to <strong>404 MB<\/strong>. Additionally, Supertone recently launched the <strong><a href=\"https:\/\/github.com\/supertone-inc\/supertonic\" target=\"_blank\" rel=\"noreferrer noopener\">Voice Builder<\/a><\/strong>, allowing developers to create custom, edge-native TTS models from their own voice recordings.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Expressive Tags<\/strong><\/h2>\n<p>Expression tag support is new in v3. Supertonic 3 recognizes simple tags such as <code>&lt;laugh&gt;<\/code>, <code>&lt;breath&gt;<\/code>, and <code>&lt;sigh&gt;<\/code>. These let you embed non-verbal vocal cues directly into input text without a separate preprocessing step or a separate model for expressiveness. For engineers building voice interfaces or accessibility tools, this means you can specify breathing pauses or laughter inline in your text payload.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Architecture and Runtime<\/strong><\/h2>\n<p>The underlying architecture carries over from prior versions: a speech autoencoder that encodes waveforms into continuous latent representations, a flow-matching-based text-to-latent module that maps text to audio features, and a duration predictor that controls natural timing. Flow matching is a generative modeling technique that learns a vector field that transports samples from a simple distribution to the target distribution. It generally needs fewer sampling steps than diffusion models, which is why Supertonic can produce usable output in just 2 inference steps. 
To further refine output, v3 integrates <strong>Length-Aware Rotary Position Embedding (LARoPE)<\/strong> for superior text-speech alignment and utilizes a <strong>Self-Purifying Flow Matching<\/strong> technique during training to remain robust against noisy data labels.<\/p>\n<p>On runtime efficiency, Supertonic 3 runs fast on CPU, even compared with larger baselines measured on A100 GPU, and uses substantially less memory. It does not require a GPU, which makes local, browser, and edge deployment much easier. <\/p>\n<h2 class=\"wp-block-heading\"><strong>Reading Accuracy<\/strong><\/h2>\n<p>Across measured languages, Supertonic 3 stays within a competitive WER\/CER range against much larger open TTS models such as VoxCPM2, while preserving a lightweight on-device deployment path. WER (Word Error Rate) and CER (Character Error Rate) are standard TTS readability metrics: you synthesize a passage, run ASR over the output, and compare the transcription to the original text. CER is used for languages without clear word boundaries; the others use WER. The system\u2019s efficiency is best demonstrated on extreme edge hardware; it achieves an average <strong>RTF of 0.3x<\/strong> on an <strong>Onyx Boox Go 6<\/strong> (an E-ink e-reader) in airplane mode. Furthermore, the ecosystem has expanded to include <strong>Flutter (with macOS support)<\/strong>, <strong>.NET 9<\/strong>, and <strong>Go<\/strong>, while the web implementation leverages <code>onnxruntime-web<\/code> for pure client-side execution.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Text Normalization<\/strong><\/h2>\n<p>A differentiating property carried forward from v2 is built-in text normalization. Supertonic handles complex surface forms \u2014 financial expressions like <code>$5.2M<\/code>, phone numbers with area codes and extensions like <code>(212) 555-0142 ext. 
402<\/code>, time and date formats like <code>4:45 PM on Wed, Apr 3, 2024<\/code>, and technical units like <code>2.3h<\/code> and <code>30kph<\/code> \u2014 without any preprocessing pipeline or phonetic annotations. The financial expression \u201c$5.2M\u201d must read as \u201cfive point two million dollars,\u201d and \u201c$450K\u201d as \u201cfour hundred fifty thousand dollars.\u201d All four competing systems failed this. The technical unit \u201c2.3h\u201d must read as \u201ctwo point three hours\u201d and \u201c30kph\u201d as \u201cthirty kilometers per hour.\u201d All four competitors also failed this category. The competing systems evaluated include ElevenLabs Flash v2.5, OpenAI TTS-1, Gemini 2.5 Flash TTS, and Microsoft.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1748\" height=\"686\" data-attachment-id=\"79885\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/05\/15\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/screenshot-2026-05-15-at-12-00-15-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1.png\" data-orig-size=\"1748,686\" data-comments-opened=\"0\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-05-15 at 12.00.15\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-1024x402.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1.png\" alt=\"\" class=\"wp-image-79885\" 
\/><figcaption class=\"wp-element-caption\">https:\/\/github.com\/supertone-inc\/supertonic<\/figcaption><\/figure>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>Getting Started<\/strong><\/h2>\n<p>Install the Python SDK with <code>pip install supertonic<\/code>. On first run, the SDK downloads the model assets from Hugging Face automatically. A minimal example:<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">from supertonic import TTS\n\n# Downloads the ONNX assets from Hugging Face on first run\ntts = TTS(auto_download=True)\n\n# Select a preset voice\nstyle = tts.get_voice_style(voice_name=\"M1\")\ntext = \"A gentle breeze moved through the open window while everyone listened to the story.\"\n\n# synthesize() returns the waveform and its duration in seconds\nwav, duration = tts.synthesize(text, voice_style=style, lang=\"en\")\ntts.save_audio(wav, \"output.wav\")\nprint(f\"Generated {duration:.2f}s of audio\")<\/code><\/pre>\n<\/div>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>Marktechpost\u2019s Visual Explainer<\/strong><\/h2>\n<div>\n<p>  <!-- TOP BAR --><\/p>\n<div class=\"st3-topbar\">\n<div class=\"st3-logo\">\n<div class=\"st3-logo-dots\">\n        <span><\/span><span><\/span><span><\/span><span><\/span>\n      <\/div>\n<p>      <span class=\"st3-logo-text\">Supertonic 3 \u2014 Developer Guide<\/span>\n    <\/p><\/div>\n<p>    <span class=\"st3-slide-label\">1 \/ 7<\/span>\n  <\/p><\/div>\n<p>  <!-- PROGRESS --><\/p>\n<div class=\"st3-progress-bar\">\n<div class=\"st3-progress-fill\"><\/div>\n<\/div>\n<p>  <!-- SLIDER 
--><\/p>\n<div class=\"st3-slider-wrap\">\n<div class=\"st3-track\">\n<p>      <!-- SLIDE 1: OVERVIEW --><\/p>\n<div class=\"st3-slide\">\n        <span class=\"st3-tag\">Overview<\/span>\n<h2 class=\"st3-h1\">Supertonic 3: On-Device TTS,<br \/>Now in 31 Languages<\/h2>\n<p class=\"st3-sub\">Supertonic 3 is a lightweight, open-weight text-to-speech system by Supertone Inc. It runs entirely via ONNX Runtime on your device \u2014 no cloud, no API call, no data leaving your machine. v3 expands from 5 to 31 languages, adds expressive tags, reduces reading failures, and stays compatible with the v2 ONNX interface.<\/p>\n<div class=\"st3-stats\">\n<div class=\"st3-stat\">\n            <span class=\"st3-stat-val blue\">31<\/span><br \/>\n            <span class=\"st3-stat-lbl\">Languages<\/span>\n          <\/div>\n<div class=\"st3-stat\">\n            <span class=\"st3-stat-val green\">~99M<\/span><br \/>\n            <span class=\"st3-stat-lbl\">Parameters<\/span>\n          <\/div>\n<div class=\"st3-stat\">\n            <span class=\"st3-stat-val red\">404 MB<\/span><br \/>\n            <span class=\"st3-stat-lbl\">ONNX Assets<\/span>\n          <\/div>\n<div class=\"st3-stat\">\n            <span class=\"st3-stat-val yellow\">MIT<\/span><br \/>\n            <span class=\"st3-stat-lbl\">Code License<\/span>\n          <\/div>\n<\/div>\n<\/div>\n<p>      <!-- SLIDE 2: WHAT'S NEW --><\/p>\n<div class=\"st3-slide\">\n        <span class=\"st3-tag green\">What\u2019s New in v3<\/span>\n<h2 class=\"st3-h2\">Four Core Improvements Over Supertonic 2<\/h2>\n<p class=\"st3-sub\">Version 3 is a focused upgrade \u2014 same inference contract, meaningfully better output.<\/p>\n<ul class=\"st3-newlist\">\n<li>\n            <span class=\"st3-icon blue\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f310.png\" alt=\"\ud83c\udf10\" class=\"wp-smiley\" \/><\/span><br \/>\n            <span><strong>31 languages<\/strong> \u2014 Expanded from the 
5-language v2 release (en, ko, es, pt, fr). v3 adds Japanese, Arabic, German, Hindi, Russian, Turkish, Vietnamese, and 19 more ISO codes, plus a special <code>na<\/code> fallback for unknown languages.<\/span>\n          <\/li>\n<li>\n            <span class=\"st3-icon green\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/><\/span><br \/>\n            <span><strong>More stable reading<\/strong> \u2014 Fewer repeat and skip failures, especially on very short and very long utterances. This was a known limitation in v2 that v3 directly addresses.<\/span>\n          <\/li>\n<li>\n            <span class=\"st3-icon red\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f3ad.png\" alt=\"\ud83c\udfad\" class=\"wp-smiley\" \/><\/span><br \/>\n            <span><strong>Expression tags<\/strong> \u2014 Supports <code>&lt;laugh&gt;<\/code>, <code>&lt;breath&gt;<\/code>, and <code>&lt;sigh&gt;<\/code> inline in text, without any separate preprocessing or external model.<\/span>\n          <\/li>\n<li>\n            <span class=\"st3-icon yellow\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f50a.png\" alt=\"\ud83d\udd0a\" class=\"wp-smiley\" \/><\/span><br \/>\n            <span><strong>Higher speaker similarity<\/strong> \u2014 Improved similarity across the shared-language set compared with Supertonic 2. Voices are more consistent across languages.<\/span>\n          <\/li>\n<\/ul><\/div>\n<p>      <!-- SLIDE 3: INSTALLATION --><\/p>\n<div class=\"st3-slide\">\n        <span class=\"st3-tag red\">Installation<\/span>\n<h2 class=\"st3-h2\">Get Running in Under a Minute<\/h2>\n<p class=\"st3-sub\">Install the Python SDK via pip. 
On first run, model assets are downloaded automatically from Hugging Face \u2014 no manual setup required.<\/p>\n<div class=\"st3-install\">\n          <code>pip install supertonic<\/code><br \/>\n          <button class=\"st3-install-copy\"><\/button>\n<\/div>\n<div class=\"st3-step\">\n<div class=\"st3-step-left\">\n<div class=\"st3-step-num\">1<\/div>\n<div class=\"st3-step-line\"><\/div>\n<\/div>\n<div class=\"st3-step-right\">\n<div class=\"st3-step-title\">Install the SDK<\/div>\n<div class=\"st3-step-desc\">Run <code>pip install supertonic<\/code> in your Python environment (Python 3.8+).<\/div>\n<\/div>\n<\/div>\n<div class=\"st3-step\">\n<div class=\"st3-step-left\">\n<div class=\"st3-step-num\">2<\/div>\n<div class=\"st3-step-line\"><\/div>\n<\/div>\n<div class=\"st3-step-right\">\n<div class=\"st3-step-title\">First Run \u2014 Auto Download<\/div>\n<div class=\"st3-step-desc\">On first use, <code>TTS(auto_download=True)<\/code> fetches the ONNX model assets (~404 MB) from <code>Supertone\/supertonic-3<\/code> on Hugging Face. Requires Git LFS.<\/div>\n<\/div>\n<\/div>\n<div class=\"st3-step\">\n<div class=\"st3-step-left\">\n<div class=\"st3-step-num\">3<\/div>\n<div class=\"st3-step-line\"><\/div>\n<\/div>\n<div class=\"st3-step-right\">\n<div class=\"st3-step-title\">All Inference Runs On-Device<\/div>\n<div class=\"st3-step-desc\">After the initial download, no internet connection is needed. All synthesis happens locally via ONNX Runtime.<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>      <!-- SLIDE 4: QUICK START --><\/p>\n<div class=\"st3-slide\">\n        <span class=\"st3-tag\">Quick Start<\/span>\n<h2 class=\"st3-h2\">Basic Python Usage<\/h2>\n<p class=\"st3-sub\">The SDK auto-downloads model assets on first run. 
Specify a voice, pass your text with a language code, and save the WAV output.<\/p>\n<div class=\"st3-code-wrap\">\n<div class=\"st3-code-header\">\n            <span class=\"st3-code-lang\">Python<\/span><br \/>\n            <button class=\"st3-copy-btn\">Copy<\/button>\n          <\/div>\n<pre><code><span class=\"kw\">from<\/span> supertonic <span class=\"kw\">import<\/span> TTS\n\n<span class=\"cm\"># Auto-downloads ONNX assets on first run<\/span>\ntts = <span class=\"fn\">TTS<\/span>(auto_download=<span class=\"kw\">True<\/span>)\n\n<span class=\"cm\"># Select a preset voice (M1\u2014M5 male, F1\u2014F5 female)<\/span>\nstyle = tts.<span class=\"fn\">get_voice_style<\/span>(voice_name=<span class=\"st\">\"M1\"<\/span>)\n\ntext = <span class=\"st\">\"A gentle breeze moved through the open window.\"<\/span>\n\n<span class=\"cm\"># synthesize() returns (wav_array, duration_in_seconds)<\/span>\nwav, duration = tts.<span class=\"fn\">synthesize<\/span>(text, voice_style=style, lang=<span class=\"st\">\"en\"<\/span>)\n\ntts.<span class=\"fn\">save_audio<\/span>(wav, <span class=\"st\">\"output.wav\"<\/span>)\n<span class=\"fn\">print<\/span>(<span class=\"fn\">f<\/span><span class=\"st\">\"Generated <\/span>{duration:.2f}<span class=\"st\">s of audio\"<\/span>)<\/code><\/pre>\n<\/div>\n<div class=\"st3-code-wrap\">\n<div class=\"st3-code-header\">\n            <span class=\"st3-code-lang\">Python \u2014 With Expression Tags<\/span><br \/>\n            <button class=\"st3-copy-btn\">Copy<\/button>\n          <\/div>\n<pre><code>text = <span class=\"st\">\"I can't believe it &lt;laugh&gt; that actually worked!\"<\/span>\nwav, duration = tts.<span class=\"fn\">synthesize<\/span>(text, voice_style=style, lang=<span class=\"st\">\"en\"<\/span>)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>      <!-- SLIDE 5: LANGUAGES --><\/p>\n<div class=\"st3-slide\">\n        <span class=\"st3-tag yellow\">Languages<\/span>\n<h2 class=\"st3-h2\">31 Supported Languages + <code>na<\/code> 
Fallback<\/h2>\n<p class=\"st3-sub\">All 31 languages share the same model architecture and ONNX inference pipeline. Use the <code>na<\/code> code for text whose language is unknown or outside the supported set.<\/p>\n<div class=\"st3-lang-grid\">\n<div class=\"st3-lang-chip\"><span class=\"lcode\">en<\/span> English<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">ko<\/span> Korean<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">ja<\/span> Japanese<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">ar<\/span> Arabic<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">bg<\/span> Bulgarian<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">cs<\/span> Czech<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">da<\/span> Danish<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">de<\/span> German<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">el<\/span> Greek<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">es<\/span> Spanish<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">et<\/span> Estonian<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">fi<\/span> Finnish<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">fr<\/span> French<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">hi<\/span> Hindi<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">hr<\/span> Croatian<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">hu<\/span> Hungarian<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">id<\/span> Indonesian<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">it<\/span> Italian<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">lt<\/span> Lithuanian<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">lv<\/span> Latvian<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">nl<\/span> Dutch<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">pl<\/span> Polish<\/div>\n<div class=\"st3-lang-chip\"><span 
class=\"lcode\">pt<\/span> Portuguese<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">ro<\/span> Romanian<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">ru<\/span> Russian<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">sk<\/span> Slovak<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">sl<\/span> Slovenian<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">sv<\/span> Swedish<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">tr<\/span> Turkish<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">uk<\/span> Ukrainian<\/div>\n<div class=\"st3-lang-chip\"><span class=\"lcode\">vi<\/span> Vietnamese<\/div>\n<\/div>\n<\/div>\n<p>      <!-- SLIDE 6: TEXT NORMALIZATION --><\/p>\n<div class=\"st3-slide\">\n        <span class=\"st3-tag green\">Text Normalization<\/span>\n<h2 class=\"st3-h2\">Handles Complex Inputs Without Pre-Processing<\/h2>\n<p class=\"st3-sub\">Supertonic 3 reads financial expressions, dates, phone numbers, and technical units correctly out of the box \u2014 no G2P module or phonetic annotations required. Below: Supertonic vs. four major commercial\/open-source systems.<\/p>\n<table class=\"st3-norm-table\">\n<thead>\n<tr>\n<th>Category<\/th>\n<th>Input Example<\/th>\n<th>Supertonic 3<\/th>\n<th>ElevenLabs \/ OpenAI \/ Gemini \/ Microsoft<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Financial Expression<\/td>\n<td><span class=\"st3-input\">$5.2M \/ $450K<\/span><\/td>\n<td><span class=\"st3-check\">\u2713<\/span><\/td>\n<td><span class=\"st3-fail\">\u2717<\/span> All four failed<\/td>\n<\/tr>\n<tr>\n<td>Time &amp; Date<\/td>\n<td><span class=\"st3-input\">4:45 PM, Wed Apr 3<\/span><\/td>\n<td><span class=\"st3-check\">\u2713<\/span><\/td>\n<td><span class=\"st3-fail\">\u2717<\/span> All four failed<\/td>\n<\/tr>\n<tr>\n<td>Phone Number<\/td>\n<td><span class=\"st3-input\">(212) 555-0142 ext. 
402<\/span><\/td>\n<td><span class=\"st3-check\">\u2713<\/span><\/td>\n<td><span class=\"st3-fail\">\u2717<\/span> All four failed<\/td>\n<\/tr>\n<tr>\n<td>Technical Unit<\/td>\n<td><span class=\"st3-input\">2.3h at 30kph<\/span><\/td>\n<td><span class=\"st3-check\">\u2713<\/span><\/td>\n<td><span class=\"st3-fail\">\u2717<\/span> All four failed<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/div>\n<p>      <!-- SLIDE 7: DEPLOYMENT &amp; RESOURCES --><\/p>\n<div class=\"st3-slide\">\n        <span class=\"st3-tag\">Deployment &amp; Resources<\/span>\n<h2 class=\"st3-h2\">Runs Everywhere \u2014 11 Platforms, No GPU Required<\/h2>\n<p class=\"st3-sub\">The public ONNX assets run on CPU in fixed-voice mode with no GPU dependency. Browser support is via WebGPU and WASM through <code>onnxruntime-web<\/code>. Audio output is 16-bit WAV; batch inference is supported.<\/p>\n<div class=\"st3-platform-grid\">\n<div class=\"st3-platform-card\"><span class=\"picon\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f40d.png\" alt=\"\ud83d\udc0d\" class=\"wp-smiley\" \/><\/span><span class=\"pname\">Python<\/span><span class=\"psub\">ONNX Runtime<\/span><\/div>\n<div class=\"st3-platform-card\"><span class=\"picon\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f7e8.png\" alt=\"\ud83d\udfe8\" class=\"wp-smiley\" \/><\/span><span class=\"pname\">Node.js<\/span><span class=\"psub\">Server-side JS<\/span><\/div>\n<div class=\"st3-platform-card\"><span class=\"picon\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f310.png\" alt=\"\ud83c\udf10\" class=\"wp-smiley\" \/><\/span><span class=\"pname\">Browser<\/span><span class=\"psub\">WebGPU \/ WASM<\/span><\/div>\n<div class=\"st3-platform-card\"><span class=\"picon\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2615.png\" alt=\"\u2615\" class=\"wp-smiley\" \/><\/span><span 
class=\"pname\">Java<\/span><span class=\"psub\">JVM<\/span><\/div>\n<div class=\"st3-platform-card\"><span class=\"picon\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2699.png\" alt=\"\u2699\" class=\"wp-smiley\" \/><\/span><span class=\"pname\">C++<\/span><span class=\"psub\">High-perf<\/span><\/div>\n<div class=\"st3-platform-card\"><span class=\"picon\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f537.png\" alt=\"\ud83d\udd37\" class=\"wp-smiley\" \/><\/span><span class=\"pname\">C#<\/span><span class=\"psub\">.NET<\/span><\/div>\n<div class=\"st3-platform-card\"><span class=\"picon\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f535.png\" alt=\"\ud83d\udd35\" class=\"wp-smiley\" \/><\/span><span class=\"pname\">Go<\/span><span class=\"psub\">Go runtime<\/span><\/div>\n<div class=\"st3-platform-card\"><span class=\"picon\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f34e.png\" alt=\"\ud83c\udf4e\" class=\"wp-smiley\" \/><\/span><span class=\"pname\">Swift \/ iOS<\/span><span class=\"psub\">Native<\/span><\/div>\n<div class=\"st3-platform-card\"><span class=\"picon\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f980.png\" alt=\"\ud83e\udd80\" class=\"wp-smiley\" \/><\/span><span class=\"pname\">Rust<\/span><span class=\"psub\">Systems<\/span><\/div>\n<div class=\"st3-platform-card\"><span class=\"picon\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f499.png\" alt=\"\ud83d\udc99\" class=\"wp-smiley\" \/><\/span><span class=\"pname\">Flutter<\/span><span class=\"psub\">Cross-platform<\/span><\/div>\n<div class=\"st3-platform-card\"><span class=\"picon\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4c4.png\" alt=\"\ud83d\udcc4\" class=\"wp-smiley\" \/><\/span><span 
class=\"pname\">Code: MIT<\/span><span class=\"psub\">License<\/span><\/div>\n<div class=\"st3-platform-card\"><span class=\"picon\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f916.png\" alt=\"\ud83e\udd16\" class=\"wp-smiley\" \/><\/span><span class=\"pname\">Model: OpenRAIL-M<\/span><span class=\"psub\">License<\/span><\/div>\n<\/div>\n<div class=\"st3-links\">\n          <a class=\"st3-link-btn blue\" href=\"https:\/\/github.com\/supertone-inc\/supertonic\" target=\"_blank\">GitHub Repo<\/a><br \/>\n          <a class=\"st3-link-btn out\" href=\"https:\/\/huggingface.co\/Supertone\/supertonic-3\" target=\"_blank\">HF Model<\/a><br \/>\n          <a class=\"st3-link-btn out\" href=\"https:\/\/huggingface.co\/spaces\/Supertone\/supertonic-3\" target=\"_blank\">Live Demo<\/a><br \/>\n          <a class=\"st3-link-btn out\" href=\"https:\/\/pypi.org\/project\/supertonic\/\" target=\"_blank\">PyPI<\/a>\n        <\/div>\n<\/div>\n<\/div>\n<p><!-- \/st3-track -->\n  <\/p><\/div>\n<p><!-- \/slider-wrap --><\/p>\n<p>  <!-- BOTTOM NAV --><\/p>\n<div class=\"st3-nav\">\n    <button class=\"st3-arrow\" disabled>\u2190<\/button>\n<div class=\"st3-dots\"><\/div>\n<p>    <button class=\"st3-arrow\">\u2192<\/button>\n  <\/p><\/div>\n<\/div>\n<p><!-- \/#st3-guide --><\/p>\n<h2 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h2>\n<ul class=\"wp-block-list\">\n<li>Supertonic 3 expands language support from 5 (v2) to 31 languages, growing from 66M to ~99M parameters with a total ONNX asset size of 404 MB<\/li>\n<li>New in v3: expressive tags (<code>&lt;laugh&gt;<\/code>, <code>&lt;breath&gt;<\/code>, <code>&lt;sigh&gt;<\/code>), more stable reading on short and long utterances, and improved speaker similarity vs. 
v2<\/li>\n<li>v2-compatible public ONNX interface: existing integrations upgrade without changing inference code<\/li>\n<li>Reading accuracy benchmarked against VoxCPM2; v3 stays within a competitive WER\/CER range while being substantially smaller<\/li>\n<li>v3-specific RTF\/throughput numbers have not been published; the 167\u00d7 faster-than-real-time figure is a v2 benchmark and should not be assumed identical for v3<\/li>\n<li>Audio is written as <strong>16-bit WAV files<\/strong>, and batch inference is supported<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the <strong><a href=\"https:\/\/github.com\/supertone-inc\/supertonic\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Repo<\/a><\/strong> and <strong><a href=\"https:\/\/huggingface.co\/spaces\/Supertone\/supertonic-2\" target=\"_blank\" rel=\"noreferrer noopener\">Hugging Face Space<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/05\/15\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/\">Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Supertone has released Supertonic 3, the third generation of its on-device, ONNX-based text-to-speech system. Supertonic 3 ships with 31-language support, improved reading accuracy, fewer repeat and skip failures, and v2-compatible public ONNX assets. It targets fast, accurate, multilingual speech synthesis that runs entirely on-device. What Changed from v2 to v3 Compared with Supertonic 2, Supertonic 3 reduces repeat and skip failures, improves speaker similarity across the shared-language set, and expands language coverage from 5 to 31 languages. Version 2 supported English, Korean, Spanish, Portuguese, and French. Version 3 adds Japanese, Arabic, Bulgarian, Czech, Danish, German, Greek, Estonian, Finnish, Hindi, Croatian, Hungarian, Indonesian, Italian, Lithuanian, Latvian, Dutch, Polish, Romanian, Russian, Slovak, Slovenian, Swedish, Turkish, Ukrainian, and Vietnamese, bringing the total to 31 ISO language codes. There is also a special na fallback for text whose language is unknown or outside the supported set. The model grows modestly to accommodate the added languages. 
At about 99M parameters across the public ONNX assets, Supertonic 3 is much smaller than 0.7B to 2B class open TTS systems. The smaller model size is a practical advantage for download size, startup time, and on-device inference. The update also brings the total disk footprint of the public ONNX assets to 404 MB. Additionally, Supertone recently launched the Voice Builder, allowing developers to create custom, edge-native TTS models from their own voice recordings. Expressive Tags One new capability in v3 that wasn\u2019t present in v2 is expressive tag support. Supertonic 3 supports simple expression tags such as &lt;laugh&gt;, &lt;breath&gt;, and &lt;sigh&gt;. These let you embed prosodic cues directly into input text without a separate preprocessing step or a separate model for expressiveness. For engineers building voice interfaces or accessibility tools, this means you can specify breathing pauses or laughter inline in your text payload. Architecture and Runtime The underlying architecture carries over from prior versions: a speech autoencoder that encodes waveforms into continuous latent representations, a flow-matching based text-to-latent module that maps text to audio features, and a duration predictor that controls natural timing. Flow matching is a generative modeling technique that learns a vector field to transform a simple distribution into a target distribution \u2014 it samples faster than diffusion models at low step counts, which is why Supertonic can produce usable output in just 2 inference steps. To further refine output, v3 integrates Length-Aware Rotary Position Embedding (LARoPE) for superior text-speech alignment and utilizes a Self-Purifying Flow Matching technique during training to remain robust against noisy data labels. On runtime efficiency, Supertonic 3 runs fast on CPU, even compared with larger baselines measured on A100 GPU, and uses substantially less memory. 
It does not require a GPU, which makes local, browser, and edge deployment much easier. Reading Accuracy Across measured languages, Supertonic 3 stays within a competitive WER\/CER range against much larger open TTS models such as VoxCPM2, while preserving a lightweight on-device deployment path. WER (Word Error Rate) and CER (Character Error Rate) are standard TTS readability metrics: you synthesize a passage, run ASR over the output, and compare the transcription to the original text. CER is used for languages without clear word boundaries; the others use WER. The system\u2019s efficiency is best demonstrated on extreme edge hardware; it achieves an average RTF of 0.3x on an Onyx Boox Go 6 (an E-ink e-reader) in airplane mode. Furthermore, the ecosystem has expanded to include Flutter (with macOS support), .NET 9, and Go, while the web implementation leverages onnxruntime-web for pure client-side execution. Text Normalization A differentiating property carried forward from v2 is built-in text normalization. Supertonic handles complex surface forms \u2014 financial expressions like $5.2M, phone numbers with area codes and extensions like (212) 555-0142 ext. 402, time and date formats like 4:45 PM on Wed, Apr 3, 2024, and technical units like 2.3h and 30kph \u2014 without any preprocessing pipeline or phonetic annotations. The financial expression \u201c$5.2M\u201d must read as \u201cfive point two million dollars,\u201d and \u201c$450K\u201d as \u201cfour hundred fifty thousand dollars.\u201d All four competing systems failed this category. The technical unit \u201c2.3h\u201d must read as \u201ctwo point three hours\u201d and \u201c30kph\u201d as \u201cthirty kilometers per hour.\u201d All four competitors also failed this category. The competing systems evaluated include ElevenLabs Flash v2.5, OpenAI TTS-1, Gemini 2.5 Flash TTS, and Microsoft. Getting Started Install the Python SDK with pip install supertonic. 
On first run, the SDK downloads the model assets from Hugging Face automatically. A minimal example: <code>from supertonic import TTS; tts = TTS(auto_download=True); style = tts.get_voice_style(voice_name=&quot;M1&quot;); text = &quot;A gentle breeze moved through the open window while everyone listened to the story.&quot;; wav, duration = tts.synthesize(text, voice_style=style, lang=&quot;en&quot;); tts.save_audio(wav, &quot;output.wav&quot;); print(f&quot;Generated {duration:.2f}s of audio&quot;)<\/code> Marktechpost\u2019s Visual Explainer Supertonic 3 \u2014 Developer Guide Overview Supertonic 3: On-Device TTS, Now in 31 Languages Supertonic 3 is a lightweight, open-weight text-to-speech system by Supertone Inc. It runs entirely via ONNX Runtime on your device \u2014 no cloud, no API call, no data leaving your machine. v3 expands from 5 to 31 languages, adds expressive tags, reduces reading failures, and stays compatible with the v2 ONNX interface. 31 Languages, ~99M Parameters, 404 MB ONNX Assets, MIT Code License. What\u2019s New in v3 Four Core Improvements Over Supertonic 2 Version 3 is a focused upgrade \u2014 same inference contract, meaningfully better output. 31 languages \u2014 Expanded from the 5-language v2 release (en, ko, es, pt, fr). Now includes Japanese, Arabic, German, Hindi, Russian, Turkish, Vietnamese, and 20 more ISO codes, plus a special na fallback for unknown languages. More stable reading \u2014 Fewer repeat and skip failures, especially on short and long utterances. This was a known limitation in v2 that v3 directly addresses. Expression tags \u2014 Supports &lt;laugh&gt;, &lt;breath&gt;, and &lt;sigh&gt; inline in text, without any separate preprocessing or external model. Higher speaker similarity \u2014 Improved similarity across the shared-language set compared with Supertonic 2. Voices are more consistent across languages. Installation Get Running in Under a Minute Install the Python SDK via pip. 
On first run, model assets are downloaded automatically from Hugging Face<\/p>","protected":false},"author":2,"featured_media":90576,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-90575","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" 
content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/es\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/\" \/>\n<meta property=\"og:locale\" content=\"es_ES\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/es\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-15T16:34:01+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Escrito por\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Tiempo de lectura\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutos\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags\",\"datePublished\":\"2026-05-15T16:34:01+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/\"},\"wordCount\":1453,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"es\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/\",\"url\":\"https:\/\/youzum.net\/supertone-releases-super
tonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/\",\"name\":\"Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc.png\",\"datePublished\":\"2026-05-15T16:34:01+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#breadcrumb\"},\"inLanguage\":\"es\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc.png\",\"width\":1748,\"height\":686},{\"@type\":\"Br
eadcrumbList\",\"@id\":\"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"es\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin 
NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/es\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/es\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/","og_locale":"es_ES","og_type":"article","og_title":"Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/es\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-05-15T16:34:01+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Escrito por":"admin NU","Tiempo de lectura":"8 
minutos"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags","datePublished":"2026-05-15T16:34:01+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/"},"wordCount":1453,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"es","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/","url":"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/","n
ame":"Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc.png","datePublished":"2026-05-15T16:34:01+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#breadcrumb"},"inLanguage":"es","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/"]}]},{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc.png","width":1748,"height":686},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags\/#breadcrumb","itemL
istElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"es"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/es\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc.png",1748,686,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc.png",1748,686,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc.png",1748,686,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc-300x118.png",300,118,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc-1024x402.png",1024,402,true],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc-1536x603.png",1536,603,true],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc.png",1748,686,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc-18x7.png",18,7,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc-600x235.png",600,235,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-15-at-12.00.15-AM-1-TCf1Mc-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/es\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/es\/category\/ai-club\/\" 
rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Supertone released Supertonic 3, the third generation of its on-device, ONNX-based text-to-speech system. Supertonic 3 ships with 31-language support, improved reading accuracy, fewer repeat and skip failures, and v2-compatible public ONNX assets. It is Lightning Fast, On-Device, Multilingual and Accurate TTS. What Changed from v2 to v3 Compared with Supertonic 2, Supertonic 3 reduces repeat&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts\/90575","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/comments?post=90575"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts\/90575\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/media\/90576"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/media?parent=90575"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/categories?post=90575"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/tags?post=90575"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}