{"id":41934,"date":"2025-10-03T06:48:56","date_gmt":"2025-10-03T06:48:56","guid":{"rendered":"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/"},"modified":"2025-10-03T06:48:56","modified_gmt":"2025-10-03T06:48:56","slug":"neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning","status":"publish","type":"post","link":"https:\/\/youzum.net\/zh\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/","title":{"rendered":"Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning"},"content":{"rendered":"<p>Neuphonic has released <strong>NeuTTS Air<\/strong>, an open-source text-to-speech (TTS) <strong>speech language model<\/strong> designed to run locally in real time on CPUs. The <strong><a href=\"https:\/\/huggingface.co\/neuphonic\/neutts-air\" target=\"_blank\" rel=\"noreferrer noopener\">Hugging Face model card<\/a><\/strong> lists <strong>748M parameters<\/strong> (Qwen2 architecture) and ships in GGUF quantizations (Q4\/Q8), enabling inference through <code>llama.cpp<\/code>\/<code>llama-cpp-python<\/code> without cloud dependencies. It is licensed under <strong>Apache-2.0<\/strong> and includes a runnable <strong><a href=\"https:\/\/huggingface.co\/spaces\/neuphonic\/neutts-air\" target=\"_blank\" rel=\"noreferrer noopener\">demo<\/a><\/strong> and examples.<\/p>\n<h3 class=\"wp-block-heading\"><strong>So, what is new?<\/strong><\/h3>\n<p>NeuTTS Air couples a <strong>0.5B-class Qwen backbone<\/strong> with Neuphonic\u2019s <strong>NeuCodec<\/strong> audio codec. Neuphonic positions the system as a \u201csuper-realistic, on-device\u201d TTS LM that clones a voice from <strong>~3 seconds of reference audio<\/strong> and synthesizes speech in that style, targeting voice agents and privacy-sensitive applications. 
The model card and repository explicitly emphasize <strong>real-time CPU<\/strong> generation and small-footprint deployment. <\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Features<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Realism at sub-1B scale:<\/strong> Human-like prosody and timbre preservation for a ~0.7B (Qwen2-class) text-to-speech LM. <\/li>\n<li><strong>On-device deployment:<\/strong> Distributed in <strong>GGUF<\/strong> (Q4\/Q8) with CPU-first paths; suitable for laptops, phones, and Raspberry Pi-class boards. <\/li>\n<li><strong>Instant speaker cloning:<\/strong> Style transfer from ~<strong>3 seconds<\/strong> of reference audio (reference WAV + transcript). <\/li>\n<li><strong>Compact LM+codec stack:<\/strong> <strong>Qwen 0.5B<\/strong> backbone paired with <strong>NeuCodec (0.8 kbps \/ 24 kHz)<\/strong> to balance latency, footprint, and output quality. <\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Model architecture and runtime path<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Backbone:<\/strong> <em>Qwen 0.5B<\/em> is used as a lightweight LM to condition speech generation; the hosted artifact is reported as <strong>748M parameters<\/strong> under the <strong>qwen2<\/strong> architecture on Hugging Face.<\/li>\n<li><strong>Codec:<\/strong> <em>NeuCodec<\/em> provides low-bitrate acoustic tokenization\/decoding; it targets <strong>0.8 kbps<\/strong> with <strong>24 kHz<\/strong> output, enabling compact representations for efficient on-device use.<\/li>\n<li><strong>Quantization &amp; format:<\/strong> Prebuilt <strong>GGUF<\/strong> backbones (Q4\/Q8) are available; the repo includes instructions for <strong><code>llama-cpp-python<\/code><\/strong> and an optional <strong>ONNX<\/strong> decoder path.<\/li>\n<li><strong>Dependencies:<\/strong> Uses <code>espeak<\/code> for phonemization; examples and a Jupyter notebook are provided for end-to-end synthesis. 
<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>On-device performance focus<\/strong><\/h3>\n<p><strong>NeuTTS Air<\/strong> showcases <strong>\u2018real-time generation on mid-range devices\u2019<\/strong> and offers <strong>CPU-first<\/strong> defaults; GGUF quantization is intended for laptops and single-board computers. While no real-time factor (RTF) or throughput numbers are published on the card, the distribution targets <strong>local inference without a GPU<\/strong> and demonstrates a working flow through the provided examples and Space.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Voice cloning workflow<\/strong><\/h3>\n<p>NeuTTS Air requires (1) a <strong>reference WAV<\/strong> and (2) the <strong>transcript text<\/strong> for that reference. It encodes the reference into style tokens and then synthesizes arbitrary text <strong>in the reference speaker\u2019s timbre<\/strong>. The Neuphonic team recommends <strong>3\u201315 s<\/strong> of clean, mono audio and provides pre-encoded samples.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Privacy, responsibility, and watermarking<\/strong><\/h3>\n<p>Neuphonic frames the model for <strong>on-device privacy<\/strong> (no audio\/text leaves the machine without the user\u2019s approval) and notes that all generated audio is watermarked with the <strong>Perth (Perceptual Threshold) watermarker<\/strong> to support responsible use and provenance. <\/p>\n<h3 class=\"wp-block-heading\"><strong>How does it compare?<\/strong><\/h3>\n<p>Open, local TTS systems exist (e.g., GGUF-based pipelines), but NeuTTS Air is notable for packaging a <strong>small LM + neural codec<\/strong> with <strong>instant cloning<\/strong>, <strong>CPU-first quantizations<\/strong>, and <strong>watermarking<\/strong> under a permissive license. 
The \u201cworld\u2019s first super-realistic, on-device speech LM\u201d phrasing is the vendor\u2019s claim; the verifiable facts are the <strong>size, formats, cloning procedure, license, and provided runtimes<\/strong>.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Our Comments<\/strong><\/h3>\n<p>The focus is on system trade-offs: a ~0.7B Qwen-class backbone with GGUF quantization paired with NeuCodec at 0.8 kbps\/24 kHz is a pragmatic recipe for real-time, CPU-only TTS that preserves timbre using ~3\u201315 s style references while keeping latency and memory predictable. The Apache-2.0 licensing and built-in watermarking are deployment-friendly, but publishing RTF\/latency on commodity CPUs and cloning-quality vs. reference-length curves would enable rigorous benchmarking against existing local pipelines. Operationally, an offline path with minimal dependencies (eSpeak, llama.cpp\/ONNX) lowers privacy\/compliance risk for edge agents without sacrificing intelligibility.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/huggingface.co\/neuphonic\/neutts-air\" target=\"_blank\" rel=\"noreferrer noopener\">Model Card on Hugging Face<\/a> and <a href=\"https:\/\/github.com\/neuphonic\/neutts-air\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page<\/a><\/strong>. 
<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/10\/02\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/\">Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Neuphonic has released NeuTTS Air, an open-source text-to-speech (TTS) speech language model designed to run locally in real time on CPUs. The Hugging Face model card lists 748M parameters (Qwen2 architecture) and ships in GGUF quantizations (Q4\/Q8), enabling inference through llama.cpp\/llama-cpp-python without cloud dependencies. It is licensed under Apache-2.0 and includes a runnable demo and examples. So, what is new? NeuTTS Air couples a 0.5B-class Qwen backbone with Neuphonic\u2019s NeuCodec audio codec. 
Neuphonic positions the system as a \u201csuper-realistic, on-device\u201d TTS LM that clones a voice from ~3 seconds of reference audio and synthesizes speech in that style, targeting voice agents and privacy-sensitive applications. The model card and repository explicitly emphasize real-time CPU generation and small-footprint deployment. Key Features Realism at sub-1B scale: Human-like prosody and timbre preservation for a ~0.7B (Qwen2-class) text-to-speech LM. On-device deployment: Distributed in GGUF (Q4\/Q8) with CPU-first paths; suitable for laptops, phones, and Raspberry Pi-class boards. Instant speaker cloning: Style transfer from ~3 seconds of reference audio (reference WAV + transcript). Compact LM+codec stack: Qwen 0.5B backbone paired with NeuCodec (0.8 kbps \/ 24 kHz) to balance latency, footprint, and output quality. Model architecture and runtime path Backbone: Qwen 0.5B is used as a lightweight LM to condition speech generation; the hosted artifact is reported as 748M parameters under the qwen2 architecture on Hugging Face. Codec: NeuCodec provides low-bitrate acoustic tokenization\/decoding; it targets 0.8 kbps with 24 kHz output, enabling compact representations for efficient on-device use. Quantization &amp; format: Prebuilt GGUF backbones (Q4\/Q8) are available; the repo includes instructions for llama-cpp-python and an optional ONNX decoder path. Dependencies: Uses espeak for phonemization; examples and a Jupyter notebook are provided for end-to-end synthesis. On-device performance focus NeuTTS Air showcases \u2018real-time generation on mid-range devices\u2019 and offers CPU-first defaults; GGUF quantization is intended for laptops and single-board computers. While no real-time factor (RTF) or throughput numbers are published on the card, the distribution targets local inference without a GPU and demonstrates a working flow through the provided examples and Space. 
Voice cloning workflow NeuTTS Air requires (1) a reference WAV and (2) the transcript text for that reference. It encodes the reference into style tokens and then synthesizes arbitrary text in the reference speaker\u2019s timbre. The Neuphonic team recommends 3\u201315 s of clean, mono audio and provides pre-encoded samples. Privacy, responsibility, and watermarking Neuphonic frames the model for on-device privacy (no audio\/text leaves the machine without the user\u2019s approval) and notes that all generated audio is watermarked with the Perth (Perceptual Threshold) watermarker to support responsible use and provenance. How does it compare? Open, local TTS systems exist (e.g., GGUF-based pipelines), but NeuTTS Air is notable for packaging a small LM + neural codec with instant cloning, CPU-first quantizations, and watermarking under a permissive license. The \u201cworld\u2019s first super-realistic, on-device speech LM\u201d phrasing is the vendor\u2019s claim; the verifiable facts are the size, formats, cloning procedure, license, and provided runtimes. Our Comments The focus is on system trade-offs: a ~0.7B Qwen-class backbone with GGUF quantization paired with NeuCodec at 0.8 kbps\/24 kHz is a pragmatic recipe for real-time, CPU-only TTS that preserves timbre using ~3\u201315 s style references while keeping latency and memory predictable. The Apache-2.0 licensing and built-in watermarking are deployment-friendly, but publishing RTF\/latency on commodity CPUs and cloning-quality vs. reference-length curves would enable rigorous benchmarking against existing local pipelines. Operationally, an offline path with minimal dependencies (eSpeak, llama.cpp\/ONNX) lowers privacy\/compliance risk for edge agents without sacrificing intelligibility. Check out the\u00a0Model Card on Hugging Face and GitHub Page. 
The post Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-41934","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device 
Speech Language Model with Instant Voice Cloning - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/zh\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/\" \/>\n<meta property=\"og:locale\" content=\"zh_CN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/zh\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-10-03T06:48:56+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u4f5c\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 \u5206\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning\",\"datePublished\":\"2025-10-03T06:48:56+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/\"},\"wordCount\":661,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/\",\"url\":\"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/\",\"name\":\"Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"datePublished\":\"2025-10-03T06:48:56+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"zh-Hans\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association 
Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/zh\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/zh\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/","og_locale":"zh_CN","og_type":"article","og_title":"Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/zh\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-10-03T06:48:56+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"\u4f5c\u8005":"admin NU","\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4":"3 \u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice 
Cloning","datePublished":"2025-10-03T06:48:56+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/"},"wordCount":661,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"zh-Hans","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/","url":"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/","name":"Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"datePublished":"2025-10-03T06:48:56+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/neuphonic-open-sources-neutts-air-a-748m-parameter-on-device-speech-language-model-with-instant-voice-cloning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Neuphonic 
Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"zh-Hans"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/zh\/members\/adminnu\/"}]}},"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/zh\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/zh\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a 
href=\"https:\/\/youzum.net\/zh\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/zh\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/zh\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Neuphonic has released NeuTTS Air, an open-source text-to-speech (TTS) speech language model designed to run locally in real time on CPUs. The Hugging Face model card lists 748M parameters (Qwen2 architecture) and ships in GGUF quantizations (Q4\/Q8), enabling inference through llama.cpp\/llama-cpp-python without cloud dependencies. It is licensed under Apache-2.0 and includes a runnable demo and&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts\/41934","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/comments?post=41934"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts\/41934\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/media?parent=41934"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/categories?post=41934"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/tags?post=41934"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}