{"id":71315,"date":"2026-02-15T11:48:17","date_gmt":"2026-02-15T11:48:17","guid":{"rendered":"https:\/\/youzum.net\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/"},"modified":"2026-02-15T11:48:17","modified_gmt":"2026-02-15T11:48:17","slug":"meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support","status":"publish","type":"post","link":"https:\/\/youzum.net\/th\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/","title":{"rendered":"Meet \u2018Kani-TTS-2\u2019: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support"},"content":{"rendered":"<p>The landscape of generative audio is shifting toward efficiency. A new open-source contender, <strong>Kani-TTS-2<\/strong>, has been released by the team at <strong>nineninesix.ai<\/strong>. The model departs from heavy, compute-intensive TTS systems: it treats audio as a language and delivers high-fidelity speech synthesis with a remarkably small footprint.<\/p>\n<p>Kani-TTS-2 offers a lean, high-performance alternative to closed-source APIs. It is currently available on Hugging Face in both <a href=\"https:\/\/huggingface.co\/nineninesix\/kani-tts-2-en\" target=\"_blank\" rel=\"noreferrer noopener\">English (<strong>EN<\/strong>)<\/a> and <a href=\"https:\/\/huggingface.co\/nineninesix\/kani-tts-2-pt\" target=\"_blank\" rel=\"noreferrer noopener\">Portuguese (<strong>PT<\/strong>)<\/a> versions.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The Architecture: LFM2 and NanoCodec<\/strong><\/h3>\n<p>Kani-TTS-2 follows the <strong>\u2018Audio-as-Language\u2019<\/strong> philosophy. The model does not use a traditional mel-spectrogram pipeline. 
Instead, it represents audio as discrete tokens produced by a neural codec.<\/p>\n<p><strong>The system relies on a two-stage process:<\/strong><\/p>\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>The Language Backbone:<\/strong> The model is built on <strong>LiquidAI\u2019s LFM2 (350M)<\/strong> architecture. This backbone generates \u2018audio intent\u2019 by predicting the next audio token in the sequence, much as a text language model predicts the next word. LiquidAI designs its LFM (Liquid Foundation Model) family for efficient inference, which makes it a lighter-weight alternative to a standard Transformer backbone.<\/li>\n<li><strong>The Neural Codec:<\/strong> <strong>NVIDIA NanoCodec<\/strong> decodes those tokens into 22 kHz waveforms.<\/li>\n<\/ol>\n<p>With this design, the model aims to capture human-like prosody (the rhythm and intonation of speech) while avoiding the \u2018robotic\u2019 artifacts common in older TTS systems.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Efficiency: 10,000 Hours in 6 Hours<\/strong><\/h3>\n<p>The training figures for Kani-TTS-2 underline this focus on efficiency. The English model was trained on <strong>10,000 hours<\/strong> of high-quality speech data.<\/p>\n<p>While that scale is substantial, the training speed is the more notable figure. The research team trained the model in only <strong>6 hours<\/strong> on a cluster of <strong>8 NVIDIA H100 GPUs<\/strong>, or about 48 GPU-hours in total. This suggests that a dataset of this size need not require weeks of compute when paired with an efficient backbone like LFM2.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Zero-Shot Voice Cloning and Performance<\/strong><\/h3>\n<p>The standout feature for developers is <strong>zero-shot voice cloning<\/strong>. 
Unlike traditional models that require fine-tuning for each new voice, Kani-TTS-2 conditions generation on <strong>speaker embeddings<\/strong>.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>How it works:<\/strong> You provide a short reference audio clip.<\/li>\n<li><strong>The result:<\/strong> The model extracts the characteristics of that voice and applies them to newly generated speech, with no additional training.<\/li>\n<\/ul>\n<p><strong>From a deployment perspective, the model is highly accessible:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Parameter Count:<\/strong> 400M (0.4B) parameters.<\/li>\n<li><strong>Speed:<\/strong> It has a <strong>Real-Time Factor (RTF) of 0.2<\/strong>. RTF is the ratio of processing time to the duration of the generated audio, so the model can produce 10 seconds of speech in roughly 2 seconds.<\/li>\n<li><strong>Hardware:<\/strong> It requires only <strong>3GB of VRAM<\/strong>, making it compatible with consumer-grade GPUs like the RTX 3060 or RTX 4050.<\/li>\n<li><strong>License:<\/strong> Released under the <strong>Apache 2.0<\/strong> license, which permits commercial use.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Efficient Architecture:<\/strong> The roughly <strong>400M-parameter<\/strong> model is built on a <strong>LiquidAI\u2019s LFM2 (350M)<\/strong> backbone. Its \u2018Audio-as-Language\u2019 approach treats speech as discrete tokens, aiming for faster processing and more human-like intonation than traditional architectures.<\/li>\n<li><strong>Rapid Training at Scale:<\/strong> Kani-TTS-2-EN was trained on <strong>10,000 hours<\/strong> of high-quality speech data in just <strong>6 hours<\/strong> using <strong>8 NVIDIA H100 GPUs<\/strong> (about 48 GPU-hours).<\/li>\n<li><strong>Instant Zero-Shot Cloning:<\/strong> No fine-tuning is needed to replicate a specific voice. 
Given a short reference audio clip, the model derives a <strong>speaker embedding<\/strong> and synthesizes new text in the target speaker\u2019s voice.<\/li>\n<li><strong>High Performance on Consumer Hardware:<\/strong> With a <strong>Real-Time Factor (RTF) of 0.2<\/strong>, the model can generate 10 seconds of audio in approximately 2 seconds. It requires only <strong>3GB of VRAM<\/strong>, so it runs on consumer-grade GPUs like the RTX 3060.<\/li>\n<li><strong>Developer-Friendly Licensing:<\/strong> Released under the <strong>Apache 2.0 license<\/strong>, Kani-TTS-2 can be integrated into commercial products. It offers a local-first, low-latency alternative to paid closed-source TTS APIs.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/huggingface.co\/nineninesix\/kani-tts-2-en\" target=\"_blank\" rel=\"noreferrer noopener\">Model Weights<\/a>.<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/02\/15\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/\">Meet \u2018Kani-TTS-2\u2019: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>The landscape of generative audio is shifting toward efficiency. A new open-source contender, Kani-TTS-2, has been released by the team at nineninesix.ai. This model marks a departure from heavy, compute-expensive TTS systems. Instead, it treats audio as a language, delivering high-fidelity speech synthesis with a remarkably small footprint. Kani-TTS-2 offers a lean, high-performance alternative to closed-source APIs. It is currently available on Hugging Face in both English (EN) and Portuguese (PT) versions. The Architecture: LFM2 and NanoCodec Kani-TTS-2 follows the \u2018Audio-as-Language\u2019 philosophy. The model does not use traditional mel-spectrogram pipelines. Instead, it converts raw audio into discrete tokens using a neural codec. The system relies on a two-stage process: The Language Backbone: The model is built on LiquidAI\u2019s LFM2 (350M) architecture. This backbone generates \u2018audio intent\u2019 by predicting the next audio tokens. Because LFM (Liquid Foundation Models) are designed for efficiency, they provide a faster alternative to standard transformers. The Neural Codec: It uses the NVIDIA NanoCodec to turn those tokens into 22kHz waveforms. 
&hellip;<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-71315","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Meet \u2018Kani-TTS-2\u2019: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support - YouZum<\/title>\n<meta name=\"description\" content=\"Activities related to drones\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, 
max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/th\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/\" \/>\n<meta property=\"og:locale\" content=\"th_TH\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Meet \u2018Kani-TTS-2\u2019: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support - YouZum\" \/>\n<meta property=\"og:description\" content=\"Activities related to drones\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/th\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-02-15T11:48:17+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Meet \u2018Kani-TTS-2\u2019: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support\",\"datePublished\":\"2026-02-15T11:48:17+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/\"},\"wordCount\":618,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/\",\"url\":\"https:\/\/youzum.net\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/\",\"name\":\"Meet \u2018Kani-TTS-2\u2019: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"datePublished\":\"2026-02-15T11:48:17+00:00\",\"description\":\"Activities related to drones\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/#breadcrumb\"},\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Meet \u2018Kani-TTS-2\u2019: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"th\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association 
Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/th\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/th\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/th\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"The landscape of generative audio is shifting toward efficiency. A new open-source contender, Kani-TTS-2, has been released by the team at nineninesix.ai. This model marks a departure from heavy, compute-expensive TTS systems. Instead, it treats audio as a language, delivering high-fidelity speech synthesis with a remarkably small footprint. 
Kani-TTS-2 offers a lean, high-performance alternative to&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/71315","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/comments?post=71315"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/71315\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media?parent=71315"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/categories?post=71315"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/tags?post=71315"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}