{"id":58332,"date":"2025-12-18T09:47:35","date_gmt":"2025-12-18T09:47:35","guid":{"rendered":"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/"},"modified":"2025-12-18T09:47:35","modified_gmt":"2025-12-18T09:47:35","slug":"meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation","status":"publish","type":"post","link":"https:\/\/youzum.net\/it\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/","title":{"rendered":"Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation"},"content":{"rendered":"<p>Meta has released SAM Audio, a prompt-driven audio separation model that targets a common editing bottleneck: isolating one sound from a real-world mix without building a custom model per sound class. Meta released 3 main sizes: <code>sam-audio-small<\/code>, <code>sam-audio-base<\/code>, and <code>sam-audio-large<\/code>. The model is available to download and to try in the Segment Anything Playground.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Architecture<\/strong><\/h3>\n<p>SAM Audio uses a separate encoder for each conditioning signal: an audio encoder for the mixture, a text encoder for the natural-language description, a span encoder for time anchors, and a visual encoder that consumes a visual prompt derived from video plus an object mask. The encoded streams are concatenated into time-aligned features and processed by a diffusion transformer that applies self-attention over the time-aligned representation and cross-attention to the textual feature. A DACVAE decoder then reconstructs waveforms and emits 2 outputs, target audio and residual audio.
<\/p>\n<figure class=\"wp-block-video aligncenter\"><video controls src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/12\/AQNrthXFv1fzHk0RXSZU6HrRsfB71yq81ODhE3V0KwXVj8pjFFj2YBEfz6lZBmMFE7362vqLjGdhDg9aBdZlyvPPSlWsiM4nWe72RkI5MNtiIg-1.mp4\" preload=\"none\"><\/video><figcaption class=\"wp-element-caption\">https:\/\/ai.meta.com\/blog\/sam-audio\/<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\"><strong>What SAM Audio does, and what \u2018segment\u2019 means here<\/strong><\/h3>\n<p>SAM Audio takes an input recording that contains multiple overlapping sources, for example speech plus traffic plus music, and separates out a target source based on a prompt. In the public inference API, the model produces 2 outputs, <code>result.target<\/code> and <code>result.residual<\/code>. The research team describes <code>target<\/code> as the isolated sound, and <code>residual<\/code> as everything else.<\/p>\n<p>That target-plus-residual interface maps directly to editor operations. If you want to remove a dog bark across a podcast track, you can treat the bark as the target, then remove it by keeping only the residual. If you want to extract a guitar part from a concert clip, you keep the target waveform instead. Meta uses exactly these kinds of examples to explain what the model is meant to enable.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The 3 prompt types Meta is shipping<\/strong><\/h3>\n<p>Meta positions SAM Audio as a single unified model that supports 3 prompt types, and it says these prompts can be used alone or combined.<\/p>\n<ol class=\"wp-block-list\">\n<li>Text prompting: You describe the sound in natural language, for example \u201cdog barking\u201d or \u201csinging voice\u201d, and the model separates that sound from the mixture. Meta lists text prompts as one of the core interaction modes, and the open-source repo includes an end-to-end example using <code>SAMAudioProcessor<\/code> and <code>model.separate<\/code>.
<\/li>\n<li>Visual prompting: You click the person or object in a video and ask the model to isolate the audio associated with that visual object. Meta's team describes visual prompting as selecting the sounding object in the video. In the released code path, visual prompting is implemented by passing video frames plus masks into the processor via <code>masked_videos<\/code>.<\/li>\n<li>Span prompting: Meta's team calls span prompting an industry first. You mark time segments where the target sound occurs, then the model uses those spans to guide separation. This matters for ambiguous cases, for example when the same instrument appears in multiple passages, or when a sound is present only briefly and you want to prevent the model from over-separating.<\/li>\n<\/ol>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"2192\" height=\"1266\" data-attachment-id=\"76946\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/12\/17\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/screenshot-2025-12-17-at-9-17-56-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/12\/Screenshot-2025-12-17-at-9.17.56-AM-1.png\" data-orig-size=\"2192,1266\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-12-17 at 9.17.56\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/12\/Screenshot-2025-12-17-at-9.17.56-AM-1-300x173.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/12\/Screenshot-2025-12-17-at-9.17.56-AM-1-1024x591.png\" 
src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/12\/Screenshot-2025-12-17-at-9.17.56-AM-1.png\" alt=\"\" class=\"wp-image-76946\" \/><figcaption class=\"wp-element-caption\">https:\/\/ai.meta.com\/blog\/sam-audio\/<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Results<\/strong><\/h3>\n<p>Meta's team positions SAM Audio as achieving cutting-edge performance across diverse, real-world scenarios, and frames it as a unified alternative to single-purpose audio tools. The team publishes a subjective evaluation table across categories (General, SFX, Speech, Speaker, Music, Instr(wild), Instr(pro)), with General scores of 3.62 for <code>sam-audio-small<\/code>, 3.28 for <code>sam-audio-base<\/code>, and 3.50 for <code>sam-audio-large<\/code>, and Instr(pro) scores reaching 4.49 for <code>sam-audio-large<\/code>.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ol class=\"wp-block-list\">\n<li><strong>SAM Audio is a unified audio separation model<\/strong>. It segments sound from complex mixtures using <strong>text prompts, visual prompts, and time-span prompts<\/strong>.<\/li>\n<li><strong>The core API produces two waveforms per request<\/strong>, <code>target<\/code> for the isolated sound and <code>residual<\/code> for everything else, which maps cleanly to common edit operations such as removing noise, extracting a stem, or keeping ambience.<\/li>\n<li><strong>Meta released multiple checkpoints and variants<\/strong>, including <code>sam-audio-small<\/code>, <code>sam-audio-base<\/code>, <code>sam-audio-large<\/code>, plus <code>tv<\/code> variants that the repo says perform better for visual prompting. The repo also publishes a subjective evaluation table by category.
<\/li>\n<li><strong>The release includes tooling beyond inference<\/strong>, Meta provides a <code>sam-audio-judge<\/code> model that scores separation results against a text description with overall quality, recall, precision, and faithfulness.<\/li>\n<\/ol>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/ai.meta.com\/blog\/sam-audio\/\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a> <\/strong>and<strong> <a href=\"https:\/\/github.com\/facebookresearch\/sam-audio\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page<\/a><\/strong>.<\/p>\n<p>The post
<a href=\"https:\/\/www.marktechpost.com\/2025\/12\/17\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/\">Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Meta has released SAM Audio, a prompt driven audio separation model that targets a common editing bottleneck, isolating one sound from a real world mix without building a custom model per sound class. Meta released 3 main sizes, sam-audio-small, sam-audio-base, and sam-audio-large. The model is available to download and to try in the Segment Anything Playground. Architecture SAM Audio uses separate encoders for each conditioning signal, an audio encoder for the mixture, a text encoder for the natural language description, a span encoder for time anchors, and a visual encoder that consumes a visual prompt derived from video plus an object mask. The encoded streams are concatenated into time aligned features, then processed by a diffusion transformer that applies self attention over the time aligned representation and cross attention to the textual feature, then a DACVAE decoder reconstructs waveforms and emits 2 outputs, target audio and residual audio. https:\/\/ai.meta.com\/blog\/sam-audio\/ What SAM Audio does, and what \u2018segment\u2019 means here? SAM Audio takes an input recording that contains multiple overlapping sources, for example speech plus traffic plus music, and separates out a target source based on a prompt. In the public inference API, the model produces 2 outputs, result.target and result.residual.
The research team describes target as the isolated sound, and residual as everything else. That target plus residual interface maps directly to editor operations. If you want to remove a dog bark across a podcast track, you can treat the bark as the target, then subtract it by keeping only residual. If you want to extract a guitar part from a concert clip, you keep the target waveform instead. Meta uses these exact kinds of examples to explain what the model is meant to enable. The 3 prompt types Meta is shipping Meta positions SAM Audio as a single unified model that supports 3 prompt types, and it says these prompts can be used alone or combined. Text prompting: You describe the sound in natural language, for example \u201cdog barking\u201d or \u201csinging voice\u201d, and the model separates that sound from the mixture. Meta lists text prompts as one of the core interaction modes, and the open source repo includes an end to end example using SAMAudioProcessor and model.separate. Visual prompting: You click the person or object in a video and ask the model to isolate the audio associated with that visual object. Meta team describes visual prompting as selecting the sounding object in the video. In the released code path, visual prompting is implemented by passing video frames plus masks into the processor via masked_videos. Span prompting: Meta team calls span prompting an industry first. You mark time segments where the target sound occurs, then the model uses those spans to guide separation. This matters for ambiguous cases, for example when the same instrument appears in multiple passages, or when a sound is present only briefly and you want to prevent the model from over separating. https:\/\/ai.meta.com\/blog\/sam-audio\/ Results Meta team positions SAM Audio as achieving cutting edge performance across diverse, real world scenarios, and frames it as a unified alternative to single purpose audio tools. 
The team publishes a subjective evaluation table across categories, General, SFX, Speech, Speaker, Music, Instr(wild), Instr(pro), with General scores of 3.62 for sam audio small, 3.28 for sam audio base, and 3.50 for sam audio large, and Instr(pro) scores reaching 4.49 for sam audio large. Key Takeaways SAM Audio is a unified audio separation model, it segments sound from complex mixtures using text prompts, visual prompts, and time span prompts. The core API produces two waveforms per request, target for the isolated sound and residual for everything else, which maps cleanly to common edit operations like remove noise, extract stem, or keep ambience. Meta released multiple checkpoints and variants, including sam-audio-small, sam-audio-base, sam-audio-large, plus tv variants that the repo says perform better for visual prompting, the repo also publishes a subjective evaluation table by category. The release includes tooling beyond inference, Meta provides a sam-audio-judge model that scores separation results against a text description with overall quality, recall, precision, and faithfulness. Check out the\u00a0Technical details and GitHub Page.
The post Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":58333,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-58332","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, 
follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/it\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/\" \/>\n<meta property=\"og:locale\" content=\"it_IT\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/it\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-12-18T09:47:35+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Scritto da\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Tempo di lettura stimato\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minuti\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation\",\"datePublished\":\"2025-12-18T09:47:35+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/\"},\"wordCount\":762,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"it-IT\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/\",\"url\":\"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompt
s-for-audio-separation\/\",\"name\":\"Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD.png\",\"datePublished\":\"2025-12-18T09:47:35+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#breadcrumb\"},\"inLanguage\":\"it-IT\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"it-IT\",\"@id\":\"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD.png\",\"width\":2188,\"height\":1563},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#breadcrumb\",\"ite
mListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"it-IT\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"it-IT\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"it-IT\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin 
NU\"},\"url\":\"https:\/\/youzum.net\/it\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/it\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/","og_locale":"it_IT","og_type":"article","og_title":"Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/it\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-12-18T09:47:35+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Scritto da":"admin NU","Tempo di lettura stimato":"4 minuti"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/"},"author":{"name":"admin 
NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation","datePublished":"2025-12-18T09:47:35+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/"},"wordCount":762,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"it-IT","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/","url":"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/","name":"Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD.png","datePublished":"2025-12-18T09:47:35+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#breadcrumb"},"inLanguage":"it-IT","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/"]}]},{"@type":"ImageObject","inLanguage":"it-IT","@id":"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD.png","width":2188,"height":1563},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for 
Audio Separation"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"it-IT"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"it-IT","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"it-IT","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/it\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD.png",2188,1563,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD.png",2188,1563,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD.png",2188,1563,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD-300x214.png",300,214,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD-1024x731.png",1024,731,true],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD-1536x1097.png",1536,1097,true],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD-2048x1463.png",2048,1463,true],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD-18x12.png",18,12,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD-600x429.png",600,429,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/12\/blog-banner23-1-3-4aBfMD-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/it\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/it\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/it\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/it\/category\/news\/\" rel=\"category tag\">News<\/a> <a 
href=\"https:\/\/youzum.net\/it\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Meta has released SAM Audio, a prompt driven audio separation model that targets a common editing bottleneck, isolating one sound from a real world mix without building a custom model per sound class. Meta released 3 main sizes, sam-audio-small, sam-audio-base, and sam-audio-large. The model is available to download and to try in the Segment Anything&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/posts\/58332","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/comments?post=58332"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/posts\/58332\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/media\/58333"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/media?parent=58332"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/categories?post=58332"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/tags?post=58332"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}