{"id":28060,"date":"2025-07-29T05:47:41","date_gmt":"2025-07-29T05:47:41","guid":{"rendered":"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/"},"modified":"2025-07-29T05:47:41","modified_gmt":"2025-07-29T05:47:41","slug":"amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons","status":"publish","type":"post","link":"https:\/\/youzum.net\/es\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/","title":{"rendered":"Amazon Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant Neurons"},"content":{"rendered":"<p>Amazon researchers developed a new AI architecture that cuts inference time by 30% by selecting only task-relevant neurons, similar to how the brain uses specialized regions for specific tasks. This breakthrough approach addresses one of the biggest challenges facing large AI models: the computational expense and latency associated with activating every neuron for every request, regardless of their relevance.<\/p>\n<p>The traditional deployment of large language models (LLMs) and foundational AI systems has relied on activating the full network for every input. While this guarantees versatility, it results in significant inefficiency\u2014much of the network\u2019s activity is superfluous for any given prompt. 
Inspired by the human brain\u2019s efficiency\u2014the brain flexibly recruits only the circuits it needs for a given cognitive task\u2014Amazon\u2019s architecture mimics this behavior by activating neurons most relevant to the current input context.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"587\" data-attachment-id=\"73042\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/07\/28\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/screenshot-2025-07-28-at-9-00-36-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1.png\" data-orig-size=\"1396,800\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-07-28 at 9.00.36\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-300x172.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587.png\" alt=\"\" class=\"wp-image-73042\" \/><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Dynamic, Context-Aware Pruning<\/strong><\/h3>\n<p>At the heart of this innovation is <strong>dynamic, context-aware pruning<\/strong>. Rather than trimming the model statically during training and locking in those changes, Amazon\u2019s solution prunes the network \u201con the fly,\u201d during inference itself. 
This enables the model to remain large and versatile, yet fast and efficient on any specific task.<\/p>\n<ul class=\"wp-block-list\">\n<li>Before processing an input, the model evaluates which neurons or modules will be most useful, based on signals such as the type of task (e.g., legal writing, translation, or coding assistance), language, and other context features.<\/li>\n<li>It uses a <strong>gate predictor<\/strong>, a lightweight neural component trained to generate a \u201cmask\u201d that determines which neurons are switched on for that particular sequence.<\/li>\n<li>The gating decisions are binary, so neurons are either fully active or completely skipped, ensuring real compute savings.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>How the System Works<\/strong><\/h3>\n<p>The architecture introduces a <em>context-aware gating mechanism<\/em>. This mechanism analyzes input features (and, for speech models, auxiliary information such as language and task tokens) to decide which modules (such as self-attention blocks, feed-forward networks, or specialized convolutions) are essential for the current step. For example, in a speech recognition task, it may activate local context modules for detailed sound analysis while skipping components that are only beneficial for other tasks.<\/p>\n<p>This pruning strategy is structured and modular: instead of removing individual weights (which can lead to hardware inefficiency), it skips entire modules or layers. This preserves the model\u2019s structural integrity and ensures compatibility with GPUs and other modern hardware accelerators.<\/p>\n<p>The gate predictor is trained with a sparsity loss to reach a target sparsity, defined as the proportion of modules skipped. 
Training uses techniques like the Gumbel-Softmax estimator, which keeps the gating behavior differentiable during optimization while still yielding crisp, binary gating decisions at inference.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Demonstrated Results: Speed Without Sacrificing Quality<\/strong><\/h3>\n<p><strong>Experiments show that dynamically skipping irrelevant modules can:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Reduce inference time by up to 34%<\/strong> for multilingual speech-to-text or automatic speech recognition (ASR) tasks: where typical baseline models had a latency of 9.28s, pruned models ran in as little as 5.22s, depending on the task and desired sparsity level.<\/li>\n<li><strong>Decrease FLOPs (floating-point operations) by over 60%<\/strong> at high sparsity levels, greatly lowering cloud and hardware costs.<\/li>\n<li><strong>Maintain output quality<\/strong>: Pruning the decoder in particular preserves BLEU scores (for translation tasks) and Word Error Rate (WER) for ASR up to moderate sparsity, meaning users see no drop in model performance until very aggressive pruning is applied.<\/li>\n<li><strong>Provide interpretability<\/strong>: Analyzing pruned module patterns reveals which parts of the model are essential for each context: local context modules dominate in ASR, while feed-forward networks are prioritized for speech translation.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Task and Language Adaptation<\/strong><\/h3>\n<p>A core insight is that optimal pruning strategies (that is, which modules to retain or skip) can change dramatically depending on the task and language. 
<strong>For instance:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>In ASR, the importance of local context modules (cgMLP) is paramount, while the decoder can be sparsified heavily with little accuracy loss.<\/li>\n<li>For speech translation (ST), both the encoder and the decoder require more balanced attention, as the decoder\u2019s feed-forward layers are essential.<\/li>\n<li>In multilingual or multitask scenarios, module selection adapts but shows consistent patterns within each type, highlighting the learned specialization within the architecture.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Broader Implications<\/strong><\/h3>\n<p><strong>This dynamic, modular pruning opens the door for:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>More energy-efficient, scalable AI\u2014especially vital as LLMs and multimodal models continue to grow.<\/li>\n<li>AI models that can personalize their compute pathways\u2014not only by task but potentially by user profile, region, or device.<\/li>\n<li>Transferability to other domains, such as natural language processing and computer vision, wherever foundation models are used.<\/li>\n<\/ul>\n<p>By selectively activating only task-relevant modules in real time, inspired by biological neural efficiency, Amazon\u2019s architecture points the way toward AI that is both powerful and practical for global, real-world use.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/www.amazon.science\/publications\/context-aware-dynamic-pruning-for-speech-foundation-models\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a><\/strong> and <strong><a href=\"https:\/\/www.amazon.science\/blog\/pruning-network-nodes-on-the-fly-to-improve-llm-efficiency?\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a><em>.<\/em><\/strong>\u00a0All credit for this research goes to the researchers of this project. 
<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/07\/28\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/\">Amazon Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant Neurons<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Amazon researchers developed a new AI architecture that cuts inference time by 30% by selecting only task-relevant neurons, similar to how the brain uses specialized regions for specific tasks. This breakthrough approach addresses one of the biggest challenges facing large AI models: the computational expense and latency associated with activating every neuron for every request, regardless of their relevance. The traditional deployment of large language models (LLMs) and foundational AI systems has relied on activating the full network for every input. While this guarantees versatility, it results in significant inefficiency\u2014much of the network\u2019s activity is superfluous for any given prompt. Inspired by the human brain\u2019s efficiency\u2014the brain flexibly recruits only the circuits it needs for a given cognitive task\u2014Amazon\u2019s architecture mimics this behavior by activating neurons most relevant to the current input context. 
Dynamic, Context-Aware Pruning At the heart of this innovation is dynamic, context-aware pruning. Rather than trimming the model statically during training and locking in those changes, Amazon\u2019s solution prunes the network \u201con the fly,\u201d during inference itself. This enables the model to remain large and versatile, yet efficient and fast-active for any specific task. Before processing an input, the model evaluates which neurons or modules will be most useful, based on signals such as the type of task (e.g., legal writing, translation, or coding assistance), language, and other context features. It leverages a gate predictor, a lightweight neural component trained to generate a \u201cmask\u201d that determines which neurons are switched on for that particular sequence. The gating decisions are binary, so neurons are either fully active or completely skipped, ensuring real compute savings. How the System Works The architecture introduces a context-aware gating mechanism. This mechanism analyzes input features (and, for speech models, auxiliary information such as language and task tokens) to decide which modules\u2014such as self-attention blocks, feed-forward networks, or specialized convolutions\u2014are essential for the current step. For example, in a speech recognition task, it may activate local context modules for detailed sound analysis while skipping unnecessary components that are only beneficial for other tasks. This pruning strategy is structured and modular: instead of removing individual weights (which can lead to hardware inefficiency), it skips entire modules or layers. This preserves the model\u2019s structural integrity and ensures compatibility with GPU and modern hardware accelerators. The gate predictor model is trained with a sparsity loss to achieve a target sparsity: the proportion of modules skipped. 
Training uses techniques like the Gumbel-Softmax estimator, ensuring that gating behavior remains differentiable during optimization, but ultimately yields crisp, binary neuron selection at inference. Demonstrated Results: Speed Without Sacrificing Quality Experiments show that dynamically skipping irrelevant modules can: Reduce inference time by up to 34% for multilingual speech-to-text or automatic speech recognition (ASR) tasks\u2014where typical baseline models suffered 9.28s latency, pruned models ran in as little as 5.22s, depending on the task and desired sparsity level. Decrease FLOPs (floating-point operations) by over 60% at high sparsity levels, greatly lowering cloud and hardware costs. Maintain output quality: Pruning the decoder in particular preserves BLEU scores (for translation tasks) and Word Error Rate (WER) for ASR up to moderate sparsity, meaning users see no drop in model performance until very aggressive pruning is applied. Provide interpretability: Analyzing pruned module patterns reveals which parts of the model are essential for each context\u2014local context modules dominate in ASR, while feed-forward networks are prioritized for speech translation. Task and Language Adaptation A core insight is that optimal pruning strategies\u2014meaning which modules to retain or skip\u2014can change dramatically depending on the task and language. For instance: In ASR, the importance of local context modules (cgMLP) is paramount, while the decoder can be sparsified heavily with little accuracy loss. For speech translation (ST), both the encoder and the decoder require more balanced attention, as the decoder\u2019s feed-forward layers are essential. In multilingual or multitask scenarios, module selection adapts but shows consistent patterns within each type, highlighting the learned specialization within the architecture. 
Broader Implications This dynamic, modular pruning opens the door for: More energy-efficient, scalable AI\u2014especially vital as LLMs and multimodal models continue to grow. AI models that can personalize their compute pathways\u2014not only by task but potentially by user profile, region, or device. Transferability to other domains, such as natural language processing and computer vision, wherever foundation models are used. By selectively activating only task-relevant modules in real time, inspired by biological neural efficiency, Amazon\u2019s architecture points the way toward AI that is both powerful and practical for global, real-world use. Check out the\u00a0Paper and Technical details.\u00a0All credit for this research goes to the researchers of this project. Also,\u00a0feel free to follow us on\u00a0Twitter\u00a0and don\u2019t forget to join our\u00a0100k+ ML SubReddit\u00a0and Subscribe to\u00a0our Newsletter. The post Amazon Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant Neurons appeared first on 
MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":28061,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-28060","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Amazon Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant Neurons - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, 
max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/es\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/\" \/>\n<meta property=\"og:locale\" content=\"es_ES\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Amazon Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant Neurons - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/es\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-07-29T05:47:41+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Escrito por\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Tiempo de lectura\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutos\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Amazon 
Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant Neurons\",\"datePublished\":\"2025-07-29T05:47:41+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/\"},\"wordCount\":835,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"es\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/\",\"url\":\"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/\",\"name\":\"Amazon Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant Neurons - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png\",\"datePublished\":\"2025-07-29T05:47:41+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#breadcrumb\"},\"inLanguage\":\"es\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png\",\"width\":1024,\"height\":587},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Amazon Develops an AI Architecture that Cuts Inference Time 30% by 
Activating Only Relevant Neurons\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"es\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/es\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Amazon Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant Neurons - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/es\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/","og_locale":"es_ES","og_type":"article","og_title":"Amazon Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant Neurons - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/es\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-07-29T05:47:41+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Escrito por":"admin NU","Tiempo de lectura":"4 minutos"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Amazon Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant 
Neurons","datePublished":"2025-07-29T05:47:41+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/"},"wordCount":835,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"es","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/","url":"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/","name":"Amazon Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant Neurons - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png","datePublished":"2025-07-29T05:47:41+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#breadcrumb"},"inLanguage":"es","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/"]}]},{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png","width":1024,"height":587},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/amazon-develops-an-ai-architecture-that-cuts-inference-time-30-by-activating-only-relevant-neurons\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Amazon Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant 
Neurons"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"es"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/es\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png",1024,587,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png",1024,587,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png",1024,587,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU-300x172.png",300,172,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png",1024,587,false],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png",1024,587,false],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU.png",1024,587,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU-18x10.png",18,10,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU-600x344.png",600,344,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-28-at-9.00.36-PM-1-1024x587-Ao5jpU-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin 
NU","author_link":"https:\/\/youzum.net\/es\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/es\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Amazon researchers developed a new AI architecture that cuts inference time by 30% by selecting only task-relevant neurons, similar to how the brain uses specialized regions for specific tasks. This breakthrough approach addresses one of the biggest challenges facing large AI models: the computational expense and latency associated with activating every neuron for every request,&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts\/28060","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/comments?post=28060"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts\/28060\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/media\/28061"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/media?parent=28060"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/categories?post=28060"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/tags?post=28060"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}