{"id":89858,"date":"2026-05-12T16:25:39","date_gmt":"2026-05-12T16:25:39","guid":{"rendered":"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/"},"modified":"2026-05-12T16:25:39","modified_gmt":"2026-05-12T16:25:39","slug":"tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon","status":"publish","type":"post","link":"https:\/\/youzum.net\/th\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/","title":{"rendered":"Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon"},"content":{"rendered":"<p>Researchers at Tilde Research have released <strong>Aurora<\/strong>, a new optimizer for training neural networks that addresses a structural flaw in the widely used Muon optimizer. The flaw quietly kills off a significant fraction of MLP neurons during training and keeps them permanently dead. <strong>Aurora<\/strong> comes with a 1.1B-parameter pretraining experiment, a new state-of-the-art result on the modded-nanoGPT speedrun benchmark, and open-source code.<\/p>\n<h2 class=\"wp-block-heading\"><strong>What is Muon?<\/strong><\/h2>\n<p>To understand Aurora, it helps to first understand Muon. The Muon optimizer attracted attention in the ML community after outperforming AdamW <strong>in wall-clock time to convergence<\/strong> on the nanoGPT speedrun competition, a community benchmark that measures how fast you can train a GPT-style model to a target validation loss. Since then, Muon has been adopted in frontier-scale model training by several research groups.<\/p>\n<p>Muon\u2019s key algorithmic step is computing the <strong>polar factor<\/strong> of the gradient matrix. 
For a gradient matrix <strong>G<\/strong> with thin Singular Value Decomposition (SVD) <strong>G = U\u03a3V\u1d40<\/strong>, Muon computes <code>polar(G) = UV\u1d40<\/code>, which is the closest semi-orthogonal matrix to <strong>G<\/strong> in the Frobenius norm. This orthogonalized gradient is then used to update the weights: <strong>W \u2190 W \u2212 \u03b7 UV\u1d40<\/strong> for a learning rate \u03b7. In practice the polar factor is approximated with matmul-only iterative algorithms (such as Newton-Schulz iterations) rather than an explicit SVD, which is what makes Muon practical at scale.<\/p>\n<h2 class=\"wp-block-heading\"><strong>The NorMuon Puzzle: Row Normalization Helps, But Why?<\/strong><\/h2>\n<p>Before Aurora, NorMuon led the modded-nanoGPT speedrun. It introduced a row-normalization step, similar to Adam\u2019s per-parameter scaling, that rescales the rows of the polar factor by their inverse RMS norm. Although this often pulls the update away from strict orthogonality, NorMuon still yields impressive results. The Tilde team set out to understand exactly what gap in Muon\u2019s formulation NorMuon was addressing.<\/p>\n<h2 class=\"wp-block-heading\"><strong>The Core Problem: Row-Norm Anisotropy and Neuron Death in Tall Matrices<\/strong><\/h2>\n<p>The research team discovered that the Muon optimizer unintentionally \u201ckills\u201d a large portion of neurons in <strong>tall weight matrices<\/strong>, such as those found in SwiGLU-based MLP layers. Because semi-orthogonality alone does not force the rows of a tall matrix to have equal norms, the polar factor inherits the gradient\u2019s row-norm anisotropy, and the optimizer ends up giving massive updates to some neurons while virtually ignoring others. This results in a \u201cdeath spiral\u201d where under-performing neurons receive less signal over time, eventually becoming permanently inactive.<\/p>\n<p>The study reports that by the 500th training step, more than one in four neurons are effectively dead. 
This isn\u2019t just a local issue; the lack of activity in these neurons starves subsequent layers of necessary data, spreading the inefficiency throughout the model. <strong>Aurora<\/strong> addresses this by constraining the update to have uniform row norms, so that every neuron receives an equally sized update, without sacrificing the benefits of orthogonalization.<\/p>\n<h2 class=\"wp-block-heading\"><strong>The Intermediate Step: U-NorMuon<\/strong><\/h2>\n<p>Before arriving at Aurora, the research introduces an intermediate fix called <strong>U-NorMuon<\/strong>. The key observation is that NorMuon normalizes each row to unit norm (norm = 1), but this is actually the wrong target for a tall matrix. For a column-orthogonal tall matrix with m rows and n columns (m &gt; n), the squared row norms sum to n, so the root-mean-square row norm is \u221a(n\/m), not 1. U-NorMuon corrects this by normalizing tall matrix rows to have norm \u221a(n\/m) instead of 1.<\/p>\n<p>In experiments at 340M scale, U-NorMuon outperforms both Muon and standard NorMuon and completely eliminates the neuron death phenomenon: leverage scores become approximately isotropic throughout training. Crucially, U-NorMuon propagates this benefit to layers it doesn\u2019t directly touch: keeping up\/gate rows alive ensures isotropic gradient flow into the down-projection, stabilizing its column leverage without any direct intervention.<\/p>\n<p>However, U-NorMuon still has a problem: it forcefully overrides the polar factor with uniform row norms, sacrificing polar factor precision, which is both theoretically undesirable and empirically costly in the Muon framework (the paper shows that Muon achieves monotonically lower loss with more precise orthogonalization). This is the motivation for Aurora.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Aurora: Steepest Descent Under Two Joint Constraints<\/strong><\/h2>\n<p>Aurora reformulates the update-selection problem from scratch. 
Rather than running orthogonalization and then patching it with row normalization, Aurora asks: what is the optimal update under the <strong>joint<\/strong> constraint of left semi-orthogonality and uniform row norms?<\/p>\n<p>Formally, for a tall m \u00d7 n gradient matrix G (m &gt; n), Aurora solves:<\/p>\n<p>$$U^{*} = \\arg\\max_{U} \\operatorname{Tr}(G^{\\top} U) \\quad \\text{s.t.} \\quad U^{\\top} U = I_n, \\quad \\|U_{i:}\\|^{2} = \\frac{n}{m} \\;\\; \\forall i$$<\/p>\n<p>The research shows that these two constraints together force all singular values of U to equal exactly 1. This means the joint constraint still produces a valid left semi-orthogonal update, not a compromised one. This is the key insight that separates Aurora from NorMuon and U-NorMuon: it achieves row-norm uniformity and orthogonality simultaneously rather than trading one off against the other.<\/p>\n<p>The research also provides two algorithmic implementations of Aurora\u2019s solution. The <strong>Riemannian Aurora<\/strong> uses a gradient projection approach restricted to the joint Stiefel\/equal-row-leverage manifold. The <strong>vanilla Aurora<\/strong> is a simpler, more practical implementation. Both are open-sourced. For non-tall (wide and square) matrices, row-norm uniformity is already implied by orthogonality, so Aurora leaves the update for those parameters unchanged.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Results<\/strong><\/h2>\n<p>Aurora was used to train a 1.1B model that achieves 100x data efficiency on open-source internet data and outperforms larger models on general evals like HellaSwag. At 1B scale, Aurora achieves large gains over both Muon and NorMuon. On the modded-nanoGPT optimization speedrun, Aurora\u2019s submitted run outperforms the prior state of the art (which was NorMuon). 
Untuned Aurora carries only a 6% compute overhead over traditional Muon and is designed as a drop-in replacement.<\/p>\n<p>The research team also found that Aurora\u2019s performance gains scale with MLP width, suggesting it is particularly effective for networks with large MLP expansion factors, which is consistent with the neuron death hypothesis, since wider MLPs have more tall matrices and more opportunity for leverage anisotropy to compound.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h2>\n<ul class=\"wp-block-list\">\n<li>Muon\u2019s polar factor update inherits row-norm anisotropy on tall matrices, causing over 25% of MLP neurons to permanently die as early as step 500 of training.<\/li>\n<li>Aurora solves this by finding the optimal update under a joint constraint of left semi-orthogonality and uniform row norms, achieving both simultaneously rather than trading one off against the other.<\/li>\n<li>At 1.1B scale, Aurora achieves 100x data efficiency on open-source internet data, outperforms larger models on HellaSwag, and sets a new SoTA on the modded-nanoGPT speedrun.<\/li>\n<li>Aurora is a near-drop-in replacement for Muon with only 6% compute overhead, and its gains scale with MLP width.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the <strong><a href=\"https:\/\/blog.tilderesearch.com\/blog\/aurora\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a><\/strong> and <strong><a href=\"https:\/\/github.com\/tilde-research\/aurora-release\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Repo<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/05\/12\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/\">Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Researchers at Tilde Research have released Aurora, a new optimizer for training neural networks that addresses a structural flaw in the widely used Muon optimizer. 
The flaw quietly kills off a significant fraction of MLP neurons during training and keeps them permanently dead. Aurora comes with a 1.1B-parameter pretraining experiment, a new state-of-the-art result on the modded-nanoGPT speedrun benchmark, and open-source code. What is Muon? To understand Aurora, it helps to first understand Muon. The Muon optimizer attracted attention in the ML community after outperforming AdamW in wall-clock time to convergence on the nanoGPT speedrun competition, a community benchmark that measures how fast you can train a GPT-style model to a target validation loss. Since then, Muon has been adopted in frontier-scale model training by several research groups. 
Muon\u2019s key algorithmic step is computing the polar factor of the gradient matrix. For a gradient matrix G with thin Singular Value Decomposition (SVD) G = U\u03a3V\u1d40, Muon computes polar(G) = UV\u1d40, which is the closest semi-orthogonal matrix to G in the Frobenius norm. This orthogonalized gradient is then used to update the weights: W \u2190 W \u2212 \u03b7 UV\u1d40 for a learning rate \u03b7. In practice the polar factor is approximated with matmul-only iterative algorithms (such as Newton-Schulz iterations) rather than an explicit SVD, which is what makes Muon practical at scale. The NorMuon Puzzle: Row Normalization Helps, But Why? Before Aurora, NorMuon led the modded-nanoGPT speedrun. It introduced a row-normalization step, similar to Adam\u2019s per-parameter scaling, that rescales the rows of the polar factor by their inverse RMS norm. Although this often pulls the update away from strict orthogonality, NorMuon still yields impressive results. The Tilde team set out to understand exactly what gap in Muon\u2019s formulation NorMuon was addressing. The Core Problem: Row-Norm Anisotropy and Neuron Death in Tall Matrices The research team discovered that the Muon optimizer unintentionally \u201ckills\u201d a large portion of neurons in tall weight matrices, such as those found in SwiGLU-based MLP layers. Because semi-orthogonality alone does not force the rows of a tall matrix to have equal norms, the polar factor inherits the gradient\u2019s row-norm anisotropy, and the optimizer ends up giving massive updates to some neurons while virtually ignoring others. This results in a \u201cdeath spiral\u201d where under-performing neurons receive less signal over time, eventually becoming permanently inactive. The study reports that by the 500th training step, more than one in four neurons are effectively dead. This isn\u2019t just a local issue; the lack of activity in these neurons starves subsequent layers of necessary data, spreading the inefficiency throughout the model. 
Aurora solves this by using a new mathematical approach that enforces uniform updates across all neurons without sacrificing the benefits of orthogonalization. The Intermediate Step: U-NorMuon Before arriving at Aurora, the research introduces an intermediate fix called U-NorMuon. The key observation is that NorMuon normalizes each row to unit norm (norm = 1), but this is actually the wrong target for a tall matrix. For a column-orthogonal tall matrix, the mathematically correct average row norm is \u221a(n\/m), not 1. U-NorMuon corrects this by normalizing tall matrix rows to have norm \u221a(n\/m) instead of 1. In experiments at 340M scale, U-NorMuon outperforms both Muon and standard NorMuon and completely eliminates the neuron death phenomenon \u2014 leverage scores become approximately isotropic throughout training. Crucially, U-NorMuon propagates this benefit to layers it doesn\u2019t directly touch: keeping up\/gate rows alive ensures isotropic gradient flow into the down-projection, stabilizing its column leverage without any direct intervention. However, U-NorMuon still has a problem: it forcefully overrides the polar factor with uniform row norms, sacrificing polar factor precision, which is both theoretically undesirable and empirically costly in the Muon framework (the paper shows that Muon achieves monotonically lower loss with more precise orthogonalization). This is the motivation for Aurora. Aurora: Steepest Descent Under Two Joint Constraints Aurora reformulates the update-selection problem from scratch. Rather than running orthogonalization and then patching it with row normalization, Aurora asks: what is the optimal update under the joint constraint of left semi-orthogonality and uniform row norms? 
Formally, for tall matrices, Aurora solves: $$U^{*} = \\arg\\max_{U} \\operatorname{Tr}(G^{\\top} U) \\quad \\text{s.t.} \\quad U^{\\top} U = I_n, \\quad \\|U_{i:}\\|^{2} = \\frac{n}{m} \\;\\; \\forall i$$ The research shows that these two constraints together force all singular values of U to exactly equal 1. This means the joint constraint still produces a valid left semi-orthogonal update, not a compromised one. This is the key insight that separates Aurora from NorMuon and U-NorMuon: it achieves row-norm uniformity and orthogonality simultaneously rather than trading one off against the other. The research also provides two algorithmic implementations of Aurora\u2019s solution. The Riemannian Aurora uses a gradient projection approach restricted to the joint Stiefel\/equal-row-leverage manifold. The vanilla Aurora is a simpler, more practical implementation. Both are open-sourced. For non-tall (wide and square) matrices, row-norm uniformity is already implied by orthogonality, so Aurora leaves those parameters unchanged. Results Aurora was used to train a 1.1B model that achieves 100x data efficiency on open-source internet data and outperforms larger models on general evals like HellaSwag. At 1B scale, Aurora achieves large gains over both Muon and NorMuon. On the modded-nanoGPT optimization speedrun, Aurora\u2019s submitted run outperforms the prior state-of-the-art (which was NorMuon). Untuned Aurora carries only a 6% compute overhead over traditional Muon and is designed as a drop-in replacement. The research team also found that Aurora\u2019s performance gains scale with MLP width, suggesting it is particularly effective for networks with large MLP expansion factors, which is consistent with the neuron death hypothesis, since wider MLPs have more tall matrices and more opportunity for leverage anisotropy to compound. 
Key Takeaways Muon\u2019s polar factor update inherits row-norm anisotropy on tall matrices, causing over 25% of MLP neurons to permanently die as early as step 500 of training. Aurora solves this by finding the optimal update under a joint constraint of left semi-orthogonality and uniform row norms \u2014 achieving both simultaneously rather than trading one off against the other. At 1.1B scale, Aurora achieves 100x data efficiency on open-source internet data, outperforms larger models on HellaSwag, and<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-89858","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That 
Fixes a Hidden Neuron Death Problem in Muon - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/th\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/\" \/>\n<meta property=\"og:locale\" content=\"th_TH\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/th\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-12T16:25:39+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 \u0e19\u0e32\u0e17\u0e35\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon\",\"datePublished\":\"2026-05-12T16:25:39+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/\"},\"wordCount\":1133,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/\",\"url\":\"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/\",\"name\":\"Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"datePublished\":\"2026-05-12T16:25:39+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/#breadcrumb\"},\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"th\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association 
Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/th\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/th\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/","og_locale":"th_TH","og_type":"article","og_title":"Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/th\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-05-12T16:25:39+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin NU","Est. 
reading time":"6 \u0e19\u0e32\u0e17\u0e35"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon","datePublished":"2026-05-12T16:25:39+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/"},"wordCount":1133,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"th","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/","url":"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/","name":"Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"datePublished":"2026-05-12T16:25:39+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/#breadcrumb"},"inLanguage":"th","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/tilde-research-introduces-aurora-a-leverage-aware-optimizer-that-fixes-a-hidden-neuron-death-problem-in-muon\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"th"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone 
Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/th\/members\/adminnu\/"}]}},"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/th\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/th\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Researchers at Tilde Research have released Aurora, a new optimizer for training neural networks that addresses a structural flaw in the widely-used Muon optimizer. The flaw quietly kills off a significant fraction of MLP neurons during training and keeps them permanently dead. 
Aurora comes with a 1.1B parameter pretraining experiment, a new state-of-the-art result on&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/89858","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/comments?post=89858"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/89858\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media?parent=89858"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/categories?post=89858"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/tags?post=89858"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}