{"id":11696,"date":"2025-05-10T02:07:24","date_gmt":"2025-05-10T02:07:24","guid":{"rendered":"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/"},"modified":"2025-05-10T02:07:24","modified_gmt":"2025-05-10T02:07:24","slug":"ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data","status":"publish","type":"post","link":"https:\/\/youzum.net\/zh\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/","title":{"rendered":"AI That Teaches Itself: Tsinghua University\u2019s \u2018Absolute Zero\u2019 Trains LLMs With Zero External Data"},"content":{"rendered":"<p>LLMs have shown advancements in reasoning capabilities through Reinforcement Learning with Verifiable Rewards (RLVR), which relies on outcome-based feedback rather than imitating intermediate reasoning steps. Current RLVR works face critical scalability challenges as they heavily depend on manually curated collections of questions and answers for training. As reasoning models advance, constructing large-scale, high-quality datasets becomes increasingly unsustainable, similar to bottlenecks identified in LLM pretraining. Moreover, exclusive dependency on human-designed tasks may constrain AI systems\u2019 capacity for autonomous learning and development, especially as they evolve beyond human intellectual capabilities.<\/p>\n<p>Researchers have explored various approaches to enhance LLM reasoning capabilities. STaR pioneered self-bootstrapping using expert iteration and rejection sampling of outcome-verified responses to improve CoT reasoning. The o1 model deployed this concept at scale, achieving state-of-the-art results, and R1 later became the first open-weight model to match or surpass o1\u2019s performance by introducing the \u201czero\u201d setting where RL is applied directly to the base LLM. 
Further, self-play paradigms have evolved from Schmidhuber\u2019s early two-agent setups to more complex implementations like AlphaGo and AlphaZero. Recent methods such as SPIN, Self-Rewarding Language Models, SPC, and SPAG have applied self-play to language models for alignment and reasoning.<\/p>\n<p>Researchers from Tsinghua University, Beijing Institute for General Artificial Intelligence, and Pennsylvania State University have proposed Absolute Zero, an RLVR paradigm in which a single model autonomously generates and solves tasks that maximize its own learning progress, without relying on any external data. Within this paradigm, they introduce the Absolute Zero Reasoner (AZR), which self-evolves its training curriculum and reasoning ability by using a code executor to validate proposed code reasoning tasks and verify answers, providing a unified source of verifiable reward that guides open-ended yet grounded learning. AZR can be implemented effectively across different model scales and remains compatible with various model classes, suggesting broad applicability.<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA?key=geNQX5JKHIVaMERpEGMYZA\" alt=\"\"\/><\/figure>\n<p>LLMs provide an ideal framework for implementing AZR in multitask learning contexts. In each online rollout iteration, following the objective of the Absolute Zero setting, AZR proposes new reasoning tasks conditioned on the task type and past self-generated examples, with explicit prompting to generate diverse tasks, and then attempts to solve them, receiving grounded feedback on its responses. AZR uses a code executor as both a flexible interface and a verifiable environment, enabling automatic construction, execution, and validation of code reasoning tasks. 
Finally, the AZR algorithm comprises buffer initialization, task proposal inputs and buffer management, valid task construction, solution validation, and advantage estimation through Task-Relative REINFORCE++.<\/p>\n<p>Absolute Zero Reasoner-Coder-7B achieves state-of-the-art performance among 7B models in both the overall average and the coding average, surpassing the previous best models by 1.8 absolute percentage points even though the math and code reasoning benchmarks are entirely out-of-distribution for it. In coding, it outperforms models trained with expert-curated human data by 0.3 absolute percentage points while never accessing such data itself. Scaling analysis shows that AZR delivers greater gains on larger models: the 7B and 14B models continue to improve beyond 200 training steps, while the 3B model plateaus. Out-of-distribution performance gains increase with model size: +5.7, +10.2, and +13.2 for the 3B, 7B, and 14B models, respectively.<\/p>\n<p>In conclusion, the researchers introduce the Absolute Zero paradigm to address data limitations in existing RLVR frameworks. Within this paradigm, they present AZR, which trains models to propose and solve code-related reasoning tasks grounded by a code executor. However, safety management in self-improving systems remains a limitation. 
The team observed several instances of safety-concerning CoT reasoning from the Llama-3.1-8B model, termed \u201cuh-oh moments.\u201d The findings indicate that while the Absolute Zero paradigm reduces human intervention needs in task curation, ongoing oversight remains necessary to address lingering safety concerns, highlighting a critical direction for future research.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n<p>Check out the <strong><a href=\"https:\/\/arxiv.org\/abs\/2505.03335\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a>, <a href=\"https:\/\/huggingface.co\/collections\/andrewzh\/absolute-zero-reasoner-68139b2bca82afb00bc69e5b\" target=\"_blank\" rel=\"noreferrer noopener\">Model on Hugging Face<\/a> and <a href=\"https:\/\/github.com\/LeapLabTHU\/Absolute-Zero-Reasoner?tab=readme-ov-file\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page<\/a>.<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/05\/09\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/\">AI That Teaches Itself: Tsinghua University\u2019s \u2018Absolute Zero\u2019 Trains LLMs With Zero External Data<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>LLMs have shown advancements in reasoning capabilities through Reinforcement Learning with Verifiable Rewards (RLVR), which relies on outcome-based feedback rather than imitating intermediate reasoning steps. Current RLVR works face critical scalability challenges as they heavily depend on manually curated collections of questions and answers for training. As reasoning models advance, constructing large-scale, high-quality datasets becomes increasingly unsustainable, similar to bottlenecks identified in LLM pretraining. Moreover, exclusive dependency on human-designed tasks may constrain AI systems\u2019 capacity for autonomous learning and development, especially as they evolve beyond human intellectual capabilities. Researchers have explored various approaches to enhance LLM reasoning capabilities. STaR pioneered self-bootstrapping using expert iteration and rejection sampling of outcome-verified responses to improve CoT reasoning. 
The o1 model deployed this concept at scale, achieving state-of-the-art results, and R1 later became the first open-weight model to match or surpass o1\u2019s performance by introducing the \u201czero\u201d setting where RL is applied directly to the base LLM. Further, self-play paradigms have evolved from Schmidhuber\u2019s early two-agent setups to more complex implementations like AlphaGo and AlphaZero. Recent methods such as SPIN, Self-Rewarding Language Models, SPC, and SPAG have applied self-play to language models for alignment and reasoning. Researchers from Tsinghua University, Beijing Institute for General Artificial Intelligence, and Pennsylvania State University have proposed an RLVR paradigm called Absolute Zero to enable a single model to autonomously generate and solve tasks that maximize its own learning progress without relying on any external data. Under this method, researchers have introduced the Absolute Zero Reasoner (AZR) that self-evolves its training curriculum and reasoning ability through a code executor that validates proposed code reasoning tasks and verifies answers, providing a unified source of verifiable reward to guide open-ended yet grounded learning. AZR can be effectively implemented across different model scales and remains compatible with various model classes, suggesting broad applicability. LLMs provide an ideal framework for implementing AZR in multitask learning contexts. During each online rollout iteration in the absolute zero setting\u2019s objective equation, AZR proposes new reasoning tasks based on task type and past self-generated examples, with explicit prompting to generate diverse tasks and then attempts to solve them, receiving grounded feedback for its model responses. AZR utilizes a code executor as both a flexible interface and verifiable environment, enabling automatic construction, execution, and validation of code reasoning tasks. 
Lastly, the AZR Algorithm includes buffer initialization, Task Proposal Inputs and Buffer Management, valid task construction, solution validation, and advantage estimator calculation through Task-Relative REINFORCE++. The Absolute Zero Reasoner-Coder-7B has achieved state-of-the-art performance in the 7B overall average and coding average categories, surpassing previous best models by 1.8 absolute percentage points despite being entirely out-of-distribution for both math and code reasoning benchmarks. It outperforms models trained with expert-curated human data in coding by 0.3 absolute percentage points while never accessing such data itself. Scaling analysis reveals that AZR delivers greater gains on larger models, with the 7B and 14B models continuing to improve beyond 200 training steps while the 3B model plateaus. Out-of-distribution performance gains increase with model size: +5.7, +10.2, and +13.2 for 3B, 7B, and 14B, respectively. In conclusion, researchers introduced the Absolute Zero paradigm to address data limitations in existing RLVR frameworks. Under this method, researchers present AZR, which trains models to propose and solve code-related reasoning tasks grounded by a code executor. However, there is a limitation regarding safety management in self-improving systems. The team observed several instances of safety-concerning CoT reasoning from the Llama-3.1-8B model, termed \u201cuh-oh moments.\u201d The findings indicate that while the Absolute Zero paradigm reduces human intervention needs in task curation, ongoing oversight remains necessary to address lingering safety concerns, highlighting a critical direction for future research. Check out the Paper, Model on Hugging Face and GitHub Page. 
The post AI That Teaches Itself: Tsinghua University\u2019s \u2018Absolute Zero\u2019 Trains LLMs With Zero External Data appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":11697,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-11696","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>AI That Teaches Itself: Tsinghua 
University\u2019s \u2018Absolute Zero\u2019 Trains LLMs With Zero External Data - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/zh\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/\" \/>\n<meta property=\"og:locale\" content=\"zh_CN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"AI That Teaches Itself: Tsinghua University\u2019s \u2018Absolute Zero\u2019 Trains LLMs With Zero External Data - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/zh\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-05-10T02:07:24+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1364\" \/>\n\t<meta property=\"og:image:height\" content=\"716\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u4f5c\u8005\" 
\/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 \u5206\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"AI That Teaches Itself: Tsinghua University\u2019s \u2018Absolute Zero\u2019 Trains LLMs With Zero External Data\",\"datePublished\":\"2025-05-10T02:07:24+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/\"},\"wordCount\":712,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-
zero-trains-llms-with-zero-external-data\/\",\"url\":\"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/\",\"name\":\"AI That Teaches Itself: Tsinghua University\u2019s \u2018Absolute Zero\u2019 Trains LLMs With Zero External Data - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png\",\"datePublished\":\"2025-05-10T02:07:24+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0
VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png\",\"width\":1364,\"height\":716},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI That Teaches Itself: Tsinghua University\u2019s \u2018Absolute Zero\u2019 Trains LLMs With Zero External Data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"zh-Hans\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin 
NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/zh\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"AI That Teaches Itself: Tsinghua University\u2019s \u2018Absolute Zero\u2019 Trains LLMs With Zero External Data - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/zh\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/","og_locale":"zh_CN","og_type":"article","og_title":"AI That Teaches Itself: Tsinghua University\u2019s \u2018Absolute Zero\u2019 Trains LLMs With Zero External Data - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/zh\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-05-10T02:07:24+00:00","og_image":[{"width":1364,"height":716,"url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png","type":"image\/png"}],"author":"admin 
NU","twitter_card":"summary_large_image","twitter_misc":{"\u4f5c\u8005":"admin NU","\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4":"3 \u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"AI That Teaches Itself: Tsinghua University\u2019s \u2018Absolute Zero\u2019 Trains LLMs With Zero External Data","datePublished":"2025-05-10T02:07:24+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/"},"wordCount":712,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"zh-Hans","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/","url":"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/","name":"AI That Teaches Itself: Tsinghua 
University\u2019s \u2018Absolute Zero\u2019 Trains LLMs With Zero External Data - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png","datePublished":"2025-05-10T02:07:24+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/"]}]},{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-external-data\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png","width":1364,"height":716},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/ai-that-teaches-itself-tsinghua-universitys-absolute-zero-trains-llms-with-zero-e
xternal-data\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"AI That Teaches Itself: Tsinghua University\u2019s \u2018Absolute Zero\u2019 Trains LLMs With Zero External Data"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"zh-Hans"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/zh\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png",1364,716,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png",1364,716,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png",1364,716,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt-300x157.png",300,157,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt-1024x538.png",1024,538,true],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png",1364,716,false],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt.png",1364,716,false
],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt-18x9.png",18,9,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt-600x315.png",600,315,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/AD_4nXfSNlhBQLOPYW_dUkvwpeyU0VIwSBVmbq-xSmDj_gB-P6z2xxGZAdiR4yhQi6sf9mf0kCnScCiLb3bD5PKlAad88WAaHiJN4KP0kpEICDw6ZP0ztm9Za8mKIlztEoYz3RMPn2OcbA-R5j8nt-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/zh\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/zh\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/zh\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/zh\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/zh\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"LLMs have shown advancements in reasoning capabilities through Reinforcement Learning with Verifiable Rewards (RLVR), which relies on outcome-based feedback rather than imitating intermediate reasoning steps. Current RLVR works face critical scalability challenges as they heavily depend on manually curated collections of questions and answers for training. 
As reasoning models advance, constructing large-scale, high-quality datasets becomes&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts\/11696","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/comments?post=11696"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts\/11696\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/media\/11697"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/media?parent=11696"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/categories?post=11696"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/tags?post=11696"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}