{"id":18339,"date":"2025-06-12T04:05:36","date_gmt":"2025-06-12T04:05:36","guid":{"rendered":"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/"},"modified":"2025-06-12T04:05:36","modified_gmt":"2025-06-12T04:05:36","slug":"cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms","status":"publish","type":"post","link":"https:\/\/youzum.net\/it\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/","title":{"rendered":"CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test Generation in LLMs"},"content":{"rendered":"<h3 class=\"wp-block-heading\"><strong>Introduction<\/strong><\/h3>\n<p>Large Language Models (LLMs) have shown substantial improvements in reasoning and precision through reinforcement learning (RL) and test-time scaling techniques. Despite outperforming traditional unit test generation methods, most existing approaches such as O1-Coder and UTGEN require supervision from ground-truth code. This supervision increases data collection costs and limits the scale of usable training data.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Limitations of Existing Approaches<\/strong><\/h3>\n<p><strong>Conventional unit test generation relies on:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Software analysis methods<\/strong>, which are rule-based and rigid.<\/li>\n<li><strong>Neural machine translation techniques<\/strong>, which often lack semantic alignment.<\/li>\n<\/ul>\n<p>While recent prompt-based and agentic methods improve performance, they still depend heavily on labeled code for fine-tuning. 
This reliance restricts adaptability and scalability, particularly in real-world, large-scale deployment scenarios.<\/p>\n<h3 class=\"wp-block-heading\"><strong>CURE: A Self-Supervised Co-Evolutionary Approach<\/strong><\/h3>\n<p>Researchers from the University of Chicago, Princeton University, Peking University, and ByteDance Seed introduce <strong>CURE<\/strong>, a self-supervised reinforcement learning framework that jointly trains a code generator and a unit test generator without any ground-truth code. <\/p>\n<p><strong>CURE operates using a self-play mechanism in which:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>The LLM generates both correct and incorrect code.<\/li>\n<li>The unit test generator learns to distinguish failure modes and refines itself accordingly.<\/li>\n<\/ul>\n<p>This bidirectional co-evolution enhances both code generation and verification without external supervision.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"731\" data-attachment-id=\"71936\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/06\/11\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/screenshot-2025-06-11-at-7-25-10-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10\u202fPM-1.png\" data-orig-size=\"1690,1206\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-06-11 at 7.25.10\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10\u202fPM-1-300x214.png\" 
data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10\u202fPM-1-1024x731.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10%E2%80%AFPM-1-1024x731.png\" alt=\"\" class=\"wp-image-71936\" \/><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Architecture and Methodology<\/strong><\/h3>\n<h4 class=\"wp-block-heading\"><strong>Base Models and Sampling Strategy<\/strong><\/h4>\n<p><strong>CURE is built on Qwen2.5-7B and 14B Instruct models, with Qwen3-4B used for long-chain-of-thought (CoT) variants. Each training step samples:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>16 candidate code completions.<\/li>\n<li>16 task-derived unit tests.<\/li>\n<\/ul>\n<p>Sampling is performed using vLLM with temperature 1.0 and top-p 1.0. For long-CoT models, a response-length-aware transformation penalizes lengthy outputs, improving inference-time efficiency.<\/p>\n<h4 class=\"wp-block-heading\"><strong>Reward Function and Optimization<\/strong><\/h4>\n<p><strong>CURE introduces a mathematically grounded reward formulation to:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Maximize <strong>reward precision<\/strong>, defined as the likelihood that correct code scores higher than incorrect code across generated unit tests.<\/li>\n<li>Apply response-based reward adjustments for long responses to reduce latency.<\/li>\n<\/ul>\n<p>Optimization proceeds via policy gradient methods, jointly updating the coder and unit tester to improve their mutual performance.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" width=\"1024\" height=\"516\" data-attachment-id=\"71937\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/06\/11\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/screenshot-2025-06-11-at-7-25-30-pm\/\" 
data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.30\u202fPM.png\" data-orig-size=\"1702,858\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-06-11 at 7.25.30\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.30\u202fPM-300x151.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.30\u202fPM-1024x516.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.30%E2%80%AFPM-1024x516.png\" alt=\"\" class=\"wp-image-71937\" \/><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Benchmark Datasets and Evaluation Metrics<\/strong><\/h3>\n<p><strong>CURE is evaluated on five standard coding datasets:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>LiveBench<\/li>\n<li>MBPP<\/li>\n<li>LiveCodeBench<\/li>\n<li>CodeContests<\/li>\n<li>CodeForces<\/li>\n<\/ul>\n<p><strong>Performance is measured across:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Unit test accuracy<\/li>\n<li>One-shot code generation accuracy<\/li>\n<li>Best-of-N (BoN) accuracy using 16 code and test samples.<\/li>\n<\/ul>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" width=\"1024\" height=\"493\" data-attachment-id=\"71934\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/06\/11\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/screenshot-2025-06-11-at-7-24-35-pm-2\/\" 
data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.24.35\u202fPM-1.png\" data-orig-size=\"1686,812\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-06-11 at 7.24.35\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.24.35\u202fPM-1-300x144.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.24.35\u202fPM-1-1024x493.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.24.35%E2%80%AFPM-1-1024x493.png\" alt=\"\" class=\"wp-image-71934\" \/><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Performance and Efficiency Gains<\/strong><\/h3>\n<p>The <strong>ReasonFlux-Coder<\/strong> models derived via CURE achieve:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>+37.8%<\/strong> in unit test accuracy.<\/li>\n<li><strong>+5.3%<\/strong> in one-shot code generation accuracy.<\/li>\n<li><strong>+9.0%<\/strong> in BoN accuracy.<\/li>\n<\/ul>\n<p>Notably, ReasonFlux-Coder-4B achieves a <strong>64.8%<\/strong> reduction in average unit test response length, substantially improving inference speed. 
Across all benchmarks, these models outperform traditional coding-supervised fine-tuned models (e.g., Qwen2.5-Coder-Instruct).<\/p>\n<h3 class=\"wp-block-heading\"><strong>Application to Commercial LLMs<\/strong><\/h3>\n<p>When ReasonFlux-Coder-4B is paired with <strong>GPT-series models<\/strong>:<\/p>\n<ul class=\"wp-block-list\">\n<li>GPT-4o-mini gains <strong>+5.5% BoN accuracy<\/strong>.<\/li>\n<li>GPT-4.1-mini improves by <strong>+1.8%<\/strong>.<\/li>\n<li>API costs are reduced while performance is enhanced, indicating a cost-effective solution for production-level inference pipelines.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Use as Reward Model for Label-Free Fine-Tuning<\/strong><\/h3>\n<p>CURE-trained unit test generators can be repurposed as reward models in RL training. Using ReasonFlux-Coder-4B\u2019s generated unit tests yields improvements comparable to those from human-labeled test supervision, enabling <strong>fully label-free reinforcement learning pipelines<\/strong>.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Broader Applicability and Future Directions<\/strong><\/h3>\n<p>Beyond BoN, ReasonFlux-Coder models integrate seamlessly with agentic coding frameworks such as:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>MPSC (Multi-Perspective Self-Consistency)<\/strong><\/li>\n<li><strong>AlphaCodium<\/strong><\/li>\n<li><strong>S*<\/strong><\/li>\n<\/ul>\n<p>These systems benefit from CURE\u2019s ability to refine both code and tests iteratively. CURE also boosts agentic unit test generation accuracy by <strong>25.1%<\/strong>, reinforcing its versatility.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h3>\n<p>CURE represents a significant advancement in self-supervised learning for code generation and validation, enabling large language models to jointly evolve their coding and unit test generation capabilities without reliance on ground-truth code. 
By leveraging a co-evolutionary reinforcement learning framework, CURE not only enhances core performance metrics such as one-shot accuracy and Best-of-N selection but also improves inference efficiency through response-length-aware optimization. Its compatibility with existing agentic coding pipelines and ability to function as a label-free reward model make it a scalable and cost-effective solution for both training and deployment scenarios.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p><strong>Check out the\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2506.03136\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a> and <a href=\"https:\/\/github.com\/Gen-Verse\/CURE\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page<\/a><em>.<\/em><\/strong>\u00a0All credit for this research goes to the researchers of this project.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/06\/11\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/\">CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test Generation in LLMs<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Introduction Large Language Models (LLMs) have shown substantial improvements in reasoning and precision through reinforcement learning (RL) and test-time 
scaling techniques. Despite outperforming traditional unit test generation methods, most existing approaches such as O1-Coder and UTGEN require supervision from ground-truth code. This supervision increases data collection costs and limits the scale of usable training data. Limitations of Existing Approaches Conventional unit test generation relies on: Software analysis methods, which are rule-based and rigid. Neural machine translation techniques, which often lack semantic alignment. While recent prompt-based and agentic methods improve performance, they still depend heavily on labeled code for fine-tuning. This reliance restricts adaptability and scalability, particularly in real-world, large-scale deployment scenarios. CURE: A Self-Supervised Co-Evolutionary Approach Researchers from the University of Chicago, Princeton University, Peking University, and ByteDance Seed introduce CURE, a self-supervised reinforcement learning framework that jointly trains a code generator and a unit test generator without any ground-truth code. CURE operates using a self-play mechanism in which: The LLM generates both correct and incorrect code. The unit test generator learns to distinguish failure modes and refines itself accordingly. This bidirectional co-evolution enhances both code generation and verification without external supervision. Architecture and Methodology Base Models and Sampling Strategy CURE is built on Qwen2.5-7B and 14B Instruct models, with Qwen3-4B used for long-chain-of-thought (CoT) variants. Each training step samples: 16 candidate code completions. 16 task-derived unit tests. Sampling is performed using vLLM with temperature 1.0 and top-p 1.0. For long-CoT models, a response-length-aware transformation penalizes lengthy outputs, improving inference-time efficiency. 
Reward Function and Optimization CURE introduces a mathematically grounded reward formulation to: Maximize reward precision, defined as the likelihood that correct code scores higher than incorrect code across generated unit tests. Apply response-based reward adjustments for long responses to reduce latency. Optimization proceeds via policy gradient methods, jointly updating the coder and unit tester to improve their mutual performance. Benchmark Datasets and Evaluation Metrics CURE is evaluated on five standard coding datasets: LiveBench MBPP LiveCodeBench CodeContests CodeForces Performance is measured across: Unit test accuracy One-shot code generation accuracy Best-of-N (BoN) accuracy using 16 code and test samples. Performance and Efficiency Gains The ReasonFlux-Coder models derived via CURE achieve: +37.8% in unit test accuracy. +5.3% in one-shot code generation accuracy. +9.0% in BoN accuracy. Notably, ReasonFlux-Coder-4B achieves a 64.8% reduction in average unit test response length, substantially improving inference speed. Across all benchmarks, these models outperform traditional coding-supervised fine-tuned models (e.g., Qwen2.5-Coder-Instruct). Application to Commercial LLMs When ReasonFlux-Coder-4B is paired with GPT-series models: GPT-4o-mini gains +5.5% BoN accuracy. GPT-4.1-mini improves by +1.8%. API costs are reduced while performance is enhanced, indicating a cost-effective solution for production-level inference pipelines. Use as Reward Model for Label-Free Fine-Tuning CURE-trained unit test generators can be repurposed as reward models in RL training. Using ReasonFlux-Coder-4B\u2019s generated unit tests yields improvements comparable to those from human-labeled test supervision, enabling fully label-free reinforcement learning pipelines. 
Broader Applicability and Future Directions Beyond BoN, ReasonFlux-Coder models integrate seamlessly with agentic coding frameworks such as: MPSC (Multi-Perspective Self-Consistency) AlphaCodium S* These systems benefit from CURE\u2019s ability to refine both code and tests iteratively. CURE also boosts agentic unit test generation accuracy by 25.1%, reinforcing its versatility. Conclusion CURE represents a significant advancement in self-supervised learning for code generation and validation, enabling large language models to jointly evolve their coding and unit test generation capabilities without reliance on ground-truth code. By leveraging a co-evolutionary reinforcement learning framework, CURE not only enhances core performance metrics such as one-shot accuracy and Best-of-N selection but also improves inference efficiency through response-length-aware optimization. Its compatibility with existing agentic coding pipelines and ability to function as a label-free reward model make it a scalable and cost-effective solution for both training and deployment scenarios. Check out the\u00a0Paper and GitHub Page.\u00a0All credit for this research goes to the researchers of this project. 
The post CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test Generation in LLMs appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":18340,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-18339","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test Generation in LLMs - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, 
max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/it\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/\" \/>\n<meta property=\"og:locale\" content=\"it_IT\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test Generation in LLMs - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/it\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-06-12T04:05:36+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"731\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Scritto da\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Tempo di lettura stimato\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minuti\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test Generation in LLMs\",\"datePublished\":\"2025-06-12T04:05:36+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/\"},\"wordCount\":692,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"it-IT\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/\",\"url\":\"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/\",\"name\":\"CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test Generation in LLMs - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png\",\"datePublished\":\"2025-06-12T04:05:36+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#breadcrumb\"},\"inLanguage\":\"it-IT\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"it-IT\",\"@id\":\"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png\",\"width\":1024,\"height\":731},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test 
Generation in LLMs\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"it-IT\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"it-IT\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"it-IT\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/it\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test Generation in LLMs - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/it\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/","og_locale":"it_IT","og_type":"article","og_title":"CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test Generation in LLMs - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/it\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-06-12T04:05:36+00:00","og_image":[{"width":1024,"height":731,"url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png","type":"image\/png"}],"author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Scritto da":"admin NU","Tempo di lettura stimato":"3 minuti"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"CURE: A Reinforcement Learning Framework for 
Co-Evolving Code and Unit Test Generation in LLMs","datePublished":"2025-06-12T04:05:36+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/"},"wordCount":692,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"it-IT","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/","url":"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/","name":"CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test Generation in LLMs - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png","datePublished":"2025-06-12T04:05:36+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#breadcrumb"},"inLanguage":"it-IT","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/"]}]},{"@type":"ImageObject","inLanguage":"it-IT","@id":"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png","width":1024,"height":731},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/cure-a-reinforcement-learning-framework-for-co-evolving-code-and-unit-test-generation-in-llms\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test Generation in 
LLMs"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"it-IT"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"it-IT","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"it-IT","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/it\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png",1024,731,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png",1024,731,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png",1024,731,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD-300x214.png",300,214,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png",1024,731,false],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png",1024,731,false],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD.png",1024,731,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD-18x12.png",18,12,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD-600x428.png",600,428,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-11-at-7.25.10E280AFPM-1-1024x731-fMB3bD-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin 
NU","author_link":"https:\/\/youzum.net\/it\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/it\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/it\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/it\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/it\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Introduction Large Language Models (LLMs) have shown substantial improvements in reasoning and precision through reinforcement learning (RL) and test-time scaling techniques. Despite outperforming traditional unit test generation methods, most existing approaches such as O1-Coder and UTGEN require supervision from ground-truth code. This supervision increases data collection costs and limits the scale of usable training data.&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/posts\/18339","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/comments?post=18339"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/posts\/18339\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/media\/18340"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/media?parent=18339"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/categories?post=18339"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/tags?post=18339"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","
templated":true}]}}