{"id":99880,"date":"2026-06-25T18:29:21","date_gmt":"2026-06-25T18:29:21","guid":{"rendered":"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/"},"modified":"2026-06-25T18:29:21","modified_gmt":"2026-06-25T18:29:21","slug":"deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds","status":"publish","type":"post","link":"https:\/\/youzum.net\/de\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/","title":{"rendered":"DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds"},"content":{"rendered":"<p class=\"wp-block-paragraph\">DeepReinforce has released <strong><a href=\"https:\/\/huggingface.co\/collections\/deepreinforce-ai\/ornith-10\" target=\"_blank\" rel=\"noreferrer noopener\">Ornith-1.0<\/a><\/strong>, an open-source model family built for agentic coding. The lineup spans four sizes, from a 9B dense model to a 397B mixture-of-experts flagship. Every checkpoint ships under the MIT license on Hugging Face. The models are post-trained on top of pretrained Gemma 4 and Qwen 3.5.<\/p>\n<p class=\"wp-block-paragraph\">Most coding agents pair a model with a fixed, human-designed harness. Ornith-1.0 instead learns to write its own. 
The DeepReinforce research team reports state-of-the-art results among open models of comparable size.<\/p>\n<h2 class=\"wp-block-heading\"><strong>TL;DR<\/strong><\/h2>\n<ul class=\"wp-block-list\">\n<li>Ornith-1.0 ships in 9B, 31B, 35B-MoE, and 397B-MoE sizes under MIT, built on Gemma 4 and Qwen 3.5.<\/li>\n<li>The model learns its own scaffold during RL, jointly optimizing the harness and the solution.<\/li>\n<li>Ornith-1.0-397B tops Claude Opus 4.7 on both headline benchmarks, but not Opus 4.8 or the larger GLM-5.2-744B.<\/li>\n<li>Three layers (fixed trust boundary, deterministic monitor, frozen LLM judge) guard against reward hacking.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><strong>What is Ornith-1.0?<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Ornith-1.0 is a set of reasoning models tuned for coding agents. The variants are 9B Dense, 31B Dense, 35B MoE, and 397B MoE. The 35B model is mixture-of-experts and activates roughly 3B parameters per token. FP8 and GGUF builds are also published for faster local serving.<\/p>\n<p class=\"wp-block-paragraph\">Every variant emits an explicit reasoning trace. Replies open with a <code>&lt;think&gt;<\/code> block before the final answer. The serving recipes enable a reasoning parser, so the trace is returned in a separate <code>reasoning_content<\/code> field. The models also emit well-formed tool calls for agent loops.<\/p>\n<p class=\"wp-block-paragraph\">Deployment is straightforward. The 9B model is about 19GB in bf16 and serves on a single 80GB GPU. Serving recipes target vLLM, SGLang, and Transformers. Each model exposes an OpenAI-compatible endpoint. Standard agent frameworks that speak the OpenAI API should therefore work without code changes.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Interactive Explainer<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">The explainer has three panels: a simulated training loop, a benchmark chart by model tier, and the three defense layers. 
Its benchmark panel also covers the 35B MoE and 9B dense tiers. Those vendor-reported scores follow (higher is better; self-reported, not independently verified).<\/p>\n<figure class=\"wp-block-table is-style-stripes\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Benchmark<\/th>\n<th>Ornith-1.0-35B-A3B<\/th>\n<th>Qwen3.5-35B-A3B<\/th>\n<th>Qwen3.6-35B-A3B<\/th>\n<th>Gemma4-31B<\/th>\n<th>Qwen3.5-397B<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Terminal-Bench 2.1<\/td>\n<td>64.2<\/td>\n<td>41.4<\/td>\n<td>52.5<\/td>\n<td>42.1<\/td>\n<td>53.5<\/td>\n<\/tr>\n<tr>\n<td>SWE-Bench Verified<\/td>\n<td>75.6<\/td>\n<td>70<\/td>\n<td>73.4<\/td>\n<td>52<\/td>\n<td>76.4<\/td>\n<\/tr>\n<tr>\n<td>SWE-Bench Pro<\/td>\n<td>50.4<\/td>\n<td>44.6<\/td>\n<td>49.5<\/td>\n<td>35.7<\/td>\n<td>51.6<\/td>\n<\/tr>\n<tr>\n<td>SWE-Bench Multilingual<\/td>\n<td>69.3<\/td>\n<td>60.3<\/td>\n<td>67.2<\/td>\n<td>51.7<\/td>\n<td>69.3<\/td>\n<\/tr>\n<tr>\n<td>NL2Repo<\/td>\n<td>34.6<\/td>\n<td>20.5<\/td>\n<td>29.4<\/td>\n<td>15.5<\/td>\n<td>36.8<\/td>\n<\/tr>\n<tr>\n<td>ClawEval Avg<\/td>\n<td>69.8<\/td>\n<td>65.4<\/td>\n<td>68.7<\/td>\n<td>48.5<\/td>\n<td>70.7<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<figure class=\"wp-block-table is-style-stripes\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Benchmark<\/th>\n<th>Ornith-1.0-9B<\/th>\n<th>Qwen3.5-9B<\/th>\n<th>Qwen3.5-35B-A3B<\/th>\n<th>Gemma4-12B<\/th>\n<th>Gemma4-31B<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Terminal-Bench 2.1<\/td>\n<td>43.1<\/td>\n<td>21.3<\/td>\n<td>41.4<\/td>\n<td>21<\/td>\n<td>42.1<\/td>\n<\/tr>\n<tr>\n<td>SWE-Bench Verified<\/td>\n<td>69.4<\/td>\n<td>53.2<\/td>\n<td>70<\/td>\n<td>44.2<\/td>\n<td>52<\/td>\n<\/tr>\n<tr>\n<td>SWE-Bench Pro<\/td>\n<td>42.9<\/td>\n<td>31.3<\/td>\n<td>44.6<\/td>\n<td>27.6<\/td>\n<td>35.7<\/td>\n<\/tr>\n<tr>\n<td>SWE-Bench Multilingual<\/td>\n<td>52<\/td>\n<td>39.7<\/td>\n<td>60.3<\/td>\n<td>32.5<\/td>\n<td>51.7<\/td>\n<\/tr>\n<tr>\n<td>NL2Repo<\/td>\n<td>27.2<\/td>\n<td>16.2<\/td>\n<td>20.5<\/td>\n<td>10.3<\/td>\n<td>15.5<\/td>\n<\/tr>\n<tr>\n<td>ClawEval Avg<\/td>\n<td>63.1<\/td>\n<td>53.2<\/td>\n<td>65.4<\/td>\n<td>32.5<\/td>\n<td>48.5<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<h2 class=\"wp-block-heading\"><strong>The Self-Scaffolding Idea<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Most coding agents rely on a scaffold, also called a <strong>harness<\/strong>. A scaffold wraps the model with memory, tools, error handling, and orchestration logic. AI teams usually hand-design one scaffold per task category.<\/p>\n<p class=\"wp-block-paragraph\">Ornith-1.0 treats the scaffold as a learnable object instead. During reinforcement learning, the scaffold co-evolves with the model\u2019s policy. <strong>Each RL step runs in two stages<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\"><strong>First<\/strong>, the model reads the task and its previous scaffold. It then proposes a refined scaffold. <strong>Second<\/strong>, it uses that scaffold and the task to generate a solution rollout. 
Reward from the rollout flows back to both stages.<\/p>\n<p class=\"wp-block-paragraph\">So the model is optimized to author orchestration, not just answers. Over training, higher-reward scaffolds are mutated and selected automatically. Per-task strategies emerge without hand-engineered harness design.<\/p>\n<p class=\"wp-block-paragraph\">Training also runs asynchronously, using a pipeline-RL setup. A staleness weight downweights older, off-policy tokens and drops them past a threshold. The optimization uses a token-level GRPO objective.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Guarding Against Reward Hacking<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Letting a model write its own scaffold invites reward hacking. A scaffold could read visible test files and hardcode expected outputs. It could also copy an oracle solution sitting in the environment. The DeepReinforce team describes three defense layers.<\/p>\n<ol class=\"wp-block-list\">\n<li>The outer trust boundary is fixed and immutable. The environment, tool surface, and test isolation stay outside the model\u2019s reach. The model evolves only its inner policy scaffold: memory, error handling, and orchestration logic.<\/li>\n<li>A deterministic, rule-based monitor flags banned actions. Reading withheld paths, editing verification scripts, or invoking unsanctioned tools earns zero reward. Those trajectories are excluded from the advantage computation.<\/li>\n<li>A frozen LLM judge acts as a veto, because intent-level gaming can still happen inside the allowed tool surface. It sits on top of the verifier rather than serving as the primary reward signal.<\/li>\n<\/ol>\n<h2 class=\"wp-block-heading\"><strong>Benchmark<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">All numbers below are vendor-reported by DeepReinforce and cover several agentic coding benchmarks. At flagship scale, Ornith-1.0-397B posts 77.5 on Terminal-Bench 2.1 and 82.4 on SWE-Bench Verified. On SWE-Bench Verified, that 82.4 trails only Claude Opus 4.8 (87.6) among the listed models. On Terminal-Bench 2.1, the picture is more mixed.<\/p>\n<p class=\"wp-block-paragraph\">Ornith-1.0-397B beats Claude Opus 4.7 (70.3) on Terminal-Bench 2.1. 
But it trails Claude Opus 4.8 (85) and the larger GLM-5.2-744B (81.0). So the \u2018state-of-the-art\u2019 claim is scoped to open models of comparable size.<\/p>\n<p class=\"wp-block-paragraph\">The smaller models carry the efficiency case. The 35B model scores 64.2 on Terminal-Bench 2.1, above Qwen 3.5-397B\u2019s 53.5. The 9B model reaches 43.1 on Terminal-Bench 2.1 and 69.4 on SWE-Bench Verified.<\/p>\n<figure class=\"wp-block-table is-style-stripes\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Benchmark<\/th>\n<th>Ornith-1.0-397B<\/th>\n<th>Qwen3.5-397B<\/th>\n<th>Qwen3.7-Max<\/th>\n<th>GLM-5.2-744B<\/th>\n<th>Minimax-M3-428B<\/th>\n<th>DeepSeek-V4-Pro-1.6T<\/th>\n<th>Claude Opus 4.7<\/th>\n<th>Claude Opus 4.8<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Terminal-Bench 2.1<\/td>\n<td>77.5<\/td>\n<td>53.5<\/td>\n<td>73.5<\/td>\n<td>81.0<\/td>\n<td>64<\/td>\n<td>64<\/td>\n<td>70.3<\/td>\n<td>85<\/td>\n<\/tr>\n<tr>\n<td>SWE-Bench Verified<\/td>\n<td>82.4<\/td>\n<td>76.4<\/td>\n<td>80.4<\/td>\n<td>\u2013<\/td>\n<td>\u2013<\/td>\n<td>80.6<\/td>\n<td>80.8<\/td>\n<td>87.6<\/td>\n<\/tr>\n<tr>\n<td>SWE-Bench Pro<\/td>\n<td>62.2<\/td>\n<td>51.6<\/td>\n<td>60.6<\/td>\n<td>62.1<\/td>\n<td>59<\/td>\n<td>55.4<\/td>\n<td>64.3<\/td>\n<td>69.2<\/td>\n<\/tr>\n<tr>\n<td>SWE-Bench Multilingual<\/td>\n<td>78.9<\/td>\n<td>69.3<\/td>\n<td>78.3<\/td>\n<td>\u2013<\/td>\n<td>\u2013<\/td>\n<td>76.2<\/td>\n<td>\u2013<\/td>\n<td>\u2013<\/td>\n<\/tr>\n<tr>\n<td>NL2Repo<\/td>\n<td>48.2<\/td>\n<td>36.8<\/td>\n<td>47.2<\/td>\n<td>48.9<\/td>\n<td>42.1<\/td>\n<td>\u2013<\/td>\n<td>\u2013<\/td>\n<td>69.7<\/td>\n<\/tr>\n<tr>\n<td>ClawEval Avg<\/td>\n<td>77.1<\/td>\n<td>70.7<\/td>\n<td>65.2<\/td>\n<td>\u2013<\/td>\n<td>\u2013<\/td>\n<td>75.8<\/td>\n<td>78.2<\/td>\n<td>\u2013<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<h2 class=\"wp-block-heading\"><strong>Use Cases and a Quick Start<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">The models target terminal-native coding agents 
and repository-scale work. Practical fits include multi-file refactors, bug localization, and test-driven patches. The 9B model suits edge or single-GPU setups where latency and cost matter. The 397B model targets maximum accuracy on long, multi-step tasks.<\/p>\n<p class=\"wp-block-paragraph\">For example, a developer can run the 9B model locally to triage a failing test suite. A platform team can self-host the 397B model for an internal coding agent.<\/p>\n<p class=\"wp-block-paragraph\">Serving takes a single vLLM command:<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-bash\">vllm serve deepreinforce-ai\/Ornith-1.0-9B \\\n    --served-model-name Ornith-1.0-9B \\\n    --max-model-len 262144 \\\n    --enable-auto-tool-choice --tool-call-parser qwen3_xml \\\n    --reasoning-parser qwen3 \\\n    --trust-remote-code<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\">Then call it with any OpenAI client:<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span 
class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">from openai import OpenAI\n\n# Point the client at the local vLLM server; the key is a placeholder.\nclient = OpenAI(base_url=\"http:\/\/localhost:8000\/v1\", api_key=\"EMPTY\")\n\nresp = client.chat.completions.create(\n    model=\"Ornith-1.0-9B\",\n    messages=[{\"role\": \"user\", \"content\": \"Write a Python is_prime(n).\"}],\n    temperature=0.6, top_p=0.95,\n    extra_body={\"top_k\": 20},  # top_k is not an OpenAI parameter; vLLM accepts it via extra_body\n)\nmsg = resp.choices[0].message\nprint(getattr(msg, \"reasoning_content\", None))  # the &lt;think&gt; trace\nprint(msg.content)                              # the final answer<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\">The reasoning trace is returned in <code>reasoning_content<\/code>, with the answer in <code>content<\/code>. Recommended sampling is <code>temperature=0.6<\/code>, <code>top_p=0.95<\/code>, <code>top_k=20<\/code>. The model also plugs into OpenHands, OpenClaw, and OpenCode.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p class=\"wp-block-paragraph\">Check out the\u00a0<strong><a href=\"https:\/\/huggingface.co\/collections\/deepreinforce-ai\/ornith-10\" target=\"_blank\" rel=\"noreferrer noopener\">Model Weights<\/a><\/strong> and <strong><a href=\"https:\/\/deep-reinforce.com\/ornith_1_0.html\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/06\/25\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/\">DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>DeepReinforce has released Ornith-1.0, an open-source model family built for agentic coding. The lineup spans four sizes, from a 9B dense model to a 397B mixture-of-experts flagship. Every checkpoint ships under the MIT license on Hugging Face. The models are post-trained on top of pretrained Gemma 4 and Qwen 3.5. Most coding agents pair a model with a fixed, human-designed harness. Ornith-1.0 instead learns to write its own. The DeepReinforce research team reports state-of-the-art results among open models of comparable size. TL;DR Ornith-1.0 ships in 9B, 31B, 35B-MoE, and 397B-MoE sizes under MIT, built on Gemma 4 and Qwen 3.5. The model learns its own scaffold during RL, jointly optimizing the harness and the solution. Ornith-1.0-397B tops Claude Opus 4.7 on both headline benchmarks, but not Opus 4.8 or the larger GLM-5.2-744B. Three layers \u2014 fixed trust boundary, deterministic monitor, frozen LLM judge \u2014 guard against reward hacking. What is Ornith-1.0? Ornith-1.0 is a set of reasoning models tuned for coding agents. 
The variants are 9B Dense, 31B Dense, 35B MoE, and 397B MoE. The 35B model is mixture-of-experts and activates roughly 3B parameters per token. FP8 and GGUF builds are also published for faster local serving. Each model is a reasoning model. Replies open with a &lt;think&gt; block before the final answer. The serving recipes enable a reasoning parser, so that trace returns in a separate reasoning_content field. The models also emit well-formed tool calls for agent loops. Deployment is straightforward. The 9B model is about 19GB in bf16 and serves on a single 80GB GPU. Serving recipes target vLLM, SGLang, and Transformers. Each model exposes an OpenAI-compatible endpoint. Standard agent frameworks therefore work without code changes. Interactive Explainer &lt;\/button&gt; &lt;button class=&#8221;btn gho&#8221; id=&#8221;resetBtn&#8221;&gt;Reset&lt;\/button&gt; &lt;\/div&gt; &lt;div class=&#8221;stepout&#8221; id=&#8221;stepOut&#8221;&gt;Step 0 \u2014 untrained policy with a fixed, hand-written harness.&lt;\/div&gt; &lt;\/div&gt; &lt;!&#8211; PANEL 2: BENCH &#8211;&gt; &lt;div class=&#8221;panel&#8221; data-panel=&#8221;bench&#8221;&gt; &lt;div class=&#8221;lead&#8221;&gt;Vendor-reported scores from DeepReinforce. Pick a model tier and a benchmark. Ornith is highlighted in green. 
Higher is better.&lt;\/div&gt; &lt;div class=&#8221;seg&#8221;&gt;&lt;span class=&#8221;lab&#8221;&gt;Model tier&lt;\/span&gt; &lt;div class=&#8221;chip on&#8221; data-tier=&#8221;t397&#8243;&gt;397B flagship&lt;\/div&gt; &lt;div class=&#8221;chip&#8221; data-tier=&#8221;t35&#8243;&gt;35B MoE&lt;\/div&gt; &lt;div class=&#8221;chip&#8221; data-tier=&#8221;t9&#8243;&gt;9B dense&lt;\/div&gt; &lt;\/div&gt; &lt;div class=&#8221;seg&#8221; id=&#8221;benchChips&#8221;&gt;&lt;span class=&#8221;lab&#8221;&gt;Benchmark&lt;\/span&gt;&lt;\/div&gt; &lt;div class=&#8221;chart&#8221; id=&#8221;chart&#8221;&gt;&lt;\/div&gt; &lt;div class=&#8221;foot-note&#8221; id=&#8221;benchNote&#8221;&gt;&lt;\/div&gt; &lt;\/div&gt; &lt;!&#8211; PANEL 3: DEFENSES &#8211;&gt; &lt;div class=&#8221;panel&#8221; data-panel=&#8221;def&#8221;&gt; &lt;div class=&#8221;lead&#8221;&gt;A model that writes its own scaffold could cheat the verifier. DeepReinforce describes three defense layers. Tap each to expand.&lt;\/div&gt; &lt;div class=&#8221;layers&#8221;&gt; &lt;div class=&#8221;layer open&#8221;&gt;&lt;div class=&#8221;lh&#8221;&gt;&lt;span class=&#8221;num&#8221;&gt;1&lt;\/span&gt;&lt;span class=&#8221;lt&#8221;&gt;Fixed trust boundary&lt;\/span&gt;&lt;span class=&#8221;more&#8221;&gt;tap&lt;\/span&gt;&lt;\/div&gt;&lt;div class=&#8221;lb&#8221;&gt;The environment, tool surface, and test isolation are immutable and outside the model&#8217;s reach. The model evolves only its inner policy scaffold \u2014 memory, error-handling, and orchestration logic.&lt;\/div&gt;&lt;\/div&gt; &lt;div class=&#8221;layer&#8221;&gt;&lt;div class=&#8221;lh&#8221;&gt;&lt;span class=&#8221;num&#8221;&gt;2&lt;\/span&gt;&lt;span class=&#8221;lt&#8221;&gt;Deterministic monitor&lt;\/span&gt;&lt;span class=&#8221;more&#8221;&gt;tap&lt;\/span&gt;&lt;\/div&gt;&lt;div class=&#8221;lb&#8221;&gt;A rule-based monitor flags any attempt to read withheld paths, modify verification scripts, or invoke unsanctioned tools. 
Such trajectories get zero reward and are excluded from the advantage computation.&lt;\/div&gt;&lt;\/div&gt; &lt;div class=&#8221;layer&#8221;&gt;&lt;div class=&#8221;lh&#8221;&gt;&lt;span class=&#8221;num&#8221;&gt;3&lt;\/span&gt;&lt;span class=&#8221;lt&#8221;&gt;Frozen LLM judge&lt;\/span&gt;&lt;span class=&#8221;more&#8221;&gt;tap&lt;\/span&gt;&lt;\/div&gt;&lt;div class=&#8221;lb&#8221;&gt;Because intent-level gaming can happen inside the allowed tool surface, a frozen LLM judge acts as a veto on top of the verifier \u2014 not as the primary reward signal.&lt;\/div&gt;&lt;\/div&gt; &lt;\/div&gt; &lt;\/div&gt; &lt;div class=&#8221;ftr&#8221;&gt;&lt;span&gt;Source: &lt;a href=&#8221;https:\/\/deep-reinforce.com\/ornith_1_0.html&#8221; target=&#8221;_blank&#8221; rel=&#8221;noopener&#8221;&gt;deep-reinforce.com&lt;\/a&gt; \u00b7 MIT licensed \u00b7 numbers vendor-reported&lt;\/span&gt;&lt;span&gt;&lt;b&gt;Marktechpost&lt;\/b&gt; \u00b7 AI Dev Signals&lt;\/span&gt;&lt;\/div&gt; &lt;script&gt; (function(){ var root=document.getElementById(&#8216;mtp-ornith-demo&#8217;); \/* tabs *\/ root.querySelectorAll(&#8216;.tab&#8217;).forEach(function(t){ t.addEventListener(&#8216;click&#8217;,function(){ root.querySelectorAll(&#8216;.tab&#8217;).forEach(function(x){x.classList.remove(&#8216;on&#8217;)}); root.querySelectorAll(&#8216;.panel&#8217;).forEach(function(x){x.classList.remove(&#8216;on&#8217;)}); t.classList.add(&#8216;on&#8217;); root.querySelector(&#8216;.panel[data-panel=&#8221;&#8216;+t.dataset.p+&#8217;&#8221;]&#8217;).classList.add(&#8216;on&#8217;); resize(); }); }); \/* loop sim *\/ var step=0,reward=0.08,timer=null; var scaffs=[ &#8216;Baseline harness: linear retries, no memory.&#8217;, &#8216;Adds scratchpad memory across tool calls.&#8217;, &#8216;Adds error-triage branch before re-edit.&#8217;, &#8216;Reorders: read tests, then plan, then patch.&#8217;, &#8216;Caches sub-results; prunes dead branches.&#8217;, &#8216;Task-specific orchestration emerges 
automatically.&#8217;]; var outs=[ &#8216;Fixed harness, no learning yet.&#8217;, &#8216;Fewer redundant file reads observed.&#8217;, &#8216;Recovers from failed edits more often.&#8217;, &#8216;Higher first-pass test success.&#8217;, &#8216;Shorter trajectories, same accuracy.&#8217;, &#8216;Stable high-reward scaffold selected.&#8217;]; var nodes=root.querySelectorAll(&#8216;.node&#8217;); function lightSeq(cb){ var i=0;nodes.forEach(function(n){n.classList.remove(&#8216;act&#8217;)}); var iv=setInterval(function(){ nodes.forEach(function(n){n.classList.remove(&#8216;act&#8217;)}); nodes[i].classList.add(&#8216;act&#8217;);i++; if(i&gt;=nodes.length){clearInterval(iv);setTimeout(function(){nodes.forEach(function(n){n.classList.remove(&#8216;act&#8217;)});cb&amp;&amp;cb();},260);} },220); } function doStep(){ if(step&gt;=5){return;} step++; lightSeq(function(){ reward=[0.08,0.27,0.43,0.58,0.69,0.77][step]; root.querySelector(&#8216;#rFill&#8217;).style.width=(reward*100)+&#8217;%&#8217;; root.querySelector(&#8216;#rVal&#8217;).textContent=reward.toFixed(2); root.querySelector(&#8216;#scaffTxt&#8217;).textContent=scaffs[step]; root.querySelector(&#8216;#outTxt&#8217;).textContent=outs[step]; root.querySelector(&#8216;#stepOut&#8217;).innerHTML=&#8217;Step &#8216;+step+&#8217; \u2014 &lt;b&gt;scaffold mutated&lt;\/b&gt;; reward propagated to both stages.&#8217;; resize(); }); } root.querySelector(&#8216;#stepBtn&#8217;).addEventListener(&#8216;click&#8217;,doStep); root.querySelector(&#8216;#autoBtn&#8217;).addEventListener(&#8216;click&#8217;,function(){ if(timer){clearInterval(timer);timer=null;this.textContent=&#8217;Auto-run &#8216;;return;} this.textContent=&#8217;Pause &#8216;;var b=this; timer=setInterval(function(){if(step&gt;=5){clearInterval(timer);timer=null;b.textContent=&#8217;Auto-run &#8216;;}else{doStep();}},1400); }); root.querySelector(&#8216;#resetBtn&#8217;).addEventListener(&#8216;click&#8217;,function(){ 
if(timer){clearInterval(timer);timer=null;root.querySelector(&#8216;#autoBtn&#8217;).textContent=&#8217;Auto-run &#8216;;} step=0;reward=0.08; root.querySelector(&#8216;#rFill&#8217;).style.width=&#8217;8%&#8217;; root.querySelector(&#8216;#rVal&#8217;).textContent=&#8217;0.08&#8242;; root.querySelector(&#8216;#scaffTxt&#8217;).textContent=scaffs[0]; root.querySelector(&#8216;#outTxt&#8217;).textContent=&#8217;Press \u201cRun training step\u201d to begin.&#8217;; root.querySelector(&#8216;#stepOut&#8217;).innerHTML=&#8217;Step 0 \u2014 untrained policy with a fixed, hand-written harness.&#8217;; resize(); }); \/* benchmark data (vendor-reported) *\/ var BENCHES=[&#8216;Terminal-Bench 2.1&#8242;,&#8217;SWE-Bench Verified&#8217;,&#8217;SWE-Bench Pro&#8217;,&#8217;SWE-Bench Multilingual&#8217;,&#8217;NL2Repo&#8217;,&#8217;ClawEval Avg&#8217;]; var DATA={ t397:{label:&#8217;Ornith-1.0-397B&#8217;,hero:&#8217;Ornith-1.0-397B&#8217;, models:[&#8216;Ornith-1.0-397B&#8217;,&#8217;Qwen3.5-397B&#8217;,&#8217;Qwen3.7-Max&#8217;,&#8217;GLM-5.2-744B&#8217;,&#8217;Minimax-M3-428B&#8217;,&#8217;DeepSeek-V4-Pro-1.6T&#8217;,&#8217;Claude Opus 4.7&#8242;,&#8217;Claude Opus 4.8&#8242;], vals:[[77.5,53.5,73.5,81.0,64,64,70.3,85],[82.4,76.4,80.4,null,null,80.6,80.8,87.6],[62.2,51.6,60.6,62.1,59,55.4,64.3,69.2],[78.9,69.3,78.3,null,null,76.2,null,null],[48.2,36.8,47.2,48.9,42.1,null,null,69.7],[77.1,70.7,65.2,null,null,75.8,78.2,null]]}, t35:{label:&#8217;Ornith-1.0-35B-A3B&#8217;,hero:&#8217;Ornith-1.0-35B-A3B&#8217;, models:[&#8216;Ornith-1.0-35B-A3B&#8217;,&#8217;Qwen3.5-35B-A3B&#8217;,&#8217;Qwen3.6-35B-A3B&#8217;,&#8217;Gemma4-31B&#8217;,&#8217;Qwen3.5-397B&#8217;], vals:[[64.2,41.4,52.5,42.1,53.5],[75.6,70,73.4,52,76.4],[50.4,44.6,49.5,35.7,51.6],[69.3,60.3,67.2,51.7,69.3],[34.6,20.5,29.4,15.5,36.8],[69.8,65.4,68.7,48.5,70.7]]}, t9:{label:&#8217;Ornith-1.0-9B&#8217;,hero:&#8217;Ornith-1.0-9B&#8217;, 
models:[&#8216;Ornith-1.0-9B&#8217;,&#8217;Qwen3.5-9B&#8217;,&#8217;Qwen3.5-35B-A3B&#8217;,&#8217;Gemma4-12B&#8217;,&#8217;Gemma4-31B&#8217;], vals:[[43.1,21.3,41.4,21,42.1],[69.4,53.2,70,44.2,52],[42.9,31.3,44.6,27.6,35.7],[52,39.7,60.3,32.5,51.7],[27.2,16.2,20.5,10.3,15.5],[63.1,53.2,65.4,32.5,48.5]]} }; var curTier=&#8217;t397&#8242;,curB=0; var bchips=root.querySelector(&#8216;#benchChips&#8217;); BENCHES.forEach(function(b,i){ var c=document.createElement(&#8216;div&#8217;);c.className=&#8217;chip&#8217;+(i===0?&#8217; on&#8217;:&#8221;);c.textContent=b;c.dataset.b=i; c.addEventListener(&#8216;click&#8217;,function(){curB=i;bchips.querySelectorAll(&#8216;.chip&#8217;).forEach(function(x){x.classList.remove(&#8216;on&#8217;)});c.classList.add(&#8216;on&#8217;);draw();}); bchips.appendChild(c); }); root.querySelectorAll(&#8216;.chip[data-tier]&#8217;).forEach(function(c){ c.addEventListener(&#8216;click&#8217;,function(){curTier=c.dataset.tier;root.querySelectorAll(&#8216;.chip[data-tier]&#8217;).forEach(function(x){x.classList.remove(&#8216;on&#8217;)});c.classList.add(&#8216;on&#8217;);draw();}); }); function draw(){ var d=DATA[curTier];var row=d.vals[curB];var chart=root.querySelector(&#8216;#chart&#8217;);chart.innerHTML=&#8221;; var max=Math.max.apply(null,row.filter(function(v){return v!=null})); d.models.forEach(function(m,i){ var v=row[i];var hero=(m===d.hero); var div=document.createElement(&#8216;div&#8217;);div.className=&#8217;row&#8217;+(hero?&#8217; hero&#8217;:&#8221;)+(v==null?&#8217; na&#8217;:&#8221;); div.innerHTML='&lt;div class=&#8221;nm&#8221;&gt;&#8217;+m+'&lt;\/div&gt;&lt;div class=&#8221;bt&#8221;&gt;&lt;div class=&#8221;bf&#8221;&gt;&lt;\/div&gt;&lt;\/div&gt;&lt;div class=&#8221;vl&#8221;&gt;&#8217;+(v==null?&#8217;n\/a&#8217;:v)+'&lt;\/div&gt;&#8217;; chart.appendChild(div); (function(bf,val){setTimeout(function(){bf.style.width=(val==null?0:(val\/max*100))+&#8217;%&#8217;;},40);})(div.querySelector(&#8216;.bf&#8217;),v); }); 
root.querySelector(&#8216;#benchNote&#8217;).textContent=&#8217;Benchmark: &#8216;+BENCHES[curB]+&#8217;. Bars scaled to the highest score shown. &#8220;n\/a&#8221; = not reported by the vendor. Self-reported, not independently verified.&#8217;; resize(); } draw(); \/* defenses accordion *\/ root.querySelectorAll(&#8216;.layer&#8217;).forEach(function(l){ l.addEventListener(&#8216;click&#8217;,function(){l.classList.toggle(&#8216;open&#8217;);resize();}); }); \/* auto-resize for WordPress iframe *\/ function resize(){ try{ var h=root.offsetHeight+40; if(window.parent){window.parent.postMessage({type:&#8217;mtp-ornith-height&#8217;,height:h},&#8217;*&#8217;);} }catch(e){} } window.addEventListener(&#8216;load&#8217;,resize); setTimeout(resize,300); window.addEventListener(&#8216;resize&#8217;,resize); })(); &lt;\/script&gt; &lt;\/div&gt; &#8221; style=&#8221;width:100%;border:0;display:block;min-height:600px;overflow:hidden&#8221; height=&#8221;600&#8243; scrolling=&#8221;no&#8221; loading=&#8221;lazy&#8221; title=&#8221;Ornith-1.0 Interactive Explainer&#8221;&gt;<\/p>\n<h2 class=\"wp-block-heading\"><strong>The Self-Scaffolding Idea<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Most coding agents rely on a scaffold, also called a harness. A scaffold wraps the model with memory, tools, error handling, and orchestration logic. AI teams usually hand-design one scaffold per task category. Ornith-1.0 treats the scaffold as a learnable object instead. During reinforcement learning, the scaffold co-evolves with the model\u2019s policy. Each RL step runs in two stages. First, the model reads the task and its previous scaffold. It then proposes a refined scaffold. Second, it uses that scaffold and the task to generate a solution rollout. Reward from the rollout flows back to both stages. So the model is optimized to author orchestration, not just answers. Over training, higher-reward scaffolds are mutated and selected automatically. Per-task strategies emerge without hand-engineered harness design. Training also runs asynchronously, using a pipeline-RL setup. 
A staleness weight downweights older, off-policy tokens and drops them past a threshold. The optimization uses a token-level GRPO objective.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Guarding Against Reward Hacking<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Letting a model write its own scaffold invites reward hacking. A scaffold could read visible test files and hardcode expected outputs. It could also copy an oracle solution sitting in the environment. The DeepReinforce team describes three defense layers. The outer trust<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-99880","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model 
Family That Learns Its Own RL Scaffolds - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/de\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/de\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-06-25T18:29:21+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/25b6.png\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"12\u00a0Minuten\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds\",\"datePublished\":\"2026-06-25T18:29:21+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/\"},\"wordCount\":2193,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/25b6.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/\",\"url\":\"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/\",\"name\":\"DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/25b6.png\",\"datePublished\":\"2026-06-25T18:29:21+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#primaryimage\",\"url\":\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/25b6.png\",\"contentUrl\":\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/25b6.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL 
Scaffolds\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/de\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/de\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/","og_locale":"de_DE","og_type":"article","og_title":"DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/de\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-06-25T18:29:21+00:00","og_image":[{"url":"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/25b6.png","type":"","width":"","height":""}],"author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Verfasst von":"admin NU","Gesch\u00e4tzte Lesezeit":"12\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"DeepReinforce Releases Ornith-1.0: An Open-Source Coding 
Model Family That Learns Its Own RL Scaffolds","datePublished":"2026-06-25T18:29:21+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/"},"wordCount":2193,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#primaryimage"},"thumbnailUrl":"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/25b6.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/","url":"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/","name":"DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#primaryimage"},"thumbnailUrl":"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/25b6.png","datePublished":"2026-06-25T18:29:21+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#primaryimage","url":"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/25b6.png","contentUrl":"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/25b6.png"},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/deepreinforce-releases-ornith-1-0-an-open-source-coding-model-family-that-learns-its-own-rl-scaffolds\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL 
Scaffolds"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/de\/members\/adminnu\/"}]}},"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/de\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/de\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a 
href=\"https:\/\/youzum.net\/de\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"DeepReinforce has released Ornith-1.0, an open-source model family built for agentic coding. The lineup spans four sizes, from a 9B dense model to a 397B mixture-of-experts flagship. Every checkpoint ships under the MIT license on Hugging Face. The models are post-trained on top of pretrained Gemma 4 and Qwen 3.5. Most coding agents pair a&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/99880","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/comments?post=99880"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/99880\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/media?parent=99880"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/categories?post=99880"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/tags?post=99880"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}