{"id":89397,"date":"2026-05-10T16:17:19","date_gmt":"2026-05-10T16:17:19","guid":{"rendered":"https:\/\/youzum.net\/how-to-build-a-cost-aware-llm-routing-system-with-nadirclaw-using-local-prompt-classification-and-gemini-model-switching\/"},"modified":"2026-05-10T16:17:19","modified_gmt":"2026-05-10T16:17:19","slug":"how-to-build-a-cost-aware-llm-routing-system-with-nadirclaw-using-local-prompt-classification-and-gemini-model-switching","status":"publish","type":"post","link":"https:\/\/youzum.net\/de\/how-to-build-a-cost-aware-llm-routing-system-with-nadirclaw-using-local-prompt-classification-and-gemini-model-switching\/","title":{"rendered":"How to Build a Cost-Aware LLM Routing System with NadirClaw Using Local Prompt Classification and Gemini Model Switching"},"content":{"rendered":"<p>In this tutorial, we explore<a href=\"https:\/\/github.com\/NadirRouter\/NadirClaw\"> <strong>NadirClaw<\/strong><\/a> as an intelligent routing layer that classifies prompts into simple and complex tiers before sending them to the most suitable model. We start by installing the required packages, setting up an optional Gemini API key, and testing the local classifier through the NadirClaw CLI without making any live LLM calls. We then inspect the centroid vectors that power the routing decision, embed our own prompts, visualize how similarity scores separate simple and complex tasks, and experiment with confidence thresholds. 
After understanding the local routing logic, we move into live routing by launching the NadirClaw proxy server, sending OpenAI-compatible requests through it, comparing routed model behavior, and estimating cost savings against an always-Pro baseline.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">import subprocess, sys\ndef _pip(*pkgs):\n   subprocess.run([sys.executable, \"-m\", \"pip\", \"install\", \"-q\", *pkgs], check=True)\n_pip(\"nadirclaw\", \"openai\", \"sentence-transformers\", \"matplotlib\",\n    \"scikit-learn\", \"pandas\", \"requests\")\nimport os, json, time, signal, getpass\nfrom pathlib import Path\nimport numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport requests\nGEMINI_API_KEY = os.environ.get(\"GEMINI_API_KEY\", \"\").strip()\nif not GEMINI_API_KEY:\n   print(\"Paste your Gemini API key (input hidden), or press Enter to skip:\")\n   try:\n       GEMINI_API_KEY = getpass.getpass(prompt=\"GEMINI_API_KEY: \").strip()\n   except (EOFError, KeyboardInterrupt):\n       GEMINI_API_KEY = \"\"\nLIVE_ROUTING = bool(GEMINI_API_KEY)\nif LIVE_ROUTING:\n   os.environ[\"GEMINI_API_KEY\"] = GEMINI_API_KEY\n   print(f\"\u2713 key captured ({len(GEMINI_API_KEY)} chars) \u2014 sections 8\u201311 enabled.\")\nelse:\n   print(\"\u2139 no key entered \u2014 sections 3\u20137 still run; live routing skipped.\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We install NadirClaw and the supporting Python libraries required for routing, embeddings, plotting, API calls, and data handling. We then import all required modules and securely capture the Gemini API key through the environment or a hidden prompt. We also decide whether the live routing sections should run, while still allowing the local classifier sections to work without an API key.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">def classify(prompt: str) -&gt; dict:\n   r = subprocess.run(\n       [\"nadirclaw\", \"classify\", \"--format\", \"json\", prompt],\n       capture_output=True, text=True, timeout=180,\n   )\n   if r.returncode != 0:\n       return {\"prompt\": prompt, \"error\": (r.stderr or r.stdout).strip()}\n   return json.loads(r.stdout.strip())\nprompts = [\n   \"What is 2+2?\",\n   'Format this JSON: {\"a\":1,\"b\":2}',\n   \"Read the file at src\/main.py\",\n   \"Add a docstring to the foo function\",\n   \"What does this function do?\",\n   \"Refactor the auth module to use dependency injection without breaking existing callers\",\n   \"Design a distributed event-sourced order pipeline that handles 50k req\/s with strict ordering\",\n   \"Analyze the tradeoffs between actor-model and CSP-style concurrency for our 
codebase\",\n   \"Debug why this asyncio.gather call deadlocks under high load and provide a fix\",\n   \"Prove that this scheduling algorithm is optimal step by step and derive the worst-case bound\",\n]\nprint(\"\\n[3] Classifying 10 prompts (first call warms the encoder)\u2026\")\nrows = [classify(p) for p in prompts]\ndf = pd.DataFrame(rows)\ncols = [c for c in [\"tier\", \"score\", \"confidence\", \"model\", \"prompt\"] if c in df.columns]\nprint(df[cols].to_string(index=False))\nimport nadirclaw\nPKG = Path(nadirclaw.__file__).parent\nSIMPLE_C = np.load(PKG \/ \"simple_centroid.npy\").astype(np.float32).flatten()\nCOMPLEX_C = np.load(PKG \/ \"complex_centroid.npy\").astype(np.float32).flatten()\ndef cosine(a, b):\n   return float(a @ b \/ (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))\nprint(f\"\\n[4] simple_centroid  shape={SIMPLE_C.shape}  \u2016\u00b7\u2016={np.linalg.norm(SIMPLE_C):.3f}\")\nprint(f\"    complex_centroid shape={COMPLEX_C.shape}  \u2016\u00b7\u2016={np.linalg.norm(COMPLEX_C):.3f}\")\nprint(f\"    cosine(simple,complex) = {cosine(SIMPLE_C, COMPLEX_C):.4f}  \"\n     \"\u2190 if this were 1.0 the classifier couldn't distinguish them.\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define a reusable classify() function that sends prompts to the NadirClaw CLI and returns structured JSON results. We create a mixed set of simple and complex prompts, classify them, and display the routing tier, score, confidence, model, and prompt text in a table. 
We then load the simple and complex centroid vectors from the NadirClaw package and compare their shapes, norms, and cosine similarity.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">from sentence_transformers import SentenceTransformer\nprint(\"\\n[5] Loading the same encoder NadirClaw uses (all-MiniLM-L6-v2)\u2026\")\nencoder = SentenceTransformer(\"sentence-transformers\/all-MiniLM-L6-v2\")\nembs = encoder.encode(prompts, normalize_embeddings=True)\nsim_simple = np.array([cosine(e, SIMPLE_C) for e in embs])\nsim_complex = np.array([cosine(e, COMPLEX_C) for e in embs])\nfig, ax = plt.subplots(figsize=(8.5, 6))\ncolors = [\"tab:blue\"] * 5 + [\"tab:red\"] * 5\nax.scatter(sim_simple, sim_complex, c=colors, s=110, edgecolor=\"k\", linewidth=0.5)\nfor i, _ in enumerate(prompts):\n   ax.annotate(str(i + 1), (sim_simple[i], sim_complex[i]),\n               xytext=(6, 4), textcoords=\"offset points\", fontsize=10)\nxs = np.linspace(min(sim_simple.min(), sim_complex.min()),\n                max(sim_simple.max(), sim_complex.max()), 50)\nax.plot(xs, xs, \"k--\", alpha=0.4, label=\"cos(simple) = cos(complex)\")\nax.set_xlabel(\"cosine similarity to SIMPLE centroid\")\nax.set_ylabel(\"cosine similarity to COMPLEX centroid\")\nax.set_title(\"Routing decision boundary\\n(blue = expected simple, red = expected complex)\")\nax.legend(loc=\"lower 
right\")\nax.grid(alpha=0.25)\nplt.tight_layout()\nplt.savefig(\"centroid_decision_plot.png\", dpi=120)\nplt.show()\nprint(\"Legend: prompts above the dashed line route to COMPLEX, below to SIMPLE.\")\nprint(\"\\n[6] Prompts sorted by complexity score:\")\nsdf = df.sort_values(\"score\").reset_index(drop=True)\nfor _, row in sdf.iterrows():\n   bar = \"\u2588\" * int(round(float(row[\"score\"]) * 30))\n   print(f\"  score={float(row['score']):.2f}  conf={float(row['confidence']):.2f}  \"\n         f\"{row['tier']:7s} |{bar:&lt;30s}| {row['prompt'][:55]}\")\nprint(\"\\n[6] Confidence-threshold sweep (low confidence \u2192 forced complex):\")\nprint(\"    NadirClaw default threshold is 0.06.\")\nfor thr in [0.02, 0.06, 0.10, 0.20, 0.30]:\n   forced_complex = sum(1 for r in rows if float(r[\"confidence\"]) &lt; thr)\n   natural_complex = sum(1 for r in rows if float(r[\"score\"]) &gt;= 0.5)\n   print(f\"    threshold={thr:.2f} \u2192 {forced_complex} prompts force-complex \"\n         f\"(low-confidence), {natural_complex} naturally complex by score\")\nmodifier_demos = [\n   (\"agentic \u2014 text-only marker\",\n    \"You are a coding agent that can execute commands. Now add tests for the new endpoint.\"),\n   (\"reasoning \u2014 chain-of-thought markers\",\n    \"Step by step, derive the closed form and prove correctness mathematically. 
\"\n    \"Compare and contrast both approaches.\"),\n   (\"vision \u2014 would arrive with image_url part (only text shown)\",\n    \"Describe the screenshot.\"),\n]\nprint(\"\\n[7] Modifier-marker scan:\")\nfor label, p in modifier_demos:\n   r = classify(p)\n   print(f\"    {label}\")\n   print(f\"      prompt='{p[:65]}\u2026'\")\n   print(f\"      tier={r['tier']}  score={float(r['score']):.2f}  conf={float(r['confidence']):.2f}\")\nprint(\"    NB: agentic &amp; vision routing also trigger from request shape \"\n     \"(tools=[\u2026], image_url parts) \u2014 see live calls below.\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We use the same SentenceTransformer encoder as NadirClaw and embed all tutorial prompts locally. We compare each prompt embedding against the simple and complex centroids, then visualize the routing boundary with a scatter plot. We also sort prompts by complexity score, test confidence thresholds, and inspect routing modifier examples for agentic, reasoning, and vision-style requests.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">PORT = 8856\nserver_proc = None\nif LIVE_ROUTING:\n   print(f\"\\n[8] Starting `nadirclaw serve` on :{PORT} (background subprocess)\u2026\")\n   env = os.environ.copy()\n   env.update({\n       \"GEMINI_API_KEY\": GEMINI_API_KEY,\n       \"NADIRCLAW_SIMPLE_MODEL\": \"gemini-2.5-flash\",\n       \"NADIRCLAW_COMPLEX_MODEL\": 
\"gemini-2.5-pro\",\n       \"NADIRCLAW_PORT\": str(PORT),\n   })\n   server_proc = subprocess.Popen(\n       [\"nadirclaw\", \"serve\", \"--verbose\"],\n       env=env,\n       stdout=subprocess.PIPE, stderr=subprocess.STDOUT,\n       preexec_fn=os.setsid if hasattr(os, \"setsid\") else None,\n   )\n   ready = False\n   for _ in range(60):\n       if server_proc.poll() is not None:\n           break\n       try:\n           if requests.get(f\"http:\/\/localhost:{PORT}\/health\", timeout=1).ok:\n               ready = True\n               break\n       except Exception:\n           pass\n       time.sleep(1)\n   if ready:\n       print(\"    \u2713 \/health returned 200 \u2014 proxy is live.\")\n   else:\n       print(\"    \u26a0 proxy did not come up; dumping last log lines:\")\n       if server_proc.stdout:\n           try:\n               lines = server_proc.stdout.read1(4096).decode(\"utf-8\", errors=\"replace\")\n               print(lines[-2000:])\n           except Exception as e:\n               print(f\"    (could not read server stdout: {e})\")\nelse:\n   print(\"\\n[8] Skipped \u2014 no GEMINI_API_KEY.\")\ndef proxy_alive():\n   return server_proc is not None and server_proc.poll() is None\nif proxy_alive():\n   from openai import OpenAI\n   client = OpenAI(base_url=f\"http:\/\/localhost:{PORT}\/v1\", api_key=\"local\")\n   side_by_side = [\n       (\"simple-ish\", \"Write a one-line docstring for: def add(a, b): return a + b\"),\n       (\"complex\",    \"Refactor a Python class to a dependency-injection pattern, \"\n                      \"explain the trade-offs, and produce migration steps for callers.\"),\n   ]\n   summary = []\n   for label, p in side_by_side:\n       t0 = time.time()\n       try:\n           resp = client.chat.completions.create(\n               model=\"auto\",\n               messages=[{\"role\": \"user\", \"content\": 
p}],\n               max_tokens=220,\n           )\n           dt = time.time() - t0\n           text = (resp.choices[0].message.content or \"\").strip()\n           print(f\"\\n--- [{label}] {dt:.2f}s \u00b7 model={resp.model} ---\")\n           print(text[:500] + (\"\u2026\" if len(text) &gt; 500 else \"\"))\n           summary.append({\n               \"label\": label, \"model_used\": resp.model,\n               \"latency_s\": round(dt, 2),\n               \"tokens\": getattr(resp.usage, \"total_tokens\", None),\n           })\n       except Exception as e:\n           summary.append({\"label\": label, \"model_used\": \"ERROR\",\n                           \"latency_s\": None, \"tokens\": str(e)[:80]})\n           print(f\"\u26a0 [{label}] failed: {e}\")\n   print(\"\\n[9] Summary:\")\n   print(pd.DataFrame(summary).to_string(index=False))<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We start the NadirClaw proxy server locally when a Gemini API key is available and configure it to route between Flash and Pro models. We check the \/health endpoint to confirm that the proxy is running before sending requests. 
We then use the OpenAI SDK against the local proxy and compare how a simple prompt and a complex prompt are routed and answered.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">if proxy_alive():\n   print(\"\\n[10] Mixed 10-prompt workload\u2026\")\n   workload = [\n       \"Capital of France?\",\n       \"Read foo.py\",\n       \"Type hint for a list of dicts\",\n       \"Lowercase: HELLO\",\n       \"One-sentence summary of REST\",\n       \"Refactor a callback chain into async\/await with proper error handling\",\n       \"Design a sharded multi-region key-value store with linearizable reads\",\n       \"Analyze the asymptotic complexity of this code and prove the bound rigorously\",\n       \"Debug why our gRPC stream stalls when the client TCP window saturates\",\n       \"Compare and contrast B-trees and LSM-trees for write-heavy workloads\",\n   ]\n   runs = []\n   client = OpenAI(base_url=f\"http:\/\/localhost:{PORT}\/v1\", api_key=\"local\")\n   for p in workload:\n       t0 = time.time()\n       try:\n           r = client.chat.completions.create(\n               model=\"auto\",\n               messages=[{\"role\": \"user\", \"content\": p}],\n               max_tokens=140,\n           )\n           usage = getattr(r, \"usage\", None)\n           runs.append({\n               \"prompt\": p[:55],\n               \"model\": r.model,\n               \"latency_s\": 
round(time.time() - t0, 2),\n               \"in_tok\": getattr(usage, \"prompt_tokens\", 0) if usage else 0,\n               \"out_tok\": getattr(usage, \"completion_tokens\", 0) if usage else 0,\n           })\n       except Exception as e:\n           runs.append({\"prompt\": p[:55], \"model\": \"ERROR\",\n                        \"latency_s\": None, \"in_tok\": 0, \"out_tok\": 0,\n                        \"error\": str(e)[:80]})\n   rdf = pd.DataFrame(runs)\n   print(rdf.to_string(index=False))\n   PRICE = {\n       \"flash\": {\"in\": 0.30 \/ 1e6, \"out\": 2.50 \/ 1e6},\n       \"pro\":   {\"in\": 1.25 \/ 1e6, \"out\": 10.0 \/ 1e6},\n   }\n   def price_for(model_str, in_t, out_t):\n       m = (model_str or \"\").lower()\n       tier = \"flash\" if \"flash\" in m else \"pro\"\n       return in_t * PRICE[tier][\"in\"] + out_t * PRICE[tier][\"out\"]\n   cost_routed = sum(price_for(r[\"model\"], r[\"in_tok\"], r[\"out_tok\"]) for r in runs)\n   cost_no_route = sum(price_for(\"gemini-2.5-pro\", r[\"in_tok\"], r[\"out_tok\"]) for r in runs)\n   print(f\"\\n[10] Cost (NadirClaw routed)        : ${cost_routed:.6f}\")\n   print(f\"     Cost (always-Pro baseline)     : ${cost_no_route:.6f}\")\n   if cost_no_route &gt; 0:\n       print(f\"     Estimated savings on this run  : \"\n             f\"{(1 - cost_routed\/cost_no_route) * 100:.1f}%\")\nprint(\"\\n[11] `nadirclaw report` (parses the JSONL request log):\")\nrep = subprocess.run([\"nadirclaw\", \"report\"], capture_output=True, text=True, timeout=60)\nprint(rep.stdout or rep.stderr)\nif proxy_alive():\n   print(\"\\n[12] Stopping the proxy\u2026\")\n   try:\n       if hasattr(os, \"killpg\"):\n           os.killpg(os.getpgid(server_proc.pid), signal.SIGTERM)\n       else:\n           server_proc.terminate()\n       server_proc.wait(timeout=10)\n   except Exception:\n       try:\n           server_proc.kill()\n       except Exception:\n           pass\n   print(\"    \u2713 proxy stopped.\")\nprint(\"\\nDone. \ud83c\udf89\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We send a mixed 10-prompt workload through the NadirClaw proxy to observe which model each prompt uses. We calculate an illustrative routed cost and compare it with an always-Pro baseline to estimate savings. We finally run the built-in NadirClaw report command, stop the proxy cleanly, and finish the tutorial workflow.<\/p>\n<p>In conclusion, we built a complete hands-on understanding of how NadirClaw routes prompts based on complexity, confidence, and request modifiers. We saw how local classification occurs before any API call, how centroid-based similarity helps explain routing behavior, and how threshold tuning affects whether uncertain prompts are escalated to a stronger model. We also ran NadirClaw as a proxy, tested it with the OpenAI SDK, analyzed a mixed workload, and generated a routing report from the request log. 
Also, we learned how to use NadirClaw to make model routing more transparent, cost-aware, and practical for real-world AI applications.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out\u00a0the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Agents-Projects-Tutorials\/blob\/main\/AI%20Agents%20Codes\/nadirclaw_cost_aware_llm_routing_tutorial.py\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Repo<\/a>.\u00a0<\/strong>Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">150k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! 
Are you on Telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">Now you can join us on Telegram as well.<\/a><\/strong><\/p>\n<p>Need to partner with us to promote your GitHub repo, Hugging Face page, product release, webinar, etc.?\u00a0<strong><a href=\"https:\/\/forms.gle\/MTNLpmJtsFA3VRVd9\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Connect with us<\/mark><\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/05\/10\/how-to-build-a-cost-aware-llm-routing-system-with-nadirclaw-using-local-prompt-classification-and-gemini-model-switching\/\">How to Build a Cost-Aware LLM Routing System with NadirClaw Using Local Prompt Classification and Gemini Model Switching<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we explore NadirClaw as an intelligent routing layer that classifies prompts into simple and complex tiers before sending them to the most suitable model. We start by installing the required packages, setting up an optional Gemini API key, and testing the local classifier through the NadirClaw CLI without making any live LLM calls. We then inspect the centroid vectors that power the routing decision, embed our own prompts, visualize how similarity scores separate simple and complex tasks, and experiment with confidence thresholds. After understanding the local routing logic, we move into live routing by launching the NadirClaw proxy server, sending OpenAI-compatible requests through it, comparing routed model behavior, and estimating cost savings against an always-Pro baseline. 
<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
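The savings estimate against an always-Pro baseline is simple arithmetic over per-token prices. A sketch with placeholder numbers (the prices below are illustrative, not Google's official rates, and the routed call log is invented for the example):

```python
# Illustrative per-1M-token prices (USD); check Google's current price list.
PRICE = {"gemini-2.5-flash": {"in": 0.30, "out": 2.50},
         "gemini-2.5-pro":   {"in": 1.25, "out": 10.00}}

def cost(model, tokens_in, tokens_out):
    p = PRICE[model]
    return (tokens_in * p["in"] + tokens_out * p["out"]) / 1e6

# (model chosen by the router, input tokens, output tokens) per request
calls = [("gemini-2.5-flash", 120, 80),
         ("gemini-2.5-flash", 200, 150),
         ("gemini-2.5-pro",   900, 1200)]

routed   = sum(cost(m, i, o) for m, i, o in calls)
baseline = sum(cost("gemini-2.5-pro", i, o) for _, i, o in calls)
print(f"routed=${routed:.4f} baseline=${baseline:.4f} "
      f"savings={100 * (1 - routed / baseline):.1f}%")
```

Because short prompts dominate most workloads, even a modest flash/pro split can translate into double-digit percentage savings.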