{"id":78220,"date":"2026-03-22T14:34:37","date_gmt":"2026-03-22T14:34:37","guid":{"rendered":"https:\/\/youzum.net\/a-coding-implementation-to-build-an-uncertainty-aware-llm-system-with-confidence-estimation-self-evaluation-and-automatic-web-research\/"},"modified":"2026-03-22T14:34:37","modified_gmt":"2026-03-22T14:34:37","slug":"a-coding-implementation-to-build-an-uncertainty-aware-llm-system-with-confidence-estimation-self-evaluation-and-automatic-web-research","status":"publish","type":"post","link":"https:\/\/youzum.net\/th\/a-coding-implementation-to-build-an-uncertainty-aware-llm-system-with-confidence-estimation-self-evaluation-and-automatic-web-research\/","title":{"rendered":"A Coding Implementation to Build an Uncertainty-Aware LLM System with Confidence Estimation, Self-Evaluation, and Automatic Web Research"},"content":{"rendered":"<p>In this tutorial, we build an uncertainty-aware large language model system that not only generates answers but also estimates the confidence in those answers. We implement a three-stage reasoning pipeline in which the model first produces an answer along with a self-reported confidence score and a justification. We then introduce a self-evaluation step that allows the model to critique and refine its own response, simulating a meta-cognitive check. If the model determines that its confidence is low, we automatically trigger a web research phase that retrieves relevant information from live sources and synthesizes a more reliable answer. 
By combining confidence estimation, self-reflection, and automated research, we create a practical framework for building more trustworthy and transparent AI systems that can recognize uncertainty and actively seek better information.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">import os, json, re, textwrap, getpass, sys, warnings\nfrom dataclasses import dataclass, field\nfrom typing import Optional\nfrom openai import OpenAI\nfrom ddgs import DDGS\nfrom rich.console import Console\nfrom rich.table import Table\nfrom rich.panel import Panel\nfrom rich import box\n\n\nwarnings.filterwarnings(\"ignore\", category=DeprecationWarning)\n\n\ndef _get_api_key() -&gt; str:\n   key = os.environ.get(\"OPENAI_API_KEY\", \"\").strip()\n   if key:\n       return key\n   try:\n       from google.colab import userdata\n       key = userdata.get(\"OPENAI_API_KEY\") or \"\"\n       if key.strip():\n           return key.strip()\n   except Exception:\n       pass\n   console = Console()\n   console.print(\n       \"\\n[bold cyan]OpenAI API Key required[\/bold cyan]\\n\"\n       \"[dim]Your key will not be echoed and is never stored to disk.\\n\"\n       \"To skip this prompt in future runs, set the environment variable:\\n\"\n       \"  export OPENAI_API_KEY=sk-...[\/dim]\\n\"\n   )\n   key = getpass.getpass(\"  Enter your OpenAI API key: \").strip()\n   if not key:\n       Console().print(\"[bold red]No 
API key provided \u2014 exiting.[\/bold red]\")\n       sys.exit(1)\n   return key\n\n\nOPENAI_API_KEY = _get_api_key()\nMODEL           = \"gpt-4o-mini\"\nCONFIDENCE_LOW  = 0.55\nCONFIDENCE_MED  = 0.80\n\n\nclient  = OpenAI(api_key=OPENAI_API_KEY)\nconsole = Console()\n\n\n@dataclass\nclass LLMResponse:\n   question:    str\n   answer:      str\n   confidence:  float\n   reasoning:   str\n   sources:     list[str] = field(default_factory=list)\n   researched:  bool = False\n   raw_json:    dict = field(default_factory=dict)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We import all required libraries and configure the runtime environment for the uncertainty-aware LLM pipeline. We securely retrieve the OpenAI API key using environment variables, Colab secrets, or a hidden terminal prompt. We also define the LLMResponse data structure that stores the question, answer, confidence score, reasoning, and research metadata used throughout the system.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">SYSTEM_UNCERTAINTY = \"\"\"\nYou are an expert AI assistant that is HONEST about what it knows and doesn't know.\nFor every question you MUST respond with valid JSON only (no markdown, no prose outside JSON):\n\n\n{\n \"answer\": \"&lt;your best answer \u2014 thorough, factual&gt;\",\n \"confidence\": &lt;float 0.0-1.0&gt;,\n \"reasoning\": \"&lt;explain WHY you are or aren't confident; mention 
specific knowledge gaps&gt;\"\n}\n\n\nConfidence scale:\n 0.90-1.00 \u2192 very high: well-established fact, you are certain\n 0.75-0.89 \u2192 high: strong knowledge, minor uncertainty\n 0.55-0.74 \u2192 medium: plausible but you may be wrong, could be outdated\n 0.30-0.54 \u2192 low: significant uncertainty, answer is a best guess\n 0.00-0.29 \u2192 very low: mostly guessing, minimal reliable knowledge\n\n\nBe CALIBRATED \u2014 do not always give high confidence. Genuinely reflect uncertainty\nabout recent events (after your knowledge cutoff), niche topics, numerical claims,\nand anything that changes over time.\n\"\"\".strip()\n\n\nSYSTEM_SYNTHESIS = \"\"\"\nYou are a research synthesizer. Given a question, a preliminary answer,\nand web-search snippets, produce an improved final answer grounded in the evidence.\nRespond in JSON only:\n\n\n{\n \"answer\": \"&lt;improved, evidence-grounded answer&gt;\",\n \"confidence\": &lt;float 0.0-1.0&gt;,\n \"reasoning\": \"&lt;explain how the search evidence changed or confirmed the answer&gt;\"\n}\n\"\"\".strip()\n\n\ndef query_llm_with_confidence(question: str) -&gt; LLMResponse:\n   completion = client.chat.completions.create(\n       model=MODEL,\n       temperature=0.2,\n       response_format={\"type\": \"json_object\"},\n       messages=[\n           {\"role\": \"system\", \"content\": SYSTEM_UNCERTAINTY},\n           {\"role\": \"user\",   \"content\": question},\n       ],\n   )\n   raw = json.loads(completion.choices[0].message.content)\n\n\n   return LLMResponse(\n       question=question,\n       answer=raw.get(\"answer\", \"\"),\n       confidence=float(raw.get(\"confidence\", 0.5)),\n       reasoning=raw.get(\"reasoning\", \"\"),\n       raw_json=raw,\n   )<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define the system prompts that instruct the model to report answers along with calibrated confidence and reasoning. 
We then implement the query_llm_with_confidence function that performs the first stage of the pipeline. This stage generates the model\u2019s answer while forcing the output to be structured JSON containing the answer, confidence score, and explanation.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">def self_evaluate(response: LLMResponse) -&gt; LLMResponse:\n   critique_prompt = f\"\"\"\nReview this answer and its stated confidence. Check for:\n1. Logical consistency\n2. Whether the confidence matches the actual quality of the answer\n3. Any factual errors you can spot\n\n\nQuestion: {response.question}\n\n\nProposed answer: {response.answer}\nStated confidence: {response.confidence}\nStated reasoning: {response.reasoning}\n\n\nRespond in JSON:\n{{\n \"revised_confidence\": &lt;float \u2014 adjust if the self-check changes your view&gt;,\n \"critique\": \"&lt;brief critique of the answer quality&gt;\",\n \"revised_answer\": \"&lt;improved answer, or repeat original if fine&gt;\"\n}}\n\"\"\".strip()\n\n\n   completion = client.chat.completions.create(\n       model=MODEL,\n       temperature=0.1,\n       response_format={\"type\": \"json_object\"},\n       messages=[\n           {\"role\": \"system\", \"content\": \"You are a rigorous self-critic. 
Respond in JSON only.\"},\n           {\"role\": \"user\",   \"content\": critique_prompt},\n       ],\n   )\n   ev = json.loads(completion.choices[0].message.content)\n\n\n   response.confidence = float(ev.get(\"revised_confidence\", response.confidence))\n   response.answer     = ev.get(\"revised_answer\", response.answer)\n   response.reasoning += f\"\\n\\n[Self-Eval Critique]: {ev.get('critique', '')}\"\n   return response\n\n\n\n\ndef web_search(query: str, max_results: int = 5) -&gt; list[dict]:\n   results = DDGS().text(query, max_results=max_results)\n   return list(results) if results else []\n\n\n\n\ndef research_and_synthesize(response: LLMResponse) -&gt; LLMResponse:\n   console.print(f\"  [yellow]\ud83d\udd0d Confidence {response.confidence:.0%} is low \u2014 triggering auto-research...[\/yellow]\")\n\n\n   snippets = web_search(response.question)\n   if not snippets:\n       console.print(\"  [red]No search results found.[\/red]\")\n       return response\n\n\n   formatted = \"\\n\\n\".join(\n       f\"[{i+1}] {s.get('title','')}\\n{s.get('body','')}\\nURL: {s.get('href','')}\"\n       for i, s in enumerate(snippets)\n   )\n\n\n   synthesis_prompt = f\"\"\"\nQuestion: {response.question}\n\n\nPreliminary answer (low confidence): {response.answer}\n\n\nWeb search snippets:\n{formatted}\n\n\nSynthesize an improved answer using the evidence above.\n\"\"\".strip()\n\n\n   completion = client.chat.completions.create(\n       model=MODEL,\n       temperature=0.2,\n       response_format={\"type\": \"json_object\"},\n       messages=[\n           {\"role\": \"system\", \"content\": SYSTEM_SYNTHESIS},\n           {\"role\": \"user\",   \"content\": synthesis_prompt},\n       ],\n   )\n   syn = json.loads(completion.choices[0].message.content)\n\n\n   response.answer      = syn.get(\"answer\", response.answer)\n   response.confidence  = float(syn.get(\"confidence\", response.confidence))\n   response.reasoning  += f\"\\n\\n[Post-Research]: {syn.get('reasoning', '')}\"\n   response.sources     = [s.get(\"href\", \"\") for s in snippets if s.get(\"href\")]\n   response.researched  = True\n   return response<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We implement a self-evaluation stage in which the model critiques its own answer and revises its confidence as needed. We also introduce the web search capability that retrieves live information using DuckDuckGo. If the model\u2019s confidence is low, we synthesize the search results with the preliminary answer to produce an improved response grounded in external evidence.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\"># Orchestration helpers (sketch): the full display code is in the linked notebook.\ndef confidence_label(conf: float) -&gt; tuple[str, str]:\n   # Map a confidence score to an emoji + label using the global thresholds.\n   if conf &gt;= CONFIDENCE_MED:\n       return \"\ud83d\udfe2\", \"High\"\n   if conf &gt;= CONFIDENCE_LOW:\n       return \"\ud83d\udfe1\", \"Medium\"\n   return \"\ud83d\udd34\", \"Low\"\n\n\ndef confidence_bar(conf: float, width: int = 20) -&gt; str:\n   # Textual confidence meter for the console display.\n   filled = int(round(conf * width))\n   return \"\u2588\" * filled + \"\u2591\" * (width - filled)\n\n\ndef uncertainty_aware_query(question: str) -&gt; LLMResponse:\n   # Stage 1: answer with self-reported confidence.\n   resp = query_llm_with_confidence(question)\n   # Stage 2: self-evaluation critique.\n   resp = self_evaluate(resp)\n   # Stage 3: auto-research only if confidence is still low.\n   if resp.confidence &lt; CONFIDENCE_LOW:\n       resp = research_and_synthesize(resp)\n   return resp\n\n\ndef display_response(resp: LLMResponse) -&gt; None:\n   emoji, label = confidence_label(resp.confidence)\n   body = (\n       f\"[bold]Answer:[\/bold] {resp.answer}\\n\\n\"\n       f\"[bold]Confidence:[\/bold] {confidence_bar(resp.confidence)} \"\n       f\"{resp.confidence:.0%} {emoji} {label}\\n\\n\"\n       f\"[bold]Reasoning:[\/bold] {resp.reasoning}\"\n   )\n   if resp.sources:\n       body += \"\\n\\n[bold]Sources:[\/bold]\\n\" + \"\\n\".join(f\"  \u2022 {u}\" for u in resp.sources)\n   console.print(Panel(body, border_style=\"cyan\", expand=False))<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We construct the main reasoning pipeline that orchestrates answer generation, self-evaluation, and optional research. We compute visual confidence indicators and implement helper functions to label confidence levels. 
We also build a formatted display system that presents the final answer, reasoning, confidence meter, and sources in a clean console interface.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">DEMO_QUESTIONS = [\n   \"What is the speed of light in a vacuum?\",\n   \"What were the main causes of the 2008 global financial crisis?\",\n   \"What is the latest version of Python released in 2025?\",\n   \"What is the current population of Tokyo as of 2025?\",\n]\n\n\ndef run_comparison_table(questions: list[str]) -&gt; None:\n   console.rule(\"[bold cyan]UNCERTAINTY-AWARE LLM \u2014 BATCH RUN[\/bold cyan]\")\n   results = []\n\n\n   for i, q in enumerate(questions, 1):\n       console.print(f\"\\n[bold]Question {i}\/{len(questions)}:[\/bold] {q}\")\n       r = uncertainty_aware_query(q)\n       display_response(r)\n       results.append(r)\n\n\n   console.rule(\"[bold cyan]SUMMARY TABLE[\/bold cyan]\")\n   tbl = Table(box=box.ROUNDED, show_lines=True, highlight=True)\n   tbl.add_column(\"#\",          style=\"dim\", width=3)\n   tbl.add_column(\"Question\",   max_width=40)\n   tbl.add_column(\"Confidence\", justify=\"center\", width=12)\n   tbl.add_column(\"Level\",      justify=\"center\", width=10)\n   tbl.add_column(\"Researched\", justify=\"center\", width=10)\n\n\n   for i, r in enumerate(results, 1):\n       emoji, label = confidence_label(r.confidence)\n       col = \"green\" if r.confidence &gt;= 0.75 else \"yellow\" if r.confidence &gt;= 0.55 else \"red\"\n       tbl.add_row(\n           str(i),\n           textwrap.shorten(r.question, 55),\n           f\"[{col}]{r.confidence:.0%}[\/{col}]\",\n           f\"{emoji} {label}\",\n           \"\u2705 Yes\" if r.researched else \"\u2014\",\n       )\n\n\n   console.print(tbl)\n\n\n\n\ndef interactive_mode() -&gt; None:\n   console.rule(\"[bold cyan]INTERACTIVE MODE[\/bold cyan]\")\n   console.print(\"  Type any question. Type [bold]quit[\/bold] to exit.\\n\")\n   while True:\n       q = console.input(\"[bold cyan]You \u25b6[\/bold cyan] \").strip()\n       if q.lower() in (\"quit\", \"exit\", \"q\"):\n           console.print(\"Goodbye!\")\n           break\n       if not q:\n           continue\n       resp = uncertainty_aware_query(q)\n       display_response(resp)\n\n\n\n\nif __name__ == \"__main__\":\n   console.print(Panel(\n       \"[bold white]Uncertainty-Aware LLM Tutorial[\/bold white]\\n\"\n       \"[dim]Confidence Estimation \u00b7 Self-Evaluation \u00b7 Auto-Research[\/dim]\",\n       border_style=\"cyan\",\n       expand=False,\n   ))\n\n\n   run_comparison_table(DEMO_QUESTIONS)\n\n\n   console.print(\"\\n\")\n   interactive_mode()<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define demonstration questions and implement a batch pipeline that evaluates the uncertainty-aware system across multiple queries. We generate a summary table that compares confidence levels and whether research was triggered. 
Finally, we implement an interactive mode that continuously accepts user questions and runs the full uncertainty-aware reasoning workflow.<\/p>\n<p>In conclusion, we designed and implemented a complete uncertainty-aware reasoning pipeline for large language models using Python and the OpenAI API. We demonstrated how models can verbalize confidence, perform internal self-evaluation, and automatically conduct research when uncertainty is detected. This approach improves reliability by enabling the system to acknowledge knowledge gaps and augment its answers with external evidence when needed. By integrating these components into a unified workflow, we showed how developers can build AI systems that are intelligent, calibrated, transparent, and adaptive, making them far more suitable for real-world decision-support applications.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out\u00a0the <strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/LLM%20Projects\/uncertainty_aware_llm_confidence_self_evaluation_auto_research_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">FULL Notebook Here<\/a>.\u00a0<\/strong>Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">120k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! 
Are you on Telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">Now you can join us on Telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/03\/21\/a-coding-implementation-to-build-an-uncertainty-aware-llm-system-with-confidence-estimation-self-evaluation-and-automatic-web-research\/\">A Coding Implementation to Build an Uncertainty-Aware LLM System with Confidence Estimation, Self-Evaluation, and Automatic Web Research<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we build an uncertainty-aware large language model system that not only generates answers but also estimates the confidence in those answers. We implement a three-stage reasoning pipeline in which the model first produces an answer along with a self-reported confidence score and a justification. We then introduce a self-evaluation step that allows the model to critique and refine its own response, simulating a meta-cognitive check. If the model determines that its confidence is low, we automatically trigger a web research phase that retrieves relevant information from live sources and synthesizes a more reliable answer. By combining confidence estimation, self-reflection, and automated research, we create a practical framework for building more trustworthy and transparent AI systems that can recognize uncertainty and actively seek better information. 
<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
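The three-stage flow the tutorial describes (answer with confidence, self-evaluate, research only on low confidence) can be sketched as plain gating logic. The 0.7 threshold and the function names below are illustrative assumptions, not the article's exact API; each stage is passed in as a callable so the control flow can be read (and tested) in isolation.

```python
# Hypothetical sketch of the confidence-gated pipeline described above.
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; tune per application

def needs_research(confidence: float,
                   threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Trigger web research only when confidence falls below the threshold."""
    return confidence < threshold

def run_pipeline(question, ask, evaluate, research):
    """ask/evaluate/research are stage callables returning a response object
    with a .confidence attribute (e.g. the LLMResponse from the tutorial)."""
    response = ask(question)           # Stage 1: answer + self-reported confidence
    response = evaluate(response)      # Stage 2: self-critique may revise confidence
    if needs_research(response.confidence):
        response = research(response)  # Stage 3: web research + synthesis
    return response
```

Because the gate sits after self-evaluation, an answer the model initially over-rated can still be routed to research once the critique lowers its confidence.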