{"id":95766,"date":"2026-06-07T17:38:58","date_gmt":"2026-06-07T17:38:58","guid":{"rendered":"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/"},"modified":"2026-06-07T17:38:58","modified_gmt":"2026-06-07T17:38:58","slug":"building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation","status":"publish","type":"post","link":"https:\/\/youzum.net\/zh\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/","title":{"rendered":"Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation"},"content":{"rendered":"<p class=\"wp-block-paragraph\">In this tutorial, we use <a href=\"https:\/\/github.com\/gepa-ai\/gepa\"><strong>GEPA<\/strong><\/a> as a reflective prompt-evolution framework to improve the way a language model solves arithmetic word problems. We begin with a weak seed prompt, create a small deterministic benchmark, define a structured evaluator, and pass actionable feedback to GEPA so it can understand why a candidate prompt fails. We also use a multi-component prompt setup in which both the instruction field and the output-format rules evolve together. 
By the end, we compare the baseline prompt with the optimized prompt on a held-out validation set and inspect how the evolutionary process improves performance.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Installing GEPA and LiteLLM and Configuring the Task and Reflection Models<\/strong><\/h3>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">!pip install -q gepa litellm\nimport os, re, json, random, getpass, textwrap\nimport litellm\nimport gepa.optimize_anything as oa\nfrom gepa.optimize_anything import (\n   optimize_anything, GEPAConfig, EngineConfig, ReflectionConfig,\n)\nlitellm.suppress_debug_info = True\nif not os.environ.get(\"OPENAI_API_KEY\"):\n   os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")\nTASK_LM        = \"openai\/gpt-4o-mini\"\nREFLECTION_LM  = \"openai\/gpt-4.1\"\nMAX_METRIC_CALLS = 100<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\">We install GEPA and LiteLLM, then import the required libraries for prompt optimization and model calls. We securely set up the OpenAI API key and define two models: a task model that solves the problem and a reflection model that improves the prompt. 
We also set the maximum metric-call budget to keep the optimization process under control.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Building a Deterministic Arithmetic Benchmark Dataset<\/strong><\/h3>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">def make_problems(n, seed=0):\n   rng = random.Random(seed)\n   out = []\n   for _ in range(n):\n       t = rng.choice([\"discount\", \"travel\", \"wallet\", \"chain\"])\n       if t == \"discount\":\n           unit  = rng.choice([40, 60, 80, 120])\n           qty   = rng.choice([5, 6, 8, 10])\n           disc  = rng.choice([10, 20, 25, 50])\n           total = unit * qty\n           gold  = total - total * disc \/\/ 100\n           q = (f\"A shop sells notebooks at {unit} rupees each. You buy {qty} \"\n                f\"notebooks and get a {disc}% discount on the total bill. \"\n                f\"How many rupees do you pay in total?\")\n       elif t == \"travel\":\n           s1, h1 = rng.choice([40, 50, 60]), rng.choice([2, 3])\n           s2, h2 = rng.choice([30, 45, 70]), rng.choice([1, 2, 3])\n           gold = s1 * h1 + s2 * h2\n           q = (f\"A car drives at {s1} km\/h for {h1} hours, then at {s2} km\/h \"\n                f\"for {h2} hours. 
What is the total distance travelled, in km?\")\n       elif t == \"wallet\":\n           tens   = rng.choice([3, 5, 7, 9])\n           fifties= rng.choice([2, 4, 6])\n           spent  = rng.choice([50, 80, 110, 150])\n           gold = tens * 10 + fifties * 50 - spent\n           q = (f\"You have {tens} ten-rupee notes and {fifties} fifty-rupee \"\n                f\"notes. You spend {spent} rupees. How many rupees are left?\")\n       else:\n           x = rng.choice([6, 9, 12, 15]); y = rng.choice([4, 7, 10]); z = rng.choice([3, 8, 11])\n           gold = x * 2 - y + z\n           q = (f\"Start with the number {x}. Double it, then subtract {y}, \"\n                f\"then add {z}. What number do you end with?\")\n       out.append({\"question\": q, \"answer\": gold})\n   return out\nall_problems = make_problems(18, seed=42)\nrandom.Random(1).shuffle(all_problems)\ntrainset = all_problems[:12]\nvalset   = all_problems[12:]\nprint(f\"Dataset: {len(trainset)} train \/ {len(valset)} val problems\\n\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\">We create a small deterministic dataset of arithmetic word problems covering discounts, travel distance, wallet calculations, and chained operations. We generate the correct answer for each problem programmatically, which keeps the benchmark reliable and easy to evaluate. 
We then shuffle the examples and split them into a training set for optimization and a validation set for testing generalization.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Defining the Evaluator and Structured Feedback for GEPA<\/strong><\/h3>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">def build_system_prompt(candidate: dict) -&gt; str:\n   return (f\"{candidate['instructions']}\\n\\n\"\n           f\"OUTPUT FORMAT RULES:\\n{candidate['format_rules']}\")\ndef call_task_lm(system_prompt: str, question: str) -&gt; str:\n   for attempt in range(3):\n       try:\n           r = litellm.completion(\n               model=TASK_LM,\n               messages=[{\"role\": \"system\", \"content\": system_prompt},\n                         {\"role\": \"user\",   \"content\": question}],\n               temperature=0, max_tokens=600, timeout=60,\n           )\n           return r[\"choices\"][0][\"message\"][\"content\"] or \"\"\n       except Exception as e:\n           if attempt == 2:\n               return f\"[LM_ERROR] {e}\"\n   return \"\"\ndef parse_answers(text: str):\n   formatted = re.search(r\"####\\s*(-?\\d+)\", text)\n   all_nums  = re.findall(r\"-?\\d+\", text)\n   fmt_val   = int(formatted.group(1)) if formatted else None\n   last_val  = int(all_nums[-1]) if all_nums else None\n   return fmt_val, last_val\ndef evaluate(candidate: dict, example: dict):\n   system = 
build_system_prompt(candidate)\n   raw    = call_task_lm(system, example[\"question\"])\n   gold   = example[\"answer\"]\n   fmt_val, last_val = parse_answers(raw)\n   if fmt_val is not None and fmt_val == gold:\n       score, fb = 1.0, \"Correct and correctly formatted.\"\n   elif fmt_val is not None and fmt_val != gold:\n       score, fb = 0.0, (f\"WRONG ANSWER. You output '#### {fmt_val}' but the \"\n                         f\"correct answer is {gold}. Re-check the arithmetic and \"\n                         f\"the order of the steps.\")\n   elif last_val == gold:\n       score, fb = 0.5, (f\"Right number ({gold}) but FORMAT VIOLATION: the final \"\n                         f\"line was not exactly '#### {gold}'. Always end with a \"\n                         f\"line of the form '#### &lt;integer&gt;' and nothing else.\")\n   else:\n       score, fb = 0.0, (f\"WRONG. Correct answer is {gold}. The model's final \"\n                         f\"number was {last_val}. Likely a multi-step reasoning \"\n                         f\"slip; show each step and verify before answering.\")\n   oa.log(f\"score={score} gold={gold} parsed_fmt={fmt_val} parsed_last={last_val}\")\n   side_info = {\n       \"feedback\": fb,\n       \"problem\": example[\"question\"],\n       \"gold_answer\": gold,\n       \"model_output\": raw[:500],\n   }\n   return score, side_info\ndef eval_set(candidate, dataset, label=\"\"):\n   scores, exact = [], 0\n   for ex in dataset:\n       s, _ = evaluate(candidate, ex)\n       scores.append(s)\n       if s == 1.0:  # correct number in the exact '####' format\n           exact += 1\n   acc = exact \/ len(dataset)\n   avg = sum(scores) \/ len(dataset)\n   print(f\"  [{label}] avg_score={avg:.3f}  exact_correct+formatted={exact}\/{len(dataset)}\")\n   return avg, acc<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\">We define how the candidate prompt is converted into a system prompt and how the task model receives each 
question. We also create the evaluator that parses the model output, checks whether the final answer follows the required #### &lt;integer&gt; format, and assigns a score. We return structured feedback as actionable side information so that GEPA can determine whether the issue is incorrect reasoning, poor formatting, or both.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Configuring GEPA and Running the Prompt Optimization<\/strong><\/h3>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">seed_candidate = {\n   \"instructions\": \"Solve the math problem.\",\n   \"format_rules\": \"Give the answer.\",\n}\nprint(\"=== BASELINE (seed prompt) ===\")\nprint(\"Train:\"); base_train = eval_set(seed_candidate, trainset, \"train\")\nprint(\"Val:  \"); base_val   = eval_set(seed_candidate, valset,   \"val\")\nprint()\nobjective = (\n   \"Evolve a system prompt (the 'instructions' and 'format_rules' fields) so a \"\n   \"small LLM reliably solves multi-step arithmetic word problems AND always \"\n   \"ends with a line of exactly the form '#### &lt;integer&gt;'. Maximize the score.\"\n)\nbackground = (\n   \"Scoring: 1.0 = correct number in the exact '#### &lt;int&gt;' format; 0.5 = correct \"\n   \"number but wrong\/missing format; 0.0 = wrong number. Common failures are (a) not \"\n   \"emitting the '####' line, and (b) order-of-operations or multi-step slips. 
The \"\n   \"winning prompt should force explicit step-by-step work, a verification step, and \"\n   \"a strict final-answer line.\"\n)\nconfig = GEPAConfig(\n   engine=EngineConfig(\n       max_metric_calls=MAX_METRIC_CALLS,\n       max_workers=4,\n       parallel=True,\n       display_progress_bar=True,\n       seed=0,\n   ),\n   reflection=ReflectionConfig(\n       reflection_lm=REFLECTION_LM,\n   ),\n)\nprint(\"=== RUNNING GEPA (this calls the LLMs; ~1-4 min) ===\")\nresult = optimize_anything(\n   seed_candidate=seed_candidate,\n   evaluator=evaluate,\n   dataset=trainset,\n   valset=valset,\n   objective=objective,\n   background=background,\n   config=config,\n)\n<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\">We start with a weak seed prompt and evaluate its baseline performance on both the training and validation sets. We then define the optimization objective, background scoring rules, and GEPA configuration, including parallel evaluation and the reflection model. 
Finally, we run optimize_anything so GEPA can evolve the instruction and format-rule fields using the evaluator feedback.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Comparing the Baseline and GEPA-Optimized Prompts on the Validation Set<\/strong><\/h3>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">best = result.best_candidate\nprint(\"\\n\" + \"=\" * 78)\nprint(\"OPTIMIZED CANDIDATE\")\nprint(\"=\" * 78)\nprint(\"\\n--- instructions ---\\n\" + textwrap.fill(best[\"instructions\"], 96))\nprint(\"\\n--- format_rules ---\\n\" + textwrap.fill(best[\"format_rules\"], 96))\nprint(\"\\n\" + \"=\" * 78)\nprint(\"BEFORE vs AFTER  (held-out validation set)\")\nprint(\"=\" * 78)\nprint(\"Seed prompt:\"); _ = eval_set(seed_candidate, valset, \"val-seed\")\nprint(\"GEPA  prompt:\"); _ = eval_set(best,          valset, \"val-gepa\")\nprint(f\"\\nBaseline val avg_score : {base_val[0]:.3f}\")\nprint(\"\\n\" + \"=\" * 78)\nprint(\"EVOLUTION HISTORY (candidate index -&gt; val score, parents)\")\nprint(\"=\" * 78)\ncands   = getattr(result, \"candidates\", [])\nvscores = getattr(result, \"val_aggregate_scores\", [])\nparents = getattr(result, \"parents\", [None] * len(cands))\nfor i, sc in enumerate(vscores):\n   par = parents[i] if i &lt; len(parents) else None\n   tag = \"  &lt;-- BEST\" if cands and cands[i] == best else \"\"\n   print(f\"  cand {i:2d}: val_score={sc:.3f}  parents={par}{tag}\")\nprint(f\"\\nTotal 
metric calls used : {getattr(result, 'total_metric_calls', 'n\/a')}\")\nprint(f\"Full validation evals   : {getattr(result, 'num_full_val_evals', 'n\/a')}\")\nprint(\"\\nDone. Try raising MAX_METRIC_CALLS or swapping REFLECTION_LM for a stronger model.\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\">We extract the best prompt found by GEPA and print its optimized instruction and format-rule components. We compare the seed prompt and the GEPA-optimized prompt on the validation set to check whether the improvement transfers to examples outside the training set. Note that valset is also passed to optimize_anything, so GEPA may use it when selecting candidates; a separate test split would give a stricter estimate of generalization. We also inspect the evolution history, validation scores, parent relationships, and total metric calls to understand how the prompt improved over the course of optimization.<\/p>\n<p class=\"wp-block-paragraph\">In conclusion, we used GEPA to show how prompt optimization can move beyond manual trial and error. We created a complete workflow where a task model solves examples, an evaluator scores the outputs, and a reflection model uses detailed feedback to propose better prompts. We also tested the optimized prompt on validation problems kept out of the training set, which helps us assess whether the improvement generalizes rather than merely fitting the training set. 
Also, we built a practical example of reflective prompt evolution in which structured feedback, strict evaluation, and iterative refinement work together to produce a stronger, more reliable prompt.<\/p>\n<p class=\"wp-block-paragraph\">\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<\/p><p class=\"wp-block-paragraph\">Check out\u00a0the\u00a0<strong><a href=\"https:\/\/github.com\/MARKTECHPOST-AI-MEDIA-INC\/AI-Agents-Projects-Tutorials\/blob\/main\/LLM%20Evaluation\/gepa_reflective_prompt_evolution_feedback_validation_marktechpost.py\" target=\"_blank\" rel=\"noreferrer noopener\">Full Codes with Notebook<\/a>.<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/06\/07\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/\">Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we use GEPA as a reflective prompt-evolution framework to improve the way a language model solves arithmetic word problems. We begin with a weak seed prompt, create a small deterministic benchmark, define a structured evaluator, and pass actionable feedback to GEPA so it can understand why a candidate prompt fails. We also use a multi-component prompt setup in which both the instruction field and the output-format rules evolve together. By the end, we compare the baseline prompt with the optimized prompt on a held-out validation set and inspect how the evolutionary process improves performance. 
<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-95766","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/zh\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/\" \/>\n<meta property=\"og:locale\" content=\"zh_CN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/zh\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" 
content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-06-07T17:38:58+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u4f5c\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 \u5206\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out 
Validation\",\"datePublished\":\"2026-06-07T17:38:58+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/\"},\"wordCount\":674,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/\",\"url\":\"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/\",\"name\":\"Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"datePublished\":\"2026-06-07T17:38:58+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"zh-Hans\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association 
Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/zh\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/zh\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/","og_locale":"zh_CN","og_type":"article","og_title":"Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/zh\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-06-07T17:38:58+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"\u4f5c\u8005":"admin NU","\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4":"8 \u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Building Reflective Prompt Optimization with GEPA: Multi-Component 
Prompts, Structured Feedback, and Held-Out Validation","datePublished":"2026-06-07T17:38:58+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/"},"wordCount":674,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"zh-Hans","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/","url":"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/","name":"Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"datePublished":"2026-06-07T17:38:58+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/building-reflective-prompt-optimization-with-gepa-multi-component-prompts-structured-feedback-and-held-out-validation\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"zh-Hans"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association 
Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/zh\/members\/adminnu\/"}]}},"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/zh\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/zh\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/zh\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/zh\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/zh\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"In this tutorial, we use GEPA as a reflective prompt-evolution framework to improve the way a language model solves arithmetic word problems. We begin with a weak seed prompt, create a small deterministic benchmark, define a structured evaluator, and pass actionable feedback to GEPA so it can understand why a candidate prompt fails. 
We also&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts\/95766","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/comments?post=95766"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts\/95766\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/media?parent=95766"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/categories?post=95766"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/tags?post=95766"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}