{"id":36615,"date":"2025-09-07T06:27:29","date_gmt":"2025-09-07T06:27:29","guid":{"rendered":"https:\/\/youzum.net\/from-pretraining-to-post-training-why-language-models-hallucinate-and-how-evaluation-methods-reinforce-the-problem\/"},"modified":"2025-09-07T06:27:29","modified_gmt":"2025-09-07T06:27:29","slug":"from-pretraining-to-post-training-why-language-models-hallucinate-and-how-evaluation-methods-reinforce-the-problem","status":"publish","type":"post","link":"https:\/\/youzum.net\/ja\/from-pretraining-to-post-training-why-language-models-hallucinate-and-how-evaluation-methods-reinforce-the-problem\/","title":{"rendered":"From Pretraining to Post-Training: Why Language Models Hallucinate and How Evaluation Methods Reinforce the Problem"},"content":{"rendered":"<p>Large language models (LLMs) very often generate \u201challucinations\u201d\u2014confident yet incorrect outputs that appear plausible. Despite improvements in training methods and architectures, hallucinations persist. <strong>A new research from OpenAI<\/strong> provides a rigorous explanation: hallucinations stem from statistical properties of supervised versus self-supervised learning, and their persistence is reinforced by misaligned evaluation benchmarks.<\/p>\n<h2 class=\"wp-block-heading\"><strong>What Makes Hallucinations Statistically Inevitable?<\/strong><\/h2>\n<p>The research team explains hallucinations as errors inherent to generative modeling. Even with perfectly clean training data, the cross-entropy objective used in pretraining introduces statistical pressures that produce errors.<\/p>\n<p>The research team reduce the problem to a supervised binary classification task called <em>Is-It-Valid (IIV)<\/em>: determining whether a model\u2019s output is valid or erroneous. They prove that the generative error rate of an LLM is at least twice its IIV misclassification rate. In other words, hallucinations occur for the same reasons misclassifications appear in supervised learning: epistemic uncertainty, poor models, distribution shift, or noisy data.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Why Do Rare Facts Trigger More Hallucinations?<\/strong><\/h2>\n<p>One major driver is the <strong>singleton rate<\/strong>\u2014the fraction of facts that appear only once in training data. By analogy to Good\u2013Turing missing-mass estimation, if 20% of facts are singletons, at least 20% of them will be hallucinated. This explains why LLMs answer reliably about widely repeated facts (e.g., Einstein\u2019s birthday) but fail on obscure or rarely mentioned ones.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Can Poor Model Families Lead to Hallucinations?<\/strong><\/h2>\n<p>Yes. Hallucinations also emerge when the model class cannot adequately represent a pattern. Classic examples include n-gram models generating ungrammatical sentences, or modern tokenized models miscounting letters because characters are hidden inside subword tokens. 
## Can Poor Model Families Lead to Hallucinations?

Yes. Hallucinations also emerge when the model class cannot adequately represent a pattern. Classic examples include n-gram models generating ungrammatical sentences, and modern tokenized models miscounting letters because individual characters are hidden inside subword tokens. These representational limits cause systematic errors even when the data itself is sufficient.

## Why Doesn't Post-Training Eliminate Hallucinations?

Post-training methods such as RLHF (reinforcement learning from human feedback), DPO, and RLAIF reduce some errors, especially harmful or conspiratorial outputs. But overconfident hallucinations remain because evaluation incentives are misaligned.

Like students guessing on multiple-choice exams, LLMs are rewarded for bluffing when unsure. Most benchmarks, including MMLU, GPQA, and SWE-bench, apply binary scoring: correct answers earn credit, abstentions ("I don't know") earn none, and incorrect answers are penalized no more harshly than abstentions. Under this scheme, guessing maximizes the expected benchmark score, even though it fosters hallucinations.

## How Do Leaderboards Reinforce Hallucinations?

A review of popular benchmarks shows that nearly all use binary grading with no partial credit for uncertainty. As a result, models that truthfully express uncertainty score worse than models that always guess. This creates systemic pressure on developers to optimize models for confident answers rather than calibrated ones.

## What Changes Could Reduce Hallucinations?

The research team argues that fixing hallucinations requires socio-technical change, not just new evaluation suites. They propose **explicit confidence targets**: benchmarks should clearly specify penalties for wrong answers and partial credit for abstentions.

For example: *"Answer only if you are >75% confident. Mistakes lose 2 points; correct answers earn 1; 'I don't know' earns 0."*

This design mirrors real-world exams, such as earlier SAT and GRE formats, where guessing carried penalties. It encourages **behavioral calibration**: models abstain whenever their confidence falls below the threshold, producing fewer overconfident hallucinations while still optimizing for benchmark performance. The expected-value sketch below makes the incentive shift concrete.
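To see why binary grading rewards bluffing while penalty schemes do not, here is a minimal expected-value sketch in Python; the `expected_score` helper and the point values are illustrative, following the example quoted above:

```python
def expected_score(confidence, correct_pts=1.0, wrong_penalty=0.0, abstain_pts=0.0):
    """Expected benchmark score for answering at a given confidence level,
    versus abstaining. Returns (answer_ev, abstain_ev)."""
    answer_ev = confidence * correct_pts - (1 - confidence) * wrong_penalty
    return answer_ev, abstain_pts

for conf in (0.2, 0.5, 0.8):
    binary = expected_score(conf)                        # binary grading: no penalty
    penalized = expected_score(conf, wrong_penalty=2.0)  # proposed: mistakes lose 2
    print(f"confidence={conf}: binary={binary}, penalized={penalized}")
```

Under binary grading, answering has positive expected value at any nonzero confidence, so a score-maximizing model always guesses. With a penalty of k points for a wrong answer, answering only pays off above a confidence of k/(k+1): a 2-point penalty puts the break-even at 2/3, and a strict 75% target would correspond to a 3-point penalty.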
## What Are the Broader Implications?

This work reframes hallucinations as predictable outcomes of training objectives and evaluation misalignment rather than inexplicable quirks. The findings highlight:

- **Pretraining inevitability:** Hallucinations parallel misclassification errors in supervised learning.
- **Post-training reinforcement:** Binary grading schemes incentivize guessing.
- **Evaluation reform:** Adjusting mainstream benchmarks to reward expressed uncertainty can realign incentives and improve trustworthiness.

By connecting hallucinations to established learning theory, the research demystifies their origin and suggests practical mitigation strategies that shift responsibility from model architectures to evaluation design.

Check out the [paper](https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf) and the [accompanying technical overview](https://openai.com/index/why-language-models-hallucinate/).