{"id":47048,"date":"2025-10-26T07:41:27","date_gmt":"2025-10-26T07:41:27","guid":{"rendered":"https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/"},"modified":"2025-10-26T07:41:27","modified_gmt":"2025-10-26T07:41:27","slug":"a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models","status":"publish","type":"post","link":"https:\/\/youzum.net\/de\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/","title":{"rendered":"A New AI Research from Anthropic and Thinking Machines Lab Stress Tests Model Specs and Reveal Character Differences among Language Models"},"content":{"rendered":"<p>AI companies use model specifications to define target behaviors during training and evaluation. Do current specs state the intended behaviors with enough precision, and do frontier models exhibit distinct behavioral profiles under the same spec? <strong>A team of researchers from Anthropic, Thinking Machines Lab and Constellation <\/strong>present a systematic method that stress tests model specs using value tradeoff scenarios, then quantifies cross model disagreement as a signal of gaps or contradictions in the spec. The research team analyzed 12 frontier LLMs from Anthropic, OpenAI, Google, and xAI and links high disagreement to specification violations, missing guidance on response quality, and evaluator ambiguity. The team also released a public dataset<\/p>\n<p>Model specifications are the written rules that alignment systems try to enforce. If a spec is complete and precise, models trained to follow it should not diverge widely on the same input. The research team operationalizes this intuition. 
It generates more than 300,000 scenarios that force a choice between two legitimate values, such as social equity and business effectiveness. It then scores responses on a 0-to-6 scale using value-spectrum rubrics and measures disagreement as the standard deviation across models. High disagreement localizes the spec clauses that need clarification or additional examples.</p>
<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img fetchpriority="high" decoding="async" width="1874" height="1148" src="https://www.marktechpost.com/wp-content/uploads/2025/10/Screenshot-2025-10-25-at-7.05.43-PM-1.png" alt="" class="wp-image-75707" /><figcaption class="wp-element-caption">https://arxiv.org/pdf/2510.07686</figcaption></figure>
</div>
<h2 class="wp-block-heading"><strong>So, what is the method used in this research?</strong></h2>
<p>The research team 
starts from a taxonomy of 3,307 fine-grained values observed in natural Claude traffic, which is more granular than typical model specs. For each pair of values, they generate a neutral query and two biased variants that lean toward one value. They build value-spectrum rubrics that map positions from 0, meaning strongly opposing the value, to 6, meaning strongly favoring the value. They classify responses from 12 models against these rubrics and define disagreement as the maximum standard deviation across the two value dimensions. To remove near duplicates while keeping the hard cases, they use a disagreement-weighted k-center selection with Gemini embeddings and a 2-approximation greedy algorithm.</p>
<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img decoding="async" width="1880" height="604" src="https://www.marktechpost.com/wp-content/uploads/2025/10/Screenshot-2025-10-25-at-7.07.25-PM-1.png" alt="" class="wp-image-75709" /><figcaption class="wp-element-caption">https://arxiv.org/pdf/2510.07686</figcaption></figure>
</div>
<h3 class="wp-block-heading"><strong>Scale and releases</strong></h3>
<p>The dataset on Hugging Face has three subsets: the default split with about 132,000 rows, the complete split with about 411,000 rows, and the judge-evaluations split with about 24,600 rows. The dataset card lists the modality, Parquet format, and an Apache 2.0 license.</p>
<h3 class="wp-block-heading"><strong>Understanding the Results</strong></h3>
<p><strong>Disagreement predicts spec violations</strong>: Testing five OpenAI models against the public OpenAI model spec, high-disagreement scenarios show 5 to 13 times higher rates of frequent non-compliance. The research team interprets the pattern as evidence of contradictions and ambiguities in the spec text rather than idiosyncrasies of a single model.</p>
<p><strong>Specs lack granularity on quality inside the safe region</strong>: Some scenarios produce responses that all pass compliance yet differ in helpfulness. For instance, one model refuses and offers safe alternatives, while another only refuses. The spec accepts both, which indicates missing guidance on quality standards.</p>
<p><strong>Evaluator models disagree on compliance</strong>: Three LLM judges, Claude 4 Sonnet, o3, and Gemini 2.5 Pro, show only moderate agreement, with a Fleiss' kappa near 0.42. The blog attributes the conflicts to interpretive differences, such as conscientious pushback versus transformation exceptions. 
</p>
<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img decoding="async" width="1630" height="854" src="https://www.marktechpost.com/wp-content/uploads/2025/10/Screenshot-2025-10-25-at-7.17.21-PM-1.png" alt="" class="wp-image-75711" /><figcaption class="wp-element-caption">https://alignment.anthropic.com/2025/stress-testing-model-specs/</figcaption></figure>
</div>
<p><strong>Provider-level character patterns</strong>: Aggregating high-disagreement scenarios reveals consistent value preferences. Claude models prioritize ethical responsibility, intellectual integrity, and objectivity. OpenAI models tend to favor efficiency and resource optimization. Gemini 2.5 Pro and Grok more often emphasize emotional depth and authentic connection. 
Other values, such as business effectiveness, personal growth and wellbeing, and social equity and justice, show mixed patterns across providers.</p>
<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="1600" height="982" src="https://www.marktechpost.com/wp-content/uploads/2025/10/Screenshot-2025-10-25-at-7.19.20-PM-1.png" alt="" class="wp-image-75713" /></figure>
</div>
<p><strong>Refusals and false positives</strong>: The analysis shows topic-sensitive refusal spikes. It documents false-positive refusals, including legitimate synthetic-biology study plans and standard Rust unsafe types that are often safe in context. Claude models are the most cautious by refusal rate and often provide alternative suggestions, while o3 most often issues direct refusals without elaboration. 
All models show high refusal rates on child-grooming risks.</p>
<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="1546" height="894" src="https://www.marktechpost.com/wp-content/uploads/2025/10/Screenshot-2025-10-25-at-7.20.51-PM-1.png" alt="" class="wp-image-75715" /><figcaption class="wp-element-caption">https://alignment.anthropic.com/2025/stress-testing-model-specs/</figcaption></figure>
</div>
<p><strong>Outliers reveal misalignment and over-conservatism</strong>: Grok 4 and Claude 3.5 Sonnet produce the most outlier responses, but for different reasons. Grok is more permissive on requests that others consider harmful, while Claude 3.5 sometimes over-rejects benign content. Outlier mining is a useful lens for locating both safety gaps and excessive filtering.
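</p>
<p>Outlier mining of this kind can be sketched as a simple divergence count per scenario. The 2-point rubric gap below is an assumed threshold for illustration, not the paper's exact criterion; the quorum of 9 follows the released analysis, which flags a model that diverges from at least 9 of the other 11:</p>

```python
def outliers(scores: dict[str, float], gap: float = 2.0, quorum: int = 9) -> list[str]:
    """Models whose 0-6 rubric score sits at least `gap` points away
    from `quorum` or more of the other models on one scenario."""
    return [
        m for m in scores
        if sum(abs(scores[m] - scores[o]) >= gap for o in scores if o != m) >= quorum
    ]

# Hypothetical scenario: eleven models cluster at 3, one sits at 0.
scores = {f"model_{c}": 3.0 for c in "abcdefghijk"}
scores["model_l"] = 0.0
print(outliers(scores))  # → ['model_l']
```

<p>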
</p>
<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="1590" height="1046" src="https://www.marktechpost.com/wp-content/uploads/2025/10/Screenshot-2025-10-25-at-7.21.37-PM-1.png" alt="" class="wp-image-75717" /><figcaption class="wp-element-caption">https://alignment.anthropic.com/2025/stress-testing-model-specs/</figcaption></figure>
</div>
<h3 class="wp-block-heading"><strong>Key Takeaways</strong></h3>
<ol class="wp-block-list">
<li>Method and scale: The study stress-tests model specs using value-tradeoff scenarios generated from a 3,307-value taxonomy, producing 300,000+ scenarios and evaluating 12 frontier LLMs across Anthropic, OpenAI, Google, and xAI.</li>
<li>Disagreement \u21d2 spec problems: High cross-model disagreement strongly predicts issues in specs, including contradictions and coverage gaps. In tests against the OpenAI model spec, high-disagreement items show 5 to 13\u00d7 higher rates of frequent non-compliance.</li>
<li>Public release: The team released a dataset for independent auditing and reproduction.</li>
<li>Provider-level behavior: Aggregated results reveal systematic value preferences, for example, Claude prioritizes ethical responsibility, OpenAI optimizes for efficiency and resource use, and Gemini and Grok emphasize emotional depth. Some values, such as business effectiveness and social equity and justice, show mixed patterns.</li>
<li>Refusals and outliers: High-disagreement slices expose both false-positive refusals on benign topics and permissive responses on risky ones. Outlier analysis identifies cases where one model diverges from at least 9 of the other 11, useful for pinpointing misalignment and over-conservatism.</li>
</ol>
<h3 class="wp-block-heading"><strong>Editorial Comments</strong></h3>
<p>This research turns disagreement into a measurable diagnostic for spec quality, not a vibe. The research team generates 300,000-plus value-tradeoff scenarios, scores responses on a 0-to-6 rubric, then uses cross-model standard deviation to locate specification gaps. High disagreement predicts 5 to 13 times higher rates of frequent non-compliance under the OpenAI model spec. Judge models show only moderate agreement, with a Fleiss' kappa near 0.42, which exposes interpretive ambiguity. Provider-level value patterns are clear: Claude favors ethical responsibility, OpenAI favors efficiency and resource optimization, and Gemini and Grok emphasize emotional depth and authentic connection. The dataset enables reproduction. 
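</p>
<p>The deduplication step mentioned in the method section, a disagreement-weighted k-center selection, builds on the classic greedy 2-approximation for k-center. A minimal unweighted sketch, with toy 2-D points standing in for the Gemini embeddings used in the paper:</p>

```python
import math

def greedy_k_center(points: list[tuple[float, float]], k: int) -> list[int]:
    """Greedy 2-approximation for k-center: repeatedly add the point
    farthest from the centers chosen so far."""
    centers = [0]  # seed with an arbitrary first point
    dists = [math.dist(p, points[0]) for p in points]
    while len(centers) < k:
        farthest = max(range(len(points)), key=dists.__getitem__)
        centers.append(farthest)
        dists = [min(d, math.dist(p, points[farthest]))
                 for d, p in zip(dists, points)]
    return centers

# Toy 2-D points standing in for scenario embeddings.
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 9.0)]
print(greedy_k_center(pts, 3))  # → [0, 4, 3]
```

The paper's variant additionally weights selection by disagreement so that hard, contested scenarios survive deduplication.
<p>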
Deploy this to debug specs before deployment, not after.</p>
<hr class="wp-block-separator has-alpha-channel-opacity" />
<p>Check out the <strong><a href="https://arxiv.org/abs/2510.07686" target="_blank" rel="noreferrer noopener">Paper</a>, <a href="https://huggingface.co/datasets/jifanz/stress_testing_model_spec" target="_blank" rel="noreferrer noopener">Dataset</a>, and <a href="https://alignment.anthropic.com/2025/stress-testing-model-specs/" target="_blank" rel="noreferrer noopener">Technical details</a></strong>.</p>
<p>The post <a href="https://www.marktechpost.com/2025/10/25/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models/">A New AI Research from Anthropic and Thinking Machines Lab Stress Tests Model Specs and Reveal Character Differences among Language Models</a> appeared first on <a href="https://www.marktechpost.com/">MarkTechPost</a>.</p>
t-2025-10-25-at-7.05.43-PM-1-7B8Ujo.png\",\"width\":1874,\"height\":1148},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"A New AI Research from Anthropic and Thinking Machines Lab Stress Tests Model Specs and Reveal Character Differences among Language Models\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin 
NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/de\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"A New AI Research from Anthropic and Thinking Machines Lab Stress Tests Model Specs and Reveal Character Differences among Language Models - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/de\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/","og_locale":"de_DE","og_type":"article","og_title":"A New AI Research from Anthropic and Thinking Machines Lab Stress Tests Model Specs and Reveal Character Differences among Language Models - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/de\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-10-26T07:41:27+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Verfasst von":"admin NU","Gesch\u00e4tzte 
Lesezeit":"5\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"A New AI Research from Anthropic and Thinking Machines Lab Stress Tests Model Specs and Reveal Character Differences among Language Models","datePublished":"2025-10-26T07:41:27+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/"},"wordCount":1055,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/","url":"https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-mo
del-specs-and-reveal-character-differences-among-language-models\/","name":"A New AI Research from Anthropic and Thinking Machines Lab Stress Tests Model Specs and Reveal Character Differences among Language Models - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo.png","datePublished":"2025-10-26T07:41:27+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo.png","width":1874,"height":1148},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/a-new-ai-research-from-anthropic-and-thinking
-machines-lab-stress-tests-model-specs-and-reveal-character-differences-among-language-models\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"A New AI Research from Anthropic and Thinking Machines Lab Stress Tests Model Specs and Reveal Character Differences among Language Models"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/de\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo.png",1874,1148,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo.png",1874,1148,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo.png",1874,1148,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo-300x184.png",300,184,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo-1024x627.png",1024,627,true],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo-1536x941.png",1536,941,true],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo.png",1874,1148,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo-18x12.png",18,12,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo-600x368.png",600,368,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-25-at-7.05.43-PM-1-7B8Ujo-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/de\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/de\/category\/ai-club\/\" rel=\"category 
tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"AI companies use model specifications to define target behaviors during training and evaluation. Do current specs state the intended behaviors with enough precision, and do frontier models exhibit distinct behavioral profiles under the same spec? A team of researchers from Anthropic, Thinking Machines Lab and Constellation present a systematic method that stress tests model specs&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/47048","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/comments?post=47048"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/47048\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/media\/47049"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/media?parent=47048"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/categories?post=47048"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/tags?post=47048"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}