TransEvalnia: A Prompting-Based System for Fine-Grained, Human-Aligned Translation Evaluation Using LLMs

Translation systems powered by LLMs have become advanced enough to outperform human translators in some settings. As LLMs improve, especially on complex tasks such as document-level or literary translation, it becomes increasingly difficult both to make further progress and to measure that progress accurately. Traditional automated metrics such as BLEU are still widely used but cannot explain why a score was given. With translation quality approaching human level, users need evaluations that go beyond a single number, offering reasoning across key dimensions such as accuracy, terminology, and audience suitability. This transparency lets users audit evaluations, spot errors, and make better-informed decisions.

While BLEU has long been the standard for evaluating machine translation (MT), its usefulness is fading as modern systems rival or outperform human translators. Newer metrics such as BLEURT, COMET, and MetricX fine-tune powerful language models to assess translation quality more accurately. Large models such as GPT and PaLM 2 can now produce zero-shot or structured evaluations, even generating MQM-style feedback. Techniques such as pairwise comparison have also improved alignment with human judgments.
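Pairwise comparison with an LLM judge typically means showing the source and two candidate translations, asking for an explanation, and extracting a final verdict. The sketch below illustrates the idea; the prompt wording, the `PREFERRED:` verdict convention, and both function names are illustrative assumptions, not the prompts of any published system.

```python
# Illustrative sketch of LLM-based pairwise translation comparison.
# The prompt text and verdict format are assumptions for demonstration only.

def build_pairwise_prompt(source: str, translation_a: str, translation_b: str) -> str:
    """Assemble a prompt asking an LLM judge to compare two candidate translations."""
    return (
        "You are an expert translation evaluator.\n"
        f"Source text: {source}\n"
        f"Translation A: {translation_a}\n"
        f"Translation B: {translation_b}\n"
        "Explain the strengths and weaknesses of each translation, then end "
        "your answer with a line of the form 'PREFERRED: A' or 'PREFERRED: B'."
    )

def parse_preference(llm_response: str) -> str:
    """Extract the judge's final verdict ('A' or 'B') from its free-form answer."""
    for line in reversed(llm_response.strip().splitlines()):
        if line.strip().startswith("PREFERRED:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no verdict found in response")
```

In practice the prompt would be sent to a model such as Claude 3.5 Sonnet; asking for the explanation before the verdict is exactly the rationale-first setup that the studies mentioned below argue improves decision quality.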
Recent studies have shown that asking models to explain their choices improves decision quality, yet such rationale-based methods remain underutilized in MT evaluation despite their growing potential.

Researchers at Sakana.ai have developed TransEvalnia, a translation evaluation and ranking system that uses prompting-based reasoning to assess translation quality. It provides detailed feedback along selected MQM dimensions, ranks candidate translations, and assigns scores on a 5-point Likert scale, including an overall rating. The system performs competitively with, or better than, the state-of-the-art MT-Ranker model across several language pairs and tasks, including English-Japanese, Chinese-English, and more. Tested with LLMs such as Claude 3.5 and Qwen-2.5, its judgments aligned well with human ratings. The team also addressed position bias and has released all data, reasoning outputs, and code for public use.

The methodology centers on evaluating translations along key quality dimensions, including accuracy, terminology, audience suitability, and clarity. For poetic texts such as haiku, emotional tone replaces standard grammar checks. Translations are broken down and assessed span by span, scored on a 1-5 scale, and then ranked. To reduce bias, the study compares three evaluation strategies: single-step, two-step, and a more reliable interleaving method. A "no-reasoning" baseline is also tested but lacks transparency and is prone to bias. Finally, human experts reviewed selected translations to compare their judgments with the system's, offering insight into its alignment with professional standards.

The researchers evaluated translation ranking systems on datasets with human scores, comparing their TransEvalnia models (Qwen and Sonnet) against MT-Ranker, COMET-22/23, XCOMET-XXL, and MetricX-XXL. On WMT-2024 en-es, MT-Ranker performed best, likely due to rich training data.
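Position bias, the tendency of a judge to favor whichever translation is presented first, can be quantified by scoring each pair in both orders and counting disagreements. The following is a minimal sketch of such an inconsistency measure; it is an illustrative proxy under assumed verdict labels, not necessarily the paper's exact definition.

```python
def inconsistency_rate(paired_verdicts):
    """paired_verdicts: list of (verdict_ab, verdict_ba) tuples, where each
    verdict is 'first' or 'second' -- the judge's pick when the same two
    translations are shown in (A, B) order and then in (B, A) order.
    A consistent judge flips its pick when the presentation order flips,
    so matching verdicts across the two orders count as inconsistent."""
    inconsistent = sum(1 for ab, ba in paired_verdicts if ab == ba)
    return 100.0 * inconsistent / len(paired_verdicts)

# A judge that always picks whatever is shown first is maximally position-biased:
biased = [("first", "first")] * 10
# A judge that tracks the translation itself flips with the order:
consistent = [("first", "second"), ("second", "first")] * 5
```

An interleaved evaluation strategy, which presents material from both candidates together rather than one after the other, aims to drive a measure like this toward zero.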
However, on most other datasets TransEvalnia matched or outperformed MT-Ranker; for example, Qwen's no-reasoning variant won on WMT-2023 en-de. Position bias was analyzed with inconsistency scores, where interleaved methods often showed the lowest bias (e.g., 1.04 on Hard en-ja). Human raters gave Sonnet the highest overall Likert scores (4.37-4.61), and Sonnet's evaluations correlated well with human judgment (Spearman's R of roughly 0.51-0.54).

In conclusion, TransEvalnia is a prompting-based system for evaluating and ranking translations using LLMs such as Claude 3.5 Sonnet and Qwen. The system provides detailed scores along key quality dimensions inspired by the MQM framework and selects the better translation among candidates. It often matches or outperforms MT-Ranker on several WMT language pairs, although MetricX-XXL leads on WMT thanks to fine-tuning. Human raters found Sonnet's outputs reliable, and its scores correlated strongly with human judgments. Fine-tuning Qwen improved performance notably.
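The agreement numbers reported above are Spearman rank correlations. On tie-free data the coefficient reduces to the classic formula 1 - 6*sum(d^2)/(n(n^2 - 1)); here is a stdlib-only sketch applied to made-up system and human scores (the data are hypothetical, not from the paper).

```python
def spearman_rho(xs, ys):
    """Spearman's rank correlation via 1 - 6*sum(d^2)/(n*(n^2 - 1)).
    Assumes no tied values (ties would require average ranks instead)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical system scores vs. human scores for five translations:
system = [4.5, 3.0, 4.0, 2.0, 5.0]
human = [4.0, 3.5, 4.2, 2.5, 4.8]
```

Because the coefficient depends only on ranks, it rewards an evaluator for ordering translations the way humans do, even when the absolute scores differ.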
The team also explored remedies for position bias, a persistent challenge in ranking systems, and shared all evaluation data and code.

Check out the paper: https://arxiv.org/abs/2507.12724

The post TransEvalnia: A Prompting-Based System for Fine-Grained, Human-Aligned Translation Evaluation Using LLMs appeared first on MarkTechPost (https://www.marktechpost.com/).