{"id":42324,"date":"2025-10-05T07:23:31","date_gmt":"2025-10-05T07:23:31","guid":{"rendered":"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/"},"modified":"2025-10-05T07:23:31","modified_gmt":"2025-10-05T07:23:31","slug":"how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise","status":"publish","type":"post","link":"https:\/\/youzum.net\/fr\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/","title":{"rendered":"How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise"},"content":{"rendered":"<div class=\"wp-block-yoast-seo-table-of-contents yoast-table-of-contents\">\n<h3><strong>Table of contents<\/strong><\/h3>\n<ul>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/10\/05\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#h-why-wer-isn-t-enough\" data-level=\"3\">Why WER Isn\u2019t Enough ?<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/10\/05\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#h-what-to-measure-and-how\" data-level=\"3\">What to Measure (and How) ?<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/10\/05\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#h-benchmark-landscape-what-each-covers\" data-level=\"3\">Benchmark Landscape: What Each Covers<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/10\/05\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#h-filling-the-gaps-what-you-still-need-to-add\" data-level=\"3\">Filling the Gaps: What You Still Need to Add<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/10\/05\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#h-a-concrete-reproducible-evaluation-plan\" data-level=\"3\">A Concrete, Reproducible Evaluation Plan<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/10\/05\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#h-references\" data-level=\"3\">References<\/a><\/li>\n<\/ul>\n<\/div>\n<p>Optimizing only for Automatic Speech Recognition (ASR) and Word Error Rate (WER) is insufficient for modern, interactive voice agents. Robust evaluation must measure end-to-end task success, barge-in behavior and latency, and hallucination-under-noise\u2014alongside ASR, safety, and instruction following. VoiceBench offers a multi-facet <em>speech-interaction<\/em> benchmark across general knowledge, instruction following, safety, and robustness to speaker\/environment\/content variations, but it does not cover barge-in or real-device task completion. SLUE (and Phase-2) target spoken language understanding (SLU); MASSIVE and Spoken-SQuAD probe multilingual and spoken QA; DSTC tracks add spoken, task-oriented robustness. Combine these with explicit barge-in\/endpointing tests, user-centric task-success measurement, and controlled noise-stress protocols to obtain a complete picture.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Why WER Isn\u2019t Enough<\/strong>?<\/h3>\n<p>WER measures transcription fidelity, not interaction quality. Two agents with similar WER can diverge widely in dialog success because latency, turn-taking, misunderstanding recovery, safety, and robustness to acoustic and content perturbations dominate user experience. Prior work on real systems shows the need to evaluate <em>user satisfaction<\/em> and <em>task success<\/em> directly\u2014e.g., Cortana\u2019s automatic online evaluation predicted user satisfaction from in-situ interaction signals, not only ASR accuracy. <\/p>\n<h3 class=\"wp-block-heading\"><strong>What to Measure (and How)<\/strong>?<\/h3>\n<h4 class=\"wp-block-heading\"><strong>1) End-to-End Task Success<\/strong><\/h4>\n<p><strong>Metric<\/strong>: Task Success Rate (TSR) with strict success criteria per task (goal completion, constraints met), plus <strong>Task Completion Time (TCT)<\/strong> and <strong>Turns-to-Success<\/strong>.<br \/><strong>Why.<\/strong> Real assistants are judged by outcomes. Competitions like Alexa Prize TaskBot explicitly measured users\u2019 ability to finish multi-step tasks (e.g., cooking, DIY) with ratings and completion.<\/p>\n<p><strong>Protocol.<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Define tasks with verifiable endpoints (e.g., \u201cassemble shopping list with N items and constraints\u201d).<\/li>\n<li>Use blinded human raters and automatic logs to compute TSR\/TCT\/Turns.<\/li>\n<li>For multilingual\/SLU coverage, draw task intents\/slots from MASSIVE. <\/li>\n<\/ul>\n<h4 class=\"wp-block-heading\"><strong>2) Barge-In and Turn-Taking<\/strong><\/h4>\n<p><strong>Metrics<\/strong>:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Barge-In Detection Latency (ms):<\/strong> time from user onset to TTS suppression.<\/li>\n<li><strong>True\/False Barge-In Rates:<\/strong> correct interruptions vs. spurious stops.<\/li>\n<li><strong>Endpointing Latency (ms):<\/strong> time to ASR finalization after user stop.<\/li>\n<\/ul>\n<p><strong>Why.<\/strong> Smooth interruption handling and fast endpointing determine perceived responsiveness. Research formalizes barge-in verification and continuous barge-in processing; endpointing latency continues to be an active area in streaming ASR.<\/p>\n<p><strong>Protocol.<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Script prompts where the user interrupts TTS at controlled offsets and SNRs.<\/li>\n<li>Measure suppression and recognition timings with high-precision logs (frame timestamps).<\/li>\n<li>Include noisy\/echoic far-field conditions. Classic and modern studies provide recovery and signaling strategies that reduce false barge-ins.<\/li>\n<\/ul>\n<h4 class=\"wp-block-heading\"><strong>3) Hallucination-Under-Noise (HUN)<\/strong><\/h4>\n<p><strong>Metric.<\/strong> <strong>HUN Rate:<\/strong> fraction of outputs that are fluent but semantically unrelated to the audio, under controlled noise or non-speech audio.<br \/><strong>Why.<\/strong> ASR and audio-LLM stacks can emit \u201cconvincing nonsense,\u201d especially with non-speech segments or noise overlays. Recent work defines and measures ASR hallucinations; targeted studies show Whisper hallucinations induced by non-speech sounds. <\/p>\n<p><strong>Protocol.<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Construct audio sets with additive environmental noise (varied SNRs), non-speech distractors, and content disfluencies.<\/li>\n<li>Score semantic relatedness (human judgment with adjudication) and compute HUN.<\/li>\n<li>Track whether downstream agent actions propagate hallucinations to incorrect task steps.<\/li>\n<\/ul>\n<h4 class=\"wp-block-heading\"><strong>4) Instruction Following, Safety, and Robustness<\/strong><\/h4>\n<p><strong>Metric Families.<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Instruction-Following Accuracy<\/strong> (format and constraint adherence).<\/li>\n<li><strong>Safety Refusal Rate<\/strong> on adversarial spoken prompts.<\/li>\n<li><strong>Robustness Deltas<\/strong> across speaker age\/accent\/pitch, environment (noise, reverb, far-field), and content noise (grammar errors, disfluencies).<\/li>\n<\/ul>\n<p><strong>Why.<\/strong> VoiceBench explicitly targets these axes with spoken instructions (real and synthetic) spanning general knowledge, instruction following, and safety; it perturbs speaker, environment, and content to probe robustness. <\/p>\n<p><strong>Protocol.<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Use VoiceBench for breadth on speech-interaction capabilities; report aggregate and per-axis scores.<\/li>\n<li>For SLU specifics (NER, dialog acts, QA, summarization), leverage SLUE and Phase-2. <\/li>\n<\/ul>\n<h4 class=\"wp-block-heading\"><strong>5) Perceptual Speech Quality (for TTS and Enhancement)<\/strong><\/h4>\n<p><strong>Metric.<\/strong> Subjective Mean Opinion Score via <strong>ITU-T P.808<\/strong> (crowdsourced ACR\/DCR\/CCR).<br \/><strong>Why.<\/strong> Interaction quality depends on <em>both<\/em> recognition and playback quality. P.808 gives a validated crowdsourcing protocol with open-source tooling.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Benchmark Landscape: What Each Covers<\/strong><\/h3>\n<h4 class=\"wp-block-heading\"><strong>VoiceBench (2024)<\/strong><\/h4>\n<p><strong>Scope:<\/strong> Multi-facet voice assistant evaluation with spoken inputs covering <strong>general knowledge<\/strong>, <strong>instruction following<\/strong>, <strong>safety<\/strong>, and <strong>robustness<\/strong> across speaker\/environment\/content variations; uses both real and synthetic speech.<br \/><strong>Limitations: <\/strong>Does <strong>not<\/strong> benchmark barge-in\/endpointing latency or real-world task completion on devices; focuses on response correctness and safety under variations.<\/p>\n<h4 class=\"wp-block-heading\"><strong>SLUE \/ SLUE Phase-2<\/strong><\/h4>\n<p><strong>Scope:<\/strong> Spoken language understanding tasks: NER, sentiment, dialog acts, named-entity localization, QA, summarization; designed to study end-to-end vs. pipeline sensitivity to ASR errors.<br \/><strong>Use: <\/strong>Great for probing SLU robustness and pipeline fragility in spoken settings.<\/p>\n<h4 class=\"wp-block-heading\"><strong>MASSIVE<\/strong><\/h4>\n<p><strong>Scope:<\/strong> &gt;1M virtual-assistant utterances across 51\u201352 languages with intents\/slots; strong fit for <strong>multilingual<\/strong> task-oriented evaluation.<br \/><strong>Use:<\/strong> Build multilingual task suites and measure TSR\/slot F1 under speech conditions (paired with TTS or read speech). <\/p>\n<h4 class=\"wp-block-heading\"><strong>Spoken-SQuAD \/ HeySQuAD and Related Spoken-QA Sets<\/strong><\/h4>\n<p><strong>Scope:<\/strong> Spoken question answering to test ASR-aware comprehension and multi-accent robustness.<br \/><strong>Use:<\/strong> Stress-test comprehension under speech errors; not a full agent task suite.<\/p>\n<h4 class=\"wp-block-heading\"><strong>DSTC (Dialog System Technology Challenge) Tracks<\/strong><\/h4>\n<p><strong>Scope:<\/strong> Robust dialog modeling with <strong>spoken<\/strong>, task-oriented data; human ratings alongside automatic metrics; recent tracks emphasize multilinguality, safety, and evaluation dimensionality.<br \/><strong>Use:<\/strong> Complementary for dialog quality, DST, and knowledge-grounded responses under speech conditions. <\/p>\n<h4 class=\"wp-block-heading\"><strong>Real-World Task Assistance (Alexa Prize TaskBot)<\/strong><\/h4>\n<p><strong>Scope:<\/strong> Multi-step task assistance with <strong>user ratings<\/strong> and success criteria (cooking\/DIY).<br \/><strong>Use:<\/strong> Gold-standard inspiration for defining TSR and interaction KPIs; the public reports describe evaluation focus and outcomes.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Filling the Gaps: What You Still Need to Add<\/strong><\/h3>\n<ol class=\"wp-block-list\">\n<li><strong>Barge-In &amp; Endpointing KPIs<\/strong><br \/>Add explicit measurement harnesses. Literature offers barge-in verification and continuous processing strategies; streaming ASR endpointing latency remains an active research topic. Track barge-in detection latency, suppression correctness, endpointing delay, and false barge-ins.<\/li>\n<li><strong>Hallucination-Under-Noise (HUN) Protocols<\/strong><br \/>Adopt emerging ASR-hallucination definitions and controlled noise\/non-speech tests; report HUN rate and its impact on downstream actions. <\/li>\n<li><strong>On-Device Interaction Latency<\/strong><br \/>Correlate user-perceived latency with streaming ASR designs (e.g., transducer variants); measure time-to-first-token, time-to-final, and local processing overhead.<\/li>\n<li><strong>Cross-Axis Robustness Matrices<\/strong><br \/>Combine VoiceBench\u2019s speaker\/environment\/content axes with your task suite (TSR) to expose failure surfaces (e.g., barge-in under far-field echo; task success at low SNR; multilingual slots under accent shift).<\/li>\n<li><strong>Perceptual Quality for Playback<\/strong><br \/>Use ITU-T P.808 (with the open P.808 toolkit) to quantify user-perceived TTS quality in your end-to-end loop, not just ASR.<\/li>\n<\/ol>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<h3 class=\"wp-block-heading\"><strong>A Concrete, Reproducible Evaluation Plan<\/strong><\/h3>\n<ol class=\"wp-block-list\">\n<li><strong>Assemble the Suite<\/strong><\/li>\n<\/ol>\n<ul class=\"wp-block-list\">\n<li><strong>Speech-Interaction Core:<\/strong> VoiceBench for knowledge, instruction following, safety, and robustness axes. <\/li>\n<li><strong>SLU Depth:<\/strong> SLUE\/Phase-2 tasks (NER, dialog acts, QA, summarization) for SLU performance under speech.<\/li>\n<li><strong>Multilingual Coverage:<\/strong> MASSIVE for intent\/slot and multilingual stress.<\/li>\n<li><strong>Comprehension Under ASR Noise:<\/strong> Spoken-SQuAD\/HeySQuAD for spoken QA and multi-accent readouts.<\/li>\n<\/ul>\n<ol start=\"2\" class=\"wp-block-list\">\n<li><strong>Add Missing Capabilities<\/strong><\/li>\n<\/ol>\n<ul class=\"wp-block-list\">\n<li><strong>Barge-In\/Endpointing Harness:<\/strong> scripted interruptions at controlled offsets and SNRs; log suppression time and false barge-ins; measure endpointing delay with streaming ASR.<\/li>\n<li><strong>Hallucination-Under-Noise:<\/strong> non-speech inserts and noise overlays; annotate semantic relatedness to compute HUN.<\/li>\n<li><strong>Task Success Block:<\/strong> scenario tasks with objective success checks; compute TSR, TCT, and Turns; follow TaskBot style definitions. <\/li>\n<li><strong>Perceptual Quality:<\/strong> P.808 crowdsourced ACR with the Microsoft toolkit. <\/li>\n<\/ul>\n<ol start=\"3\" class=\"wp-block-list\">\n<li><strong>Report Structure<\/strong><\/li>\n<\/ol>\n<ul class=\"wp-block-list\">\n<li><strong>Primary table:<\/strong> TSR\/TCT\/Turns; barge-in latency and error rates; endpointing latency; HUN rate; VoiceBench aggregate and per-axis; SLU metrics; P.808 MOS.<\/li>\n<li><strong>Stress plots:<\/strong> TSR and HUN vs. SNR and reverberation; barge-in latency vs. interrupt timing.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<h3 class=\"wp-block-heading\"><strong>References<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li>VoiceBench: first multi-facet speech-interaction benchmark for LLM-based voice assistants (knowledge, instruction following, safety, robustness). (<a href=\"https:\/\/ar5iv.org\/pdf\/2410.17196\" target=\"_blank\" rel=\"noreferrer noopener\">ar5iv<\/a>)<\/li>\n<li>SLUE \/ SLUE Phase-2: spoken NER, dialog acts, QA, summarization; sensitivity to ASR errors in pipelines. (<a href=\"https:\/\/arxiv.org\/abs\/2111.10367\" target=\"_blank\" rel=\"noreferrer noopener\">arXiv<\/a>)<\/li>\n<li>MASSIVE: 1M+ multilingual intent\/slot utterances for assistants. (<a href=\"https:\/\/www.amazon.science\/publications\/massive-a-1m-example-multilingual-natural-language-understanding-dataset-with-51-typologically-diverse-languages?\" target=\"_blank\" rel=\"noreferrer noopener\">Amazon Science<\/a>)<\/li>\n<li>Spoken-SQuAD \/ HeySQuAD: spoken question answering datasets. (<a href=\"https:\/\/github.com\/Chia-Hsuan-Lee\/Spoken-SQuAD?\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub<\/a>)<\/li>\n<li>User-centric evaluation in production assistants (Cortana): predict satisfaction beyond ASR. (<a href=\"https:\/\/people.cs.umass.edu\/~jpjiang\/papers\/www15_cortana_sat_final.pdf?\" target=\"_blank\" rel=\"noreferrer noopener\">UMass Amherst<\/a>)<\/li>\n<li>Barge-in verification\/processing and endpointing latency: AWS\/academic barge-in papers, Microsoft continuous barge-in, recent endpoint detection for streaming ASR. (<a href=\"https:\/\/arxiv.org\/pdf\/2211.13280?\" target=\"_blank\" rel=\"noreferrer noopener\">arXiv<\/a>)<\/li>\n<li>ASR hallucination definitions and non-speech-induced hallucinations (Whisper). (<a href=\"https:\/\/arxiv.org\/abs\/2401.01572\" target=\"_blank\" rel=\"noreferrer noopener\">arXiv<\/a>)<\/li>\n<\/ul>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/10\/05\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/\">How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Table of contents Why WER Isn\u2019t Enough ? What to Measure (and How) ? Benchmark Landscape: What Each Covers Filling the Gaps: What You Still Need to Add A Concrete, Reproducible Evaluation Plan References Optimizing only for Automatic Speech Recognition (ASR) and Word Error Rate (WER) is insufficient for modern, interactive voice agents. Robust evaluation must measure end-to-end task success, barge-in behavior and latency, and hallucination-under-noise\u2014alongside ASR, safety, and instruction following. VoiceBench offers a multi-facet speech-interaction benchmark across general knowledge, instruction following, safety, and robustness to speaker\/environment\/content variations, but it does not cover barge-in or real-device task completion. SLUE (and Phase-2) target spoken language understanding (SLU); MASSIVE and Spoken-SQuAD probe multilingual and spoken QA; DSTC tracks add spoken, task-oriented robustness. Combine these with explicit barge-in\/endpointing tests, user-centric task-success measurement, and controlled noise-stress protocols to obtain a complete picture. Why WER Isn\u2019t Enough? WER measures transcription fidelity, not interaction quality. Two agents with similar WER can diverge widely in dialog success because latency, turn-taking, misunderstanding recovery, safety, and robustness to acoustic and content perturbations dominate user experience. Prior work on real systems shows the need to evaluate user satisfaction and task success directly\u2014e.g., Cortana\u2019s automatic online evaluation predicted user satisfaction from in-situ interaction signals, not only ASR accuracy. What to Measure (and How)? 1) End-to-End Task Success Metric: Task Success Rate (TSR) with strict success criteria per task (goal completion, constraints met), plus Task Completion Time (TCT) and Turns-to-Success.Why. Real assistants are judged by outcomes. Competitions like Alexa Prize TaskBot explicitly measured users\u2019 ability to finish multi-step tasks (e.g., cooking, DIY) with ratings and completion. Protocol. Define tasks with verifiable endpoints (e.g., \u201cassemble shopping list with N items and constraints\u201d). Use blinded human raters and automatic logs to compute TSR\/TCT\/Turns. For multilingual\/SLU coverage, draw task intents\/slots from MASSIVE. 2) Barge-In and Turn-Taking Metrics: Barge-In Detection Latency (ms): time from user onset to TTS suppression. True\/False Barge-In Rates: correct interruptions vs. spurious stops. Endpointing Latency (ms): time to ASR finalization after user stop. Why. Smooth interruption handling and fast endpointing determine perceived responsiveness. Research formalizes barge-in verification and continuous barge-in processing; endpointing latency continues to be an active area in streaming ASR. Protocol. Script prompts where the user interrupts TTS at controlled offsets and SNRs. Measure suppression and recognition timings with high-precision logs (frame timestamps). Include noisy\/echoic far-field conditions. Classic and modern studies provide recovery and signaling strategies that reduce false barge-ins. 3) Hallucination-Under-Noise (HUN) Metric. HUN Rate: fraction of outputs that are fluent but semantically unrelated to the audio, under controlled noise or non-speech audio.Why. ASR and audio-LLM stacks can emit \u201cconvincing nonsense,\u201d especially with non-speech segments or noise overlays. Recent work defines and measures ASR hallucinations; targeted studies show Whisper hallucinations induced by non-speech sounds. Protocol. Construct audio sets with additive environmental noise (varied SNRs), non-speech distractors, and content disfluencies. Score semantic relatedness (human judgment with adjudication) and compute HUN. Track whether downstream agent actions propagate hallucinations to incorrect task steps. 4) Instruction Following, Safety, and Robustness Metric Families. Instruction-Following Accuracy (format and constraint adherence). Safety Refusal Rate on adversarial spoken prompts. Robustness Deltas across speaker age\/accent\/pitch, environment (noise, reverb, far-field), and content noise (grammar errors, disfluencies). Why. VoiceBench explicitly targets these axes with spoken instructions (real and synthetic) spanning general knowledge, instruction following, and safety; it perturbs speaker, environment, and content to probe robustness. Protocol. Use VoiceBench for breadth on speech-interaction capabilities; report aggregate and per-axis scores. For SLU specifics (NER, dialog acts, QA, summarization), leverage SLUE and Phase-2. 5) Perceptual Speech Quality (for TTS and Enhancement) Metric. Subjective Mean Opinion Score via ITU-T P.808 (crowdsourced ACR\/DCR\/CCR).Why. Interaction quality depends on both recognition and playback quality. P.808 gives a validated crowdsourcing protocol with open-source tooling. Benchmark Landscape: What Each Covers VoiceBench (2024) Scope: Multi-facet voice assistant evaluation with spoken inputs covering general knowledge, instruction following, safety, and robustness across speaker\/environment\/content variations; uses both real and synthetic speech.Limitations: Does not benchmark barge-in\/endpointing latency or real-world task completion on devices; focuses on response correctness and safety under variations. SLUE \/ SLUE Phase-2 Scope: Spoken language understanding tasks: NER, sentiment, dialog acts, named-entity localization, QA, summarization; designed to study end-to-end vs. pipeline sensitivity to ASR errors.Use: Great for probing SLU robustness and pipeline fragility in spoken settings. MASSIVE Scope: &gt;1M virtual-assistant utterances across 51\u201352 languages with intents\/slots; strong fit for multilingual task-oriented evaluation.Use: Build multilingual task suites and measure TSR\/slot F1 under speech conditions (paired with TTS or read speech). Spoken-SQuAD \/ HeySQuAD and Related Spoken-QA Sets Scope: Spoken question answering to test ASR-aware comprehension and multi-accent robustness.Use: Stress-test comprehension under speech errors; not a full agent task suite. DSTC (Dialog System Technology Challenge) Tracks Scope: Robust dialog modeling with spoken, task-oriented data; human ratings alongside automatic metrics; recent tracks emphasize multilinguality, safety, and evaluation dimensionality.Use: Complementary for dialog quality, DST, and knowledge-grounded responses under speech conditions. Real-World Task Assistance (Alexa Prize TaskBot) Scope: Multi-step task assistance with user ratings and success criteria (cooking\/DIY).Use: Gold-standard inspiration for defining TSR and interaction KPIs; the public reports describe evaluation focus and outcomes. Filling the Gaps: What You Still Need to Add Barge-In &amp; Endpointing KPIsAdd explicit measurement harnesses. Literature offers barge-in verification and continuous processing strategies; streaming ASR endpointing latency remains an active research topic. Track barge-in detection latency, suppression correctness, endpointing delay, and false barge-ins. Hallucination-Under-Noise (HUN) ProtocolsAdopt emerging ASR-hallucination definitions and controlled noise\/non-speech tests; report HUN rate and its impact on downstream actions. On-Device Interaction LatencyCorrelate user-perceived latency with streaming ASR designs (e.g., transducer variants); measure time-to-first-token, time-to-final, and local processing overhead. Cross-Axis Robustness MatricesCombine VoiceBench\u2019s speaker\/environment\/content axes with your task suite (TSR) to expose failure surfaces (e.g., barge-in under far-field echo; task success at low SNR; multilingual slots under accent shift). Perceptual Quality for PlaybackUse ITU-T P.808 (with the open P.808 toolkit) to quantify user-perceived TTS quality in your end-to-end loop, not just ASR. A Concrete, Reproducible Evaluation Plan Assemble the Suite Speech-Interaction Core: VoiceBench for knowledge, instruction following, safety, and robustness axes. SLU Depth: SLUE\/Phase-2 tasks (NER, dialog acts,<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-42324","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/fr\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/\" \/>\n<meta property=\"og:locale\" content=\"fr_FR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/fr\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-10-05T07:23:31+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u00c9crit par\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Dur\u00e9e de lecture estim\u00e9e\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise\",\"datePublished\":\"2025-10-05T07:23:31+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/\"},\"wordCount\":1307,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/\",\"url\":\"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/\",\"name\":\"How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"datePublished\":\"2025-10-05T07:23:31+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#breadcrumb\"},\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"fr-FR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/fr\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/fr\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/","og_locale":"fr_FR","og_type":"article","og_title":"How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/fr\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-10-05T07:23:31+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"\u00c9crit par":"admin NU","Dur\u00e9e de lecture estim\u00e9e":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise","datePublished":"2025-10-05T07:23:31+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/"},"wordCount":1307,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"fr-FR","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/","url":"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/","name":"How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"datePublished":"2025-10-05T07:23:31+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#breadcrumb"},"inLanguage":"fr-FR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/how-to-evaluate-voice-agents-in-2025-beyond-automatic-speech-recognition-asr-and-word-error-rate-wer-to-task-success-barge-in-and-hallucination-under-noise\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"fr-FR"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/fr\/members\/adminnu\/"}]}},"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/fr\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/fr\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/fr\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/fr\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/fr\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Table of contents Why WER Isn\u2019t Enough ? What to Measure (and How) ? Benchmark Landscape: What Each Covers Filling the Gaps: What You Still Need to Add A Concrete, Reproducible Evaluation Plan References Optimizing only for Automatic Speech Recognition (ASR) and Word Error Rate (WER) is insufficient for modern, interactive voice agents. Robust evaluation\u2026","_links":{"self":[{"href":"https:\/\/youzum.net\/fr\/wp-json\/wp\/v2\/posts\/42324","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/fr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/fr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/fr\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/fr\/wp-json\/wp\/v2\/comments?post=42324"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/fr\/wp-json\/wp\/v2\/posts\/42324\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/fr\/wp-json\/wp\/v2\/media?parent=42324"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/fr\/wp-json\/wp\/v2\/categories?post=42324"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/fr\/wp-json\/wp\/v2\/tags?post=42324"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}