{"id":65375,"date":"2026-01-20T11:05:34","date_gmt":"2026-01-20T11:05:34","guid":{"rendered":"https:\/\/youzum.net\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/"},"modified":"2026-01-20T11:05:34","modified_gmt":"2026-01-20T11:05:34","slug":"how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts","status":"publish","type":"post","link":"https:\/\/youzum.net\/ja\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/","title":{"rendered":"How to Design a Fully Streaming Voice Agent with End-to-End Latency Budgets, Incremental ASR, LLM Streaming, and Real-Time TTS"},"content":{"rendered":"<p>In this tutorial, we build an end-to-end streaming voice agent that mirrors how modern low-latency conversational systems operate in real time. We simulate the complete pipeline, from chunked audio input and streaming speech recognition to incremental language model reasoning and streamed text-to-speech output, while explicitly tracking latency at every stage. By working with strict latency budgets and observing metrics such as time to first token and time to first audio, we focus on the practical engineering trade-offs that shape responsive voice-based user experiences. 
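Before diving into the code, it helps to see how a strict end-to-end budget decomposes. As a hedged illustration (the numbers below mirror the default budgets used later in this tutorial, not a prescription), the time-to-first-audio target is the sum of the serial per-stage first-response budgets:

```python
# Illustrative only: decomposing a time-to-first-audio budget into the
# per-stage budgets that the LatencyBudgets dataclass below uses.
stage_budgets = {
    "asr_finalization": 0.3,  # seconds to finalize the transcript after silence
    "llm_first_token": 0.5,   # seconds until the LLM streams its first token
    "tts_first_chunk": 0.2,   # seconds until TTS emits its first audio chunk
}

# The end-to-end target is the sum of the serial per-stage budgets.
time_to_first_audio = sum(stage_budgets.values())
print(f"time-to-first-audio budget: {time_to_first_audio:.1f}s")
```

Any stage that overruns its slice eats directly into the user-perceived response gap, which is why each component in the pipeline is timed individually.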
Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Voice%20AI\/streaming_voice_agent_latency_budgets_end_to_end_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">import time\nimport asyncio\nimport numpy as np\nfrom collections import deque\nfrom dataclasses import dataclass\nfrom typing import List, AsyncIterator\nfrom enum import Enum\nimport matplotlib.pyplot as plt\n\n\n@dataclass\nclass LatencyMetrics:\n   audio_chunk_received: float = 0.0\n   asr_started: float = 0.0\n   asr_partial: float = 0.0\n   asr_complete: float = 0.0\n   llm_started: float = 0.0\n   llm_first_token: float = 0.0\n   llm_complete: float = 0.0\n   tts_started: float = 0.0\n   tts_first_chunk: float = 0.0\n   tts_complete: float = 0.0\n\n\n   def get_time_to_first_audio(self) -&gt; float:\n       return self.tts_first_chunk - self.asr_complete if self.tts_first_chunk and self.asr_complete else 0.0\n\n\n   def get_total_latency(self) -&gt; float:\n       return self.tts_complete - self.audio_chunk_received if self.tts_complete else 0.0\n\n\n@dataclass\nclass LatencyBudgets:\n   asr_processing: float = 0.1\n   asr_finalization: float = 0.3\n   llm_first_token: float = 0.5\n   llm_token_generation: float = 0.02\n   tts_first_chunk: float = 0.2\n   
tts_chunk_generation: float = 0.05\n   time_to_first_audio: float = 1.0\n\n\nclass AgentState(Enum):\n   LISTENING = \"listening\"\n   PROCESSING_SPEECH = \"processing_speech\"\n   THINKING = \"thinking\"\n   SPEAKING = \"speaking\"\n   INTERRUPTED = \"interrupted\"<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define the core data structures and state representations that allow us to track latency across the entire voice pipeline. We formalize timing signals for ASR, LLM, and TTS to ensure consistent measurement across all stages. We also establish a clear agent state machine that guides how the system transitions during a conversational turn.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">class AudioInputStream:\n   def __init__(self, sample_rate: int = 16000, chunk_duration_ms: int = 100):\n       self.sample_rate = sample_rate\n       self.chunk_duration_ms = chunk_duration_ms\n       self.chunk_size = int(sample_rate * chunk_duration_ms \/ 1000)\n\n\n   async def stream_audio(self, text: str) -&gt; AsyncIterator[np.ndarray]:\n       chars_per_second = (150 * 5) \/ 60\n       duration_seconds = len(text) \/ 
chars_per_second\n       num_chunks = int(duration_seconds * 1000 \/ self.chunk_duration_ms)\n\n\n       for _ in range(num_chunks):\n           chunk = np.random.randn(self.chunk_size).astype(np.float32) * 0.1\n           await asyncio.sleep(self.chunk_duration_ms \/ 1000)\n           yield chunk<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We simulate real-time audio input by breaking speech into fixed-duration chunks that arrive asynchronously. We model realistic speaking rates and streaming behavior to mimic live microphone input. We use this stream as the foundation for testing downstream latency-sensitive components.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">class StreamingASR:\n   def __init__(self, latency_budget: float = 0.1):\n       self.latency_budget = latency_budget\n       self.silence_threshold = 0.5\n\n\n   async def transcribe_stream(\n       self,\n       audio_stream: AsyncIterator[np.ndarray],\n       ground_truth: str\n   ) -&gt; AsyncIterator[tuple[str, bool]]:\n       words = ground_truth.split()\n       words_transcribed = 0\n       silence_duration = 0.0\n       chunk_count = 0\n\n\n       
async for chunk in audio_stream:\n           chunk_count += 1\n           await asyncio.sleep(self.latency_budget)\n\n\n           if chunk_count % 3 == 0 and words_transcribed &lt; len(words):\n               words_transcribed += 1\n               yield \" \".join(words[:words_transcribed]), False\n\n\n           audio_power = np.mean(np.abs(chunk))\n           silence_duration = silence_duration + 0.1 if audio_power &lt; 0.05 else 0.0\n\n\n           if silence_duration &gt;= self.silence_threshold:\n               await asyncio.sleep(0.2)\n               yield ground_truth, True\n               return\n\n\n       yield ground_truth, True<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We implement a streaming ASR module that produces partial transcriptions before emitting a final result. We progressively reveal words to reflect how modern ASR systems operate in real time. We also introduce silence-based finalization to approximate end-of-utterance detection.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">class StreamingLLM:\n   def __init__(self, time_to_first_token: float = 0.3, tokens_per_second: float = 50):\n       
self.time_to_first_token = time_to_first_token\n       self.tokens_per_second = tokens_per_second\n\n\n   async def generate_response(self, prompt: str) -&gt; AsyncIterator[str]:\n       responses = {\n           \"hello\": \"Hello! How can I help you today?\",\n           \"weather\": \"The weather is sunny with a temperature of 72\u00b0F.\",\n           \"time\": \"The current time is 2:30 PM.\",\n           \"default\": \"I understand. Let me help you with that.\"\n       }\n\n\n       response = responses[\"default\"]\n       for key in responses:\n           if key in prompt.lower():\n               response = responses[key]\n               break\n\n\n       await asyncio.sleep(self.time_to_first_token)\n\n\n       for word in response.split():\n           yield word + \" \"\n           await asyncio.sleep(1.0 \/ self.tokens_per_second)\n\n\nclass StreamingTTS:\n   def __init__(self, time_to_first_chunk: float = 0.2, chars_per_second: float = 15):\n       self.time_to_first_chunk = time_to_first_chunk\n       self.chars_per_second = chars_per_second\n\n\n   async def synthesize_stream(self, text_stream: AsyncIterator[str]) -&gt; AsyncIterator[np.ndarray]:\n       first_chunk = True\n       buffer = \"\"\n\n\n       async for text in text_stream:\n           buffer += text\n           if len(buffer) &gt;= 20 or first_chunk:\n               if first_chunk:\n                   await asyncio.sleep(self.time_to_first_chunk)\n                   first_chunk = False\n\n\n               duration = len(buffer) \/ self.chars_per_second\n               yield np.random.randn(int(16000 * duration)).astype(np.float32) * 0.1\n               buffer = \"\"\n               await asyncio.sleep(duration * 0.5)\n\n\n       if buffer:  # flush any trailing text shorter than the buffer threshold\n           duration = len(buffer) \/ self.chars_per_second\n           yield np.random.randn(int(16000 * duration)).astype(np.float32) * 0.1<\/code><\/pre>\n<\/div>\n<\/div>\n<p>In this snippet, we model a streaming language model and a streaming text-to-speech engine working together. We generate responses token by token to capture time-to-first-token behavior. 
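To make the time-to-first-token metric concrete, here is a minimal, self-contained sketch, separate from the tutorial classes, that times the first item out of any async token stream; the `toy_token_stream` generator and its 50 ms delay are made-up stand-ins for an LLM's prefill latency:

```python
import asyncio
import time

async def toy_token_stream():
    # Stand-in for an LLM stream: fixed 50 ms "prefill" before the first token.
    await asyncio.sleep(0.05)
    for tok in ["Hello,", "world!"]:
        yield tok + " "

async def measure_ttft(stream):
    # Time from starting to consume the stream until the first token arrives.
    start = time.perf_counter()
    async for _token in stream:
        return time.perf_counter() - start
    return float("inf")  # stream produced nothing

ttft = asyncio.run(measure_ttft(toy_token_stream()))
print(f"TTFT: {ttft * 1000:.0f} ms")
```

The same pattern works unchanged against any async generator, including the `generate_response` stream above.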
We then convert incremental text into audio chunks to simulate early and continuous speech synthesis.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">class StreamingVoiceAgent:\n   def __init__(self, latency_budgets: LatencyBudgets):\n       self.budgets = latency_budgets\n       self.audio_stream = AudioInputStream()\n       self.asr = StreamingASR(latency_budgets.asr_processing)\n       self.llm = StreamingLLM(\n           latency_budgets.llm_first_token,\n           1.0 \/ latency_budgets.llm_token_generation\n       )\n       self.tts = StreamingTTS(\n           latency_budgets.tts_first_chunk,\n           1.0 \/ latency_budgets.tts_chunk_generation\n       )\n       self.state = AgentState.LISTENING\n       self.metrics_history: List[LatencyMetrics] = []\n\n\n   async def process_turn(self, user_input: str) -&gt; LatencyMetrics:\n       metrics = LatencyMetrics()\n       start_time = time.time()\n\n\n       metrics.audio_chunk_received = time.time() - start_time\n       audio_gen = self.audio_stream.stream_audio(user_input)\n\n\n       metrics.asr_started = time.time() - start_time\n       async for text, 
final in self.asr.transcribe_stream(audio_gen, user_input):\n           if final:\n               metrics.asr_complete = time.time() - start_time\n               transcription = text\n\n\n       metrics.llm_started = time.time() - start_time\n       response = \"\"\n       async for token in self.llm.generate_response(transcription):\n           if not metrics.llm_first_token:\n               metrics.llm_first_token = time.time() - start_time\n           response += token\n\n\n       metrics.llm_complete = time.time() - start_time\n       metrics.tts_started = time.time() - start_time\n\n\n       async def text_stream():\n           for word in response.split():\n               yield word + \" \"\n\n\n       async for _ in self.tts.synthesize_stream(text_stream()):\n           if not metrics.tts_first_chunk:\n               metrics.tts_first_chunk = time.time() - start_time\n\n\n       metrics.tts_complete = time.time() - start_time\n       self.metrics_history.append(metrics)\n       return metrics<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We orchestrate the full voice agent by wiring audio input, ASR, LLM, and TTS into a single asynchronous flow. We record precise timestamps at each transition to compute critical latency metrics. We treat each user turn as an isolated experiment to enable systematic performance analysis. 
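Once `process_turn` has populated `metrics_history`, the per-turn numbers can be aggregated across a session. A minimal sketch, using plain dicts with made-up values as stand-ins for the `LatencyMetrics` dataclass, might look like:

```python
# Stand-in records for agent.metrics_history; the values are illustrative.
history = [
    {"llm_first_token": 0.42, "tts_first_chunk": 0.61, "tts_complete": 2.10},
    {"llm_first_token": 0.39, "tts_first_chunk": 0.58, "tts_complete": 1.95},
    {"llm_first_token": 0.45, "tts_first_chunk": 0.66, "tts_complete": 2.24},
]

def summarize(history, key):
    # Return (min, mean, max) for one latency metric across turns.
    values = [turn[key] for turn in history]
    return min(values), sum(values) / len(values), max(values)

for key in ("llm_first_token", "tts_first_chunk", "tts_complete"):
    lo, mean, hi = summarize(history, key)
    print(f"{key}: min={lo:.2f}s mean={mean:.2f}s max={hi:.2f}s")
```

From here, plotting each series with the matplotlib import at the top of the tutorial is a natural next step for spotting latency variance across turns.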
<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">async def run_demo():\n   budgets = LatencyBudgets(\n       asr_processing=0.08,\n       llm_first_token=0.3,\n       llm_token_generation=0.02,\n       tts_first_chunk=0.15,\n       time_to_first_audio=0.8\n   )\n\n\n   agent = StreamingVoiceAgent(budgets)\n\n\n   inputs = [\n       \"Hello, how are you today?\",\n       \"What's the weather like?\",\n       \"Can you tell me the time?\"\n   ]\n\n\n   for text in inputs:\n       await agent.process_turn(text)\n       await asyncio.sleep(1)\n\n\nif __name__ == \"__main__\":\n   asyncio.run(run_demo())<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We run the entire system across multiple conversational turns to observe latency consistency and variance. We apply aggressive latency budgets to stress the pipeline under realistic constraints. We use these runs to validate whether the system meets responsiveness targets across interactions.<\/p>\n<p>In conclusion, we demonstrated how a fully streaming voice agent can be orchestrated as a single asynchronous pipeline with clear stage boundaries and measurable performance guarantees. 
We showed that combining partial ASR, token-level LLM streaming, and early-start TTS reduces perceived latency, even when total computation time remains non-trivial. This approach helps us reason systematically about turn-taking, responsiveness, and optimization levers, and it provides a solid foundation for extending the system toward real-world deployments using production ASR, LLM, and TTS models.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/01\/19\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/\">How to Design a Fully Streaming Voice Agent with End-to-End Latency Budgets, Incremental ASR, LLM Streaming, and Real-Time TTS<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we build an end-to-end streaming voice agent that mirrors how modern low-latency conversational systems operate in real time. We simulate the complete pipeline, from chunked audio input and streaming speech recognition to incremental language model reasoning and streamed text-to-speech output, while explicitly tracking latency at every stage. By working with strict latency budgets and observing metrics such as time to first token and time to first audio, we focus on the practical engineering trade-offs that shape responsive voice-based user experiences. Check out the\u00a0FULL CODES here. 
<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","categories":[52,5,7,1],"tags":[],"class_list":["post-65375","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[]}
Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ja\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ja\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/ja\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"How to Design a Fully Streaming Voice Agent with End-to-End Latency Budgets, Incremental ASR, LLM Streaming, and Real-Time TTS - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/ja\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/","og_locale":"ja_JP","og_type":"article","og_title":"How to Design a Fully Streaming Voice Agent with End-to-End Latency Budgets, Incremental ASR, LLM Streaming, and Real-Time TTS - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/ja\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-01-20T11:05:34+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"\u57f7\u7b46\u8005":"admin NU","\u63a8\u5b9a\u8aad\u307f\u53d6\u308a\u6642\u9593":"7\u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"How to Design a 
Fully Streaming Voice Agent with End-to-End Latency Budgets, Incremental ASR, LLM Streaming, and Real-Time TTS","datePublished":"2026-01-20T11:05:34+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/"},"wordCount":586,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"ja","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/","url":"https:\/\/youzum.net\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/","name":"How to Design a Fully Streaming Voice Agent with End-to-End Latency Budgets, Incremental ASR, LLM Streaming, and Real-Time TTS - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"datePublished":"2026-01-20T11:05:34+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/#breadcrumb"},"inLanguage":"ja","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/how-to-design-a-fully-streaming-voice-agent-with-end-to-end-latency-budgets-incremental-asr-llm-streaming-and-real-time-tts\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"How to Design a Fully Streaming Voice Agent with End-to-End Latency Budgets, Incremental ASR, LLM Streaming, and Real-Time TTS"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ja"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association 
Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"ja","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"ja","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/ja\/members\/adminnu\/"}]}},"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/ja\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/ja\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/ja\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/ja\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/ja\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"In this tutorial, we build an end-to-end streaming voice agent that mirrors how modern low-latency conversational systems operate in real time. We simulate the complete pipeline, from chunked audio input and streaming speech recognition to incremental language model reasoning and streamed text-to-speech output, while explicitly tracking latency at every stage. 
By working with strict latency&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/posts\/65375","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/comments?post=65375"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/posts\/65375\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/media?parent=65375"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/categories?post=65375"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/tags?post=65375"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}