{"id":88981,"date":"2026-05-08T16:16:25","date_gmt":"2026-05-08T16:16:25","guid":{"rendered":"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/"},"modified":"2026-05-08T16:16:25","modified_gmt":"2026-05-08T16:16:25","slug":"openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api","status":"publish","type":"post","link":"https:\/\/youzum.net\/es\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/","title":{"rendered":"OpenAI Releases Three Realtime Audio Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API"},"content":{"rendered":"<p>OpenAI released three new audio models through its Realtime API, each targeting a distinct capability in live voice applications: GPT-Realtime-2 for voice agents with reasoning, GPT-Realtime-Translate for live speech translation, and GPT-Realtime-Whisper for streaming transcription. Alongside the model releases, the Realtime API officially exits beta and is now generally available \u2014 a meaningful signal for developers who held off building production systems on it. All three models are available immediately through the OpenAI API and can be tested in the Playground.<\/p>\n<p>Together, they push voice applications past the basic question-and-answer loop \u2014 toward systems that can listen, reason, translate, transcribe, and act within a single conversation.<\/p>\n<h3 class=\"wp-block-heading\"><strong>GPT-Realtime-2: Voice Reasoning with a 128K Context Window<\/strong><\/h3>\n<p>The flagship release is GPT-Realtime-2, which OpenAI team describes as its first voice model with GPT-5-class reasoning. GPT-Realtime-2 can process harder requests, manage interruptions, and continue conversations naturally. 
OpenAI expanded the model\u2019s context window from 32K to 128K tokens, allowing longer conversations and more complex tasks without losing context.<\/p>\n<p>Previous voice models frequently stalled on multi-step requests or dropped earlier context during longer sessions. GPT-Realtime-2 is specifically designed to keep the conversation moving while it reasons through a request.<\/p>\n<p>Developers can enable short preamble phrases \u2014 like \u201clet me check that\u201d or \u201cone moment while I look into it\u201d \u2014 so users know the agent is working on the request. The model can also call multiple tools at once and narrate what it\u2019s doing as it works \u2014 so instead of dead air during a multi-step task, the user gets a running commentary. These features directly address one of the most common failure modes in deployed voice agents: awkward silence that makes the system feel broken.<\/p>\n<p>A particularly useful control for production builders is adjustable reasoning effort. Developers can dial reasoning intensity across <strong>five levels<\/strong>: minimal, low, medium, high, and xhigh. The default is \u201clow\u201d to keep latency down for simple requests, while tougher tasks can tap into more compute. This means teams can tune the performance-latency tradeoff at the session level depending on the use case \u2014 a quick customer lookup doesn\u2019t need the same reasoning depth as a multi-step travel booking workflow.<\/p>\n<p>GPT-Realtime-2 also adds tone control. The model can adjust its speaking style depending on the situation \u2014 staying calm during problem-solving, shifting to empathetic when users are frustrated, and turning upbeat after a successful outcome. The model is also better at understanding industry-specific terminology, including healthcare vocabulary and proper nouns.<\/p>\n<p>On benchmarks, the gains are measurable. 
GPT-Realtime-2 with high reasoning scored 96.6% on Big Bench Audio, compared to 81.4% for GPT-Realtime-1.5 \u2014 a 15.2 percentage point improvement. GPT-Realtime-2 with xhigh reasoning scored 48.5% on Audio MultiChallenge instruction following, compared to 34.7% for GPT-Realtime-1.5.<\/p>\n<p>Big Bench Audio evaluates challenging reasoning capabilities in language models that support audio input. Audio MultiChallenge evaluates multi-turn conversational intelligence in spoken dialogue systems, including instruction following, context integration, self-consistency, and handling natural speech corrections.<\/p>\n<p><strong>Pricing:<\/strong> GPT-Realtime-2 is priced at $32 per 1M audio input tokens ($0.40 per 1M cached input tokens) and $64 per 1M audio output tokens.<\/p>\n<h3 class=\"wp-block-heading\"><strong>GPT-Realtime-Translate: Live Speech Translation Across 70+ Languages<\/strong><\/h3>\n<p>GPT-Realtime-Translate is a new live translation model that translates speech from 70+ input languages into 13 output languages while keeping pace with the speaker. Unlike GPT-Realtime-2, this model is a dedicated translation pipeline \u2014 speech goes in one language and comes out in another. It is not a conversational agent; it is designed to convert one audio stream into another in real time.<\/p>\n<p>The distinction is important for developers choosing the right tool. If your application needs a bilingual customer support flow or a live interpreter for an in-person event, GPT-Realtime-Translate is the purpose-built option. 
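<\/p>\n<p>A translation session might be configured along similar lines. This is again only a sketch: the field names are assumed for illustration, not taken from a published schema:<\/p>\n<pre class=\"wp-block-code\"><code>{\n  \"type\": \"session.update\",\n  \"session\": {\n    \"model\": \"gpt-realtime-translate\",\n    \"output_language\": \"es\"\n  }\n}<\/code><\/pre>\n<p>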
If you need the model to also reason, call functions, or hold context across turns, GPT-Realtime-2 handles that.<\/p>\n<p><strong>Pricing:<\/strong> GPT-Realtime-Translate is priced at $0.034 per minute.<\/p>\n<h3 class=\"wp-block-heading\"><strong>GPT-Realtime-Whisper: Streaming Transcription as People Speak<\/strong><\/h3>\n<p>GPT-Realtime-Whisper is a new speech-to-text model built for low-latency streaming \u2014 transcribing audio as people speak, so live products can feel faster, more responsive, and more natural.<\/p>\n<p>The original Whisper model was designed for completed chunks of audio, making it better suited for post-session transcription. GPT-Realtime-Whisper is the streaming counterpart, purpose-built for applications that need live output. For realtime transcription, GPT-Realtime-Whisper gives you controllable latency \u2014 lower delay settings produce earlier partial text, while higher delay settings can improve transcript quality.<\/p>\n<p>Use cases include live broadcast captions, meeting notes generated during the conversation, and voice agents that need to continuously understand the user rather than wait for turn-by-turn input.<\/p>\n<p><strong>Pricing:<\/strong> GPT-Realtime-Whisper is priced at $0.017 per minute.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Architecture Patterns and New Voices<\/strong><\/h3>\n<p>Developers can choose among <strong>three session types<\/strong> depending on the use case: a voice-agent session when the application needs an assistant that responds to the user, a translation session when the application needs an interpreter, and a transcription session when text from audio is needed without model-generated responses.<\/p>\n<p>On the voice output side, two new voices, Cedar and Marin, join the API roster with this release.<\/p>\n<p>All three models \u2014 GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper \u2014 are available now through the OpenAI Realtime API, 
which is generally available starting today.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li>GPT-Realtime-2 brings GPT-5-class reasoning to voice with a 128K context window, five-level adjustable reasoning effort, tone control, parallel tool calls, and interruption recovery.<\/li>\n<li>On Big Bench Audio, GPT-Realtime-2 (high) scores 96.6% vs. 81.4% for GPT-Realtime-1.5; on Audio MultiChallenge, the xhigh variant scores 48.5% vs. 34.7%.<\/li>\n<li>GPT-Realtime-Translate handles live speech translation across 70+ input languages into 13 output languages at $0.034\/min.<\/li>\n<li>GPT-Realtime-Whisper streams transcription in real time with controllable latency at $0.017\/min.<\/li>\n<li>The Realtime API exits beta and becomes generally available today, alongside two new voices, Cedar and Marin.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out\u00a0the\u00a0<strong><a href=\"https:\/\/openai.com\/index\/advancing-voice-intelligence-with-new-models-in-the-api\/\" target=\"_blank\" rel=\"noreferrer noopener\">Full Technical Details here<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/05\/08\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/\">OpenAI Releases Three Realtime Audio Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>OpenAI released three new audio models through its Realtime API, each targeting a distinct capability in live voice applications: GPT-Realtime-2 for voice agents with reasoning, GPT-Realtime-Translate for live speech translation, and GPT-Realtime-Whisper for streaming transcription. Alongside the model releases, the Realtime API officially exits beta and is now generally available \u2014 a meaningful signal for developers who held off building production systems on it. All three models are available immediately through the OpenAI API and can be tested in the Playground. Together, they push voice applications past the basic question-and-answer loop \u2014 toward systems that can listen, reason, translate, transcribe, and act within a single conversation. GPT-Realtime-2: Voice Reasoning with a 128K Context Window The flagship release is GPT-Realtime-2, which the OpenAI team describes as its first voice model with GPT-5-class reasoning. GPT-Realtime-2 can process harder requests, manage interruptions, and continue conversations naturally. 
OpenAI expanded the model\u2019s context window from 32K to 128K tokens, allowing longer conversations and more complex tasks without losing context. Previous voice models frequently stalled on multi-step requests or dropped earlier context during longer sessions. GPT-Realtime-2 is specifically designed to keep the conversation moving while it reasons through a request. Developers can enable short preamble phrases \u2014 like \u201clet me check that\u201d or \u201cone moment while I look into it\u201d \u2014 so users know the agent is working on the request. The model can also call multiple tools at once and narrate what it\u2019s doing while it does \u2014 so instead of dead air during a multi-step task, the user gets a running commentary. These features directly address one of the most common failure modes in deployed voice agents: awkward silence that makes the system feel broken. A particularly useful control for production builders is adjustable reasoning effort. Developers can dial reasoning intensity across five levels: minimal, low, medium, high, and xhigh. The default is \u201clow\u201d to keep latency down for simple requests, while tougher tasks can tap into more compute. This means teams can tune the performance-latency tradeoff at the session level depending on the use case \u2014 a quick customer lookup doesn\u2019t need the same reasoning depth as a multi-step travel booking workflow. GPT-Realtime-2 also adds tone control. The model can adjust its speaking style depending on the situation \u2014 staying calm during problem-solving, shifting to empathetic when users are frustrated, and turning upbeat after a successful outcome. The model is also better at understanding industry-specific terminology, including healthcare vocabulary and proper nouns. On benchmarks, the gains are measurable. GPT-Realtime-2 with high reasoning scored 96.6% on Big Bench Audio, compared to 81.4% for GPT-Realtime-1.5 \u2014 a 15.2 percentage point improvement. 
GPT-Realtime-2 with xhigh reasoning scored 48.5% on Audio MultiChallenge instruction following, compared to 34.7% for GPT-Realtime-1.5. Big Bench Audio evaluates challenging reasoning capabilities in language models that support audio input. Audio MultiChallenge evaluates multi-turn conversational intelligence in spoken dialogue systems, including instruction following, context integration, self-consistency, and handling natural speech corrections. Pricing: GPT-Realtime-2 is priced at $32 per 1M audio input tokens ($0.40 for cached input tokens) and $64 per 1M audio output tokens. GPT-Realtime-Translate: Live Speech Translation Across 70+ Languages GPT-Realtime-Translate is a new live translation model that translates speech from 70+ input languages into 13 output languages while keeping pace with the speaker. Unlike GPT-Realtime-2, this model is a dedicated translation pipe \u2014 speech goes in one language and comes out in another. It is not a conversational agent; it is designed to convert one audio stream into another in real time. The distinction is important for developers choosing the right tool. If your application needs a bilingual customer support flow or a live interpreter for an in-person event, GPT-Realtime-Translate is the purpose-built option. If you need the model to also reason, call functions, or hold context across turns, GPT-Realtime-2 handles that. Pricing: GPT-Realtime-Translate is priced at $0.034 per minute. GPT-Realtime-Whisper: Streaming Transcription as People Speak GPT-Realtime-Whisper is a new streaming speech-to-text model built for low-latency speech-to-text \u2014 transcribing audio as people speak, so live products can feel faster, more responsive, and more natural. The original Whisper model was designed for completed chunks of audio, making it better suited for post-session transcription. GPT-Realtime-Whisper is the streaming counterpart, purpose-built for applications that need live output. 
For realtime transcription, gpt-realtime-whisper gives you controllable latency \u2014 lower delay settings produce earlier partial text, while higher delay settings can improve transcript quality. Use cases include live broadcast captions, meeting notes generated during the conversation, and voice agents that need to continuously understand the user rather than wait for turn-by-turn input. Pricing: GPT-Realtime-Whisper is priced at $0.017 per minute. Architecture Patterns and New Voices Developers can choose between three session types depending on the use case: a voice-agent session when the application needs an assistant that responds to the user, a translation session when the application needs an interpreter, and a transcription session when text from audio is needed without model-generated responses. On the voice output side, two new voices, Cedar and Marin, join the API roster exclusively with this release. All three models \u2014 GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper \u2014 are available now through the OpenAI Realtime API, which is generally available starting today. Key Takeaways GPT-Realtime-2 brings GPT-5-class reasoning to voice with a 128K context window, five-level adjustable reasoning effort, tone control, parallel tool calls, and interruption recovery On Big Bench Audio, GPT-Realtime-2 (high) scores 96.6% vs. 81.4% for GPT-Realtime-1.5; on Audio MultiChallenge, the xhigh variant scores 48.5% vs. 34.7%. 
GPT-Realtime-Translate handles live speech translation across 70+ input languages into 13 output languages at $0.034\/min GPT-Realtime-Whisper streams transcription in real time with controllable latency at $0.017\/min The Realtime API exits beta and goes generally available today alongside two new voices, Cedar and Marin Check out\u00a0the\u00a0Full Technical Details here.\u00a0Also,\u00a0feel free to follow us on\u00a0Twitter\u00a0and don\u2019t forget to join our\u00a0150k+ ML SubReddit\u00a0and Subscribe to\u00a0our Newsletter. Wait! are you on telegram?\u00a0now you can join us on telegram as well. Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.?\u00a0Connect<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-88981","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>OpenAI Releases Three Realtime Audio Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/es\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/\" \/>\n<meta property=\"og:locale\" content=\"es_ES\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"OpenAI Releases Three Realtime Audio Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/es\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" 
content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-08T16:16:25+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Escrito por\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Tiempo de lectura\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutos\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"OpenAI Releases Three Realtime Audio Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime 
API\",\"datePublished\":\"2026-05-08T16:16:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/\"},\"wordCount\":1020,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"es\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/\",\"url\":\"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/\",\"name\":\"OpenAI Releases Three Realtime Audio Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"datePublished\":\"2026-05-08T16:16:25+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/#breadcrumb\"},\"inLanguage\":\"es\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"OpenAI Releases Three Realtime Audio Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"es\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association 
Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/es\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"OpenAI Releases Three Realtime Audio Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/es\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/","og_locale":"es_ES","og_type":"article","og_title":"OpenAI Releases Three Realtime Audio Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/es\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-05-08T16:16:25+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Escrito por":"admin NU","Tiempo de lectura":"5 minutos"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"OpenAI Releases Three Realtime Audio 
Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API","datePublished":"2026-05-08T16:16:25+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/"},"wordCount":1020,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"es","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/","url":"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/","name":"OpenAI Releases Three Realtime Audio Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"datePublished":"2026-05-08T16:16:25+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/#breadcrumb"},"inLanguage":"es","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/openai-releases-three-realtime-audio-models-gpt-realtime-2-gpt-realtime-translate-and-gpt-realtime-whisper-in-the-realtime-api\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"OpenAI Releases Three Realtime Audio Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"es"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association 
Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/es\/members\/adminnu\/"}]}},"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/es\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/es\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"OpenAI released three new audio models through its Realtime API, each targeting a distinct capability in live voice applications: GPT-Realtime-2 for voice agents with reasoning, GPT-Realtime-Translate for live speech translation, and GPT-Realtime-Whisper for streaming transcription. 
Alongside the model releases, the Realtime API officially exits beta and is now generally available \u2014 a meaningful signal for&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts\/88981","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/comments?post=88981"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts\/88981\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/media?parent=88981"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/categories?post=88981"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/tags?post=88981"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}