<h1>How do AI models generate videos?</h1>
<p><em>MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next. <a href="https://www.technologyreview.com/tag/tech-review-explains">You can read more from the series here</a>.</em></p>
<p>It’s been a big year for video generation. In the last nine months, <a href="https://www.technologyreview.com/2024/12/09/1108309/how-to-use-sora-openais-new-video-generating-tool/">OpenAI made Sora public</a>, <a href="https://www.technologyreview.com/2025/07/15/1120156/googles-generative-video-model-veo-3-has-a-subtitles-problem/">Google DeepMind launched Veo 3</a>, and the video startup Runway launched Gen-4. All three can produce video clips that are <a href="https://www.technologyreview.com/2025/07/22/1120556/five-things-to-know-ai/">(almost) impossible to distinguish from actual filmed footage</a> or CGI animation. This year also saw Netflix debut an AI visual effect in its show <em>The Eternaut</em>, the first time video generation has been used to make mass-market TV.</p>
<p>Sure, the clips you see in demo reels are <a href="https://www.technologyreview.com/2024/02/15/1088401/openai-amazing-new-generative-ai-video-model-sora/">cherry-picked to showcase a company’s models</a> at the top of their game.
But with the technology in the hands of more users than ever before—Sora and Veo 3 are available in the ChatGPT and Gemini apps for paying subscribers—even the most casual filmmaker can now knock out something remarkable.</p>
<p>The downside is that creators are competing with AI slop, and social media feeds are filling up with faked news footage. Video generation also uses up a <a href="https://www.technologyreview.com/supertopic/ai-energy-package/">huge amount of energy</a>, many times more than text or image generation.</p>
<p>With AI-generated videos everywhere, let’s take a moment to talk about the tech that makes them work.</p>
<h3 class="wp-block-heading"><strong>How do you generate a video?</strong></h3>
<p>Let’s assume you’re a casual user. A range of high-end tools now lets pro video makers insert video generation models into their workflows, but most people will use this technology in an app or via a website. You know the drill: “Hey, Gemini, make me a video of a unicorn eating spaghetti. Now make its horn take off like a rocket.” What you get back will be hit or miss, and you’ll typically need to ask the model to take another pass or 10 before you get more or less what you wanted.</p>
<p>So what’s going on under the hood? Why is it hit or miss—and why does it take so much energy? The models behind the latest wave of video generation are known as <strong>latent diffusion transformers</strong>. Yes, that’s quite a mouthful.
Let’s unpack each part in turn, starting with diffusion.</p>
<h3 class="wp-block-heading"><strong>What’s a diffusion model?</strong></h3>
<p>Imagine taking an image and adding a random spattering of pixels to it. Take that pixel-spattered image and spatter it again, and then again. Do that enough times and you will have turned the initial image into a random mess of pixels, like static on an old TV set.</p>
<p>A diffusion model is a neural network trained to reverse that process, turning random static into images. During training, it gets shown millions of images in various stages of pixelation. It learns how those images change each time new pixels are thrown at them and, thus, how to undo those changes.</p>
<p>The upshot is that when you ask a diffusion model to generate an image, it will start off with a random mess of pixels and, step by step, turn that mess into an image that is more or less similar to the images in its training set.</p>
<p>But you don’t want just any image—you want the image you specified, typically with a text prompt. And so the diffusion model is paired with a second model—such as a large language model (LLM) trained to match images with text descriptions—that guides each step of the cleanup process, pushing the diffusion model toward images that the LLM considers a good match to the prompt.</p>
<p>An aside: This LLM isn’t pulling the links between text and images out of thin air. Most text-to-image and text-to-video models today are trained on large data sets that contain billions of pairings of text and images, or text and video, scraped from the internet (a practice many creators are very unhappy about).
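To make the spatter-and-reverse idea above concrete, here is a toy sketch of the forward (noising) process and the reverse (generation) loop. It is illustrative only: real models use a carefully tuned noise schedule, and the `denoise_step` argument below is a hypothetical stand-in for the trained neural network that does the actual cleanup.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatter(image, num_steps, noise_scale=0.1):
    """Forward diffusion: repeatedly add random noise to an image until
    it becomes static. Every intermediate stage is a training example."""
    stages = [image]
    x = image
    for _ in range(num_steps):
        x = x + rng.normal(0.0, noise_scale, size=x.shape)
        stages.append(x)
    return stages

def generate(denoise_step, shape, num_steps):
    """Reverse diffusion: start from pure static and let a (trained)
    denoiser undo the noise one step at a time."""
    x = rng.normal(0.0, 1.0, size=shape)  # a random mess of pixels
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t)  # in practice, a neural network
    return x

# Forward: a flat grey image drifts toward TV static.
stages = spatter(np.zeros((8, 8)), num_steps=50)

# Reverse: even a dummy denoiser that just shrinks the noise pulls
# static back toward a flat image; a trained one pulls it toward
# images like those in its training set.
out = generate(lambda x, t: 0.9 * x, shape=(8, 8), num_steps=50)
```

The guidance described above would slot into `denoise_step`: at each step, the update is nudged toward whatever the text-matching model scores as a better fit to the prompt.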
This means that what you get from such models is a distillation of the world as it’s represented online, distorted by prejudice (and pornography).</p>
<p>It’s easiest to imagine diffusion models working with images. But the technique can be used with many kinds of data, <a href="https://www.technologyreview.com/2025/04/16/1114433/ai-artificial-intelligence-music-diffusion-creativity-songs-writer">including audio</a> and video. To generate movie clips, a diffusion model must clean up sequences of images—the consecutive frames of a video—instead of just one image.</p>
<h3 class="wp-block-heading"><strong>What’s a latent diffusion model?</strong></h3>
<p>All this takes a huge amount of compute (read: energy). That’s why most diffusion models used for video generation use a technique called latent diffusion. Instead of processing raw data—the millions of pixels in each video frame—the model works in what’s known as a latent space, in which the video frames (and the text prompt) are compressed into a mathematical code that captures just the essential features of the data and throws out the rest.</p>
<p>A similar thing happens whenever you stream a video over the internet: A video is sent from a server to your screen in a compressed format to make it get to you faster, and when it arrives, your computer or TV will convert it back into a watchable video.</p>
<p>And so the final step is to decompress what the latent diffusion process has come up with. Once the compressed frames of random static have been turned into the compressed frames of a video that the LLM guide considers a good match for the user’s prompt, the compressed video gets converted into something you can watch.</p>
<p>With latent diffusion, the diffusion process works more or less the way it would for an image.
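As a toy illustration of that compress-then-decompress pipeline, the sketch below uses simple average-pooling and nearest-neighbour upsampling as stand-ins for the learned encoder and decoder a real model would use:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(frame, factor=8):
    """Toy 'encoder': average-pool a frame, shrinking it by `factor`
    in each dimension (a stand-in for a learned compression model)."""
    h, w = frame.shape
    return frame.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def decode(latent, factor=8):
    """Toy 'decoder': upsample the latent back to pixel resolution."""
    return latent.repeat(factor, axis=0).repeat(factor, axis=1)

frames = [rng.random((256, 256)) for _ in range(8)]  # an 8-frame "video"
latents = [encode(f) for f in frames]                # 32x32 each
# ... the diffusion (denoising) loop would run here, on the small latents ...
video = [decode(z) for z in latents]                 # decompress to watchable frames
```

Each 256×256 frame becomes a 32×32 latent, so the diffusion loop touches 64 times fewer numbers per frame; that ratio is the source of latent diffusion’s efficiency.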
The difference is that the pixelated video frames are now mathematical encodings of those frames rather than the frames themselves. This makes latent diffusion far more efficient than a typical diffusion model. (Even so, video generation still uses <a href="https://www.technologyreview.com/2025/05/20/1116327/ai-energy-usage-climate-footprint-big-tech/">more energy than image or text generation</a>. There’s just an eye-popping amount of computation involved.)</p>
<h3 class="wp-block-heading"><strong>What’s a latent diffusion transformer?</strong></h3>
<p>Still with me? There’s one more piece to the puzzle—and that’s how to make sure the diffusion process produces a sequence of frames that are consistent, maintaining objects, lighting, and so on from one frame to the next. OpenAI did this with Sora by combining its diffusion model with another kind of model called a transformer. This has now become standard in generative video.</p>
<p>Transformers are great at processing long sequences of data, like words. That has made them the special sauce inside large language models such as <a href="https://www.technologyreview.com/2025/08/07/1121308/gpt-5-is-here-now-what/">OpenAI’s GPT-5</a> and <a href="https://www.technologyreview.com/2024/02/08/1087911/googles-gemini-is-now-in-everything-heres-how-you-can-try-it-out/">Google DeepMind’s Gemini</a>, which can generate long sequences of words that make sense, maintaining consistency across many dozens of sentences.</p>
<p>But videos are not made of words. Instead, videos get cut into chunks that can be treated as if they were. The approach that OpenAI came up with was to dice videos up across both space and time.
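Under toy assumptions (a 16-frame, 64×64 greyscale clip, with cube sizes chosen purely for illustration), that dicing step might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def to_spacetime_patches(video, t=4, p=16):
    """Dice a video into 'little cubes': t frames deep and p x p pixels
    across. Each flattened cube becomes one token in the sequence the
    transformer processes."""
    T, H, W = video.shape
    tokens = []
    for ti in range(0, T, t):
        for hi in range(0, H, p):
            for wi in range(0, W, p):
                cube = video[ti:ti + t, hi:hi + p, wi:wi + p]
                tokens.append(cube.reshape(-1))  # flatten cube -> token
    return np.stack(tokens)

clip = rng.random((16, 64, 64))      # 16 frames of 64x64 pixels
tokens = to_spacetime_patches(clip)  # 4 * 4 * 4 = 64 tokens of length 1024
```

In a real latent diffusion transformer the cubes would be cut from the compressed latents rather than raw pixels, but the sequence-of-tokens idea is the same.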
“It’s like if you were to have a stack of all the video frames and you cut little cubes from it,” says Tim Brooks, a lead researcher on Sora.</p>
<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><figcaption class="wp-element-caption">A selection of videos generated with Veo 3 and Midjourney. The clips have been enhanced in postproduction with Topaz, an AI video-editing tool. Credit: VaigueMan</figcaption></figure>
<p>Using transformers alongside diffusion models brings several advantages. Because they are designed to process sequences of data, transformers also help the diffusion model maintain consistency across frames as it generates them. This makes it possible to produce videos in which objects don’t pop in and out of existence, for example.</p>
<p>And because the videos are diced up, their size and orientation do not matter. This means that the latest wave of video generation models can be trained on a wide range of example videos, from short vertical clips shot with a phone to wide-screen cinematic films. The greater variety of training data has made video generation far better than it was just two years ago. It also means that video generation models can now be asked to produce videos in a variety of formats.</p>
<h3 class="wp-block-heading"><strong>What about the audio?</strong></h3>
<p>A big advance with Veo 3 is that it generates video with audio, from lip-synched dialogue to sound effects to background noise. That’s a first for video generation models.
As Google DeepMind CEO Demis Hassabis put it at <a href="https://www.technologyreview.com/2025/05/21/1117251/by-putting-ai-into-everything-google-wants-to-make-it-invisible/">this year’s Google I/O</a>: “We’re emerging from the silent era of video generation.”</p>
<p>The challenge was to find a way to line up video and audio data so that the diffusion process would work on both at the same time. Google DeepMind’s breakthrough was a new way to compress audio and video into a single piece of data inside the diffusion model. When Veo 3 generates a video, its diffusion model produces audio and video together in a lockstep process, ensuring that the sound and images are synched.</p>
<h3 class="wp-block-heading"><strong>You said that diffusion models can generate different kinds of data. Is this how LLMs work too?</strong></h3>
<p>No—or at least not yet. Diffusion models are most often used to generate images, video, and audio. Large language models—which generate text (including computer code)—are built using transformers. But the lines are blurring. We’ve seen how transformers are now being combined with diffusion models to generate videos. And this summer Google DeepMind revealed that it was building an experimental large language model that used a diffusion model instead of a transformer to generate text.</p>
<p>Here’s where things start to get confusing: Though video generation (which uses diffusion models) consumes a lot of energy, diffusion models themselves are in fact more efficient than transformers. Thus, by using a diffusion model instead of a transformer to generate text, Google DeepMind’s new LLM could be a lot more efficient than existing LLMs.
Expect to see more from diffusion models in the near future!</p>