{"id":29539,"date":"2025-08-05T05:56:24","date_gmt":"2025-08-05T05:56:24","guid":{"rendered":"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/"},"modified":"2025-08-05T05:56:24","modified_gmt":"2025-08-05T05:56:24","slug":"google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents","status":"publish","type":"post","link":"https:\/\/youzum.net\/es\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/","title":{"rendered":"Google AI Releases LangExtract: An Open Source Python Library that Extracts Structured Data from Unstructured Text Documents"},"content":{"rendered":"<p>In today\u2019s data-driven world, valuable insights are often buried in unstructured text\u2014be it clinical notes, lengthy legal contracts, or customer feedback threads. Extracting meaningful, traceable information from these documents is both a technical and practical challenge.\u00a0<strong>Google AI\u2019s new open-source Python library, <a href=\"https:\/\/github.com\/google\/langextract\" target=\"_blank\" rel=\"noreferrer noopener\">LangExtract<\/a>, is designed to address this gap directly, using LLMs like Gemini to deliver powerful, automated extraction with traceability and transparency at its core.<\/strong><\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Innovations of LangExtract<\/strong><\/h3>\n<h4 class=\"wp-block-heading\"><strong>1.\u00a0Declarative and Traceable Extraction<\/strong><\/h4>\n<p>LangExtract lets users define custom extraction tasks using natural language instructions and high-quality \u201cfew-shot\u201d examples. This empowers developers and analysts to\u00a0<strong>specify exactly which entities, relationships, or facts to extract, and in what structure<\/strong>. 
Crucially, every extracted piece of information is\u00a0<strong>tied directly back to its source text<\/strong>\u2014enabling validation, auditing, and end-to-end traceability.<\/p>\n<h4 class=\"wp-block-heading\"><strong>2.\u00a0Domain Versatility<\/strong><\/h4>\n<p>The library works not just in tech demos but in critical real-world domains\u2014including health (clinical notes, medical reports), finance (summaries, risk documents), law (contracts), research literature, and even the arts (analyzing Shakespeare). Original use cases include automatic extraction of medications, dosages, and administration details from clinical documents, as well as relationships and emotions from plays or literature.<\/p>\n<h4 class=\"wp-block-heading\">3.\u00a0<strong>Schema Enforcement with LLMs<\/strong><\/h4>\n<p>Powered by Gemini and compatible with other LLMs, LangExtract enables\u00a0<strong>enforcement of custom output schemas<\/strong>\u00a0(like JSON), so results aren\u2019t just accurate\u2014they\u2019re immediately usable in downstream databases, analytics, or AI pipelines. 
It mitigates common LLM weaknesses such as hallucination and schema drift by grounding outputs in both the user instructions and the actual source text.<\/p>\n<h4 class=\"wp-block-heading\">4.\u00a0<strong>Scalability and Visualization<\/strong><\/h4>\n<ul class=\"wp-block-list\">\n<li><strong>Handles Large Volumes:<\/strong>\u00a0LangExtract processes long documents by chunking them, extracting from the chunks in parallel, and aggregating the results.<\/li>\n<li><strong>Interactive Visualization:<\/strong>\u00a0Developers can generate interactive HTML reports that highlight each extracted entity at its location in the original document, which simplifies auditing and error analysis.<\/li>\n<li><strong>Smooth Integration:<\/strong>\u00a0Works in Google Colab, Jupyter, or as standalone HTML files, supporting a rapid feedback loop for developers and researchers.<\/li>\n<\/ul>\n<h4 class=\"wp-block-heading\">5.\u00a0<strong>Installation and Usage<\/strong><\/h4>\n<p><strong>Install with pip:<\/strong><\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">pip install 
langextract\n<\/code><\/pre>\n<\/div>\n<\/div>\n<p><strong>Example Workflow (Extracting Character Info from Shakespeare):<\/strong><\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">import langextract as lx\nimport textwrap\n\n# 1. Define your prompt\nprompt = textwrap.dedent(\"\"\"\nExtract characters, emotions, and relationships in order of appearance.\nUse exact text for extractions. Do not paraphrase or overlap entities.\nProvide meaningful attributes for each entity to add context.\n\"\"\")\n\n# 2. Give a high-quality example\nexamples = [\n    lx.data.ExampleData(\n        text=\"ROMEO. But soft! What light through yonder window breaks? It is the east, and Juliet is the sun.\",\n        extractions=[\n            lx.data.Extraction(extraction_class=\"character\", extraction_text=\"ROMEO\", attributes={\"emotional_state\": \"wonder\"}),\n            lx.data.Extraction(extraction_class=\"emotion\", extraction_text=\"But soft!\", attributes={\"feeling\": \"gentle awe\"}),\n            lx.data.Extraction(extraction_class=\"relationship\", extraction_text=\"Juliet is the sun\", attributes={\"type\": \"metaphor\"}),\n        ],\n    )\n]\n\n# 3. 
Extract from new text\ninput_text = \"Lady Juliet gazed longingly at the stars, her heart aching for Romeo\"\n\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=prompt,\n    examples=examples,\n    model_id=\"gemini-2.5-pro\"\n)\n\n# 4. Save and visualize results\nlx.io.save_annotated_documents([result], output_name=\"extraction_results.jsonl\")\nhtml_content = lx.visualize(\"extraction_results.jsonl\")\nwith open(\"visualization.html\", \"w\") as f:\n    f.write(html_content)\n<\/code><\/pre>\n<\/div>\n<\/div>\n<p>This results in structured, source-anchored JSON outputs, plus an interactive HTML visualization for easy review and demonstration.<\/p>\n<figure class=\"wp-block-video aligncenter\"><video controls src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/romeo_juliet_basic-1.mp4\" preload=\"none\"><\/video><\/figure>\n<h3 class=\"wp-block-heading\"><strong>Specialized &amp; Real-World Applications<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Medicine<\/strong>: Extracts medications, dosages, timing, and links them back to source sentences. 
Informed by research on accelerating medical information extraction, LangExtract\u2019s approach applies directly to structuring clinical and radiology reports, improving clarity and supporting interoperability.<\/li>\n<li><strong>Finance &amp; Law<\/strong>: Pulls relevant clauses, terms, or risks from dense legal or financial text, with every output traceable back to its context.<\/li>\n<li><strong>Research &amp; Data Mining<\/strong>: Streamlines high-throughput extraction from thousands of scientific papers.<\/li>\n<\/ul>\n<p>The team also provides a demonstration called\u00a0<em>RadExtract<\/em>\u00a0for structuring radiology reports, which highlights not just what was extracted but exactly where the information appeared in the original input.<\/p>\n<h3 class=\"wp-block-heading\"><strong>How LangExtract Compares<\/strong><\/h3>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Feature<\/th>\n<th>Traditional Approaches<\/th>\n<th>LangExtract Approach<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Schema Consistency<\/strong><\/td>\n<td>Often manual\/error-prone<\/td>\n<td>Enforced via instructions &amp; few-shot examples<\/td>\n<\/tr>\n<tr>\n<td><strong>Result Traceability<\/strong><\/td>\n<td>Minimal<\/td>\n<td>All output linked to input text<\/td>\n<\/tr>\n<tr>\n<td><strong>Scaling to Long Texts<\/strong><\/td>\n<td>Windowed, lossy<\/td>\n<td>Chunked + parallel extraction, then aggregation<\/td>\n<\/tr>\n<tr>\n<td><strong>Visualization<\/strong><\/td>\n<td>Custom, usually absent<\/td>\n<td>Built-in, interactive HTML reports<\/td>\n<\/tr>\n<tr>\n<td><strong>Deployment<\/strong><\/td>\n<td>Rigid, model-specific<\/td>\n<td>Gemini-first, open to other LLMs &amp; 
on-premises<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<h3 class=\"wp-block-heading\"><strong>In Summary<\/strong><\/h3>\n<p>LangExtract offers a practical way to extract structured, actionable data from text, delivering:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Declarative, explainable extraction<\/strong><\/li>\n<li><strong>Traceable results backed by source context<\/strong><\/li>\n<li><strong>Instant visualization for rapid iteration<\/strong><\/li>\n<li><strong>Easy integration into Python workflows<\/strong><\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/github.com\/google\/langextract\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page<\/a><\/strong>\u00a0and\u00a0<strong><a href=\"https:\/\/developers.googleblog.com\/en\/introducing-langextract-a-gemini-powered-information-extraction-library\/\" target=\"_blank\" rel=\"noreferrer noopener\">Technical Blog<\/a><\/strong>.<\/p>\n<p>The post <a 
href=\"https:\/\/www.marktechpost.com\/2025\/08\/04\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/\">Google AI Releases LangExtract: An Open Source Python Library that Extracts Structured Data from Unstructured Text Documents<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In today\u2019s data-driven world, valuable insights are often buried in unstructured text\u2014be it clinical notes, lengthy legal contracts, or customer feedback threads. Extracting meaningful, traceable information from these documents is both a technical and practical challenge.\u00a0Google AI\u2019s new open-source Python library, LangExtract, is designed to address this gap directly, using LLMs like Gemini to deliver powerful, automated extraction with traceability and transparency at its core.<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-29539","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Google AI Releases LangExtract: An Open Source Python Library that Extracts Structured Data from Unstructured Text Documents - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, 
max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/es\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/\" \/>\n<meta property=\"og:locale\" content=\"es_ES\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Google AI Releases LangExtract: An Open Source Python Library that Extracts Structured Data from Unstructured Text Documents - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/es\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-08-05T05:56:24+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Escrito por\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Tiempo de lectura\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutos\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Google AI Releases LangExtract: An Open Source Python Library that Extracts Structured Data from Unstructured Text Documents\",\"datePublished\":\"2025-08-05T05:56:24+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/\"},\"wordCount\":647,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"es\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/\",\"url\":\"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/\",\"name\":\"Google AI Releases LangExtract: An Open Source Python Library that Extracts Structured Data from Unstructured Text Documents - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"datePublished\":\"2025-08-05T05:56:24+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/#breadcrumb\"},\"inLanguage\":\"es\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Google AI Releases LangExtract: An Open Source Python Library that Extracts Structured Data from Unstructured Text Documents\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"es\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association 
Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/es\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Google AI Releases LangExtract: An Open Source Python Library that Extracts Structured Data from Unstructured Text Documents - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/es\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/","og_locale":"es_ES","og_type":"article","og_title":"Google AI Releases LangExtract: An Open Source Python Library that Extracts Structured Data from Unstructured Text Documents - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/es\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-08-05T05:56:24+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Escrito por":"admin NU","Tiempo de lectura":"4 minutos"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Google AI Releases LangExtract: An Open Source Python 
Library that Extracts Structured Data from Unstructured Text Documents","datePublished":"2025-08-05T05:56:24+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/"},"wordCount":647,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"es","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/","url":"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/","name":"Google AI Releases LangExtract: An Open Source Python Library that Extracts Structured Data from Unstructured Text Documents - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"datePublished":"2025-08-05T05:56:24+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/#breadcrumb"},"inLanguage":"es","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/google-ai-releases-langextract-an-open-source-python-library-that-extracts-structured-data-from-unstructured-text-documents\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Google AI Releases LangExtract: An Open Source Python Library that Extracts Structured Data from Unstructured Text Documents"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"es"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association 
Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/es\/members\/adminnu\/"}]}},"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/es\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/es\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"In today\u2019s data-driven world, valuable insights are often buried in unstructured text\u2014be it clinical notes, lengthy legal contracts, or customer feedback threads. 
Extracting meaningful, traceable information from these documents is both a technical and practical challenge.\u00a0Google AI\u2019s new open-source Python library, LangExtract, is designed to address this gap directly, using LLMs like Gemini to deliver&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts\/29539","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/comments?post=29539"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts\/29539\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/media?parent=29539"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/categories?post=29539"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/tags?post=29539"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}