{"id":32266,"date":"2025-08-17T06:07:29","date_gmt":"2025-08-17T06:07:29","guid":{"rendered":"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/"},"modified":"2025-08-17T06:07:29","modified_gmt":"2025-08-17T06:07:29","slug":"meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing","status":"publish","type":"post","link":"https:\/\/youzum.net\/de\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/","title":{"rendered":"Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing"},"content":{"rendered":"<p><strong><a href=\"https:\/\/github.com\/rednote-hilab\/dots.ocr?tab=readme-ov-file\" target=\"_blank\" rel=\"noreferrer noopener\">dots.ocr<\/a><\/strong> is an open-source vision-language transformer model developed for multilingual document layout parsing and optical character recognition (OCR). It performs both layout detection and content recognition within a single architecture, supporting over 100 languages and a wide variety of structured and unstructured document types.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Architecture<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Unified Model:<\/strong> dots.ocr combines layout detection and content recognition into a single transformer-based neural network. This eliminates the complexity of separate detection and OCR pipelines, allowing users to switch tasks by adjusting input prompts.<\/li>\n<li><strong>Parameters:<\/strong> The model contains 1.7 billion parameters, balancing computational efficiency with performance for most practical scenarios.<\/li>\n<li><strong>Input Flexibility:<\/strong> Inputs can be image files or PDF documents. The model features preprocessing options (such as fitz_preprocess) for optimizing quality on low-resolution or dense multi-page files.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Capabilities<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Multilingual:<\/strong> dots.ocr is trained on datasets spanning more than 100 languages, including major world languages and less common scripts, reflecting broad multilingual support.<\/li>\n<li><strong>Content Extraction:<\/strong> The model extracts plain text, tabular data, mathematical formulas (in LaTeX), and preserves reading order within documents. Output formats include structured JSON, Markdown, and HTML, depending on the layout and content type.<\/li>\n<li><strong>Preserves Structure:<\/strong> dots.ocr maintains document structure, including table boundaries, formula regions, and image placements, ensuring extracted data remains faithful to the original document.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Benchmark Performance<\/strong><\/h3>\n<p>dots.ocr has been evaluated against modern document AI systems, with results summarized below:<\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Benchmark<\/th>\n<th>dots.ocr<\/th>\n<th>Gemini2.5-Pro<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Table TEDS accuracy<\/td>\n<td>88.6%<\/td>\n<td>85.8%<\/td>\n<\/tr>\n<tr>\n<td>Text edit distance<\/td>\n<td>0.032<\/td>\n<td>0.055<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<ul class=\"wp-block-list\">\n<li><strong>Tables:<\/strong> Outperforms Gemini2.5-Pro in table parsing accuracy.<\/li>\n<li><strong>Text:<\/strong> Demonstrates lower text edit distance (indicating higher precision).<\/li>\n<li><strong>Formulas and Layout:<\/strong> Matches or exceeds leading models in formula recognition and document structure reconstruction.<\/li>\n<\/ul>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"548\" data-attachment-id=\"73682\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/08\/16\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/screenshot-2025-08-16-at-10-18-43-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1.png\" data-orig-size=\"2088,1118\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-08-16 at 10.18.43\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-300x161.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548.png\" alt=\"\" class=\"wp-image-73682\" \/><figcaption class=\"wp-element-caption\">https:\/\/github.com\/rednote-hilab\/dots.ocr\/blob\/master\/assets\/blog.md<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Deployment and Integration<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Open-Source:<\/strong> Released under the MIT license, with source, documentation, and pre-trained models available on GitHub. The repository provides installation instructions for pip, Conda, and Docker-based deployments.<\/li>\n<li><strong>API and Scripting:<\/strong> Supports flexible task configuration via prompt templates. The model can be used interactively or within automated pipelines for batch document processing.<\/li>\n<li><strong>Output Formats:<\/strong> Extracted results are supplied in structured JSON for programmatic use, with options for Markdown and HTML where appropriate. Visualization scripts enable inspection of detected layouts.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h3>\n<p>dots.ocr provides a technical solution for high-accuracy, multilingual document parsing by unifying layout detection and content recognition in a single, open-source model. It is particularly suited for scenarios requiring robust, language-agnostic document analysis and structured information extraction in resource-constrained or production environments.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/github.com\/rednote-hilab\/dots.ocr?tab=readme-ov-file\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page<\/a><\/strong>. Feel free to check out our\u00a0<strong><mark><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page for Tutorials, Codes and Notebooks<\/a><\/mark><\/strong>.\u00a0Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">100k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>.<\/p>\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/95xaxi6d7td.typeform.com\/to\/jhs8ftBd\" target=\"_blank\" rel=\"noreferrer noopener\">Partner with Marktechpost for Promotion<\/a><\/div>\n<\/div>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/08\/16\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/\">Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>dots.ocr is an open-source vision-language transformer model developed for multilingual document layout parsing and optical character recognition (OCR). It performs both layout detection and content recognition within a single architecture, supporting over 100 languages and a wide variety of structured and unstructured document types. Architecture Unified Model: dots.ocr combines layout detection and content recognition into a single transformer-based neural network. This eliminates the complexity of separate detection and OCR pipelines, allowing users to switch tasks by adjusting input prompts. Parameters: The model contains 1.7 billion parameters, balancing computational efficiency with performance for most practical scenarios. Input Flexibility: Inputs can be image files or PDF documents. The model features preprocessing options (such as fitz_preprocess) for optimizing quality on low-resolution or dense multi-page files. Capabilities Multilingual: dots.ocr is trained on datasets spanning more than 100 languages, including major world languages and less common scripts, reflecting broad multilingual support. Content Extraction: The model extracts plain text, tabular data, mathematical formulas (in LaTeX), and preserves reading order within documents. Output formats include structured JSON, Markdown, and HTML, depending on the layout and content type. Preserves Structure: dots.ocr maintains document structure, including table boundaries, formula regions, and image placements, ensuring extracted data remains faithful to the original document. Benchmark Performance dots.ocr has been evaluated against modern document AI systems, with results summarized below: Benchmark dots.ocr Gemini2.5-Pro Table TEDS accuracy 88.6% 85.8% Text edit distance 0.032 0.055 Tables: Outperforms Gemini2.5-Pro in table parsing accuracy. Text: Demonstrates lower text edit distance (indicating higher precision). Formulas and Layout: Matches or exceeds leading models in formula recognition and document structure reconstruction. https:\/\/github.com\/rednote-hilab\/dots.ocr\/blob\/master\/assets\/blog.md Deployment and Integration Open-Source: Released under the MIT license, with source, documentation, and pre-trained models available on GitHub. The repository provides installation instructions for pip, Conda, and Docker-based deployments. API and Scripting: Supports flexible task configuration via prompt templates. The model can be used interactively or within automated pipelines for batch document processing. Output Formats: Extracted results are supplied in structured JSON for programmatic use, with options for Markdown and HTML where appropriate. Visualization scripts enable inspection of detected layouts. Conclusion dots.ocr provides a technical solution for high-accuracy, multilingual document parsing by unifying layout detection and content recognition in a single, open-source model. It is particularly suited for scenarios requiring robust, language-agnostic document analysis and structured information extraction in resource-constrained or production environments. Check out the\u00a0GitHub Page. Feel free to check out our\u00a0GitHub Page for Tutorials, Codes and Notebooks.\u00a0Also,\u00a0feel free to follow us on\u00a0Twitter\u00a0and don\u2019t forget to join our\u00a0100k+ ML SubReddit\u00a0and Subscribe to\u00a0our Newsletter. Partner with Marktechpost for Promotion The post Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":32267,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-32266","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/de\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/de\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-08-17T06:07:29+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"2\u00a0Minuten\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing\",\"datePublished\":\"2025-08-17T06:07:29+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/\"},\"wordCount\":483,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/\",\"url\":\"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/\",\"name\":\"Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png\",\"datePublished\":\"2025-08-17T06:07:29+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png\",\"width\":1024,\"height\":548},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/de\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/de\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/","og_locale":"de_DE","og_type":"article","og_title":"Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/de\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-08-17T06:07:29+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Verfasst von":"admin NU","Gesch\u00e4tzte Lesezeit":"2\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing","datePublished":"2025-08-17T06:07:29+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/"},"wordCount":483,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/","url":"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/","name":"Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png","datePublished":"2025-08-17T06:07:29+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png","width":1024,"height":548},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/meet-dots-ocr-a-new-1-7b-vision-language-model-that-achieves-sota-performance-on-multilingual-document-parsing\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/de\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png",1024,548,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png",1024,548,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png",1024,548,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn-300x161.png",300,161,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png",1024,548,false],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png",1024,548,false],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn.png",1024,548,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn-18x10.png",18,10,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn-600x321.png",600,321,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-16-at-10.18.43-AM-1-1024x548-cTWnRn-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/de\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/de\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"dots.ocr is an open-source vision-language transformer model developed for multilingual document layout parsing and optical character recognition (OCR). It performs both layout detection and content recognition within a single architecture, supporting over 100 languages and a wide variety of structured and unstructured document types. Architecture Unified Model: dots.ocr combines layout detection and content recognition into&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/32266","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/comments?post=32266"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/32266\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/media\/32267"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/media?parent=32266"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/categories?post=32266"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/tags?post=32266"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}