{"id":80957,"date":"2026-04-03T14:51:56","date_gmt":"2026-04-03T14:51:56","guid":{"rendered":"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/"},"modified":"2026-04-03T14:51:56","modified_gmt":"2026-04-03T14:51:56","slug":"tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts","status":"publish","type":"post","link":"https:\/\/youzum.net\/th\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/","title":{"rendered":"TII Releases Falcon Perception:\u00a0A\u00a00.6B-Parameter\u00a0Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts"},"content":{"rendered":"<p>In the current landscape of computer vision, the standard operating procedure involves a modular \u2018Lego-brick\u2019 approach: a pre-trained vision encoder for feature extraction paired with a separate decoder for task prediction. While effective, this architectural separation complicates scaling and bottlenecks the interaction between language and vision.<\/p>\n<p>The <strong>Technology Innovation Institute (TII)<\/strong> research team is challenging this paradigm with <strong>Falcon Perception<\/strong>, a 600M-parameter unified dense Transformer. 
By processing image patches and text tokens in a shared parameter space from the very first layer, the TII research team has developed an <strong>early-fusion<\/strong> stack that handles perception and task modeling within a single 0.6B-parameter budget.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1384\" height=\"1066\" data-attachment-id=\"78783\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/03\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/screenshot-2026-04-03-at-1-47-18-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1.png\" data-orig-size=\"1384,1066\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-04-03 at 1.47.18\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-300x231.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-1024x789.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1.png\" alt=\"\" class=\"wp-image-78783\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2603.27365<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>The Architecture: A Single Stack for Every Modality<\/strong><\/h3>\n<p>The core design of Falcon Perception is built on the hypothesis that a single Transformer can simultaneously learn visual representations and 
perform task-specific generation.<\/p>\n<h4 class=\"wp-block-heading\"><strong>Hybrid Attention and GGROPE<\/strong><\/h4>\n<p>Unlike standard language models that use strict causal masking, Falcon Perception employs a <strong>hybrid attention strategy<\/strong>. Image tokens attend to each other bidirectionally to build a global visual context, while text and task tokens attend to all preceding tokens (causal masking) to enable autoregressive prediction.<\/p>\n<p>To maintain 2D spatial relationships in a flattened sequence, the research team uses <strong>3D Rotary Positional Embeddings<\/strong>. This scheme decomposes the head dimension into a sequential component and a spatial component using <strong>Golden Gate ROPE (GGROPE)<\/strong>. GGROPE allows attention heads to attend to relative positions along arbitrary angles, making the model robust to rotation and aspect-ratio variations.<\/p>\n<h4 class=\"wp-block-heading\"><strong>Minimalist Sequence Logic<\/strong><\/h4>\n<p>The basic architectural sequence follows a <strong>Chain-of-Perception<\/strong> format:<\/p>\n<p><code>[Image] [Text] &lt;coord&gt; &lt;size&gt; &lt;seg&gt; ... 
&lt;eos&gt;<\/code><\/p>\n<p>This ensures that the model resolves spatial ambiguity (position and size) as a conditioning signal before generating the final segmentation mask.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Engineering for Scale: Muon, FlexAttention, and Raster Ordering<\/strong><\/h3>\n<p>The TII research team introduced several optimizations to stabilize training and maximize GPU utilization for these heterogeneous sequences.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Muon Optimization:<\/strong> The research team reports that employing the <strong>Muon optimizer<\/strong> for specialized heads (coordinates, size, and segmentation) led to lower training losses and improved performance on benchmarks compared to standard AdamW.<\/li>\n<li><strong>FlexAttention and Sequence Packing:<\/strong> To process images at native resolutions without wasting compute on padding, the model uses a <strong>scatter-and-pack strategy<\/strong>. Valid patches are packed into fixed-length blocks, and <strong>FlexAttention<\/strong> is used to restrict self-attention within each image sample\u2019s boundaries.<\/li>\n<li><strong>Raster Ordering:<\/strong> When multiple objects are present, Falcon Perception predicts them in <strong>raster order<\/strong> (top-to-bottom, left-to-right). 
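In code, this ordering reduces to a simple sort key (a minimal Python sketch for illustration; the (x, y, w, h) box tuples are an assumed representation, not TII's actual output format):

```python
# Raster ordering: sort predicted objects top-to-bottom, then left-to-right.
# Boxes are (x, y, w, h) tuples with (x, y) the top-left corner -- an assumed
# representation for illustration only.
def raster_order(boxes):
    # Primary key: vertical position y; secondary key: horizontal position x.
    return sorted(boxes, key=lambda box: (box[1], box[0]))

detections = [(40, 10, 8, 8), (5, 30, 8, 8), (10, 10, 8, 8)]
ordered = raster_order(detections)
print(ordered)  # the two y=10 boxes left-to-right, then the y=30 box
```

Because Python's sort is stable and keyed on (y, x), objects on the same row keep a consistent left-to-right order.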
This was found to converge faster and produce lower coordinate loss than random or size-based ordering.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>The Training Recipe: Distillation to 685 GT<\/strong><\/h3>\n<p>The model uses <strong>multi-teacher distillation<\/strong> for initialization, distilling knowledge from <strong>DINOv3 (ViT-H)<\/strong> for local features and <strong>SigLIP2 (So400m)<\/strong> for language-aligned features. Following initialization, the model undergoes a <strong>three-stage perception training pipeline<\/strong> totaling approximately <strong>685 Gigatokens (GT)<\/strong>:<\/p>\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>In-Context Listing (450 GT):<\/strong> Learning to \u2018list\u2019 the scene inventory to build global context.<\/li>\n<li><strong>Task Alignment (225 GT):<\/strong> Transitioning to independent-query tasks using <strong>Query Masking<\/strong> to ensure the model grounds each query solely on the image.<\/li>\n<li><strong>Long-Context Finetuning (10 GT):<\/strong> Short adaptation for extreme density, increasing the mask limit to 600 per expression.<\/li>\n<\/ol>\n<p><strong>During these stages, the following task-specific serialization is used:<\/strong><\/p>\n<p><code>&lt;image&gt;expr1&lt;present&gt;&lt;coord&gt;&lt;size&gt;&lt;seg&gt; &lt;eoq&gt;expr2&lt;absent&gt; &lt;eoq&gt; &lt;eos&gt;<\/code><\/p>\n<p>The <code>&lt;present&gt;<\/code> and <code>&lt;absent&gt;<\/code> tokens force the model to commit to a binary decision on an object\u2019s existence before localization.<\/p>\n<h3 class=\"wp-block-heading\"><strong>PBench: Profiling Capabilities Beyond Saturated Baselines<\/strong><\/h3>\n<p>To measure progress, the TII research team introduced <strong>PBench<\/strong>, a benchmark that organizes samples into five levels of semantic complexity to disentangle model 
failure modes.<\/p>\n<h4 class=\"wp-block-heading\"><strong>Main Results: Falcon Perception vs. SAM 3 (Macro-<em>F1<\/em>)<\/strong><\/h4>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<td><strong>Benchmark Split<\/strong><\/td>\n<td><strong>SAM 3<\/strong><\/td>\n<td><strong>Falcon Perception (600M)<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>L0: Simple Objects<\/strong><\/td>\n<td>64.3<\/td>\n<td><strong>65.1<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>L1: Attributes<\/strong><\/td>\n<td>54.4<\/td>\n<td><strong>63.6<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>L2: OCR-Guided<\/strong><\/td>\n<td>24.6<\/td>\n<td><strong>38.0<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>L3: Spatial Understanding<\/strong><\/td>\n<td>31.6<\/td>\n<td><strong>53.5<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>L4: Relations<\/strong><\/td>\n<td>33.3<\/td>\n<td><strong>49.1<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Dense Split<\/strong><\/td>\n<td>58.4<\/td>\n<td><strong>72.6<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p>Falcon Perception significantly outperforms SAM 3 on complex semantic tasks, particularly showing a <strong>+21.9 point gain<\/strong> on spatial understanding (Level 3).<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"2206\" height=\"784\" data-attachment-id=\"78784\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/03\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/screenshot-2026-04-03-at-1-48-57-am\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.48.57-AM.png\" data-orig-size=\"2206,784\" data-comments-opened=\"1\" 
data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-04-03 at 1.48.57\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.48.57-AM-300x107.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.48.57-AM-1024x364.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.48.57-AM.png\" alt=\"\" class=\"wp-image-78784\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2603.27365<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>FalconOCR: The 300M Document Specialist<\/strong><\/h3>\n<p>The TII team also extended this early-fusion recipe to <strong>FalconOCR<\/strong>, a compact <strong>300M-parameter<\/strong> model initialized from scratch to prioritize fine-grained glyph recognition. 
<strong>FalconOCR is competitive with several larger proprietary and modular OCR systems:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>olmOCR:<\/strong> Achieves <strong>80.3% accuracy<\/strong>, matching or exceeding Gemini 3 Pro (80.2%) and GPT 5.2 (69.8%).<\/li>\n<li><strong>OmniDocBench:<\/strong> Reaches an overall score of <strong>88.64<\/strong>, ahead of GPT 5.2 (86.56) and Mistral OCR 3 (85.20), though it trails the top modular pipeline PaddleOCR VL 1.5 (94.37).<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Unified Early-Fusion Architecture<\/strong>: Falcon Perception replaces modular encoder-decoder pipelines with a single dense Transformer that processes image patches and text tokens in a shared parameter space from the first layer. It uses a hybrid attention mask\u2014bidirectional for visual tokens and causal for task tokens\u2014to act simultaneously as a vision encoder and an autoregressive decoder.<\/li>\n<li><strong>Chain-of-Perception Sequence<\/strong>: The model serializes instance segmentation into a structured sequence (\u27e8coord\u27e9 \u2192 \u27e8size\u27e9 \u2192 \u27e8seg\u27e9), which forces it to resolve spatial position and size as a conditioning signal before generating the pixel-level mask.<\/li>\n<li><strong>Specialized Heads and GGROPE<\/strong>: To manage dense spatial data, the model uses Fourier Feature encoders for high-dimensional coordinate mapping and Golden Gate ROPE (GGROPE) to enable isotropic 2D spatial attention. 
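As a rough sketch of the Fourier Feature idea named above, a scalar coordinate can be lifted into a vector of sinusoids at increasing frequencies (a generic construction; the power-of-two frequency schedule and band count are assumptions, not TII's exact parameterization):

```python
import math

# Fourier-feature encoding of a scalar coordinate x in [0, 1]: the value is
# mapped to [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0..num_bands-1.
# The power-of-two frequency schedule is a common default, assumed here.
def fourier_features(x, num_bands=4):
    feats = []
    for k in range(num_bands):
        freq = (2.0 ** k) * math.pi
        feats.extend([math.sin(freq * x), math.cos(freq * x)])
    return feats

vec = fourier_features(0.5)
print(len(vec))  # each scalar coordinate becomes a 2 * num_bands-dim vector
```

Encodings like this let a regression head distinguish nearby coordinates that would be almost indistinguishable as raw scalars.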
The Muon optimizer is employed for these specialized heads to balance learning rates against the pre-trained backbone.<\/li>\n<li><strong>Semantic Performance Gains<\/strong>: On the new PBench benchmark, which disentangles semantic capabilities (Levels 0-4), the 600M model demonstrates significant gains over SAM 3 in complex categories, including a +13.4 point lead in OCR-guided queries and a +21.9 point lead in spatial understanding.<\/li>\n<li><strong>High-Efficiency OCR Extension<\/strong>: The architecture scales down to FalconOCR, a 300M-parameter model that achieves 80.3% on olmOCR and 88.64 on OmniDocBench. It matches or exceeds the accuracy of much larger systems like Gemini 3 Pro and GPT 5.2 while maintaining high throughput for large-scale document processing.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the <strong><a href=\"https:\/\/arxiv.org\/pdf\/2603.27365\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a>, <a href=\"https:\/\/huggingface.co\/tiiuae\/Falcon-Perception\" target=\"_blank\" rel=\"noreferrer noopener\">Model Weights<\/a>, <a href=\"https:\/\/github.com\/tiiuae\/falcon-perception\" target=\"_blank\" rel=\"noreferrer noopener\">Repo<\/a><\/strong> and <strong><a href=\"https:\/\/huggingface.co\/blog\/tiiuae\/falcon-perception\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/04\/03\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/\">TII Releases Falcon Perception:\u00a0A\u00a00.6B-Parameter\u00a0Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In the current landscape of computer vision, the standard operating procedure involves a modular \u2018Lego-brick\u2019 approach: a pre-trained vision encoder for feature extraction paired with a separate decoder for task prediction. While effective, this architectural separation complicates scaling and bottlenecks the interaction between language and vision. The Technology Innovation Institute (TII) research team is challenging this paradigm with Falcon Perception, a 600M-parameter unified dense Transformer. By processing image patches and text tokens in a shared parameter space from the very first layer, TII research team has developed an early-fusion stack that handles perception and task modeling with extreme efficiency. https:\/\/arxiv.org\/pdf\/2603.27365 The Architecture: A Single Stack for Every Modality The core design of Falcon Perception is built on the hypothesis that a single Transformer can simultaneously learn visual representations and perform task-specific generation. Hybrid Attention and GGROPE Unlike standard language models that use strict causal masking, Falcon Perception employs a hybrid attention strategy. 
<\/p>","protected":false},"author":2,"featured_media":80958,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-80957","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>TII Releases Falcon Perception:\u00a0A\u00a00.6B-Parameter\u00a0Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/th\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/\" \/>\n<meta property=\"og:locale\" content=\"th_TH\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"TII Releases Falcon Perception:\u00a0A\u00a00.6B-Parameter\u00a0Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/youzum.net\/th\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-04-03T14:51:56+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 \u0e19\u0e32\u0e17\u0e35\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"TII Releases Falcon Perception:\u00a0A\u00a00.6B-Parameter\u00a0Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language 
Prompts\",\"datePublished\":\"2026-04-03T14:51:56+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/\"},\"wordCount\":982,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/\",\"url\":\"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/\",\"name\":\"TII Releases Falcon Perception:\u00a0A\u00a00.6B-Parameter\u00a0Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl.png\",\"datePublished\":\"2026-04-03T14:51:56+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#breadcrumb\"},\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl.png\",\"width\":1384,\"height\":1066},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-
natural-language-prompts\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"TII Releases Falcon Perception:\u00a0A\u00a00.6B-Parameter\u00a0Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"th\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin 
NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/th\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"TII Releases Falcon Perception:\u00a0A\u00a00.6B-Parameter\u00a0Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/th\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/","og_locale":"th_TH","og_type":"article","og_title":"TII Releases Falcon Perception:\u00a0A\u00a00.6B-Parameter\u00a0Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/th\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-04-03T14:51:56+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin NU","Est. 
reading time":"5 \u0e19\u0e32\u0e17\u0e35"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"TII Releases Falcon Perception:\u00a0A\u00a00.6B-Parameter\u00a0Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts","datePublished":"2026-04-03T14:51:56+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/"},"wordCount":982,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"th","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompt
s\/","url":"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/","name":"TII Releases Falcon Perception:\u00a0A\u00a00.6B-Parameter\u00a0Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl.png","datePublished":"2026-04-03T14:51:56+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#breadcrumb"},"inLanguage":"th","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/"]}]},{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl.png","contentUrl":"https:\/\/youzum.net\/wp-conte
nt\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl.png","width":1384,"height":1066},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/tii-releases-falcon-perception-a-0-6b-parameter-early-fusion-transformer-for-open-vocabulary-grounding-and-segmentation-from-natural-language-prompts\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"TII Releases Falcon Perception:\u00a0A\u00a00.6B-Parameter\u00a0Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"th"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin 
NU","image":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/th\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl.png",1384,1066,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl.png",1384,1066,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl.png",1384,1066,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl-300x231.png",300,231,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl-1024x789.png",1024,789,true],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl.png",1384,1066,false],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl.png",1384,1066,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl-16x12.png",16,12,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl-600x462.png",600,462,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploa
ds\/2026\/04\/Screenshot-2026-04-03-at-1.47.18-AM-1-NtwVOl-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/th\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/th\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"In the current landscape of computer vision, the standard operating procedure involves a modular \u2018Lego-brick\u2019 approach: a pre-trained vision encoder for feature extraction paired with a separate decoder for task prediction. While effective, this architectural separation complicates scaling and bottlenecks the interaction between language and vision. The Technology Innovation Institute (TII) research team is 
challenging&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/80957","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/comments?post=80957"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/80957\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media\/80958"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media?parent=80957"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/categories?post=80957"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/tags?post=80957"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}