{"id":17538,"date":"2025-06-09T04:30:54","date_gmt":"2025-06-09T04:30:54","guid":{"rendered":"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/"},"modified":"2025-06-09T04:30:54","modified_gmt":"2025-06-09T04:30:54","slug":"can-vision-language-models-infer-human-gaze-direction-a-controlled-study","status":"publish","type":"post","link":"https:\/\/youzum.net\/ja\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/","title":{"rendered":"Can Vision Language Models Infer Human Gaze Direction? A Controlled Study"},"content":{"rendered":"<p>arXiv:2506.05412v1 Announce Type: cross<br \/>\nAbstract: Gaze-referential inference&#8211;the ability to infer what others are looking at&#8211;is a critical component of a theory of mind that underpins natural human-AI interaction. In a controlled study, we evaluated this skill across 111 Vision Language Models (VLMs) using photos taken with manipulated difficulty and variability, comparing performance with that of human participants (N = 65), and analyzed behaviors using mixed-effects models. We found that 94 of the 111 VLMs failed to do better than random guessing, while humans achieved near-ceiling accuracy. VLMs even respond with each choice almost equally frequently. Are they randomly guessing? Although most VLMs struggle, when we zoom in on five of the top-tier VLMs with above-chance performance, we find that their performance declined with increasing task difficulty but varied only slightly across different prompts and scene objects. These behavioral features cannot be explained by considering them as random guessers. Instead, they likely use a combination of heuristics and guessing such that their performance is subject to the task difficulty but robust to perceptual variations. This suggests that VLMs, lacking gaze inference capability, have yet to become technologies that can naturally interact with humans, but the potential remains.<\/p>","protected":false},"excerpt":{"rendered":"<p>arXiv:2506.05412v1 Announce Type: cross Abstract: Gaze-referential inference&#8211;the ability to infer what others are looking at&#8211;is a critical component of a theory of mind that underpins natural human-AI interaction. In a controlled study, we evaluated this skill across 111 Vision Language Models (VLMs) using photos taken with manipulated difficulty and variability, comparing performance with that of human participants (N = 65), and analyzed behaviors using mixed-effects models. We found that 94 of the 111 VLMs failed to do better than random guessing, while humans achieved near-ceiling accuracy. VLMs even respond with each choice almost equally frequently. Are they randomly guessing? Although most VLMs struggle, when we zoom in on five of the top-tier VLMs with above-chance performance, we find that their performance declined with increasing task difficulty but varied only slightly across different prompts and scene objects. These behavioral features cannot be explained by considering them as random guessers. Instead, they likely use a combination of heuristics and guessing such that their performance is subject to the task difficulty but robust to perceptual variations. This suggests that VLMs, lacking gaze inference capability, have yet to become technologies that can naturally interact with humans, but the potential remains.<\/p>","protected":false},"author":2,"featured_media":17539,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-17538","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Can Vision Language Models Infer Human Gaze Direction? A Controlled Study - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/ja\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/\" \/>\n<meta property=\"og:locale\" content=\"ja_JP\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Can Vision Language Models Infer Human Gaze Direction? A Controlled Study - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/ja\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-06-09T04:30:54+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u57f7\u7b46\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u63a8\u5b9a\u8aad\u307f\u53d6\u308a\u6642\u9593\" \/>\n\t<meta name=\"twitter:data2\" content=\"1\u5206\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Can Vision Language Models Infer Human Gaze Direction? A Controlled Study\",\"datePublished\":\"2025-06-09T04:30:54+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/\"},\"wordCount\":203,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"ja\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/\",\"url\":\"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/\",\"name\":\"Can Vision Language Models Infer Human Gaze Direction? A Controlled Study - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg\",\"datePublished\":\"2025-06-09T04:30:54+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#breadcrumb\"},\"inLanguage\":\"ja\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ja\",\"@id\":\"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg\",\"width\":77,\"height\":77},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Can Vision Language Models Infer Human Gaze Direction? A Controlled Study\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ja\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ja\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ja\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/ja\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Can Vision Language Models Infer Human Gaze Direction? A Controlled Study - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/ja\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/","og_locale":"ja_JP","og_type":"article","og_title":"Can Vision Language Models Infer Human Gaze Direction? A Controlled Study - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/ja\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-06-09T04:30:54+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"\u57f7\u7b46\u8005":"admin NU","\u63a8\u5b9a\u8aad\u307f\u53d6\u308a\u6642\u9593":"1\u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Can Vision Language Models Infer Human Gaze Direction? A Controlled Study","datePublished":"2025-06-09T04:30:54+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/"},"wordCount":203,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"ja","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/","url":"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/","name":"Can Vision Language Models Infer Human Gaze Direction? A Controlled Study - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg","datePublished":"2025-06-09T04:30:54+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#breadcrumb"},"inLanguage":"ja","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/"]}]},{"@type":"ImageObject","inLanguage":"ja","@id":"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg","width":77,"height":77},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/can-vision-language-models-infer-human-gaze-direction-a-controlled-study\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Can Vision Language Models Infer Human Gaze Direction? A Controlled Study"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ja"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"ja","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"ja","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/ja\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg",77,77,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg",77,77,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg",77,77,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg",77,77,false],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg",77,77,false],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg",77,77,false],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg",77,77,false],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg",77,77,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg",12,12,false],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg",77,77,false],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg",77,77,false],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/feedzy-E1yyre.svg",77,77,false]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/ja\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/ja\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/ja\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/ja\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/ja\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"arXiv:2506.05412v1 Announce Type: cross Abstract: Gaze-referential inference&#8211;the ability to infer what others are looking at&#8211;is a critical component of a theory of mind that underpins natural human-AI interaction. In a controlled study, we evaluated this skill across 111 Vision Language Models (VLMs) using photos taken with manipulated difficulty and variability, comparing performance with that of&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/posts\/17538","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/comments?post=17538"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/posts\/17538\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/media\/17539"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/media?parent=17538"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/categories?post=17538"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/ja\/wp-json\/wp\/v2\/tags?post=17538"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}