{"id":86446,"date":"2026-04-27T15:36:25","date_gmt":"2026-04-27T15:36:25","guid":{"rendered":"https:\/\/youzum.net\/meta-ai-releases-sapiens2-a-high-resolution-human-centric-vision-model-for-pose-segmentation-normals-pointmap-and-albedo\/"},"modified":"2026-04-27T15:36:25","modified_gmt":"2026-04-27T15:36:25","slug":"meta-ai-releases-sapiens2-a-high-resolution-human-centric-vision-model-for-pose-segmentation-normals-pointmap-and-albedo","status":"publish","type":"post","link":"https:\/\/youzum.net\/zh\/meta-ai-releases-sapiens2-a-high-resolution-human-centric-vision-model-for-pose-segmentation-normals-pointmap-and-albedo\/","title":{"rendered":"Meta AI Releases Sapiens2: A High-Resolution Human-Centric Vision Model for Pose, Segmentation, Normals, Pointmap, and Albedo"},"content":{"rendered":"<p>If you\u2019ve ever watched a motion capture system struggle with a person\u2019s fingers, or seen a segmentation model fail to distinguish teeth from gums, you already understand why human-centric computer vision is hard. Humans are not just objects, they come with articulated structure, fine surface details, and enormous variation in pose, clothing, lighting, and ethnicity. Getting a model to understand all of that, at once, across arbitrary real-world images, is genuinely difficult.<\/p>\n<p>Meta AI research team introduced <strong>Sapiens2<\/strong>, the second generation of its foundation model family for human-centric vision. 
Trained on a newly curated dataset of <strong>1 billion human images</strong>, spanning model sizes from 0.4B to 5B parameters, and designed to operate at native <strong>1K resolution</strong> with hierarchical variants supporting <strong>4K</strong>, Sapiens2 is a substantial leap over its predecessor across every benchmark the team evaluated.</p>
<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img fetchpriority="high" decoding="async" width="1086" height="326" src="https://www.marktechpost.com/wp-content/uploads/2026/04/Screenshot-2026-04-27-at-1.43.16-AM.png" alt="" class="wp-image-79341" /><figcaption class="wp-element-caption">https://arxiv.org/pdf/2604.21681</figcaption></figure>
</div>
<h3 class="wp-block-heading"><strong>What Sapiens2 Is Trying to Solve</strong></h3>
<p>The original Sapiens model relied primarily on <strong>Masked Autoencoder (MAE)</strong> pretraining.
MAE works by masking a large portion of the input image patches (75% in this case) and training the model to reconstruct the missing pixels. This forces the model to learn spatial detail and texture, which is useful for dense prediction tasks like segmentation or depth estimation.</p>
<p>The problem is that MAE, as a form of masked image modeling (MIM), learns largely through compression. It doesn’t naturally learn high-level semantics. It can tell you what something <em>looks</em> like, but not necessarily what it <em>means</em> in the context of a human body. That’s where contrastive learning (CL) methods like DINO and SimCLR shine: they organize representations semantically by training the model to treat different views of the same image as similar and views of different images as distinct.</p>
<p>But CL has its own tradeoff. Its aggressive augmentations, such as color jitter and blurring, can strip away appearance cues like skin tone or lighting conditions that are critical for tasks like albedo estimation (recovering the true color of a surface independent of lighting). This is what the research team calls <strong>representation drift</strong>.</p>
<p>Sapiens2 addresses this problem directly by combining both objectives: a <strong>masked image reconstruction loss (L<sub>MAE</sub>)</strong> to preserve low-level fidelity, and a <strong>global contrastive loss (L<sub>CL</sub>)</strong> on the [CLS] token using a student-teacher framework based on DINOv3, where the teacher’s parameters are an exponential moving average (EMA) of the student’s. Crucially, color augmentations are <strong>not applied to the global views</strong> used for the MAE objective, preserving the appearance cues needed for photorealistic tasks.</p>
<p>The joint objective is <strong>L = L<sub>MAE</sub> + λL<sub>CL</sub></strong>.</p>
<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img decoding="async" width="1106" height="530" src="https://www.marktechpost.com/wp-content/uploads/2026/04/Screenshot-2026-04-27-at-1.42.54-AM-1.png" alt="" class="wp-image-79340" /><figcaption class="wp-element-caption">https://arxiv.org/pdf/2604.21681</figcaption></figure>
</div>
<h3 class="wp-block-heading"><strong>The Data: Humans-1B</strong></h3>
<p>Getting 1 billion training images right required a multi-stage filtering pipeline. Starting from a web-scale pool of approximately <strong>4 billion images</strong>, the Meta team applied bounding-box detection, head-pose estimation, aesthetic and realism scoring, CLIP-based feature filtering, and text-overlay detection.</p>
<p>The result is a curated corpus in which every image contains at least one prominent person with a minimum short-side resolution of <strong>384 pixels</strong>.</p>
<p>To ensure diversity, the research team used perceptual hashing and deep-feature nearest-neighbor pruning for deduplication, then clustered visual embeddings and applied selective sampling to balance the dataset across poses, viewpoints, occlusion levels, clothing types, and lighting conditions. No task labels or human-specific priors were injected during pretraining: just images.</p>
<h3 class="wp-block-heading"><strong>The Architecture: Scaling to 5B and 4K</strong></h3>
<p>Sapiens2 comes in four model sizes, <strong>0.4B, 0.8B, 1B, and 5B parameters</strong>, each operating at native 1K resolution. The 5B model is the highest-FLOPs vision transformer reported to date, at <strong>15.722 TFLOPs</strong>.</p>
<p>For 4K resolution, the research team adopted a <strong>hierarchical windowed attention design</strong>. The first K layers apply windowed self-attention locally to capture fine texture and boundaries within spatial windows. A [CLS]-guided pooling step then downsamples the 2D token grid by a spatial stride √ω, and the subsequent L layers apply global self-attention over this reduced sequence. This layout is compatible with MAE-style pretraining because masked tokens can be dropped after the local stage, preventing information from leaking across masked regions, a problem that convolutional backbones typically need masked convolutions to avoid.</p>
<p>The masking strategy itself is also carefully designed: Sapiens2 uses <strong>mixed blockwise/patchwise masking</strong> (blockwise probability 0.4) at a <strong>75% mask ratio</strong> with patch size 16.</p>
<p>At 1024×768 resolution (64×48 = 3072 patches), this masks approximately 2304 patches per image, which is enough to create coarse occlusions that regularize MAE while preserving sufficient context for the contrastive objective.</p>
<p>For stability at scale, the architecture incorporates several improvements: <strong>RMSNorm</strong> replacing LayerNorm, <strong>Grouped-Query Attention (GQA)</strong> in mid-depth blocks for higher throughput, <strong>QK-Norm</strong> for robust high-resolution training, and <strong>SwiGLU feed-forward</strong> layers. The decoder uses <strong>pixel-shuffle</strong> upsampling for sub-pixel reasoning. Decoder output resolution was also increased from 0.5K to <strong>1K for base backbones</strong>, and to <strong>2K for 4K backbones</strong>.</p>
<h3 class="wp-block-heading"><strong>Post-Training: Five Human Tasks, 10× More Supervision</strong></h3>
<p>A critical improvement over the original Sapiens is the scale and quality of task-specific supervision. Relative to the first generation, Sapiens2 scales task-specific labels by <strong>10×</strong>, typically reaching around <strong>1 million labels per task</strong>. After pretraining, the backbone is fine-tuned for five downstream tasks using lightweight task-specific heads while leaving the backbone unchanged:</p>
<ul class="wp-block-list">
<li><strong>Pose Estimation</strong>: A 308-keypoint full-body skeleton with dense face (243 keypoints) and hand (40 keypoints) coverage.
The research team newly annotated 100K in-the-wild images to complement studio capture data, significantly improving generalization.</li>
<li><strong>Body-Part Segmentation</strong>: 29 semantic classes (extended from 28 by adding eyeglasses), trained with per-pixel weighted cross-entropy combined with Dice loss for sharper boundaries.</li>
<li><strong>Pointmap Estimation</strong>: Rather than predicting relative depth, Sapiens2 regresses a per-pixel 3D pointmap P̂(u) ∈ ℝ³ in the camera frame, a harder task that requires reasoning about camera intrinsics.</li>
<li><strong>Normal Estimation</strong>: Per-pixel surface unit normals, decoded using multiple PixelShuffle layers for artifact-free upsampling.</li>
<li><strong>Albedo Estimation</strong>: Per-pixel diffuse albedo Â(u) ∈ [0,1]³, trained purely on synthetic high-fidelity data and designed to recover true skin tone and clothing color under varying illumination.</li>
</ul>
<h3 class="wp-block-heading"><strong>Results</strong></h3>
<p>The numbers are difficult to argue with. On the 11K-image in-the-wild pose test set, <strong>Sapiens2-5B achieves 82.3 mAP</strong> compared to 78.3 mAP for Sapiens-2B, a <strong>+4 mAP</strong> improvement. On body-part segmentation, even the smallest model, <strong>Sapiens2-0.4B, scores 79.5 mIoU</strong> (+21.3 over Sapiens-2B*), while <strong>Sapiens2-5B reaches 82.5 mIoU</strong>, a <strong>+24.3 mIoU</strong> gain over the previous generation’s largest model. The 4K variant, <strong>Sapiens2-1B-4K</strong>, pushes segmentation further, to <strong>81.9 mIoU and 92.0 mAcc</strong>, demonstrating the benefit of higher-resolution reasoning.</p>
<p>On surface-normal estimation, <strong>Sapiens2-0.4B already achieves a mean angular error of 8.63°</strong>, outperforming the previous state of the art, DAViD-L, at 10.73°.</p>
<p>The 5B model brings this down further, to <strong>6.73°</strong>, and the 4K variant reaches <strong>6.98°</strong> with a median angular error of just 3.08°.</p>
<p>For albedo estimation, <strong>Sapiens2-5B achieves an MAE of 0.012 and a PSNR of 32.61 dB</strong>, with consistent improvement across all model sizes. On pointmap estimation, all Sapiens2 model sizes outperform MoGe, which was previously the state of the art for monocular geometry estimation.</p>
<p>In dense probing evaluations, where the backbone is frozen and only lightweight decoders are trained with identical hyperparameters, <strong>Sapiens2-5B surpasses all baselines across every task, including DINOv3-7B</strong> (6.71B parameters), despite Sapiens2 being a human-specialist model evaluated against a general-purpose backbone nearly 1.5× its size.</p>
<hr class="wp-block-separator aligncenter has-alpha-channel-opacity is-style-wide" />
<p>Check out the <strong><a href="https://huggingface.co/collections/facebook/sapiens2" target="_blank" rel="noreferrer noopener">Model Weights with Demos</a></strong>, <strong><a href="https://arxiv.org/pdf/2604.21681" target="_blank" rel="noreferrer noopener">Paper</a></strong>, and <strong><a href="https://github.com/facebookresearch/sapiens2" target="_blank" rel="noreferrer noopener">Repo</a></strong>.</p>
<p>The post <a href="https://www.marktechpost.com/2026/04/27/meta-ai-releases-sapiens2-a-high-resolution-human-centric-vision-model-for-pose-segmentation-normals-pointmap-and-albedo/">Meta AI Releases Sapiens2: A High-Resolution Human-Centric Vision Model for Pose, Segmentation, Normals, Pointmap, and Albedo</a> appeared first on <a href="https://www.marktechpost.com/">MarkTechPost</a>.</p>
Pose, Segmentation, Normals, Pointmap, and Albedo - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/meta-ai-releases-sapiens2-a-high-resolution-human-centric-vision-model-for-pose-segmentation-normals-pointmap-and-albedo\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/meta-ai-releases-sapiens2-a-high-resolution-human-centric-vision-model-for-pose-segmentation-normals-pointmap-and-albedo\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4.png","datePublished":"2026-04-27T15:36:25+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/meta-ai-releases-sapiens2-a-high-resolution-human-centric-vision-model-for-pose-segmentation-normals-pointmap-and-albedo\/#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/meta-ai-releases-sapiens2-a-high-resolution-human-centric-vision-model-for-pose-segmentation-normals-pointmap-and-albedo\/"]}]},{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/youzum.net\/meta-ai-releases-sapiens2-a-high-resolution-human-centric-vision-model-for-pose-segmentation-normals-pointmap-and-albedo\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4.png","width":1086,"height":326},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/meta-ai-releases-sapiens2-a-high-resolution-human-centric-vision-model-for-pose-segmentation-normals-pointmap-and-albedo\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Meta AI Releases 
Sapiens2: A High-Resolution Human-Centric Vision Model for Pose, Segmentation, Normals, Pointmap, and Albedo"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"zh-Hans"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/zh\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4.png",1086,326,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4.png",1086,326,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4.png",1086,326,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4-300x90.png",300,90,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4-1024x307.png",1024,307,true],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4.png",1086,326,false],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4.png",1086,326,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4-18x5.png",18,5,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4-600x180.png",600,180,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-27-at-1.43.16-AM-haGlM4-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/zh\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/zh\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a 
href=\"https:\/\/youzum.net\/zh\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/zh\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/zh\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"If you\u2019ve ever watched a motion capture system struggle with a person\u2019s fingers, or seen a segmentation model fail to distinguish teeth from gums, you already understand why human-centric computer vision is hard. Humans are not just objects, they come with articulated structure, fine surface details, and enormous variation in pose, clothing, lighting, and ethnicity.&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts\/86446","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/comments?post=86446"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts\/86446\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/media\/86447"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/media?parent=86446"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/categories?post=86446"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/tags?post=86446"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}