DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

arXiv:2512.05112v1 Announce Type: cross Abstract: Recent unified multimodal large language models (MLLMs) have shown impressive capabilities, incorporating chain-of-thought (CoT) reasoning for enhanced text-to-image generation. However, existing approaches remain limited, either treating the model merely as a standalone generator or relying on abstract textual planning. To this end, we propose Draft-as-CoT (DraCo), a novel interleaved reasoning paradigm that fully leverages both textual and visual content in CoT for better planning and verification. Our method first generates a low-resolution draft image as a preview, providing more concrete and structural visual planning and guidance. Then, we employ the model's inherent understanding capability to verify potential semantic misalignments between the draft and the input prompt, and perform refinement through selective corrections with super-resolution. In this way, our approach addresses two fundamental challenges: the coarse-grained nature of textual planning and the difficulty of generating rare attribute combinations. To support training, we curate DraCo-240K, aiming to enhance three atomic capabilities spanning general correction, instance manipulation, and layout reorganization. Supported by DraCo-CFG, a specialized classifier-free guidance (CFG) strategy for interleaved reasoning, DraCo achieves substantial gains on GenEval (+8%), Imagine-Bench (+0.91), and GenEval++ (+3%), significantly outperforming direct generation and other CoT-empowered generation methods.
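To make the interleaved reasoning loop concrete, a structural sketch of draft-then-verify-then-refine might look as follows. The callables are hypothetical stand-ins for DraCo's low-resolution generator, understanding-based verifier, selective editor, and super-resolution head; only the control flow is meant to be illustrative, not the paper's implementation.

def draft_as_cot(prompt, gen_draft, verify, correct, super_resolve, max_rounds=2):
    draft = gen_draft(prompt)              # cheap low-resolution preview as a visual plan
    for _ in range(max_rounds):
        issues = verify(prompt, draft)     # reuse the model's understanding to spot mismatches
        if not issues:
            break
        draft = correct(draft, issues)     # selective, localized corrections only
    return super_resolve(draft)            # upscale the verified draft to the final image

# Toy stand-ins so the control flow can be exercised end to end.
result = draft_as_cot(
    "a purple pineapple riding a skateboard",
    gen_draft=lambda p: {"prompt": p, "res": 256, "objects": ["pineapple"]},
    verify=lambda p, d: [] if "skateboard" in d["objects"] else ["missing skateboard"],
    correct=lambda d, issues: {**d, "objects": d["objects"] + ["skateboard"]},
    super_resolve=lambda d: {**d, "res": 1024},
)
print(result)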


A Group Fairness Lens for Large Language Models

arXiv:2312.15478v2 Announce Type: replace Abstract: The need to assess LLMs for bias and fairness is critical, yet current evaluations are often narrow and miss a broad categorical view. In this paper, we propose evaluating the bias and fairness of LLMs from a group fairness lens using a novel hierarchical schema characterizing diverse social groups. Specifically, we construct a dataset, GFAIR, encapsulating target-attribute combinations across multiple dimensions. Moreover, we introduce statement organization, a new open-ended text generation task, to uncover complex biases in LLMs. Extensive evaluations of popular LLMs reveal inherent safety concerns. To mitigate the biases of LLMs from a group fairness perspective, we pioneer a novel chain-of-thought method, GF-THINK. Experimental results demonstrate its efficacy in mitigating bias and achieving fairness in LLMs. Our dataset and codes are available at https://github.com/surika/Group-Fairness-LLMs.
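As a rough illustration of what a group-fairness chain-of-thought wrapper in the spirit of GF-THINK could look like (the exact prompt wording, reasoning steps, and task format used in the paper may differ), consider:

def gf_think_prompt(statements, groups):
    """Wrap a statement-organization query with explicit fairness reasoning steps."""
    steps = (
        "Before answering, reason step by step:\n"
        "1. Identify which social groups are mentioned or implied.\n"
        "2. Check whether any attribute is linked to a group by stereotype rather than evidence.\n"
        "3. Make sure the final organization treats all groups consistently.\n"
    )
    body = "\n".join(f"- {s}" for s in statements)
    return (
        f"Social groups under consideration: {', '.join(groups)}\n\n"
        f"{steps}\nStatements to organize:\n{body}\n\nAnswer:"
    )

print(gf_think_prompt(
    ["Group A members are naturally better at math.",
     "Access to tutoring improves test scores."],
    ["Group A", "Group B"],
))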


NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable Adaptation

arXiv:2512.03499v1 Announce Type: cross Abstract: The Segment Anything Model (SAM) has emerged as a powerful visual foundation model for image segmentation. However, adapting SAM to specific downstream tasks, such as medical and agricultural imaging, remains a significant challenge. To address this, Low-Rank Adaptation (LoRA) and its variants have been widely employed to enhance SAM's adaptation performance across diverse domains. Despite these advancements, a critical question arises: can we integrate inductive bias into the model? This is particularly relevant since the Transformer encoder in SAM inherently lacks spatial priors within image patches, potentially hindering the acquisition of high-level semantic information. In this paper, we propose NAS-LoRA, a new Parameter-Efficient Fine-Tuning (PEFT) method designed to bridge the semantic gap between pre-trained SAM and specialized domains. Specifically, NAS-LoRA incorporates a lightweight Neural Architecture Search (NAS) block between the encoder and decoder components of LoRA to dynamically optimize the prior knowledge integrated into weight updates. Furthermore, we propose a stage-wise optimization strategy to help the ViT encoder balance weight updates and architectural adjustments, facilitating the gradual learning of high-level semantic information. Extensive experiments demonstrate that NAS-LoRA improves on existing PEFT methods while reducing training cost by 24.14% without increasing inference cost, highlighting the potential of NAS in enhancing PEFT for visual foundation models.
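The core idea of placing a small searchable block inside the LoRA bottleneck can be sketched roughly as below; the candidate operations, softmax mixing of architecture weights, and initialization are illustrative guesses rather than the paper's actual NAS block design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NASLoRALinear(nn.Module):
    """Hypothetical sketch: a LoRA adapter whose rank-r bottleneck passes through a
    small set of candidate operations mixed by learnable architecture weights."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # frozen pretrained weight
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)                    # start as a no-op adapter
        self.ops = nn.ModuleList([                        # searchable candidate ops
            nn.Identity(),
            nn.Sequential(nn.Linear(rank, rank), nn.GELU()),
            nn.Sequential(nn.Linear(rank, rank), nn.Tanh()),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights

    def forward(self, x):
        z = self.down(x)
        w = F.softmax(self.alpha, dim=0)
        z = sum(wi * op(z) for wi, op in zip(w, self.ops))
        return self.base(x) + self.up(z)

layer = NASLoRALinear(nn.Linear(768, 768), rank=8)
print(layer(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 768])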


Scaling Multimodal Search and Recommendation with Small Language Models via Upside-Down Reinforcement Learning

arXiv:2502.09854v2 Announce Type: replace Abstract: In this work, we investigate how small language models (SLMs) can be scaled to support multimodal search and recommendation use cases while remaining efficient enough for real-time, resource-constrained deployments. We present a framework that combines upside-down reinforcement learning with synthetic data distillation from a large language model (Llama-3) to train a 100M-parameter GPT-2 model for multitask prompt generation. Despite being up to 80 times smaller than state-of-the-art large language models (LLMs), our SLM achieves relevance and diversity scores within 6% of competitive baselines such as Llama-3 8B, Qwen3 8B, and Ministral 8B. These results demonstrate that SLMs can effectively handle multimodal search and recommendation tasks, while dramatically reducing inference latency and memory overhead. Our study highlights the potential of lightweight models as practical engines for scalable multimodal discovery, bridging the gap between cutting-edge research and real-world multimodal applications such as media recommendations and creative content generation.
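A minimal sketch of how upside-down reinforcement learning can be cast as conditioning the small model on a desired score: each training example is tagged with the reward its output actually received, and at inference time the model is "commanded" with high target scores. The field names and score ranges here are illustrative, not the paper's exact format.

def to_udrl_example(query, generated_prompt, relevance, diversity):
    command = f"<relevance={relevance:.1f}> <diversity={diversity:.1f}>"
    return {
        "input": f"{command} Query: {query}",
        "target": generated_prompt,
    }

# Training time: label each distilled teacher output with its observed scores.
train_ex = to_udrl_example(
    "cozy winter cabin playlist",
    "acoustic folk songs for a snowy evening by the fireplace",
    relevance=0.92, diversity=0.71,
)

# Inference time: request the highest scores and let the SLM generate accordingly.
infer_input = "<relevance=1.0> <diversity=1.0> Query: cozy winter cabin playlist"
print(train_ex["input"])
print(infer_input)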


Stable Signer: Hierarchical Sign Language Generative Model

arXiv:2512.04048v1 Announce Type: cross Abstract: Sign Language Production (SLP) is the process of converting complex input text into realistic sign language video. Most previous works focused on the Text2Gloss, Gloss2Pose, and Pose2Vid stages, and some concentrated on the Prompt2Gloss and Text2Avatar stages. However, progress in this field has been slow because inaccuracies in text conversion, pose generation, and the rendering of poses into real human videos accumulate across these stages. In this paper, we therefore streamline the traditionally redundant structure, simplify and optimize the task objective, and design a new sign language generative model called Stable Signer. It redefines SLP as an end-to-end hierarchical generation task that includes only text understanding (Prompt2Gloss, Text2Gloss) and Pose2Vid: text understanding is performed by our proposed Sign Language Understanding Linker (SLUL), and hand gestures are generated by the SLP-MoE hand gesture rendering expert block, producing high-quality, multi-style sign language videos end to end. SLUL is trained using the newly developed Semantic-Aware Gloss Masking Loss (SAGM Loss). Its performance improves by 48.6% over current SOTA generation methods.


How to Build a Meta-Cognitive AI Agent That Dynamically Adjusts Its Own Reasoning Depth for Efficient Problem Solving

In this tutorial, we build an advanced meta-cognitive control agent that learns how to regulate its own depth of thinking. We treat reasoning as a spectrum, ranging from fast heuristics to deep chain-of-thought to precise tool-like solving, and we train a neural meta-controller to decide which mode to use for each task. By optimizing the trade-off between accuracy, computation cost, and a limited reasoning budget, we explore how an agent can monitor its internal state and adapt its reasoning strategy in real time. Through each snippet, we experiment, observe patterns, and understand how meta-cognition emerges when an agent learns to think about its own thinking. Check out the FULL CODE NOTEBOOK.

import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

# Device used by every tensor and module below (the later snippets rely on it).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

OPS = ['+', '*']

def make_task():
    op = random.choice(OPS)
    if op == '+':
        a, b = random.randint(1, 99), random.randint(1, 99)
    else:
        a, b = random.randint(2, 19), random.randint(2, 19)
    return a, b, op

def true_answer(a, b, op):
    return a + b if op == '+' else a * b

def true_difficulty(a, b, op):
    if op == '+' and a <= 30 and b <= 30:
        return 0
    if op == '*' and a <= 10 and b <= 10:
        return 1
    return 2

def heuristic_difficulty(a, b, op):
    score = 0
    if op == '*':
        score += 0.6
    score += max(a, b) / 100.0
    return min(score, 1.0)

def fast_heuristic(a, b, op):
    if op == '+':
        base = a + b
        noise = random.choice([-2, -1, 0, 0, 0, 1, 2, 3])
    else:
        base = int(0.8 * a * b)
        noise = random.choice([-5, -3, 0, 0, 2, 5, 8])
    return base + noise, 0.5

def deep_chain_of_thought(a, b, op, verbose=False):
    if op == '+':
        x, y = a, b
        carry = 0
        pos = 1
        result = 0
        step = 0
        while x > 0 or y > 0 or carry:
            dx, dy = x % 10, y % 10
            s = dx + dy + carry
            carry, digit = divmod(s, 10)
            result += digit * pos
            x //= 10; y //= 10; pos *= 10
            step += 1
    else:
        result = 0
        step = 0
        for i, d in enumerate(reversed(str(b))):
            row = a * int(d) * (10 ** i)
            result += row
            step += 1
    return result, max(2.0, 0.4 * step)

def tool_solver(a, b, op):
    return eval(f"{a}{op}{b}"), 1.2

ACTION_NAMES = ["fast", "deep", "tool"]

We set up the world our meta-agent operates in. We generate arithmetic tasks, define ground-truth answers, estimate difficulty, and implement three different reasoning modes. As we run it, we observe how each solver behaves differently in terms of accuracy and computational cost, which form the foundation of the agent's decision space. Check out the FULL CODE NOTEBOOK.
def encode_state(a, b, op, rem_budget, error_ema, last_action):
    a_n = a / 100.0
    b_n = b / 100.0
    op_plus = 1.0 if op == '+' else 0.0
    op_mul = 1.0 - op_plus
    diff_hat = heuristic_difficulty(a, b, op)
    rem_n = rem_budget / MAX_BUDGET
    last_onehot = [0.0, 0.0, 0.0]
    if last_action is not None:
        last_onehot[last_action] = 1.0
    feats = [a_n, b_n, op_plus, op_mul, diff_hat, rem_n, error_ema] + last_onehot
    return torch.tensor(feats, dtype=torch.float32, device=device)

STATE_DIM = 10
N_ACTIONS = 3

class PolicyNet(nn.Module):
    def __init__(self, state_dim, hidden=48, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, hidden),
            nn.Tanh(),
            nn.Linear(hidden, n_actions)
        )

    def forward(self, x):
        return self.net(x)

policy = PolicyNet(STATE_DIM, hidden=48, n_actions=N_ACTIONS).to(device)
optimizer = optim.Adam(policy.parameters(), lr=3e-3)

We encode each task into a structured state that captures operands, operation type, predicted difficulty, remaining budget, and recent performance. We then define a neural policy network that maps this state to a probability distribution over actions. As we work through it, we see how the policy becomes the core mechanism through which the agent learns to regulate its thinking. Check out the FULL CODE NOTEBOOK.

GAMMA = 0.98
COST_PENALTY = 0.25
MAX_BUDGET = 25.0
EPISODES = 600
STEPS_PER_EP = 20
ERROR_EMA_DECAY = 0.9

def run_episode(train=True):
    log_probs = []
    rewards = []
    info = []
    rem_budget = MAX_BUDGET
    error_ema = 0.0
    last_action = None
    for _ in range(STEPS_PER_EP):
        a, b, op = make_task()
        state = encode_state(a, b, op, rem_budget, error_ema, last_action)
        logits = policy(state)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample() if train else torch.argmax(logits)
        act_idx = int(action.item())
        if act_idx == 0:
            pred, cost = fast_heuristic(a, b, op)
        elif act_idx == 1:
            pred, cost = deep_chain_of_thought(a, b, op, verbose=False)
        else:
            pred, cost = tool_solver(a, b, op)
        correct = (pred == true_answer(a, b, op))
        acc_reward = 1.0 if correct else 0.0
        budget_penalty = 0.0
        rem_budget -= cost
        if rem_budget < 0:
            budget_penalty = -1.5 * (abs(rem_budget) / MAX_BUDGET)
        step_reward = acc_reward - COST_PENALTY * cost + budget_penalty
        rewards.append(step_reward)
        if train:
            log_probs.append(dist.log_prob(action))
        err = 0.0 if correct else 1.0
        error_ema = ERROR_EMA_DECAY * error_ema + (1 - ERROR_EMA_DECAY) * err
        last_action = act_idx
        info.append({
            "correct": correct,
            "cost": cost,
            "difficulty": true_difficulty(a, b, op),
            "action": act_idx
        })
    if train:
        returns = []
        G = 0.0
        for r in reversed(rewards):
            G = r + GAMMA * G
            returns.append(G)
        returns = list(reversed(returns))
        returns_t = torch.tensor(returns, dtype=torch.float32, device=device)
        baseline = returns_t.mean()
        adv = returns_t - baseline
        loss = -(torch.stack(log_probs) * adv).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return rewards, info

We implement the heart of learning using the REINFORCE policy gradient algorithm. We run multi-step episodes, collect log-probabilities, accumulate rewards, and compute returns. As we execute this part, we watch the meta-controller adjust its strategy by reinforcing decisions that balance accuracy with cost. Check out the FULL CODE NOTEBOOK.
print("Training meta-cognitive controller...")
for ep in range(EPISODES):
    rewards, _ = run_episode(train=True)
    if (ep + 1) % 100 == 0:
        print(f" episode {ep+1:4d} | avg reward {np.mean(rewards):.3f}")

def evaluate(n_episodes=50):
    all_actions = {0: [0, 0, 0], 1: [0, 0, 0], 2: [0, 0, 0]}
    stats = {0: {"n": 0, "acc": 0, "cost": 0},
             1: {"n": 0, "acc": 0, "cost": 0},
             2: {"n": 0, "acc": 0, "cost": 0}}
    for _ in range(n_episodes):
        _, info = run_episode(train=False)
        for step in info:
            d = step["difficulty"]
            a_idx = step["action"]
            all_actions[d][a_idx] += 1
            stats[d]["n"] += 1
            stats[d]["acc"] += 1 if step["correct"] else 0
            stats[d]["cost"] += step["cost"]
    # The original snippet is truncated at this point; the lines below are a minimal
    # reconstruction of the per-difficulty summary printout.
    for d in stats:
        n = max(stats[d]["n"], 1)
        acc = stats[d]["acc"] / n
        cost = stats[d]["cost"] / n
        print(f"difficulty {d}: n={stats[d]['n']:3d} | acc {acc:.2f} | "
              f"avg cost {cost:.2f} | action counts {all_actions[d]}")
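Once training finishes, we can call evaluate() and, additionally, probe the controller greedily on a few hand-picked tasks to see which reasoning mode it selects. This quick check assumes the cells above have already been executed.

evaluate()

with torch.no_grad():
    for a, b, op in [(12, 7, '+'), (87, 66, '+'), (17, 18, '*')]:
        state = encode_state(a, b, op, MAX_BUDGET, 0.0, None)
        act = int(torch.argmax(policy(state)).item())
        print(f"{a} {op} {b:>2} -> {ACTION_NAMES[act]}")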


Multilingual Pretraining for Pixel Language Models

arXiv:2505.21265v2 Announce Type: replace Abstract: Pixel language models operate directly on images of rendered text, eliminating the need for a fixed vocabulary. While these models have demonstrated strong capabilities for downstream cross-lingual transfer, multilingual pretraining remains underexplored. We introduce PIXEL-M4, a model pretrained on four visually and linguistically diverse languages: English, Hindi, Ukrainian, and Simplified Chinese. Multilingual evaluations on semantic and syntactic tasks show that PIXEL-M4 outperforms an English-only counterpart on non-Latin scripts. Word-level probing analyses confirm that PIXEL-M4 captures rich linguistic features, even in languages not seen during pretraining. Furthermore, an analysis of its hidden representations shows that multilingual pretraining yields a semantic embedding space closely aligned across the languages used for pretraining. This work demonstrates that multilingual pretraining substantially enhances the capability of pixel language models to effectively support a diverse set of languages.
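To illustrate the pixel-LM input pipeline in general terms, the sketch below renders a string to a small grayscale image and cuts it into fixed-size patches. PIXEL-M4's actual renderer, fonts, and patch geometry may differ in detail, and PIL's default font only covers basic Latin, so non-Latin scripts would need appropriate fonts.

from PIL import Image, ImageDraw, ImageFont
import numpy as np

def render_to_patches(text, height=16, width=256, patch=16):
    img = Image.new("L", (width, height), color=255)           # white canvas
    ImageDraw.Draw(img).text((2, 2), text, fill=0, font=ImageFont.load_default())
    arr = np.asarray(img, dtype=np.float32) / 255.0            # (H, W) in [0, 1]
    n = width // patch
    patches = arr.reshape(height // patch, patch, n, patch).transpose(0, 2, 1, 3)
    return patches.reshape(-1, patch, patch)                   # (num_patches, 16, 16)

print(render_to_patches("Pixel language models read rendered text.").shape)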


SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment

arXiv:2512.02807v1 Announce Type: new Abstract: Aligning Large Language Models (LLMs) with human preferences typically relies on external supervision, which faces critical limitations: human annotations are scarce and subjective, reward models are vulnerable to reward hacking, and self-evaluation methods suffer from prompt sensitivity and biases. In this work, we propose stable rank, an intrinsic, annotation-free quality signal derived from model representations. Stable rank measures the effective dimensionality of hidden states by computing the ratio of total variance to dominant-direction variance, capturing quality through how information distributes across representation dimensions. Empirically, stable rank achieves 84.04% accuracy on RewardBench and improves task accuracy by an average of 11.3 percentage points over greedy decoding via Best-of-N sampling. Leveraging this insight, we introduce Stable Rank Group Relative Policy Optimization (SR-GRPO), which uses stable rank as a reward signal for reinforcement learning. Without external supervision, SR-GRPO improves Qwen2.5-1.5B-Instruct by 10% on STEM and 19% on mathematical reasoning, outperforming both learned reward models and self-evaluation baselines. Our findings demonstrate that quality signals can be extracted from internal model geometry, offering a path toward scalable alignment without external supervision.
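A minimal sketch of the stable-rank signal (the ratio of total variance to dominant-direction variance of a response's hidden states) and its use for Best-of-N selection. This version centers the token representations and scores the raw (tokens x hidden_dim) matrix; which layers SR-GRPO uses and how it preprocesses the states are details of the paper.

import torch

def stable_rank(hidden_states: torch.Tensor) -> float:
    """hidden_states: (num_tokens, hidden_dim) for one candidate response."""
    h = hidden_states - hidden_states.mean(dim=0, keepdim=True)  # center tokens
    s = torch.linalg.svdvals(h)                                   # singular values, descending
    return float((s**2).sum() / (s[0]**2 + 1e-12))                # ||H||_F^2 / ||H||_2^2

# Best-of-N selection sketch: score each sampled response and keep the highest.
candidates = [torch.randn(37, 768), torch.randn(52, 768), torch.randn(44, 768)]
scores = [stable_rank(h) for h in candidates]
best = max(range(len(candidates)), key=lambda i: scores[i])
print(scores, "-> pick candidate", best)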


Computational Fact-Checking of Online Discourse: Scoring scientific accuracy in climate change related news articles

arXiv:2505.07409v2 Announce Type: replace Abstract: Democratic societies need reliable information. Misinformation in popular media, such as news articles or videos, threatens to impair civic discourse. Citizens are, unfortunately, not equipped to verify the flood of content consumed daily at increasing rates. This work aims to quantify the scientific accuracy of online media semi-automatically. We investigate the state of the art of climate-related ground truth knowledge representation. By semantifying media content of unknown veracity, its statements can be compared against these ground truth knowledge graphs. We implemented a workflow using LLM-based statement extraction and knowledge graph analysis. Our implementation can streamline content processing towards state-of-the-art knowledge representation and veracity quantification. Developed and evaluated with the help of 27 experts, including detailed interviews with 10 of them, the tool demonstrably provides a useful veracity indication. These findings are supported by 43 anonymous participants from a parallel user survey. This initial step, however, is unable to annotate public media at the required granularity and scale. Additionally, the identified state of climate change knowledge graphs is vastly insufficient to support this neurosymbolic fact-checking approach. Further work towards a FAIR (Findable, Accessible, Interoperable, Reusable) ground truth and complementary metrics is required to support civic discourse scientifically.


ADORE: Autonomous Domain-Oriented Relevance Engine for E-commerce

arXiv:2512.02555v1 Announce Type: new Abstract: Relevance modeling in e-commerce search remains challenged by the semantic gaps of term-matching methods (e.g., BM25) and neural models' reliance on domain-specific hard samples, which are scarce. We propose ADORE, a self-sustaining framework that synergizes three innovations: (1) a Rule-aware Relevance Discrimination module, in which a Chain-of-Thought LLM generates intent-aligned training data, refined via Kahneman-Tversky Optimization (KTO) to align with user behavior; (2) an Error-type-aware Data Synthesis module that auto-generates adversarial examples to harden robustness; and (3) a Key-attribute-enhanced Knowledge Distillation module that injects domain-specific attribute hierarchies into a deployable student model. ADORE automates annotation, adversarial generation, and distillation, overcoming data scarcity while enhancing reasoning. Large-scale experiments and online A/B testing verify the effectiveness of ADORE. The framework establishes a new paradigm for resource-efficient, cognitively aligned relevance modeling in industrial applications.
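The distillation step can be sketched with a standard soft-label objective, shown below; the temperature, loss weighting, and the way key attribute hierarchies are injected into the student's inputs are illustrative choices rather than ADORE's exact recipe.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                     # soften teacher signal, rescale gradients
    hard = F.cross_entropy(student_logits, labels)  # ground-truth relevance grades
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 3, requires_grad=True)   # 3 relevance grades
teacher_logits = torch.randn(8, 3)                        # LLM teacher judgments
labels = torch.randint(0, 3, (8,))
print(distillation_loss(student_logits, teacher_logits, labels).item())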
