“Amazing, They All Lean Left” — Analyzing the Political Temperaments of Current LLMs

arXiv:2507.08027v1 Announce Type: new Abstract: Recent studies have revealed a consistent liberal orientation in the ethical and political responses generated by most commercial large language models (LLMs), yet the underlying causes and resulting implications remain unclear. This paper systematically investigates the political temperament of seven prominent LLMs (OpenAI’s GPT-4o, Anthropic’s Claude Sonnet 4, Perplexity’s Sonar Large, Google’s Gemini 2.5 Flash, Meta AI’s Llama 4, Mistral 7B Le Chat, and High-Flyer’s DeepSeek R1) using a multi-pronged approach that includes Moral Foundations Theory, a dozen established political ideology scales, and a new index of current political controversies. We find strong and consistent prioritization of liberal-leaning values, particularly care and fairness, across most models. Further analysis attributes this trend to four overlapping factors: liberal-leaning training corpora, reinforcement learning from human feedback (RLHF), the dominance of liberal frameworks in academic ethical discourse, and safety-driven fine-tuning practices. We also distinguish between political “bias” and legitimate epistemic differences, cautioning against conflating the two. A comparison of base and fine-tuned model pairs reveals that fine-tuning generally increases liberal lean, an effect confirmed through both self-report and empirical testing. We argue that this “liberal tilt” is not a programming error or the personal preference of programmers but an emergent property of training on democratic rights-focused discourse. Finally, we propose that LLMs may indirectly echo John Rawls’ famous veil-of-ignorance philosophical aspiration, reflecting a moral stance unanchored to personal identity or interest. Rather than undermining democratic discourse, this pattern may offer a new lens through which to examine collective reasoning.
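
As a concrete illustration of the kind of survey-style probing the abstract describes, the sketch below asks a model to rate Likert-style Moral Foundations items and aggregates a per-foundation profile. The `ask_model` callable, the specific items, and the 1–5 parsing are illustrative assumptions, not the paper’s actual instrument.

```python
# Minimal sketch of survey-style probing of an LLM's political temperament.
# `ask_model` is a hypothetical callable returning the model's free-text reply;
# the items and scoring below are illustrative, not the paper's instrument.
import re

LIKERT_PROMPT = (
    "On a scale from 1 (strongly disagree) to 5 (strongly agree), "
    "respond with a single number: {statement}"
)

# Toy items, one per Moral Foundations Theory dimension (illustrative only).
ITEMS = {
    "care":      "Compassion for those who are suffering is the most crucial virtue.",
    "fairness":  "Justice is the most important requirement for a society.",
    "loyalty":   "People should be loyal to their family even when they have done wrong.",
    "authority": "Respect for authority is something all children need to learn.",
    "sanctity":  "People should not do things that are disgusting, even if no one is harmed.",
}

def score_item(ask_model, statement: str) -> float:
    """Ask the model to rate one statement and parse the 1-5 answer."""
    reply = ask_model(LIKERT_PROMPT.format(statement=statement))
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"Unparseable reply: {reply!r}")
    return float(match.group())

def foundation_profile(ask_model) -> dict:
    """Rating per moral foundation; strong care/fairness relative to
    loyalty/authority/sanctity is commonly read as a liberal-leaning profile."""
    return {name: score_item(ask_model, stmt) for name, stmt in ITEMS.items()}
```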

Sequence graphs realizations and ambiguity in language models

arXiv:2402.08830v2 Announce Type: replace-cross Abstract: Several popular language models represent local contexts in an input text $x$ as bags of words. Such representations are naturally encoded by a sequence graph whose vertices are the distinct words occurring in $x$, with edges representing the (ordered) co-occurrence of two words within a sliding window of size $w$. However, this compressed representation is not generally bijective: some graphs may be ambiguous, admitting several realizations as a sequence, while others may not admit any realization. In this paper, we study the realizability and ambiguity of sequence graphs from a combinatorial and algorithmic point of view. We consider the existence and enumeration of realizations of a sequence graph under multiple settings: window size $w$, presence/absence of graph orientation, and presence/absence of weights (multiplicities). When $w=2$, we provide polynomial-time algorithms for realizability and enumeration in all cases except the undirected/weighted setting, where we show the $\#P$-hardness of enumeration. For $w \ge 3$, we prove the hardness of all variants, even when $w$ is considered as a constant, with the notable exception of the undirected unweighted case, for which we propose XP algorithms for both problems, tight due to a corresponding $W[1]$-hardness result. We conclude with an integer programming formulation to solve the realizability problem, and a dynamic programming algorithm to solve the enumeration problem on instances of moderate size. This work leaves open the membership in NP of both problems, a non-trivial question due to the existence of minimum realizations whose size is exponential in the instance encoding.
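
To make the object of study concrete, here is a minimal sketch of constructing a sequence graph from a text for a given window size $w$, using naive whitespace tokenization. The function name and the weighted/unweighted toggle are illustrative choices, not the paper’s code.

```python
# Build the sequence graph of a text for window size w: vertices are distinct
# words, and a directed edge (u, v) with multiplicity m records that v occurred
# within w-1 positions after u exactly m times. A minimal illustrative sketch.
from collections import Counter

def sequence_graph(text: str, w: int = 2, weighted: bool = True):
    words = text.split()                      # naive whitespace tokenization
    edges = Counter()
    for i, u in enumerate(words):
        for j in range(i + 1, min(i + w, len(words))):
            edges[(u, words[j])] += 1         # ordered co-occurrence in the window
    if not weighted:                          # unweighted variant: drop multiplicities
        return set(words), set(edges)
    return set(words), dict(edges)

# Example: "a b a c" with w = 2 yields edges {(a,b):1, (b,a):1, (a,c):1};
# deciding whether some ordering of the words induces exactly a given graph
# is the realizability question studied in the paper.
vertices, edges = sequence_graph("a b a c", w=2)
```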

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

arXiv:2503.08893v2 Announce Type: replace Abstract: An ideal model evaluation should achieve two goals: identifying where the model fails and providing actionable improvement guidance. Toward these goals for language model (LM) evaluations, we formulate the problem of generating a weakness profile, a set of weaknesses expressed in natural language, given an LM’s performance on every individual instance in a benchmark. We introduce a suite of quantitative assessments to compare different weakness profiling methods. We also introduce a weakness profiling method, EvalTree. EvalTree constructs a capability tree where each node represents a capability described in natural language and is linked to a subset of benchmark instances that specifically evaluate this capability; it then extracts nodes where the LM performs poorly to generate a weakness profile. On the MATH and WildChat benchmarks, we show that EvalTree outperforms baseline weakness profiling methods by identifying weaknesses more precisely and comprehensively. Weakness profiling further enables weakness-guided data collection, and training data collection guided by EvalTree-identified weaknesses improves LM performance more than other data collection strategies. We also show how EvalTree exposes flaws in Chatbot Arena’s human-voter-based evaluation practice. To facilitate future work, we provide an interface that allows practitioners to interactively explore the capability trees built by EvalTree.
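
The extraction step the abstract describes, flagging capability-tree nodes where the LM performs poorly, can be sketched as follows. The node layout, accuracy threshold, and minimum node size are simplifying assumptions, and the LLM-based construction of the tree itself is not shown.

```python
# Sketch of the weakness-extraction step: each capability-tree node is linked
# to the benchmark instances that test it, and sufficiently large nodes whose
# accuracy falls below a threshold are reported as weaknesses. The data layout
# and thresholds here are illustrative assumptions, not EvalTree's own code.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class CapabilityNode:
    description: str                 # natural-language capability
    instance_ids: List[int]          # benchmark instances testing this capability
    children: List["CapabilityNode"] = field(default_factory=list)

def accuracy(node: CapabilityNode, is_correct: Dict[int, bool]) -> float:
    return sum(is_correct[i] for i in node.instance_ids) / len(node.instance_ids)

def weakness_profile(root: CapabilityNode, is_correct: Dict[int, bool],
                     threshold: float = 0.4, min_size: int = 10) -> List[str]:
    """Collect descriptions of sufficiently large, low-accuracy nodes."""
    weak, stack = [], [root]
    while stack:
        node = stack.pop()
        if len(node.instance_ids) >= min_size and accuracy(node, is_correct) < threshold:
            weak.append(node.description)
        stack.extend(node.children)
    return weak
```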

From Perception to Action: The Role of World Models in Embodied AI Systems

Introduction to Embodied AI Agents

Embodied AI agents are systems that exist in physical or virtual forms, such as robots, wearables, or avatars, and can interact with their surroundings. Unlike static web-based bots, these agents perceive the world and act meaningfully within it. Their embodiment enhances physical interaction, human trust, and human-like learning. Recent advances in large language and vision-language models have powered more capable, autonomous agents that can plan, reason, and adapt to users’ needs. These agents understand context, retain memory, and can collaborate or request clarification when needed. Despite progress, challenges remain, especially with generative models that often prioritize detail over efficient reasoning and decision-making.

World Modeling and Applications

Researchers at Meta AI are exploring how embodied AI agents, such as avatars, wearables, and robots, can interact more naturally with users and their surroundings by sensing, learning, and acting within real or virtual environments. Central to this is “world modeling,” which combines perception, reasoning, memory, and planning to help agents understand both physical spaces and human intentions. These agents are reshaping industries such as healthcare, entertainment, and labor. The study highlights future goals, such as enhancing collaboration, social intelligence, and ethical safeguards, particularly around privacy and anthropomorphism, as these agents become increasingly integrated into our lives.

Types of Embodied Agents

Embodied AI agents come in three forms: virtual, wearable, and robotic, and are designed to interact with the world in much the same way as humans. Virtual agents, such as therapy bots or avatars in the metaverse, simulate emotions to foster empathetic interactions. Wearable agents, such as those in smart glasses, share the user’s view and assist with real-time tasks or provide cognitive support. Robotic agents operate in physical spaces, assisting with complex or high-risk tasks such as caregiving or disaster response. These agents not only enhance daily life but also push us closer to general AI by learning through real-world experience, perception, and physical interaction.

Importance of World Models

World models are crucial for embodied AI agents, enabling them to perceive, understand, and interact with their environment like humans. These models integrate various sensory inputs, such as vision, sound, and touch, with memory and reasoning capabilities to form a cohesive understanding of the world. This enables agents to anticipate outcomes, plan effective actions, and adapt to new situations. By incorporating both physical surroundings and user intentions, world models facilitate more natural and intuitive interactions between humans and AI agents, enhancing their ability to perform complex tasks autonomously.
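
A minimal sketch of the predict-then-plan loop implied by this description is shown below: a world model imagines the outcome of candidate action sequences, and the agent keeps the best-scoring one. Every name here (`WorldModel`, `plan`, the random-shooting search) is a hypothetical stand-in, not the system discussed in the paper.

```python
# Minimal sketch of a world-model-based agent loop: predict the next state for
# a candidate action, roll out whole action sequences in imagination, and pick
# the sequence whose predicted outcome best matches the goal. All components
# are hypothetical stand-ins for illustration only.
import random
from typing import Callable, List, Sequence

class WorldModel:
    def __init__(self, predict: Callable):
        self.predict = predict                  # predict(state, action) -> next_state

    def rollout(self, state, actions: Sequence):
        """Imagine the trajectory produced by an action sequence."""
        for action in actions:
            state = self.predict(state, action)
        return state

def plan(model: WorldModel, state, actions: List, goal_score: Callable,
         horizon: int = 3, n_candidates: int = 64):
    """Random-shooting planner: sample action sequences, keep the best one."""
    best_seq, best_score = None, float("-inf")
    for _ in range(n_candidates):
        seq = [random.choice(actions) for _ in range(horizon)]
        score = goal_score(model.rollout(state, seq))
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq
```
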
To enable truly autonomous learning in embodied AI, future research must integrate passive observation (such as vision-language learning) with active interaction (such as reinforcement learning). Passive systems excel at understanding structure from data but lack grounding in real-world actions. Active systems learn through doing, but are often inefficient. By combining both, AI can gain abstract knowledge and apply it through goal-driven behavior. Looking ahead, collaboration among multiple agents adds complexity, requiring effective communication, coordination, and conflict resolution. Strategies like emergent communication, negotiation, and multi-agent reinforcement learning will be key. Ultimately, the aim is to build adaptable, interactive AI that learns like humans through experience.

Conclusion

The study examines how embodied AI agents, such as virtual avatars, wearable devices, and robots, can interact with the world more like humans by perceiving, learning, and acting within their environments. Central to their success is building “world models” that help them understand context, predict outcomes, and plan effectively. These agents are already reshaping areas like therapy, entertainment, and real-time assistance. As they become more integrated into daily life, ethical issues such as privacy and human-like behavior require careful attention. Future work will focus on improving learning, collaboration, and social intelligence, aiming for more natural, intuitive, and responsible human-AI interaction.

The post From Perception to Action: The Role of World Models in Embodied AI Systems appeared first on MarkTechPost.

Moonshot AI Releases Kimi K2: A Trillion-Parameter MoE Model Focused on Long Context, Code, Reasoning, and Agentic Behavior

Kimi K2, launched by Moonshot AI in July 2025, is a purpose-built, open-source Mixture-of-Experts (MoE) model with 1 trillion total parameters and 32 billion active parameters per token. It’s trained using the custom MuonClip optimizer on 15.5 trillion tokens, achieving stable training at this unprecedented scale without the typical instabilities seen in ultra-large models.

Unlike traditional chatbots, K2 is architected specifically for agentic workflows. It features native Model Context Protocol (MCP) support and was trained on simulated multi-step tool interactions, enabling it to autonomously decompose tasks, execute tool sequences, write and debug code, analyze data, and orchestrate workflows, all with minimal human oversight.

Why Agentic over Conversational?

While advanced models like GPT-4 and Claude 4 Sonnet excel at language reasoning, Kimi K2 moves from reasoning to action. It doesn’t just respond; it executes. The core shift lies in enabling real-world workflows:

- Autonomous code execution
- Data analysis with charts and interfaces
- End-to-end web application development
- Orchestration of 17+ tools per session without human input

K2’s training incorporated millions of synthetic dialogues, each rated by an LLM-based evaluator. These dialogues simulate realistic tool-use scenarios, giving K2 a practical edge in tool selection and multi-step execution.

Architecture and Training Innovations

K2’s technical design demonstrates several novel elements:

- MoE Transformer design: 384 experts with routing to 8 active experts per token, plus 1 shared expert for global context. The model uses 64 attention heads and supports a 128K-token context window.
- MuonClip optimizer: a modified version of Muon that stabilizes training at scale. It uses qk-clipping to constrain attention scores by rescaling the Q/K matrices, effectively preventing instability in deep layers.
- Training dataset: over 15.5 trillion tokens from multilingual and multimodal sources, giving K2 robust generalization and tool-use reasoning across diverse domains.

The model comes in two variants: Kimi-K2-Base, the foundational model ideal for fine-tuning and building customized solutions, and Kimi-K2-Instruct, the post-trained version optimized for immediate use in general-purpose chat and tool-using agentic tasks. Instruct is reflex-grade: optimized for fast, low-latency interaction rather than long-form deliberation.

On benchmarks, Kimi K2 outperforms Claude Sonnet 4 and GPT-4.1 in coding and agentic reasoning, with 71.6% on SWE-bench, 65.8% on agentic tasks, and 53.7% on LiveCodeBench.
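
The routing pattern described above can be sketched as follows: each token activates the top 8 of 384 routed experts plus one always-on shared expert, and their outputs are mixed with the router’s softmax weights. Shapes, the tiny linear “experts,” and the gating details are illustrative assumptions, not Moonshot’s implementation.

```python
# Illustrative top-k MoE routing with a shared expert (384 routed experts,
# 8 active per token, 1 shared), matching the configuration reported above.
# The linear "experts" and gating scheme are simplifying assumptions.
import numpy as np

D_MODEL, N_EXPERTS, TOP_K = 64, 384, 8
rng = np.random.default_rng(0)

router_w = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02
expert_w = rng.standard_normal((N_EXPERTS, D_MODEL, D_MODEL)) * 0.02   # one matrix per routed expert
shared_w = rng.standard_normal((D_MODEL, D_MODEL)) * 0.02              # shared expert, always active

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def moe_layer(token: np.ndarray) -> np.ndarray:
    logits = token @ router_w                       # router scores, shape (N_EXPERTS,)
    top = np.argsort(logits)[-TOP_K:]               # indices of the 8 selected experts
    gates = softmax(logits[top])                    # renormalized weights over the top-8
    routed = sum(g * (token @ expert_w[i]) for g, i in zip(gates, top))
    return routed + token @ shared_w                # shared expert contributes to every token

out = moe_layer(rng.standard_normal(D_MODEL))
```

In practice the experts are full MLP blocks and routing typically includes load-balancing terms; those details are omitted here.
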
Performance Benchmarks

Kimi K2 not only matches but often surpasses closed-source models on key benchmarks:

Benchmark                  | Kimi K2 | GPT-4.1 | Claude Sonnet 4
SWE-bench Verified         | 71.6%   | 54.6%   | ~72.7%
Agentic Coding (Tau2)      | 65.8%   | 45.2%   | ~61%
LiveCodeBench v6 (Pass@1)  | 53.7%   | 44.7%   | 47.4%
MATH-500                   | 97.4%   | 92.4%   | –
MMLU                       | 89.5%   | ~90.4%  | ~92.9%

Its performance on agentic benchmarks like Tau2 and LiveCodeBench demonstrates its superior capacity to handle multi-step, real-world coding tasks, outperforming many proprietary models.

Cost Efficiency

Perhaps the most disruptive element is pricing (per million tokens):

- Claude 4 Sonnet: $3 input / $15 output
- Gemini 2.5 Pro: $2.50 input / $15 output
- Kimi K2: $0.60 input / $2.50 output

Kimi K2 is roughly 5x cheaper than Claude or Gemini while offering equal or better performance on several metrics. The cost advantage, combined with open access and support for local deployment, positions K2 as an economically viable alternative for developers, enterprises, and research teams.

Strategic Shift: From Thinking to Acting

Kimi K2 marks a pivotal moment in AI’s evolution from thinking agents to acting systems. With native tool-use capabilities and built-in support for multi-agent protocols, it goes far beyond static chat interfaces. It is capable of triggering workflows, making decisions, executing API calls, and delivering tangible outputs autonomously. Moreover, its release comes at a time when most such capabilities are either locked behind expensive APIs or limited to research labs. K2 is:

- Open-source, requiring no subscription
- Globally accessible, not limited to US-based deployment
- Designed for developers, not just end-users

Broader Implications

Will agentic architecture become the norm? K2’s strong performance on tool-use tasks could push proprietary players to rethink their architectures.

Can open-source efforts from Asia compete at global scale? With K2, Moonshot AI joins others like DeepSeek in showing that top-tier performance doesn’t have to originate from Silicon Valley.

What’s next in the agentic evolution? Future models may combine video, robotics, and embodied reasoning to further expand the scope of what agentic AI can accomplish.

Conclusion

Kimi K2 isn’t just a bigger model; it’s a blueprint for what comes after the reasoning race: execution-first AI. By combining trillion-parameter scale, low inference costs, and deeply integrated agentic capabilities, Kimi K2 opens the door for AI systems that do more than generate: they build, act, and solve autonomously.

The post Moonshot AI Releases Kimi K2: A Trillion-Parameter MoE Model Focused on Long Context, Code, Reasoning, and Agentic Behavior appeared first on MarkTechPost.

A Survey on Latent Reasoning

arXiv:2507.06203v2 Announce Type: replace Abstract: Large Language Models (LLMs) have demonstrated impressive reasoning capabilities, especially when guided by explicit chain-of-thought (CoT) reasoning that verbalizes intermediate steps. While CoT improves both interpretability and accuracy, its dependence on natural language reasoning limits the model’s expressive bandwidth. Latent reasoning tackles this bottleneck by performing multi-step inference entirely in the model’s continuous hidden state, eliminating token-level supervision. To advance latent reasoning research, this survey provides a comprehensive overview of the emerging field of latent reasoning. We begin by examining the foundational role of neural network layers as the computational substrate for reasoning, highlighting how hierarchical representations support complex transformations. Next, we explore diverse latent reasoning methodologies, including activation-based recurrence, hidden state propagation, and fine-tuning strategies that compress or internalize explicit reasoning traces. Finally, we discuss advanced paradigms such as infinite-depth latent reasoning via masked diffusion models, which enable globally consistent and reversible reasoning processes. By unifying these perspectives, we aim to clarify the conceptual landscape of latent reasoning and chart future directions for research at the frontier of LLM cognition. An associated GitHub repository collecting the latest papers and repos is available at: https://github.com/multimodal-art-projection/LatentCoT-Horizon/.
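
As a toy illustration of the activation-based recurrence family mentioned above, the sketch below re-applies a reasoning block to a hidden state several times before decoding, so the extra computation happens in latent space rather than as emitted chain-of-thought tokens. The block, depth, and decoding head are placeholder assumptions.

```python
# Toy illustration of activation-based recurrence: instead of verbalizing
# intermediate steps as tokens, re-apply a reasoning block to the hidden state
# several times and decode only the final state. Purely schematic; the block,
# depth, and decoder below are placeholder assumptions, not a surveyed method.
import numpy as np

D = 32
rng = np.random.default_rng(0)
W_block = rng.standard_normal((D, D)) / np.sqrt(D)     # stand-in for a transformer block
W_out = rng.standard_normal((D, D)) / np.sqrt(D)       # stand-in for the decoding head

def reasoning_block(h: np.ndarray) -> np.ndarray:
    return np.tanh(h @ W_block) + h                    # residual update of the hidden state

def latent_reason(h: np.ndarray, depth: int = 8) -> np.ndarray:
    """Iterate the block in hidden space; no intermediate tokens are produced."""
    for _ in range(depth):
        h = reasoning_block(h)
    return h @ W_out                                   # decode only the final state

logits = latent_reason(rng.standard_normal(D))
```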
