YouZum

Committee

AI, Committee, News, Uncategorized

CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation

arXiv:2506.21805v1 Announce Type: cross Abstract: Modeling human behavior in urban environments is fundamental for social science, behavioral studies, and urban planning. Prior work often relies on rigid, hand-crafted rules, limiting its ability to simulate nuanced intentions, plans, and adaptive behaviors. Addressing these challenges, we envision an urban simulator (CitySim), capitalizing on breakthroughs in human-level intelligence exhibited by large language models. In CitySim, agents generate realistic daily schedules using a recursive value-driven approach that balances mandatory activities, personal habits, and situational factors. To enable long-term, lifelike simulations, we endow agents with beliefs, long-term goals, and spatial memory for navigation. CitySim exhibits closer alignment with real humans than prior work, at both micro and macro levels. Additionally, we conduct insightful experiments by modeling tens of thousands of agents and evaluating their collective behaviors under various real-world scenarios, including estimating crowd density, predicting place popularity, and assessing well-being. Our results highlight CitySim as a scalable, flexible testbed for understanding and forecasting urban phenomena.
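The abstract describes, but does not show, the recursive value-driven scheduler, so here is a minimal sketch of the general idea. The Activity fields, the weights, and the greedy recursion are illustrative assumptions, not CitySim's actual algorithm.

```python
# Illustrative sketch of a value-driven daily scheduler (assumed fields and weights).
from dataclasses import dataclass

@dataclass
class Activity:
    name: str
    duration_h: float
    mandatory: float      # how obligatory the activity is (0..1)
    habit: float          # how well it matches the agent's routine (0..1)
    situational: float    # fit with the current context, e.g. weather (0..1)

def value(act: Activity, w=(0.5, 0.3, 0.2)) -> float:
    """Weighted balance of mandatory, habitual, and situational factors."""
    return w[0] * act.mandatory + w[1] * act.habit + w[2] * act.situational

def plan_day(candidates: list[Activity], hours_left: float = 16.0) -> list[Activity]:
    """Recursively pick the highest-value activity that still fits in the day."""
    feasible = [a for a in candidates if a.duration_h <= hours_left]
    if not feasible:
        return []
    best = max(feasible, key=value)
    rest = [a for a in candidates if a is not best]
    return [best] + plan_day(rest, hours_left - best.duration_h)

schedule = plan_day([
    Activity("work", 8, mandatory=1.0, habit=0.9, situational=0.6),
    Activity("gym", 1.5, mandatory=0.2, habit=0.7, situational=0.8),
    Activity("groceries", 1, mandatory=0.6, habit=0.4, situational=0.5),
])
print([a.name for a in schedule])   # ['work', 'groceries', 'gym']
```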

CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation Read post »

AI, Committee, News, Uncategorized

MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents

arXiv:2506.21605v1 Announce Type: new Abstract: Recent works have highlighted the significance of memory mechanisms in LLM-based agents, which enable them to store observed information and adapt to dynamic environments. However, evaluating their memory capabilities remains challenging. Previous evaluations are commonly limited in the diversity of memory levels and interactive scenarios they cover. They also lack comprehensive metrics that reflect memory capabilities from multiple aspects. To address these problems, in this paper, we construct a more comprehensive dataset and benchmark to evaluate the memory capability of LLM-based agents. Our dataset incorporates factual memory and reflective memory as different levels, and proposes participation and observation as distinct interactive scenarios. Based on our dataset, we present a benchmark, named MemBench, to evaluate the memory capability of LLM-based agents from multiple aspects, including their effectiveness, efficiency, and capacity. To benefit the research community, we release our dataset and project at https://github.com/import-myself/Membench.
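As a rough illustration of the memory levels, interactive scenarios, and metrics named in the abstract, the sketch below shows what a benchmark entry and a simple evaluation loop could look like. The field names and formulas are assumptions, not the released MemBench schema; see the linked repository for the actual dataset.

```python
# Hypothetical MemBench-style entry and metrics (assumed schema, not the released one).
import time

example_entry = {
    "memory_level": "reflective",   # "factual" or "reflective"
    "scenario": "participation",    # "participation" or "observation"
    "history": ["User: I moved to Berlin last month.", "Agent: Noted!"],
    "question": "Which city does the user live in?",
    "reference": "Berlin",
}

def evaluate(agent, entries):
    correct, latencies = 0, []
    for e in entries:
        start = time.perf_counter()
        answer = agent(e["history"], e["question"])
        latencies.append(time.perf_counter() - start)
        correct += int(e["reference"].lower() in answer.lower())
    return {
        "effectiveness": correct / len(entries),        # fraction answered correctly
        "efficiency": sum(latencies) / len(latencies),  # mean seconds per query
    }

# Toy agent that just echoes the first stored utterance, for demonstration.
print(evaluate(lambda history, question: history[0], [example_entry]))
```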

MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents Read post »

AI, Committee, News, Uncategorized

VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents

arXiv:2506.21582v1 Announce Type: new Abstract: Text analytics has traditionally required specialized knowledge in Natural Language Processing (NLP) or text analysis, which presents a barrier for entry-level analysts. Recent advances in large language models (LLMs) have changed the landscape of NLP by enabling more accessible and automated text analysis (e.g., topic detection, summarization, and information extraction). We introduce VIDEE, a system that supports entry-level data analysts in conducting advanced text analytics with intelligent agents. VIDEE instantiates a human-agent collaboration workflow consisting of three stages: (1) Decomposition, which incorporates a human-in-the-loop Monte-Carlo Tree Search algorithm to support generative reasoning with human feedback, (2) Execution, which generates an executable text analytics pipeline, and (3) Evaluation, which integrates LLM-based evaluation and visualizations to support user validation of execution results. We conduct two quantitative experiments to evaluate VIDEE's effectiveness and analyze common agent errors. A user study involving participants with varying levels of NLP and text analytics experience, from none to expert, demonstrates the system's usability and reveals distinct user behavior patterns. The findings identify design implications for human-agent collaboration, validate the practical utility of VIDEE for non-expert users, and inform future improvements to intelligent text analytics systems.
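To make the human-in-the-loop Monte-Carlo Tree Search stage more concrete, the sketch below scores candidate decomposition steps with a standard UCT formula plus a human-feedback bonus. The node fields and the feedback term are illustrative assumptions, not VIDEE's actual implementation.

```python
# Illustrative UCT scoring with a human-feedback bonus (assumed node fields).
import math

def uct_score(node, parent_visits, c=1.4, feedback_weight=0.5):
    """Exploration/exploitation score, nudged by human ratings of a candidate step."""
    if node["visits"] == 0:
        return float("inf")                                  # always try unvisited steps once
    exploit = node["value"] / node["visits"]
    explore = c * math.sqrt(math.log(parent_visits) / node["visits"])
    human = feedback_weight * node.get("human_rating", 0.0)  # e.g. -1, 0, or +1 from the analyst
    return exploit + explore + human

# Candidate decompositions of an analytics goal, with simulated search statistics.
children = [
    {"step": "topic modeling",    "visits": 4, "value": 2.8, "human_rating": 1.0},
    {"step": "summarization",     "visits": 6, "value": 3.1, "human_rating": 0.0},
    {"step": "entity extraction", "visits": 0, "value": 0.0},
]
best = max(children, key=lambda n: uct_score(n, parent_visits=10))
print(best["step"])   # the unvisited step wins via the infinite exploration bonus
```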

VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents Read post »

AI, Committee, News, Uncategorized

Hope Speech Detection in code-mixed Roman Urdu tweets: A Positive Turn in Natural Language Processing

arXiv:2506.21583v1 Announce Type: new Abstract: Hope is a positive emotional state involving the expectation of favorable future outcomes, while hope speech refers to communication that promotes optimism, resilience, and support, particularly in adverse contexts. Although hope speech detection has gained attention in Natural Language Processing (NLP), existing research mainly focuses on high-resource languages and standardized scripts, often overlooking informal and underrepresented forms such as Roman Urdu. To the best of our knowledge, this is the first study to address hope speech detection in code-mixed Roman Urdu by introducing a carefully annotated dataset, thereby filling a critical gap in inclusive NLP research for low-resource, informal language varieties. This study makes four key contributions: (1) it introduces the first multi-class annotated dataset for Roman Urdu hope speech, comprising Generalized Hope, Realistic Hope, Unrealistic Hope, and Not Hope categories; (2) it explores the psychological foundations of hope and analyzes its linguistic patterns in code-mixed Roman Urdu to inform dataset development; (3) it proposes a custom attention-based transformer model optimized for the syntactic and semantic variability of Roman Urdu, evaluated using 5-fold cross-validation; and (4) it verifies the statistical significance of performance gains using a t-test. The proposed model, XLM-R, achieves the best performance with a cross-validation score of 0.78, outperforming the baseline SVM (0.75) and BiLSTM (0.76), with relative gains of 4% and 2.63%, respectively.
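The reported significance check can be reproduced in outline with a paired t-test over per-fold scores from 5-fold cross-validation, and the relative gains follow from simple arithmetic. The fold-level numbers below are invented for illustration; only the means (0.78 vs. 0.75 and 0.76) come from the abstract, and they yield the stated gains of 4% and 2.63%.

```python
# Paired t-test over per-fold scores; fold values are illustrative, means match the abstract.
from scipy.stats import ttest_rel

xlmr_folds   = [0.80, 0.76, 0.78, 0.79, 0.77]   # mean 0.78
svm_folds    = [0.76, 0.74, 0.75, 0.76, 0.74]   # mean 0.75
bilstm_folds = [0.78, 0.75, 0.76, 0.77, 0.74]   # mean 0.76

for name, baseline in [("SVM", svm_folds), ("BiLSTM", bilstm_folds)]:
    t_stat, p_value = ttest_rel(xlmr_folds, baseline)
    gain = (sum(xlmr_folds) - sum(baseline)) / sum(baseline) * 100
    print(f"XLM-R vs {name}: relative gain {gain:.2f}%, t={t_stat:.2f}, p={p_value:.4f}")
# Relative gains: (0.78 - 0.75) / 0.75 = 4.00% and (0.78 - 0.76) / 0.76 = 2.63%
```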

Hope Speech Detection in code-mixed Roman Urdu tweets: A Positive Turn in Natural Language Processing Read post »

AI, Committee, News, Uncategorized

Alibaba Qwen Team Releases Qwen-VLo: A Unified Multimodal Understanding and Generation Model

The Alibaba Qwen team has introduced Qwen-VLo, a new addition to its Qwen model family, designed to unify multimodal understanding and generation within a single framework. Positioned as a powerful creative engine, Qwen-VLo enables users to generate, edit, and refine high-quality visual content from text, sketches, and commands, in multiple languages and through step-by-step scene construction. This model marks a significant leap in multimodal AI, making it highly applicable for designers, marketers, content creators, and educators.

Unified Vision-Language Modeling

Qwen-VLo builds on Qwen-VL, Alibaba's earlier vision-language model, by extending it with image generation capabilities. The model integrates visual and textual modalities in both directions: it can interpret images and generate relevant textual descriptions or respond to visual prompts, while also producing visuals based on textual or sketch-based instructions. This bidirectional flow enables seamless interaction between modalities, optimizing creative workflows.

Key Features of Qwen-VLo

- Concept-to-Polish Visual Generation: Qwen-VLo supports generating high-resolution images from rough inputs, such as text prompts or simple sketches. The model understands abstract concepts and converts them into polished, aesthetically refined visuals. This capability is ideal for early-stage ideation in design and branding.
- On-the-Fly Visual Editing: With natural language commands, users can iteratively refine images, adjusting object placements, lighting, color themes, and composition. Qwen-VLo simplifies tasks like retouching product photography or customizing digital advertisements, eliminating the need for manual editing tools.
- Multilingual Multimodal Understanding: Qwen-VLo is trained with support for multiple languages, allowing users from diverse linguistic backgrounds to engage with the model. This makes it suitable for global deployment in industries such as e-commerce, publishing, and education.
- Progressive Scene Construction: Rather than rendering complex scenes in one pass, Qwen-VLo enables progressive generation. Users can guide the model step by step, adding elements, refining interactions, and adjusting layouts incrementally. This mirrors natural human creativity and improves user control over output (see the illustrative sketch below).

Architecture and Training Enhancements

While details of the model architecture are not deeply specified in the public blog, Qwen-VLo likely inherits and extends the Transformer-based architecture of the Qwen-VL line. The enhancements focus on fusion strategies for cross-modal attention, adaptive fine-tuning pipelines, and integration of structured representations for better spatial and semantic grounding. The training data includes multilingual image-text pairs, sketches with image ground truths, and real-world product photography. This diverse corpus allows Qwen-VLo to generalize well across tasks like composition generation, layout refinement, and image captioning.
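To make the progressive scene construction and on-the-fly editing workflow above more concrete, here is a purely hypothetical sketch. The QwenVLoClient class and its methods are invented stand-ins for illustration; this post does not document a public API for Qwen-VLo.

```python
# Hypothetical sketch only: QwenVLoClient is a stand-in that records
# instructions instead of calling any real Qwen-VLo endpoint.

class QwenVLoClient:
    """Toy client that accumulates scene-building instructions."""

    def __init__(self):
        self.scene = []

    def generate(self, prompt: str) -> str:
        self.scene = [prompt]                     # start a fresh scene
        return f"<image: {prompt}>"

    def edit(self, instruction: str) -> str:
        self.scene.append(instruction)            # refine the existing scene
        return f"<image: {' | '.join(self.scene)}>"

client = QwenVLoClient()
client.generate("minimalist product shot of a ceramic mug on a wooden table")
client.edit("warm the lighting and add soft morning shadows")        # on-the-fly editing
client.edit("place a sprig of lavender next to the mug")             # progressive scene construction
print(client.edit("add the caption 'Sunrise Roast' in a serif typeface"))
```

The point of the loop is that each instruction builds on the previous state, mirroring the step-by-step control described in the post rather than regenerating the whole scene from scratch.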
Target Use Cases

- Design & Marketing: Qwen-VLo's ability to convert text concepts into polished visuals makes it ideal for ad creatives, storyboards, product mockups, and promotional content.
- Education: Educators can visualize abstract concepts (e.g., science, history, art) interactively. Language support enhances accessibility in multilingual classrooms.
- E-commerce & Retail: Online sellers can use the model to generate product visuals, retouch shots, or localize designs per region.
- Social Media & Content Creation: For influencers and content producers, Qwen-VLo offers fast, high-quality image generation without relying on traditional design software.

Key Benefits

Qwen-VLo stands out in the current LMM (Large Multimodal Model) landscape by offering:

- Seamless text-to-image and image-to-text transitions
- Localized content generation in multiple languages
- High-resolution outputs suitable for commercial use
- An editable, interactive generation pipeline

Its design supports iterative feedback loops and precision edits, which are critical for professional-grade content generation workflows.

Conclusion

Alibaba's Qwen-VLo pushes forward the frontier of multimodal AI by merging understanding and generation capabilities into a cohesive, interactive model. Its flexibility, multilingual support, and progressive generation features make it a valuable tool for a wide array of content-driven industries. As demand for the convergence of visual and language content grows, Qwen-VLo positions itself as a scalable creative assistant ready for global adoption.

Check out the technical details and try it here. All credit for this research goes to the researchers of this project. The post Alibaba Qwen Team Releases Qwen-VLo: A Unified Multimodal Understanding and Generation Model appeared first on MarkTechPost.

Alibaba Qwen Team Releases Qwen-VLo: A Unified Multimodal Understanding and Generation Model Read post »

AI, Committee, News, Uncategorized

Tencent Open Sources Hunyuan-A13B: A 13B Active Parameter MoE Model with Dual-Mode Reasoning and 256K Context

Tencent's Hunyuan team has introduced Hunyuan-A13B, a new open-source large language model built on a sparse Mixture-of-Experts (MoE) architecture. While the model consists of 80 billion total parameters, only 13 billion are active during inference, offering a highly efficient balance between performance and computational cost. It supports Grouped Query Attention (GQA), a 256K context length, and a dual-mode reasoning framework that toggles between fast and slow thinking. Designed for efficient deployment and robust reasoning, Hunyuan-A13B achieves top-tier performance across agentic benchmarks including BFCL-v3, τ-Bench, C3-Bench, and ComplexFuncBench, often outperforming larger models in tool-calling and long-context scenarios.

Architecture: Sparse MoE with 13B Active Parameters

At its core, Hunyuan-A13B follows a fine-grained MoE design comprising 1 shared expert and 64 non-shared experts, with 8 experts activated per forward pass. This architecture, backed by scaling experiments, ensures performance consistency while keeping inference costs low. The model comprises 32 layers, uses SwiGLU activations and a 128K vocabulary, and integrates GQA for enhanced memory efficiency during long-context inference. The model's MoE setup is paired with an optimized training curriculum: a 20T-token pretraining phase, followed by fast annealing and long-context adaptation. This last phase scales the context window first to 32K and then to 256K tokens using NTK-aware positional encoding, ensuring stable performance at large sequence lengths.
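The routing pattern described above, one always-active shared expert plus the top 8 of 64 routed experts per token, can be sketched in a few lines of PyTorch. This is an illustrative toy with made-up layer sizes and simplified gating, not Hunyuan-A13B's actual implementation.

```python
# Illustrative MoE routing sketch (assumed sizes and gating), not Hunyuan-A13B's code.
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, d_model=256, d_ff=512, n_experts=64, top_k=8):
        super().__init__()
        ffn = lambda: nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
        self.shared = ffn()                                   # shared expert, applied to every token
        self.experts = nn.ModuleList(ffn() for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)             # router producing per-expert scores
        self.top_k = top_k

    def forward(self, x):                                     # x: (tokens, d_model)
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)  # pick 8 of 64 experts per token
        weights = weights.softmax(dim=-1)                     # normalize over the selected experts
        routed = torch.zeros_like(x)
        for t in range(x.size(0)):                            # naive per-token dispatch, for clarity
            routed[t] = sum(w * self.experts[e](x[t]) for w, e in zip(weights[t], idx[t]))
        return self.shared(x) + routed                        # shared output + weighted routed experts

print(SparseMoE()(torch.randn(2, 256)).shape)                 # torch.Size([2, 256])
```

Only the selected experts run for each token, which is how an 80B-parameter model can keep roughly 13B parameters active per forward pass.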
Dual-Mode Reasoning: Fast and Slow Thinking

A standout feature of Hunyuan-A13B is its dual-mode Chain-of-Thought (CoT) capability. It supports both a low-latency fast-thinking mode for routine queries and a more elaborate slow-thinking mode for multi-step reasoning. These modes are controlled through a simple tag system: /no think for fast inference and /think for reflective reasoning. This flexibility allows users to adapt computational cost to task complexity.

Post-Training: Reinforcement Learning with Task-Specific Reward Models

The post-training pipeline of Hunyuan-A13B includes multi-stage supervised fine-tuning (SFT) and reinforcement learning (RL) across both reasoning-specific and general tasks. The RL stages incorporate outcome-based rewards and tool-specific feedback, including sandbox execution environments for code and rule-based checks for agents. In the agent training phase, the team synthesized diverse tool-use scenarios with planner, checker, and tool roles, generating over 20,000 format combinations. This reinforced Hunyuan-A13B's ability to execute real-world workflows such as spreadsheet processing, information search, and structured reasoning.

Evaluation: State-of-the-Art Agentic Performance

Hunyuan-A13B shows strong benchmark results across diverse NLP tasks:

- On MATH, CMATH, and GPQA, it scores on par with or above larger dense and MoE models.
- It surpasses Qwen3-A22B and DeepSeek R1 in logical reasoning (BBH: 89.1; ZebraLogic: 84.7).
- In coding, it holds its own with 83.9 on MBPP and 69.3 on MultiPL-E.
- For agent tasks, it leads on BFCL-v3 (78.3) and ComplexFuncBench (61.2), validating its tool-usage capabilities.

Long-context comprehension is another highlight. On PenguinScrolls, it scores 87.7, just shy of Gemini 2.5 Pro. On RULER, it sustains high performance (73.9) even at 64K–128K context, outperforming larger models like Qwen3-A22B and DeepSeek R1 in context resilience.

Inference Optimization and Deployment

Hunyuan-A13B is fully integrated with popular inference frameworks such as vLLM, SGLang, and TensorRT-LLM. It supports precision formats such as W16A16, W8A8, and KV Cache FP8, along with features like Auto Prefix Caching and Chunk Prefill. It achieves up to 1981.99 tokens/sec throughput on a 32-batch input (2048 input, 14336 output length), making it practical for real-time applications. A hedged usage sketch appears at the end of this entry.

Open Source and Industry Relevance

Available on Hugging Face and GitHub, Hunyuan-A13B is released with permissive open-source licensing. It is engineered for efficient research and production use, especially in latency-sensitive environments and long-context tasks. By combining MoE scalability, agentic reasoning, and open-source accessibility, Tencent's Hunyuan-A13B offers a compelling alternative to heavyweight LLMs, enabling broader experimentation and deployment without sacrificing capability.

Check out the paper. All credit for this research goes to the researchers of this project. The post Tencent Open Sources Hunyuan-A13B: A 13B Active Parameter MoE Model with Dual-Mode Reasoning and 256K Context appeared first on MarkTechPost.
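To close out this entry, here is a rough sketch of how the vLLM integration and the /think and /no think switches mentioned above might be exercised with vLLM's offline API. The Hugging Face repository id and the practice of prepending the tags directly to the prompt are assumptions; verify both against Tencent's model card and chat template.

```python
# Hedged deployment sketch: repo id and prompt-tag placement are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="tencent/Hunyuan-A13B-Instruct",  # assumed repo id; confirm on Hugging Face
    trust_remote_code=True,                 # custom architectures typically require this
)
params = SamplingParams(temperature=0.7, max_tokens=512)

prompts = [
    "/no think What is the capital of France?",                          # fast-thinking mode
    "/think Plan a three-step analysis of last quarter's sales data.",   # slow, reflective mode
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text.strip(), "\n---")
```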

Tencent Open Sources Hunyuan-A13B: A 13B Active Parameter MoE Model with Dual-Mode Reasoning and 256K Context Read post »

AI, Committee, News, Uncategorized

CTGT wins Best Presentation Style award at VB Transform 2025

San Francisco-based CTGT, a startup focused on making AI more trustworthy through feature-level model customization, won the Best Presentation Style award at VB Transform 2025 in San Francisco. Founded by 23-year-old Cyril Gorlla, the company showcased how its technology helps enterprises overcome AI trust barriers by directly modifying model features…

CTGT wins Best Presentation Style award at VB Transform 2025 Read post »
