What your tools miss at 2:13 AM: How gen AI attack chains exploit telemetry lag – Part 1
Explore a strategic 2025 roadmap for cybersecurity leaders to tackle gen AI, insider risks, and team burnout with actionable guidance.
AI models today are expected to handle complex tasks such as solving mathematical problems, interpreting logical statements, and assisting with enterprise decision-making. Building such models demands the integration of mathematical reasoning, scientific understanding, and advanced pattern recognition. As the demand for intelligent agents in real-time applications, like coding assistants and business automation tools, continues to grow, there is a pressing need for models that combine strong performance with efficient memory and token usage, making them viable for deployment in practical hardware environments. A central challenge in AI development is the resource intensity of large-scale reasoning models. Despite their strong capabilities, these models often require significant memory and computational resources, limiting their real-world applicability. This creates a gap between what advanced models can achieve and what users can realistically deploy. Even well-resourced enterprises may find running models demanding dozens of gigabytes of memory or high inference costs unsustainable. The issue is not just about building smarter models, but ensuring they are efficient and deployable in real-world platforms. High-performing models such as QWQ‑32b, o1‑mini, and EXAONE‑Deep‑32b excel at tasks involving mathematical reasoning and academic benchmarks. However, their dependence on high-end GPUs and high token consumption limits their use in production settings. These models highlight the ongoing trade-off in AI deployment: achieving high accuracy at the cost of scalability and efficiency. Addressing this gap, researchers at ServiceNow introduced Apriel-Nemotron-15b-Thinker. This model consists of 15 billion parameters, a relatively modest size compared to its high-performing counterparts, yet it demonstrates performance on par with models almost twice its size. The primary advantage lies in its memory footprint and token efficiency. 
While delivering competitive results, it requires nearly half the memory of QWQ-32b and EXAONE-Deep-32b. This directly contributes to improved operational efficiency in enterprise environments, making it feasible to integrate high-performance reasoning models into real-world applications without large-scale infrastructure upgrades.

The development of Apriel-Nemotron-15b-Thinker followed a structured three-stage training approach, with each stage designed to enhance a specific aspect of the model’s reasoning capabilities. In the initial phase, termed Continual Pre-training (CPT), the model was exposed to over 100 billion tokens. These tokens were not generic text but carefully selected examples from domains requiring deep reasoning: mathematical logic, programming challenges, scientific literature, and logical deduction tasks. This exposure provided the foundational reasoning capabilities that distinguish the model from others. The second stage involved Supervised Fine-Tuning (SFT) using 200,000 high-quality demonstrations. These examples further calibrated the model’s responses to reasoning challenges, enhancing performance on tasks that require accuracy and attention to detail. The final tuning stage, GRPO (Group Relative Policy Optimization), refined the model’s outputs by optimizing alignment with expected results across key tasks. The pipeline is intended to produce a model that is precise, structured, and scalable.

In enterprise-specific tasks such as MBPP, BFCL, Enterprise RAG, MT Bench, MixEval, IFEval, and Multi-Challenge, the model delivered competitive or superior performance compared to larger models. Regarding production efficiency, it consumed 40% fewer tokens than QWQ-32b, significantly lowering inference costs. From a memory standpoint, it achieves all this with approximately 50% of the memory needed by QWQ-32b and EXAONE-Deep-32b, indicating a substantial improvement in deployment feasibility.
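As a rough sanity check on the memory and token figures above, the arithmetic can be sketched in a few lines. This is a back-of-the-envelope estimate, not a measurement from the article: it assumes weights stored at 2 bytes per parameter (16-bit precision), ignores activations and the KV cache, and treats inference cost as proportional to token count. The function names are hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes).

    Assumption: 2 bytes per parameter (16-bit weights). Activations and the
    KV cache are not counted, so real deployments need more than this.
    """
    return num_params * bytes_per_param / 1e9


def relative_token_cost(token_reduction: float) -> float:
    """Cost relative to a baseline, assuming cost scales linearly with tokens."""
    return 1.0 - token_reduction


small = weight_memory_gb(15e9)   # 15-billion-parameter model
large = weight_memory_gb(32e9)   # 32-billion-parameter model
memory_ratio = small / large     # a little under one half

print(f"15B weights: {small:.0f} GB, 32B weights: {large:.0f} GB")
print(f"memory ratio: {memory_ratio:.2f}")
print(f"relative token cost at 40% fewer tokens: {relative_token_cost(0.40):.2f}")
```

Under these assumptions the parameter counts alone account for a ratio slightly below one half, which is consistent with the "nearly half the memory" figure quoted for the model.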
Even in academic benchmarks, such as AIME-24, AIME-25, AMC-23, MATH-500, and GPQA, the model held its own, often equaling or surpassing the performance of other larger models, all while being significantly lighter in computational demand.

Several Key Takeaways from the Research on Apriel-Nemotron-15b-Thinker:
Apriel-Nemotron-15b-Thinker has 15 billion parameters, significantly fewer than QWQ-32b or EXAONE-Deep-32b, but performs competitively.
It uses three-phase training: 100B+ tokens in CPT, 200K fine-tuning demonstrations in SFT, and a final GRPO refinement.
It consumes around 50% less memory than QWQ-32b, allowing for easier deployment on enterprise hardware.
It uses 40% fewer tokens in production tasks than QWQ-32b, reducing inference cost and increasing speed.
It outperforms or equals larger models on MBPP, BFCL, Enterprise RAG, and academic tasks like GPQA and MATH-500.
It is optimized for agentic and enterprise tasks, suggesting utility in corporate automation, coding agents, and logical assistants.
It is designed specifically for real-world use, avoiding over-reliance on lab-scale compute environments.

Check out the Model on Hugging Face. Also, don’t forget to follow us on Twitter. Here’s a brief overview of what we’re building at Marktechpost: ML News Community – r/machinelearningnews (92k+ members) Newsletter– airesearchinsights.com/(30k+ subscribers) miniCON AI Events – minicon.marktechpost.com AI Reports & Magazines – magazine.marktechpost.com AI Dev & Research News – marktechpost.com (1M+ monthly readers) The post ServiceNow AI Released Apriel-Nemotron-15b-Thinker: A Compact Yet Powerful Reasoning Model Optimized for Enterprise-Scale Deployment and Efficiency appeared first on MarkTechPost.
Computer science research has evolved into a multidisciplinary effort involving logic, engineering, and data-driven experimentation. With computing systems now deeply embedded in everyday life, research increasingly focuses on large-scale, real-time systems capable of adapting to diverse user needs. These systems often learn from massive datasets and must handle unpredictable interactions. As the scope of computer science broadens, so does the methodology, requiring tools and approaches that accommodate scalability, responsiveness, and empirical validation over purely theoretical models. The difficulty arises when connecting innovative ideas to practical applications without losing the depth and risk inherent in true research. Rapid development cycles, product deadlines, and user expectations often overlap with the uncertain timelines and exploratory nature of research. The challenge is enabling meaningful innovation while maintaining relevance and practical outcomes. Finding a structure where exploration and implementation coexist is essential to making real progress in this demanding and high-impact field. Traditionally, the division between research and engineering has led to inefficiencies. Research teams create conceptual models or prototypes, which are later handed over to engineering teams for scaling and integration. This separation often results in delays, failures in technology transfer, and difficulty adapting ideas to real-world use. Even when research has academic value, the lack of immediate relevance or scalable deployment options limits its broader impact. Conventional dissemination methods, such as peer-reviewed papers, don’t always align with the fast-moving demands of technology development. Google introduced a hybrid research model integrating researchers directly into product and engineering teams. This approach was designed to reduce delays between ideation and implementation, enabling faster and more relevant outcomes. 
Researchers at Google, a company that runs at the intersection of massive computing infrastructure and billions of users, operate within small teams that remain involved from concept to deployment. By embedding research within development, the risk of failure is offset by iterative learning and empirical data gathered from actual user interactions. This model promotes cross-functional innovation where knowledge flows seamlessly between domains.

The methodology adopted by Google supports research through robust infrastructure and real-time experimentation. Teams write production-ready code early and rely on continuous feedback from deployed services. Elaborate prototypes are avoided, as they slow the path to real user impact. Google’s services model allows even small teams to access powerful computing resources and integrate complex features quickly. Their projects are modularized, breaking long-term goals into smaller, achievable components. This structure keeps motivation high and provides frequent opportunities for measurable progress. Research is not isolated from engineering but rather supported by it, ensuring that practical constraints and user behavior shape every line of code and every experiment.

The results of this model are substantial. Google published 279 research papers in 2011, a steep rise from 13 in 2003, showing an increased emphasis on sharing its scientific advancements. High-impact systems such as MapReduce, BigTable, and the Google File System originated within this hybrid structure and have become foundational to modern computing. Over 1,000 open-source projects and hundreds of public APIs have emerged from this integrated approach. Google Translate and Voice Search are examples of products that small research teams grew from ideas into large-scale services. Contributions extend to global standards, with team members shaping specifications like HTML5.
By deeply connecting research with product development, Google has built a model that fosters innovation and delivers it at scale. Its hybrid research system empowers teams to work on difficult problems without being detached from practical realities. Projects are designed with user impact and academic relevance in mind, allowing teams to adjust direction quickly when goals are unmet. This has led to projects such as Google Health being re-evaluated when they did not yield the expected outcomes, showing the model’s flexibility and pragmatism. Combining experimentation, real-world data, and scalable engineering, Google has built a framework that makes research outcomes more tangible and impactful. This paper clearly shows how a unified approach to research and engineering can bridge the gap between innovation and usability, offering a potential blueprint for other technology-driven organizations. Check out the Paper. The post Google Redefines Computer Science R&D: A Hybrid Research Model that Merges Innovation with Scalable Engineering appeared first on MarkTechPost.
LLMs have shown advancements in reasoning capabilities through Reinforcement Learning with Verifiable Rewards (RLVR), which relies on outcome-based feedback rather than imitating intermediate reasoning steps. Current RLVR works face critical scalability challenges as they heavily depend on manually curated collections of questions and answers for training. As reasoning models advance, constructing large-scale, high-quality datasets becomes increasingly unsustainable, similar to bottlenecks identified in LLM pretraining. Moreover, exclusive dependency on human-designed tasks may constrain AI systems’ capacity for autonomous learning and development, especially as they evolve beyond human intellectual capabilities. Researchers have explored various approaches to enhance LLM reasoning capabilities. STaR pioneered self-bootstrapping using expert iteration and rejection sampling of outcome-verified responses to improve CoT reasoning. The o1 model deployed this concept at scale, achieving state-of-the-art results, and R1 later became the first open-weight model to match or surpass o1’s performance by introducing the “zero” setting where RL is applied directly to the base LLM. Further, self-play paradigms have evolved from Schmidhuber’s early two-agent setups to more complex implementations like AlphaGo and AlphaZero. Recent methods such as SPIN, Self-Rewarding Language Models, SPC, and SPAG have applied self-play to language models for alignment and reasoning. Researchers from Tsinghua University, Beijing Institute for General Artificial Intelligence, and Pennsylvania State University have proposed an RLVR paradigm called Absolute Zero to enable a single model to autonomously generate and solve tasks that maximize its own learning progress without relying on any external data. 
Under this method, researchers have introduced the Absolute Zero Reasoner (AZR), which self-evolves its training curriculum and reasoning ability through a code executor that validates proposed code reasoning tasks and verifies answers, providing a unified source of verifiable reward to guide open-ended yet grounded learning. AZR can be effectively implemented across different model scales and remains compatible with various model classes, suggesting broad applicability.

LLMs provide an ideal framework for implementing AZR in multitask learning contexts. During each online rollout iteration, AZR proposes new reasoning tasks conditioned on the task type and past self-generated examples, with explicit prompting to generate diverse tasks, and then attempts to solve them, receiving grounded feedback on its responses. AZR uses a code executor as both a flexible interface and a verifiable environment, enabling automatic construction, execution, and validation of code reasoning tasks. Lastly, the AZR algorithm includes buffer initialization, task proposal inputs and buffer management, valid task construction, solution validation, and advantage estimation through Task-Relative REINFORCE++.

The Absolute Zero Reasoner-Coder-7B has achieved state-of-the-art performance in the 7B overall average and coding average categories, surpassing previous best models by 1.8 absolute percentage points despite being entirely out-of-distribution for both math and code reasoning benchmarks. It outperforms models trained with expert-curated human data in coding by 0.3 absolute percentage points while never accessing such data itself. Scaling analysis reveals that AZR delivers greater gains on larger models, with the 7B and 14B models continuing to improve beyond 200 training steps while the 3B model plateaus. Out-of-distribution performance gains increase with model size: +5.7, +10.2, and +13.2 for 3B, 7B, and 14B, respectively.
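The propose, validate, solve, and reward steps described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`run_program`, `is_valid_task`, `solve_reward`, `task_relative_advantages`), the convention that a task is a program defining `f` plus an input, and the task-type labels are all hypothetical. The per-task-type normalization is a simplified stand-in for Task-Relative REINFORCE++, and the unsandboxed `exec` call would not be safe for real model output.

```python
from statistics import mean, pstdev
from typing import Any, Callable, Dict, List, Optional


def run_program(source: str, arg: Any) -> Optional[Any]:
    """Toy code executor: run `source`, which must define f, and return f(arg).

    Returns None if the program raises. A real system would sandbox this step.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        return namespace["f"](arg)
    except Exception:
        return None


def is_valid_task(source: str, arg: Any) -> bool:
    """Keep a proposed task only if it runs and gives the same result twice."""
    first = run_program(source, arg)
    return first is not None and first == run_program(source, arg)


def solve_reward(source: str, arg: Any, solver: Callable[[str, Any], Any]) -> float:
    """Verifiable reward: 1.0 if the solver's answer matches the executor's output."""
    expected = run_program(source, arg)
    return 1.0 if solver(source, arg) == expected else 0.0


def task_relative_advantages(
    rewards_by_task: Dict[str, List[float]]
) -> Dict[str, List[float]]:
    """Normalize each reward against the mean and std of its own task type."""
    advantages = {}
    for task_type, rewards in rewards_by_task.items():
        baseline = mean(rewards)
        scale = pstdev(rewards) or 1.0  # avoid dividing by zero when all rewards agree
        advantages[task_type] = [(r - baseline) / scale for r in rewards]
    return advantages


# A proposed task: predict the output of this program for input 3.
program = "def f(x):\n    return x * 2 + 1\n"
broken = "def f(x):\n    return 1 / 0\n"

print(is_valid_task(program, 3), is_valid_task(broken, 3))
print(solve_reward(program, 3, lambda src, arg: 7))
print(task_relative_advantages({"deduction": [1.0, 0.0], "abduction": [1.0, 1.0]}))
```

The point of the sketch is that the executor, not a human-written answer key, decides both whether a proposed task is usable and whether a solution is correct, which is what lets the loop run without external data.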
In conclusion, researchers introduced the Absolute Zero paradigm to address data limitations in existing RLVR frameworks. Under this method, researchers present AZR, which trains models to propose and solve code-related reasoning tasks grounded by a code executor. However, there is a limitation regarding safety management in self-improving systems. The team observed several instances of safety-concerning CoT reasoning from the Llama-3.1-8B model, termed “uh-oh moments.” The findings indicate that while the Absolute Zero paradigm reduces human intervention needs in task curation, ongoing oversight remains necessary to address lingering safety concerns, highlighting a critical direction for future research. Check out the Paper, Model on Hugging Face and GitHub Page. The post AI That Teaches Itself: Tsinghua University’s ‘Absolute Zero’ Trains LLMs With Zero External Data appeared first on MarkTechPost.
By combining fine-tuning and in-context learning, you get LLMs that can learn tasks that would be too difficult or expensive for either method alone.
Executives like OpenAI’s Sam Altman said US support for infrastructure would make it easier for AI companies to meet demand.
OpenAI, Microsoft tell Senate ‘no one country can win AI’
LLMs have made significant strides in language-related tasks such as conversational AI, reasoning, and code generation. However, human communication extends beyond text, often incorporating visual elements to enhance understanding. To create a truly versatile AI, models need the ability to process and generate text and visual information simultaneously. Training such unified vision-language models from scratch using methods like autoregressive token prediction or a hybrid approach combining diffusion and language losses has shown strong performance. Still, it requires vast computational resources and retraining for each new modality. An alternative approach adapts pretrained LLMs with vision capabilities, which offers a more efficient path but often compromises the language model’s original performance. Current research has focused on three main strategies: merging LLMs with standalone image generation models, training large multimodal models end-to-end, or using a combination of diffusion and autoregressive losses. While these methods have achieved state-of-the-art results, they either require retraining large models or result in degradation of the LLM’s core capabilities. Despite these challenges, leveraging pretrained LLMs with added vision components has demonstrated significant potential, particularly in tasks involving image understanding and generation. However, these methods still face limitations in terms of efficiency and flexibility. Researchers from UCLA, the University of Wisconsin-Madison, and Adobe Research propose X-Fusion, which adapts pretrained LLMs for multimodal tasks while preserving language capabilities. X-Fusion utilizes a dual-tower architecture, freezing the LLM’s language weights while adding a vision-specific tower to process visual information. The approach aligns text and vision features at multiple levels, improving performance in image-to-text and text-to-image tasks. 
Through ablation studies, the researchers emphasize the importance of clean image data for training and show that aligning vision features with pre-trained representations accelerates convergence, especially for smaller models.

X-Fusion is a unified framework that adapts pretrained LLMs for vision tasks while retaining their language capabilities. It uses a dual-tower design, freezing the LLM’s text weights while introducing a separate vision tower for processing visual information. Images are tokenized using a pretrained encoder, and image and text tokens are jointly optimized. The model incorporates an optional X-Fuse operation to merge features from both towers for enhanced performance. X-Fusion is trained with autoregressive and image denoising losses, and its performance is evaluated on image generation (text-to-image) and image understanding (image-to-text) tasks.

The study evaluates the Dual Tower architecture against alternative transformer variants for multimodal integration. It compares the Single Tower, Gated Tower, and Dual Projection designs, highlighting the flexibility of the Dual Tower for image and text tasks. The Dual Tower performs best in image generation and understanding, outperforming other designs by 23% in FID without increasing training parameters. The study also investigates the effects of noise and data ratios on performance, finding that clean images improve understanding and generation. Additionally, aligning vision features with a pretrained encoder like CLIP boosts performance, especially for smaller models.

In conclusion, X-Fusion is a framework that adapts pretrained LLMs to multimodal tasks, such as image understanding and generation, while preserving language capabilities. It introduces a Dual Tower architecture where language weights remain fixed, and a separate trainable vision tower processes visual features. Experimental results show that X-Fusion outperforms alternative designs in image-to-text and text-to-image tasks.
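The idea of a frozen language tower next to a trainable vision tower can be sketched with a deliberately tiny example. This is a toy under strong assumptions, not the X-Fusion architecture: each "layer" is a single scalar weight, attention and the X-Fuse operation are left out, and sending each token to a tower based on its modality is this sketch's own simplification. `TowerLayer`, `dual_tower_layer`, and `apply_update` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class TowerLayer:
    """Stand-in for one transformer layer: scales its input by a single weight."""
    weight: float
    trainable: bool

    def forward(self, x: Vector) -> Vector:
        return [self.weight * value for value in x]


def dual_tower_layer(
    tokens: List[Vector],
    is_image: List[bool],
    text_layer: TowerLayer,
    vision_layer: TowerLayer,
) -> List[Vector]:
    """Send text tokens through the text layer and image tokens through the vision layer."""
    outputs = []
    for token, image_flag in zip(tokens, is_image):
        layer = vision_layer if image_flag else text_layer
        outputs.append(layer.forward(token))
    return outputs


def apply_update(layers: List[TowerLayer], delta: float) -> None:
    """Toy training step: only trainable layers change, frozen ones are untouched."""
    for layer in layers:
        if layer.trainable:
            layer.weight += delta


text_layer = TowerLayer(weight=2.0, trainable=False)   # frozen language weights
vision_layer = TowerLayer(weight=3.0, trainable=True)  # new vision tower

mixed = dual_tower_layer([[1.0], [1.0]], [False, True], text_layer, vision_layer)
apply_update([text_layer, vision_layer], delta=0.5)
print(mixed, text_layer.weight, vision_layer.weight)
```

Because the text layer never receives an update, whatever it computed for text before training it still computes afterward, which is the property the article describes as preserving language capabilities.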
Key findings include the benefits of incorporating understanding-focused data, reducing noise in image data, and the positive impact of feature alignment, especially for smaller models. The research contributes valuable insights into building efficient multimodal models. Check out the Paper. The post Multimodal LLMs Without Compromise: Researchers from UCLA, UW–Madison, and Adobe Introduce X-Fusion to Add Vision to Frozen Language Models Without Losing Language Capabilities appeared first on MarkTechPost.
Microsoft CEO Satya Nadella’s endorsement of Google DeepMind’s A2A open protocol and Anthropic’s MCP is a huge sign the industry is moving to an open garden.
Alibaba’s ZeroSearch trains large language models to beat Google Search and slash API costs by 88%, redefining how AI learns to retrieve information.
Accenture’s new research reveals the critical strategies that separate the companies successfully scaling AI from the 92% stuck in perpetual pilot mode, providing enterprise leaders with actionable insights to accelerate their AI transformation journey.
5 strategies that separate AI leaders from the 92% still stuck in pilot mode