
Encoders and Decoders in Transformer Models

This article is divided into three parts; they are:

• Full Transformer Models: Encoder-Decoder Architecture
• Encoder-Only Models
• Decoder-Only Models

The original transformer architecture, introduced in "Attention Is All You Need," combines an encoder and a decoder specifically designed for sequence-to-sequence (seq2seq) tasks like machine translation.
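As a rough illustration of that encoder-decoder layout, here is a minimal sketch built on PyTorch's stock nn.Transformer module. The vocabulary size, model depth, and tensor shapes are illustrative assumptions rather than values from the article, and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

d_model, vocab_size = 512, 10000  # illustrative sizes, not from the article

embed = nn.Embedding(vocab_size, d_model)  # shared source/target embedding (an assumption)
model = nn.Transformer(
    d_model=d_model, nhead=8,
    num_encoder_layers=6, num_decoder_layers=6,  # depths used in "Attention Is All You Need"
    batch_first=True,
)
lm_head = nn.Linear(d_model, vocab_size)

src = torch.randint(0, vocab_size, (2, 17))  # source tokens, e.g. the sentence to translate
tgt = torch.randint(0, vocab_size, (2, 12))  # shifted target tokens fed to the decoder

# Causal mask: each decoder position may only attend to earlier target positions.
tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))

hidden = model(embed(src), embed(tgt), tgt_mask=tgt_mask)
logits = lm_head(hidden)  # (batch, tgt_len, vocab_size) next-token scores
```

A real translation model would add positional encodings and train with teacher forcing; the encoder-only and decoder-only variants covered in the later parts keep just one of the two stacks.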

This AI Paper Introduces Group Think: A Token-Level Multi-Agent Reasoning Paradigm for Faster and Collaborative LLM Inference

A prominent area of exploration involves enabling large language models (LLMs) to function collaboratively. Multi-agent systems powered by LLMs are now being examined for their potential to coordinate on challenging problems by splitting tasks and working simultaneously. This direction has gained attention for its potential to increase efficiency and reduce latency in real-time applications.

A common issue in collaborative LLM systems is the agents' sequential, turn-based communication. In such systems, each agent must wait for others to complete their reasoning steps before proceeding. This slows down processing, especially in situations demanding rapid responses. Moreover, agents often duplicate effort or generate inconsistent outputs, because they cannot see the evolving thoughts of their peers during generation. This latency and redundancy reduce the practicality of deploying multi-agent LLMs, particularly where time and computation are constrained, such as on edge devices.

Most current solutions rely on sequential or independently parallel sampling techniques to improve reasoning. Methods like Chain-of-Thought prompting help models solve problems in a structured way but often come with increased inference time. Approaches such as Tree-of-Thoughts and Graph-of-Thoughts expand on this by branching reasoning paths. However, these approaches still do not allow for real-time mutual adaptation among agents. Multi-agent setups have explored collaborative methods, but mostly through alternating message exchanges, which again introduces delays. Some advanced systems propose complex dynamic scheduling or role-based configurations, but these are not optimized for efficient inference.

Research from MediaTek Research introduced a new method called Group Think. This approach enables multiple reasoning agents within a single LLM to operate concurrently, observing each other's partial outputs at the token level. Each reasoning thread adapts to the evolving thoughts of the others mid-generation. This mechanism reduces duplication and enables agents to shift direction if another thread is better positioned to continue a specific line of reasoning. Group Think is implemented through a token-level attention mechanism that lets each agent attend to the tokens previously generated by all agents, supporting real-time collaboration.

The method works by assigning each agent its own sequence of token indices, allowing the agents' outputs to be interleaved in memory. These interleaved tokens are stored in a shared cache accessible to all agents during generation. This design allows efficient attention across reasoning threads without architectural changes to the transformer model. The implementation works both on personal devices and in data centers: on local devices, it makes effective use of idle compute by batching multiple agent outputs, even with a batch size of one; in data centers, Group Think allows multiple requests to be processed together, interleaving tokens across agents while maintaining correct attention dynamics.

Performance tests demonstrate that Group Think significantly improves latency and output quality. In enumeration tasks, such as listing 100 distinct names, it achieved near-complete results more rapidly than conventional Chain-of-Thought approaches. The acceleration was proportional to the number of thinkers; for example, four thinkers reduced latency by a factor of about four.
In divide-and-conquer problems, using the Floyd–Warshall algorithm on a graph of five nodes, four thinkers halved the completion time relative to a single agent. In programming tasks, Group Think solved code generation challenges more effectively than baseline models; with four or more thinkers, it produced correct code segments much faster than traditional reasoning models.

This research shows that existing LLMs, though not explicitly trained for collaboration, can already exhibit emergent group reasoning behaviors under the Group Think setup. In experiments, agents naturally diversified their work to avoid redundancy, often dividing tasks by topic or focus area. These findings suggest that Group Think's efficiency and sophistication could be enhanced further with dedicated training on collaborative data.
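The token-level attention rule lends itself to a small sketch. The mask builder below is my own illustration, not the paper's code: it assumes the interleaved layout places agent a's token for step t at flat position t * num_agents + a, and that same-step tokens from other agents are not yet visible because they are generated in parallel. Both are assumptions about one plausible implementation.

```python
import torch

def group_think_mask(num_steps: int, num_agents: int) -> torch.Tensor:
    """Boolean attention mask over the interleaved sequence (True = may attend).

    A token an agent emits at step t can attend to its own tokens up through
    step t and to every other agent's tokens from steps strictly before t.
    """
    pos = torch.arange(num_steps * num_agents)
    step, agent = pos // num_agents, pos % num_agents
    earlier_step = step[None, :] < step[:, None]   # any agent's strictly earlier steps
    own_past = (agent[:, None] == agent[None, :]) & (step[None, :] <= step[:, None])
    return earlier_step | own_past                 # rows = queries, cols = keys

# Two agents, four reasoning steps -> an 8x8 mask over the shared cache.
mask = group_think_mask(num_steps=4, num_agents=2)
```

Because the rule is expressed purely as a mask over a shared token cache, it is consistent with the article's claim that no architectural change to the transformer is needed.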

The FDA plans to limit access to covid vaccines. Here’s why that’s not all bad.

This week, two new leaders at the US Food and Drug Administration announced plans to limit access to covid vaccines, arguing that there is not much evidence to support the value of annual shots in healthy people. New vaccines will be made available only to the people who are most vulnerable—namely, those over 65 and others with conditions that make them more susceptible to severe disease. Anyone else will have to wait: covid vaccines will soon be required to go through more rigorous trials to ensure that they really are beneficial for people who aren't at high risk.

The plans have been met with fear and anger in some quarters, but they weren't all that shocking to me. In the UK, where I live, covid boosters have been offered only to vulnerable groups for a while now. And the immunologists I spoke to agree: the plans make sense.

Still, they are controversial. Covid hasn't gone away. And while most people are thought to have some level of immunity to the virus, some of us still stand to get very sick if infected. The threat of long covid lingers, too. Given that people respond differently to both the virus and the vaccine, perhaps individuals should be able to choose whether they get a vaccine or not.

I should start by saying that covid vaccines have been a remarkable success story. The drugs were developed at record-breaking speed—they were given to people in clinical trials just 69 days after the virus had been identified. They are, on the whole, very safe. And they work remarkably well. They have saved millions of lives, and they rescued many of us from lockdowns.

But while many of us have benefited hugely from covid vaccinations in the past, there are questions over how useful continuing annual booster doses might be. That's the argument being made by FDA head Marty Makary and Vinay Prasad, director of the agency's Center for Biologics Evaluation and Research.

Both men have been critical of the FDA in the past. Makary has long been accused of downplaying the benefits of covid vaccines. He made incorrect assumptions about the coronavirus responsible for covid-19 and predicted that the disease would be "mostly gone" by April 2021. Most recently, he testified in Congress that the theory that the virus came from a lab in China was a "no-brainer." (The strongest evidence suggests the virus jumped from animals to humans in a market in Wuhan.) Prasad has said "the FDA is a failure" and has called annual covid boosters "a public health disaster the likes of which we've never seen before," because of a perceived lack of clinical evidence to support their use.

Makary and Prasad's plans, which were outlined in the New England Journal of Medicine on Tuesday, don't include such inflammatory language or unfounded claims, thankfully. In fact, they seem pretty measured: annual covid booster shots will continue to be approved for vulnerable people but will have to be shown to benefit others before people outside the approved groups can access them. There are still concerns being raised, though. Let's address a few of the biggest ones.

Shouldn't I get an annual covid booster alongside my flu vaccine?

At the moment, a lot of people in the US opt to get a covid vaccination around the time they get their annual flu jab. Each year, a flu vaccine is developed to protect against what scientists predict will be the dominant strain of virus circulating come flu season, which tends to run from October through March.
But covid doesn't seem to stick to the same seasonal patterns, says Susanna Dunachie, a clinical doctor and professor of infectious diseases at the University of Oxford in the UK. "We seem to be getting waves of covid year-round," she says.

And an annual shot might not offer the best protection against covid anyway, says Fikadu Tafesse, an immunologist and virologist at Oregon Health & Science University in Portland. His own research suggests that leaving more than a year between booster doses could enhance their effectiveness. "One year is really a random time," he says. It might be better to wait five or 10 years between doses instead, he adds. "If you are at risk [of a serious covid infection] you may actually need [a dose] every six months," says Tafesse. "But for healthy individuals, it's a very different conversation."

What about children—shouldn't we be protecting them?

There are reports that pediatricians are concerned about the impact on children, some of whom can develop serious cases of covid. "If we have safe and effective vaccines that prevent illness, we think they should be available," James Campbell, vice chair of the committee on infectious diseases at the American Academy of Pediatrics, told STAT.

This question has been on my mind for a while. My two young children, who were born in the UK, have never been eligible for a covid vaccine in this country. I found this incredibly distressing when the virus started tearing through child-care centers—especially given that at the time, the US was vaccinating babies from the age of six months.

My kids were eventually offered a vaccine in the US, when we temporarily moved there a couple of years ago. But by that point, the equation had changed. They'd both had covid by then, and I had a better idea of the general risks of the virus to children. I turned it down.

I was relieved to hear that Tafesse had made the same decision for his own children. "There are always exceptions, but in general, [covid] is not severe in kids," he says. The UK's Joint Committee on Vaccination and Immunisation found that the benefits of vaccination are much smaller for children than they are for adults. "Of course there are children with health problems who should definitely have it," says Dunachie. "But for healthy children in healthy households, the benefits probably are quite marginal."

Shouldn't healthy people get vaccinated to help protect more vulnerable people?

Researchers from the National University of Singapore Introduce ‘Thinkless,’ an Adaptive Framework that Reduces Unnecessary Reasoning by up to 90% Using DeGRPO

The effectiveness of language models relies on their ability to simulate human-like step-by-step deduction. However, these reasoning sequences are resource-intensive and can be wasteful for simple questions that do not require elaborate computation. This lack of awareness regarding the complexity of the task is one of the core challenges for these models: they often default to detailed reasoning even for queries that could be answered directly. Such an approach increases token usage, response latency, and memory consumption. As a result, there is a pressing need to equip language models with a mechanism that allows them to decide autonomously whether to think deeply or respond succinctly.

Current tools attempting to solve this issue rely either on manually set heuristics or on prompt engineering to switch between short and long responses. Some methods use separate models and route questions based on complexity estimates, but these external routing systems often lack insight into the target model's strengths and fail to make optimal decisions. Other techniques fine-tune models with prompt-based cues like "reasoning on/off," but these rely on static rules rather than dynamic understanding. Despite some improvements, these approaches fail to enable fully autonomous, context-sensitive control within a single model.

Researchers from the National University of Singapore introduced a new framework called Thinkless, which equips a language model with the ability to decide dynamically between short and long-form reasoning. The framework is built on reinforcement learning and introduces two special control tokens—<short> for concise answers and <think> for detailed responses. By incorporating a novel algorithm called Decoupled Group Relative Policy Optimization (DeGRPO), Thinkless separates the training focus between selecting the reasoning mode and improving the accuracy of the generated response. This design prevents the model from collapsing into one-dimensional behavior and enables adaptive reasoning tailored to each query.

The methodology involves two stages: warm-up distillation and reinforcement learning. In the distillation phase, Thinkless is trained on outputs from two expert models—one specializing in short responses and the other in detailed reasoning. This stage helps the model establish a firm link between each control token and its reasoning format. The reinforcement learning stage then fine-tunes the model's ability to decide which reasoning mode to use. DeGRPO decomposes the learning into two separate objectives: one for training the control token and another for refining the response tokens. This avoids the gradient imbalance of earlier approaches, in which longer responses would overpower the learning signal and cause a collapse in reasoning diversity. Thinkless ensures that both <short> and <think> tokens receive balanced updates, promoting stable learning across response types.

When evaluated, Thinkless significantly reduced long-form reasoning while preserving high accuracy. On the Minerva Algebra benchmark, the model used the <think> token in only 25.88% of cases while achieving 94.59% accuracy. In contrast, conventional reasoning models had to use extended chains of thought much more frequently. On the AIME 2024 dataset, Thinkless reached 27.33% accuracy with 100% usage of the reasoning mode, showing that it could maintain performance when full reasoning was necessary.
On the GSM8K dataset, it used <think> only 13.31% of the time, yet still achieved 84.18% accuracy. These results reflect the model's ability to handle simple and complex queries with the appropriate reasoning depth, cutting unnecessary token generation by as much as 90% on some tasks.

Overall, this study presents a compelling solution to the inefficiency of uniform reasoning in large language models. By introducing a mechanism that lets a model judge task complexity and adjust its inference strategy accordingly, Thinkless optimizes both accuracy and efficiency. The method balances depth of reasoning against response precision without relying on fixed rules, offering a data-driven approach to more intelligent language model behavior.
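The decoupling is easiest to see in a short sketch. The loss below is my own schematic of the idea described above, not the paper's DeGRPO implementation: the single control-token log-probability gets its own weighted term, while the response-token log-probabilities are length-normalized so that long answers cannot drown out the mode-selection gradient. The weighting and the normalization scheme are assumptions.

```python
import torch

def degrpo_style_loss(mode_logp: torch.Tensor,
                      resp_logp: torch.Tensor,
                      advantage: torch.Tensor,
                      mode_weight: float = 1.0) -> torch.Tensor:
    """Schematic decoupled policy-gradient loss (illustrative, not the paper's).

    mode_logp: (B,)   log-prob of the sampled control token (<short> or <think>)
    resp_logp: (B, T) per-token log-probs of the response, 0.0 at padded positions
    advantage: (B,)   group-relative advantage of each sampled rollout
    """
    # Mode term: exactly one token per rollout, weighted on its own so the
    # routing decision gets a gradient signal independent of answer length.
    mode_term = mode_weight * advantage * mode_logp
    # Response term: length-normalized so verbose rollouts don't dominate.
    lengths = (resp_logp != 0).sum(dim=1).clamp(min=1)
    resp_term = advantage * resp_logp.sum(dim=1) / lengths
    return -(mode_term + resp_term).mean()  # minimize the negative objective

# Illustrative call with dummy values for a group of four rollouts.
loss = degrpo_style_loss(torch.randn(4), torch.randn(4, 16), torch.randn(4))
```

Without the split, a single sequence-level loss would let a long <think> rollout contribute hundreds of gradient terms against the control token's one, which is exactly the imbalance the article says DeGRPO is designed to avoid.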
