ByteDance Researchers Introduce DetailFlow: A 1D Coarse-to-Fine Autoregressive Framework for Faster, Token-Efficient Image Generation

Autoregressive image generation has been shaped by advances in sequential modeling, originally seen in natural language processing. This field focuses on generating images one token at a time, similar to how sentences are constructed in language models. The appeal of this approach lies in its ability to maintain structural coherence across the image while allowing for high levels of control during the generation process. As researchers began to apply these techniques to visual data, they found that structured prediction not only preserved spatial integrity but also supported tasks like image manipulation and multimodal translation effectively.

Despite these benefits, generating high-resolution images remains computationally expensive and slow. A primary issue is the number of tokens needed to represent complex visuals. Raster-scan methods that flatten 2D images into linear sequences require thousands of tokens for detailed images, resulting in long inference times and high memory consumption. Models like Infinity need over 10,000 tokens for a 1024×1024 image. This becomes unsustainable for real-time applications or when scaling to larger datasets. Reducing the token burden while preserving or improving output quality has become a pressing challenge.

Efforts to mitigate token inflation have led to innovations like next-scale prediction, seen in VAR and FlexVAR. These models create images by predicting progressively finer scales, which imitates the human tendency to sketch rough outlines before adding detail. However, they still rely on hundreds of tokens—680 in the case of VAR and FlexVAR for 256×256 images. Moreover, approaches like TiTok and FlexTok use 1D tokenization to compress spatial redundancy, but they often fail to scale efficiently. For example, FlexTok's gFID increases from 1.9 at 32 tokens to 2.5 at 256 tokens, highlighting a degradation in output quality as the token count grows.

Researchers from ByteDance introduced DetailFlow, a 1D autoregressive image generation framework. This method arranges token sequences from global to fine detail using a process called next-detail prediction. Unlike traditional 2D raster-scan or scale-based techniques, DetailFlow employs a 1D tokenizer trained on progressively degraded images. This design allows the model to prioritize foundational image structures before refining visual details. By mapping tokens directly to resolution levels, DetailFlow significantly reduces token requirements, enabling images to be generated in a semantically ordered, coarse-to-fine manner.

The mechanism in DetailFlow centers on a 1D latent space where each token contributes incrementally more detail. Earlier tokens encode global features, while later tokens refine specific visual aspects. To train this, the researchers created a resolution mapping function that links token count to target resolution. During training, the model is exposed to images of varying quality levels and learns to predict progressively higher-resolution outputs as more tokens are introduced. It also implements parallel token prediction by grouping sequences and predicting entire sets at once. Since parallel prediction can introduce sampling errors, a self-correction mechanism was integrated. This system perturbs certain tokens during training and teaches subsequent tokens to compensate, ensuring that final images maintain structural and visual integrity.
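To make the coarse-to-fine idea more concrete, the sketch below shows one way a token count could be mapped to a target decoding resolution and grouped for parallel prediction. The function names, the doubling schedule, and the group size are illustrative assumptions for this note, not details taken from the DetailFlow paper.

```python
# Illustrative sketch only: a possible token-count -> resolution schedule.
# The growth rule, defaults, and group size are assumptions, not the paper's.

def resolution_for_tokens(num_tokens: int,
                          base_resolution: int = 16,
                          max_resolution: int = 256,
                          tokens_per_doubling: int = 32) -> int:
    """Map a 1D token count to a target decoding resolution.

    Early tokens buy coarse global structure; each additional block of
    tokens doubles the resolution until the target size is reached.
    """
    doublings = num_tokens // tokens_per_doubling
    return min(base_resolution * (2 ** doublings), max_resolution)


def generation_schedule(total_tokens: int, group_size: int = 16):
    """Yield (tokens emitted so far, target resolution) after each
    parallel-predicted group of tokens."""
    emitted = 0
    while emitted < total_tokens:
        emitted = min(emitted + group_size, total_tokens)
        yield emitted, resolution_for_tokens(emitted)


if __name__ == "__main__":
    # With a 128-token budget, the decoder is asked for coarse images first
    # and for full-resolution detail only near the end of the sequence.
    for step, (n, res) in enumerate(generation_schedule(128)):
        print(f"group {step}: {n:3d} tokens -> decode at {res}x{res}")
```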
The results from the experiments on the ImageNet 256×256 benchmark were noteworthy. DetailFlow achieved a gFID score of 2.96 using only 128 tokens, outperforming VAR at 3.3 and FlexVAR at 3.05, both of which used 680 tokens. Even more impressive, DetailFlow-64 reached a gFID of 2.62 using 512 tokens. In terms of speed, it delivered nearly double the inference rate of VAR and FlexVAR. A further ablation study confirmed that the self-correction training and semantic ordering of tokens substantially improved output quality. For example, enabling self-correction dropped the gFID from 4.11 to 3.68 in one setting. These metrics demonstrate both higher quality and faster generation compared to established models.

By focusing on semantic structure and reducing redundancy, DetailFlow presents a viable solution to long-standing issues in autoregressive image generation. The method's coarse-to-fine approach, efficient parallel decoding, and ability to self-correct highlight how architectural innovations can address performance and scalability limitations. Through their structured use of 1D tokens, the researchers from ByteDance have demonstrated a model that maintains high image fidelity while significantly reducing computational load, making it a valuable addition to image synthesis research.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

Google AI Introduces Multi-Agent System Search MASS: A New AI Agent Optimization Framework for Better Prompts and Topologies

Multi-agent systems are becoming a critical development in artificial intelligence due to their ability to coordinate multiple large language models (LLMs) to solve complex problems. Instead of relying on a single model's perspective, these systems distribute roles among agents, each contributing a unique function. This division of labor enhances the system's ability to analyze, respond, and act in more robust ways. Whether applied to code debugging, data analysis, retrieval-augmented generation, or interactive decision-making, LLM-driven agents are achieving results that single models cannot consistently match. The power of these systems lies in their design, particularly the configuration of inter-agent connections, known as topologies, and the specific instructions given to each agent, referred to as prompts. As this model of computation matures, the challenge has shifted from proving feasibility to optimizing architecture and behavior for superior results.

One significant problem lies in the difficulty of designing these systems efficiently. When prompts, those structured inputs that guide each agent's role, are slightly altered, performance can swing dramatically. This sensitivity makes scalability risky, especially when agents are linked together in workflows where one's output serves as another's input. Errors can propagate or even amplify. Moreover, topological decisions, such as determining the number of agents involved, their interaction style, and task sequence, are still heavily reliant on manual configuration and trial-and-error. The design space is vast and nonlinear, as it combines numerous options for both prompt engineering and topology construction. Optimizing both simultaneously has been largely out of reach for traditional design methods.

Several efforts have been made to improve various aspects of this design problem, but gaps remain. Methods like DSPy automate exemplar generation for prompts, while others focus on increasing the number of agents participating in tasks like voting. Tools like ADAS introduce code-based topological configurations through meta-agents. Some frameworks, such as AFlow, apply techniques like Monte Carlo Tree Search to explore combinations more efficiently. Yet these solutions generally concentrate on either prompt or topology optimization, rather than both. This lack of integration limits their ability to generate MAS designs that are both intelligent and robust under complex operational conditions.

Researchers at Google and the University of Cambridge introduced a new framework named Multi-Agent System Search (Mass). This method automates MAS design by interleaving the optimization of both prompts and topologies in a staged approach. Unlike earlier attempts that treated the two components independently, Mass begins by identifying which elements, both prompts and topological structures, are most likely to influence performance. By narrowing the search to this influential subspace, the framework operates more efficiently while delivering higher-quality outcomes. The method progresses in three phases: localized prompt optimization, selection of effective workflow topologies based on the optimized prompts, and then global optimization of prompts at the system-wide level. The framework not only reduces computational overhead but also removes the burden of manual tuning from researchers.

The technical implementation of Mass is structured and methodical. First, each building block of a MAS undergoes prompt refinement. These blocks are agent modules with specific responsibilities, such as aggregation, reflection, or debate. For example, prompt optimizers generate variations that include both instructional guidance (e.g., "think step by step") and example-based learning (e.g., one-shot or few-shot demos). The optimizer evaluates these using a validation metric to guide improvements. Once each agent's prompt is optimized locally, the system proceeds to explore valid combinations of agents to form topologies. This topology optimization is informed by earlier results and constrained to a pruned search space identified as most influential. Finally, the best topology undergoes global-level prompt tuning, where instructions are fine-tuned in the context of the entire workflow to maximize collective efficiency.
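As a rough illustration of that staged search, the sketch below optimizes each block's prompt locally, then selects a topology from a small pruned set with those prompts fixed, and finally re-tunes prompts for the chosen workflow. The candidate prompts, topology list, and scoring function are placeholders invented for this example; they are not the actual prompts, search space, or evaluation code used in Mass.

```python
import itertools
import random

random.seed(0)

# Placeholder building blocks and prompt candidates (not from the paper).
BLOCKS = ["aggregate", "reflect", "debate"]
PROMPT_CANDIDATES = {
    b: [f"{b}: think step by step", f"{b}: answer using a one-shot demo"]
    for b in BLOCKS
}
# A pruned list of candidate topologies (ordered tuples of blocks).
TOPOLOGIES = [("debate",), ("aggregate", "debate"), ("reflect", "aggregate")]


def validation_score(prompts: dict, topology: tuple) -> float:
    """Stand-in for running the multi-agent system on a validation split."""
    return random.random() + 0.1 * len(topology)


# Stage 1: block-level prompt optimization, one block at a time.
best_prompts = {
    block: max(PROMPT_CANDIDATES[block],
               key=lambda p: validation_score({block: p}, (block,)))
    for block in BLOCKS
}

# Stage 2: topology selection over the pruned space, reusing local prompts.
best_topology = max(TOPOLOGIES,
                    key=lambda t: validation_score(best_prompts, t))

# Stage 3: workflow-level prompt tuning for the chosen topology.
combos = itertools.product(*(PROMPT_CANDIDATES[b] for b in best_topology))
best_workflow_prompts = max(
    (dict(zip(best_topology, combo)) for combo in combos),
    key=lambda ps: validation_score(ps, best_topology),
)

print("selected topology:", best_topology)
print("selected prompts:", best_workflow_prompts)
```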
In tasks such as reasoning, multi-hop understanding, and code generation, the optimized MAS consistently surpassed existing benchmarks. In performance testing using Gemini 1.5 Pro on the MATH dataset, prompt-optimized agents showed an average accuracy of around 84% with enhanced prompting techniques, compared to 76–80% for agents scaled through self-consistency or multi-agent debate. In the HotpotQA benchmark, using the debate topology within Mass yielded a 3% improvement. In contrast, other topologies, such as reflect or summarize, failed to yield gains or even led to a 15% degradation. On LiveCodeBench, the Executor topology provided a +6% boost, but methods like reflection again saw negative results. These findings validate that only a fraction of the topological design space contributes positively and reinforce the need for targeted optimization, such as that used in Mass.

Several key takeaways from the research include:

• MAS design complexity is significantly influenced by prompt sensitivity and topological arrangement.
• Prompt optimization, both at the block and system level, is more effective than agent scaling alone, as evidenced by the 84% accuracy with enhanced prompts versus 76% with self-consistency scaling.
• Not all topologies are beneficial; debate added +3% in HotpotQA, while reflection caused a drop of up to -15%.
• The Mass framework integrates prompt and topology optimization in three phases, drastically reducing the computational and design burden.
• Topologies like debate and executor are effective, while others, such as reflect and summarize, can degrade system performance.
• Mass avoids full search complexity by pruning the design space based on early influence analysis, improving performance while saving resources.
• The approach is modular and supports plug-and-play agent configurations, making it adaptable to various domains and tasks.
• Final MAS models from Mass outperform state-of-the-art baselines across multiple benchmarks like MATH, HotpotQA, and LiveCodeBench.

In conclusion, this research identifies prompt sensitivity and topology complexity as major bottlenecks in multi-agent system (MAS) development and proposes a structured solution that strategically optimizes both areas. The Mass framework demonstrates a scalable, efficient approach to MAS design, minimizing the need for human input while maximizing performance. The research presents compelling evidence that better prompt design is more effective than merely adding agents and that targeted search within influential topology subsets leads to meaningful gains in real-world tasks.

Check out the Paper. All credit for this research goes to the researchers of this project.

Why doctors should look for ways to prescribe hope

This week, I've been thinking about the powerful connection between mind and body. Some new research suggests that people with heart conditions have better outcomes when they are more hopeful and optimistic. Hopelessness, on the other hand, is associated with a significantly higher risk of death.

The findings build upon decades of fascinating research into the phenomenon of the placebo effect. Our beliefs and expectations about a medicine (or a sham treatment) can change the way it works. The placebo effect's "evil twin," the nocebo effect, is just as powerful—negative thinking has been linked to real symptoms. Researchers are still trying to understand the connection between body and mind, and how our thoughts can influence our physiology. In the meantime, many are developing ways to harness it in hospital settings. Is it possible for a doctor to prescribe hope?

Alexander Montasem, a lecturer in psychology at the University of Liverpool, is trying to find an answer to that question. In his latest study, Montasem and his colleagues focused on people with cardiovascular disease. The team reviewed all published research into the link between hope and heart health outcomes in such individuals. Hope is a pretty tricky thing to nail down, but these studies use questionnaires to try to do that. In one popular questionnaire, hope is defined as "a positive motivational state" based on having agency and plans to meet personal goals.

Montasem's team found 12 studies that fit the bill. All told, these studies included over 5,000 people. And together, they found that high hopefulness was associated with better health outcomes: less angina, less post-stroke fatigue, a higher quality of life, and a lower risk of death. The team presented its work at the British Cardiovascular Society meeting in Manchester earlier this week.

When I read the results, it immediately got me thinking about the placebo effect. A placebo is a "sham" treatment—an inert substance like a sugar pill or saline injection that does not contain any medicine. And yet hundreds of studies have shown that such treatments can have remarkable effects. They can ease the symptoms of pain, migraine, Parkinson's disease, depression, anxiety, and a host of other disorders. The way a placebo is delivered can influence its effectiveness, and so can its color, shape, and price. Expensive placebos seem to be more effective. And placebos can even work when people know they are just placebos.

And then there's the nocebo effect. If you expect to feel worse after taking something, you are much more likely to. The nocebo effect can increase the risk of pain, gastrointestinal symptoms, flu-like symptoms, and more.

It's obvious our thoughts and beliefs can play an enormous role in our health and well-being. What's less clear is exactly how it happens. Scientists have made some progress—there's evidence that a range of brain chemicals, including the body's own opioids, are involved in both the placebo and nocebo effects. But the exact mechanisms remain something of a mystery.

In the meantime, researchers are working on ways to harness the power of positive thinking. There have been long-running debates over whether it is ever ethical for a doctor to deceive patients to make them feel better. But I'm firmly of the belief that doctors have a duty to be honest with their patients. A more ethical approach might be to find ways to build patients' hope, says Montasem.
Not by exaggerating the likely benefit of a drug or by sugar-coating a prognosis, but perhaps by helping them work on their goals, agency, and general outlook on life. Some early research suggests that this approach can help. Laurie McLouth at the University of Kentucky and her colleagues found that a series of discussions about values, goals, and strategies to achieve those goals improved hope among people being treated for advanced lung cancer.

Montasem now plans to review all the published work in this area and design a new approach to increasing hope. Any approach might have to be tailored to an individual, he adds. Some people might be more responsive to a more spiritual or religious way of thinking about their lives, for example.

These approaches could also be helpful for all of us, even outside clinical settings. I asked Montasem if he had any advice for people who want to have a positive outlook on life more generally. He told me that it's important to have personal goals, along with a plan to achieve them. His own goals center on advancing his research, helping patients, and spending time with his family. "Materialistic goals aren't as beneficial for your wellbeing," he adds.

Since we spoke, I've been thinking over my own goals. I've realized that my first goal is to come up with a list of goals. And I plan to do it soon. "The minute we give up [on pursuing] our goals, we start falling into hopelessness," he says.

This article first appeared in The Checkup, MIT Technology Review's weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

The Download: China’s AI agent boom, and GPS alternatives

This is today's edition of The Download, our weekday newsletter that provides a daily dose of what's going on in the world of technology.

Manus has kick-started an AI agent boom in China

Last year, China saw a boom in foundation models, the do-everything large language models that underpin the AI revolution. This year, the focus has shifted to AI agents—systems that are less about responding to users' queries and more about autonomously accomplishing things for them.

There are now a host of Chinese startups building these general-purpose digital tools, which can answer emails, browse the internet to plan vacations, and even design an interactive website. Many of these have emerged in just the last two months, following in the footsteps of Manus—a general AI agent that sparked weeks of social media frenzy for invite codes after its limited-release launch in early March.

As the race to define what a useful AI agent looks like unfolds, a mix of ambitious startups and entrenched tech giants are now testing how these tools might actually work in practice—and for whom. Read the full story.

—Caiwei Chen

Inside the race to find GPS alternatives

Later this month, an inconspicuous 150-kilogram satellite is set to launch into space aboard the SpaceX Transporter 14 mission. Once in orbit, it will test super-accurate next-generation satnav technology designed to make up for the shortcomings of the US Global Positioning System (GPS).

Despite the system's indispensable nature, the GPS signal is easily suppressed or disrupted by everything from space weather to 5G cell towers to phone-size jammers worth a few tens of dollars. The problem has been whispered about among experts for years, but it has really come to the fore in the last three years, since Russia invaded Ukraine. Now, startup Xona Space Systems wants to create a space-based system that would do what GPS does but better. Read the full story.

—Tereza Pultarova

Why doctors should look for ways to prescribe hope

—Jessica Hamzelou

This week, I've been thinking about the powerful connection between mind and body. Some new research suggests that people with heart conditions have better outcomes when they are more hopeful and optimistic. Hopelessness, on the other hand, is associated with a significantly higher risk of death.

The findings build upon decades of fascinating research into the phenomenon of the placebo effect. Our beliefs and expectations about a medicine (or a sham treatment) can change the way it works. The placebo effect's "evil twin," the nocebo effect, is just as powerful—negative thinking has been linked to real symptoms.

Researchers are still trying to understand the connection between body and mind, and how our thoughts can influence our physiology. In the meantime, many are developing ways to harness it in hospital settings. Is it possible for a doctor to prescribe hope? Read the full story.

This article first appeared in The Checkup, MIT Technology Review's weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

The must-reads

I've combed the internet to find you today's most fun/important/scary/fascinating stories about technology.

1 Elon Musk threatened to cut off NASA's use of SpaceX's Dragon spacecraft
His war of words with Donald Trump is dramatically escalating. (WP $)
+ If Musk actually carried through with his threat, NASA would seriously struggle. (NYT $)
+ Silicon Valley is starting to pick sides. (Wired $)
+ It appears as though Musk has more to lose from their bruising breakup. (NY Mag $)

2 Apple and Alibaba's AI rollout in China has been delayed
It's the latest victim of Trump's trade war. (FT $)
+ The deal is supposed to support iPhones' AI offerings in the country. (Reuters)

3 X's new policy blocks the use of its posts to 'fine-tune or train' AI models
Unless companies strike a deal with them, that is. (TechCrunch)
+ The platform could end up striking agreements like Reddit and Google. (The Verge)

4 RFK Jr's new hire is hunting for proof that vaccines cause autism
Vaccine skeptic David Geier is seeking access to a database he was previously barred from. (WSJ $)
+ How measuring vaccine hesitancy could help health professionals tackle it. (MIT Technology Review)

5 Anthropic has launched a new service for the military
Claude Gov is designed specifically for US defense and intelligence agencies. (The Verge)
+ Generative AI is learning to spy for the US military. (MIT Technology Review)

6 There's no guarantee your billion-dollar startup won't fail
In fact, one in five of them will. (Bloomberg $)
+ Beware the rise of the AI coding startup. (Reuters)

7 Walmart's drone deliveries are taking off
It's expanding to 100 new US stores in the next year. (Wired $)

8 AI might be able to tell us how old the Dead Sea Scrolls really are
Models suggest they're even older than we previously thought. (The Economist $)
+ How AI is helping historians better understand our past. (MIT Technology Review)

9 All-in-one super apps are a hit in the Gulf
They're following in China's footsteps. (Rest of World)

10 Nintendo's Switch 2 has revived the midnight launch event
Fans queued for hours outside stores to get their hands on the new console. (Insider $)
+ How the company managed to dodge Trump's tariffs. (The Guardian)

Quote of the day

"Elon finally found a way to make Twitter fun again."

—Dan Pfeiffer, a host of the political podcast Pod Save America, jokes about Elon Musk and Donald Trump's ongoing feud in a post on X.

One more thing

This rare earth metal shows us the future of our planet's resources

We're in the middle of a potentially transformative moment. Metals discovered barely a century ago now underpin the technologies we're relying on for cleaner energy, and not having enough of them could slow progress.

Take neodymium, one of the rare earth metals. It's used in cryogenic coolers to reach ultra-low temperatures needed for devices like superconductors and in high-powered magnets that power everything from smartphones to wind turbines. And very soon, demand for it could outstrip supply. What happens then? And what

Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing

arXiv:2410.12872v2 Announce Type: replace Abstract: Large Language Models (LLMs) have recently emerged as promising tools for knowledge tracing (KT) due to their strong reasoning and generalization abilities. While recent LLM-based KT methods have proposed new prompt formats, they struggle to represent the full interaction histories of example learners within a single prompt during in-context learning (ICL), resulting in limited scalability and high computational cost under token constraints. In this work, we present LLM-based Option-weighted Knowledge Tracing (LOKT), a simple yet effective framework that encodes the interaction histories of example learners in context as textual categorical option weights (TCOW). TCOW are semantic labels (e.g., "inadequate") assigned to the options selected by learners when answering questions, enhancing the interpretability of LLMs. Experiments on multiple-choice datasets show that LOKT outperforms existing non-LLM and LLM-based KT models in both cold-start and warm-start settings. Moreover, LOKT enables scalable and cost-efficient inference, achieving strong performance even under strict token constraints. Our code is available at https://anonymous.4open.science/r/LOKT_model-3233.
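Read purely from the abstract, the core idea is to replace numeric option weights with short semantic labels before they enter the prompt. The sketch below is a minimal illustration of that kind of encoding; the thresholds and all labels other than "inadequate" are assumptions made for this example rather than details from the paper.

```python
# Illustrative only: map per-option weights to textual categories and render
# an interaction history compactly for in-context learning.

OPTION_LABELS = ["inadequate", "partially correct", "mostly correct", "correct"]

def tcow_label(option_score: float) -> str:
    """Map a numeric option weight in [0, 1] to a textual category."""
    index = min(int(option_score * len(OPTION_LABELS)), len(OPTION_LABELS) - 1)
    return OPTION_LABELS[index]

def encode_history(history):
    """Render an interaction history as compact text for the prompt."""
    return "; ".join(f"Q{qid}: {tcow_label(score)}" for qid, score in history)

print(encode_history([(1, 0.2), (2, 0.7), (3, 1.0)]))
# -> "Q1: inadequate; Q2: mostly correct; Q3: correct"
```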

Inducing lexicons of in-group language with socio-temporal context

arXiv:2409.19257v3 Announce Type: replace Abstract: In-group language is an important signifier of group dynamics. This paper proposes a novel method for inducing lexicons of in-group language, which incorporates its socio-temporal context. Existing methods for lexicon induction do not capture the evolving nature of in-group language, nor the social structure of the community. Using dynamic word and user embeddings trained on conversations from online anti-women communities, our approach outperforms prior methods for lexicon induction. We develop a test set for the task of lexicon induction and a new lexicon of manosphere language, validated by human experts, which quantifies the relevance of each term to a specific sub-community at a given point in time. Finally, we present novel insights on in-group language which illustrate the utility of this approach.
