YouZum

AI, Committee, Noticias, Uncategorized

Why doctors should look for ways to prescribe hope

This week, I’ve been thinking about the powerful connection between mind and body. Some new research suggests that people with heart conditions have better outcomes when they are more hopeful and optimistic. Hopelessness, on the other hand, is associated with a significantly higher risk of death.

The findings build upon decades of fascinating research into the phenomenon of the placebo effect. Our beliefs and expectations about a medicine (or a sham treatment) can change the way it works. The placebo effect’s “evil twin,” the nocebo effect, is just as powerful—negative thinking has been linked to real symptoms.

Researchers are still trying to understand the connection between body and mind, and how our thoughts can influence our physiology. In the meantime, many are developing ways to harness it in hospital settings. Is it possible for a doctor to prescribe hope?

Alexander Montasem, a lecturer in psychology at the University of Liverpool, is trying to find an answer to that question. In his latest study, Montasem and his colleagues focused on people with cardiovascular disease. The team reviewed all published research into the link between hope and heart health outcomes in such individuals.

Hope is a pretty tricky thing to nail down, but these studies use questionnaires to try to do that. In one popular questionnaire, hope is defined as “a positive motivational state” based on having agency and plans to meet personal goals.

Montasem’s team found 12 studies that fit the bill. All told, these studies included over 5,000 people. And together, they found that high hopefulness was associated with better health outcomes: less angina, less post-stroke fatigue, a higher quality of life, and a lower risk of death. The team presented its work at the British Cardiovascular Society meeting in Manchester earlier this week.

When I read the results, it immediately got me thinking about the placebo effect. A placebo is a “sham” treatment—an inert substance like a sugar pill or saline injection that does not contain any medicine. And yet hundreds of studies have shown that such treatments can have remarkable effects. They can ease the symptoms of pain, migraine, Parkinson’s disease, depression, anxiety, and a host of other disorders.

The way a placebo is delivered can influence its effectiveness, and so can its color, shape, and price. Expensive placebos seem to be more effective. And placebos can even work when people know they are just placebos.

And then there’s the nocebo effect. If you expect to feel worse after taking something, you are much more likely to. The nocebo effect can increase the risk of pain, gastrointestinal symptoms, flu-like symptoms, and more.

It’s obvious our thoughts and beliefs can play an enormous role in our health and well-being. What’s less clear is exactly how it happens. Scientists have made some progress—there’s evidence that a range of brain chemicals, including the body’s own opioids, are involved in both the placebo and nocebo effects. But the exact mechanisms remain something of a mystery.

In the meantime, researchers are working on ways to harness the power of positive thinking. There have been long-running debates over whether it is ever ethical for a doctor to deceive patients to make them feel better. But I’m firmly of the belief that doctors have a duty to be honest with their patients. A more ethical approach might be to find ways to build patients’ hope, says Montasem. Not by exaggerating the likely benefit of a drug or by sugar-coating a prognosis, but perhaps by helping them work on their goals, agency, and general outlook on life.

Some early research suggests that this approach can help. Laurie McLouth at the University of Kentucky and her colleagues found that a series of discussions about values, goals, and strategies to achieve those goals improved hope among people being treated for advanced lung cancer.

Montasem now plans to review all the published work in this area and design a new approach to increasing hope. Any approach might have to be tailored to an individual, he adds. Some people might be more responsive to a more spiritual or religious way of thinking about their lives, for example.

These approaches could also be helpful for all of us, even outside clinical settings. I asked Montasem if he had any advice for people who want to have a positive outlook on life more generally. He told me that it’s important to have personal goals, along with a plan to achieve them. His own goals center on advancing his research, helping patients, and spending time with his family. “Materialistic goals aren’t as beneficial for your wellbeing,” he adds.

Since we spoke, I’ve been thinking over my own goals. I’ve realized that my first is to come up with a list of goals. And I plan to do it soon. “The minute we give up [on pursuing] our goals, we start falling into hopelessness,” he says.

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

AI, Committee, Noticias, Uncategorized

The Download: China’s AI agent boom, and GPS alternatives

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Manus has kick-started an AI agent boom in China

Last year, China saw a boom in foundation models, the do-everything large language models that underpin the AI revolution. This year, the focus has shifted to AI agents—systems that are less about responding to users’ queries and more about autonomously accomplishing things for them.

There are now a host of Chinese startups building these general-purpose digital tools, which can answer emails, browse the internet to plan vacations, and even design an interactive website. Many of these have emerged in just the last two months, following in the footsteps of Manus—a general AI agent that sparked weeks of social media frenzy for invite codes after its limited-release launch in early March.

As the race to define what a useful AI agent looks like unfolds, a mix of ambitious startups and entrenched tech giants are now testing how these tools might actually work in practice—and for whom. Read the full story.

—Caiwei Chen

Inside the race to find GPS alternatives

Later this month, an inconspicuous 150-kilogram satellite is set to launch into space aboard the SpaceX Transporter 14 mission. Once in orbit, it will test super-accurate next-generation satnav technology designed to make up for the shortcomings of the US Global Positioning System (GPS).

Despite the system’s indispensable nature, the GPS signal is easily suppressed or disrupted by everything from space weather to 5G cell towers to phone-size jammers worth a few tens of dollars. The problem has been whispered about among experts for years, but it has really come to the fore in the last three years, since Russia invaded Ukraine. Now, startup Xona Space Systems wants to create a space-based system that would do what GPS does but better. Read the full story.

—Tereza Pultarova

Why doctors should look for ways to prescribe hope

—Jessica Hamzelou

This week, I’ve been thinking about the powerful connection between mind and body. Some new research suggests that people with heart conditions have better outcomes when they are more hopeful and optimistic. Hopelessness, on the other hand, is associated with a significantly higher risk of death.

The findings build upon decades of fascinating research into the phenomenon of the placebo effect. Our beliefs and expectations about a medicine (or a sham treatment) can change the way it works. The placebo effect’s “evil twin,” the nocebo effect, is just as powerful—negative thinking has been linked to real symptoms.

Researchers are still trying to understand the connection between body and mind, and how our thoughts can influence our physiology. In the meantime, many are developing ways to harness it in hospital settings. Is it possible for a doctor to prescribe hope? Read the full story.

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Elon Musk threatened to cut off NASA’s use of SpaceX’s Dragon spacecraft
His war of words with Donald Trump is dramatically escalating. (WP $)
+ If Musk actually carried through with his threat, NASA would seriously struggle. (NYT $)
+ Silicon Valley is starting to pick sides. (Wired $)
+ It appears as though Musk has more to lose from their bruising breakup. (NY Mag $)

2 Apple and Alibaba’s AI rollout in China has been delayed
It’s the latest victim of Trump’s trade war. (FT $)
+ The deal is supposed to support iPhones’ AI offerings in the country. (Reuters)

3 X’s new policy blocks the use of its posts to ‘fine-tune or train’ AI models
Unless companies strike a deal with them, that is. (TechCrunch)
+ The platform could end up striking agreements like Reddit and Google. (The Verge)

4 RFK Jr’s new hire is hunting for proof that vaccines cause autism
Vaccine skeptic David Geier is seeking access to a database he was previously barred from. (WSJ $)
+ How measuring vaccine hesitancy could help health professionals tackle it. (MIT Technology Review)

5 Anthropic has launched a new service for the military
Claude Gov is designed specifically for US defense and intelligence agencies. (The Verge)
+ Generative AI is learning to spy for the US military. (MIT Technology Review)

6 There’s no guarantee your billion-dollar startup won’t fail
In fact, one in five of them will. (Bloomberg $)
+ Beware the rise of the AI coding startup. (Reuters)

7 Walmart’s drone deliveries are taking off
It’s expanding to 100 new US stores in the next year. (Wired $)

8 AI might be able to tell us how old the Dead Sea Scrolls really are
Models suggest they’re even older than we previously thought. (The Economist $)
+ How AI is helping historians better understand our past. (MIT Technology Review)

9 All-in-one super apps are a hit in the Gulf
They’re following in China’s footsteps. (Rest of World)

10 Nintendo’s Switch 2 has revived the midnight launch event
Fans queued for hours outside stores to get their hands on the new console. (Insider $)
+ How the company managed to dodge Trump’s tariffs. (The Guardian)

Quote of the day

“Elon finally found a way to make Twitter fun again.”

—Dan Pfeiffer, a host of the political podcast Pod Save America, jokes about Elon Musk and Donald Trump’s ongoing feud in a post on X.

One more thing

This rare earth metal shows us the future of our planet’s resources

We’re in the middle of a potentially transformative moment. Metals discovered barely a century ago now underpin the technologies we’re relying on for cleaner energy, and not having enough of them could slow progress.

Take neodymium, one of the rare earth metals. It’s used in cryogenic coolers to reach ultra-low temperatures needed for devices like superconductors and in high-powered magnets that power everything from smartphones to wind turbines. And very soon, demand for it could outstrip supply. What happens then? And what

AI, Committee, Noticias, Uncategorized

Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing

arXiv:2410.12872v2 Announce Type: replace Abstract: Large Language Models (LLMs) have recently emerged as promising tools for knowledge tracing (KT) due to their strong reasoning and generalization abilities. While recent LLM-based KT methods have proposed new prompt formats, they struggle to represent the full interaction histories of example learners within a single prompt during in-context learning (ICL), resulting in limited scalability and high computational cost under token constraints. In this work, we present LLM-based Option-weighted Knowledge Tracing (LOKT), a simple yet effective framework that encodes the interaction histories of example learners in context as textual categorical option weights (TCOW). TCOW are semantic labels (e.g., “inadequate”) assigned to the options selected by learners when answering questions, enhancing the interpretability of LLMs. Experiments on multiple-choice datasets show that LOKT outperforms existing non-LLM and LLM-based KT models in both cold-start and warm-start settings. Moreover, LOKT enables scalable and cost-efficient inference, achieving strong performance even under strict token constraints. Our code is available at https://anonymous.4open.science/r/LOKT_model-3233.
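The abstract describes TCOW only at a high level. As a rough illustration (not the authors’ released code; the labels, thresholds, toy data, and prompt wording below are assumptions), option selections might be mapped to semantic weight labels and packed into an in-context-learning prompt roughly like this:

```python
# Rough sketch of textual categorical option weights (TCOW) for LLM-based
# knowledge tracing. Thresholds, labels, prompt wording, and the toy data are
# illustrative assumptions, not the LOKT authors' implementation.

# Hypothetical per-option weights (e.g., how strongly each option indicates
# mastery), estimated offline from historical responses.
OPTION_SCORES = {
    ("q1", "A"): 0.05, ("q1", "B"): 0.35, ("q1", "C"): 0.90,
    ("q2", "A"): 0.80, ("q2", "B"): 0.10,
}

def tcow_label(score: float) -> str:
    """Map a numeric option weight to a coarse semantic label."""
    if score >= 0.75:
        return "adequate"
    if score >= 0.40:
        return "partial"
    return "inadequate"

def encode_history(history: list[tuple[str, str]]) -> str:
    """Render a learner's interaction history as compact labeled text."""
    lines = []
    for question_id, chosen_option in history:
        label = tcow_label(OPTION_SCORES.get((question_id, chosen_option), 0.0))
        lines.append(f"{question_id}: chose option {chosen_option} ({label})")
    return "\n".join(lines)

def build_prompt(history: list[tuple[str, str]], target_question: str) -> str:
    """Assemble an in-context prompt from a labeled interaction history."""
    return (
        "A learner answered earlier questions as follows:\n"
        f"{encode_history(history)}\n"
        f"Will the learner answer {target_question} correctly? Reply yes or no."
    )

print(build_prompt([("q1", "B"), ("q2", "A")], "q7"))
```

Because each past interaction collapses to one short labeled line, many example learners can fit into a single prompt, which is the token-efficiency argument the abstract makes.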

AI, Committee, Noticias, Uncategorized

Inducing lexicons of in-group language with socio-temporal context

arXiv:2409.19257v3 Announce Type: replace Abstract: In-group language is an important signifier of group dynamics. This paper proposes a novel method for inducing lexicons of in-group language, which incorporates its socio-temporal context. Existing methods for lexicon induction do not capture the evolving nature of in-group language, nor the social structure of the community. Using dynamic word and user embeddings trained on conversations from online anti-women communities, our approach outperforms prior methods for lexicon induction. We develop a test set for the task of lexicon induction and a new lexicon of manosphere language, validated by human experts, which quantifies the relevance of each term to a specific sub-community at a given point in time. Finally, we present novel insights on in-group language which illustrate the utility of this approach.
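The abstract does not detail the scoring procedure, but one way to picture the idea of combining dynamic embeddings with community structure is the toy sketch below (the data, dimensions, and centroid-based score are illustrative assumptions, not the paper’s trained models):

```python
import numpy as np

# Toy sketch: score candidate in-group terms for one sub-community at one time
# step by comparing dynamic word embeddings against a community centroid built
# from user embeddings. Data, shapes, and the scoring rule are assumptions.

rng = np.random.default_rng(0)
vocab = ["term_a", "term_b", "term_c"]
word_emb_t = {w: rng.normal(size=64) for w in vocab}  # word vectors at time t
user_emb_t = rng.normal(size=(50, 64))                # embeddings of active members at time t

community_centroid = user_emb_t.mean(axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Higher score = term more strongly associated with this community at this time.
scores = {w: cosine(v, community_centroid) for w, v in word_emb_t.items()}
ranked_lexicon_t = sorted(scores, key=scores.get, reverse=True)
print(ranked_lexicon_t)
```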

AI, Committee, Noticias, Uncategorized

Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science

arXiv:2506.04410v1 Announce Type: cross Abstract: Contemporary approaches to assisted scientific discovery use language models to automatically generate large numbers of potential hypotheses to test, while also automatically generating code-based experiments to test those hypotheses. While hypotheses can be comparatively inexpensive to generate, automated experiments can be costly, particularly when run at scale (i.e. thousands of experiments). Developing the capacity to filter hypotheses based on their feasibility would allow discovery systems to run at scale, while increasing their likelihood of making significant discoveries. In this work we introduce Matter-of-Fact, a challenge dataset for determining the feasibility of hypotheses framed as claims. Matter-of-Fact includes 8.4k claims extracted from scientific articles spanning four high-impact contemporary materials science topics, including superconductors, semiconductors, batteries, and aerospace materials, while including qualitative and quantitative claims from theoretical, experimental, and code/simulation results. We show that strong baselines that include retrieval augmented generation over scientific literature and code generation fail to exceed 72% performance on this task (chance performance is 50%), while domain-expert verification suggests nearly all are solvable — highlighting both the difficulty of this task for current models, and the potential to accelerate scientific discovery by making near-term progress.
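As a sketch of what a retrieval-augmented baseline of the kind mentioned above might look like (the retriever, LLM client, and prompt here are hypothetical placeholders, not the benchmark’s official baseline):

```python
# Hypothetical sketch of a retrieval-augmented feasibility check over literature.
# `search_literature` and `ask_llm` are placeholders standing in for a real
# retriever and LLM client; the prompt is an assumption, not the official baseline.

def search_literature(query: str, k: int = 5) -> list[str]:
    """Placeholder for a dense retriever over a corpus of materials-science papers."""
    return ["(retrieved passage 1)", "(retrieved passage 2)"][:k]

def ask_llm(prompt: str) -> str:
    """Placeholder for an LLM call; a real system would query an actual model."""
    return "FEASIBLE"

def verify_claim(claim: str) -> bool:
    """Binary feasibility decision for one claim, conditioned on retrieved evidence."""
    passages = search_literature(claim)
    context = "\n\n".join(passages)
    prompt = (
        "You are verifying materials-science claims against retrieved evidence.\n"
        f"Evidence:\n{context}\n\n"
        f"Claim: {claim}\n"
        "Is this claim feasible given the evidence? Answer FEASIBLE or INFEASIBLE."
    )
    return ask_llm(prompt).strip().upper().startswith("FEASIBLE")

print(verify_claim("Compound X remains superconducting above 300 K at ambient pressure."))
```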

AI, Committee, Noticias, Uncategorized

Teaching AI to Say ‘I Don’t Know’: A New Dataset Mitigates Hallucinations from Reinforcement Finetuning

Reinforcement finetuning uses reward signals to guide a large language model toward desirable behavior. This method sharpens the model’s ability to produce logical and structured outputs by reinforcing correct responses. Yet it remains a challenge to ensure that these models also know when not to respond—particularly when faced with incomplete or misleading questions that don’t have a definite answer.

The problem arises when language models, after reinforcement finetuning, begin to lose their ability to refuse to answer unclear or ambiguous queries. Instead of signaling uncertainty, the models tend to produce confidently stated but incorrect responses. This phenomenon, identified in the paper as the “hallucination tax,” highlights a growing risk: as models are trained to perform better, they may also become more likely to hallucinate answers in situations where silence would be more appropriate. This is especially hazardous in domains that require high trust and precision.

Tools currently used in training large language models often overlook the importance of refusal behavior. Reinforcement finetuning frameworks tend to reward only correct answers while penalizing incorrect ones, ignoring cases where a valid response should be no answer at all. The reward systems in use do not sufficiently reinforce refusal, resulting in overconfident models. For instance, the paper shows that refusal rates dropped to near zero across multiple models after standard RFT, demonstrating that current training fails to address hallucination properly.

Researchers from the University of Southern California developed the Synthetic Unanswerable Math (SUM) dataset. SUM introduces implicitly unanswerable math problems by modifying existing questions through criteria such as missing key information or creating logical inconsistencies. The researchers used DeepScaleR as the base dataset and employed the o3-mini model to generate high-quality unanswerable questions. This synthetic dataset aims to teach models to recognize when a problem lacks sufficient information and respond accordingly.

SUM’s core technique is to mix answerable and unanswerable problems during training. Questions are modified to become ambiguous or unsolvable while maintaining plausibility. The training prompts instruct models to say “I don’t know” for unanswerable inputs. By introducing only 10% of the SUM data into reinforcement finetuning, models begin to leverage inference-time reasoning to evaluate uncertainty. This structure allows them to refuse answers more appropriately without impairing their performance on solvable problems. (A minimal sketch of this data-mixing setup appears below.)

Performance analysis shows significant improvements. After training with SUM, the Qwen2.5-7B model increased its refusal rate from 0.01 to 0.73 on the SUM benchmark and from 0.01 to 0.81 on the UMWP benchmark. On the SelfAware dataset, refusal accuracy rose dramatically from 0.01 to 0.94. Llama-3.1-8B-Instruct showed a similar trend, with refusal rates improving from 0.00 to 0.75 on SUM and from 0.01 to 0.79 on UMWP. Despite these gains in refusal behavior, accuracy on answerable datasets, such as GSM8K and MATH-500, remained stable, with most changes ranging from 0.00 to -0.05. The minimal drop indicates that refusal training can be introduced without major sacrifices in task performance.

This study outlines a clear trade-off between improved reasoning and trustworthiness. Reinforcement finetuning, while powerful, tends to suppress cautious behavior.
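The article gives the recipe only in prose: mix roughly 10% implicitly unanswerable problems into the finetuning data and treat the literal refusal “I don’t know” as the target on those items. Here is a minimal sketch of that mixing step under those stated assumptions (field names, ratio handling, and the reward are hypothetical, not the USC team’s released code):

```python
import random

# Minimal sketch of the SUM-style data mixing described above: keep roughly 10%
# of the reinforcement-finetuning prompts unanswerable, with the literal refusal
# "I don't know" as the rewarded target on those items. Field names, the ratio
# handling, and the reward shaping are illustrative assumptions.

REFUSAL = "I don't know"

def build_mixed_training_set(answerable, unanswerable, unanswerable_ratio=0.10, seed=0):
    """answerable: list of {"question", "answer"} dicts; unanswerable: list of {"question"} dicts."""
    rng = random.Random(seed)
    # Number of unanswerable items needed so they make up `unanswerable_ratio` of the mix.
    n_unans = int(len(answerable) * unanswerable_ratio / (1 - unanswerable_ratio))
    sampled = rng.sample(unanswerable, min(n_unans, len(unanswerable)))

    examples = [{"prompt": ex["question"], "target": ex["answer"]} for ex in answerable]
    examples += [{"prompt": ex["question"], "target": REFUSAL} for ex in sampled]
    rng.shuffle(examples)
    return examples

def reward(model_answer: str, target: str) -> float:
    """Reward exact matches; refusal is rewarded only when the target is the refusal string."""
    return 1.0 if model_answer.strip() == target else 0.0

answerable = [{"question": "2 + 3 = ?", "answer": "5"}] * 9
unanswerable = [{"question": "x + 3 = ? (x is never specified)"}]
print(build_mixed_training_set(answerable, unanswerable))
```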
The SUM dataset corrects this trade-off by teaching models to recognize what they cannot solve. With only a small addition to training data, language models become better at identifying the boundaries of their knowledge. This approach marks a significant step in making AI systems not just smarter but also more careful and honest.

Check out the Paper and Dataset on Hugging Face. All credit for this research goes to the researchers of this project.

The post Teaching AI to Say ‘I Don’t Know’: A New Dataset Mitigates Hallucinations from Reinforcement Finetuning appeared first on MarkTechPost.

AI, Committee, Noticias, Uncategorized

Alibaba Qwen Team Releases Qwen3-Embedding and Qwen3-Reranker Series – Redefining Multilingual Embedding and Ranking Standards

Text embedding and reranking are foundational to modern information retrieval systems, powering applications such as semantic search, recommendation systems, and retrieval-augmented generation (RAG). However, current approaches often face key challenges—particularly in achieving both high multilingual fidelity and task adaptability without relying on proprietary APIs. Existing models frequently fall short in scenarios requiring nuanced semantic understanding across multiple languages or domain-specific tasks like code retrieval and instruction following. Moreover, most open-source models either lack scale or flexibility, while commercial APIs remain costly and closed.

Qwen3-Embedding and Qwen3-Reranker: A New Standard for Open-Source Embedding

Alibaba’s Qwen Team has unveiled the Qwen3-Embedding and Qwen3-Reranker Series—models that set a new benchmark in multilingual text embedding and relevance ranking. Built on the Qwen3 foundation models, the series includes variants in 0.6B, 4B, and 8B parameter sizes and supports a wide range of languages (119 in total), making it one of the most versatile and performant open-source offerings to date. These models are now open-sourced under the Apache 2.0 license on Hugging Face, GitHub, and ModelScope, and are also accessible via Alibaba Cloud APIs.

These models are optimized for use cases such as semantic retrieval, classification, RAG, sentiment analysis, and code search—providing a strong alternative to existing solutions like Gemini Embedding and OpenAI’s embedding APIs.

Technical Architecture

Qwen3-Embedding models adopt a dense transformer-based architecture with causal attention, producing embeddings by extracting the hidden state corresponding to the [EOS] token. Instruction-awareness is a key feature: input queries are formatted as {instruction} {query}<|endoftext|>, enabling task-conditioned embeddings. The reranker models are trained with a binary classification format, judging document-query relevance in an instruction-guided manner using a token likelihood-based scoring function. (A minimal usage sketch appears after the benchmark summary below.)

The models are trained using a robust multi-stage training pipeline:

- Large-scale weak supervision: 150M synthetic training pairs generated using Qwen3-32B, covering retrieval, classification, STS, and bitext mining across languages and tasks.
- Supervised fine-tuning: 12M high-quality pairs selected using cosine similarity (>0.7) are used to fine-tune performance in downstream applications.
- Model merging: Spherical linear interpolation (SLERP) of multiple fine-tuned checkpoints ensures robustness and generalization.

This synthetic data generation pipeline enables control over data quality, language diversity, task difficulty, and more—resulting in a high degree of coverage and relevance in low-resource settings.

Performance Benchmarks and Insights

The Qwen3-Embedding and Qwen3-Reranker series demonstrate strong empirical performance across several multilingual benchmarks.

- On MMTEB (216 tasks across 250+ languages), Qwen3-Embedding-8B achieves a mean task score of 70.58, surpassing Gemini and the GTE-Qwen2 series.
- On MTEB (English v2), Qwen3-Embedding-8B reaches 75.22, outperforming other open models including NV-Embed-v2 and GritLM-7B.
- On MTEB-Code, Qwen3-Embedding-8B leads with 80.68, excelling in applications like code retrieval and Stack Overflow QA.
- For reranking, Qwen3-Reranker-0.6B already outperforms Jina and BGE rerankers, while Qwen3-Reranker-8B achieves 81.22 on MTEB-Code and 72.94 on MMTEB-R, marking state-of-the-art performance.
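To make the embedding interface described under “Technical Architecture” concrete, here is a minimal usage sketch with the Hugging Face transformers library. The repository name, pad-token fallback, and instruction wording are assumptions based on this article, not a verified excerpt from the official model card:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Minimal usage sketch of the interface described above: instruction-prefixed
# queries with last-token ([EOS]) pooling via Hugging Face transformers.
# The repository id, pad-token fallback, and instruction wording are assumptions.

model_id = "Qwen/Qwen3-Embedding-0.6B"  # assumed repo name
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
if tokenizer.pad_token is None:          # assumption: reuse EOS for padding if unset
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(model_id).eval()

instruction = "Given a web search query, retrieve relevant passages that answer the query."
texts = [
    f"{instruction} What is the capital of France?<|endoftext|>",     # task-conditioned query
    "Paris is the capital and largest city of France.<|endoftext|>",  # candidate document
]

with torch.no_grad():
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # [batch, seq_len, hidden_dim]
    embeddings = F.normalize(hidden[:, -1], dim=-1)  # last-token pooling (left padding)

print(embeddings @ embeddings.T)  # cosine similarity matrix
```

Left padding keeps the final position aligned with each sequence’s real last token, which is what the [EOS]-state pooling described above relies on.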
Ablation studies confirm the necessity of each training stage. Removing synthetic pretraining or model merging led to significant performance drops (up to 6 points on MMTEB), emphasizing their contributions.

Conclusion

Alibaba’s Qwen3-Embedding and Qwen3-Reranker Series present a robust, open, and scalable solution to multilingual and instruction-aware semantic representation. With strong empirical results across MTEB, MMTEB, and MTEB-Code, these models bridge the gap between proprietary APIs and open-source accessibility. Their thoughtful training design—leveraging high-quality synthetic data, instruction-tuning, and model merging—positions them as ideal candidates for enterprise applications in search, retrieval, and RAG pipelines. By open-sourcing these models, the Qwen team not only pushes the boundaries of language understanding but also empowers the broader community to innovate on top of a solid foundation.

Check out the Paper, Technical details, Qwen3-Embedding and Qwen3-Reranker. All credit for this research goes to the researchers of this project.

The post Alibaba Qwen Team Releases Qwen3-Embedding and Qwen3-Reranker Series – Redefining Multilingual Embedding and Ranking Standards appeared first on MarkTechPost.
