Committee Archives - Page 30 of 100

Learn and Unlearn: Addressing Misinformation in Multilingual LLMs

admin NU / September 4, 2025

arXiv:2406.13748v3 Announce Type: replace Abstract: This paper investigates the propagation of harmful information in multilingual large language models (LLMs) and evaluates the efficacy of various unlearning methods. We demonstrate that fake information, regardless of the language it is in, once introduced into these models through training data, can spread across different languages, compromising the integrity and reliability of the generated content. Our findings reveal that standard unlearning techniques, which typically focus on English data, are insufficient in mitigating the spread of harmful content in multilingual contexts and could inadvertently reinforce harmful content across languages. We show that only by addressing harmful responses in both English and the original language of the harmful data can we effectively eliminate generations for all languages. This underscores the critical need for comprehensive unlearning strategies that consider the multilingual nature of modern LLMs to enhance their safety and reliability across diverse linguistic landscapes.

AI, Committee, Noticias, Uncategorized

Training LLMs to be Better Text Embedders through Bidirectional Reconstruction

admin NU / September 4, 2025

arXiv:2509.03020v1 Announce Type: new Abstract: Large language models (LLMs) have increasingly been explored as powerful text embedders. Existing LLM-based text embedding approaches often leverage the embedding of the final token, typically a reserved special token such as [EOS]. However, these tokens have not been intentionally trained to capture the semantics of the whole context, limiting their capacity as text embeddings, especially for retrieval and re-ranking tasks. We propose to add a new training stage before contrastive learning to enrich the semantics of the final token embedding. This stage employs bidirectional generative reconstruction tasks, namely EBQ2D (Embedding-Based Query-to-Document) and EBD2Q (Embedding-Based Document-to-Query), which interleave to anchor the [EOS] embedding and reconstruct either side of Query-Document pairs. Experimental results demonstrate that our additional training stage significantly improves LLM performance on the Massive Text Embedding Benchmark (MTEB), achieving new state-of-the-art results across different LLM base models and scales.
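The baseline the abstract describes, taking the hidden state of the final token as the text embedding, can be sketched as follows. This is a minimal illustration in plain Python with made-up toy vectors; it does not implement the proposed EBQ2D/EBD2Q training stage, and the function names are hypothetical.

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Return, for each sequence, the hidden state of its last non-padding token.

    hidden_states: list of sequences, each a list of per-token vectors.
    attention_mask: list of sequences of 1 (real token) or 0 (padding).
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(states[last])
    return pooled

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy batch: two sequences of three tokens with 2-dimensional hidden states.
# The first sequence is right-padded, so its final real token is at index 1.
hidden_states = [
    [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]],
    [[0.0, 2.0], [3.0, 0.0], [0.0, 4.0]],
]
attention_mask = [[1, 1, 0], [1, 1, 1]]

embeddings = last_token_pool(hidden_states, attention_mask)
similarity = cosine(embeddings[0], embeddings[1])
print(embeddings, similarity)
```

In a real embedder the vectors would come from the model's final layer, and the pooled position would hold the reserved special token such as [EOS].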

AI, Committee, Noticias, Uncategorized

Explaining Length Bias in LLM-Based Preference Evaluations

admin NU / September 3, 2025

arXiv:2407.01085v4 Announce Type: replace-cross Abstract: The use of large language models (LLMs) as judges, particularly in preference comparisons, has become widespread, but this reveals a notable bias towards longer responses, undermining the reliability of such evaluations. To better understand such bias, we propose to decompose the preference evaluation metric, specifically the win rate, into two key components: desirability and information mass, where the former is length-independent and related to trustworthiness such as correctness, toxicity, and consistency, and the latter is length-dependent and represents the amount of information in the response. We empirically demonstrated the decomposition through controlled experiments and found that response length impacts evaluations by influencing information mass. To derive a reliable evaluation metric that assesses content quality without being confounded by response length, we propose AdapAlpaca, a simple yet effective adjustment to win rate measurement. Specifically, AdapAlpaca ensures a fair comparison of response quality by aligning the lengths of reference and test model responses under equivalent length intervals.
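The idea of comparing responses only within equivalent length intervals can be sketched as below. This is an illustration under stated assumptions, not the AdapAlpaca implementation: the 200-word interval width, the record layout, and the function names are all hypothetical, and the judge verdicts are assumed to be precomputed.

```python
def length_bucket(text, width=200):
    """Map a response to a length interval by word count (width is an assumption)."""
    return len(text.split()) // width

def length_matched_win_rate(records, width=200):
    """Win rate over pairs whose test and reference responses share a length interval."""
    matched = [
        r for r in records
        if length_bucket(r["test"], width) == length_bucket(r["reference"], width)
    ]
    if not matched:
        return None
    return sum(1 for r in matched if r["test_wins"]) / len(matched)

# Toy records: placeholder text of a given word count plus a judge verdict.
records = [
    {"test": "w " * 50, "reference": "w " * 60, "test_wins": True},
    {"test": "w " * 450, "reference": "w " * 50, "test_wins": True},
    {"test": "w " * 210, "reference": "w " * 390, "test_wins": False},
]

raw_rate = sum(1 for r in records if r["test_wins"]) / len(records)
matched_rate = length_matched_win_rate(records)
print(raw_rate, matched_rate)
```

In this toy data the second pair, where a much longer test response beats a short reference, is excluded from the matched comparison, so the matched win rate is lower than the raw one.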

AI, Committee, Noticias, Uncategorized

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

admin NU / September 3, 2025

arXiv:2507.14201v2 Announce Type: replace-cross Abstract: We present ExCyTIn-Bench, the first benchmark to Evaluate an LLM agent x on the task of Cyber Threat Investigation through security questions derived from investigation graphs. Real-world security analysts must sift through a large number of heterogeneous alert signals and security logs, follow multi-hop chains of evidence, and compile an incident report. With the development of LLMs, building LLM-based agents for automatic threat investigation is a promising direction. To assist the development and evaluation of LLM agents, we construct a dataset from a controlled Azure tenant that covers 8 simulated real-world multi-step attacks, 57 log tables from Microsoft Sentinel and related services, and 589 automatically generated questions. We leverage security logs extracted with expert-crafted detection logic to build threat investigation graphs, and then generate questions with LLMs using paired nodes on the graph, taking the start node as background context and the end node as answer. Anchoring each question to these explicit nodes and edges not only provides automatic, explainable ground truth answers but also makes the pipeline reusable and readily extensible to new logs. This also enables the automatic generation of procedural tasks with verifiable rewards, which can be naturally extended to training agents via reinforcement learning. Our comprehensive experiments with different models confirm the difficulty of the task: with the base setting, the average reward across all evaluated models is 0.249, and the best achieved is 0.368, leaving substantial headroom for future research. Code and data are coming soon!
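The pairing of graph nodes into a question record (start node as context, end node as answer) can be sketched as follows. This is a minimal illustration, not the benchmark's pipeline: the graph, the node names, and the record fields are hypothetical, and the LLM step that phrases the question is left out.

```python
from collections import deque

def shortest_path(graph, start, end):
    """Breadth-first search for the shortest evidence path between two nodes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def make_question_record(graph, start, end):
    """Build a question record anchored to explicit nodes and edges."""
    path = shortest_path(graph, start, end)
    if path is None:
        return None
    return {
        "context": start,        # background given to the agent
        "answer": end,           # ground-truth answer
        "hops": len(path) - 1,   # number of edges the agent must follow
        "evidence_path": path,   # explains why the answer is correct
    }

# Hypothetical investigation graph (adjacency lists).
graph = {
    "alert:phishing_email": ["user:alice"],
    "user:alice": ["host:ws-01"],
    "host:ws-01": ["process:powershell.exe"],
}

record = make_question_record(graph, "alert:phishing_email", "process:powershell.exe")
print(record)
```

Because the answer and the evidence path come straight from the graph, a record like this can be checked automatically, which is what makes the rewards verifiable.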

AI, Committee, Noticias, Uncategorized

Annotation and modeling of emotions in a textual corpus: an evaluative approach

admin NU / September 3, 2025

arXiv:2509.01260v1 Announce Type: new Abstract: Emotion is a crucial phenomenon in the functioning of human beings in society. However, it remains a widely open subject, particularly in its textual manifestations. This paper examines an industrial corpus manually annotated following an evaluative approach to emotion. This theoretical framework, which is currently underutilized, offers a different perspective that complements traditional approaches. Noting that the annotations we collected exhibit significant disagreement, we hypothesized that they nonetheless follow stable statistical trends. Using language models trained on these annotations, we demonstrate that it is possible to model the labeling process and that variability is driven by underlying linguistic features. Conversely, our results indicate that language models seem capable of distinguishing emotional situations based on evaluative criteria.

AI, Committee, Noticias, Uncategorized

Trusted Uncertainty in Large Language Models: A Unified Framework for Confidence Calibration and Risk-Controlled Refusal

admin NU / September 3, 2025

arXiv:2509.01455v1 Announce Type: new Abstract: Deployed language models must decide not only what to answer but also when not to answer. We present UniCR, a unified framework that turns heterogeneous uncertainty evidence including sequence likelihoods, self-consistency dispersion, retrieval compatibility, and tool or verifier feedback into a calibrated probability of correctness and then enforces a user-specified error budget via principled refusal. UniCR learns a lightweight calibration head with temperature scaling and proper scoring, supports API-only models through black-box features, and offers distribution-free guarantees using conformal risk control. For long-form generation, we align confidence with semantic fidelity by supervising on atomic factuality scores derived from retrieved evidence, reducing confident hallucinations while preserving coverage. Experiments on short-form QA, code generation with execution tests, and retrieval-augmented long-form QA show consistent improvements in calibration metrics, lower area under the risk-coverage curve, and higher coverage at fixed risk compared to entropy or logit thresholds, post-hoc calibrators, and end-to-end selective baselines. Analyses reveal that evidence contradiction, semantic dispersion, and tool inconsistency are the dominant drivers of abstention, yielding informative user-facing refusal messages. The result is a portable recipe of evidence fusion to calibrated probability to risk-controlled decision that improves trustworthiness without fine-tuning the base model and remains valid under distribution shift.
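The final step of the recipe, turning a calibrated confidence into an answer-or-refuse decision under an error budget, can be sketched as below. This is a simplified empirical-risk threshold on a held-out calibration set, not UniCR's conformal risk control procedure (which adds a finite-sample correction); the function names and toy data are hypothetical.

```python
def pick_threshold(confidences, correct, error_budget):
    """Return the lowest confidence threshold whose answered subset stays within budget.

    Answers are those with confidence >= threshold; returns None if no threshold works.
    """
    pairs = sorted(zip(confidences, correct), reverse=True)
    best = None
    errors = 0
    for i, (conf, ok) in enumerate(pairs):
        errors += 0 if ok else 1
        # Evaluate only at the end of a group of tied confidences.
        group_end = i + 1 == len(pairs) or pairs[i + 1][0] != conf
        if group_end and errors / (i + 1) <= error_budget:
            best = conf
    return best

def decide(confidence, threshold):
    """Answer only when the confidence clears the threshold; otherwise refuse."""
    if threshold is not None and confidence >= threshold:
        return "answer"
    return "refuse"

# Toy calibration set: calibrated confidence and whether the answer was correct.
confidences = [0.95, 0.9, 0.8, 0.6, 0.4]
correct = [True, True, False, True, False]

threshold = pick_threshold(confidences, correct, error_budget=0.25)
print(threshold, decide(0.7, threshold), decide(0.5, threshold))
```

Lowering the threshold raises coverage but also raises the error rate among answered questions; the budget fixes where that trade-off is cut.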

AI, Committee, Noticias, Uncategorized

Tencent Hunyuan Open-Sources Hunyuan-MT-7B and Hunyuan-MT-Chimera-7B: State-of-the-Art Multilingual Translation Models

admin NU / September 3, 2025

Introduction

Tencent’s Hunyuan team has released Hunyuan-MT-7B (a translation model) and Hunyuan-MT-Chimera-7B (an ensemble model). Both models are designed specifically for multilingual machine translation and were introduced in conjunction with Tencent’s participation in the WMT2025 General Machine Translation shared task, where Hunyuan-MT-7B ranked first in 30 out of 31 language pairs.

Technical report: https://github.com/Tencent-Hunyuan/Hunyuan-MT/blob/main/Hunyuan_MT_Technical_Report.pdf

Model Overview

Hunyuan-MT-7B
- A 7B-parameter translation model.
- Supports mutual translation across 33 languages, including Chinese ethnic minority languages such as Tibetan, Mongolian, Uyghur, and Kazakh.
- Optimized for both high-resource and low-resource translation tasks, achieving state-of-the-art results among models of comparable size.

Hunyuan-MT-Chimera-7B
- An integrated weak-to-strong fusion model.
- Combines multiple translation outputs at inference time and produces a refined translation using reinforcement learning and aggregation techniques.
- Presented as the first open-source translation model of this type, improving translation quality beyond single-system outputs.

Training Framework

The models were trained using a five-stage framework designed for translation tasks:

1. General Pre-training
- 1.3 trillion tokens covering 112 languages and dialects.
- Multilingual corpora assessed for knowledge value, authenticity, and writing style.
- Diversity maintained through disciplinary, industry, and thematic tagging systems.

2. MT-Oriented Pre-training
- Monolingual corpora from mC4 and OSCAR, filtered using fastText (language ID), minLSH (deduplication), and KenLM (perplexity filtering).
- Parallel corpora from OPUS and ParaCrawl, filtered with CometKiwi.
- Replay of general pre-training data (20%) to avoid catastrophic forgetting.

3. Supervised Fine-Tuning (SFT)
- Stage I: ~3M parallel pairs (Flores-200, WMT test sets, curated Mandarin–minority data, synthetic pairs, instruction-tuning data).
- Stage II: ~268k high-quality pairs selected through automated scoring (CometKiwi, GEMBA) and manual verification.

4. Reinforcement Learning (RL)
- Algorithm: GRPO.
- Reward functions: XCOMET-XXL and DeepSeek-V3-0324 scoring for quality.
- Terminology-aware rewards (TAT-R1).
- Repetition penalties to avoid degenerate outputs.

5. Weak-to-Strong RL
- Multiple candidate outputs are generated and aggregated through reward-based output.
- Applied in Hunyuan-MT-Chimera-7B, improving translation robustness and reducing repetitive errors.

Benchmark Results

Automatic Evaluation
- WMT24pp (English⇔XX): Hunyuan-MT-7B achieved 0.8585 (XCOMET-XXL), surpassing larger models like Gemini-2.5-Pro (0.8250) and Claude-Sonnet-4 (0.8120).
- FLORES-200 (33 languages, 1056 pairs): Hunyuan-MT-7B scored 0.8758 (XCOMET-XXL), outperforming open-source baselines including Qwen3-32B (0.7933).
- Mandarin⇔Minority Languages: scored 0.6082 (XCOMET-XXL), higher than Gemini-2.5-Pro (0.5811), showing significant improvements in low-resource settings.

Comparative Results
- Outperforms Google Translate by 15–65% across evaluation categories.
- Outperforms specialized translation models such as Tower-Plus-9B and Seed-X-PPO-7B despite having fewer parameters.
- Chimera-7B adds ~2.3% improvement on FLORES-200, particularly in Chinese⇔Other and non-English⇔non-Chinese translations.

Human Evaluation

A custom evaluation set (covering social, medical, legal, and internet domains) compared Hunyuan-MT-7B with state-of-the-art models:
- Hunyuan-MT-7B: Avg. 3.189
- Gemini-2.5-Pro: Avg. 3.223
- DeepSeek-V3: Avg. 3.219
- Google Translate: Avg. 2.344

This shows that Hunyuan-MT-7B, despite being smaller at 7B parameters, approaches the quality of much larger proprietary models.

Case Studies

The report highlights several real-world cases:
- Cultural References: Correctly translates “小红薯” as the platform “REDnote,” unlike Google Translate’s “sweet potatoes.”
- Idioms: Interprets “You are killing me” as “你真要把我笑死了” (expressing amusement), avoiding literal misinterpretation.
- Medical Terms: Translates “uric acid kidney stones” precisely, while baselines generate malformed outputs.
- Minority Languages: For Kazakh and Tibetan, Hunyuan-MT-7B produces coherent translations, where baselines fail or output nonsensical text.
- Chimera Enhancements: Adds improvements in gaming jargon, intensifiers, and sports terminology.

Conclusion

Tencent’s release of Hunyuan-MT-7B and Hunyuan-MT-Chimera-7B establishes a new standard for open-source translation. By combining a carefully designed training framework with a specialized focus on low-resource and minority language translation, the models achieve quality on par with or exceeding larger closed-source systems. The launch of these two models provides the AI research community with accessible, high-performance tools for multilingual translation research and deployment.

This post first appeared on MarkTechPost.
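The article names GRPO as the RL algorithm. GRPO scores each sampled output relative to the other samples drawn for the same prompt, so the core computation can be sketched as below. The reward values are made-up placeholders standing in for quality scores such as XCOMET-XXL, and the use of the population standard deviation and the `eps` term are assumptions; this is not the Hunyuan-MT training code.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own sample group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical quality scores for three candidate translations of one source sentence.
rewards = [0.80, 0.85, 0.90]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Candidates that score above the group average get a positive advantage and are reinforced; those below it are pushed down, without needing a separate value model.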

AI, Committee, Noticias, Uncategorized

Here’s how we picked this year’s Innovators Under 35

admin NU / September 2, 2025

Next week, we’ll publish our 2025 list of Innovators Under 35, highlighting smart and talented people who are working in many areas of emerging technology. This new class features 35 accomplished founders, hardware engineers, roboticists, materials scientists, and others who are already tackling tough problems and making big moves in their careers. All are under the age of 35.

One is developing a technology to reduce emissions from shipping, while two others are improving fertility treatments and creating new forms of contraception. Another is making it harder for people to maliciously share intimate images online. And quite a few are applying artificial intelligence to their respective fields in novel ways.

We’ll also soon reveal our 2025 Innovator of the Year, whose technical prowess is helping physicians diagnose and treat critically ill patients more quickly. What’s more (here’s your final hint), our winner even set a world record as a result of this work.

MIT Technology Review first published a list of Innovators Under 35 in 1999. It’s a grand tradition for us, and we often follow the work of various featured innovators for years, even decades, after they appear on the list. So before the big announcement, I want to take a moment to explain how we select the people we recognize each year.

Step 1: Call for nominations

Our process begins with a call for nominations, which typically goes out in the final months of the previous year and is open to anyone, anywhere in the world. We encourage people to nominate themselves, which takes just a few minutes. This method helps us discover people doing important work that we might not otherwise encounter.

This year we had 420 nominations. Two-thirds of our candidates were put forward by someone else and one-third nominated themselves. We received nominations for people located in about 40 countries. Nearly 70% were based in the United States, with the UK, Switzerland, China, and the United Arab Emirates, respectively, having the next-highest concentrations.

After nominations close, a few editors then spend several weeks reviewing the nominees and selecting semifinalists. During this phase, we look for people who have developed practical solutions to societal issues or made important scientific advances that could translate into new technologies. Their work should have the potential for broad impact; it can’t be niche or incremental. And what’s unique about their approach must be clear.

Step 2: Semifinalist applications

This year, we winnowed our initial list of hundreds of nominees to 108 semifinalists. Then we asked those entrants for more information to help us get to know them better and evaluate their work.

We request three letters of reference and a résumé from each semifinalist, and we ask all of them to answer a few short questions about their work. We also give them the option to share a video or pass along relevant journal articles or other links to help us learn more about what they do.

Step 3: Expert judges weigh in

Next, we bring in dozens of experts to vet the semifinalists. This year, 38 judges evaluated and scored the applications. We match the contenders with judges who work in similar fields whenever possible. At least two judges review each entrant, though most are seen by three.

All these judges volunteer their time, and some return to help year after year. A few of our longtime judges include materials scientists Yet-Ming Chiang (MIT) and Julia Greer (Caltech), MIT neuroscientist Ed Boyden, and computer scientist Ben Zhao of the University of Chicago.

John Rogers, a materials scientist and biomedical engineer at Northwestern University, has been a judge for more than a decade (and was featured on our very first Innovators list, in 1999). Here’s what he had to say about why he stays involved: “This award is compelling because it recognizes young people with scientific achievements that are not only of fundamental interest but also of practical significance, at the highest levels.”

Step 4: Editors make the final calls

In a final layer of vetting, editors who specialize in covering biotechnology, climate and energy, and artificial intelligence review the semifinalists whom judges scored highly in their respective areas. Staff editors and reporters can also nominate people they’ve come across in their coverage, and we add them to the mix for consideration.

Last, a small team of senior editors reviews all the semifinalists and the judges’ scores, as well as our own staff’s recommendations, and selects 35 honorees. We aim for a good combination of people from a variety of disciplines working in different regions of the world. And we take a staff vote to pick an Innovator of the Year, someone whose work we particularly admire.

In the end, it’s impossible to include every deserving individual on our list. But by incorporating both external nominations and outside expertise from our judges, we aim to make the evaluation process as rigorous and open as possible.

So who made the cut this year? Come back on September 8 to find out.

AI, Committee, Noticias, Uncategorized

NVIDIA AI Team Introduces Jetson Thor: The Ultimate Platform for Physical AI and Next-Gen Robotics

admin NU / September 2, 2025

Last week, the NVIDIA robotics team released Jetson Thor, which includes the Jetson AGX Thor Developer Kit and the Jetson T5000 module, marking a significant milestone for real-world AI robotics development. Engineered as a supercomputer for physical AI, Jetson Thor brings generative reasoning and multimodal sensor processing to power inference and decision-making at the edge.

Architectural Highlights

Compute Performance

Jetson Thor delivers up to 2,070 FP4 teraflops (TFLOPS) of AI compute via its Blackwell-based GPU, a leap of 7.5× over the previous Jetson Orin platform. This performance arrives in a 130-watt power envelope, with configurable operation down to 40 W, balancing high throughput with energy efficiency that is approximately 3.5× better than Orin.

Compute Architecture

At its core, Jetson Thor integrates a 2560-core Blackwell GPU equipped with 96 fifth-generation Tensor Cores and supports Multi-Instance GPU (MIG), enabling flexible partitioning of GPU resources for parallel workloads. Complementing this is a 14-core Arm® Neoverse-V3AE CPU, with 1 MB L2 per core and 16 MB shared L3 cache.

Memory and I/O

The platform includes 128 GB LPDDR5X memory on a 256-bit bus at 273 GB/s bandwidth. Storage features include a 1 TB NVMe M.2 slot, along with HDMI, DisplayPort, multiple USB, Gigabit Ethernet, CAN headers, and QSFP28 for up to four 25 GbE lanes, which is crucial for real-time sensor fusion.

Source: https://developer.nvidia.com/blog/introducing-nvidia-jetson-thor-the-ultimate-platform-for-physical-ai/

Software Ecosystem for Physical AI

Jetson Thor supports a comprehensive NVIDIA software stack tailored for robotics and physical AI:
- Isaac (GR00T) for generative reasoning and humanoid control.
- Metropolis for vision AI.
- Holoscan for real-time, low-latency sensor processing and sensor-over-Ethernet (Holoscan Sensor Bridge).

These components allow one system-on-module to execute multimodal AI workflows (vision, language, actuation) without offloading or combining multiple chips.

Defining ‘Physical AI’ and Its Significance

Generative Reasoning & Multimodal Processing

Physical AI combines perception, reasoning, and action planning. Jetson Thor enables robots to “simulate possible sequences, anticipate consequences, and generate both high-level plans and low-level motion policies,” delivering adaptability akin to human reasoning. By supporting real-time inference over language and visual inputs, it transforms robots from simple automata into generalist agents.

Applications

Robots can better navigate unpredictable environments, manipulate objects, or follow complex instructions without reteaching. Use cases span manufacturing, logistics, healthcare, agriculture, and more.

Developer Access and Pricing
- Jetson AGX Thor Developer Kit: priced at $3,499, now generally available.
- Jetson T5000 production modules: available through NVIDIA’s partners, with unit pricing around $2,999 for orders of 1,000.

Pre-orders suggest wider availability soon, catering to both research and commercial robotics ecosystems.

Conclusion

NVIDIA Jetson Thor represents a pivotal shift in robotics compute, embedding server-grade multimodal inference and reasoning capabilities within a single, power-bounded module. Its combination of 2,070 FP4 TFLOPS, high-efficiency design, expansive I/O, and robust software stack positions it as a foundational platform for the next generation of physical AI systems. With early adoption among prominent robotics developers and ready availability, Jetson Thor brings the vision of adaptable, real-world AI agents closer to reality.

This post first appeared on MarkTechPost.
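A few of the headline figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the article; the derived values (the implied Orin compute figure, peak TFLOPS per watt, and per-pin memory data rate) are back-of-the-envelope results rather than NVIDIA specifications, and the stated 7.5× ratio may compare different numeric precisions.

```python
# Figures quoted in the article above.
thor_fp4_tflops = 2070   # peak FP4 AI compute
speedup_vs_orin = 7.5    # stated leap over Jetson Orin
max_power_watts = 130    # stated power envelope
bus_width_bits = 256     # LPDDR5X bus width
bandwidth_gb_s = 273     # stated memory bandwidth in GB/s

# Orin-class compute implied by the stated ratio.
implied_orin = thor_fp4_tflops / speedup_vs_orin

# Peak compute per watt at the full power envelope.
tflops_per_watt = thor_fp4_tflops / max_power_watts

# Data rate each bus pin must sustain: bytes/s -> bits/s, spread over the bus width.
per_pin_gbit_s = bandwidth_gb_s * 8 / bus_width_bits

print(implied_orin)               # 276.0
print(round(tflops_per_watt, 2))  # 15.92
print(per_pin_gbit_s)             # 8.53125
```

The per-pin result of roughly 8.5 Gbit/s shows that the quoted bandwidth and bus width are mutually consistent.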

AI, Committee, Noticias, Uncategorized

The Download: AI doppelgängers in the workplace, and using lidar to measure climate disasters

admin NU / September 2, 2025

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Can an AI doppelgänger help me do my job?

By James O’Donnell

Digital clones (AI models that replicate a specific person) package together a few technologies that have been around for a while now: hyperrealistic video models to match your appearance, lifelike voices based on just a couple of minutes of speech recordings, and conversational chatbots increasingly capable of holding our attention.

But they’re also offering something the ChatGPTs of the world cannot: an AI that’s not smart in the general sense, but that ‘thinks’ like you do.

Could well-crafted clones serve as our stand-ins? I certainly feel stretched thin at work sometimes, wishing I could be in two places at once, and I bet you do too. To find out, I tried making a clone of myself. Read the full story to find out how it got on.

This story originally appeared in The Algorithm, our weekly newsletter on AI.

How lidar measures the cost of climate disasters

By Jon Keegan

The wildfires that swept through Los Angeles County this January left an indelible mark on the Southern California landscape. The Eaton and Palisades fires raged for 24 days, killing 29 people and destroying 16,000 structures, with losses estimated at $60 billion. More than 55,000 acres were consumed, and the landscape itself was physically transformed.

Now, researchers are using lidar (light detection and ranging) technology to precisely measure these changes in the landscape’s geometry, helping them understand and track the cascading effects of climate disasters.

This story is from our new print edition, which is all about the future of security.

Here’s how we picked this year’s Innovators Under 35

By Amy Nordrum

Next Monday we’ll publish our 2025 list of Innovators Under 35. The list highlights smart and talented people working across many areas of emerging technology. This new class features 35 accomplished founders, hardware engineers, roboticists, materials scientists, and others who are already tackling tough problems and making big moves in their careers.

MIT Technology Review first published a list of Innovators Under 35 in 1999. It’s a grand tradition for us, and we often follow the work of various featured innovators for years, even decades, after they appear on the list. So before the big announcement, we’d like to take a moment to explain how we select the people we recognize each year.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Meta created flirty chatbots of celebrities without their permission
To make matters worse, the bots generated risqué pictures on demand. (Reuters)
+ Meta’s relationship with Scale AI appears to be under pressure. (TechCrunch)
+ An AI companion site is hosting sexually charged conversations with underage celebrity bots. (MIT Technology Review)

2 The FTC has warned Big Tech not to comply with EU laws
If they jeopardize the freedom of expression or safety of US citizens, at least. (Wired $)

3 Ukraine is using drones to drop supplies to its troops in trenches
They’re delivering everything from cigarettes to roasted chicken. (WP $)
+ Meet the radio-obsessed civilian shaping Ukraine’s drone defense. (MIT Technology Review)

4 What the collapse of this AI company says about the wider industry
Builder.ai was an early industry darling. Its downfall is a dire warning. (NYT $)

5 US shoppers are racing to land an EV bargain
Federal tax credits on the vehicles expire at the end of the month. (WSJ $)
+ The US could really use an affordable electric truck. (MIT Technology Review)

6 A major new project will use AI to research vaccines
The Oxford Vaccine Group hopes the jabs will protect against deadly pathogens. (FT $)
+ Why US federal health agencies are abandoning mRNA vaccines. (MIT Technology Review)

7 A lot of people stop taking weight-loss drugs within one year
How should doctors encourage the ones who need to stay on them? (Undark)
+ We’re learning more about what weight-loss drugs do to the body. (MIT Technology Review)

8 Chatbots can be manipulated into breaking their own rules
It turns out they’re susceptible to both flattery and peer pressure. (The Verge)
+ Forcing LLMs to be evil during training can make them nicer in the long run. (MIT Technology Review)

9 Tennis is trying to reach a new generation of fans
Through…the metaverse? (The Information $)

10 The age of cheap online shopping is ending
And consumers are the ones paying the price. (The Atlantic $)
+ AI is starting to shake up the digital shopping experience, too. (FT $)
+ Your most important customer may be AI. (MIT Technology Review)

Quote of the day

“Stop being a clanker!”

How Jay Pinkert, a marketing manager, scolds ChatGPT when it isn’t fulfilling his requests, he tells the New York Times.

One more thing

The algorithms around us

By Ariel Bleicher

A metronome ticks. A record spins. And as a feel-good pop track plays, a giant compactor slowly crushes a Jenga tower of material creations. Paint cans burst. Chess pieces topple. Camera lenses shatter. An alarm clock shrills and then goes silent. A guitar neck snaps. But wait! The jaunty tune starts up again, and the jaws open to reveal … an iPad.

Watching Apple’s now-infamous “Crush!” ad, it’s hard not to feel uneasy about the ways in which digitization is remaking human life. Sure, we’re happy for computers to take over tasks we don’t want to do or aren’t particularly good at, like shopping or navigating. But what does it mean when the things we hold dear and thought were uniquely ours (our friendships, our art, even our language and creativity) can be reduced to software?

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ Minnesota’s Llama-Alpaca Costume Contest
