
Puzzle Corner Archives

July/August 2025: Guest edited by Edward Faulkner ’03
May/June 2025: Guest edited by Frank Rubin ’62
March/April 2025: Guest edited by Michael S. Branicky ’03
January/February 2025: Guest edited by Dan Katz ’03
November/December 2024: Guest edited by Edward Faulkner ’03
September/October 2024: Guest edited by Mark Douma ’63 and Frank Rubin ’62
July/August 2024: Puzzle Corner Editor Emeritus Allan Gottlieb ’67 signs off
September/October 2023: Edited by Allan Gottlieb ’67
May/June 2023: Edited by Allan Gottlieb ’67
January/February 2023: Edited by Allan Gottlieb ’67

Allan Gottlieb ’67 launched Puzzle Corner in 1966 and edited the column for 58 years. It first appeared in the MIT student-run magazine Tech Engineering News (now defunct) and began running in Technology Review in the summer of 1966. Back issues from 1966 through 2022 are available at cs.nyu.edu/~gottlieb/tr.


Beyond instruction-conditioning, MoTE: Mixture of Task Experts for Multi-task Embedding Models

arXiv:2506.17781v1 Announce Type: cross Abstract: Dense embeddings are fundamental to modern machine learning systems, powering Retrieval-Augmented Generation (RAG), information retrieval, and representation learning. While instruction-conditioning has become the dominant approach for embedding specialization, its direct application to low-capacity models imposes fundamental representational constraints that limit the performance gains derived from specialization. In this paper, we analyze these limitations and introduce the Mixture of Task Experts (MoTE) transformer block, which leverages task-specialized parameters trained with Task-Aware Contrastive Learning (TaCL) to enhance the model's ability to generate specialized embeddings. Empirical results show that MoTE achieves 64% higher performance gains on retrieval datasets ($+3.27 \rightarrow +5.21$) and 43% higher performance gains across all datasets ($+1.81 \rightarrow +2.60$). Critically, these gains are achieved without altering instructions, training data, inference time, or the number of active parameters.
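The core idea, as described, is to route each input through task-specialized parameters rather than conditioning a shared network on an instruction. Below is a minimal PyTorch sketch of a task-routed feed-forward block in that spirit; the layer sizes, routing-by-task-id scheme, and class name are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a task-routed expert block (not the paper's code).
import torch
import torch.nn as nn

class TaskExpertFFN(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_tasks: int):
        super().__init__()
        # One feed-forward expert per task; only the expert matching the task id
        # runs, so the number of active parameters equals a single FFN.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.GELU(),
                nn.Linear(d_ff, d_model),
            )
            for _ in range(num_tasks)
        ])
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # x: (batch, seq_len, d_model); residual connection around the selected expert.
        return self.norm(x + self.experts[task_id](x))

block = TaskExpertFFN(d_model=384, d_ff=1536, num_tasks=4)
hidden = torch.randn(2, 16, 384)
print(block(hidden, task_id=1).shape)  # torch.Size([2, 16, 384])
```

Because only one expert is evaluated per input, inference cost stays at the level of a single feed-forward block, which matches the abstract's claim about unchanged inference time and active parameters.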


LLMs for Customized Marketing Content Generation and Evaluation at Scale

arXiv:2506.17863v1 Announce Type: new Abstract: Offsite marketing is essential in e-commerce, enabling businesses to reach customers through external platforms and drive traffic to retail websites. However, most current offsite marketing content is overly generic, template-based, and poorly aligned with landing pages, limiting its effectiveness. To address these limitations, we propose MarketingFM, a retrieval-augmented system that integrates multiple data sources to generate keyword-specific ad copy with minimal human intervention. We validate MarketingFM via offline human and automated evaluations and large-scale online A/B tests. In one experiment, keyword-focused ad copy outperformed templates, achieving up to 9% higher CTR, 12% more impressions, and 0.38% lower CPC, demonstrating gains in ad ranking and cost efficiency. Despite these gains, human review of generated ads remains costly. To address this, we propose AutoEval-Main, an automated evaluation system that combines rule-based metrics with LLM-as-a-Judge techniques to ensure alignment with marketing principles. In experiments with large-scale human annotations, AutoEval-Main achieved 89.57% agreement with human reviewers. Building on this, we propose AutoEval-Update, a cost-efficient LLM-human collaborative framework to dynamically refine evaluation prompts and adapt to shifting criteria with minimal human input. By selectively sampling representative ads for human review and using a critic LLM to generate alignment reports, AutoEval-Update improves evaluation consistency while reducing manual effort. Experiments show the critic LLM suggests meaningful refinements, improving LLM-human agreement. Nonetheless, human oversight remains essential for setting thresholds and validating refinements before deployment.
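As described, AutoEval-Main layers cheap rule-based metrics under an LLM-as-a-Judge pass. The Python sketch below shows that two-stage structure; the specific rules, field names, and judge rubric are assumptions made for illustration, not the paper's actual criteria, and the judge call itself is left abstract.

```python
# Illustrative two-stage ad-copy check: deterministic rules, then an LLM judge.
from dataclasses import dataclass

@dataclass
class AdCopy:
    keyword: str
    headline: str
    body: str

def rule_checks(ad: AdCopy, max_headline_chars: int = 90) -> list[str]:
    """Cheap deterministic checks applied before any LLM call (assumed rules)."""
    violations = []
    if len(ad.headline) > max_headline_chars:
        violations.append("headline too long")
    if ad.keyword.lower() not in (ad.headline + " " + ad.body).lower():
        violations.append("target keyword missing")
    return violations

def judge_prompt(ad: AdCopy) -> str:
    """Prompt text for an external LLM judge; the actual model call is not shown."""
    return (
        "You are reviewing e-commerce ad copy for alignment with marketing "
        "principles (clarity, relevance to the keyword, no unsupported claims).\n"
        f"Keyword: {ad.keyword}\nHeadline: {ad.headline}\nBody: {ad.body}\n"
        "Answer PASS or FAIL with a one-sentence reason."
    )

ad = AdCopy(
    keyword="running shoes",
    headline="Lightweight running shoes for daily miles",
    body="Breathable mesh upper and cushioned sole. Shop the latest styles.",
)
print(rule_checks(ad))        # [] -> passes the deterministic gate
print(judge_prompt(ad)[:80])  # prompt that would be sent to the judge model
```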


Reinforcement Learning Teachers of Test Time Scaling

arXiv:2506.08388v2 Announce Type: replace-cross Abstract: Training reasoning language models (LMs) with reinforcement learning (RL) for one-hot correctness inherently relies on the LM being able to explore and solve its task with some chance at initialization. Furthermore, a key use case of reasoning LMs is to act as teachers for distilling new students and cold-starting future RL iterations rather than being deployed themselves. From these considerations, we introduce a new framework that avoids RL’s exploration challenge by training a new class of Reinforcement-Learned Teachers (RLTs) focused on yielding the most effective downstream distillation. RLTs are prompted with both the question and solution to each problem, and tasked to simply “connect-the-dots” with detailed explanations tailored for their students. We train RLTs with dense rewards obtained by feeding each explanation to the student and testing its understanding of the problem’s solution. In practice, the raw outputs of a 7B RLT provide higher final performance on competition and graduate-level tasks than existing distillation and cold-starting pipelines that collect and postprocess the reasoning traces of orders of magnitude larger LMs. Furthermore, RLTs maintain their effectiveness when training larger students and when applied zero-shot to out-of-distribution tasks, unlocking new levels of efficiency and re-usability for the RL reasoning framework.
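The dense reward described above is driven by how well the student understands the solution after reading the teacher's explanation. The toy sketch below shows one possible shape for such a reward; the stand-in log-probability function and the baseline-subtracted formulation are assumptions for illustration, since in practice the score would come from the student model's token log-likelihoods.

```python
# Toy sketch of a distillation-oriented teacher reward (illustrative only).
import math

def student_logprob(solution_tokens, context):
    # Placeholder: pretend the student assigns a higher per-token probability
    # when the explanation mentions the key step. A real implementation would
    # query the student LM for token log-likelihoods of the reference solution.
    p = 0.6 if "key step" in context else 0.3
    return sum(math.log(p) for _ in solution_tokens)

def teacher_reward(question, explanation, solution_tokens):
    with_expl = student_logprob(solution_tokens, question + " " + explanation)
    without = student_logprob(solution_tokens, question)
    # Dense reward: average per-token improvement in the student's likelihood
    # of the reference solution attributable to the explanation.
    return (with_expl - without) / len(solution_tokens)

toks = ["x", "=", "4"]
print(teacher_reward("Solve 2x = 8.", "Divide both sides by 2; that is the key step.", toks))
```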


Handling Numeric Expressions in Automatic Speech Recognition

arXiv:2408.00004v2 Announce Type: replace-cross Abstract: This paper addresses the problem of correctly formatting numeric expressions in automatic speech recognition (ASR) transcripts. This is challenging since the expected transcript format depends on the context, e.g., 1945 (year) vs. 19:45 (timestamp). We compare cascaded and end-to-end approaches to recognize and format numeric expressions such as years, timestamps, currency amounts, and quantities. For the end-to-end approach, we employed a data generation strategy using a large language model (LLM) together with a text-to-speech (TTS) model to generate adaptation data. The results on our test dataset show that while approaches based on LLMs perform well in recognizing formatted numeric expressions, adapted end-to-end models offer competitive performance with the advantage of lower latency and inference cost.
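The adaptation-data recipe described (LLM-generated text pairs synthesized with TTS) can be pictured as producing pairs of audio and formatted transcripts. Below is a hedged sketch of that data flow; the field names, file layout, and example sentences are assumptions, and the LLM generation and TTS synthesis calls are left as placeholders.

```python
# Sketch of an LLM + TTS adaptation-data pipeline for numeric formatting (assumed layout).
from dataclasses import dataclass

@dataclass
class AdaptationExample:
    spoken_form: str   # what the TTS model reads aloud
    written_form: str  # the formatted transcript the ASR model should emit
    audio_path: str    # location of the synthesized waveform

def make_example(spoken_form: str, written_form: str, idx: int) -> AdaptationExample:
    # Placeholder for a TTS call, e.g. synthesize(spoken_form) -> wav file.
    audio_path = f"adaptation/{idx:05d}.wav"
    return AdaptationExample(spoken_form, written_form, audio_path)

pairs = [
    ("the meeting starts at nineteen forty five", "The meeting starts at 19:45."),
    ("the war ended in nineteen forty five", "The war ended in 1945."),
    ("it costs twenty three dollars fifty", "It costs $23.50."),
]
dataset = [make_example(s, w, i) for i, (s, w) in enumerate(pairs)]
for ex in dataset:
    print(ex.audio_path, "->", ex.written_form)
```

Note how the first two pairs share the same spoken form but require different written forms (timestamp vs. year), which is exactly the context dependence the abstract highlights.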


Large Language Models for Disease Diagnosis: A Scoping Review

arXiv:2409.00097v3 Announce Type: replace Abstract: Automatic disease diagnosis has become increasingly valuable in clinical practice. The advent of large language models (LLMs) has catalyzed a paradigm shift in artificial intelligence, with growing evidence supporting the efficacy of LLMs in diagnostic tasks. Despite the increasing attention in this field, a holistic view is still lacking. Many critical aspects remain unclear, such as the diseases and clinical data to which LLMs have been applied, the LLM techniques employed, and the evaluation methods used. In this article, we perform a comprehensive review of LLM-based methods for disease diagnosis. Our review examines the existing literature across various dimensions, including disease types and associated clinical specialties, clinical data, LLM techniques, and evaluation methods. Additionally, we offer recommendations for applying and evaluating LLMs for diagnostic tasks. Furthermore, we assess the limitations of current research and discuss future directions. To our knowledge, this is the first comprehensive review for LLM-based disease diagnosis.


Can structural correspondences ground real world representational content in Large Language Models?

arXiv:2506.16370v1 Announce Type: new Abstract: Large Language Models (LLMs) such as GPT-4 produce compelling responses to a wide range of prompts. But their representational capacities are uncertain. Many LLMs have no direct contact with extra-linguistic reality: their inputs, outputs and training data consist solely of text, raising the questions (1) can LLMs represent anything and (2) if so, what? In this paper, I explore what it would take to answer these questions according to a structural-correspondence-based account of representation, and make an initial survey of the relevant evidence. I argue that the mere existence of structural correspondences between LLMs and worldly entities is insufficient to ground representation of those entities. However, if these structural correspondences play an appropriate role – they are exploited in a way that explains successful task performance – then they could ground real world contents. This requires overcoming a challenge: the text-boundedness of LLMs appears, on the face of it, to prevent them from engaging in the right sorts of tasks.


Techniques for supercharging academic writing with generative AI

arXiv:2310.17143v4 Announce Type: replace-cross Abstract: Academic writing is an indispensable yet laborious part of the research enterprise. This Perspective maps out principles and methods for using generative artificial intelligence (AI), specifically large language models (LLMs), to elevate the quality and efficiency of academic writing. We introduce a human-AI collaborative framework that delineates the rationale (why), process (how), and nature (what) of AI engagement in writing. The framework pinpoints both short-term and long-term reasons for engagement and their underlying mechanisms (e.g., cognitive offloading and imaginative stimulation). It reveals the role of AI throughout the writing process, conceptualized through a two-stage model for human-AI collaborative writing, and the nature of AI assistance in writing, represented through a model of writing-assistance types and levels. Building on this framework, we describe effective prompting techniques for incorporating AI into the writing routine (outlining, drafting, and editing) as well as strategies for maintaining rigorous scholarship, adhering to varied journal policies, and avoiding overreliance on AI. Ultimately, the prudent integration of AI into academic writing can ease the communication burden, empower authors, accelerate discovery, and promote diversity in science.
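As one concrete illustration of the kind of prompting technique the Perspective discusses for the editing stage, a minimal prompt template might look like the following; the wording and constraints are assumptions for demonstration, not taken from the paper.

```python
# Hypothetical editing-stage prompt template for an LLM writing assistant.
EDIT_PROMPT = """You are assisting with academic editing, not rewriting.
Improve the clarity and concision of the paragraph below without changing its
claims, citations, or technical terms. Return the revised paragraph and a
bulleted list of the changes you made so the author can verify each one.

Paragraph:
{paragraph}
"""

paragraph = "Our method obtains results that are better in a significant way than prior work."
print(EDIT_PROMPT.format(paragraph=paragraph))
```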


InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems

arXiv:2506.16381v1 Announce Type: new Abstract: In modern speech synthesis, paralinguistic information, such as a speaker’s vocal timbre, emotional state, and dynamic prosody, plays a critical role in conveying nuance beyond mere semantics. Traditional Text-to-Speech (TTS) systems rely on fixed style labels or an inserted speech prompt to control these cues, which severely limits flexibility. Recent attempts seek to employ natural-language instructions to modulate paralinguistic features, substantially improving the generalization of instruction-driven TTS models. Although many TTS systems now support customized synthesis via textual description, their actual ability to interpret and execute complex instructions remains largely unexplored. In addition, there is still a shortage of high-quality benchmarks and automated evaluation metrics specifically designed for instruction-based TTS, which hinders accurate assessment and iterative optimization of these models. To address these limitations, we introduce InstructTTSEval, a benchmark for measuring complex natural-language style control capabilities. It comprises three tasks, namely Acoustic-Parameter Specification, Descriptive-Style Directive, and Role-Play, with English and Chinese subsets, each with 1k test cases (6k in total) paired with reference audio. We leverage Gemini as an automatic judge to assess the instruction-following abilities of TTS systems. Our evaluation of accessible instruction-following TTS systems highlights substantial room for further improvement. We anticipate that InstructTTSEval will drive progress toward more powerful, flexible, and accurate instruction-following TTS.
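To make the evaluation protocol concrete, the sketch below shows the kind of per-case record and judge request such a benchmark might assemble before asking a multimodal judge (e.g. Gemini) to score instruction adherence; the field names and rubric wording are assumptions, not the benchmark's actual schema.

```python
# Hypothetical per-case record and judge request for instruction-following TTS evaluation.
from dataclasses import dataclass

@dataclass
class TTSEvalCase:
    task: str            # e.g. "Acoustic-Parameter Specification", "Descriptive-Style Directive", "Role-Play"
    instruction: str     # natural-language style instruction given to the TTS system
    text: str            # content to be spoken
    reference_audio: str
    system_audio: str

def judge_request(case: TTSEvalCase) -> dict:
    """Builds the payload a multimodal judge would score (model call not shown)."""
    rubric = (
        "Given the instruction and the synthesized audio, rate from 1 to 5 how "
        "faithfully the speech follows the requested style, and explain briefly."
    )
    return {
        "task": case.task,
        "instruction": case.instruction,
        "text": case.text,
        "audio": case.system_audio,
        "reference": case.reference_audio,
        "rubric": rubric,
    }

case = TTSEvalCase(
    task="Descriptive-Style Directive",
    instruction="Speak slowly, in a warm and reassuring tone, as if calming a child.",
    text="Everything is going to be fine.",
    reference_audio="ref/0001.wav",
    system_audio="sys/0001.wav",
)
print(judge_request(case)["rubric"])
```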


GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View

arXiv:2506.16633v1 Announce Type: new Abstract: Multimodal reasoning is the process of understanding, integrating and inferring information across different data modalities. It has recently attracted surging academic attention as a benchmark for Artificial Intelligence (AI). Although there are various tasks for evaluating multimodal reasoning ability, they still have limitations. Reasoning over hierarchical visual clues at different levels of granularity, e.g., local details and global context, has received little discussion, despite its frequent involvement in real scenarios. To bridge the gap, we introduce a novel and challenging task for multimodal reasoning, namely GeoGuess. Given a street view image, the task is to identify its location and provide a detailed explanation. A system that succeeds in GeoGuess should be able to detect tiny visual clues, perceive the broader landscape, and associate them with vast geographic knowledge. GeoGuess therefore requires the ability to reason over hierarchical visual information and geographic knowledge. In this work, we establish a benchmark for GeoGuess by introducing a specially curated dataset, GeoExplain, which consists of panorama-geocoordinates-explanation tuples. Additionally, we present a multimodal and multilevel reasoning method, namely SightSense, which makes predictions and generates comprehensive explanations based on the hierarchy of visual information and external knowledge. Our analysis and experiments demonstrate its outstanding performance on GeoGuess.
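The panorama-geocoordinates-explanation tuples and a location-error check can be sketched as follows; the field names and the haversine-based error metric are illustrative assumptions rather than the paper's exact schema or evaluation protocol.

```python
# Sketch of a GeoExplain-style example plus a great-circle error check (assumed fields).
import math
from dataclasses import dataclass

@dataclass
class GeoExplainExample:
    panorama_path: str
    latitude: float
    longitude: float
    explanation: str  # which local details and global context reveal the location

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

ex = GeoExplainExample(
    panorama_path="panoramas/000123.jpg",
    latitude=48.8584, longitude=2.2945,
    explanation="Wrought-iron lattice tower in the background; Haussmann-style facades nearby.",
)
pred_lat, pred_lon = 48.85, 2.35
print(f"{haversine_km(ex.latitude, ex.longitude, pred_lat, pred_lon):.1f} km off")
```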
