News Archives - YouZum

Microsoft AI Releases VibeVoice-Realtime: A Lightweight Real‑Time Text-to-Speech Model Supporting Streaming Text Input and Robust Long-Form Speech Generation

admin NU / 12月 7, 2025

Microsoft has released VibeVoice-Realtime-0.5B, a real time text to speech model that works with streaming text input and long form speech output, aimed at agent style applications and live data narration. The model can start producing audible speech in about 300 ms, which is critical when a language model is still generating the rest of its answer. Where VibeVoice Realtime Fits in the VibeVoice Stack? VibeVoice is a broader framework that focuses on next token diffusion over continuous speech tokens, with variants designed for long form multi speaker audio such as podcasts. The research team shows that the main VibeVoice models can synthesize up to 90 minutes of speech with up to 4 speakers in a 64k context window using continuous speech tokenizers at 7.5 Hz. The Realtime 0.5B variant is the low latency branch of this family. The model card reports an 8k context length and a typical generation length of about 10 minutes for a single speaker, which is enough for most voice agents, system narrators and live dashboards. A separate set of VibeVoice models, VibeVoice-1.5B and VibeVoice Large, handle long form multi speaker audio with 32k and 64k context windows and longer generation times. Interleaved Streaming Architecture The realtime variant uses an interleaved windowed design. Incoming text is split into chunks. The model incrementally encodes new text chunks while, in parallel, continuing diffusion based acoustic latent generation from prior context. This overlap between text encoding and acoustic decoding is what lets the system reach about 300 ms first audio latency on suitable hardware. Unlike the long form VibeVoice variants, which use both semantic and acoustic tokenizers, the realtime model removes the semantic tokenizer and uses only an acoustic tokenizer that operates at 7.5 Hz. The acoustic tokenizer is based on a σ VAE variant from LatentLM, with a mirror symmetric encoder decoder architecture that uses 7 stages of modified transformer blocks and performs 3200x downsampling from 24 kHz audio. On top of this tokenizer, a diffusion head predicts acoustic VAE features. The diffusion head has 4 layers and about 40M parameters and is conditioned on hidden states from Qwen2.5-0.5B. It uses a Denoising Diffusion Probabilistic Models process with Classifier Free Guidance and DPM Solver style samplers, following the next token diffusion approach of the full VibeVoice system. Training proceeds in two stages. First, the acoustic tokenizer is pre trained. Then the tokenizer is frozen and the team trains the LLM along with the diffusion head with curriculum learning on sequence length, increasing from about 4k to 8,192 tokens. This keeps the tokenizer stable, while the LLM and diffusion head learn to map from text tokens to acoustic tokens across long contexts. Quality on LibriSpeech and SEED The VibeVoice Realtime reports zero shot performance on LibriSpeech test clean. VibeVoice Realtime 0.5B reaches word error rate (WER) 2.00 percent and speaker similarity 0.695. For comparison, VALL-E 2 has WER 2.40 with similarity 0.643 and Voicebox has WER 1.90 with similarity 0.662 on the same benchmark. On the SEED test benchmark for short utterances, VibeVoice Realtime-0.5B reaches WER 2.05 percent and speaker similarity 0.633. SparkTTS gets a slightly lower WER 1.98 but lower similarity 0.584, while Seed TTS reaches WER 2.25 and the highest reported similarity 0.762. The research team noted that the realtime model is optimized for long form robustness, so short sentence metrics are informative but not the main target. From an engineering point of view, the interesting part is the tradeoff. By running the acoustic tokenizer at 7.5 Hz and using next token diffusion, the model reduces the number of steps per second of audio compared to higher frame rate tokenizers, while preserving competitive WER and speaker similarity. Integration Pattern for Agents And Applications The recommended setup is to run VibeVoice-Realtime-0.5B next to a conversational LLM. The LLM streams tokens during generation. These text chunks feed directly into the VibeVoice server, which synthesizes audio in parallel and streams it back to the client. For many systems this looks like a small microservice. The TTS process has a fixed 8k context and about 10 minutes of audio budget per request, which fits typical agent dialogs, support calls and monitoring dashboards. Because the model is speech only and does not generate background ambience or music, it is better suited for voice interfaces, assistant style products and programmatic narration rather than media production. Key Takeaways Low latency streaming TTS: VibeVoice-Realtime-0.5B is a real time text to speech model that supports streaming text input and can emit the first audio frames in about 300 ms, which makes it suitable for interactive agents and live narration where users cannot tolerate 1 to 3 second delays. LLM along with diffusion over continuous speech tokens: The model follows the VibeVoice design, it uses a Qwen2.5 0.5B language model to process text context and dialogue flow, then a diffusion head operates on continuous acoustic tokens from a low frame rate tokenizer to generate waveform level detail, which scales better to long sequences than classic spectrogram based TTS. Around 1B total parameters with acoustic stack: While the base LLM has 0.5B parameters, the acoustic decoder has about 340M parameters and the diffusion head about 40M parameters, so the full realtime stack is roughly 1B parameters, which is important for GPU memory planning and deployment sizing. Competitive quality on LibriSpeech and SEED: On LibriSpeech test clean, VibeVoice-Realtime-0.5B reaches word error rate 2.00 percent and speaker similarity 0.695, and on SEED test en it reaches 2.05 percent WER and 0.633 similarity, which places it in the same quality band as strong recent TTS systems while still being tuned for long form robustness. Check out the Model Card on HF. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well. The post Microsoft AI Releases VibeVoice-Realtime: A Lightweight Real‑Time Text-to-Speech Model Supporting Streaming Text Input and Robust Long-Form Speech Generation appeared first on MarkTechPost.

Microsoft AI Releases VibeVoice-Realtime: A Lightweight Real‑Time Text-to-Speech Model Supporting Streaming Text Input and Robust Long-Form Speech Generation 投稿を読む »

AI, Committee, ニュース, Uncategorized

AI denial is becoming an enterprise risk: Why dismissing “slop” obscures real capability gains

admin NU / 12月 6, 2025

Three years ago, ChatGPT was born. It amazed the world and ignited unprecedented investment and excitement in AI. Today, ChatGPT is still a toddler, but public sentiment around the AI boom has turned sharply negative. The shift began when OpenAI released GPT-5 this summer to mixed reviews, mostly from casual users who, unsurprisingly, judged the system by its surface flaws rather than its underlying capabilities. Since then, pundits and influencers have declared that AI progress is slowing, that scaling has “hit the wall,” and that the entire field is just another tech bubble inflated by blusterous hype. In fact, many influencers have latched onto the dismissive phrase “AI slop” to diminish the amazing images, documents, videos and code that frontier AI models generate on command. This perspective is not just wrong, it is dangerous. It makes me wonder, where were all these “experts” on irrational technology bubbles when electric scooter startups were touted as a transportation revolution and cartoon NFTs were being auctioned for millions? They were probably too busy buying worthless land in the metaverse or adding to their positions in GameStop. But when it comes to the AI boom, which is easily the most significant technological and economic transformation agent of the last 25 years, journalists and influencers can’t write the word “slop” enough times. Doth we protest too much? After all, by any objective measure AI is wildly more capable than the vast majority of computer scientists predicted only five years ago and it is still improving at a surprising pace. The impressive leap demonstrated by Gemini 3 is only the latest example. At the same time, McKinsey recently reported that 20% of organizations already derive tangible value from genAI. Also, a recent survey by Deloitte indicates that 85% of organizations boosted their AI investment in 2025, and 91% plan to increase again in 2026. This doesn’t fit the “bubble” narrative and the dismissive “slop” language. As a computer scientist and research engineer who began working with neural networks back in 1989 and tracked progress through cold winters and hot booms ever since, I find myself amazed almost every day by the rapidly increasing capabilities of frontier AI models. When I talk with other professionals in the field, I hear similar sentiments. If anything, the rate of AI advancement leaves many experts feeling overwhelmed and frankly somewhat scared. The dangers of AI denial So why is the public buying into the narrative that AI is faltering, that the output is “slop,” and that the AI boom lacks authentic use cases? Personally, I believe it’s because we’ve fallen into a collective state of AI denial, latching onto the narratives we want to hear in the face of strong evidence to the contrary. Denial is the first stage of grief and thus a reasonable reaction to the very disturbing prospect that we humans may soon lose cognitive supremacy here on planet earth. In other words, the overblown AI bubble narrative is a societal defense mechanism. Believe me, I get it. I’ve been warning about the destabilizing risks and demoralizing impact of superintelligence for well over a decade, and I too feel AI is getting too smart too fast. The fact is, we are rapidly headed towards a future where widely available AI systems will be able to outperform most humans in most cognitive tasks, solving problems faster, more accurately and yes, more creatively than any individual can. I emphasize “creativity” because AI denialists often insist that certain human qualities (particularly creativity and emotional intelligence) will always be out of reach of AI systems. Unfortunately, there is little evidence supporting this perspective. On the creativity front, today’s AI models can generate content faster and with more variation than any individual human. Critics argue that true creativity requires inner motivation. I resonate with that argument but find it circular — we’re defining creativity based on how we experience it rather than the quality, originality or usefulness of the output. Also, we just don’t know if AI systems will develop internal drives or a sense of agency. Either way, if AI can produce original work that rivals most human professionals, the impact on creative jobs will still be quite devastating. The AI manipulation problem Our human edge around emotional intelligence is even more precarious. It’s likely that AI will soon be able to read our emotions faster and more accurately than any human, tracking subtle cues in our micro-expressions, vocal patterns, posture, gaze and even breathing. And as we integrate AI assistants into our phones, glasses and other wearable devices, these systems will monitor our emotional reactions throughout our day, building predictive models of our behaviors. Without strict regulation, which is increasingly unlikely, these predictive models could be used to target us with individually optimized influence that maximizes persuasion. This is called the AI manipulation problem and it suggests that emotional intelligence may not give humanity an advantage. In fact, it could be a significant weakness, fostering an asymmetric dynamic where AI systems can read us with superhuman accuracy, while we can’t read AI at all. When you talk with photorealistic AI agents (and you will) you’ll see a smiling façade designed to appear warm, empathic and trustworthy. It will look and feel human, but that’s just an illusion, and it could easily sway your perspectives. After all, our emotional reactions to faces are visceral reflexes shaped by millions of years of evolution on a planet where every interactive human face we encountered was actually human. Soon, that will no longer be true. We are rapidly heading toward a world where many of the faces we encounter will belong to AI agents hiding behind digital facades. In fact, these “virtual spokespeople” could easily have appearances that are designed for each of us based on our prior reactions – whatever gets us to best let down our guard. And yet many insist that AI is just another tech cycle. This is wishful thinking. The massive investment pouring into AI isn’t driven by

AI denial is becoming an enterprise risk: Why dismissing “slop” obscures real capability gains 投稿を読む »

AI, Committee, ニュース, Uncategorized

The Download: political chatbot persuasion, and gene editing adverts

admin NU / 12月 6, 2025

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. AI chatbots can sway voters better than political advertisements The news: Chatting with a politically biased AI model is more effective than political ads at nudging both Democrats and Republicans to support presidential candidates of the opposing party, new research shows. The catch: The chatbots swayed opinions by citing facts and evidence, but they were not always accurate—in fact, the researchers found, the most persuasive models said the most untrue things. The findings are the latest in an emerging body of research demonstrating the persuasive power of LLMs. They raise profound questions about how generative AI could reshape elections. Read the full story. —Michelle Kim The era of AI persuasion in elections is about to begin —Tal Feldman is a JD candidate at Yale Law School who focuses on technology and national security. Aneesh Pappu is a PhD student and Knight-Hennessy scholar at Stanford University who focuses on agentic AI and technology policy. The fear that elections could be overwhelmed by AI-generated realistic fake media has gone mainstream—and for good reason. But that’s only half the story. The deeper threat isn’t that AI can just imitate people—it’s that it can actively persuade people. And new research published this week shows just how powerful that persuasion can be. AI chatbots can shift voters’ views by a substantial margin, far more than traditional political advertising tends to do. In the coming years, we will see the rise of AI that can personalize arguments, test what works, and quietly reshape political views at scale. That shift—from imitation to active persuasion—should worry us deeply. Read the full story. The ads that sell the sizzle of genetic trait discrimination —Antonio Regalado, senior editor for biomedicine One day this fall, I watched an electronic sign outside the Broadway-Lafayette subway station in Manhattan switch seamlessly between an ad for makeup and one promoting the website Pickyourbaby.com, which promises a way for potential parents to use genetic tests to influence their baby’s traits, including eye color, hair color, and IQ. Inside the station, every surface was wrapped with more of its ads—babies on turnstiles, on staircases, on banners overhead. “Think about it. Makeup and then genetic optimization,” exulted Kian Sadeghi, the 26-year-old founder of Nucleus Genomics, the startup running the ads. The day after the campaign launched, Sadeghi and I had briefly sparred online. He’d been on X showing off a phone app where parents can click through traits like eye color and hair color. I snapped back that all this sounded a lot like Uber Eats—another crappy, frictionless future invented by entrepreneurs, but this time you’d click for a baby. That night, I agreed to meet Sadeghi in the station under a banner that read, “IQ is 50% genetic.” Read on to see how Antonio’s conversation with Sadeghi went. This story first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here. The must-reads I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology. 1 The metaverse’s future looks murkier than everOG believer Mark Zuckerberg is planning deep cuts to the division’s budget. (Bloomberg $)+ However some of that money will be diverted toward smart glasses and wearables. (NYT $)+ Meta just managed to poach one of Apple’s top design chiefs. (Bloomberg $) 2 Kids are effectively AI’s guinea pigsAnd regulators are slowly starting to take note of the risks. (The Economist $)+ You need to talk to your kid about AI. Here are 6 things you should say. (MIT Technology Review) 3 How a group of women changed UK law on non-consensual deepfakesIt’s a big victory, and they managed to secure it with stunning speed. (The Guardian)+ But bans on deepfakes take us only so far—here’s what else we need. (MIT Technology Review)+ An AI image generator startup just leaked a huge trove of nude images. (Wired $) 4 OpenAI is acquiring an AI model training startupIts researchers have been impressed by the monitoring and de-bugging tools built by Neptune. (NBC)+ It’s not just you: the speed of AI deal-making really is accelerating. (NYT $) 5 Russia has blocked Apple’s FaceTime video calling featureIt seems the Kremlin views any platform it doesn’t control as dangerous. (Reuters $)+ How Russia killed its tech industry. (MIT Technology Review) 6 The trouble with AI browsersThis reviewer tested five of them and found them to be far more effort than they’re worth. (The Verge $)+ AI means the end of internet search as we’ve known it. (MIT Technology Review) 7 An anti-AI activist has disappeared Sam Kirchner went AWOL after failing to show up at a scheduled court hearing, and friends are worried. (The Atlantic$) 8 Taiwanese chip workers are creating a community in the Arizona desertA TSMC project to build chip factories is rapidly transforming this corner of the US. (NYT $) 9 This hearing aid has become a status symbol Rich people with hearing issues swear by a product made by startup Fortell. (Wired $)+ Apple AirPods can be a gateway hearing aid. (MIT Technology Review) 10 A plane crashed after one of its 3D-printed parts melted Just because you can do something, that doesn’t mean you should. (BBC) Quote of the day “Some people claim we can scale up current technology and get to general intelligence…I think that’s bullshit, if you’ll pardon my French.” —AI researcher Yann LeCun explains why he’s leaving Meta to set up a world-model startup, Sifted reports. One more thing ILLUSTRATION SOURCES: NATIONAL HUMAN GENOME RESEARCH INSTITUTE What to expect when you’re expecting an extra X or Y chromosome Sex chromosome variations, in which people have a surplus or missing X or Y, occur in as many as one in 400 births. Yet the majority of people affected don’t even know they have them, because these conditions can fly under the radar. As more expectant parents opt for noninvasive prenatal testing in hopes of ruling out serious conditions, many of them are surprised to discover instead that their fetus has a far less severe—but far less well-known—condition. And because so many

The Download: political chatbot persuasion, and gene editing adverts 投稿を読む »

AI, Committee, ニュース, Uncategorized

Harnessing human-AI collaboration for an AI roadmap that moves beyond pilots

admin NU / 12月 6, 2025

The past year has marked a turning point in the corporate AI conversation. After a period of eager experimentation, organizations are now confronting a more complex reality: While investment in AI has never been higher, the path from pilot to production remains elusive. Three-quarters of enterprises remain stuck in experimentation mode, despite mounting pressure to convert early tests into operational gains. WATCH THE WEBCAST “Most organizations can suffer from what we like to call PTSD, or process technology skills and data challenges,” says Shirley Hung, partner at Everest Group. “They have rigid, fragmented workflows that don’t adapt well to change, technology systems that don’t speak to each other, talent that is really immersed in low-value tasks rather than creating high impact. And they are buried in endless streams of information, but no unified fabric to tie it all together.” The central challenge, then, lies in rethinking how people, processes, and technology work together. Across industries as different as customer experience and agricultural equipment, the same pattern is emerging: Traditional organizational structures—centralized decision-making, fragmented workflows, data spread across incompatible systems—are proving too rigid to support agentic AI. To unlock value, leaders must rethink how decisions are made, how work is executed, and what humans should uniquely contribute. “It is very important that humans continue to verify the content. And that is where you’re going to see more energy being put into,” Ryan Peterson, EVP and chief product officer at Concentrix. Much of the conversation centered on what can be described as the next major unlock: operationalizing human-AI collaboration. Rather than positioning AI as a standalone tool or a “virtual worker,” this approach reframes AI as a system-level capability that augments human judgment, accelerates execution, and reimagines work from end to end. That shift requires organizations to map the value they want to create; design workflows that blend human oversight with AI-driven automation; and build the data, governance, and security foundations that make these systems trustworthy. “My advice would be to expect some delays because you need to make sure you secure the data,” says Heidi Hough, VP for North America aftermarket at Valmont. “As you think about commercializing or operationalizing any piece of using AI, if you start from ground zero and have governance at the forefront, I think that will help with outcomes.” Early adopters are already showing what this looks like in practice: starting with low-risk operational use cases, shaping data into tightly scoped enclaves, embedding governance into everyday decision-making, and empowering business leaders, not just technologists, to identify where AI can create measurable impact. The result is a new blueprint for AI maturity grounded in reengineering how modern enterprises operate. “Optimization is really about doing existing things better, but reimagination is about discovering entirely new things that are worth doing,” says Hung. Watch the webcast. This webcast is produced in partnership with Concentrix. This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Harnessing human-AI collaboration for an AI roadmap that moves beyond pilots 投稿を読む »

AI, Committee, ニュース, Uncategorized

Apple Researchers Release CLaRa: A Continuous Latent Reasoning Framework for Compression‑Native RAG with 16x–128x Semantic Document Compression

admin NU / 12月 6, 2025

How do you keep RAG systems accurate and efficient when every query tries to stuff thousands of tokens into the context window and the retriever and generator are still optimized as 2 separate, disconnected systems? A team of researchers from Apple and University of Edinburgh released CLaRa, Continuous Latent Reasoning, (CLaRa-7B-Base, CLaRa-7B-Instruct and CLaRa-7B-E2E) a retrieval augmented generation framework that compresses documents into continuous memory tokens and then performs both retrieval and generation in that shared latent space. The goal is simple. Shorten context, avoid double encoding, and let the generator teach the retriever what actually matters for downstream answers. https://arxiv.org/pdf/2511.18659 From raw documents to continuous memory tokens CLaRa starts with a semantic compressor that attaches a small number of learned memory tokens to each document. During Salient Compressor Pretraining, SCP, the base model is a Mistral 7B style transformer with LoRA adapters that switch between a compressor role and a generator role. The final layer hidden states of the memory tokens become the compressed representation for that document. SCP is trained on about 2M passages from Wikipedia 2021. A local Qwen-32B model generates 3 supervision signals for each passage. Simple QA pairs cover atomic facts. Complex QA pairs connect several facts in one question to enforce multi hop reasoning. Paraphrases reorder and compress the text while preserving semantics. A verification loop checks factual consistency and coverage and can regenerate missing questions or paraphrases for up to 10 rounds before accepting a sample. Training uses 2 losses. A cross entropy term trains the generator to answer questions or produce paraphrases conditioned only on the memory tokens and an instruction prefix. A mean squared error term aligns the average hidden state of document tokens with the average hidden state of the memory tokens. The MSE loss gives modest but consistent gains of about 0.3 to 0.6 F1 points at compression ratios 32 and 128 and keeps compressed and original representations in the same semantic region. https://arxiv.org/pdf/2511.18659 Joint retrieval and generation in a shared space After offline compression, each document is represented only by its memory tokens. CLaRa then trains a query reasoner and an answer generator on top of the same backbone. The query reasoner is another LoRA adapter that maps an input question into the same number of memory tokens used for documents. Retrieval becomes pure embedding search. The system computes cosine similarity between the query embedding and each candidate document embedding. The best compressed document embeddings for a query are concatenated with the query tokens and fed into the generator adapter. Training uses only a standard next token prediction loss on the final answer. There are no explicit relevance labels. The key trick is a differentiable top k selector implemented with a Straight Through estimator. During the forward pass the model uses hard top k selection. During the backward pass a softmax distribution over document scores allows gradients from the generator to flow into the query reasoner parameters. The research team shows 2 effects in the gradient analysis. First, the retriever is encouraged to assign higher probability to documents that increase answer likelihood. Second, because retrieval and generation share the same compressed representations, generator gradients reshape the latent document space to make it easier to reason over. Logit lens analysis of the query embeddings recovers topic tokens such as “NFL” and “Oklahoma” for a question about the nephew of Ivory Lee Brown, even though those tokens are not in the raw query but are present in the supporting articles. https://arxiv.org/pdf/2511.18659 Compression quality and QA accuracy The compressor is evaluated on 4 QA datasets: Natural Questions, HotpotQA, MuSiQue and 2WikiMultihopQA. Under the Normal setting, where the system retrieves the top 5 Wikipedia 2021 documents per query, SCP-Mistral-7B at 4 times compression reaches an average F1 of 39.86. This is 5.37 points better than the hard compression baseline LLMLingua 2 and 1.13 points better than the best soft compression baseline PISCO. Under the Oracle setting, where the gold document is guaranteed to be in the candidate set, SCP-Mistral-7B at 4 times compression reaches an average F1 of 66.76. That is 17.31 points above LLMLingua-2 and 5.35 points above PISCO. Even more interesting, the compressed representations outperform a BGE based text retriever plus full document Mistral-7B generator by about 2.36 average F1 points for Mistral and about 6.36 points for Phi 4 mini. Well trained soft compression can exceed full text RAG while cutting context length by factors from 4 to 128. https://arxiv.org/pdf/2511.18659 The performance at very high compression ratios, above 32 in Oracle, does drop, but the decline remains moderate in Normal retrieval conditions. The key explanation as per the research team is, weak document relevance bottlenecks the system before compression quality does. End to end QA and retrieval behavior For end to end QA, CLaRa uses 20 candidate documents per query with compression ratios 4, 16 and 32. On the Normal setting, CLaRa-Mistral-7B with instruction initialized weights and 16 times compression reaches F1 equal to 50.89 on Natural Questions and 44.66 on 2WikiMultihopQA. This is comparable to DRO-Mistral-7B, which reads full uncompressed text, while using 16 times shorter document representations. On some datasets, CLaRa at 16 times compression slightly improves F1 over DRO, for example from 43.65 to 47.18 on 2Wiki. In the Oracle setting, CLaRa-Mistral-7B exceeds 75, F1 on both Natural Questions and HotpotQA at 4 times compression. This shows that the generator can fully exploit accurate retrieval even when all evidence is stored only in compressed memory tokens. Instruction initialized CLaRa generally wins over pre-training initialized CLaRa in the Normal setting, while the gap narrows in Oracle, where retrieval noise is limited. On the retrieval side, CLaRa used as a reranker under Oracle conditions delivers strong Recall at 5. With pretraining initialization at compression 4 on HotpotQA, CLaRa-Mistral-7B reaches Recall at 5 equal to 96.21. This beats the supervised BGE Reranker baseline at 85.93 by 10.28 points and even outperforms a fully supervised Sup Instruct retriever trained with contrastive relevance labels. https://arxiv.org/pdf/2511.18659 What Apple has released? Apple’s

Apple Researchers Release CLaRa: A Continuous Latent Reasoning Framework for Compression‑Native RAG with 16x–128x Semantic Document Compression 投稿を読む »

AI, Committee, ニュース, Uncategorized

Kernel Principal Component Analysis (PCA): Explained with an Example

admin NU / 12月 6, 2025

Dimensionality reduction techniques like PCA work wonderfully when datasets are linearly separable—but they break down the moment nonlinear patterns appear. That’s exactly what happens with datasets such as two moons: PCA flattens the structure and mixes the classes together. Kernel PCA fixes this limitation by mapping the data into a higher-dimensional feature space where nonlinear patterns become linearly separable. In this article, we’ll walk through how Kernel PCA works and use a simple example to visually compare PCA vs. Kernel PCA, showing how a nonlinear dataset that PCA fails to separate becomes perfectly separable after applying Kernel PCA. What is PCA and how is it different from Kernel PCA? Principal Component Analysis (PCA) is a linear dimensionality-reduction technique that identifies the directions (principal components) along which the data varies the most. It works by computing orthogonal linear combinations of the original features and projecting the dataset onto the directions of maximum variance. These components are uncorrelated and ordered so that the first few capture most of the information in the data. PCA is powerful, but it comes with one important limitation: it can only uncover linear relationships in the data. When applied to nonlinear datasets—like the “two moons” example—it often fails to separate the underlying structure. Kernel PCA extends PCA to handle nonlinear relationships. Instead of directly applying PCA in the original feature space, Kernel PCA first uses a kernel function (such as RBF, polynomial, or sigmoid) to implicitly project the data into a higher-dimensional feature space where the nonlinear structure becomes linearly separable. PCA is then performed in this transformed space using a kernel matrix, without explicitly computing the higher-dimensional projection. This “kernel trick” allows Kernel PCA to capture complex patterns that standard PCA cannot. We will now create a dataset that is nonlinear and then apply PCA to the dataset. Code Implementation Generating the dataset We generate a nonlinear “two moons” dataset using make_moons, which is ideal for demonstrating why PCA fails and Kernel PCA succeeds. Copy CodeCopiedUse a different Browser import matplotlib.pyplot as plt from sklearn.datasets import make_moons X, y = make_moons(n_samples=1000, noise=0.02, random_state=123) plt.scatter(X[:, 0], X[:, 1], c=y) plt.show() Applying PCA on the dataset Copy CodeCopiedUse a different Browser from sklearn.decomposition import PCA pca = PCA(n_components=2) X_pca = pca.fit_transform(X) plt.title(“PCA”) plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y) plt.xlabel(“Component 1”) plt.ylabel(“Component 2”) plt.show() The PCA visualization shows that the two moon-shaped clusters remain intertwined even after dimensionality reduction. This happens because PCA is a strictly linear technique—it can only rotate, scale, or flatten the data along straight directions of maximum variance. Since the “two moons” dataset has a nonlinear structure, PCA is unable to separate the classes or untangle the curved shapes. As a result, the transformed data still looks almost identical to the original pattern, and the two classes remain overlapped in the projected space. Applying Kernel PCA on the dataset We now apply Kernel PCA using an RBF kernel, which maps the nonlinear data into a higher-dimensional space where it becomes linearly separable. In the kernel space the two classes in our dataset are linearly separable. Kernel PCA uses a kernel function to project the dataset into a higher-dimensional space, where it is linearly separable. Copy CodeCopiedUse a different Browser from sklearn.decomposition import KernelPCA kpca = KernelPCA(kernel=’rbf’, gamma=15) X_kpca = kpca.fit_transform(X) plt.title(“Kernel PCA”) plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=y) plt.show() The goal of PCA (and dimensionality reduction in general) is not just to compress the data—it’s to reveal the underlying structure in a way that preserves meaningful variation. In nonlinear datasets like the two-moons example, traditional PCA cannot “unfold” the curved shapes because it only applies linear transformations. Kernel PCA, however, performs a nonlinear mapping before applying PCA, allowing the algorithm to untangle the moons into two clearly separated clusters. This separation is valuable because it makes downstream tasks like visualization, clustering, and even classification far more effective. When the data becomes linearly separable after transformation, simple models—such as linear classifiers—can successfully distinguish between the classes, something that would be impossible in the original or PCA-transformed space. Challenges involved with Kernel PCA While Kernel PCA is powerful for handling nonlinear datasets, it comes with several practical challenges. The biggest drawback is computational cost—because it relies on computing pairwise similarities between all data points, the algorithm has O(n²) time and memory complexity, making it slow and memory-heavy for large datasets. Another challenge is model selection: choosing the right kernel (RBF, polynomial, etc.) and tuning parameters like gamma can be tricky and often requires experimentation or domain expertise. Kernel PCA can also be harder to interpret, since the transformed components no longer correspond to intuitive directions in the original feature space. Finally, it is sensitive to missing values and outliers, which can distort the kernel matrix and degrade performance. Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well. The post Kernel Principal Component Analysis (PCA): Explained with an Example appeared first on MarkTechPost.

Kernel Principal Component Analysis (PCA): Explained with an Example 投稿を読む »

AI, Committee, ニュース, Uncategorized

Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case

admin NU / 12月 5, 2025

arXiv:2512.04834v1 Announce Type: cross Abstract: Large Language Models (LLMs) have become a key topic in AI and NLP, transforming sectors like healthcare, finance, education, and marketing by improving customer service, automating tasks, providing insights, improving diagnostics, and personalizing learning experiences. Information extraction from clinical records is a crucial task in digital healthcare. Although traditional NLP techniques have been used for this in the past, they often fall short due to the complexity, variability of clinical language, and high inner semantics in the free clinical text. Recently, Large Language Models (LLMs) have become a powerful tool for better understanding and generating human-like text, making them highly effective in this area. In this paper, we explore the ability of open-source multilingual LLMs to understand EHRs (Electronic Health Records) in Italian and help extract information from them in real-time. Our detailed experimental campaign on comorbidity extraction from EHR reveals that some LLMs struggle in zero-shot, on-premises settings, and others show significant variation in performance, struggling to generalize across various diseases when compared to native pattern matching and manual annotations.

Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case 投稿を読む »

AI, Committee, ニュース, Uncategorized

LexGenius: An Expert-Level Benchmark for Large Language Models in Legal General Intelligence

admin NU / 12月 5, 2025

arXiv:2512.04578v1 Announce Type: new Abstract: Legal general intelligence (GI) refers to artificial intelligence (AI) that encompasses legal understanding, reasoning, and decision-making, simulating the expertise of legal experts across domains. However, existing benchmarks are result-oriented and fail to systematically evaluate the legal intelligence of large language models (LLMs), hindering the development of legal GI. To address this, we propose LexGenius, an expert-level Chinese legal benchmark for evaluating legal GI in LLMs. It follows a Dimension-Task-Ability framework, covering seven dimensions, eleven tasks, and twenty abilities. We use the recent legal cases and exam questions to create multiple-choice questions with a combination of manual and LLM reviews to reduce data leakage risks, ensuring accuracy and reliability through multiple rounds of checks. We evaluate 12 state-of-the-art LLMs using LexGenius and conduct an in-depth analysis. We find significant disparities across legal intelligence abilities for LLMs, with even the best LLMs lagging behind human legal professionals. We believe LexGenius can assess the legal intelligence abilities of LLMs and enhance legal GI development. Our project is available at https://github.com/QwenQKing/LexGenius.

LexGenius: An Expert-Level Benchmark for Large Language Models in Legal General Intelligence 投稿を読む »

AI, Committee, ニュース, Uncategorized

Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR

admin NU / 12月 5, 2025

arXiv:2509.23808v3 Announce Type: replace-cross Abstract: A prevailing view in Reinforcement Learning with Verifiable Rewards (RLVR) interprets recent progress through the lens of an exploration-exploitation trade-off, a perspective largely shaped by token-level metrics. We re-examine this perspective, proposing that this perceived trade-off may not be a fundamental constraint but rather an artifact of the measurement level. To investigate this, we shift the analysis to the semantically rich hidden-state space, adopting Effective Rank (ER) to quantify exploration and proposing its novel first- and second-order derivatives, named ER Velocity and ER Acceleration, to capture exploitation dynamics. Our analysis reveals that in the semantic space, exploration and exploitation could be decoupled (Sec.~4). This finding reveals an opportunity to enhance both capacities simultaneously. This insight motivates our method, Velocity-Exploiting Rank-Learning (VERL), the first to operationalize the principle of synergistic exploration-exploitation enhancement by directly shaping the RL advantage function. The key innovation is leveraging the theoretically stable ERA as a predictive meta-controller to create a synergistic, dual-channel incentive structure. Instead of forcing a trade-off, VERL prospectively amplifies rewards for exploration to preempt overconfidence and reinforces exploitative gains to consolidate reasoning. Experiments across diverse LLMs and reasoning benchmarks show consistent gains, including up to 21.4% absolute accuracy improvement on the challenging Gaokao 2024 dataset.

Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR 投稿を読む »

AI, Committee, ニュース, Uncategorized

Semantic Mastery: Enhancing LLMs with Advanced Natural Language Understanding

admin NU / 12月 5, 2025

arXiv:2504.00409v2 Announce Type: replace Abstract: Large language models (LLMs) have greatly improved their capability in performing NLP tasks. However, deeper semantic understanding, contextual coherence, and more subtle reasoning are still difficult to obtain. The paper discusses state-of-the-art methodologies that advance LLMs with more advanced NLU techniques, such as semantic parsing, knowledge integration, and contextual reinforcement learning. We analyze the use of structured knowledge graphs, retrieval-augmented generation (RAG), and fine-tuning strategies that match models with human-level understanding. Furthermore, we address the incorporation of transformer-based architectures, contrastive learning, and hybrid symbolic-neural methods that address problems like hallucinations, ambiguity, and inconsistency in the factual perspectives involved in performing complex NLP tasks, such as question-answering text summarization and dialogue generation. Our findings show the importance of semantic precision for enhancing AI-driven language systems and suggest future research directions to bridge the gap between statistical language models and true natural language understanding.

Semantic Mastery: Enhancing LLMs with Advanced Natural Language Understanding 投稿を読む »

ニュース

Microsoft AI Releases VibeVoice-Realtime: A Lightweight Real‑Time Text-to-Speech Model Supporting Streaming Text Input and Robust Long-Form Speech Generation

AI denial is becoming an enterprise risk: Why dismissing “slop” obscures real capability gains

The Download: political chatbot persuasion, and gene editing adverts

Harnessing human-AI collaboration for an AI roadmap that moves beyond pilots

Apple Researchers Release CLaRa: A Continuous Latent Reasoning Framework for Compression‑Native RAG with 16x–128x Semantic Document Compression

Kernel Principal Component Analysis (PCA): Explained with an Example

Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case

LexGenius: An Expert-Level Benchmark for Large Language Models in Legal General Intelligence

Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR

Semantic Mastery: Enhancing LLMs with Advanced Natural Language Understanding

私たちのサービス

ホーム

仕組み

ニュース

料金

サポート

ヘルプセンター

問題を報告

フィードバックを送る

プライバシーポリシー

ユーザーアカウント

フォローする