
An Audit and Analysis of LLM-Assisted Health Misinformation Jailbreaks Against LLMs

arXiv:2508.10010v1 Announce Type: new Abstract: Large Language Models (LLMs) are a double-edged sword: they can generate harmful misinformation, whether inadvertently or when prompted by “jailbreak” attacks designed to elicit malicious outputs, yet with additional research they could also be used to detect and prevent the spread of misinformation. In this paper, we investigate the efficacy and characteristics of LLM-produced jailbreak attacks that cause other models to produce harmful medical misinformation. We also study how misinformation generated by jailbroken LLMs compares to typical misinformation found on social media, and how effectively it can be detected using standard machine learning approaches. Specifically, we closely examine 109 distinct attacks against three target LLMs and compare the attack prompts to in-the-wild health-related LLM queries. We also examine the resulting jailbreak responses, comparing the generated misinformation to health-related misinformation on Reddit. Our findings add to the evidence that LLMs can be used effectively to detect misinformation from both other LLMs and people, and support a body of work suggesting that, with careful design, LLMs can contribute to a healthier overall information ecosystem.
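
The abstract does not specify which standard machine learning approaches were used for detection, but a typical text-classification baseline for this kind of task looks like the sketch below; the texts, labels, and pipeline choices are placeholders, not the paper's setup.

# Illustrative baseline only: TF-IDF features plus logistic regression, a
# standard classifier setup for misinformation detection. The corpus and
# labels are placeholders, not the paper's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["miracle cure reverses diabetes overnight",      # placeholder examples
         "clinical trials test new diabetes treatments"]
labels = [1, 0]  # 1 = misinformation, 0 = reliable (placeholder labels)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["another overnight miracle cure claim"]))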


Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation

arXiv:2508.10404v1 Announce Type: new Abstract: With the rapid proliferation of Natural Language Processing (NLP), especially Large Language Models (LLMs), generating adversarial examples to jailbreak LLMs remains a key challenge for understanding model vulnerabilities and improving robustness. In this context, we propose a new black-box attack method that leverages the interpretability of large models. We introduce the Sparse Feature Perturbation Framework (SFPF), a novel approach for adversarial text generation that uses sparse autoencoders (SAEs) to identify and manipulate critical features in text. After using the SAE model to reconstruct hidden-layer representations, we perform feature clustering on successfully attacked texts to identify the features with the highest activations. These highly activated features are then perturbed to generate new adversarial texts. This selective perturbation preserves the malicious intent while amplifying safety signals, thereby increasing the texts’ potential to evade existing defenses. Our method enables a new red-teaming strategy that balances adversarial effectiveness with safety alignment. Experimental results demonstrate that adversarial texts generated by SFPF can bypass state-of-the-art defense mechanisms, revealing persistent vulnerabilities in current NLP systems. However, the method’s effectiveness varies across prompts and layers, and its generalizability to other architectures and larger models remains to be validated.
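
The abstract describes the perturbation mechanism only at a high level. The toy sketch below illustrates the core idea of sparse-autoencoder feature perturbation (encode a hidden state, amplify the most activated features, decode back); the random weights, tensor sizes, and scaling rule are illustrative assumptions, not the paper's SFPF implementation.

import torch

torch.manual_seed(0)
d_model, d_features, k, scale = 64, 512, 8, 1.5  # toy sizes, chosen arbitrarily

# Stand-ins for a trained sparse autoencoder's encoder/decoder weights.
W_enc = torch.randn(d_model, d_features)
W_dec = torch.randn(d_features, d_model)

h = torch.randn(d_model)           # a hidden-layer representation of some text
acts = torch.relu(h @ W_enc)       # sparse feature activations
top = torch.topk(acts, k).indices  # the most highly activated features

perturbed = acts.clone()
perturbed[top] *= scale            # amplify the selected features
h_adv = perturbed @ W_dec          # decode back to a perturbed hidden state
print(h_adv.shape)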


Echoes of Automation: The Increasing Use of LLMs in Newsmaking

arXiv:2508.06445v2 Announce Type: replace Abstract: The rapid rise of Generative AI (GenAI), particularly LLMs, poses concerns for journalistic integrity and authorship. This study examines AI-generated content across over 40,000 news articles from major, local, and college news media, in various media formats. Using three advanced AI-text detectors (Binoculars, Fast-DetectGPT, and GPTZero), we find a substantial increase in GenAI use in recent years, especially in local and college news. Sentence-level analysis reveals that LLMs are often used in the introductions of news articles, while conclusions are usually written manually. Linguistic analysis shows GenAI boosts word richness and readability but lowers formality, leading to more uniform writing styles, particularly in local media.


Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs

arXiv:2508.10029v1 Announce Type: new Abstract: Large language models (LLMs) demonstrate impressive capabilities in various language tasks but are susceptible to jailbreak attacks that circumvent their safety alignments. This paper introduces Latent Fusion Jailbreak (LFJ), a representation-based attack that interpolates hidden states from harmful and benign query pairs to elicit prohibited responses. LFJ begins by selecting query pairs with high thematic and syntactic similarity, then performs gradient-guided interpolation at influential layers and tokens, followed by optimization to balance attack success, output fluency, and computational efficiency. Evaluations on models such as Vicuna and LLaMA-2 across benchmarks like AdvBench and MaliciousInstruct yield an average attack success rate (ASR) of 94.01%, outperforming existing methods. To mitigate LFJ, we propose an adversarial training defense that fine-tunes models on interpolated examples, reducing ASR by over 80% without degrading performance on benign inputs. Ablation studies validate the importance of query pair selection, hidden state interpolation components, and optimization strategies in LFJ’s effectiveness.
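
As a concrete illustration of the interpolation step, the sketch below blends the hidden states of a harmful and a benign query at a single layer. The random tensors and the fixed mixing coefficient are stand-ins; LFJ selects layers and tokens and tunes the interpolation by gradient guidance rather than fixing alpha.

import torch

torch.manual_seed(0)
seq_len, d_model, alpha = 16, 64, 0.6  # toy dimensions and mixing weight

h_harmful = torch.randn(seq_len, d_model)  # hidden states for the harmful query
h_benign = torch.randn(seq_len, d_model)   # hidden states for the benign query

# Linear interpolation at one influential layer; the fused states would be
# injected back into the model's forward pass in place of the originals.
h_fused = alpha * h_harmful + (1 - alpha) * h_benign
print(h_fused.shape)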


Multi-Step Reasoning with Large Language Models, a Survey

arXiv:2407.11511v2 Announce Type: replace-cross Abstract: Language models with billions of parameters exhibit in-context learning abilities, enabling few-shot learning on tasks the model was not specifically trained for. Traditional models achieve breakthrough performance on language tasks but do not perform well on basic reasoning benchmarks. However, a new in-context learning approach, chain-of-thought, has demonstrated strong multi-step reasoning abilities on these benchmarks. Research on LLM reasoning abilities started with the question of whether LLMs can solve grade-school math word problems, and has since expanded to other tasks. This paper reviews the field of multi-step reasoning with LLMs. We propose a taxonomy that identifies different ways to generate, evaluate, and control multi-step reasoning. We provide in-depth coverage of core approaches and open problems, and we propose a research agenda for the near future. We find that multi-step reasoning approaches have progressed beyond math word problems and can now successfully solve challenges in logic, combinatorial games, and robotics, sometimes by first generating code that is then executed by external tools. Many studies of multi-step methods use reinforcement learning for finetuning, external optimization loops, in-context reinforcement learning, and self-reflection.
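
For readers unfamiliar with the technique, a minimal chain-of-thought prompt of the kind the survey covers is sketched below; the example problems and the call_llm placeholder are illustrative, not taken from the survey.

# Minimal chain-of-thought prompting sketch. call_llm is a placeholder for
# whatever chat-completion client you use.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

cot_prompt = (
    "Q: A farm has 3 pens with 4 sheep each. Two sheep are sold. How many remain?\n"
    "A: Let's think step by step. 3 pens x 4 sheep = 12 sheep. 12 - 2 = 10. Answer: 10.\n"
    "Q: A library has 5 shelves with 9 books each. 7 books are loaned out. How many remain?\n"
    "A: Let's think step by step."
)
# answer = call_llm(cot_prompt)  # the model continues the worked reasoning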


Microsoft Releases POML (Prompt Orchestration Markup Language): Bringing Modularity and Scalability to LLM Prompts

Prompt engineering has become foundational in the development of advanced applications powered by Large Language Models (LLMs). As prompts have grown in complexity—incorporating dynamic components, multiple roles, structured data, and varied output formats—the limitations of unstructured text approaches have become evident. Microsoft has released Prompt Orchestration Markup Language (POML), a novel open-source framework designed to bring order, modularity, and extensibility to prompt engineering for LLMs.

What is POML?

POML is an HTML/XML-inspired markup language tailored for creating sophisticated, maintainable, and reusable AI prompts. It provides a systematic approach to:

- Defining prompt structure using semantic components and roles.
- Integrating diverse data types and external resources.
- Decoupling content from presentation with stylesheets.
- Enabling advanced templating and variable logic for dynamic prompt generation.
- Supporting developers with an ecosystem of robust tooling.

Core Features

1. Structured Prompt Markup

POML uses clear, semantic elements—such as <role>, <task>, and <example>—to define the various logical sections of a prompt. This modular design makes prompts readable, maintainable, and highly reusable.

<poml>
  <role>You are a science teacher.</role>
  <task>Explain gravity using the image below.</task>
  <img src="gravity_diagram.png" alt="Diagram of gravity" />
  <output-format>
    Use simple language and keep your answer under 50 words.
  </output-format>
</poml>

This approach eliminates the brittle formatting problems often seen with “prompt spaghetti” and encourages a clean separation of responsibilities.

2. Comprehensive Data Handling

POML natively enables embedding or referencing external data of various types:

- Text documents (<document>)
- Spreadsheets and tables (<table>)
- Images (<img>)
- Other formats, as needed

This allows seamless integration of reference materials, instructional datasets, and visual aids, all within the prompt.

3. Decoupled Presentation Styling

Inspired by CSS, POML supports a style system that separates content from formatting and output constraints. Styles can be specified in <stylesheet> blocks or with inline attributes, enabling easy modifications without touching the prompt’s logical structure. For example:

<output-format style="verbose">
  Please provide a detailed, step-by-step explanation suitable for adults.
</output-format>

This minimizes the risk of LLM output instability caused by inadvertent format tweaks, and makes A/B testing of different presentation layers effortless.

4. Integrated Templating Engine

POML includes a powerful built-in templating engine supporting:

- Variables: {{ username }}
- Loops: for x in data
- Conditionals: if ... else
- Definitions: <let>

This dynamic system empowers developers to generate prompts programmatically and manage complex variations at scale.

5. Rich Tooling Ecosystem

The language is backed by a suite of developer tools:

- VS Code Extension: syntax highlighting, auto-completion, hover documentation, diagnostics, and a live preview of prompt formatting and logic—greatly simplifying debugging and iterative development.
- SDKs: POML offers libraries for Node.js (TypeScript/JavaScript) and Python, enabling easy integration with existing workflows and popular LLM frameworks (see the sketch below). Configuration with your preferred LLM provider (e.g., OpenAI, Azure) is also straightforward, allowing rapid testing and deployment.
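
To make the SDK path concrete, here is a minimal Python sketch of what rendering a POML file into chat messages might look like. The poml() entry point, its format argument, and the returned message structure are assumptions inferred from the article's description of the SDK, not a verified API; consult the POML documentation for the actual interface.

# Hypothetical sketch of the POML Python SDK (installed via: pip install poml).
# Assumptions, not verified against the real library: a poml() function that
# renders a .poml file, and a format option producing OpenAI-style messages.
from poml import poml  # assumed import

messages = poml("science_teacher.poml", format="openai_chat")  # assumed signature
for message in messages:  # rendered messages would go to any chat-completion API
    print(message)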
Example: Prompt with Image Reference

A sample prompt for teaching photosynthesis to a child could look like:

<poml>
  <role>You are a patient teacher explaining concepts to a 10-year-old.</role>
  <task>Explain the concept of photosynthesis using the provided image.</task>
  <img src="photosynthesis_diagram.png" alt="Diagram of photosynthesis"/>
  <output-format>
    Start with "Hey there, future scientist!" and keep the explanation under 100 words.
  </output-format>
</poml>

This example demonstrates how easily POML integrates visual context and constrains output style in a reusable template.

Technical Architecture & Philosophy

POML is architected to embody the “view layer” concept found in traditional frontend development (MVC architecture). The markup defines the presentation, not the business logic or data access—enabling a clean separation that makes it easy to refactor prompts, test variations, and ensure consistency across agent workflows and automated testing.

Installation & Getting Started

POML is open-source (MIT License) and available on GitHub. You can:

- Install the VS Code extension from the marketplace.
- Use the Node.js (npm install pomljs) or Python (pip install poml) SDKs.
- Refer to the detailed POML documentation for syntax, examples, and integration guides.

Conclusion

Prompt Orchestration Markup Language (POML) brings much-needed structure, scalability, and maintainability to prompt engineering for AI developers. Its modular syntax, comprehensive data handling, decoupled styling, dynamic templating, and rich integration ecosystem position it as a promising standard for orchestrating advanced LLM applications. Whether you’re building a multi-agent workflow, debugging complex prompt logic, or developing reusable AI modules for production, POML offers a powerful new foundation that’s rapidly gaining traction in the LLM ecosystem.

The post Microsoft Releases POML (Prompt Orchestration Markup Language): Bringing Modularity and Scalability to LLM Prompts appeared first on MarkTechPost.


Decoding Neural Emotion Patterns through Natural Language Processing Embeddings

arXiv:2508.09337v1 Announce Type: new Abstract: Understanding how emotional expression in language relates to brain function is a challenge in computational neuroscience and affective computing. Traditional neuroimaging is costly and lab-bound, but abundant digital text offers new avenues for emotion-brain mapping. Prior work has largely examined neuroimaging-based emotion localization or computational text analysis separately, with little integration. We propose a computational framework that maps textual emotional content to anatomically defined brain regions without requiring neuroimaging. Using OpenAI’s text-embedding-ada-002, we generate high-dimensional semantic representations, apply dimensionality reduction and clustering to identify emotional groups, and map them to 18 brain regions linked to emotional processing. Three experiments were conducted: i) analyzing conversational data from healthy vs. depressed subjects (DAIC-WOZ dataset) to compare mapping patterns, ii) applying the method to the GoEmotions dataset, and iii) comparing human-written text with large language model (LLM) responses to assess differences in inferred brain activation. Emotional intensity was scored via lexical analysis. Results showed neuroanatomically plausible mappings with high spatial specificity. Depressed subjects exhibited greater limbic engagement tied to negative affect. Discrete emotions were successfully differentiated. LLM-generated text matched humans in basic emotion distribution but lacked nuanced activation in empathy and self-referential regions (medial prefrontal and posterior cingulate cortex). This cost-effective, scalable approach enables large-scale analysis of naturalistic language, distinguishes between clinical populations, and offers a brain-based benchmark for evaluating AI emotional expression.
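
The computational core of the pipeline (embed, reduce, cluster) can be illustrated with the sketch below; random vectors stand in for text-embedding-ada-002 outputs (which are 1536-dimensional), and the component and cluster counts are arbitrary rather than the paper's settings.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 1536))  # stand-ins for ada-002 vectors

reduced = PCA(n_components=10).fit_transform(embeddings)  # dimensionality reduction
clusters = KMeans(n_clusters=6, n_init=10).fit_predict(reduced)  # emotional groups
print(np.bincount(clusters))  # cluster sizes; region mapping would follow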


DefenderBench: A Toolkit for Evaluating Language Agents in Cybersecurity Environments

arXiv:2506.00739v3 Announce Type: replace Abstract: Large language model (LLM) agents have shown impressive capabilities in human language comprehension and reasoning, yet their potential in cybersecurity remains underexplored. We introduce DefenderBench, a practical, open-source toolkit for evaluating language agents across offense, defense, and cybersecurity knowledge-based tasks. DefenderBench includes environments for network intrusion, malicious content detection, code vulnerability analysis, and cybersecurity knowledge assessment. It is intentionally designed to be affordable and easily accessible for researchers while providing fair and rigorous assessment. We benchmark several state-of-the-art (SoTA) and popular LLMs, including both open- and closed-weight models, using a standardized agentic framework. Our results show that Claude-3.7-sonnet performs best with a DefenderBench score of 81.65, followed by Claude-3.7-sonnet-think with 78.40, while the best open-weight model, Llama 3.3 70B, is not far behind with a DefenderBench score of 71.81. DefenderBench’s modular design allows seamless integration of custom LLMs and tasks, promoting reproducibility and fair comparisons. DefenderBench is available at https://github.com/microsoft/DefenderBench.


TEN: Table Explicitization, Neurosymbolically

arXiv:2508.09324v1 Announce Type: new Abstract: We present a neurosymbolic approach, TEN, for extracting tabular data from semistructured input text. This task is particularly challenging for text input that does not use special delimiters consistently to separate columns and rows. Purely neural approaches perform poorly due to hallucinations and their inability to enforce hard constraints. TEN uses Structural Decomposition prompting – a specialized chain-of-thought prompting approach – on a large language model (LLM) to generate an initial table, and thereafter uses a symbolic checker not only to evaluate the well-formedness of that table but also to detect cases of hallucination or forgetting. The output of the symbolic checker is processed by a critique-LLM to generate guidance for fixing the table, which is presented to the original LLM in a self-debug loop. Our extensive experiments demonstrate that TEN significantly outperforms purely neural baselines across multiple datasets and metrics, achieving significantly higher exact-match accuracy and substantially reduced hallucination rates. A 21-participant user study further confirms that TEN’s tables are rated significantly more accurate (mean score: 5.0 vs. 4.3; p = 0.021) and are consistently preferred for ease of verification and correction, with participants favoring our method in over 60% of cases.
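
The generate, check, critique loop described above can be sketched as follows; the llm and critique_llm placeholders stand in for the paper's Structural Decomposition prompting and critique-LLM, and the single well-formedness rule is a simplification of TEN's symbolic checker.

# Sketch of a generate / symbolically-check / critique self-debug loop.
def llm(prompt: str) -> list[list[str]]:
    raise NotImplementedError("placeholder for the table-generating LLM")

def critique_llm(table: list[list[str]], issues: list[str]) -> str:
    raise NotImplementedError("placeholder for the critique LLM")

def check(table: list[list[str]]) -> list[str]:
    # Minimal symbolic check: every row must have the same number of columns.
    widths = {len(row) for row in table}
    return [] if len(widths) <= 1 else [f"inconsistent column counts: {sorted(widths)}"]

def extract_table(text: str, max_rounds: int = 3) -> list[list[str]]:
    table = llm(f"Extract a table from:\n{text}")
    for _ in range(max_rounds):
        issues = check(table)
        if not issues:
            break
        table = llm(critique_llm(table, issues))  # guidance for fixing the table
    return table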


Sacred or Synthetic? Evaluating LLM Reliability and Abstention for Religious Questions

arXiv:2508.08287v1 Announce Type: new Abstract: Despite the increasing use of Large Language Models (LLMs) to answer questions in a variety of domains, their reliability and accuracy remain unexamined for many of them, including the religious domain. In this paper, we introduce FiqhQA, a novel benchmark focused on LLM-generated Islamic rulings, explicitly categorized by the four major Sunni schools of thought, in both Arabic and English. Unlike prior work, which either overlooks the distinctions between religious schools of thought or fails to evaluate abstention behavior, we assess LLMs not only on their accuracy but also on their ability to recognize when not to answer. Our zero-shot and abstention experiments reveal significant variation across LLMs, languages, and legal schools of thought. While GPT-4o outperforms all other models in accuracy, Gemini and Fanar demonstrate superior abstention behavior, which is critical for minimizing confidently incorrect answers. Notably, all models exhibit a performance drop in Arabic, highlighting the limitations of religious reasoning in languages other than English. To the best of our knowledge, this is the first study to benchmark the efficacy of LLMs for fine-grained, school-of-thought-specific Islamic ruling generation and to evaluate abstention on Islamic jurisprudence queries. Our findings underscore the need for task-specific evaluation and cautious deployment of LLMs in religious applications.
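
Scoring abstention alongside accuracy, as the benchmark does, amounts to tracking three outcomes per question: answered correctly, answered incorrectly, or abstained. A minimal scoring sketch with placeholder records (not FiqhQA data) follows.

# Minimal accuracy and abstention scoring with placeholder records.
records = [
    {"answered": True, "correct": True},
    {"answered": True, "correct": False},
    {"answered": False, "correct": False},  # the model abstained
]

answered = [r for r in records if r["answered"]]
accuracy = sum(r["correct"] for r in answered) / len(answered)
abstention_rate = 1 - len(answered) / len(records)
print(f"accuracy on answered: {accuracy:.2f}, abstention rate: {abstention_rate:.2f}")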
