Week Ending 9.14.2025

RESEARCH WATCH: 9.14.2025

Standards in the Preparation of Biomedical Research Metadata: A Bridge2AI Perspective

The explosion of AI applications in biomedicine has created an urgent need for standardized, AI-ready datasets. This research addresses a critical bottleneck in biomedical AI development: the lack of metadata standards that make datasets usable for machine learning. The Bridge2AI consortium's work assesses current metadata practice and provides guidelines for creating datasets that meet FAIR principles, document ethical data practices, and remain computationally accessible. Potential applications span drug discovery, personalized medicine, and clinical decision support systems. By standardizing metadata creation across genomics, voice biomarker, disease trajectory modeling, and cellular mapping projects, this framework could accelerate the translation of biomedical research into practical AI tools. That, in turn, may improve patient outcomes and reduce healthcare costs through more efficient use of data.

Authors: Harry Caufield, Satrajit Ghosh, Sek Wong Kong, Jillian Parker, Nathan Sheffield, Bhavesh Patel, Andrew Williams, Timothy Clark, Monica C. Munoz-Torres

Link: https://arxiv.org/abs/2509.10432v1

Date: 2025-09-d

Summary:

AI-readiness describes the degree to which data may be optimally and ethically used for subsequent AI and Machine Learning (AI/ML) methods, where those methods may involve some combination of model training, data classification, and ethical, explainable prediction. The Bridge2AI consortium has defined the particular criteria a biomedical dataset may possess to render it AI-ready. In brief, a dataset's readiness is related to its FAIRness, provenance, degree of characterization, explainability, sustainability, and computability, as well as to whether it is accompanied by documentation of ethical data practices. To ensure AI-readiness and to clarify data structure and relationships within Bridge2AI's Grand Challenges (GCs), particular types of metadata are necessary. The GCs within the Bridge2AI initiative include four data-generating projects that produce AI/ML-ready datasets to tackle complex biomedical and behavioral research problems. These projects develop standardized, multimodal data, tools, and training resources to support AI integration, while addressing ethical data practices. Examples include using voice as a biomarker, building interpretable genomic tools, modeling disease trajectories with diverse multimodal data, and mapping cellular and molecular health indicators across the human body. This report assesses the state of metadata creation and standardization in the Bridge2AI GCs, provides guidelines where required, and identifies gaps and areas for improvement across the program. New projects, including those outside the Bridge2AI consortium, would benefit from what we have learned about creating metadata as part of efforts to promote AI readiness.

--------------------------------------------------------------------------------------------------------

Mutual Information Tracks Policy Coherence in Reinforcement Learning

Real-world deployment of reinforcement learning agents faces significant challenges from hardware degradation and environmental changes, yet current systems lack intrinsic failure detection mechanisms. This research introduces an information-theoretic framework that both explains fundamental RL dynamics and provides practical diagnostic tools for autonomous systems. By analyzing state-action mutual information patterns, the method can distinguish sensor faults (observation noise) from actuator failures (action noise) without modifying the agent's architecture or degrading its performance. Applications include autonomous vehicles, robotic manufacturing, and aerospace systems, where early fault detection is critical. Because the framework can localize faults precisely, it could enable self-diagnosing AI systems that maintain operational safety through adaptive policy adjustment. That would be a step toward deploying RL more safely in safety-critical environments.

Authors: Cameron Reid, Wael Hafez, Amirhossein Nazeri

Link: https://arxiv.org/abs/2509.10423v1

Date: 2025-09-d

Summary:

Reinforcement Learning (RL) agents deployed in real-world environments face degradation from sensor faults, actuator wear, and environmental shifts, yet lack intrinsic mechanisms to detect and diagnose these failures. We present an information-theoretic framework that both reveals the fundamental dynamics of RL and provides practical methods for diagnosing deployment-time anomalies. We first analyze state-action mutual information patterns in a robotic control task and demonstrate that successful learning exhibits characteristic information signatures. Mutual information between states and actions steadily increases from 0.84 to 2.83 bits (238% growth) despite growing state entropy, indicating that agents develop increasingly selective attention to task-relevant patterns. Intriguingly, the mutual information between state-action pairs and next states, MI(S,A;S'), follows an inverted U-curve. It peaks during early learning and then declines as the agent specializes, which suggests a transition from broad exploration to efficient exploitation. More immediately actionable, we show that information metrics can differentially diagnose system failures. Observation-space (state) noise, i.e., sensor faults, produces broad collapses across all information channels with pronounced drops in state-action coupling. Action-space noise (actuator faults), by contrast, selectively disrupts action-outcome predictability while preserving state-action relationships. This differential diagnostic capability, demonstrated through controlled perturbation experiments, enables precise fault localization without architectural modifications or performance degradation. By establishing information patterns as both signatures of learning and diagnostics of system health, we provide the foundation for adaptive RL systems capable of autonomous fault detection and policy adjustment based on information-theoretic principles.
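
There are several ways to estimate the state-action mutual information the authors track, and the abstract does not say which one they use. The simplest is the plug-in (histogram) estimator, which reports I(X;Y) in bits. The sketch below implements it and assumes states and actions have already been discretized into hashable bins. The function name `mutual_information_bits` is our own illustration, not the authors' code.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information_bits(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in (histogram) estimate of I(X;Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical data: 4 equiprobable discretized states, actions fully determined by state
states = [0, 1, 2, 3] * 25
actions = list(states)
print(mutual_information_bits(states, actions))  # 2.0
```

For the joint quantity MI(S,A;S'), pass state-action pairs as the first argument, e.g. `mutual_information_bits(list(zip(states, actions)), next_states)`. Plug-in estimates are biased upward on small samples. Trends across training, such as the rise from 0.84 to 2.83 bits reported above, are therefore more informative than absolute values computed from a few hundred transitions.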

--------------------------------------------------------------------------------------------------------

Bitcoin Cross-Chain Bridge: A Taxonomy and Its Promise in Artificial Intelligence of Things

Bitcoin's limited programmability has hindered its integration into modern blockchain ecosystems, particularly for IoT applications requiring sophisticated smart contracts. This comprehensive taxonomy of cross-chain bridge protocols examines how bridges work around this limitation, systematically analyzing their trust models, performance characteristics, and security implications. The research categorizes bridges into naive token swaps, pegged-asset bridges, and arbitrary-message bridges, and evaluates each for AIoT scenarios. Applications include decentralized energy trading in smart grids, secure healthcare data exchange, and automated supply chain management. By letting Bitcoin's security and liquidity support programmable IoT applications, these bridges could unlock new economic models in which devices autonomously transact value while retaining the security guarantees of the world's most established blockchain network.

Authors: Guojun Tang, Carylyne Chan, Ning Nan, Spencer Yang, Jiayu Zhou, Henry Leung, Mohammad Mamun, Steve Drew

Link: https://arxiv.org/abs/2509.10413v1

Date: 2025-09-d

Summary:

Bitcoin's limited scripting capabilities and lack of native interoperability mechanisms have constrained its integration into the broader blockchain ecosystem, especially decentralized finance (DeFi) and multi-chain applications. This paper presents a comprehensive taxonomy of Bitcoin cross-chain bridge protocols, systematically analyzing their trust assumptions, performance characteristics, and applicability to Artificial Intelligence of Things (AIoT) scenarios. We categorize bridge designs into three main types: naive token swapping, pegged-asset bridges, and arbitrary-message bridges. Each category is evaluated across key metrics such as trust model, latency, capital efficiency, and DeFi composability. Emerging innovations like BitVM and recursive sidechains are highlighted for their potential to enable secure, scalable, and programmable Bitcoin interoperability. Furthermore, we explore practical use cases of cross-chain bridges in AIoT applications, including decentralized energy trading, healthcare data integration, and supply chain automation. This taxonomy provides a foundational framework for researchers and practitioners seeking to design secure and efficient cross-chain infrastructures in AIoT systems.
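
The "naive token swapping" category is commonly realized as an atomic swap built on hash time-locked contracts (HTLCs), and we assume that is what the taxonomy has in mind. The toy Python model below is not Bitcoin Script, and its class and field names are our own invention. It shows the two spending paths of an HTLC:

- The receiver can claim the funds by revealing the hash preimage before a timeout.
- After the timeout, the sender can reclaim them.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class HashTimeLockedContract:
    """Toy HTLC model (hypothetical, not Bitcoin Script)."""
    sender: str
    receiver: str
    amount: int
    hashlock: bytes   # sha256 digest of the secret preimage
    timeout: int      # block height from which a refund is allowed
    paid_to: Optional[str] = None

    def claim(self, preimage: bytes, height: int) -> bool:
        """Receiver path: valid only before the timeout and with the right preimage."""
        if self.paid_to is not None or height >= self.timeout:
            return False
        if hashlib.sha256(preimage).digest() != self.hashlock:
            return False
        self.paid_to = self.receiver
        return True

    def refund(self, height: int) -> bool:
        """Sender path: valid only once the timeout has been reached."""
        if self.paid_to is not None or height < self.timeout:
            return False
        self.paid_to = self.sender
        return True


# Hypothetical two-chain swap; heights are treated as one shared clock for simplicity.
secret = b"alice-secret"
lock = hashlib.sha256(secret).digest()
btc_leg = HashTimeLockedContract("alice", "bob", 100_000, lock, timeout=144)
other_leg = HashTimeLockedContract("bob", "alice", 5_000, lock, timeout=72)
other_leg.claim(secret, height=10)  # Alice claims, publishing the secret
btc_leg.claim(secret, height=11)    # Bob reuses the now-public secret
```

The ordering of timeouts is what makes the swap atomic. The initiator's lock outlasts the counterparty's. So once the initiator reveals the secret to claim one leg, the counterparty still has time to use it on the other.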

--------------------------------------------------------------------------------------------------------

Multimodal SAM-adapter for Semantic Segmentation

Semantic segmentation remains vulnerable to challenging real-world conditions like poor lighting and adverse weather, limiting deployment in critical applications. This research extends the Segment Anything Model (SAM) to incorporate multimodal sensor data, addressing a fundamental limitation of RGB-only approaches. The MM SAM-adapter framework injects fused features from auxiliary sensors such as LiDAR and infrared into SAM's RGB features while preserving SAM's strong generalization capabilities. Applications span autonomous driving in harsh weather, medical imaging with multiple scanning modalities, and robotic navigation in complex environments. The approach achieves state-of-the-art performance on challenging benchmarks and is more robust in adverse conditions. It could therefore enable safer autonomous systems and more reliable computer vision in industries where environmental variability poses significant challenges.

Authors: Iacopo Curti, Pierluigi Zama Ramirez, Alioscia Petrelli, Luigi Di Stefano

Link: https://arxiv.org/abs/2509.10408v1

Date: 2025-09-d

Summary:

Semantic segmentation, a key task in computer vision with broad applications in autonomous driving, medical imaging, and robotics, has advanced substantially with deep learning. Nevertheless, current approaches remain vulnerable to challenging conditions such as poor lighting, occlusions, and adverse weather. To address these limitations, multimodal methods that integrate auxiliary sensor data (e.g., LiDAR, infrared) have recently emerged, providing complementary information that enhances robustness. In this work, we present MM SAM-adapter, a novel framework that extends the capabilities of the Segment Anything Model (SAM) for multimodal semantic segmentation. The proposed method employs an adapter network that injects fused multimodal features into SAM's rich RGB features. This design enables the model to retain the strong generalization ability of RGB features while selectively incorporating auxiliary modalities only when they contribute additional cues. As a result, MM SAM-adapter achieves a balanced and efficient use of multimodal information. We evaluate our approach on three challenging benchmarks, DeLiVER, FMB, and MUSES, where MM SAM-adapter delivers state-of-the-art performance. To further analyze modality contributions, we partition DeLiVER and FMB into RGB-easy and RGB-hard subsets. Results consistently demonstrate that our framework outperforms competing methods in both favorable and adverse conditions, highlighting the effectiveness of multimodal adaptation for robust scene understanding. The code is available at the following link: https://github.com/iacopo97/Multimodal-SAM-Adapter.
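
The abstract describes an adapter that injects fused auxiliary features into SAM's RGB features and draws on auxiliary modalities only when they add cues. The sketch below is a purely hypothetical illustration of that idea, not the authors' architecture. It averages auxiliary feature vectors and adds them to the RGB features through a learned scalar gate, so a gate near zero leaves the RGB features untouched. The function `gated_injection` and its gate parameters are our own names.

```python
import math
from typing import List, Sequence


def gated_injection(
    rgb: Sequence[float],
    aux: Sequence[Sequence[float]],
    gate_weights: Sequence[float],
    gate_bias: float,
) -> List[float]:
    """Hypothetical adapter step: fuse auxiliary features, add them to RGB through a gate."""
    dim = len(rgb)
    # Naive fusion: element-wise mean over auxiliary modalities (e.g. LiDAR, infrared)
    fused = [sum(m[i] for m in aux) / len(aux) for i in range(dim)] if aux else [0.0] * dim
    # Scalar gate in (0, 1) computed from the concatenation [rgb, fused]
    z = sum(w * x for w, x in zip(gate_weights, list(rgb) + fused)) + gate_bias
    gate = 1.0 / (1.0 + math.exp(-z))
    # Residual injection: RGB features pass through unchanged when the gate is closed
    return [r + gate * f for r, f in zip(rgb, fused)]
```

The residual form is the design choice worth noting. It keeps the pretrained RGB pathway intact by default, which is one simple way to "retain the strong generalization ability of RGB features" while letting auxiliary cues in on demand.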

--------------------------------------------------------------------------------------------------------

Abduct, Act, Predict: Scaffolding Causal Inference for Automated Failure Attribution in Multi-Agent Systems

Multi-agent system failures are notoriously difficult to debug, and current methods identify the step where a failure originates with critically low accuracy. This research recasts failure attribution as structured causal reasoning rather than pattern recognition, using the Abduct-Act-Predict framework. The framework guides large language models through abduction, intervention design, and outcome prediction. This improves step-level accuracy by 2.43× to 2.85× over baselines on the Who&When benchmark. Applications include debugging complex software systems, analyzing autonomous vehicle coordination failures, and troubleshooting distributed AI systems. The framework's counterfactual reasoning could change how developers maintain and improve multi-agent systems, from cloud computing infrastructures to swarm robotics, by providing precise, verifiable explanations of system failures and their root causes.

Authors: Alva West, Yixuan Weng, Minjun Zhu, Zhen Lin, Yue Zhang

Link: https://arxiv.org/abs/2509.10401v1

Date: 2025-09-d

Summary:

Failure attribution in multi-agent systems (pinpointing the exact step where a decisive error occurs) is a critical yet unsolved challenge. Current methods treat this as a pattern recognition task over long conversation logs, leading to critically low step-level accuracy (below 17%), which renders them impractical for debugging complex systems. Their core weakness is a fundamental inability to perform robust counterfactual reasoning: to determine whether correcting a single action would have actually averted the task failure. To bridge this counterfactual inference gap, we introduce Abduct-Act-Predict (A2P) Scaffolding, a novel agent framework that transforms failure attribution from pattern recognition into a structured causal inference task. A2P explicitly guides a large language model through a formal three-step reasoning process within a single inference pass:

(1) Abduction, to infer the hidden root causes behind an agent's actions;
(2) Action, to define a minimal corrective intervention;
(3) Prediction, to simulate the subsequent trajectory and verify whether the intervention resolves the failure.

This structured approach leverages the holistic context of the entire conversation while imposing a rigorous causal logic on the model's analysis. Our extensive experiments on the Who&When benchmark demonstrate its efficacy. On the Algorithm-Generated dataset, A2P achieves 47.46% step-level accuracy, a 2.85× improvement over the baseline's 16.67%. On the more complex Hand-Crafted dataset, it achieves 29.31% step accuracy, a 2.43× improvement over the baseline's 12.07%. By reframing the problem through a causal lens, A2P Scaffolding provides a robust, verifiable, and significantly more accurate solution for automated failure attribution.
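
We read "step-level accuracy" as follows: a failure log counts as correct only when the predicted decisive-error step matches the annotated one. The reported multipliers are then simple ratios of accuracies. The short sketch below makes both explicit and reproduces the 2.85× and 2.43× figures from the reported percentages. The helper names are ours.

```python
from typing import Sequence


def step_level_accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of failure logs whose predicted decisive-error step equals the annotated step."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equally sized, non-empty prediction and annotation lists")
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


def relative_improvement(new_accuracy: float, baseline_accuracy: float) -> float:
    """Multiplicative improvement, e.g. 2.85 means 2.85x the baseline accuracy."""
    return new_accuracy / baseline_accuracy


print(round(relative_improvement(47.46, 16.67), 2))  # 2.85 (Algorithm-Generated)
print(round(relative_improvement(29.31, 12.07), 2))  # 2.43 (Hand-Crafted)
```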

--------------------------------------------------------------------------------------------------------

Diversified recommendations of cultural activities with personalized determinantal point processes

Recommendation systems typically optimize for engagement at the expense of diversity, creating filter bubbles that limit user exposure to varied content. This research addresses the challenge of diversifying cultural activity recommendations while maintaining business metrics through personalized Determinantal Point Processes. The approach balances relevance and diversity by incorporating quality-diversity decomposition in the similarity kernel. Applications extend beyond cultural activities to e-commerce product recommendations, news article selection, and educational content curation. By providing a production-ready framework with open-source implementation, this work could help platforms break echo chambers and promote cultural exploration, potentially improving user satisfaction and long-term engagement while supporting the discovery of diverse content creators and cultural institutions.

Authors: Carole Ibrahim, Hiba Bederina, Daniel Cuesta, Laurent Montier, Cyrille Delabre, Jill-Jênn Vie

Link: https://arxiv.org/abs/2509.10392v1

Date: 2025-09-d

Summary:

While optimizing recommendation systems for user engagement is a well-established practice, effectively diversifying recommendations without negatively impacting core business metrics remains a significant industry challenge. In line with our initiative to broaden our audience's cultural practices, this study investigates using personalized Determinantal Point Processes (DPPs) to sample diverse and relevant recommendations. We rely on a well-known quality-diversity decomposition of the similarity kernel to give more weight to user preferences. In this paper, we present our implementations of the personalized DPP sampling, evaluate the trade-offs between relevance and diversity through both offline and online metrics, and give insights for practitioners on their use in a production environment. For the sake of reproducibility, we release the full code for our platform and experiments on GitHub.
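
The quality-diversity decomposition mentioned above writes the DPP kernel as L_ij = q_i S_ij q_j. Here q_i is a per-user relevance score and S is an item-similarity matrix, and the probability of recommending a set Y is proportional to det(L_Y). The authors sample from this DPP. The sketch below instead swaps in the simpler greedy MAP approximation, which adds whichever item most increases det(L_Y), to show how near-duplicates get pushed aside. The relevance scores and similarities are made-up numbers, and the function names are ours.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def det(m: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting (1.0 for an empty matrix)."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def quality_diversity_kernel(quality: Sequence[float], similarity: Matrix) -> Matrix:
    """L_ij = q_i * S_ij * q_j: per-user quality scales a shared similarity kernel."""
    n = len(quality)
    return [[quality[i] * similarity[i][j] * quality[j] for j in range(n)] for i in range(n)]


def greedy_dpp(kernel: Matrix, k: int) -> List[int]:
    """Greedy MAP approximation: repeatedly add the item that maximizes det(L_Y)."""
    selected: List[int] = []
    candidates = set(range(len(kernel)))
    for _ in range(min(k, len(kernel))):
        best, best_det = None, 0.0
        for i in sorted(candidates):
            idx = selected + [i]
            d = det([[kernel[r][c] for c in idx] for r in idx])
            if d > best_det:
                best, best_det = i, d
        if best is None:
            break  # no remaining item increases the determinant
        selected.append(best)
        candidates.remove(best)
    return selected


quality = [1.0, 0.9, 0.6]  # hypothetical per-user relevance scores
similarity = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]  # items 0 and 1 near-duplicates
print(greedy_dpp(quality_diversity_kernel(quality, similarity), k=2))  # [0, 2]
```

Pure relevance ranking would return items 0 and 1, which are nearly identical. The determinant penalizes that pair because their rows of L are almost collinear. Recomputing a determinant for every candidate at every step is fine for illustration but slow for large catalogues. Faster greedy DPP MAP algorithms update a Cholesky factorization incrementally instead.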

--------------------------------------------------------------------------------------------------------

Improving Audio Event Recognition with Consistency Regularization

Audio event recognition faces significant challenges from acoustic variability and limited labeled data, particularly in real-world environments with background noise and diverse recording conditions. This research applies consistency regularization, which enforces agreement between model predictions on augmented audio views, building on its successful use in speech recognition. The method consistently improves results on AudioSet with both small and large training sets. It brings further gains from stronger augmentation on the small set and from unlabeled data in a semi-supervised setting. Applications include smart city monitoring, wildlife conservation through automated species detection, medical diagnosis via respiratory sound analysis, and industrial equipment monitoring. By improving model robustness and making effective use of unlabeled data, this approach could accelerate deployment of audio AI systems in settings where labeled data is scarce.

Authors: Shanmuka Sadhu, Weiran Wang

Link: https://arxiv.org/abs/2509.10391v1

Date: 2025-09-d

Summary:

Consistency regularization (CR), which enforces agreement between model predictions on augmented views, has recently proven beneficial in automatic speech recognition [1]. In this paper, we propose the use of consistency regularization for audio event recognition and demonstrate its effectiveness on AudioSet. We run extensive ablation studies with both small (~20k) and large (~1.8M) supervised training sets. They show that CR consistently improves over supervised baselines that already make heavy use of data augmentation, and that CR with stronger and multiple augmentations yields additional gains on the small training set. Furthermore, we extend CR to a semi-supervised setup with 20k labeled and 1.8M unlabeled samples, and obtain a performance improvement over our best model trained on the small set.
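
Below is a minimal sketch of a consistency-regularized loss for multi-label audio tagging. It assumes one sigmoid output per class and a symmetric Bernoulli KL divergence between the predictions on two augmented views of the same clip. The abstract does not give the paper's exact divergence, weighting, or augmentation pipeline, so `cr_loss` and its `weight` parameter are illustrative. Passing `targets=None` drops the supervised term, which is one way unlabeled clips can contribute in the semi-supervised setup.

```python
import math
from typing import Optional, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _clip(p: float, eps: float = 1e-7) -> float:
    return min(max(p, eps), 1.0 - eps)


def bce(probs: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over classes (multi-label tagging)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = _clip(p)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)) for a single class."""
    p, q = _clip(p), _clip(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def cr_loss(
    logits_a: Sequence[float],
    logits_b: Sequence[float],
    targets: Optional[Sequence[float]],
    weight: float = 1.0,
) -> float:
    """BCE on both views (if labeled) plus weighted symmetric KL between the two views."""
    pa = [sigmoid(z) for z in logits_a]
    pb = [sigmoid(z) for z in logits_b]
    consistency = sum(
        0.5 * (bernoulli_kl(x, y) + bernoulli_kl(y, x)) for x, y in zip(pa, pb)
    ) / len(pa)
    supervised = 0.5 * (bce(pa, targets) + bce(pb, targets)) if targets is not None else 0.0
    return supervised + weight * consistency
```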

--------------------------------------------------------------------------------------------------------

Data distribution impacts the performance and generalisability of contrastive learning-based foundation models of electrocardiograms

Foundation models in healthcare face critical challenges regarding fairness and generalizability across diverse patient populations. This research systematically examines how training data distribution affects ECG foundation models, revealing that demographic diversity can paradoxically reduce out-of-distribution generalization. The CAPE model, trained on over 5 million ECGs from multiple continents, demonstrates that cohort composition significantly impacts downstream performance. The proposed In-Distribution Batch strategy addresses this challenge by preserving intra-cohort consistency during pretraining. Applications include global cardiac screening programs, personalized cardiovascular risk assessment, and clinical decision support systems. This work provides crucial insights for developing clinically fair AI systems that perform equitably across diverse populations, potentially improving healthcare accessibility and reducing disparities in cardiac care.

Authors: Gul Rukh Khattak, Konstantinos Patlatzoglou, Joseph Barker, Libor Pastika, Boroumand Zeidaabadi, Ahmed El-Medany, Hesham Aggour, Yixiu Liang, Antonio H. Ribeiro, Jeffrey Annis, Antonio Luiz Pinho Ribeiro, Junbo Ge, Daniel B. Kramer, Jonathan W. Waks, Evan Brittain, Nicholas Peters, Fu Siong Ng, Arunashis Sau

Link: https://arxiv.org/abs/2509.10369v1

Date: 2025-09-d

Summary:

Contrastive learning is a widely adopted self-supervised pretraining strategy, yet its dependence on cohort composition remains underexplored. We present the Contrasting by Patient Augmented Electrocardiograms (CAPE) foundation model and pretrain it on four cohorts (n = 5,203,352) from diverse populations across three continents (North America, South America, Asia). We systematically assess how cohort demographics, health status, and population diversity influence downstream performance on prediction tasks, additionally evaluating on two cohorts from a fourth continent (Europe). We find that downstream performance depends on the distributional properties of the pretraining cohort, including demographics and health status. Pretraining with a multi-centre, demographically diverse cohort improves in-distribution accuracy. However, it reduces the out-of-distribution (OOD) generalisation of our contrastive approach by encoding cohort-specific artifacts. To address this, we propose the In-Distribution Batch (IDB) strategy, which preserves intra-cohort consistency during pretraining and enhances OOD robustness. This work provides important insights for developing clinically fair and generalisable foundation models.
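
The abstract describes the In-Distribution Batch (IDB) strategy only as preserving intra-cohort consistency during pretraining. One plausible reading, which we assume here, is that every minibatch is drawn from a single cohort. Under that reading, a sample's contrastive negatives never differ from it by cohort-specific artifacts alone. The sampler below implements that reading with hypothetical cohort labels. It is not the authors' code.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence


def in_distribution_batches(
    cohort_ids: Sequence[str], batch_size: int, seed: int = 0
) -> Iterator[List[int]]:
    """Yield index batches whose samples all share one cohort (hypothetical IDB reading)."""
    rng = random.Random(seed)
    by_cohort: Dict[str, List[int]] = defaultdict(list)
    for idx, cohort in enumerate(cohort_ids):
        by_cohort[cohort].append(idx)
    batches: List[List[int]] = []
    for indices in by_cohort.values():
        rng.shuffle(indices)
        # Drop the incomplete tail so every batch is full-size and cohort-pure
        for start in range(0, len(indices) - batch_size + 1, batch_size):
            batches.append(indices[start:start + batch_size])
    rng.shuffle(batches)  # interleave cohorts across training steps
    yield from batches
```

Shuffling at the batch level rather than the sample level still lets the model see every cohort during an epoch. The negatives inside any single contrastive step, however, come from one distribution.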

--------------------------------------------------------------------------------------------------------

Towards Understanding Visual Grounding in Visual Language Models

Visual grounding—the ability to identify image regions matching textual descriptions—is fundamental to many AI applications but remains poorly understood in modern vision-language models. This comprehensive survey examines grounding capabilities across various domains, from referring expression comprehension to robotic control. The research delineates core components of grounded models and explores their practical applications, including fine-grained visual question answering and contextual captioning. Applications span autonomous robotics, accessibility tools for visually impaired users, interactive content creation, and augmented reality systems. By analyzing the multifaceted relationships between grounding, reasoning, and multimodal understanding, this work provides a roadmap for developing more capable AI systems that can precisely connect language with visual content, enabling more natural human-computer interaction.

Authors: Georgios Pantazopoulos, Eda B. Özyiğit

Link: https://arxiv.org/abs/2509.10345v1

Date: 2025-09-d

Summary:

Visual grounding refers to the ability of a model to identify a region within some visual input that matches a textual description. A model equipped with visual grounding capabilities can therefore target a wide range of applications in various domains. These include referring expression comprehension, answering questions about fine-grained details in images or videos, captioning visual context while explicitly referring to entities, and low- and high-level control in simulated and real environments. In this survey paper, we review representative works across the key areas of research on modern general-purpose vision language models (VLMs). We first outline the importance of grounding in VLMs. We then delineate the core components of the contemporary paradigm for developing grounded models and examine their practical applications, including benchmarks and evaluation metrics for grounded multimodal generation. We also discuss the multifaceted interrelations among visual grounding, multimodal chain-of-thought, and reasoning in VLMs. Finally, we analyse the challenges inherent to visual grounding and suggest promising directions for future research.
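
Referring expression comprehension, the first application listed above, is commonly scored by intersection-over-union (IoU) between predicted and ground-truth boxes. A prediction counts as correct when its IoU is at least 0.5. The sketch below shows only that standard box-level criterion, not the survey's full list of metrics. Boxes are given as (x1, y1, x2, y2), and the function names are ours.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(preds: Sequence[Box], golds: Sequence[Box], threshold: float = 0.5) -> float:
    """Share of expressions whose predicted box reaches the IoU threshold (Acc@0.5 by default)."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(preds, golds))
    return hits / len(golds)
```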

--------------------------------------------------------------------------------------------------------

GLAM: Geometry-Guided Local Alignment for Multi-View VLP in Mammography

Mammography interpretation could benefit significantly from AI assistance, but existing vision-language models fail to capture domain-specific multi-view relationships essential for accurate diagnosis. Radiologists analyze the two views of each breast together, whereas current models treat the views independently and lose critical geometric context. GLAM addresses this by leveraging geometric knowledge about mammographic imaging to learn cross-view alignments through joint contrastive learning. Applications include automated breast cancer screening, radiologist training and education, and decision support systems for under-resourced clinics. By properly modeling multi-view correspondence and fine-grained local features, this approach could improve early detection accuracy and consistency. It could thereby reduce both false positives and missed diagnoses in mammography screening programs worldwide, ultimately contributing to better breast cancer outcomes.

Authors: Yuexi Du, Lihui Chen, Nicha C. Dvornek

Link: https://arxiv.org/abs/2509.10344v1

Date: 2025-09-d

Summary:

Mammography screening is an essential tool for early detection of breast cancer. The speed and accuracy of mammography interpretation have the potential to be improved with deep learning methods. However, the development of a foundation visual language model (VLM) is hindered by limited data and domain differences between natural and medical images. Existing mammography VLMs, adapted from natural images, often ignore domain-specific characteristics, such as multi-view relationships in mammography. Unlike radiologists who analyze both views together to process ipsilateral correspondence, current methods treat them as independent images or do not properly model the multi-view correspondence learning, losing critical geometric context and resulting in suboptimal prediction. We propose GLAM: Global and Local Alignment for Multi-view mammography for VLM pretraining using geometry guidance. By leveraging the prior knowledge about the multi-view imaging process of mammograms, our model learns local cross-view alignments and fine-grained local features through joint global and local, visual-visual, and visual-language contrastive learning. Pretrained on EMBED [14], one of the largest open mammography datasets, our model outperforms baselines across multiple datasets under different settings.
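
GLAM's joint objective combines several contrastive terms: global and local, visual-visual and visual-language. The geometry-guided local alignment is not described here in enough detail to reproduce. Each contrastive term, however, typically takes the symmetric InfoNCE form sketched below. Matched pairs in a batch are positives and all other pairings are negatives. Examples of matched pairs are embeddings of two views of the same breast, or an image and its report. The temperature of 0.07 and the function names are assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def _log_softmax_at(row: Sequence[float], target: int) -> float:
    m = max(row)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return row[target] - log_z


def symmetric_info_nce(a: Sequence[Vector], b: Sequence[Vector], temperature: float = 0.07) -> float:
    """Contrastive loss where (a[i], b[i]) are positives and all other pairs are negatives."""
    if len(a) != len(b) or not a:
        raise ValueError("need equally sized, non-empty batches of paired embeddings")
    a_n = [_normalize(v) for v in a]
    b_n = [_normalize(v) for v in b]
    n = len(a_n)
    # Cosine similarity of every a_i with every b_j, divided by the temperature
    logits = [
        [sum(x * y for x, y in zip(a_n[i], b_n[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    loss_a_to_b = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_b_to_a = -sum(_log_softmax_at(columns[j], j) for j in range(n)) / n
    return 0.5 * (loss_a_to_b + loss_b_to_a)
```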

--------------------------------------------------------------------------------------------------------

I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation

Vision Transformers excel at semantic segmentation but remain computationally prohibitive for resource-constrained applications due to their high memory and processing requirements. This research introduces the first fully integer-only ViT segmentation framework, addressing the notorious fragility of quantized transformer architectures. I-Segmenter systematically replaces floating-point operations with integer counterparts and introduces the λ-ShiftGELU activation to stabilize low-precision inference. Applications include mobile autonomous navigation, edge computing devices, IoT visual sensing, and real-time industrial inspection systems. The method achieves up to 3.8× model compression and 1.2× inference speedup while staying, on average, within 5.1% of its FP32 baseline's accuracy. This work could therefore bring scene understanding to smartphones, embedded systems, and other resource-limited platforms where conventional floating-point transformers are impractical.

Authors: Jordan Sassoon, Michal Szczepanski, Martyna Poreba

Link: https://arxiv.org/abs/2509.10334v1

Date: 2025-09-d

Summary:

Vision Transformers (ViTs) have recently achieved strong results in semantic segmentation, yet their deployment on resource-constrained devices remains limited due to their high memory footprint and computational cost. Quantization offers an effective strategy to improve efficiency, but ViT-based segmentation models are notoriously fragile under low precision, as quantization errors accumulate across deep encoder-decoder pipelines. We introduce I-Segmenter, the first fully integer-only ViT segmentation framework. Building on the Segmenter architecture, I-Segmenter systematically replaces floating-point operations with integer-only counterparts. To further stabilize both training and inference, we propose λ-ShiftGELU, a novel activation function that mitigates the limitations of uniform quantization in handling long-tailed activation distributions. In addition, we remove the L2 normalization layer and replace bilinear interpolation in the decoder with nearest neighbor upsampling, ensuring integer-only execution throughout the computational graph. Extensive experiments show that I-Segmenter achieves accuracy within a reasonable margin of its FP32 baseline (5.1% on average), while reducing model size by up to 3.8x and enabling up to 1.2x faster inference with optimized runtimes. Notably, even in one-shot PTQ with a single calibration image, I-Segmenter delivers competitive accuracy, underscoring its practicality for real-world deployment.
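
Two of the ingredients above can be illustrated independently of the paper's code:

- uniform symmetric quantization, which maps real values to small integers via a single scale;
- nearest-neighbor upsampling, which, unlike bilinear interpolation, needs only integer index arithmetic.

The sketch below shows both. λ-ShiftGELU is not reproduced, and the per-tensor 8-bit scheme and the function names are assumptions. For reference, storing FP32 weights as INT8 shrinks them by at most 32/8 = 4×. The abstract does not state the bit-widths behind its 3.8× figure.

```python
from typing import List, Sequence, Tuple


def quantize_symmetric(values: Sequence[float], bits: int = 8) -> Tuple[List[int], float]:
    """Per-tensor symmetric uniform quantization: x is approximated by scale * q."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    return [scale * x for x in q]


def upsample_nearest(grid: List[List[int]], factor: int) -> List[List[int]]:
    """Integer-only nearest-neighbor upsampling: out[y][x] = grid[y // factor][x // factor]."""
    h, w = len(grid), len(grid[0])
    return [[grid[y // factor][x // factor] for x in range(w * factor)] for y in range(h * factor)]
```

The quantization step itself uses floating point, as an offline calibration step would. The upsampling function touches only integers, and that property is what lets a decoder stay integer-only.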

--------------------------------------------------------------------------------------------------------

State Algebra for Propositional Logic

Propositional logic reasoning underpins many AI systems, and the choice of representation strongly affects how flexibly and efficiently that reasoning can be carried out. This research introduces State Algebra, a novel algebraic framework that represents logical statements through a hierarchy of three representations: Set, Coordinate, and Row Decomposition. The system trades guaranteed canonicity for increased representational flexibility, potentially enabling more compact encodings of certain classes of problems. Applications include automated theorem proving, circuit design verification, knowledge compilation for expert systems, and probabilistic reasoning in uncertain environments. By providing algebraic tools for both search-based and knowledge compilation algorithms, State Algebra could make logical reasoning in AI systems more efficient, from constraint satisfaction in planning problems to model checking in software verification.

Authors: Dmitry Lesnik, Tobias Schäfer

Link: https://arxiv.org/abs/2509.10326v1

Date: 2025-09-d

Summary:

This paper presents State Algebra, a novel framework designed to represent and manipulate propositional logic using algebraic methods. The framework is structured as a hierarchy of three representations: Set, Coordinate, and Row Decomposition. These representations anchor the system in well-known semantics while facilitating the computation using a powerful algebraic engine. A key aspect of State Algebra is its flexibility in representation. We show that although the default reduction of a state vector is not canonical, a unique canonical form can be obtained by applying a fixed variable order during the reduction process. This highlights a trade-off: by foregoing guaranteed canonicity, the framework gains increased flexibility, potentially leading to more compact representations of certain classes of problems. We explore how this framework provides tools to articulate both search-based and knowledge compilation algorithms and discuss its natural extension to probabilistic logic and Weighted Model Counting.
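
The abstract does not specify State Algebra's own operations, so we do not attempt them. For readers unfamiliar with the Weighted Model Counting problem the authors extend to, the brute-force sketch below shows what is being computed. The count is the sum, over all satisfying assignments of a CNF formula, of the product of literal weights. Setting the weights of x and ¬x to p and 1 - p turns the count into the probability that the formula holds. The exponential enumeration is for illustration only, since knowledge compilation exists precisely to avoid it. The function name and the DIMACS-style clause encoding are our choices.

```python
from itertools import product
from typing import Dict, List

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means NOT x3


def weighted_model_count(clauses: List[Clause], weights: Dict[int, float], num_vars: int) -> float:
    """Sum over satisfying assignments of the product of literal weights (brute force)."""
    total = 0.0
    for bits in product([False, True], repeat=num_vars):
        assignment = {v: bits[v - 1] for v in range(1, num_vars + 1)}
        satisfied = all(
            any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses
        )
        if satisfied:
            w = 1.0
            for v, value in assignment.items():
                w *= weights[v if value else -v]
            total += w
    return total


# Hypothetical example: P(x1 OR x2) with independent P(x1) = 0.3 and P(x2) = 0.6
probabilities = {1: 0.3, -1: 0.7, 2: 0.6, -2: 0.4}
print(weighted_model_count([[1, 2]], probabilities, num_vars=2))  # about 0.72 = 1 - 0.7 * 0.4
```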

--------------------------------------------------------------------------------------------------------

The Morality of Probability: How Implicit Moral Biases in LLMs May Shape the Future of Human-AI Symbiosis

As AI systems increasingly influence human decision-making, understanding their implicit moral biases becomes critical for ensuring ethical alignment. This research systematically evaluates moral preferences across six leading language models using 18 dilemmas spanning five ethical frameworks. The study reveals consistent biases favoring Care and Virtue values while penalizing libertarian choices, regardless of model architecture or cultural origin. Applications span judicial decision support, healthcare resource allocation, autonomous vehicle ethical programming, and corporate governance systems. By highlighting the need for explainability and cultural awareness in AI moral reasoning, this work provides essential insights for developing transparent, aligned AI systems that can navigate complex ethical terrain while respecting diverse human values and cultural perspectives.

Authors: Eoin O'Doherty, Nicole Weinrauch, Andrew Talone, Uri Klempner, Xiaoyuan Yi, Xing Xie, Yi Zeng

Link: https://arxiv.org/abs/2509.10297v1

Date: 2025-09-d

Summary:

Artificial intelligence (AI) is advancing at a pace that raises urgent questions about how to align machine decision-making with human moral values. This working paper investigates how leading AI systems prioritize moral outcomes and what this reveals about the prospects for human-AI symbiosis. We address two central questions: (1) What moral values do state-of-the-art large language models (LLMs) implicitly favour when confronted with dilemmas? (2) How do differences in model architecture, cultural origin, and explainability affect these moral preferences? To explore these questions, we conduct a quantitative experiment with six LLMs, ranking and scoring outcomes across 18 dilemmas representing five moral frameworks. Our findings uncover strikingly consistent value biases. Across all models, outcomes reflecting Care and Virtue values were rated most moral, while libertarian choices were consistently penalized. Reasoning-enabled models exhibited greater sensitivity to context and provided richer explanations, whereas non-reasoning models produced more uniform but opaque judgments. This research makes three contributions: (i) Empirically, it delivers a large-scale comparison of moral reasoning across culturally distinct LLMs; (ii) Theoretically, it links probabilistic model behaviour with underlying value encodings; (iii) Practically, it highlights the need for explainability and cultural awareness as critical design principles to guide AI toward a transparent, aligned, and symbiotic future.

--------------------------------------------------------------------------------------------------------

We Need a New Ethics for a World of AI Agents

The rapid deployment of autonomous AI agents raises unprecedented ethical challenges that existing frameworks cannot adequately address. This position paper argues for urgent engagement with the implications of agent proliferation across scientific, engineering, and policy communities. The research explores critical challenges in safety, human-machine relationships, and social coordination as AI agents become increasingly autonomous and prevalent. Applications span autonomous vehicle fleets, financial trading systems, social media content moderation, and healthcare decision support. By calling attention to the need for new ethical frameworks governing agent interactions—both human-agent and agent-agent—this work highlights the urgency of developing governance structures that ensure beneficial outcomes as we transition toward a world where AI agents operate with increasing independence and influence.

Authors: Iason Gabriel, Geoff Keeling, Arianna Manzini, James Evans

Link: https://arxiv.org/abs/2509.10289v1

Date: 2025-09-d

Summary:

The deployment of capable AI agents raises fresh questions about safety, human-machine relationships and social coordination. We argue for greater engagement by scientists, scholars, engineers and policymakers with the implications of a world increasingly populated by AI agents. We explore key challenges that must be addressed to ensure that interactions between humans and agents, and among agents themselves, remain broadly beneficial.

--------------------------------------------------------------------------------------------------------

Large-scale Aerial Reconfigurable Intelligent Surface-aided Robust Anti-jamming Transmission

Modern wireless communications face sophisticated jamming threats that can severely disrupt critical operations, particularly in military and emergency scenarios. This research introduces a novel approach using aerial reconfigurable intelligent surfaces (ARIS) mounted on UAVs to combat adaptive jamming attacks. The mean field modeling approach bypasses traditional combinatorial optimization complexity while revealing that optimal jamming strategies follow a proximity-directivity trade-off principle. Applications include military communications, disaster response networks, critical infrastructure protection, and secure IoT deployments. By proposing a spatial water-filling principle for ARIS deployment and demonstrating that computational complexity is independent of the number of UAVs, this work could enable scalable, resilient communication systems capable of maintaining connectivity under hostile conditions, ensuring mission-critical communications remain operational.

Authors: Junshan Luo, Shilian Wang, Boxiang He

Link: https://arxiv.org/abs/2509.10280v1

Date: 2025-09-d

Summary:

Aerial reconfigurable intelligent surfaces (ARIS), deployed on unmanned aerial vehicles (UAVs), could enhance anti-jamming communication performance by dynamically configuring channel conditions and establishing reliable air-ground links. However, large-scale ARIS faces critical deployment challenges due to the prohibitive computational complexity of conventional discrete optimization methods and sophisticated jamming threats. In this paper, we introduce a mean field modeling approach to design the spatial configuration of ARIS by a continuous density function, thus bypassing high-dimensional combinatorial optimization. We consider an adaptive jammer which adjusts its position and beamforming to minimize the sum-rate. A key finding reveals that the jammer's optimal strategy is governed by a proximity-directivity trade-off between reducing path loss and enhancing spatial focusing. To combat the jamming, we propose a robust anti-jamming transmission framework that jointly optimizes the BS beamforming, the ARIS reflection, and the ARIS spatial distribution to maximize the worst-case sum-rate. By leveraging variational optimization and Riemannian manifold methods, we efficiently solve the functional optimization problems. Our analysis further unveils that the optimal ARIS deployment follows a spatial water-filling principle, concentrating resources in high-gain regions while avoiding interference-prone areas. Simulation results demonstrate that the proposed framework remarkably improves the sum-rate. Furthermore, the computational complexity of the proposed algorithm is independent of the number of UAVs, validating its effectiveness for scalable ARIS-assisted anti-jamming communications.
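The "spatial water-filling principle" borrows its name from the classic water-filling power allocation. In that scheme, resources go to high-gain channels and low-gain ones may receive nothing. The sketch below is that classic version, assuming parallel channels with positive gain-to-noise ratios `g_i` and a total budget `P`. Each channel gets `p_i = max(0, mu - 1/g_i)`, and the water level `mu` is found by bisection. It is only an analogy. The paper applies the principle to a continuous ARIS spatial density through variational optimization, which this code does not attempt.

```python
def water_filling(gains, total_power, iters=100):
    """Classic water-filling: p_i = max(0, mu - 1/g_i) with sum(p_i) == total_power.

    gains: positive gain-to-noise ratios, one per parallel channel (assumed model).
    """
    if total_power <= 0 or not gains:
        return [0.0] * len(gains)
    floors = [1.0 / g for g in gains]           # "ground level" under each channel
    lo, hi = min(floors), max(floors) + total_power  # the water level mu lies in [lo, hi]
    for _ in range(iters):                      # bisection on the water level
        mu = 0.5 * (lo + hi)
        used = sum(max(0.0, mu - f) for f in floors)
        if used > total_power:
            hi = mu
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    return [max(0.0, mu - f) for f in floors]
```

With gains `[1.0, 0.5, 0.1]` and a budget of 2, the water level settles at 2.5 and the allocation is `[1.5, 0.5, 0.0]`. The weakest channel is left dry, which mirrors the summary's "concentrating resources in high-gain regions."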

--------------------------------------------------------------------------------------------------------

SignClip: Leveraging Mouthing Cues for Sign Language Translation by Multimodal Contrastive Fusion

Sign language translation systems typically focus on manual gestures while overlooking crucial non-manual cues like mouthing, which provide essential linguistic information for disambiguation. SignClip addresses this limitation by fusing spatial gesture and lip movement features through hierarchical contrastive learning with multi-level alignment objectives. The framework ensures semantic consistency across sign-lip and visual-text modalities, improving translation accuracy on benchmark datasets. Applications include real-time interpretation services, educational tools for deaf students, accessibility features for video content, and communication aids for healthcare settings. By properly incorporating mouthing cues that deaf communities naturally use for linguistic precision, SignClip could significantly improve the quality of automated sign language interpretation, fostering more inclusive communication and better accessibility for deaf and hard-of-hearing individuals.

Authors: Wenfang Wu, Tingting Yuan, Yupeng Li, Daling Wang, Xiaoming Fu

Link: https://arxiv.org/abs/2509.10266v1

Date: 2025-09-d

Summary:

Sign language translation (SLT) aims to translate sign language videos into natural language, serving as a vital bridge for inclusive communication. While recent advances leverage powerful visual backbones and large language models, most approaches mainly focus on manual signals (hand gestures) and tend to overlook non-manual cues like mouthing. In fact, mouthing conveys essential linguistic information in sign languages and plays a crucial role in disambiguating visually similar signs. In this paper, we propose SignClip, a novel framework to improve the accuracy of sign language translation. It fuses manual and non-manual cues, specifically spatial gesture and lip movement features. In addition, SignClip introduces a hierarchical contrastive learning framework with multi-level alignment objectives, ensuring semantic consistency across sign-lip and visual-text modalities. Extensive experiments on two benchmark datasets, PHOENIX14T and How2Sign, demonstrate the superiority of our approach. For example, on PHOENIX14T, in the Gloss-free setting, SignClip surpasses the previous state-of-the-art model SpaMo, improving BLEU-4 from 24.32 to 24.71, and ROUGE from 46.57 to 48.38.
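The summary does not spell out SignClip's loss functions. As a rough illustration of what "alignment across sign-lip modalities" usually means, here is a generic CLIP-style symmetric InfoNCE objective. It is computed over a batch of paired sign and lip embeddings, where the i-th sign clip should match the i-th lip clip and every other clip in the batch acts as a negative. The function name, temperature value and toy 2-D embeddings are assumptions. SignClip's actual objective is hierarchical and multi-level, and this shows only a single level.

```python
import math

def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))  # assumes non-zero vectors
    return [x / n for x in v]

def _log_softmax_at(logits, k):
    m = max(logits)  # subtract the max for numerical stability
    return logits[k] - (m + math.log(sum(math.exp(z - m) for z in logits)))

def symmetric_info_nce(sign_feats, lip_feats, temperature=0.07):
    """Generic CLIP-style loss (hypothetical stand-in, not SignClip's exact objective)."""
    a = [_normalize(v) for v in sign_feats]
    b = [_normalize(v) for v in lip_feats]
    n = len(a)
    # Cosine similarities between every sign clip and every lip clip, scaled by temperature.
    sims = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
            for i in range(n)]
    sign_to_lip = -sum(_log_softmax_at(sims[i], i) for i in range(n)) / n
    lip_to_sign = -sum(_log_softmax_at([sims[j][i] for j in range(n)], i) for i in range(n)) / n
    return 0.5 * (sign_to_lip + lip_to_sign)
```

Correctly paired embeddings give a loss near zero. Shuffling the lip features so that pairs no longer correspond drives the loss up, and that gradient signal pulls matching modalities together during training.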

--------------------------------------------------------------------------------------------------------

Compartmentalised Agentic Reasoning for Clinical NLI

Clinical natural language inference faces unique challenges requiring specialized reasoning across diverse medical contexts, from causal relationships to risk assessment. This research challenges the assumption that scaling alone yields better structured representations by introducing CARENLI, which separates knowledge access from principled inference through family-specific solvers. The system routes premise-statement pairs to specialized reasoning modules while enforcing auditable procedures via planner, verifier, and refiner components. Applications include clinical decision support systems, medical literature analysis, insurance claim processing, and pharmaceutical safety monitoring. By achieving up to 42-point improvements in reasoning fidelity and providing auditable inference pathways, CARENLI could enhance trust in medical AI systems while ensuring that clinical reasoning remains transparent and verifiable for healthcare professionals.

Authors: Maël Jullien, Lei Xu, Marco Valentino, André Freitas

Link: https://arxiv.org/abs/2509.10222v1

Date: 2025-09-d

Summary:

A common assumption holds that scaling data and parameters yields increasingly structured, generalisable internal representations. We interrogate this assumption in clinical natural language inference (NLI) by adopting a benchmark decomposed into four reasoning families (Causal Attribution, Compositional Grounding, Epistemic Verification, and Risk State Abstraction) and introducing CARENLI (Compartmentalised Agentic Reasoning for Clinical NLI), which separates knowledge access from principled inference. CARENLI routes each premise-statement pair to a family-specific solver and enforces auditable procedures via a planner, verifier, and refiner. Across four LLMs, CARENLI improves fidelity by up to 42 points, reaching 98.0% in Causal Attribution and 81.2% in Risk State Abstraction. Verifiers flag violations with near-ceiling reliability, while refiners correct a substantial share of epistemic errors. Remaining failures cluster in routing, identifying family classification as the main bottleneck. These results show that LLMs often retain relevant facts but default to heuristics when inference is underspecified, a dissociation CARENLI makes explicit while offering a framework for safer, auditable reasoning.
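The control flow the summary describes is route, then solve, then verify, then refine. It can be sketched as a small dispatcher that records an auditable trace. Everything below is hypothetical. In CARENLI these stages are LLM-driven, whereas here they are toy functions (`keyword_router`, `toy_solvers`, `label_is_valid`, `fallback_refiner`). The planner is folded into each solver, and the labels are placeholders. The sketch also makes the routing bottleneck visible. A pair sent to the wrong family reaches the wrong solver, and a verifier that only checks procedure may not notice.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# The four reasoning families named in the summary.
FAMILIES = ("causal_attribution", "compositional_grounding",
            "epistemic_verification", "risk_state_abstraction")

@dataclass
class Verdict:
    family: str
    label: str
    trace: List[Tuple[str, str]] = field(default_factory=list)  # auditable record of each stage

def run_pipeline(premise: str, statement: str,
                 route: Callable[[str, str], str],
                 solvers: Dict[str, Callable[[str, str], str]],
                 verify: Callable[[str, str, str, str], bool],
                 refine: Callable[[str, str, str, str], str]) -> Verdict:
    """Route a premise-statement pair to one family-specific solver, then verify and refine."""
    family = route(premise, statement)
    if family not in solvers:
        raise ValueError(f"router returned unknown family: {family}")
    verdict = Verdict(family, solvers[family](premise, statement))
    verdict.trace += [("route", family), ("solve", verdict.label)]
    if verify(family, premise, statement, verdict.label):
        verdict.trace.append(("verify", "ok"))
    else:
        verdict.trace.append(("verify", "violation"))
        verdict.label = refine(family, premise, statement, verdict.label)
        verdict.trace.append(("refine", verdict.label))
    return verdict

# Hypothetical toy stand-ins; real components would be LLM calls, not keyword rules.
def keyword_router(premise: str, statement: str) -> str:
    return "risk_state_abstraction" if "risk" in statement.lower() else "causal_attribution"

toy_solvers = {family: (lambda p, s: "entailment") for family in FAMILIES}
toy_solvers["causal_attribution"] = lambda p, s: "unsure"  # an output the verifier rejects

def label_is_valid(family, premise, statement, label):
    return label in ("entailment", "contradiction")

def fallback_refiner(family, premise, statement, label):
    return "contradiction"

demo = run_pipeline("Patient started warfarin last week.",
                    "The patient has an elevated bleeding risk.",
                    keyword_router, toy_solvers, label_is_valid, fallback_refiner)
```

Here `demo` is routed to `risk_state_abstraction` and passes verification. A statement without "risk" goes to the toy causal solver, which produces an invalid label, and the refiner step then appears in the trace.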

--------------------------------------------------------------------------------------------------------

Towards Fully Automated Molecular Simulations: Multi-Agent Framework for Simulation Setup and Force Field Extraction

Molecular simulation setup remains a complex, time-intensive bottleneck in materials discovery, limiting the pace of scientific advancement. This research proposes a multi-agent framework where LLM-based agents autonomously understand characterization tasks, plan simulations, assemble force fields, and interpret results iteratively. The initial implementation focuses on literature-informed force field extraction and automated RASPA simulation setup with demonstrated high correctness and reproducibility. Applications span drug discovery, catalyst design, energy storage materials, and environmental remediation. By enabling fully autonomous materials characterization, this approach could dramatically accelerate the discovery of novel materials with tailored properties, from more efficient solar cells to advanced battery materials, potentially revolutionizing how we approach materials science research.

Authors: Marko Petković, Vlado Menkovski, Sofía Calero

Link: https://arxiv.org/abs/2509.10210v1

Date: 2025-09-d

Summary:

Automated characterization of porous materials has the potential to accelerate materials discovery, but it remains limited by the complexity of simulation setup and force field selection. We propose a multi-agent framework in which LLM-based agents can autonomously understand a characterization task, plan appropriate simulations, assemble relevant force fields, execute them and interpret their results to guide subsequent steps. As a first step toward this vision, we present a multi-agent system for literature-informed force field extraction and automated RASPA simulation setup. Initial evaluations demonstrate high correctness and reproducibility, highlighting this approach's potential to enable fully autonomous, scalable materials characterization.

--------------------------------------------------------------------------------------------------------

SI-FACT: Mitigating Knowledge Conflict via Self-Improving Faithfulness-Aware Contrastive Tuning

Large language models often generate unfaithful responses in knowledge-intensive tasks due to conflicts between parametric knowledge and provided context. SI-FACT addresses this through a self-improving framework that automatically generates contrastive learning data, including anchor samples and negative examples simulating unfaithful scenarios. The approach significantly reduces manual annotation costs while improving the Contextual Recall Rate by 6.2% over the best baseline. Applications include fact-checking systems, educational content verification, legal document analysis, and scientific literature review. By training models to distinguish faithful from unfaithful responses in representation space, SI-FACT could enhance trustworthiness in AI systems deployed for information-critical applications, ensuring that models prioritize provided evidence over potentially outdated or conflicting internal knowledge.

Authors: Shengqiang Fu

Link: https://arxiv.org/abs/2509.10208v1

Date: 2025-09-d

Summary:

Large Language Models often generate unfaithful responses in knowledge-intensive tasks due to knowledge conflict, that is, a preference for relying on internal parametric knowledge rather than the provided context. To address this issue, we propose a novel self-improving framework, Self-Improving Faithfulness-Aware Contrastive Tuning (SI-FACT). The framework uses a self-instruct mechanism that allows the base LLM to automatically generate high-quality, structured contrastive learning data, including anchor samples, semantically equivalent positive samples, and negative samples simulating unfaithful scenarios. This approach significantly reduces the cost of manual annotation. Subsequently, contrastive learning is applied to train the model, enabling it to pull faithful responses closer and push unfaithful responses farther apart in the representation space. Experiments on knowledge conflict evaluation benchmarks ECARE KRE and COSE KRE show that the SI-FACT model based on Llama3-8B-Instruct improves the Contextual Recall Rate by 6.2% over the best baseline method, while significantly reducing dependence on internal memory. The results indicate that SI-FACT provides strong effectiveness and high data efficiency in enhancing the contextual faithfulness of LLMs, offering a practical pathway toward building more proactive and trustworthy language models.
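The phrase "pull faithful responses closer and push unfaithful responses farther apart" describes a contrastive objective over an anchor, one positive and several negatives. Below is a minimal InfoNCE sketch of that idea, assuming cosine similarity and a temperature of 0.1. The function names and the toy 2-D vectors are hypothetical, and the summary does not state SI-FACT's exact loss, encoder or hyperparameters. Unlike in-batch contrastive setups, the negatives here are passed in explicitly, matching the summary's description of generated samples that simulate unfaithful answers.

```python
import math

def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def anchor_info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE for one anchor: low when the faithful positive is closer than every negative."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # log-sum-exp with max subtraction for numerical stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]  # equals -log softmax probability of the positive
```

If the representation places a faithful answer next to the anchor and the unfaithful ones elsewhere, the loss approaches zero. If an unfaithful answer sits closest, the loss becomes large, and minimizing it moves the representations apart.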

--------------------------------------------------------------------------------------------------------

Investigating Feature Attribution for 5G Network Intrusion Detection

Fifth-generation networks in critical applications require not just attack detection but reliable, interpretable security verdicts suitable for automated incident response. This research evaluates explainable AI techniques for 5G network intrusion detection, comparing statistical feature attribution (SHAP) with logical explanations (VoTE-XAI) across three metrics: sparsity, stability, and efficiency. VoTE-XAI identified significantly fewer relevant features while maintaining comprehensive coverage and demonstrating superior computational efficiency. Applications include autonomous network security orchestration, telecommunications infrastructure protection, industrial IoT security, and critical infrastructure defense. By providing faster, more concise explanations for security alerts, this work could enable real-time, automated threat response in 5G networks, ensuring that security systems can react swiftly to sophisticated attacks while maintaining operational transparency for network administrators.

Authors: Federica Uccello, Simin Nadjm-Tehrani

Link: https://arxiv.org/abs/2509.10206v1

Date: 2025-09-d

Summary:

With the rise of fifth-generation (5G) networks in critical applications, it is urgent to move from detection of malicious activity to systems capable of providing a reliable verdict suitable for mitigation. In this regard, understanding and interpreting machine learning (ML) models' security alerts is crucial for enabling actionable incident response orchestration. Explainable Artificial Intelligence (XAI) techniques are expected to enhance trust by providing insights into why alerts are raised. A dominant approach statistically associates feature sets that can be correlated to a given alert. This paper starts by questioning whether such attribution is relevant for future generation communication systems, and investigates its merits in comparison with an approach based on logical explanations. We extensively study two methods, SHAP and VoTE-XAI, by analyzing their interpretations of alerts generated by an XGBoost model in three different use cases with several 5G communication attacks. We identify three metrics for assessing explanations: sparsity, how concise they are; stability, how consistent they are across samples from the same attack type; and efficiency, how fast an explanation is generated. As an example, in a 5G network with 92 features, 6 were deemed important by VoTE-XAI for a Denial of Service (DoS) variant, ICMPFlood, while SHAP identified over 20. More importantly, we found a significant divergence between features selected by SHAP and VoTE-XAI. However, none of the top-ranked features selected by SHAP were missed by VoTE-XAI. When it comes to efficiency of providing interpretations, we found that VoTE-XAI is significantly more responsive, e.g. it provides a single explanation in under 0.002 seconds, in a high-dimensional setting (478 features).
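The three evaluation criteria can be made concrete on explanations represented as sets of feature names. In the sketch below, sparsity is the size of the set (6 for VoTE-XAI versus over 20 for SHAP in the ICMPFlood example). Stability is taken as the mean pairwise Jaccard similarity across explanations for one attack type. A `covers` check mirrors the finding that no top-ranked SHAP feature was missed by VoTE-XAI. These formulas are assumptions chosen for illustration, and the paper's exact definitions may differ. The feature names are hypothetical. Efficiency would simply be wall-clock time per explanation (for example, measured with `time.perf_counter`).

```python
from itertools import combinations

def sparsity(explanation):
    """Number of features an explanation marks as relevant (smaller = more concise)."""
    return len(set(explanation))

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

def stability(explanations):
    """Assumed definition: mean pairwise Jaccard similarity for samples of one attack type."""
    pairs = list(combinations(explanations, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def covers(logical_explanation, top_ranked):
    """True if every top-ranked attribution feature also appears in the logical explanation."""
    return set(top_ranked) <= set(logical_explanation)

# Hypothetical explanations for three ICMPFlood samples.
icmp_explanations = [{"icmp_rate", "pkt_len"}, {"icmp_rate", "pkt_len"}, {"icmp_rate", "ttl"}]
```

For the three toy explanations, two are identical and the third shares one feature, so stability is (1 + 1/3 + 1/3) / 3 = 5/9.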

--------------------------------------------------------------------------------------------------------

EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.

Artificial Intelligence, Research Watch | Craig Smith | September 15, 2025