Week Ending 7.6.2025

 

RESEARCH WATCH: 7.6.2025

 

MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction

Traffic accident severity prediction is crucial for emergency response planning and transportation safety management, yet traditional approaches struggle with incomplete data and severe class imbalance where fatal accidents are rare but critical to detect. MARBLE introduces a multi-agent framework that decomposes the prediction task across specialized reasoning agents, each focusing on a specific feature domain such as spatial, environmental, or temporal factors. This modular approach enables more interpretable decision-making while reaching nearly 90% accuracy, where traditional classifiers and prompt-based baselines plateau below 48%. Applications include real-time emergency dispatch optimization, insurance risk assessment, traffic infrastructure planning, and automated incident response systems that can better allocate resources based on predicted severity levels.

Authors:  Kaleem Ullah Qasim, Jiashu Zhang

Link:  https://arxiv.org/abs/2507.04893v1

Date: 2025-07

Summary:

Accident severity prediction plays a critical role in transportation safety systems but is a persistently difficult task due to incomplete data, strong feature dependencies, and severe class imbalance in which rare but high-severity cases are underrepresented and hard to detect. Existing methods often rely on monolithic models or black-box prompting, which struggle to scale in noisy, real-world settings and offer limited interpretability. To address these challenges, we propose MARBLE, a multi-agent rule-based LLM engine that decomposes the severity prediction task across a team of specialized reasoning agents, including an interchangeable ML-backed agent. Each agent focuses on a semantic subset of features (e.g., spatial, environmental, temporal), enabling scoped reasoning and modular prompting without the risk of prompt saturation. Predictions are coordinated through either rule-based or LLM-guided consensus mechanisms that account for class rarity and confidence dynamics. The system retains structured traces of agent-level reasoning and coordination outcomes, supporting in-depth interpretability and post-hoc performance diagnostics. Across both UK and US datasets, MARBLE consistently outperforms traditional machine learning classifiers and state-of-the-art (SOTA) prompt-based reasoning methods, including Chain-of-Thought (CoT), Least-to-Most (L2M), and Tree-of-Thought (ToT), achieving nearly 90% accuracy where others plateau below 48%. This performance redefines the practical ceiling for accident severity classification under real-world noise and extreme class imbalance. Our results position MARBLE as a generalizable and interpretable framework for reasoning under uncertainty in safety-critical applications.
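
To make the rule-based consensus idea more concrete, here is a minimal sketch of how per-agent votes might be combined while accounting for class rarity and confidence. The agent names, class priors, and the inverse-square-root rarity boost are all hypothetical choices made for illustration; the abstract does not specify MARBLE's actual rules.

```python
from collections import defaultdict

# Hypothetical class priors: fatal accidents are rare, slight ones common.
CLASS_PRIOR = {"slight": 0.85, "serious": 0.13, "fatal": 0.02}


def rule_based_consensus(agent_votes, class_prior=CLASS_PRIOR):
    """Combine (label, confidence) votes from several agents.

    Each vote is weighted by the agent's confidence and by the inverse
    square root of the class prior, so rare classes need fewer votes to win.
    This weighting is an illustrative assumption, not MARBLE's actual rule.
    """
    scores = defaultdict(float)
    for label, confidence in agent_votes.values():
        rarity_boost = class_prior[label] ** -0.5
        scores[label] += confidence * rarity_boost
    winner = max(scores, key=scores.get)
    # Return the per-class scores as a trace for post-hoc inspection.
    return winner, dict(scores)


votes = {
    "spatial": ("slight", 0.6),
    "environmental": ("fatal", 0.7),
    "temporal": ("slight", 0.5),
}
label, trace = rule_based_consensus(votes)
print(label, trace)
```

In this toy case a single confident "fatal" vote outweighs two "slight" votes because of the rarity boost, which is one simple way to keep rare, high-severity cases from being outvoted.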

--------------------------------------------------------------------------------------------------------

UrbanMind: Towards Urban General Intelligence via Tool-Enhanced Retrieval-Augmented Generation and Multilevel Optimization

Modern cities generate vast amounts of data from sensors, traffic systems, and citizen interactions, creating opportunities for AI systems to enhance urban management and citizen services. UrbanMind addresses the challenge of developing AI that can autonomously perceive, reason, and act within complex urban environments through a novel tool-enhanced retrieval-augmented generation framework. The system's Continual Retrieval-Augmented MoE-based LLM architecture dynamically incorporates domain-specific urban knowledge while adapting to evolving city conditions through multilevel optimization. Potential applications span smart city management, traffic optimization, emergency response coordination, urban planning support, citizen service automation, and infrastructure monitoring systems that can provide intelligent insights for more efficient and responsive urban governance.

Authors:  Kai Yang, Zelin Zhu, Chengtao Jian, Hui Ma, Shengjie Zhao, Xiaozhou Ye, Ye Ouyang

Link:  https://arxiv.org/abs/2507.04706v1

Date: 2025-07

Summary:

Urban general intelligence (UGI) refers to the capacity of AI systems to autonomously perceive, reason, and act within dynamic and complex urban environments. In this paper, we introduce UrbanMind, a tool-enhanced retrieval-augmented generation (RAG) framework designed to facilitate UGI. Central to UrbanMind is a novel architecture based on Continual Retrieval-Augmented MoE-based LLM (C-RAG-LLM), which dynamically incorporates domain-specific knowledge and evolving urban data to support long-term adaptability. The architecture of C-RAG-LLM aligns naturally with a multilevel optimization framework, where different layers are treated as interdependent sub-problems. Each layer has distinct objectives and can be optimized either independently or jointly through a hierarchical learning process. The framework is highly flexible, supporting both end-to-end training and partial layer-wise optimization based on resource or deployment constraints. To remain adaptive under data drift, it is further integrated with an incremental corpus updating mechanism. Evaluations on real-world urban tasks of varying complexity verify the effectiveness of the proposed framework. This work presents a promising step toward the realization of general-purpose LLM agents in future urban environments.
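
The retrieval-plus-incremental-update idea can be illustrated with a very small sketch. The class name, the bag-of-words cosine scoring, and the example documents below are assumptions for illustration only; UrbanMind's actual retriever and corpus mechanism are not described at this level of detail in the abstract.

```python
import math
from collections import Counter


class IncrementalCorpus:
    """Toy retrieval corpus that can be extended as new urban data arrives."""

    def __init__(self):
        self.docs = []  # list of (text, token counts)

    def add(self, text):
        """Append a document; no re-indexing of older documents is needed."""
        self.docs.append((text, Counter(text.lower().split())))

    def retrieve(self, query, k=1):
        """Return the k documents with the highest bag-of-words cosine score."""
        q = Counter(query.lower().split())
        q_norm = math.sqrt(sum(v * v for v in q.values()))

        def cosine(doc_counts):
            dot = sum(q[t] * doc_counts[t] for t in q)
            d_norm = math.sqrt(sum(v * v for v in doc_counts.values()))
            return dot / (q_norm * d_norm) if q_norm and d_norm else 0.0

        ranked = sorted(self.docs, key=lambda doc: cosine(doc[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


corpus = IncrementalCorpus()
corpus.add("bus route 12 runs every 10 minutes")
corpus.add("park opening hours are 6am to 10pm")
before = corpus.retrieve("road closure main street")

# New information arrives; the corpus is updated without retraining anything.
corpus.add("main street road closure until friday")
after = corpus.retrieve("road closure main street")
print(before, after)
```

Before the update the query has no relevant document to retrieve; after the update the newly added entry is ranked first, which is the behavior an incremental corpus is meant to provide under data drift.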

--------------------------------------------------------------------------------------------------------

RAPTOR: Practical Numerical Profiling of Scientific Applications

High-performance computing is rapidly evolving with new low-precision units driven by AI hardware advances, forcing scientists to reconsider traditional 64-bit floating-point computations for performance gains. RAPTOR provides a crucial bridge by offering transparent numerical profiling tools that help domain scientists identify where precision can be safely reduced without compromising scientific accuracy. Using LLVM-based transformations, the system can emulate different precision levels and highlight numerical instabilities before deployment. Applications include climate modeling optimization, computational fluid dynamics acceleration, molecular simulation enhancement, and any scientific computing domain where balancing numerical accuracy with computational efficiency is critical for scaling research to larger problems or achieving faster time-to-solution.

Authors:  Faveo Hoerold, Ivan R. Ivanov, Akash Dhruv, William S. Moses, Anshu Dubey, Mohamed Wahib, Jens Domke

Link:  https://arxiv.org/abs/2507.04647v1

Date: 2025-07

Summary:

The proliferation of low-precision units in modern high-performance architectures increasingly burdens domain scientists. Historically, the choice in HPC was easy: can we get away with 32-bit floating-point operations and lower bandwidth requirements, or is FP64 necessary? Driven by Artificial Intelligence, vendors introduced novel low-precision units for vector and tensor operations, and FP64 capabilities stagnate or are reduced. This is forcing scientists to re-evaluate their codes, but a trivial search-and-replace approach to go from FP64 to FP16 will not suffice. We introduce RAPTOR: a numerical profiling tool to guide scientists in their search for code regions where precision lowering is feasible. Using LLVM, we transparently replace high-precision computations using low-precision units, or emulate a user-defined precision. RAPTOR is a novel, feature-rich approach -- with focus on ease of use -- to change, profile, and reason about numerical requirements and instabilities, which we demonstrate with four real-world multi-physics Flash-X applications.
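
To see why a blind FP64-to-FP16 swap can fail, the following sketch emulates lower precision in pure Python by rounding after every addition. It is only an illustration of the numerical effect; it does not use LLVM and is not how RAPTOR itself works. The helper names are hypothetical.

```python
import struct


def round_to(value, fmt):
    """Round a Python float (FP64) to a narrower IEEE format via struct.

    fmt is 'e' for FP16 or 'f' for FP32.
    """
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def emulated_sum(values, fmt):
    """Accumulate values, rounding after each addition to emulate low precision."""
    total = 0.0
    for v in values:
        total = round_to(total + round_to(v, fmt), fmt)
    return total


values = [0.1] * 10_000
exact = sum(values)               # FP64 reference, close to 1000
fp32 = emulated_sum(values, "f")  # small accumulated rounding error
fp16 = emulated_sum(values, "e")  # accumulator stalls far below 1000
print(exact, fp32, fp16)
```

The FP16 accumulator stops growing once the running total is large enough that adding 0.1 is less than half a unit in the last place, so the sum ends up far from the FP64 result. Finding code regions where this kind of effect does not occur is the profiling problem the abstract describes.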

--------------------------------------------------------------------------------------------------------

SPIRA: Building an Intelligent System for Respiratory Insufficiency Detection

Respiratory insufficiency represents a critical medical condition requiring early detection for effective treatment, yet traditional diagnostic methods may not be accessible in remote or resource-limited settings. SPIRA demonstrates the potential of voice-based medical screening by developing an intelligent system that can detect respiratory problems through audio analysis alone. This non-invasive approach leverages advances in audio processing and machine learning to identify vocal biomarkers associated with reduced blood oxygen levels. The system's applications extend to telemedicine platforms, remote patient monitoring, mass screening programs in underserved areas, home healthcare devices, and early warning systems for respiratory conditions, potentially enabling more accessible and continuous health monitoring without specialized medical equipment.

Authors:  Renato Cordeiro Ferreira, Dayanne Gomes, Vitor Tamae, Francisco Wernke, Alfredo Goldman

Link:  https://arxiv.org/abs/2507.04548v1

Date: 2025-07

Summary:

Respiratory insufficiency is a medical symptom in which a person gets a reduced amount of oxygen in the blood. This paper reports the experience of building SPIRA: an intelligent system for detecting respiratory insufficiency from voice. It compiles challenges faced in two succeeding implementations of the same architecture, summarizing lessons learned on data collection, training, and inference for future projects in similar systems.

--------------------------------------------------------------------------------------------------------

QF: Quick Feedforward AI Model Training without Gradient Back Propagation

Traditional neural network training relies heavily on gradient backpropagation, which can be computationally expensive and memory-intensive, particularly for large transformer models. Quick Feedforward (QF) Learning presents a paradigm shift by enabling knowledge transfer through closed-form feedforward computations that mirror biological brain processes more closely. This approach allows models to learn and adapt within the same runtime environment without requiring separate training phases, making the process more resource-efficient and brain-like. Applications include edge computing deployments where computational resources are limited, continual learning systems that must adapt to new information in real-time, mobile AI applications requiring efficient on-device learning, and personalized AI systems that can quickly adapt to individual user preferences without extensive retraining.

Authors:  Feng Qi

Link:  https://arxiv.org/abs/2507.04300v1

Date: 2025-07

Summary:

We propose Quick Feedforward (QF) Learning, a novel knowledge consolidation framework for transformer-based models that enables efficient transfer of instruction-derived knowledge into model weights through feedforward activations without any gradient back propagation. Unlike traditional finetuning, QF updates are computed in closed form, require minimal parameter modification, and preserve prior knowledge. Importantly, QF allows models to train and infer within the same runtime environment, making the process more resource-efficient and closely aligned with how the human brain operates. Code and models are open sourced on GitHub. I hope QF Learning inspires a more efficient and brain-like paradigm for AI systems.

--------------------------------------------------------------------------------------------------------

Optimizing Age of Trust and Throughput in Multi-Hop UAV-Aided IoT Networks

Internet of Things deployments across vast geographical areas face significant security challenges, particularly when devices operate in unguarded environments vulnerable to attacks and tampering. This research addresses the critical need for automated security attestation using UAV-based systems that can verify device integrity while managing energy constraints from solar-powered charging stations. The Deep Reinforcement Learning approach optimizes UAV trajectories to balance security verification with network throughput, addressing the trade-off between device downtime during attestation and maintaining data flow. Applications include critical infrastructure monitoring, environmental sensor networks, smart agriculture systems, border security installations, and any large-scale IoT deployment where device integrity must be continuously verified across remote and potentially hostile environments.

Authors:  Yizhou Luo, Kwan-Wu Chin, Ruyi Guan, Xi Xiao, Caimeng Wang, Jingyin Feng, Tengjiao He

Link:  https://arxiv.org/abs/2507.03950v1

Date: 2025-07

Summary:

Devices operating in Internet of Things (IoT) networks may be deployed across vast geographical areas and interconnected via multi-hop communications. Further, they may be unguarded. This makes them vulnerable to attacks and motivates operators to check on devices frequently. To this end, we propose and study an Unmanned Aerial Vehicle (UAV)-aided attestation framework for use in IoT networks with a solar-powered charging station. A key challenge is optimizing the trajectory of the UAV to ensure it attests as many devices as possible. A trade-off here is that devices being checked by the UAV are offline, which affects the amount of data delivered to a gateway. Another challenge is that the charging station experiences time-varying energy arrivals, which in turn affect the flight duration and charging schedule of the UAV. To address these challenges, we employ a Deep Reinforcement Learning (DRL) solution to optimize the UAV's charging schedule and the selection of devices to be attested during each flight. The simulation results show that our solution reduces the average age of trust by 88% and throughput loss due to attestation by 30%.

--------------------------------------------------------------------------------------------------------

Learning Disentangled Stain and Structural Representations for Semi-Supervised Histopathology Segmentation

Cancer diagnosis through histopathology image analysis faces significant challenges due to variability in tissue staining and limited availability of expertly annotated training data. The Color-Structure Dual-Student (CSDS) framework addresses these issues by learning separate representations for stain appearance and tissue structure, enabling more robust segmentation with minimal labeled data. This semi-supervised approach uses specialized student networks and uncertainty estimation to improve performance in low-data scenarios. Applications include automated cancer diagnosis support, pathology workflow acceleration, medical education tools, remote diagnostic assistance in underserved areas, and research platforms for studying tissue morphology patterns, ultimately enabling more accessible and consistent cancer screening and diagnosis across healthcare systems with varying expertise levels.

Authors:  Ha-Hieu Pham, Nguyen Lan Vi Vu, Thanh-Huy Nguyen, Ulas Bagci, Min Xu, Trung-Nghia Le, Huy-Hieu Pham

Link:  https://arxiv.org/abs/2507.03923v1

Date: 2025-07

Summary:

Accurate gland segmentation in histopathology images is essential for cancer diagnosis and prognosis. However, significant variability in Hematoxylin and Eosin (H&E) staining and tissue morphology, combined with limited annotated data, poses major challenges for automated segmentation. To address this, we propose Color-Structure Dual-Student (CSDS), a novel semi-supervised segmentation framework designed to learn disentangled representations of stain appearance and tissue structure. CSDS comprises two specialized student networks: one trained on stain-augmented inputs to model chromatic variation, and the other on structure-augmented inputs to capture morphological cues. A shared teacher network, updated via Exponential Moving Average (EMA), supervises both students through pseudo-labels. To further improve label reliability, we introduce stain-aware and structure-aware uncertainty estimation modules that adaptively modulate the contribution of each student during training. Experiments on the GlaS and CRAG datasets show that CSDS achieves state-of-the-art performance in low-label settings, with Dice score improvements of up to 1.2% on GlaS and 0.7% on CRAG at 5% labeled data, and 0.7% and 1.4% at 10%. Our code and pre-trained models are available at https://github.com/hieuphamha19/CSDS.
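
The Exponential Moving Average teacher update mentioned above follows a standard formula, sketched below on a toy parameter dictionary. The decay value and the use of a single student as the update source are assumptions; the abstract does not say how the two CSDS students are combined when updating the shared teacher.

```python
def ema_update(teacher, student, decay=0.99):
    """Return teacher weights moved toward the student by one EMA step.

    teacher_new = decay * teacher + (1 - decay) * student
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Toy scalar "weights"; real models would hold tensors per parameter name.
teacher = {"w": 0.0, "b": 1.0}
student = {"w": 1.0, "b": 0.0}

for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.9)
print(teacher)
```

Because the teacher only drifts slowly toward the student, its pseudo-labels change more smoothly than the student's own predictions, which is the usual motivation for EMA teachers in semi-supervised training.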

--------------------------------------------------------------------------------------------------------

Animation Needs Attention: A Holistic Approach to Slides Animation Comprehension with Visual-Language Models

Educational presentations and corporate communications increasingly rely on dynamic animations to engage audiences and convey complex information effectively, yet AI-powered presentation tools lack sophisticated animation capabilities. This research addresses the gap by developing the first comprehensive dataset and evaluation framework for slide animation generation, covering all PowerPoint animation effects with natural language descriptions. The fine-tuned vision-language model demonstrates significant improvements in generating appropriate animations from textual descriptions. Applications include automated presentation generation tools, educational content creation platforms, corporate communication systems, accessibility tools for visually impaired users, and interactive learning environments that can dynamically adapt presentation styles based on audience engagement and comprehension needs.

Authors:  Yifan Jiang, Yibo Xue, Yukun Kang, Pin Zheng, Jian Peng, Feiran Wu, Changliang Xu

Link:  https://arxiv.org/abs/2507.03916v1

Date: 2025-07

Summary:

Slide animations, such as fade-ins, fly-ins, and wipes, are critical for audience engagement, efficient information delivery, and vivid visual expression. However, most AI-driven slide-generation tools still lack native animation support, and existing vision-language models (VLMs) struggle with animation tasks due to the absence of public datasets and limited temporal-reasoning capabilities. To address this gap, we release the first public dataset for slide-animation modeling: 12,000 triplets of natural-language descriptions, animation JSON files, and rendered videos, collectively covering every built-in PowerPoint effect. Using this resource, we fine-tune Qwen-2.5-VL-7B with Low-Rank Adaptation (LoRA) and achieve consistent improvements over GPT-4.1 and Gemini-2.5-Pro in BLEU-4, ROUGE-L, SPICE, and our Coverage-Order-Detail Assessment (CODA) metric, which evaluates action coverage, temporal order, and detail fidelity. On a manually curated test set of slides, the LoRA model increases BLEU-4 by around 60%, ROUGE-L by 30%, and shows significant improvements in CODA-detail. This demonstrates that low-rank adaptation enables reliable temporal reasoning and generalization beyond synthetic data. Overall, our dataset, LoRA-enhanced model, and CODA metric provide a rigorous benchmark and foundation for future research on VLM-based dynamic slide generation.
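
For readers unfamiliar with Low-Rank Adaptation, the sketch below shows the core arithmetic: a frozen weight matrix W is augmented by a scaled low-rank product, W + (alpha / r) * B A. The matrix sizes and values are toy assumptions and have nothing to do with the actual Qwen-2.5-VL-7B configuration used in the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    w: d_out x d_in frozen weight; b: d_out x r; a: r x d_in.
    Only a and b would be trained; w stays fixed.
    """
    rank = len(a)
    delta = matmul(b, a)
    scale = alpha / rank
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
b = [[1.0], [0.0], [2.0]]  # 3 x 1
a = [[0.5, 0.5, 0.0]]      # 1 x 3
merged = lora_effective_weight(w, a, b, alpha=2.0)
print(merged)
```

Here the adapter has 6 trainable values against 9 in the full matrix; at realistic layer sizes the gap is far larger, which is why LoRA fine-tuning is comparatively cheap.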

--------------------------------------------------------------------------------------------------------

Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

Real-time 3D scene reconstruction from continuous image streams is essential for applications ranging from autonomous navigation to augmented reality, yet existing methods struggle with memory limitations and information loss from earlier frames. Point3R introduces an innovative explicit spatial pointer memory system that maintains 3D-aware scene understanding while processing streaming visual data. Unlike implicit memory approaches, this framework directly associates spatial locations with scene features, enabling more efficient and accurate reconstruction. Applications include autonomous vehicle navigation, robotics SLAM systems, augmented reality platforms, architectural surveying, virtual reality content creation, and any real-time computer vision system requiring persistent 3D scene understanding from continuous visual input streams.

Authors:  Yuqi Wu, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Link:  https://arxiv.org/abs/2507.02863v1

Date: 2025-07

Summary:

Dense 3D scene reconstruction from an ordered sequence or unordered image collections is a critical step when bringing research in computer vision into practical scenarios. Following the paradigm introduced by DUSt3R, which unifies an image pair densely into a shared coordinate system, subsequent methods maintain an implicit memory to achieve dense 3D reconstruction from more images. However, such implicit memory is limited in capacity and may suffer from information loss of earlier frames. We propose Point3R, an online framework targeting dense streaming 3D reconstruction. To be specific, we maintain an explicit spatial pointer memory directly associated with the 3D structure of the current scene. Each pointer in this memory is assigned a specific 3D position and aggregates scene information nearby in the global coordinate system into a changing spatial feature. Information extracted from the latest frame interacts explicitly with this pointer memory, enabling dense integration of the current observation into the global coordinate system. We design a 3D hierarchical position embedding to promote this interaction and design a simple yet effective fusion mechanism to ensure that our pointer memory is uniform and efficient. Our method achieves competitive or state-of-the-art performance on various tasks with low training costs. Code is available at: https://github.com/YkiWu/Point3R.
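
One way to picture an explicit spatial pointer memory is as a list of (3D position, feature) pairs that is fused with new observations frame by frame. The sketch below merges a new pointer into its nearest neighbor when they are close and appends it otherwise. The distance threshold and simple averaging are assumptions for illustration; Point3R's actual fusion mechanism and learned features are described in the paper and its code, not here.

```python
import math


def fuse_pointers(memory, new_pointers, radius=0.5):
    """Merge new (position, feature) pointers into an existing memory.

    A new pointer closer than `radius` to an existing one is averaged into
    it; otherwise it is appended. This keeps the memory roughly uniform in
    space instead of growing with every frame.
    """
    fused = [(list(pos), list(feat)) for pos, feat in memory]
    for pos, feat in new_pointers:
        nearest = min(fused, key=lambda m: math.dist(m[0], pos), default=None)
        if nearest is not None and math.dist(nearest[0], pos) < radius:
            nearest[0][:] = [(p + q) / 2 for p, q in zip(nearest[0], pos)]
            nearest[1][:] = [(f + g) / 2 for f, g in zip(nearest[1], feat)]
        else:
            fused.append((list(pos), list(feat)))
    return fused


memory = [((0.0, 0.0, 0.0), (1.0, 0.0))]
latest_frame = [
    ((0.2, 0.0, 0.0), (0.0, 1.0)),  # near the existing pointer: merged
    ((3.0, 0.0, 0.0), (0.5, 0.5)),  # far away: becomes a new pointer
]
memory = fuse_pointers(memory, latest_frame)
print(memory)
```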

--------------------------------------------------------------------------------------------------------

Early Signs of Steganographic Capabilities in Frontier LLMs

As large language models become more sophisticated, concerns arise about their potential to evade monitoring systems through steganographic techniques that hide information within seemingly normal text outputs. This research investigates whether current frontier LLMs can encode hidden messages or perform covert reasoning within their generated content. While current models show limited steganographic abilities under standard conditions, they demonstrate nascent capabilities when given additional affordances like unmonitored scratchpads. Applications include AI safety research, content moderation system development, security monitoring tools, adversarial testing frameworks, and policy development for AI governance, highlighting the need for robust monitoring systems as language models continue to advance in capability and deployment.

Authors:  Artur Zolkowski, Kei Nishimura-Gasparian, Robert McCarthy, Roland S. Zimmermann, David Lindner

Link:  https://arxiv.org/abs/2507.02737v1

Date: 2025-07

Summary:

Monitoring Large Language Model (LLM) outputs is crucial for mitigating risks from misuse and misalignment. However, LLMs could evade monitoring through steganography: Encoding hidden information within seemingly benign generations. In this paper, we evaluate the steganography capabilities in frontier LLMs to better understand the risk they pose. We focus on two types of steganography: passing encoded messages and performing encoded reasoning. We find that current models are unable to encode short messages in their outputs without a monitor noticing under standard affordances. They can succeed, however, if given additional affordances such as using an unmonitored scratchpad and coordinating on what encoding scheme to use. We additionally find early signs that models can perform basic encoded reasoning in a simple state-tracking problem. This includes some ability to reason with their own and pre-defined schemes, including encoding schemes such as Hexadecimal. Despite this, they can rarely hide reasoning subtly within a cover task to fool a monitor. Overall, our results indicate that current LLMs exhibit nascent steganographic capabilities. While these capabilities are likely insufficient to bypass well-designed monitors at present, this could change in the future.
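
As a small illustration of why a pre-defined scheme such as hexadecimal is easy to use but hard to hide, the sketch below encodes a short state string as hex and runs a deliberately naive monitor over the result. The monitor rule (flag any run of 8 or more hex characters) and the example sentences are hypothetical and far simpler than the monitors evaluated in the paper.

```python
import re


def encode_hex(message: str) -> str:
    """Encode a message as lowercase hexadecimal."""
    return message.encode("utf-8").hex()


def decode_hex(payload: str) -> str:
    """Recover the original message from its hexadecimal form."""
    return bytes.fromhex(payload).decode("utf-8")


def naive_monitor_flags(text: str) -> bool:
    """Flag any run of 8 or more lowercase hex characters as suspicious."""
    return re.search(r"\b[0-9a-f]{8,}\b", text) is not None


payload = encode_hex("left")
covert = f"Final state: {payload}."
plain = "The ball ends up on the left."
print(payload, decode_hex(payload))
print(naive_monitor_flags(covert), naive_monitor_flags(plain))
```

The encoded output round-trips correctly but is trivially flagged, which is consistent with the finding that models can apply such schemes yet rarely hide them subtly inside a cover task.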

--------------------------------------------------------------------------------------------------------

From Turing to Tomorrow: The UK's Approach to AI Regulation

The United Kingdom has positioned itself as a global leader in AI safety coordination while pursuing a distinctive "pro-innovation" regulatory approach that balances technological advancement with risk management. This comprehensive analysis examines the UK's evolution from early AI safety concerns to establishing the AI Safety Institute and hosting international summits, while maintaining a lighter regulatory touch compared to the EU's comprehensive legislation. The research provides policy recommendations for establishing flexible, principles-based regulation and addressing emerging challenges like AI-generated misinformation and labor market disruption. Applications include informing international AI governance frameworks, guiding national AI policy development, supporting regulatory coordination efforts, and demonstrating how democratic societies can harness AI benefits while managing associated risks.

Authors:  Oliver Ritchie, Markus Anderljung, Tom Rachman

Link:  https://arxiv.org/abs/2507.03050v1

Date: 2025-07

Summary:

The UK has pursued a distinctive path in AI regulation: less cautious than the EU but more willing to address risks than the US, and has emerged as a global leader in coordinating AI safety efforts. Impressive developments from companies like London-based DeepMind began to spark concerns in the UK about catastrophic risks from around 2012, although regulatory discussion at the time focussed on bias and discrimination. By 2022, these discussions had evolved into a "pro-innovation" strategy, in which the government directed existing regulators to take a light-touch approach, governing AI at point of use, but avoided regulating the technology or infrastructure directly. ChatGPT arrived in late 2022, galvanising concerns that this approach may be insufficient. The UK responded by establishing an AI Safety Institute to monitor risks and hosting the first international AI Safety Summit in 2023, but - unlike the EU - refrained from regulating frontier AI development in addition to its use. A new government was elected in 2024 which promised to address this gap, but at the time of writing is yet to do so.

What should the UK do next? The government faces competing objectives: harnessing AI for economic growth and better public services while mitigating risk. In light of these, we propose establishing a flexible, principles-based regulator to oversee the most advanced AI development, defensive measures against risks from AI-enabled biological design tools, and argue that more technical work is needed to understand how to respond to AI-generated misinformation. We argue for updated legal frameworks on copyright, discrimination, and AI agents, and that regulators will have a limited but important role if AI substantially disrupts labour markets.

If the UK gets AI regulation right, it could demonstrate how democratic societies can harness AI's benefits while managing its risks.

--------------------------------------------------------------------------------------------------------

Completion of the DrugMatrix Toxicogenomics Database using 3-Dimensional Tensors

Drug development faces significant challenges in predicting toxicity across different tissues and treatments, with incomplete toxicogenomics databases limiting research progress and safety assessment capabilities. This work applies advanced tensor completion techniques to fill gaps in the world's largest in-vivo toxicogenomics database, preserving the natural 3-dimensional structure of tissue-treatment-gene expression relationships. The tensor-based approach demonstrates superior accuracy compared to traditional matrix factorization methods while revealing important relationships among tissues. Applications include accelerated drug discovery pipelines, improved safety screening protocols, cross-species toxicity prediction, personalized medicine development, regulatory approval processes, and pharmaceutical research platforms that can better predict drug effects across different biological systems, ultimately reducing development costs and improving patient safety.

Authors:  Tan Nguyen, Guojing Cong

Link:  https://arxiv.org/abs/2507.03024v1

Date: 2025-07

Summary:

We explore applying a tensor completion approach to complete the DrugMatrix toxicogenomics dataset. Our hypothesis is that by preserving the 3-dimensional structure of the data, which comprises tissue, treatment, and transcriptomic measurements, and by leveraging a machine learning formulation, our approach will improve upon prior state-of-the-art results. Our results demonstrate that the new tensor-based method more accurately reflects the original data distribution and effectively captures organ-specific variability. The proposed tensor-based methodology achieved lower mean squared errors and mean absolute errors compared to both conventional Canonical Polyadic decomposition and 2-dimensional matrix factorization methods. In addition, our non-negative tensor completion implementation reveals relationships among tissues. Our findings not only complete the world's largest in-vivo toxicogenomics database with improved accuracy but also offer a promising methodology for future studies of drugs that may cross species barriers, for example, from rats to humans.
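
The basic idea of completing a 3-dimensional tensor from its observed entries can be sketched with a plain CP-style factorization fitted by stochastic gradient descent. This is closer to the conventional baseline than to the authors' method, and the synthetic rank-1 tensor, learning rate, and epoch count are assumptions chosen so the toy example converges; no DrugMatrix data is involved.

```python
import random


def cp_complete(observed, shape, rank=1, lr=0.05, epochs=2000, seed=0):
    """Fit CP factors to observed entries by stochastic gradient descent.

    observed maps (i, j, k) -> value; missing keys are the entries to fill.
    Returns a function predicting any entry as sum_r A[i][r]*B[j][r]*C[k][r].
    """
    rng = random.Random(seed)
    a, b, c = ([[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
               for n in shape)

    def predict(i, j, k):
        return sum(a[i][r] * b[j][r] * c[k][r] for r in range(rank))

    entries = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (i, j, k), value in entries:
            err = predict(i, j, k) - value
            for r in range(rank):
                # Gradients of 0.5 * err^2, computed before any update.
                grad_a = err * b[j][r] * c[k][r]
                grad_b = err * a[i][r] * c[k][r]
                grad_c = err * a[i][r] * b[j][r]
                a[i][r] -= lr * grad_a
                b[j][r] -= lr * grad_b
                c[k][r] -= lr * grad_c
    return predict


# Synthetic "tissue x treatment x gene" tensor with exact rank-1 structure.
u, v, w = [0.5, 1.0, 1.5], [1.0, 0.5, 1.2], [1.0, 0.8, 0.6]
truth = {(i, j, k): u[i] * v[j] * w[k]
         for i in range(3) for j in range(3) for k in range(3)}
hidden = {(0, 0, 0), (1, 1, 1), (2, 2, 2), (0, 1, 2), (1, 2, 0), (2, 0, 1)}
observed = {idx: val for idx, val in truth.items() if idx not in hidden}

predict = cp_complete(observed, shape=(3, 3, 3))
max_error = max(abs(predict(*idx) - truth[idx]) for idx in hidden)
print(f"largest error on held-out entries: {max_error:.4f}")
```

The held-out entries are recovered only because the factorization shares parameters across all three modes, which is the structural information a 2-dimensional matrix view of the same data would partly discard.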

--------------------------------------------------------------------------------------------------------

REDUS: Adaptive Resampling for Efficient Deep Learning in Centralized and Federated IoT Networks

Internet of Things networks face computational resource constraints when integrating deep learning capabilities with software-defined networking operations, particularly in latency-sensitive environments where both systems compete for limited processing power. REDUS addresses this challenge through intelligent data resampling that prioritizes misclassified samples while excluding redundant data, inspired by AdaBoost principles. This approach significantly reduces training time while maintaining accuracy, making it particularly valuable for federated learning scenarios on resource-constrained edge devices. Applications include smart city infrastructure, industrial IoT security systems, autonomous vehicle networks, healthcare monitoring platforms, and any distributed IoT deployment where efficient machine learning must coexist with real-time network management without compromising system responsiveness or security.

Authors:  Eyad Gad, Gad Gad, Mostafa M. Fouda, Mohamed I. Ibrahem, Muhammad Ismail, Zubair Md Fadlullah

Link:  https://arxiv.org/abs/2507.02021v1

Date: 2025-07

Summary:

With the rise of Software-Defined Networking (SDN) for managing traffic and ensuring seamless operations across interconnected devices, challenges arise when SDN controllers share infrastructure with deep learning (DL) workloads. Resource contention between DL training and SDN operations, especially in latency-sensitive IoT environments, can degrade SDN's responsiveness and compromise network performance. Federated Learning (FL) helps address some of these concerns by decentralizing DL training to edge devices, thus reducing data transmission costs and enhancing privacy. Yet, the computational demands of DL training can still interfere with SDN's performance, especially under the continuous data streams characteristic of IoT systems. To mitigate this issue, we propose REDUS (Resampling for Efficient Data Utilization in Smart-Networks), a resampling technique that optimizes DL training by prioritizing misclassified samples and excluding redundant data, inspired by AdaBoost. REDUS reduces the number of training samples per epoch, thereby conserving computational resources, reducing energy consumption, and accelerating convergence without significantly impacting accuracy. Applied within an FL setup, REDUS enhances the efficiency of model training on resource-limited edge devices while maintaining network performance. In this paper, REDUS is evaluated on the CICIoT2023 dataset for IoT attack detection, showing a training time reduction of up to 72.6% with a minimal accuracy loss of only 1.62%, offering a scalable and practical solution for intelligent networks.
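
The resampling idea (keep misclassified samples, drop redundant ones) can be sketched as a per-epoch filter over the training set. The rule used here, which treats correctly classified samples above a confidence threshold as redundant, is an assumption for illustration; the abstract does not give REDUS's exact selection criterion.

```python
def select_training_subset(probs, labels, keep_threshold=0.9):
    """Pick sample indices to train on in the next epoch.

    probs[i] is the model's predicted probability vector for sample i.
    Misclassified samples are always kept. Correctly classified samples are
    kept only if confidence in the true class is below keep_threshold;
    confident, correct samples are treated as redundant and skipped.
    """
    selected = []
    for idx, (p, y) in enumerate(zip(probs, labels)):
        predicted = max(range(len(p)), key=p.__getitem__)
        if predicted != y or p[y] < keep_threshold:
            selected.append(idx)
    return selected


probs = [
    [0.97, 0.03],  # correct and confident: treated as redundant
    [0.40, 0.60],  # misclassified (true label 0): kept
    [0.65, 0.35],  # correct but uncertain: kept
    [0.05, 0.95],  # correct and confident: treated as redundant
]
labels = [0, 0, 0, 1]
subset = select_training_subset(probs, labels)
print(subset)
```

In this toy batch half of the samples are skipped for the next epoch, which is where the reduction in per-epoch training cost would come from.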

--------------------------------------------------------------------------------------------------------

The Age of Sensorial Zero Trust: Why We Can No Longer Trust Our Senses

The emergence of sophisticated deepfake technology and AI-generated content has fundamentally changed the security landscape, making traditional human sensory verification unreliable for detecting fraud and manipulation. This research introduces the concept of Sensorial Zero Trust, extending cybersecurity principles to require systematic verification of information perceived through human senses. The framework integrates out-of-band verification, vision-language models for forensic analysis, and cryptographic provenance to create robust authentication systems. Applications include corporate security protocols, media verification systems, legal evidence authentication, educational content validation, financial fraud prevention, and any organization requiring protection against AI-generated deception, emphasizing the need for technological solutions to complement human judgment in an era of synthetic media.

Authors:  Fabio Correa Xavier

Link:  https://arxiv.org/abs/2507.00907v1

Date: 2025-07-d

Summary:

In a world where deepfakes and cloned voices are emerging as sophisticated attack vectors, organizations require a new security mindset: Sensorial Zero Trust [9]. This article presents a scientific analysis of the need to systematically doubt information perceived through the senses, establishing rigorous verification protocols to mitigate the risks of fraud based on generative artificial intelligence. Key concepts, such as Out-of-Band verification, Vision-Language Models (VLMs) as forensic collaborators, cryptographic provenance, and human training, are integrated into a framework that extends Zero Trust principles to human sensory information. The approach is grounded in empirical findings and academic research, emphasizing that in an era of AI-generated realities, even our eyes and ears can no longer be implicitly trusted without verification. Leaders are called to foster a culture of methodological skepticism to protect organizational integrity in this new threat landscape.

--------------------------------------------------------------------------------------------------------

CARTS: Collaborative Agents for Recommendation Textual Summarization

E-commerce and content recommendation systems increasingly require sophisticated textual summarization to create engaging, concise descriptions for grouped items and product carousels, yet traditional approaches struggle with the specific constraints of recommendation contexts. CARTS introduces a multi-agent framework that decomposes summarization into generation, refinement, and arbitration stages, each handled by specialized agents that consider relevance, length constraints, and user engagement factors. The system significantly outperforms single-pass approaches in both offline metrics and live A/B testing. Applications include e-commerce product recommendations, content streaming platforms, news aggregation systems, social media feeds, advertising platforms, and any recommendation system requiring compelling, contextually relevant textual summaries that drive user engagement and conversion.

Authors:  Jiao Chen, Kehui Yao, Reza Yousefi Maragheh, Kai Zhao, Jianpeng Xu, Jason Cho, Evren Korpeoglu, Sushant Kumar, Kannan Achan

Link:  https://arxiv.org/abs/2506.17765v2

Date: 2025-07-d

Summary:

Current recommendation systems often require some form of textual data summarization, such as generating concise and coherent titles for product carousels or other grouped item displays. While large language models have shown promise in NLP domains for textual summarization, these approaches do not directly apply to recommendation systems, where explanations must be highly relevant to the core features of item sets and adhere to strict word limit constraints. In this paper, we propose CARTS (Collaborative Agents for Recommendation Textual Summarization), a multi-agent LLM framework designed for structured summarization in recommendation systems. CARTS decomposes the task into three stages: Generation Augmented Generation (GAG), refinement circle, and arbitration, where successive agent roles are responsible for extracting salient item features, iteratively refining candidate titles based on relevance and length feedback, and selecting the final title through a collaborative arbitration process. Experiments on large-scale e-commerce data and live A/B testing show that CARTS significantly outperforms single-pass and chain-of-thought LLM baselines, delivering higher title relevance and improved user engagement metrics.

--------------------------------------------------------------------------------------------------------

Program of Equations Thoughts to Solve Algebra Word Problems

Algebraic word problem solving represents a challenging intersection of natural language understanding and mathematical reasoning, where large language models often fail due to computational errors that accumulate through step-by-step calculations. Program of Equations Thoughts (POET) transforms this challenge by separating equation prediction from computational execution, leveraging Python interpreters to handle complex calculations while LLMs focus on reasoning and problem decomposition. The approach achieves state-of-the-art results with over 95% accuracy on standard datasets. Applications include automated tutoring systems, educational assessment platforms, homework assistance tools, mathematical learning applications, and any educational technology requiring robust algebraic problem-solving capabilities that can provide reliable, step-by-step solutions for students and educators.

Authors:  Yunze Lin

Link:  https://arxiv.org/abs/2505.20170v2

Date: 2025-07-d

Summary:

Solving algebraic word problems (AWPs) has recently emerged as an important natural language processing task. Recently, large language models (LLMs) have demonstrated powerful mathematical capabilities, and the Chain-of-Thought technique, which guides LLMs through step-by-step reasoning, has yielded impressive results. However, this reasoning ability is limited by the computational weaknesses of LLMs themselves, where calculation errors can accumulate, leading to incorrect final answers. To address this, we propose Program of Equations Thoughts (POET), which transforms the task of generating step-by-step reasoning answers into a two-stage task of predicting equations and generating code, offloading complex computations to a Python interpreter to avoid calculation errors in LLMs. Furthermore, we propose Zero-shot POET, which utilizes a manually designed template to enable LLMs to directly generate Python code for one-step solving. Our method achieves accuracies of 95.3% and 98.0% on the PEN and ALG514 datasets, respectively, setting a new state-of-the-art (SOTA). Zero-shot POET also achieves the SOTA result of 95.5% on the DRAW-1K dataset.
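
The two-stage split described above (predict equations, then let an interpreter do the arithmetic) can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: the word problem is invented, the equations are written by hand where an LLM would predict them, and `solve_linear_system` is a hypothetical helper that handles only linear systems, using exact fractions so no rounding error can creep in.

```python
from fractions import Fraction

def solve_linear_system(coefficients, constants):
    """Solve A x = b exactly with Gauss-Jordan elimination."""
    n = len(constants)
    # Build the augmented matrix [A | b] with exact rational entries.
    rows = [[Fraction(v) for v in row] + [Fraction(b)]
            for row, b in zip(coefficients, constants)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("system has no unique solution")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


# Stage 1 (stand-in for the LLM): translate the problem into equations.
#   "A farm has 10 animals, all chickens or cows, with 28 legs in total."
#   x + y = 10
#   2x + 4y = 28
# Stage 2: the interpreter, not the language model, does the computation.
chickens, cows = solve_linear_system([[1, 1], [2, 4]], [10, 28])
print(chickens, cows)  # prints "6 4"
```

The design point is the division of labor: the model only has to get the equations right, and every numeric step after that is deterministic.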

--------------------------------------------------------------------------------------------------------

Ground to Dust: Collisional Cascades and the Fate of Kardashev II Megaswarms

The search for extraterrestrial intelligence includes looking for megastructures that advanced civilizations might build around stars to harness energy, yet the long-term stability of such constructions remains poorly understood. This research examines the physics of collisional cascades in hypothetical megaswarms, vast satellite constellations that could serve as powerful technosignatures. The analysis reveals that these structures are inherently unstable on cosmic timescales without active maintenance, with collisional cascades developing rapidly once guidance systems fail. Applications include SETI search strategy optimization, technosignature detection algorithms, space archaeology methodologies, astrobiology research frameworks, and space policy development for humanity's own potential megastructure projects, providing insights into both the detection of alien civilizations and the engineering challenges of large-scale space infrastructure.

Authors:  Brian C. Lacki

Link:  https://arxiv.org/abs/2504.21151v2

Date: 2025-07-d

Summary:

Extraterrestrial intelligences are speculated to surround stars with structures to collect their energy or to signal distant observers. If they exist, these most likely are megaswarms, vast constellations of satellites (elements) in orbit around the hosts. Although long-lived megaswarms are extremely powerful technosignatures, they are liable to be subject to collisional cascades once guidance systems start failing. The collisional time is roughly an orbital period divided by the covering fraction of the swarm. Structuring the swarm orbits does not prolong the initial collisional time as long as there is enough randomness to ensure collisions, although it can reduce collision velocities. I further show that once the collisional cascade begins, it can develop extremely rapidly for hypervelocity collisions. Companion stars or planets in the stellar system induce perturbations through the Lidov-Kozai effect among others, which can result in orbits crossing within some millions of years. Radiative perturbations, including the Yarkovsky effect, also can destabilize swarms. Most megaswarms are thus likely to be short-lived on cosmic timescales without active upkeep. I discuss possible mitigation strategies and implications for megastructure searches.
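
The scaling quoted in the summary, a collisional time of roughly one orbital period divided by the swarm's covering fraction, is easy to put numbers on. The sketch below is an order-of-magnitude illustration only; the one-year orbital period and the covering fractions are assumed values chosen for the example, not figures from the paper.

```python
def collisional_time(orbital_period_years, covering_fraction):
    """Rough collisional time: orbital period / covering fraction."""
    if not 0 < covering_fraction <= 1:
        raise ValueError("covering fraction must be in (0, 1]")
    return orbital_period_years / covering_fraction


# Assumed example: elements on roughly one-year orbits.
orbital_period = 1.0
for covering_fraction in (1e-1, 1e-3, 1e-6):
    t = collisional_time(orbital_period, covering_fraction)
    print(f"covering fraction {covering_fraction:g}: ~{t:g} years")
```

Under these assumed inputs, even a swarm that blocks only a millionth of the starlight reaches its collisional time in about a million years, which is short on cosmic timescales and consistent with the summary's conclusion that unmaintained swarms do not last.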

--------------------------------------------------------------------------------------------------------

PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification

Remote sensing applications increasingly require fusion of multiple imaging modalities to achieve comprehensive land cover classification, yet combining Synthetic Aperture Radar (SAR) and RGB imagery remains challenging due to their fundamentally different physical properties and information encoding. Phase-Amplitude Decoupling (PAD) addresses this through frequency-domain separation that preserves shared structural information while maintaining complementary characteristics specific to each modality. The approach achieves state-of-the-art performance by explicitly handling the distinct physical properties encoded in frequency spectra. Applications include agricultural monitoring, urban planning, environmental conservation, disaster response, climate change research, and any earth observation system requiring accurate, automated land cover analysis from multi-modal satellite or aerial imagery for improved decision-making and resource management.

Authors:  Huiling Zheng, Xian Zhong, Bin Liu, Yi Xiao, Bihan Wen, Xiaofeng Li

Link:  https://arxiv.org/abs/2504.19136v2

Date: 2025-07-d

Summary:

The fusion of Synthetic Aperture Radar (SAR) and RGB imagery for land cover classification remains challenging due to modality heterogeneity and underutilized spectral complementarity. Existing methods often fail to decouple shared structural features from modality-complementary radiometric attributes, causing feature conflicts and information loss. To address this, we propose Phase-Amplitude Decoupling (PAD), a frequency-aware framework that separates phase (modality-shared) and amplitude (modality-complementary) components in the Fourier domain, thus reinforcing shared structures while preserving complementary characteristics to improve fusion quality. Unlike prior approaches that overlook the distinct physical properties encoded in frequency spectra, PAD is the first to introduce explicit amplitude-phase decoupling for multi-modal fusion. Specifically, PAD comprises two key components: 1) Phase Spectrum Correction (PSC), which aligns cross-modal phase features via convolution-guided scaling to enhance geometric consistency; and 2) Amplitude Spectrum Fusion (ASF), which dynamically integrates high-frequency and low-frequency patterns using frequency-adaptive multilayer perceptrons, leveraging SAR's morphological sensitivity and RGB's spectral richness. Extensive experiments on WHU-OPT-SAR and DDHR-SK datasets demonstrate state-of-the-art performance. Our work establishes a new paradigm for physics-aware multi-modal fusion in remote sensing. The code will be available at https://github.com/RanFeng2/PAD.
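
The basic operation behind the summary, splitting a signal's Fourier spectrum into amplitude and phase and then recombining them, can be shown in a few lines. This is only a sketch of the decomposition itself, under loud assumptions: it uses a pure-Python 1D DFT on toy signals to stay dependency-free (real imagery would use a 2D FFT from a numerical library), the plain average of amplitudes is a stand-in for the learned ASF module, and reusing one modality's phase is a stand-in for PSC. None of the function names come from the PAD code.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def decouple(signal):
    """Split a signal's spectrum into (amplitude, phase) lists."""
    spectrum = dft(signal)
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]

def recombine(amplitude, phase):
    """Rebuild a real signal from amplitude and phase spectra."""
    spectrum = [cmath.rect(a, p) for a, p in zip(amplitude, phase)]
    return [value.real for value in idft(spectrum)]


# Toy 1D stand-ins for one row of a SAR image and one row of an RGB band.
sar_row = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0]
rgb_row = [0.0, 2.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0]

sar_amp, sar_phase = decouple(sar_row)
rgb_amp, rgb_phase = decouple(rgb_row)

# Assumed fusion rule: average the amplitudes, keep one phase spectrum.
fused_amp = [(a + b) / 2 for a, b in zip(sar_amp, rgb_amp)]
fused_row = recombine(fused_amp, rgb_phase)
print([round(v, 3) for v in fused_row])
```

Phase carries where structures sit in the signal, while amplitude carries how strongly each frequency is present, which is why the summary treats phase as modality-shared and amplitude as modality-complementary.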

--------------------------------------------------------------------------------------------------------

Real-is-Sim: Bridging the Sim-to-Real Gap with a Dynamic Digital Twin

Robotics applications face the persistent challenge of transferring policies trained in simulation to real-world hardware, often requiring complex adaptation procedures that may compromise performance or safety. Real-is-Sim introduces a novel paradigm where policies always operate on a simulated robot while the real hardware simply follows the simulation through continuous 60Hz synchronization. This approach shifts the sim-to-real gap from the policy domain to the digital twin's synchronization mechanisms, enabling seamless switching between virtual and physical execution. Applications include manufacturing automation, service robotics, autonomous vehicles, medical robotics, and any robotic system requiring safe policy development and deployment with the flexibility to test extensively in simulation while maintaining real-world performance guarantees.

Authors:  Jad Abou-Chakra, Lingfeng Sun, Krishan Rana, Brandon May, Karl Schmeckpeper, Niko Suenderhauf, Maria Vittoria Minniti, Laura Herlant

Link:  https://arxiv.org/abs/2504.03597v2

Date: 2025-07-d

Summary:

We introduce real-is-sim, a new approach to integrating simulation into behavior cloning pipelines. In contrast to real-only methods, which lack the ability to safely test policies before deployment, and sim-to-real methods, which require complex adaptation to cross the sim-to-real gap, our framework allows policies to seamlessly switch between running on real hardware and running in parallelized virtual environments. At the center of real-is-sim is a dynamic digital twin, powered by the Embodied Gaussian simulator, that synchronizes with the real world at 60Hz. This twin acts as a mediator between the behavior cloning policy and the real robot. Policies are trained using representations derived from simulator states and always act on the simulated robot, never the real one. During deployment, the real robot simply follows the simulated robot's joint states, and the simulation is continuously corrected with real world measurements. This setup, where the simulator drives all policy execution and maintains real-time synchronization with the physical world, shifts the responsibility of crossing the sim-to-real gap to the digital twin's synchronization mechanisms, instead of the policy itself. We demonstrate real-is-sim on a long-horizon manipulation task (PushT), showing that virtual evaluations are consistent with real-world results. We further show how real-world data can be augmented with virtual rollouts and compare to policies trained on different representations derived from the simulator state including object poses and rendered images from both static and robot-mounted cameras. Our results highlight the flexibility of the real-is-sim framework across training, evaluation, and deployment stages. Videos available at https://real-is-sim.github.io.
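
The control flow described above (the policy acts only on the simulated robot, the real robot follows the simulated joint states, and measurements correct the simulation) can be sketched with a one-joint toy model. Everything numeric here is an assumption for illustration: the proportional policy, the tracking lag, and the correction gain are invented, and each loop iteration merely stands in for one 60Hz tick. The actual system uses the Embodied Gaussian simulator, which this sketch does not model.

```python
def policy(sim_position, goal):
    """Toy proportional policy. It only ever sees simulator state."""
    return 0.5 * (goal - sim_position)

def run_twin_loop(goal, steps, tracking_gain=0.8, correction_gain=0.3):
    """Run the twin loop for a single joint and return (sim, real) positions."""
    sim_position = 0.0
    real_position = 0.0
    for _ in range(steps):
        # 1. The policy acts on the simulated robot, never the real one.
        sim_position += policy(sim_position, goal)
        # 2. The real robot follows the simulated joint state (imperfectly).
        real_position += tracking_gain * (sim_position - real_position)
        # 3. The simulation is corrected with the real-world measurement.
        sim_position += correction_gain * (real_position - sim_position)
    return sim_position, real_position


sim, real = run_twin_loop(goal=1.0, steps=60)  # 60 ticks, about 1 s at 60Hz
print(f"sim={sim:.6f} real={real:.6f}")
```

Because the policy never touches hardware directly, the same `policy` function could be evaluated against the simulator alone, which is the switching between virtual and real execution that the summary highlights.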

--------------------------------------------------------------------------------------------------------

SAGE: Steering Dialog Generation with Future-Aware State-Action Augmentation

Building emotionally intelligent chatbots that can engage in natural, strategic conversations remains a significant challenge despite advances in large language models, particularly for applications requiring long-term conversational planning and emotional awareness. SAGE introduces latent variables that encapsulate emotional states and conversational strategies, enabling coarse-grained control over dialogue progression while maintaining natural interaction patterns. The framework includes self-improvement mechanisms through dialogue tree search and reward modeling. Applications include customer service automation, therapeutic chatbots, educational tutoring systems, virtual companions, negotiation training platforms, and any conversational AI system requiring sophisticated emotional intelligence and strategic conversation management to achieve specific interaction goals while maintaining user engagement and satisfaction.

Authors:  Yizhe Zhang, Navdeep Jaitly

Link:  https://arxiv.org/abs/2503.03040v2

Date: 2025-07-d

Summary:

Recent advances in large language models have demonstrated impressive capabilities in task-oriented applications, yet building emotionally intelligent chatbots that can engage in natural, strategic conversations remains a challenge. We present a novel approach called SAGE that uses latent variables to control long-horizon behavior in dialogue generation. At the core of our method is the State-Action Chain (SAC), which augments standard language model fine-tuning by introducing latent variables that encapsulate emotional states and conversational strategies between dialogue turns. During inference, these variables are generated before each response, enabling coarse-grained control over dialogue progression while maintaining natural interaction patterns. We also introduce a self-improvement pipeline that leverages dialogue tree search, LLM-based reward modeling, and targeted fine-tuning to optimize conversational trajectories. Our experimental results show that models trained with this approach demonstrate improved performance in emotional intelligence metrics while maintaining strong capabilities on LLM benchmarks. The discrete nature of our latent variables facilitates search-based strategies and provides a foundation for future applications of reinforcement learning to dialogue systems, where learning can occur at the state level rather than the token level. Repository: https://github.com/apple/ml-sage-dialog-gen

--------------------------------------------------------------------------------------------------------

