Week Ending 12.7.2025
RESEARCH WATCH: 12.7.2025
Neural Coherence: Finding higher performance on out-of-distribution tasks from few samples
Fine-tuning pre-trained vision models for downstream tasks requires selecting the optimal checkpoint, a challenge that becomes critical when target data is scarce and out-of-distribution. This paper introduces Neural Coherence, a data-efficient model selection approach that analyzes activation statistics across source and target domains, enabling reliable selection from just a few unlabeled examples. Tested on ImageNet-to-Food101/PlantNet/iNaturalist transfers and meta-learning scenarios, it significantly outperforms baselines. The method's versatility extends to training data selection. Applications include medical imaging with limited labeled data, autonomous systems adapting to new environments, and wildlife classification where extensive labeling is impractical—essentially any domain requiring robust model deployment with minimal target examples.
Authors: Simon Guiroy, Mats Richter, Sarath Chandar, Christopher Pal
Link: https://arxiv.org/abs/2512.05880v1
Date: 2025-12-d
Summary:
To create state-of-the-art models for many downstream tasks, it has become common practice to fine-tune a pre-trained large vision model. However, it remains an open question of how to best determine which of the many possible model checkpoints resulting from a large training run to use as the starting point. This becomes especially important when data for the target task of interest is scarce, unlabeled and out-of-distribution. In such scenarios, common methods relying on in-distribution validation data become unreliable or inapplicable. This work proposes a novel approach for model selection that operates reliably on just a few unlabeled examples from the target task. Our approach is based on a novel concept: Neural Coherence, which entails characterizing a model's activation statistics for source and target domains, allowing one to define model selection methods with high data-efficiency. We provide experiments where models are pre-trained on ImageNet1K and examine target domains consisting of Food-101, PlantNet-300K and iNaturalist. We also evaluate it in many meta-learning settings. Our approach significantly improves generalization across these different target domains compared to established baselines. We further demonstrate the versatility of Neural Coherence as a powerful principle by showing its effectiveness in training data selection.
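A minimal sketch of the checkpoint-selection loop described above, assuming PyTorch models and user-supplied source/target batches. The scoring rule here (agreement of activation means and standard deviations across domains) is a simplified stand-in for the paper's Neural Coherence criterion, and all function names are illustrative:

    # Illustrative sketch (not the authors' code): rank checkpoints by how
    # consistently their activation statistics agree on source vs. target data.
    import torch

    @torch.no_grad()
    def activation_stats(model, batch, layer):
        """Per-feature mean/std of activations at a chosen layer for one batch."""
        feats = []
        handle = layer.register_forward_hook(lambda m, i, o: feats.append(o.detach()))
        model(batch)
        handle.remove()
        acts = feats[0].flatten(1)               # (batch, features)
        return acts.mean(0), acts.std(0)

    @torch.no_grad()
    def coherence_score(model, layer, source_batch, target_batch):
        """Higher when source and target activation statistics agree (assumed proxy)."""
        mu_s, sd_s = activation_stats(model, source_batch, layer)
        mu_t, sd_t = activation_stats(model, target_batch, layer)
        return -((mu_s - mu_t).abs().mean() + (sd_s - sd_t).abs().mean()).item()

    def select_checkpoint(checkpoints, build_model, layer_getter, source_batch, target_batch):
        """Pick the checkpoint whose activations look most coherent across domains."""
        scores = {}
        for path in checkpoints:
            model = build_model(path).eval()
            scores[path] = coherence_score(model, layer_getter(model), source_batch, target_batch)
        return max(scores, key=scores.get), scores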
--------------------------------------------------------------------------------------------------------
Internal Deployment in the EU AI Act
The EU AI Act's scope regarding internal AI deployment remains legally ambiguous, creating uncertainty for organizations developing AI systems for in-house use. This memorandum analyzes interpretative pathways for Articles 2(1), 2(6), and 2(8), examining arguments for including internal deployment within regulatory scope alongside possible exceptions, particularly the complex scientific R&D exemption. By synthesizing these provisions with related definitions and recitals, it offers multiple interpretative frameworks for the European Commission, AI providers, deployers, and legal practitioners. Applications include corporate compliance strategy development, legal risk assessment for internal AI projects, policy guidance for regulators, and academic analysis of AI governance. This work provides essential clarity for organizations navigating regulatory obligations in European jurisdictions.
Authors: Matteo Pistillo
Link: https://arxiv.org/abs/2512.05742v1
Date: 2025-12-d
Summary:
This memorandum analyzes and stress-tests arguments in favor and against the inclusion of internal deployment within the scope of the European Union Artificial Intelligence Act (EU AI Act). In doing so, it aims to offer several possible interpretative pathways to the European Commission, AI providers and deployers, and the legal and policy community at large based on Articles 2(1), 2(6), 2(8) of the EU AI Act. Specifically, this memorandum first analyzes four interpretative pathways based on Article 2(1)(a)-(c) supporting the application of the EU AI Act to internally deployed AI models and systems. Then, it examines possible objections and exceptions based on Articles 2(1)(a), 2(6), and 2(8), with particular attention to the complexity of the scientific R&D exception under Article 2(6). Finally, it illustrates how Articles 2(1), 2(6), and 2(8) can be viewed as complementary to each other, once broken down to their most plausible meaning and interpreted in conjunction with Articles 3(1), 3(3), 3(4), 3(9), 3(10), 3(11), 3(12), 3(63), and Recitals 12, 13, 21, 25, 97, and 109.
--------------------------------------------------------------------------------------------------------
Ontology Learning with LLMs: A Benchmark Study on Axiom Identification
Ontologies structure domain knowledge but require extensive expertise to develop. This paper addresses axiom identification—defining logical relations between classes and properties—using Large Language Models. Introducing OntoAxiom, a benchmark of nine ontologies with 17,118 triples and 2,771 axioms, the study evaluates twelve LLMs across prompting strategies. The Axiom-by-Axiom approach outperforms direct querying, though performance varies by axiom type and domain (FOAF: 0.642 vs. music: 0.218 F1 scores). Larger models excel, but smaller ones remain viable for resource-constrained settings. While not fully automating ontology development, LLMs provide valuable candidate axioms. Applications include accelerating knowledge graph construction, supporting domain experts in biomedical ontology development, enterprise knowledge management systems, and semantic web applications.
Authors: Roos M. Bakker, Daan L. Di Scala, Maaike H. T. de Boer, Stephan A. Raaijmakers
Link: https://arxiv.org/abs/2512.05594v1
Date: 2025-12-d
Summary:
Ontologies are an important tool for structuring domain knowledge, but their development is a complex task that requires significant modelling and domain expertise. Ontology learning, aimed at automating this process, has seen advancements in the past decade with the improvement of Natural Language Processing techniques, and especially with the recent growth of Large Language Models (LLMs). This paper investigates the challenge of identifying axioms: fundamental ontology components that define logical relations between classes and properties. In this work, we introduce an Ontology Axiom Benchmark, OntoAxiom, and systematically test LLMs on that benchmark for axiom identification, evaluating different prompting strategies, ontologies, and axiom types. The benchmark consists of nine medium-sized ontologies comprising 17,118 triples and 2,771 axioms in total. We focus on subclass, disjoint, subproperty, domain, and range axioms. To evaluate LLM performance, we compare twelve LLMs with three shot settings and two prompting strategies: a Direct approach where we query all axioms at once, versus an Axiom-by-Axiom (AbA) approach, where each prompt queries for one axiom only. Our findings show that AbA prompting leads to higher F1 scores than the direct approach. However, performance varies across axioms, suggesting that certain axioms are more challenging to identify. The domain also influences performance: the FOAF ontology achieves a score of 0.642 for the subclass axiom, while the music ontology reaches only 0.218. Larger LLMs outperform smaller ones, but smaller models may still be viable for resource-constrained settings. Although performance overall is not high enough to fully automate axiom identification, LLMs can provide valuable candidate axioms to support ontology engineers with the development and refinement of ontologies.
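To make the Direct vs. Axiom-by-Axiom distinction concrete, here is a rough sketch of an AbA loop: one prompt per candidate axiom, with the answer parsed into a yes/no decision. The prompt wording and the ask_llm() helper are placeholders, not the benchmark's actual prompts:

    # Illustrative Axiom-by-Axiom (AbA) prompting loop over candidate class pairs.
    from itertools import permutations

    AXIOM_TEMPLATES = {
        "subclass": "In the {onto} ontology, is every {a} also a {b}? Answer yes or no.",
        "disjoint": "In the {onto} ontology, can something be both a {a} and a {b}? Answer yes or no.",
    }

    def identify_axioms(onto_name, classes, ask_llm):
        """Query one candidate axiom per prompt and collect positive answers."""
        found = []
        for a, b in permutations(classes, 2):
            for axiom, template in AXIOM_TEMPLATES.items():
                prompt = template.format(onto=onto_name, a=a, b=b)
                answer = ask_llm(prompt).strip().lower()
                # "yes" confirms a subclass axiom; "no" to overlap implies disjointness.
                positive = answer.startswith("yes") if axiom == "subclass" else answer.startswith("no")
                if positive:
                    found.append((axiom, a, b))
        return found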
--------------------------------------------------------------------------------------------------------
Improving Local Fidelity Through Sampling and Modeling Nonlinearity
Black-box machine learning models in high-stakes domains require interpretable explanations. LIME, a popular explanation technique, assumes linear local decision boundaries, failing to capture nonlinear relationships and producing incorrect explanations. This paper proposes using Multivariate Adaptive Regression Splines (MARS) to model nonlinear boundaries, combined with N-ball sampling that directly samples from desired distributions rather than reweighting. Evaluated across three UCI datasets with various classifiers and kernel widths, the method achieves 37% average RMSE reduction, significantly improving local fidelity. Applications include healthcare diagnosis explanation, financial credit decisioning transparency, autonomous vehicle safety auditing, and regulatory compliance where model interpretability is mandated—any domain requiring trustworthy explanations of complex model predictions for stakeholder understanding and accountability.
Authors: Sanjeev Shrestha, Rahul Dubey, Hui Liu
Link: https://arxiv.org/abs/2512.05556v1
Date: 2025-12-d
Summary:
With the increasing complexity of black-box machine learning models and their adoption in high-stakes areas, it is critical to provide explanations for their predictions. Local Interpretable Model-agnostic Explanation (LIME) is a widely used technique that explains the prediction of any classifier by learning an interpretable model locally around the predicted instance. However, it assumes that the local decision boundary is linear and fails to capture non-linear relationships, leading to incorrect explanations. In this paper, we propose a novel method that can generate high-fidelity explanations. Multivariate adaptive regression splines (MARS) are used to model non-linear local boundaries, effectively capturing the underlying behavior of the reference model and thereby enhancing the local fidelity of the explanation. Additionally, we utilize the N-ball sampling technique, which samples directly from the desired distribution instead of reweighting samples as done in LIME, further improving the faithfulness score. We evaluate our method on three UCI datasets across different classifiers and varying kernel widths. Experimental results show that our method yields more faithful explanations compared to baselines, achieving an average reduction of 37% in root mean square error, significantly improving local fidelity.
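A compact sketch of the two ingredients highlighted above: drawing perturbations uniformly from an N-ball around the instance (instead of reweighting), and fitting a nonlinear local surrogate. It assumes the py-earth package as the MARS implementation; any spline-based regressor could stand in:

    # Sketch: uniform N-ball sampling plus a nonlinear local surrogate.
    import numpy as np

    def sample_n_ball(center, radius, n_samples, rng=None):
        """Uniform samples from the d-dimensional ball of given radius around `center`."""
        rng = np.random.default_rng(rng)
        d = len(center)
        directions = rng.normal(size=(n_samples, d))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        radii = radius * rng.random(n_samples) ** (1.0 / d)   # volume-correct radial draw
        return center + directions * radii[:, None]

    def explain_instance(black_box_predict, instance, radius=1.0, n_samples=2000):
        """Fit a nonlinear surrogate to the black box inside an N-ball neighbourhood."""
        from pyearth import Earth                      # MARS implementation (assumed available)
        X_local = sample_n_ball(np.asarray(instance, float), radius, n_samples)
        y_local = black_box_predict(X_local)           # e.g. predicted class probabilities
        surrogate = Earth(max_degree=2).fit(X_local, y_local)
        return surrogate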
--------------------------------------------------------------------------------------------------------
RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs
Best-of-n sampling improves LLM inference quality but traditionally relies on single models, ignoring complementary strengths across different LLMs. RoBoN (Routed Online Best-of-n) sequentially routes generation across multiple models based on reward model scores and response agreement signals. This training-free approach maintains compute parity with single-model best-of-n while leveraging model diversity. Across reasoning benchmarks (MATH500, OlympiadBench, MinervaMath, GSM8K, MMLU), RoBoN achieves up to 3.4% absolute accuracy gains over individual models and uniform portfolios. Applications include educational tutoring systems requiring diverse problem-solving approaches, enterprise decision support benefiting from complementary model expertise, research environments with multiple available models, and cost-optimized inference pipelines balancing performance with computational budgets.
Authors: Jonathan Geuter, Gregor Kornhardt
Link: https://arxiv.org/abs/2512.05542v1
Date: 2025-12-d
Summary:
Best-of-$n$ is a widely used test-time scaling approach for LLM inference. Yet despite evidence that LLMs exhibit complementary strengths across tasks, traditionally best-of-$n$ relies on a single model to generate responses. We propose RoBoN (Routed Online Best-of-$n$), a sequential multi-LLM alternative to the prevailing single-model best-of-$n$. Given a suite of models $\{m_i\}_{i=1}^M$, RoBoN sequentially routes generations one-by-one across models, based on scores computed using a reward model and an agreement signal on the predicted responses. This online routing requires no additional training, keeps compute parity, and works with any plug-in reward model. Across reasoning benchmarks (MATH500, OlympiadBench, MinervaMath, GSM8K, MMLU), RoBoN consistently outperforms standard best-of-$n$ applied to each individual model for larger $n$, with gains of up to 3.4\% in absolute accuracy, and also improves over a uniform multi-model portfolio baseline. Our results indicate that diversity across models can be exploited at inference to improve best-of-$n$ performance over any constituent model alone, providing a simple, training-free path to test-time scaling with multiple LLMs.
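A hedged sketch of sequential multi-model best-of-n under compute parity: n generations total, routed one at a time. The routing heuristic below (prefer unexplored models, then the highest mean reward, and break final ties by answer agreement) is a simplification of RoBoN's rule; generate(), reward(), and extract_answer() are user-supplied:

    # Simplified routed best-of-n: route each of n generations to one model.
    from collections import Counter, defaultdict

    def routed_best_of_n(prompt, models, generate, reward, extract_answer, n=8):
        history = defaultdict(list)          # model -> reward scores seen so far
        candidates = []                      # (reward score, response)
        answers = Counter()                  # agreement signal over final answers
        for _ in range(n):
            # Route: unexplored models first, then highest mean reward so far.
            model = max(models, key=lambda m: (not history[m],
                        sum(history[m]) / len(history[m]) if history[m] else 0.0))
            response = generate(model, prompt)
            score = reward(prompt, response)
            history[model].append(score)
            answers[extract_answer(response)] += 1
            candidates.append((score, response))
        # Prefer high-reward responses whose answer agrees with the majority.
        best = max(candidates, key=lambda c: (answers[extract_answer(c[1])], c[0]))
        return best[1]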
--------------------------------------------------------------------------------------------------------
Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction
Predicting music popularity benefits artists, producers, and streaming platforms, yet lyrics remain under-explored despite their cultural significance. This paper introduces an automated pipeline using LLMs to extract high-dimensional lyric embeddings capturing semantic, syntactic, and sequential information. These features integrate into HitMusicLyricNet, a multimodal architecture combining audio, lyrics, and social metadata for popularity score prediction (0-100 range). Evaluated on SpotGenTrack's 100,000+ tracks, the method achieves 9% and 20% improvements in MAE and MSE respectively. Ablation studies confirm LyricsAENet's value in providing dense lyric representations. Applications include A&R talent scouting, playlist curation optimization, marketing campaign planning, songwriter feedback tools, and streaming platform recommendation systems requiring nuanced understanding of lyrical content's commercial appeal.
Authors: Yash Choudhary, Preeti Rao, Pushpak Bhattacharyya
Link: https://arxiv.org/abs/2512.05508v1
Date: 2025-12-d
Summary:
Accurately predicting music popularity is a critical challenge in the music industry, offering benefits to artists, producers, and streaming platforms. Prior research has largely focused on audio features, social metadata, or model architectures. This work addresses the under-explored role of lyrics in predicting popularity. We present an automated pipeline that uses LLMs to extract high-dimensional lyric embeddings, capturing semantic, syntactic, and sequential information. These features are integrated into HitMusicLyricNet, a multimodal architecture that combines audio, lyrics, and social metadata for popularity score prediction in the range 0-100. Our method outperforms existing baselines on the SpotGenTrack dataset, which contains over 100,000 tracks, achieving 9% and 20% improvements in MAE and MSE, respectively. Ablation confirms that gains arise from our LLM-driven lyrics feature pipeline (LyricsAENet), underscoring the value of dense lyric representations.
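As an illustration of the late-fusion idea (not the paper's actual architecture), a small PyTorch module that concatenates lyric embeddings, audio features, and social metadata and regresses a popularity score bounded to 0-100; all layer sizes are placeholders:

    # Toy late-fusion regressor in the spirit of the multimodal setup above.
    import torch
    import torch.nn as nn

    class PopularityFusionNet(nn.Module):
        def __init__(self, lyric_dim=768, audio_dim=128, meta_dim=16, hidden=256):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(lyric_dim + audio_dim + meta_dim, hidden),
                nn.ReLU(),
                nn.Dropout(0.2),
                nn.Linear(hidden, hidden // 2),
                nn.ReLU(),
                nn.Linear(hidden // 2, 1),
            )

        def forward(self, lyrics, audio, meta):
            x = torch.cat([lyrics, audio, meta], dim=-1)
            return 100.0 * torch.sigmoid(self.encoder(x)).squeeze(-1)   # score in [0, 100]

    model = PopularityFusionNet()
    score = model(torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 16))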
--------------------------------------------------------------------------------------------------------
Smart Timing for Mining: A Deep Learning Framework for Bitcoin Hardware ROI Prediction
Bitcoin mining hardware acquisition is risky due to market volatility, technological obsolescence, and protocol-driven revenue cycles. Despite mining's capital-intensive nature, no computational frameworks address optimal purchase timing. This paper formulates hardware acquisition as time series classification, predicting whether ASIC purchases yield profitable (ROI≥1), marginal (0<ROI<1), or unprofitable (ROI≤0) returns within one year. MineROI-Net, an open-source Transformer architecture, captures multi-scale temporal patterns across 20 ASIC miners (2015-2024). It achieves 83.7% accuracy, 93.6% precision detecting unprofitable periods, and 98.5% for profitable ones, outperforming LSTM and TSLANet baselines. Applications include mining operation capital planning, hardware vendor market timing, investment portfolio risk management, and energy contract negotiations—providing data-driven guidance for capital-intensive cryptocurrency mining decisions.
Authors: Sithumi Wickramasinghe, Bikramjit Das, Dorien Herremans
Link: https://arxiv.org/abs/2512.05402v1
Date: 2025-12-d
Summary:
Bitcoin mining hardware acquisition requires strategic timing due to volatile markets, rapid technological obsolescence, and protocol-driven revenue cycles. Despite mining's evolution into a capital-intensive industry, there is little guidance on when to purchase new Application-Specific Integrated Circuit (ASIC) hardware, and no prior computational frameworks address this decision problem. We address this gap by formulating hardware acquisition as a time series classification task, predicting whether purchasing ASIC machines yields profitable (Return on Investment (ROI) >= 1), marginal (0 < ROI < 1), or unprofitable (ROI <= 0) returns within one year. We propose MineROI-Net, an open source Transformer-based architecture designed to capture multi-scale temporal patterns in mining profitability. Evaluated on data from 20 ASIC miners released between 2015 and 2024 across diverse market regimes, MineROI-Net outperforms LSTM-based and TSLANet baselines, achieving 83.7% accuracy and 83.1% macro F1-score. The model demonstrates strong economic relevance, achieving 93.6% precision in detecting unprofitable periods and 98.5% precision for profitable ones, while avoiding misclassification of profitable scenarios as unprofitable and vice versa. These results indicate that MineROI-Net offers a practical, data-driven tool for timing mining hardware acquisitions, potentially reducing financial risk in capital-intensive mining operations. The model is available through: https://github.com/AMAAI-Lab/MineROI-Net.
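For readers wanting a concrete starting point, a minimal Transformer-encoder classifier over daily feature sequences with the three ROI classes described above; depth, width, and sequence length are illustrative rather than MineROI-Net's actual configuration:

    # Minimal Transformer encoder for 3-class ROI time series classification.
    import torch
    import torch.nn as nn

    class ROIClassifier(nn.Module):
        def __init__(self, n_features, d_model=64, n_heads=4, n_layers=2, n_classes=3):
            super().__init__()
            self.embed = nn.Linear(n_features, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=128,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, n_classes)

        def forward(self, x):                     # x: (batch, time, n_features)
            h = self.encoder(self.embed(x))
            return self.head(h.mean(dim=1))       # pool over time, then classify

    logits = ROIClassifier(n_features=8)(torch.randn(2, 365, 8))   # one year of daily features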
--------------------------------------------------------------------------------------------------------
Legacy Modernization with AI -- Mainframe modernization
Mainframe systems remain dependable but face high maintenance costs, skill shortages, and cloud integration challenges. AI-assisted legacy modernization transforms these stalwart systems into flexible, scalable architectures. AI-driven strategies including automated code refactoring, intelligent data migration, and predictive maintenance enable transitions to microservices, containerized environments, and hybrid cloud platforms. Machine learning models analyze legacy codebases, identify optimization opportunities, and automate testing and deployment. AI enhances operational efficiency through workload leveling and anomaly detection while preserving core business logic. Applications include financial services modernization, government system upgrades, insurance platform transformations, and enterprise digital transformation initiatives—enabling faster innovation, reduced downtime, and enhanced system resilience while maintaining business continuity throughout modernization journeys.
Authors: Sunil Khemka, Arunava Majumdar
Link: https://arxiv.org/abs/2512.05375v1
Date: 2025-12-d
Summary:
Artificial Intelligence-assisted legacy modernization is essential in changing the stalwart mainframe systems of the past into flexible, scalable, and smart architectures. While mainframes are generally dependable, they can be difficult to maintain due to their high maintenance costs, the shortage of skills, and the problems in integrating them with cloud-based systems. By adopting AI-driven modernization strategies such as automated code refactoring, migration of data using smart tools, and predictive maintenance, companies can more easily move to microservices, containerized environments, and hybrid cloud platforms. Machine learning models can analyze legacy codebases, identify efficiency opportunities, and carry out automated testing and deployment. In addition, AI improves operational efficiency by generating insights that can be used to balance workloads and detect anomalies. This combination not only preserves the core business logic but also enables quicker innovation, less downtime, and enhanced system resilience. Therefore, the use of AI in mainframe modernization is a catalyst for digital transformation and enterprise growth that is sustainable over time.
--------------------------------------------------------------------------------------------------------
MCP-AI: Protocol-Driven Intelligence Framework for Autonomous Reasoning in Healthcare
Healthcare AI struggles to merge contextual reasoning, long-term state management, and human-verifiable workflows cohesively. MCP-AI introduces a novel architecture combining Model Context Protocol with clinical applications, enabling intelligent agents to reason over extended periods while maintaining explainability. Unlike traditional Clinical Decision Support Systems or stateless LLMs, MCP-AI supports adaptive, longitudinal, collaborative reasoning across care settings. Each MCP file captures clinical objectives, patient context, reasoning state, and task logic as reusable, auditable memory objects. Validated through Fragile X Syndrome diagnostic modeling and remote Type 2 Diabetes/hypertension coordination, the system facilitates physician-in-the-loop validation, integrates with HL7/FHIR, and adheres to HIPAA/FDA SaMD guidelines. Applications include chronic disease management, diagnostic support, care coordination, and regulatory-compliant AI deployment in clinical environments.
Authors: Zag ElSayed, Craig Erickson, Ernest Pedapati
Link: https://arxiv.org/abs/2512.05365v1
Date: 2025-12-d
Summary:
Healthcare AI systems have historically faced challenges in merging contextual reasoning, long-term state management, and human-verifiable workflows into a cohesive framework. This paper introduces a completely innovative architecture and concept: combining the Model Context Protocol (MCP) with a specific clinical application, known as MCP-AI. This integration allows intelligent agents to reason over extended periods, collaborate securely, and adhere to authentic clinical logic, representing a significant shift away from traditional Clinical Decision Support Systems (CDSS) and prompt-based Large Language Models (LLMs). As healthcare systems become more complex, the need for autonomous, context-aware clinical reasoning frameworks has become urgent. We present MCP-AI, a novel architecture for explainable medical decision-making built upon the Model Context Protocol (MCP), a modular, executable specification for orchestrating generative and descriptive AI agents in real-time workflows. Each MCP file captures clinical objectives, patient context, reasoning state, and task logic, forming a reusable and auditable memory object. Unlike conventional CDSS or stateless prompt-based AI systems, MCP-AI supports adaptive, longitudinal, and collaborative reasoning across care settings. MCP-AI is validated through two use cases: (1) diagnostic modeling of Fragile X Syndrome with comorbid depression, and (2) remote coordination for Type 2 Diabetes and hypertension. In both scenarios, the protocol facilitates physician-in-the-loop validation, streamlines clinical processes, and guarantees secure transitions of AI responsibilities between healthcare providers. The system connects with HL7/FHIR interfaces and adheres to regulatory standards, such as HIPAA and FDA SaMD guidelines. MCP-AI provides a scalable basis for interpretable, composable, and safety-oriented AI within upcoming clinical environments.
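The abstract describes each MCP file as a reusable, auditable memory object capturing clinical objectives, patient context, reasoning state, and task logic. A hypothetical sketch of such an object follows; the field names are inferred from that description and are not the authors' schema:

    # Hypothetical sketch of an auditable MCP-style context object.
    import json
    from dataclasses import dataclass, field, asdict
    from datetime import datetime, timezone

    @dataclass
    class MCPContext:
        clinical_objective: str
        patient_context: dict            # e.g. FHIR-derived problem list, vitals
        reasoning_state: list            # ordered, append-only reasoning steps
        task_logic: dict                 # next actions, escalation rules
        audit_log: list = field(default_factory=list)

        def record(self, actor, action):
            """Append-only audit entry supporting physician-in-the-loop review."""
            self.audit_log.append({"actor": actor, "action": action,
                                   "at": datetime.now(timezone.utc).isoformat()})

    ctx = MCPContext("Rule out Fragile X with comorbid depression",
                     {"age": 9, "phq9": 14}, [], {"next": "order FMR1 testing"})
    ctx.record("physician", "approved plan")
    print(json.dumps(asdict(ctx), indent=2))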
--------------------------------------------------------------------------------------------------------
Invisible Load: Uncovering the Challenges of Neurodivergent Women in Software Engineering
Neurodivergent women in software engineering face unique challenges at the intersection of gender bias and neurological differences. Despite increasing workplace neurodiversity recognition, no prior SE research systematically examined this group. Underdiagnosis, masking, and male-centric cultures exacerbate barriers causing stress, burnout, and attrition. This work proposes a hybrid methodology integrating InclusiveMag's inclusivity framework with GenderMag walkthrough, tailored for neurodivergent women in SE. The three-stage design includes literature scoping, persona derivation, and collaborative workshop application. The targeted literature review synthesizes cognitive, social, organizational, structural, and career progression challenges, highlighting how under/late diagnosis and masking intensify exclusion. Applications include inclusive workplace design, neurodiversity-aware hiring practices, supportive development tool design, organizational policy development, and retention strategies for diverse talent.
Authors: Munazza Zaib, Wei Wang, Dulaji Hidellaarachchi, Isma Farah Siddiqui
Link: https://arxiv.org/abs/2512.05350v1
Date: 2025-12-d
Summary:
Neurodivergent women in Software Engineering (SE) encounter distinctive challenges at the intersection of gender bias and neurological differences. To the best of our knowledge, no prior work in SE research has systematically examined this group, despite increasing recognition of neurodiversity in the workplace. Underdiagnosis, masking, and male-centric workplace cultures continue to exacerbate barriers that contribute to stress, burnout, and attrition. In response, we propose a hybrid methodological approach that integrates InclusiveMag's inclusivity framework with the GenderMag walkthrough process, tailored to the context of neurodivergent women in SE. The overarching design unfolds across three stages: scoping through a literature review, deriving personas and analytic processes, and applying the method in collaborative workshops. We present a targeted literature review that synthesizes the cognitive, social, organizational, structural, and career progression challenges neurodivergent women face in SE, including how under/late diagnosis and masking intensify exclusion. These findings lay the groundwork for subsequent stages that will develop and apply inclusive analytic methods to support actionable change.
--------------------------------------------------------------------------------------------------------
To Think or Not to Think: The Hidden Cost of Meta-Training with Excessive CoT Examples
Chain-of-thought (CoT) prompting with few-shot in-context learning unlocks LLM reasoning capabilities but proves ineffective on novel tasks with insufficient pre-training knowledge. Using the CoT-ICL Lab framework, this research studies this limitation and proposes meta-training techniques for learning abstract reasoning tasks in-context. A key finding: excessive CoT examples during meta-training degrade performance when CoT supervision is limited. CoT-Recipe formally modulates the mix of CoT and non-CoT examples, increasing transformer accuracy on novel tasks by up to 300% even without in-context CoT examples. Applied to pretrained Qwen2.5 models for symbolic reasoning, the technique achieves 130% accuracy gains. Applications include few-shot learning optimization, reasoning task adaptation, educational AI systems, and efficient model training protocols for resource-constrained environments.
Authors: Vignesh Kothapalli, Ata Fatahibaarzi, Hamed Firooz, Maziar Sanjabi
Link: https://arxiv.org/abs/2512.05318v1
Date: 2025-12-d
Summary:
Chain-of-thought (CoT) prompting combined with few-shot in-context learning (ICL) has unlocked significant reasoning capabilities in large language models (LLMs). However, ICL with CoT examples is ineffective on novel tasks when the pre-training knowledge is insufficient. We study this problem in a controlled setting using the CoT-ICL Lab framework, and propose meta-training techniques to learn novel abstract reasoning tasks in-context. Although CoT examples facilitate reasoning, we noticed that their excessive inclusion during meta-training degrades performance when CoT supervision is limited. To mitigate such behavior, we propose CoT-Recipe, a formal approach to modulate the mix of CoT and non-CoT examples in meta-training sequences. We demonstrate that careful modulation via CoT-Recipe can increase the accuracy of transformers on novel tasks by up to 300% even when there are no CoT examples available in-context. We confirm the broader effectiveness of these techniques by applying them to pretrained LLMs (Qwen2.5 series) for symbolic reasoning tasks and observing gains of up to 130% in accuracy.
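A toy illustration of modulating the CoT/non-CoT mix when assembling meta-training sequences. The fixed per-example inclusion probability is an assumption made for clarity; the actual CoT-Recipe schedule is defined in the paper:

    # Build a meta-training sequence with a controlled fraction of CoT examples.
    import random

    def build_meta_sequence(examples, cot_ratio=0.25, rng=None):
        """examples: list of dicts with 'input', 'cot', and 'answer' fields."""
        rng = rng or random.Random(0)
        parts = []
        for ex in examples:
            if rng.random() < cot_ratio and ex.get("cot"):
                parts.append(f"Q: {ex['input']}\nReasoning: {ex['cot']}\nA: {ex['answer']}")
            else:
                parts.append(f"Q: {ex['input']}\nA: {ex['answer']}")   # non-CoT example
        return "\n\n".join(parts)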
--------------------------------------------------------------------------------------------------------
This review charts intelligent structured light for high-field laser-matter interactions, where precise spatiotemporal and vectorial light control becomes critical. The framework builds on three synergistic pillars: an advanced electromagnetic toolkit extending beyond spatial light modulators to static optics and plasma modulators; physics-informed digital twins with AI-driven inverse design for high-dimensional optimization; and groundbreaking applications including programmable electron beams, orbital-angular-momentum γ-rays, compact THz accelerators, and robust communications. Grand challenges include material science advances, MHz-rate adaptive control, and quantum realm extensions. Applications span particle physics experiments, next-generation accelerators, secure quantum communications, advanced imaging systems, and plasma-based photonics—representing transformative capabilities for commanding rather than merely observing extreme light-matter interactions across scientific and technological domains.
Authors: Sergio Carbajo, Seung-Whan Bahk, Justin Baker, Andrea Bertozzi, Abhimanyu Borthakur, Antonino Di Piazza, Andrew Forbes, Spencer Gessner, Jack Hirschman, Franz Kärtner, Maciej Lewenstein, Yuhang Li, Inhyuk Nam, Eileen Otte, Aydogan Ozcan, James Rozensweig, Yijie Shen, Liwei Song, Ye Tian, Yu Wang, Yuntian Wang, Logan Wright, Xiaojun Wu, Hao Zhang
Link: https://arxiv.org/abs/2512.05042v1
Date: 2025-12-d
Summary:
This review charts the emerging paradigm of intelligent structured light for high-field laser-matter interactions, where the precise spatiotemporal and vectorial control of light is a critical degree of freedom. We outline a transformative framework built upon three synergistic pillars. First, we survey the advanced electromagnetic toolkit, moving beyond conventional spatial light modulators to include robust static optics and the promising frontier of plasma light modulators. Second, we detail the optimization engine for this high-dimensional design space, focusing on physics-informed digital twins and AI-driven inverse design to automate the discovery of optimal light structures. Finally, we explore the groundbreaking applications enabled by this integrated approach, including programmable electron beams, orbital-angular-momentum-carrying γ-rays, compact THz accelerators, and robust communications. The path forward necessitates overcoming grand challenges in material science, real-time adaptive control at MHz rates, and the extension of these principles to the quantum realm. This review serves as a call to action for a coordinated, interdisciplinary effort to command, rather than merely observe, light-matter interactions at the extreme.
--------------------------------------------------------------------------------------------------------
MALLORN: Many Artificial LSST Lightcurves based on Observations of Real Nuclear transients
The Vera C. Rubin Observatory's LSST will increase observed transients hundredfold, but insufficient spectroscopic resources necessitate photometry-based prioritization. This work focuses on identifying tidal disruption events (TDEs), crucial for determining black hole parameters and understanding accretion physics. MALLORN presents 10,178 simulated LSST light curves constructed from real Zwicky Transient Facility observations of 64 TDEs, 727 nuclear supernovae, and 1,407 AGN using Gaussian process fitting, empirically-motivated spectral energy distributions, and Rubin Survey Simulator baselines. This novel approach adapts easily to any photometric survey. The MALLORN Astronomical Classification Challenge on Kaggle allows competitors to test photometric classifiers on simulated data. Applications include LSST survey preparation, transient classification algorithm development, follow-up resource optimization, and establishing baselines for astronomical machine learning challenges.
Authors: Dylan Magill, Matt Nicholl, Vysakh Anilkumar, Sjoert van Velzen, Xinyue Sheng, Thai Son Mai, Hung Viet Tran, Ngoc Phu Doan, Thomas Moore, Shubham Srivastav, David R. Young, Charlotte R. Angus, Joshua Weston
Link: https://arxiv.org/abs/2512.04946v1
Date: 2025-12-d
Summary:
The Vera C. Rubin Observatory's 10-Year Legacy Survey of Space and Time (LSST) is expected to produce a hundredfold increase in the number of transients we observe. However, there are insufficient spectroscopic resources to follow up on all of the wealth of targets that LSST will provide. As such it is necessary to be able to prioritise objects for followup observations or inclusion in sample studies based purely on their LSST photometry. We are particularly keen to identify tidal disruption events (TDEs) with LSST. TDEs are immensely useful for determining black hole parameters and probing our understanding of accretion physics. To assist in these efforts, we present the Many Artificial LSST Lightcurves based on the Observations of Real Nuclear transients (MALLORN) data set and the corresponding classifier challenge for identifying TDEs. MALLORN comprises 10178 simulated LSST light curves, constructed from real Zwicky Transient Facility (ZTF) observations of 64 TDEs, 727 nuclear supernovae and 1407 AGN with spectroscopic labels using Gaussian process fitting, empirically-motivated spectral energy distributions from SNCosmo and the baseline from the Rubin Survey Simulator. Our novel approach can be easily adapted to simulate transients for any photometric survey using observations from another, requiring only the limiting magnitudes and an estimate of the cadence of observations. The MALLORN Astronomical Classification Challenge, launched on Kaggle on 15/10/2025, will allow competitors to test their photometric classifiers on simulated LSST data to find TDEs and improve upon their capabilities prior to the start of LSST.
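A simplified version of the simulation recipe, assuming scikit-learn's Gaussian process regressor: smooth a real light curve, resample it at a survey-like cadence, and keep only points brighter than the limiting magnitude. The kernel, cadence, and limiting magnitude are placeholder choices, and the SED and Rubin Survey Simulator steps are omitted:

    # Sketch: GP-smooth a light curve and resample it at a new cadence.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    def simulate_lightcurve(t_obs, mag_obs, mag_err, cadence_days=3.0, limiting_mag=24.5):
        kernel = RBF(length_scale=20.0) + WhiteKernel(noise_level=np.mean(mag_err) ** 2)
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gp.fit(t_obs.reshape(-1, 1), mag_obs)
        t_new = np.arange(t_obs.min(), t_obs.max(), cadence_days)
        mag_new, mag_std = gp.predict(t_new.reshape(-1, 1), return_std=True)
        detected = mag_new < limiting_mag          # fainter objects have larger magnitudes
        return t_new[detected], mag_new[detected], mag_std[detected]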
--------------------------------------------------------------------------------------------------------
Assessing whether frontier AI models can perform high-value consumer tasks remains unexplored. ACE introduces the first benchmark with 400 hidden test cases across shopping, food, gaming, and DIY activities, plus 80 open-sourced devset cases. Using a novel grading methodology that dynamically checks response grounding in retrieved web sources, ten frontier models were evaluated with websearch enabled. GPT 5 (Thinking=High) leads at 56.1%, followed by o3 Pro (55.2%) and GPT 5.1 (55.1%). Performance varies across domains, with Shopping scoring under 50%. Models frequently hallucinate prices and links. ACE reveals substantial gaps between best-performing models and consumer needs. Applications include consumer AI product development, model benchmarking for real-world tasks, hallucination detection research, and guiding AI assistant improvements for practical consumer applications.
Authors: Julien Benchek, Rohit Shetty, Benjamin Hunsberger, Ajay Arun, Zach Richards, Brendan Foody, Osvald Nitski, Bertie Vidgen
Link: https://arxiv.org/abs/2512.04921v1
Date: 2025-12-d
Summary:
We introduce the first version of the AI Consumer Index (ACE), a benchmark for assessing whether frontier AI models can perform high-value consumer tasks. ACE contains a hidden heldout set of 400 test cases, split across four consumer activities: shopping, food, gaming, and DIY. We are also open sourcing 80 cases as a devset with a CC-BY license. For the ACE leaderboard we evaluated 10 frontier models (with websearch turned on) using a novel grading methodology that dynamically checks whether relevant parts of the response are grounded in the retrieved web sources. GPT 5 (Thinking = High) is the top-performing model, scoring 56.1%, followed by o3 Pro (Thinking = On) (55.2%) and GPT 5.1 (Thinking = High) (55.1%). Models differ across domains, and in Shopping the top model scores under 50%. For some requests (such as giving the correct price or providing working links), models are highly prone to hallucination. Overall, ACE shows a substantial gap between the performance of even the best models and consumers' AI needs.
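A rough sketch of what checking that a response is grounded in retrieved web sources can look like in code: extract verifiable fragments (URLs and prices) and test whether they occur in the retrieved pages. The real ACE grader is more involved; this is illustrative only:

    # Toy grounding check over verifiable fragments of a model response.
    import re

    def grounding_report(response, sources):
        """sources: list of raw page texts retrieved during websearch."""
        corpus = " ".join(sources).lower()
        claims = re.findall(r"https?://\S+|\$\d[\d,]*(?:\.\d+)?", response)
        results = {c: c.lower().rstrip(".,)") in corpus for c in claims}
        grounded = sum(results.values())
        return {"claims": results,
                "grounded_fraction": grounded / len(results) if results else None}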
--------------------------------------------------------------------------------------------------------
Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates
Expanding instruct LLM linguistic diversity is hindered by costly target language data requirements and catastrophic forgetting during adaptation. This work tackles low-resource adaptation using only unlabeled target language data. Source-Shielded Updates (SSU) selectively preserves source knowledge using parameter importance scoring to identify critical parameters, then applies column-wise freezing before adaptation. Evaluated across five typologically diverse languages with 7B and 13B models, SSU reduces source task performance degradation to just 3.4% (7B) and 2.8% (13B) versus 20.3% and 22.3% from full fine-tuning, while achieving competitive or superior target-language performance. Applications include multilingual AI democratization, low-resource language support, international business AI deployment, educational technology localization, and preserving linguistic diversity in AI systems without extensive labeled data.
Authors: Atsuki Yamaguchi, Terufumi Morishita, Aline Villavicencio, Nikolaos Aletras
Link: https://arxiv.org/abs/2512.04844v1
Date: 2025-12-d
Summary:
Expanding the linguistic diversity of instruct large language models (LLMs) is crucial for global accessibility but is often hindered by the reliance on costly specialized target language labeled data and catastrophic forgetting during adaptation. We tackle this challenge under a realistic, low-resource constraint: adapting instruct LLMs using only unlabeled target language data. We introduce Source-Shielded Updates (SSU), a selective parameter update strategy that proactively preserves source knowledge. Using a small set of source data and a parameter importance scoring method, SSU identifies parameters critical to maintaining source abilities. It then applies a column-wise freezing strategy to protect these parameters before adaptation. Experiments across five typologically diverse languages and 7B and 13B models demonstrate that SSU successfully mitigates catastrophic forgetting. It reduces performance degradation on monolingual source tasks to just 3.4% (7B) and 2.8% (13B) on average, a stark contrast to the 20.3% and 22.3% from full fine-tuning. SSU also achieves target-language performance highly competitive with full fine-tuning, outperforming it on all benchmarks for 7B models and the majority for 13B models.
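A hedged sketch of the two SSU ingredients named above, assuming a PyTorch model: score weight-matrix columns by accumulated gradient magnitude on a small source set, then freeze the top-scoring columns by masking their gradients before target-language adaptation. The scoring rule and the 20% freeze ratio are stand-ins, not the paper's exact recipe:

    # Sketch: importance scoring on source data, then column-wise gradient masking.
    import torch

    def column_importance(model, source_batches, loss_fn):
        """Accumulate |grad| per output column of every 2-D weight matrix."""
        scores = {n: torch.zeros(p.shape[0]) for n, p in model.named_parameters() if p.dim() == 2}
        for batch in source_batches:
            model.zero_grad()
            loss_fn(model, batch).backward()
            for n, p in model.named_parameters():
                if p.dim() == 2 and p.grad is not None:
                    scores[n] += p.grad.detach().abs().sum(dim=1).cpu()
        return scores

    def shield_columns(model, scores, freeze_ratio=0.2):
        """Zero out future gradients for the top-scoring columns (column-wise freeze)."""
        for n, p in model.named_parameters():
            if n in scores:
                k = int(freeze_ratio * p.shape[0])
                frozen = torch.topk(scores[n], k).indices.to(p.device)
                mask = torch.ones_like(p)
                mask[frozen] = 0.0
                p.register_hook(lambda grad, m=mask: grad * m)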
--------------------------------------------------------------------------------------------------------
Statistical Insight into the Correlation of Geometry and Spectral Emission in Network Lasers
Optically active networks exhibit feature-rich emission dependent on geometric details, with applications in random lasers, sensing, and photonics processors. However, predictive characterization linking network geometry to emission spectrum remains lacking. Using Steady-State ab Initio Laser Theory (SALT) equations, this work conducts extensive statistical analyses establishing geometry-spectrum connections. Results demonstrate edge crowding as key to tuning modal intensity distribution uniformity. The statistical framework provides comprehensive understanding of network properties, establishing precise design rules for network-based photonic devices. Applications include random laser optimization, optical sensing device design, neuromorphic photonics processors, programmable optical computing elements, and intelligent light-based systems—enabling systematic engineering of emission characteristics through controlled geometric manipulation of network structures for next-generation photonic technologies.
Authors: Camillo Tassi, Riccardo Mannella, Andrea Tomadin, Andrea Camposeo, Dario Pisignano
Link: https://arxiv.org/abs/2512.04811v1
Date: 2025-12-d
Summary:
Optically active networks show feature-rich emission that depends on the fine details of their geometry, and find diverse applications in random lasers, sensing devices and photonics processors. In these and other systems, a thorough and predictive characterization of how the network geometry correlates with the resulting emission spectrum would be highly important; however, such a description is still lacking. In this work, we take a step toward filling this gap, by using the well-known Steady-State ab Initio Laser Theory (SALT) equations [L. Ge et al., Phys. Rev. A 82, 063824 (2010)] to carry out an extensive set of statistical analyses and establish connections between the random network geometry and their ultimate emission spectrum. Our results show that edge crowding is key to tuning the uniformity of the modal intensity distribution of the emission spectrum. Such a statistical framework for the comprehensive understanding of network statistical properties is highly significant for establishing precise design rules for network-based photonic devices and intelligent systems.
--------------------------------------------------------------------------------------------------------
Explanatory Interactive Learning (XIL) enables users to customize AI models by interacting with explanations, but concerns exist about order effects—cognitive biases where item sequence influences trust and feedback quality. Previous studies raised concerns, but their designs differed substantially from common XIL use cases. Two larger-scale user studies (n=713) mimicking XIL tasks assessed order effects within and between debugging sessions by manipulating correct/wrong explanation presentation order. Order effects showed limited but significant impact on user agreement (trust) within sessions only, not between them. Feedback quality remained satisfactory with small, inconsistent order effect influence. Applications include XIL system design optimization, human-AI interaction guidelines, explainable AI interface development, debugging tool refinement, and informing best practices for explanation-based model correction workflows.
Authors: Dario Pesenti, Alessandro Bogani, Katya Tentori, Stefano Teso
Link: https://arxiv.org/abs/2512.04764v1
Date: 2025-12-d
Summary:
Explanatory Interactive Learning (XIL) is a powerful interactive learning framework designed to enable users to customize and correct AI models by interacting with their explanations. In a nutshell, XIL algorithms select a number of items on which an AI model made a decision (e.g. images and their tags) and present them to users, together with corresponding explanations (e.g. image regions that drive the model's decision). Then, users supply corrective feedback for the explanations, which the algorithm uses to improve the model. Despite showing promise in debugging tasks, recent studies have raised concerns that explanatory interaction may trigger order effects, a well-known cognitive bias in which the sequence of presented items influences users' trust and, critically, the quality of their feedback. We argue that these studies are not entirely conclusive, as the experimental designs and tasks employed differ substantially from common XIL use cases, complicating interpretation. To clarify the interplay between order effects and explanatory interaction, we ran two larger-scale user studies (n = 713 total) designed to mimic common XIL tasks. Specifically, we assessed order effects both within and between debugging sessions by manipulating the order in which correct and wrong explanations are presented to participants. Order effects had a limited, though significant, impact on users' agreement with the model (i.e., a behavioral measure of their trust), and only when examined within debugging sessions, not between them. The quality of users' feedback was generally satisfactory, with order effects exerting only a small and inconsistent influence in both experiments. Overall, our findings suggest that order effects do not pose a significant issue for the successful employment of XIL approaches. More broadly, our work contributes to the ongoing efforts for understanding human factors in AI.
--------------------------------------------------------------------------------------------------------
When GenAI Meets Fake News: Understanding Image Cascade Dynamics on Reddit
AI-generated content and misinformation proliferate on social networks, yet visual content's virality role remains understudied compared to textual misinformation. This work presents the first large-scale analysis of misinformation and AI-generated image propagation through repost cascades across five ideologically diverse Reddit communities. Integrating textual sentiment, visual attributes, and diffusion metrics (time-to-first repost, community reach), the framework accurately predicts post-level virality (AUC=0.83) and cascade-level spread (AUC=0.998). Applications include social media content moderation system design, synthetic media detection and labeling, misinformation intervention strategy development, platform policy formulation for AI-generated content, viral content prediction models, and understanding cross-community information diffusion patterns—essential for moderating synthetic and misleading visual content online.
Authors: Saumya Chauhan, Mila Hong, Maria Vazhaeparambil
Link: https://arxiv.org/abs/2512.04639v1
Date: 2025-12-d
Summary:
AI-generated content and misinformation are increasingly prevalent on social networks. While prior research primarily examined textual misinformation, fewer studies have focused on visual content's role in virality. In this work, we present the first large-scale analysis of how misinformation and AI-generated images propagate through repost cascades across five ideologically diverse Reddit communities. By integrating textual sentiment, visual attributes, and diffusion metrics (e.g., time-to-first repost, community reach), our framework accurately predicts both immediate post-level virality (AUC=0.83) and long-term cascade-level spread (AUC=0.998). These findings offer essential insights for moderating synthetic and misleading visual content online.
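A toy sketch of the evaluation setup on synthetic data: concatenate sentiment, visual, and diffusion features into one design matrix and score a virality classifier with ROC-AUC. Feature dimensions, the classifier, and the labels are placeholders, not the study's data or model:

    # Toy virality prediction pipeline combining three feature groups.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = np.hstack([rng.normal(size=(1000, 3)),         # text sentiment scores
                   rng.normal(size=(1000, 8)),         # visual attributes
                   rng.exponential(size=(1000, 2))])   # time-to-first-repost, community reach
    y = rng.integers(0, 2, size=1000)                  # viral / not viral (placeholder labels)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = GradientBoostingClassifier().fit(X_tr, y_tr)
    print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))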
--------------------------------------------------------------------------------------------------------
Metric dimension of Cartesian product of stars
Metric dimension quantifies the minimum landmark vertices needed for unique vertex identification by distances, capturing the tradeoff between compact information encoding and unambiguous identification in networks. This work determines exact metric dimension values for the Cartesian product K₁,m □ K₁,n (hub-and-spoke grids) across all m,n values, presenting a constructive linear-time algorithm that builds minimum resolving sets with theoretical guarantees and practical feasibility. Visualizations illustrate parameter regime design spaces. Applications include sensor network optimization requiring minimal landmark sensors, graph-based localization systems, monitoring network design, robot navigation in grid environments, facility location problems, and intelligent information system infrastructure design—extending metric dimension theory while providing efficient methods directly relevant to computational graph theory and information science applications.
Authors: Akbar Davoodi, Mohsen Jannesari
Link: https://arxiv.org/abs/2512.04620v1
Date: 2025-12-d
Summary:
The metric dimension of a graph is the minimum number of landmark vertices required so that every vertex can be uniquely identified by its distances to the landmarks. This parameter captures the fundamental tradeoff between compact information encoding and unambiguous identification in networked systems. In this work, we determine the exact value of the metric dimension of the Cartesian product $K_{1,m} \square K_{1,n}$, also known as hub-and-spoke grids, across all values of $m$ and $n$. In addition, we present a constructive linear-time algorithm that builds a minimum resolving set, providing both theoretical guarantees and practical feasibility. We complement our results with visualizations of the parameter regimes that illustrate the design space. The findings establish design rules for minimizing landmark sensors and support applications in graph-based localization, monitoring networks, and intelligent information systems. Our results extend the theory of metric dimension and contribute efficient methods of direct relevance to information science and computational graph theory.
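The exact values are given in the paper; as a complement, the following brute-force checker computes the metric dimension of small hub-and-spoke grids directly from the definition above (the smallest landmark set giving every vertex a unique distance vector), using networkx. It is exponential in the number of vertices, so it is only suitable for sanity-checking small m and n:

    # Brute-force metric dimension of K_{1,m} □ K_{1,n} from the definition.
    from itertools import combinations
    import networkx as nx

    def metric_dimension(G):
        dist = dict(nx.all_pairs_shortest_path_length(G))
        nodes = list(G.nodes)
        for k in range(1, len(nodes) + 1):
            for landmarks in combinations(nodes, k):
                signatures = {tuple(dist[v][l] for l in landmarks) for v in nodes}
                if len(signatures) == len(nodes):          # every vertex is resolved
                    return k, landmarks
        return len(nodes), tuple(nodes)

    # nx.star_graph(m) is K_{1,m}; the Cartesian product gives the hub-and-spoke grid.
    star_product = nx.cartesian_product(nx.star_graph(3), nx.star_graph(4))
    print(metric_dimension(star_product))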
--------------------------------------------------------------------------------------------------------
The Decision Path to Control AI Risks Completely: Fundamental Control Mechanisms for AI Governance
Achieving complete human control over AI risks remains unsolved—analogous to driving a fast AI "train" without brakes. This paper develops systematic solutions through fundamental control mechanisms at AI decision elements, providing an AI governance architecture with five pillars supported by six control mechanisms, illustrated through AI Mandates (AIMs). Three AIMs must be built inside AI systems (value alignment, ethics/law constraints, human intervention/shut-off switches) and three in society (resource access limitation, spillover risk mitigation). The work highlights differences between physical AI and generative AI governance, emphasizing analog physical safeguards preventing circumvention: AI's nature as software on human-controlled chips and digitization prerequisites for AI-driven physical actions. Applications include AI safety legislation, governance framework development, existential risk mitigation, AI ethics implementation, and establishing theoretical foundations for comprehensive AI risk management across industries and governmental regulatory bodies.
Authors: Yong Tao
Link: https://arxiv.org/abs/2512.04489v1
Date: 2025-12-d
Summary:
Artificial intelligence (AI) advances rapidly but achieving complete human control over AI risks remains an unsolved problem, akin to driving the fast AI "train" without a "brake system." By exploring fundamental control mechanisms at key elements of AI decisions, this paper develops a systematic solution to thoroughly control AI risks, providing an architecture for AI governance and legislation with five pillars supported by six control mechanisms, illustrated through a minimum set of AI Mandates (AIMs). Three of the AIMs must be built inside AI systems and three in society to address major areas of AI risks: 1) align AI values with human users; 2) constrain AI decision-actions by societal ethics, laws, and regulations; 3) build in human intervention options for emergencies and shut-off switches for existential threats; 4) limit AI access to resources to reinforce controls inside AI; 5) mitigate spillover risks like job loss from AI. We also highlight the differences in AI governance on physical AI systems versus generative AI. We discuss how to strengthen analog physical safeguards to prevent smarter AI/AGI/ASI from circumventing core safety controls by exploiting AI's intrinsic disconnect from the analog physical world: AI's nature as pure software code run on chips controlled by humans, and the prerequisite that all AI-driven physical actions must be digitized. These findings establish a theoretical foundation for AI governance and legislation as the basic structure of a "brake system" for AI decisions. If enacted, these controls can rein in AI dangers as completely as humanly possible, removing large chunks of currently wide-open AI risks, substantially reducing overall AI risks to residual human errors.
--------------------------------------------------------------------------------------------------------