Week Ending 2.11.2024

 

RESEARCH WATCH: 2.11.2024


Predictive representations: building blocks of intelligence

Predictive representations provides a theoretical framework integrating ideas from reinforcement learning, cognition, and neuroscience on how organisms make predictions to guide adaptive behavior. This convergence suggests predictive representations may serve as versatile building blocks underlying intelligence.

Authors:  Wilka Carvalho, Momchil S. Tomov, William de Cothi, Caswell Barry, Samuel J. Gershman

Link:  https://arxiv.org/abs/2402.06590v1

Date: 2024-02-09

Summary:

Adaptive behavior often requires predicting future events. The theory of reinforcement learning prescribes what kinds of predictive representations are useful and how to compute them. This paper integrates these theoretical ideas with work on cognition and neuroscience. We pay special attention to the successor representation (SR) and its generalizations, which have been widely applied both as engineering tools and models of brain function. This convergence suggests that particular kinds of predictive representations may function as versatile building blocks of intelligence.
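
To make the successor representation concrete, here is a minimal tabular sketch of learning it with temporal-difference updates (our illustration, not code from the paper; it assumes discrete states and a fixed behavior policy):

```python
import numpy as np

def learn_sr(transitions, n_states, alpha=0.1, gamma=0.95):
    """Tabular TD(0) learning of the successor representation (SR).

    M[s, s2] estimates the expected discounted future occupancy of state
    s2 when starting from state s, under the behavior policy that
    generated `transitions` (an iterable of (s, s_next) pairs).
    """
    M = np.zeros((n_states, n_states))
    for s, s_next in transitions:
        onehot = np.eye(n_states)[s]
        # Move M[s] toward the one-step bootstrap target.
        M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    return M

# Once learned, values for any reward vector r follow from one product:
# V = M @ r; this cheap reuse across reward functions is what makes the
# SR attractive as a general-purpose building block.
```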

--------------------------------------------------------------------------------------------------------

Generating Higher Order Modes from Binary Black Hole mergers with Machine Learning

Generating Higher Order Modes employs machine learning for rapid and accurate gravitational wave synthesis from binary black hole mergers. This accelerates waveform generation for detection pipelines and enables incorporating higher order multipole effects.

Authors:  Tim Grimbergen, Stefano Schmidt, Chinmay Kalaghatgi, Chris van den Broeck

Link:  https://arxiv.org/abs/2402.06587v1

Date: 2024-02-09

Summary:

We introduce a machine learning model designed to rapidly and accurately predict the time domain gravitational wave emission of non-precessing binary black hole coalescences, incorporating the effects of higher order modes of the multipole expansion of the waveform. Expanding on our prior work, we decompose each mode by amplitude and phase and reduce dimensionality using principal component analysis. An ensemble of artificial neural networks is trained to learn the relationship between orbital parameters and the low-dimensional representation of each mode. We train our model on $\sim 10^5$ signals with mass ratio $q \in [1,10]$ and dimensionless spins $\chi_i \in [-0.9, 0.9]$, generated with the state-of-the-art approximant SEOBNRv4HM. We find that it achieves a median faithfulness of $10^{-4}$ averaged across the parameter space. We show that our model generates a single waveform two orders of magnitude faster than the training model, with the speed up increasing when waveforms are generated in batches. This framework is entirely general and can be applied to any other time domain approximant capable of generating waveforms from aligned spin circular binaries, possibly incorporating higher order modes.
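
A rough sketch of the pipeline the summary describes: compress each mode with PCA, then train a network to map orbital parameters to the compressed representation. Placeholder random data stands in for SEOBNRv4HM waveforms, and all names and sizes are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Hypothetical training set: rows of `params` are (q, chi_1, chi_2),
# rows of `modes` are the time-domain amplitude samples of one mode.
params = rng.uniform([1.0, -0.9, -0.9], [10.0, 0.9, 0.9], size=(1000, 3))
modes = rng.normal(size=(1000, 2048))  # placeholder waveforms

# Reduce the dimensionality of the mode data with PCA.
pca = PCA(n_components=10).fit(modes)
coeffs = pca.transform(modes)

# Learn the map: orbital parameters -> low-dimensional representation.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
net.fit(params, coeffs)

# Fast surrogate generation: predict coefficients, then invert the PCA.
reconstructed = pca.inverse_transform(net.predict(params[:1]))
```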

--------------------------------------------------------------------------------------------------------

What is Hiding in Medicine's Dark Matter? Learning with Missing Data in Medical Practices

What is Hiding in Medicine's Dark Matter? investigates statistical techniques like singular value decomposition and k-nearest neighbors to understand and impute missing data in electronic health records. Correctly handling missing data can reduce bias and improve clinical decision-making.

Authors:  Neslihan Suzen, Evgeny M. Mirkes, Damian Roland, Jeremy Levesley, Alexander N. Gorban, Tim J. Coats

Link:  https://arxiv.org/abs/2402.06563v1

Date: 2024-02-09

Summary:

Electronic patient records (EPRs) produce a wealth of data but contain significant missing information. Understanding and handling this missing data is an important part of clinical data analysis and, if left unaddressed, could result in bias in analysis and distortion of critical conclusions. Missing data may be linked to health care professional practice patterns, and imputation of missing data can increase the validity of clinical decisions. This study focuses on statistical approaches for understanding and interpreting the missing data and machine learning based clinical data imputation using a single centre's paediatric emergency data and the data from the UK's largest clinical audit for traumatic injury database (TARN). In the study of 56,961 data points related to initial vital signs and observations taken on children presenting to an Emergency Department, we have shown that missing data are likely to be non-random and how these are linked to health care professional practice patterns. We have then examined 79 TARN fields with missing values for 5,791 trauma cases. Singular Value Decomposition (SVD) and k-Nearest Neighbour (kNN) based missing data imputation methods are used, and imputation results against the original dataset are compared and statistically tested. We have concluded that the 1NN imputer is the best imputation method, which reflects a usual pattern of clinical decision making: find the most similar patients and take their attributes as the imputation.
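
The winning 1NN strategy is easy to reproduce with off-the-shelf tools: copy each missing attribute from the single most similar patient. A minimal sketch on a toy vital-signs table (column names and values are illustrative only):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy table with columns [heart_rate, resp_rate, systolic_bp]; np.nan
# marks missing observations.
X = np.array([
    [110.0, 24.0, np.nan],
    [95.0, np.nan, 100.0],
    [120.0, 30.0, 90.0],
    [98.0, 22.0, 105.0],
])

# n_neighbors=1 mirrors the paper's conclusion: find the most similar
# patient and take their attribute as the imputation.
imputer = KNNImputer(n_neighbors=1)
print(imputer.fit_transform(X))
```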

--------------------------------------------------------------------------------------------------------

Introspective Planning: Guiding Language-Enabled Agents to Refine Their Own Uncertainty

Introspective Planning incorporates uncertainty quantification into task planning for language-instructed robots. This introspection significantly increases success rates and safety by identifying ambiguous instructions that require user clarification before execution.

Authors:  Kaiqu Liang, Zixu Zhang, Jaime Fernández Fisac

Link:  https://arxiv.org/abs/2402.06529v1

Date: 2024-02-09

Summary:

Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to comprehend natural language instructions and strategically plan high-level actions through proper grounding. However, LLM hallucination may result in robots confidently executing plans that are misaligned with user goals or, in extreme cases, unsafe. Additionally, inherent ambiguity in natural language instructions can induce task uncertainty, particularly in situations where multiple valid options exist. To address this issue, LLMs must identify such uncertainty and proactively seek clarification. This paper explores the concept of introspective planning as a systematic method for guiding LLMs in forming uncertainty-aware plans for robotic task execution without the need for fine-tuning. We investigate uncertainty quantification in task-level robot planning and demonstrate that introspection significantly improves both success rates and safety compared to state-of-the-art LLM-based planning approaches. Furthermore, we assess the effectiveness of introspective planning in conjunction with conformal prediction, revealing that this combination yields tighter confidence bounds, thereby maintaining statistical success guarantees with fewer superfluous user clarification queries.
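
The conformal-prediction component can be illustrated with a short split-conformal sketch: calibrate a nonconformity threshold on held-out examples, keep every candidate plan scoring below it, and request clarification whenever more than one plan survives. This is our simplified illustration, not the paper's exact procedure:

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal calibration. `cal_scores` are nonconformity
    scores (e.g. 1 - model confidence in the correct plan) measured on
    a held-out calibration set."""
    n = len(cal_scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, level, method="higher")

def prediction_set(confidences, threshold):
    """Keep every candidate plan whose nonconformity is within bounds."""
    return [i for i, c in enumerate(confidences) if 1 - c <= threshold]

cal = np.random.default_rng(0).uniform(size=200)  # placeholder scores
plans = prediction_set([0.70, 0.20, 0.65], conformal_threshold(cal))
if len(plans) > 1:
    print("ambiguous instruction; ask the user to choose among", plans)
```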

--------------------------------------------------------------------------------------------------------

On the Fly Detection of Root Causes from Observed Data with Application to IT Systems

On the Fly Detection of Root Causes introduces a structural causal model to rapidly detect anomalies and trace root causes in IT systems. Experiments demonstrate improved performance over alternatives, enabling real-time monitoring and diagnosis.

Authors:  Lei Zan, Charles K. Assaad, Emilie Devijver, Eric Gaussier

Link:  https://arxiv.org/abs/2402.06500v1

Date: 2024-02-09

Summary:

This paper introduces a new structural causal model tailored for representing threshold-based IT systems and presents a new algorithm designed to rapidly detect root causes of anomalies in such systems. When root causes are not causally related, the method is proven to be correct; an extension based on the intervention of an agent is proposed to relax this assumption. Our algorithm and its agent-based extension leverage causal discovery from offline data and engage in subgraph traversal when encountering new anomalies in online data. Our extensive experiments demonstrate the superior performance of our methods, even when applied to data generated from alternative structural causal models or real IT monitoring data.
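
A toy version of the traversal idea: starting from the set of anomalous metrics, walk upstream through the causal graph and flag anomalous nodes with no anomalous ancestor as candidate root causes. This is a simplification of the paper's algorithm, with a hypothetical IT-metric graph:

```python
import networkx as nx

def candidate_root_causes(causal_graph, anomalous):
    """Return anomalous nodes none of whose causal ancestors are
    themselves anomalous; downstream anomalies are treated as effects."""
    return {
        node for node in anomalous
        if not nx.ancestors(causal_graph, node) & anomalous
    }

g = nx.DiGraph([("db_latency", "api_latency"), ("api_latency", "error_rate")])
print(candidate_root_causes(g, {"db_latency", "api_latency", "error_rate"}))
# -> {'db_latency'}
```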

--------------------------------------------------------------------------------------------------------

Le Nozze di Giustizia. Interactions between Artificial Intelligence, Law, Logic, Language and Computation with some case studies in Traffic Regulations and Health Care

Le Nozze di Giustizia explores interactions between mathematical logic and rule-based AI through traffic regulation and healthcare case studies. It aims to convey logic basics to the legal community adopting AI techniques.

Authors:  Joost J. Joosten, Manuela Montoya García

Link:  https://arxiv.org/abs/2402.06487v1

Date: 2024-02-09

Summary:

An important aim of this paper is to convey some basics of mathematical logic to the legal community working with Artificial Intelligence. After analysing what AI is, we restrict ourselves to rule-based AI, leaving Neural Networks and Machine Learning aside. Rule-based AI allows for formal methods, which are described in rudimentary form. We then see how mathematical logic interacts with legal rule-based AI practice, and how it imposes limitations and complications on AI applications. We classify the limitations and interactions between mathematical logic and legal AI into three categories: logical, computational and mathematical. The examples showcasing the interactions largely come from European traffic regulations. The paper closes with some reflections on how and where AI could be used and on basic mechanisms that shape society.

--------------------------------------------------------------------------------------------------------

Trust the Process: Zero-Knowledge Machine Learning to Enhance Trust in Generative AI Interactions

Trust the Process proposes using zero-knowledge proofs to validate quality and fairness of AI-generated content like legal briefs or medical diagnoses without compromising privacy. This transparency via cryptographic audit trails promotes trust.

Authors:  Bianca-Mihaela Ganescu, Jonathan Passerat-Palmbach

Link:  https://arxiv.org/abs/2402.06414v1

Date: 2024-02-09

Summary:

Generative AI, exemplified by models like transformers, has opened up new possibilities in various domains but also raised concerns about fairness, transparency and reliability, especially in fields like medicine and law. This paper emphasizes the urgency of ensuring fairness and quality in these domains through generative AI. It explores using cryptographic techniques, particularly Zero-Knowledge Proofs (ZKPs), to address concerns regarding performance fairness and accuracy while protecting model privacy. Applying ZKPs to Machine Learning models, known as ZKML (Zero-Knowledge Machine Learning), enables independent validation of AI-generated content without revealing sensitive model information, promoting transparency and trust. ZKML enhances AI fairness by providing cryptographic audit trails for model predictions and ensuring uniform performance across users. We introduce snarkGPT, a practical ZKML implementation for transformers, to empower users to verify output accuracy and quality while preserving model privacy. We present a series of empirical results studying snarkGPT's scalability and performance to assess the feasibility and challenges of adopting a ZKML-powered approach to capture quality and performance fairness problems in generative AI models.
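
snarkGPT itself builds on SNARK tooling, but the underlying idea can be conveyed with the classic Schnorr identification protocol, a zero-knowledge proof of knowledge of a discrete logarithm. The sketch below is a toy: exponents are not reduced modulo the group order and the parameters are not secure:

```python
import secrets

p = 2**127 - 1  # a Mersenne prime; illustration only, NOT a secure choice
g = 3

x = secrets.randbelow(p - 1)   # the prover's secret
y = pow(g, x, p)               # public value; prover claims to know x

# One round of commit / challenge / respond.
r = secrets.randbelow(p - 1)
commitment = pow(g, r, p)             # prover commits to fresh randomness
challenge = secrets.randbelow(2**64)  # verifier picks a random challenge
response = r + challenge * x          # prover answers using the secret

# Verifier accepts iff g^response == commitment * y^challenge (mod p),
# learning nothing about x beyond the truth of the claim.
assert pow(g, response, p) == (commitment * pow(y, challenge, p)) % p
```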

--------------------------------------------------------------------------------------------------------

AI, Meet Human: Learning Paradigms for Hybrid Decision Making Systems

AI, Meet Human surveys different techniques for modeling human-AI collaborative systems based on the nature of interaction. It provides a conceptual taxonomy to classify and understand this growing area of research.

Authors:  Clara Punzi, Roberto Pellungrini, Mattia Setzu, Fosca Giannotti, Dino Pedreschi

Link:  https://arxiv.org/abs/2402.06287v1

Date: 2024-02-09

Summary:

Every day we rely more heavily on machine learning models to automate and support high-stakes tasks and decisions. This growing presence means that humans are now constantly interacting with machine learning-based systems, training and using models every day. Several different techniques in the computer science literature account for human interaction with machine learning systems, but their classification is sparse and their goals varied. This survey proposes a taxonomy of Hybrid Decision Making Systems, providing both a conceptual and technical framework for understanding how the current computer science literature models interaction between humans and machines.

--------------------------------------------------------------------------------------------------------

LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education

LLaVA-Docent showcases using multi-modal large language models to make art appreciation education more accessible and engaging. Evaluations reveal strengths in enhancing interactivity but remaining weaknesses in visual generation.

Authors:  Unggi Lee, Minji Jeon, Yunseo Lee, Gyuri Byun, Yoorim Son, Jaeyoon Shin, Hongkyu Ko, Hyeoncheol Kim

Link:  https://arxiv.org/abs/2402.06264v1

Date: 2024-02-09

Summary:

Art appreciation is vital in nurturing critical thinking and emotional intelligence among learners. However, traditional art appreciation education has often been hindered by limited access to art resources, especially for disadvantaged students, and an imbalanced emphasis on STEM subjects in mainstream education. In response to these challenges, recent technological advancements have paved the way for innovative solutions. This study explores the application of multi-modal large language models (MLLMs) in art appreciation education, focusing on the development of LLaVA-Docent, a model that leverages these advancements. Our approach involved a comprehensive literature review and consultations with experts in the field, leading to the development of a robust data framework. Utilizing this framework, we generated a virtual dialogue dataset that was leveraged by GPT-4. This dataset was instrumental in training the MLLM, named LLaVA-Docent. Six researchers conducted quantitative and qualitative evaluations of LLaVA-Docent to assess its effectiveness, benchmarking it against the GPT-4 model in a few-shot setting. The evaluation process revealed distinct strengths and weaknesses of the LLaVA-Docent model. Our findings highlight the efficacy of LLaVA-Docent in enhancing the accessibility and engagement of art appreciation education. By harnessing the potential of MLLMs, this study makes a significant contribution to the field of art education, proposing a novel methodology that reimagines the way art appreciation is taught and experienced.

--------------------------------------------------------------------------------------------------------

Exploring Interaction Patterns for Debugging: Enhancing Conversational Capabilities of AI-assistants

Exploring Interaction Patterns draws inspiration from conversation analysis to improve debugging assistance from large language models within integrated development environments. Experiments showed lowered barriers and higher resolution rates.

Authors:  Bhavya Chopra, Yasharth Bajpai, Param Biyani, Gustavo Soares, Arjun Radhakrishna, Chris Parnin, Sumit Gulwani

Link:  https://arxiv.org/abs/2402.06229v1

Date: 2024-02-09

Summary:

The widespread availability of Large Language Models (LLMs) within Integrated Development Environments (IDEs) has led to their speedy adoption. Conversational interactions with LLMs enable programmers to obtain natural language explanations for various software development tasks. However, LLMs often leap to action without sufficient context, giving rise to implicit assumptions and inaccurate responses. Conversations between developers and LLMs are primarily structured as question-answer pairs, where the developer is responsible for asking the right questions and sustaining conversations across multiple turns. In this paper, we draw inspiration from interaction patterns and conversation analysis to design Robin, an enhanced conversational AI-assistant for debugging. Through a within-subjects user study with 12 industry professionals, we find that equipping the LLM to (1) leverage the insert expansion interaction pattern, (2) facilitate turn-taking, and (3) utilize debugging workflows leads to lowered conversation barriers, effective fault localization, and a 5x improvement in bug resolution rates.

--------------------------------------------------------------------------------------------------------

The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate

The Generative AI Paradox finds that while large language models excel at text generation tasks, their evaluation capabilities significantly lag, sometimes providing false confidence. This reveals the need to scrutinize faithfulness.

Authors:  Juhyun Oh, Eunsu Kim, Inha Cha, Alice Oh

Link:  https://arxiv.org/abs/2402.06204v1

Date: 2024-02-09

Summary:

This paper explores the assumption that Large Language Models (LLMs) skilled in generation tasks are equally adept as evaluators. We assess the performance of three LLMs and one open-source LM in Question-Answering (QA) and evaluation tasks using the TriviaQA (Joshi et al., 2017) dataset. Results indicate a significant disparity, with LLMs exhibiting lower performance in evaluation tasks compared to generation tasks. Intriguingly, we discover instances of unfaithful evaluation where models accurately evaluate answers in areas where they lack competence, underscoring the need to examine the faithfulness and trustworthiness of LLMs as evaluators. This study contributes to the understanding of "the Generative AI Paradox" (West et al., 2023), highlighting a need to explore the correlation between generative excellence and evaluation proficiency, and the necessity to scrutinize the faithfulness aspect in model evaluations.

--------------------------------------------------------------------------------------------------------

Large Language Models: A Survey

Large Language Models surveys prominent models, datasets, metrics, and benchmarks to provide an overview of techniques, applications, limitations, and future directions of this rapidly evolving field.

Authors:  Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao

Link:  https://arxiv.org/abs/2402.06196v1

Date: 2024-02-09

Summary:

Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks since the release of ChatGPT in November 2022. LLMs acquire their general-purpose language understanding and generation abilities by training billions of parameters on massive amounts of text data, as predicted by scaling laws \cite{kaplan2020scaling,hoffmann2022training}. The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations. We also give an overview of techniques developed to build and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation, review widely used LLM evaluation metrics, and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions.

--------------------------------------------------------------------------------------------------------

Resource Allocation for Channel Estimation in Reconfigurable Intelligent Surface-Aided Multi-Cell Networks

Resource Allocation for Channel Estimation analytically investigates optimal training overhead configurations to maximize coverage probability and spectral/energy efficiency in RIS-aided wireless networks. Findings provide deployment guidelines accounting for hardware constraints.

Authors:  Yining Xu, Sheng Zhou

Link:  https://arxiv.org/abs/2402.06161v1

Date: 2024-02-09

Summary:

Reconfigurable intelligent surface (RIS) is a promising solution to deal with the blockage-sensitivity of the millimeter wave band and reduce the high energy consumption caused by network densification. However, deploying large scale RISs may not bring the expected performance gain due to significant channel estimation overhead and non-negligible reflected interference. In this paper, we derive the analytical expressions of the coverage probability, area spectrum efficiency (ASE) and energy efficiency (EE) of a downlink RIS-aided multi-cell network. In order to optimize the network performance, we investigate the conditions for the optimal number of training symbols of each antenna-to-antenna and antenna-to-element path (referred to as the optimal unit training overhead) in channel estimation. Our study shows that: 1) RIS deployment is not 'the more, the better'; only when blockage objects are dense should one deploy more RISs; 2) the coverage probability is maximized when the unit training overhead is designed as large as possible; 3) however, the ASE-and-EE-optimal unit training overhead exists. It is a monotonically increasing function of the frame length and a monotonically decreasing function of the average signal-to-noise-ratio (in the high signal-to-noise-ratio region). Additionally, the optimal unit training overhead is smaller when communication ends deploy particularly few or many antennas.

--------------------------------------------------------------------------------------------------------

LLMs for Coding and Robotics Education

LLMs for Coding and Robotics Education tests large language models on coding and robotics tasks to gauge suitability for STEM education. While models struggled with visuals, results confirm the trend of using them to teach programming concepts.

Authors:  Peng Shu, Huaqin Zhao, Hanqi Jiang, Yiwei Li, Shaochen Xu, Yi Pan, Zihao Wu, Zhengliang Liu, Guoyu Lu, Le Guan, Gong Chen, Xianqiao Wang, Tianming Liu

Link:  https://arxiv.org/abs/2402.06116v1

Date: 2024-02-09

Summary:

Large language models and multimodal large language models have revolutionized artificial intelligence recently. An increasing number of regions are now embracing these advanced technologies. Within this context, robot coding education is garnering increasing attention. To teach young children how to code and compete in robot challenges, large language models are being utilized for robot code explanation, generation, and modification. In this paper, we highlight an important trend in robot coding education. We test several mainstream large language models on both traditional coding tasks and the more challenging task of robot code generation, which includes block diagrams. Our results show that GPT-4V outperforms other models in all of our tests but struggles with generating block diagram images.

--------------------------------------------------------------------------------------------------------

Function Aligned Regression: A Method Explicitly Learns Functional Derivatives from Data

Function Aligned Regression explicitly captures functional derivatives between samples, outperforming conventional regression that independently fits each data point. Experiments demonstrate consistently improved predictive accuracy on real-world tasks.

Authors:  Dixian Zhu, Livnat Jerby-Arnon

Link:  https://arxiv.org/abs/2402.06104v1

Date: 2024-02-08

Summary:

Regression is a fundamental task in machine learning that has garnered extensive attention over the past decades. The conventional approach for regression involves employing loss functions that primarily concentrate on aligning the model prediction with the ground truth for each individual data sample, which, as we show, can result in sub-optimal prediction of the relationships between different samples. Recent research endeavors have introduced novel perspectives by incorporating label similarity information into regression. However, a notable gap persists in these approaches when it comes to fully capturing the intricacies of the underlying ground truth function. In this work, we propose FAR (Function Aligned Regression) as an arguably better and more efficient solution that fits the underlying function of the ground truth by capturing functional derivatives. We demonstrate the effectiveness of the proposed method on 2 synthetic datasets and on 8 extensive real-world tasks from 6 benchmark datasets against 8 competitive baselines. The code is open-sourced at \url{https://github.com/DixianZhu/FAR}.
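
One hedged reading of "capturing functional derivatives" as a loss: fit each point while also aligning pairwise differences between predictions and targets within a batch. The exact FAR objective may differ; the authors' open-sourced code is authoritative:

```python
import torch

def function_aligned_loss(pred, target, lam=1.0):
    """Pointwise fit plus alignment of pairwise differences, a proxy for
    matching the function's derivatives between samples (our sketch)."""
    pointwise = torch.mean((pred - target) ** 2)
    # Pairwise differences within the batch approximate derivatives
    # of the underlying function between samples.
    diff_pred = pred.unsqueeze(0) - pred.unsqueeze(1)
    diff_true = target.unsqueeze(0) - target.unsqueeze(1)
    relational = torch.mean((diff_pred - diff_true) ** 2)
    return pointwise + lam * relational

pred = torch.randn(32, requires_grad=True)
target = torch.randn(32)
function_aligned_loss(pred, target).backward()
```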

--------------------------------------------------------------------------------------------------------

Scaling Artificial Intelligence for Digital Wargaming in Support of Decision-Making

Scaling Artificial Intelligence outlines a hierarchical reinforcement learning approach to develop intelligent agents for large-scale combat simulations and support military decision-making. Further research can enhance AI's reasoning at high levels of complexity.

Authors:  Scotty Black, Christian Darken

Link:  https://arxiv.org/abs/2402.06075v1

Date: 2024-02-08

Summary:

In this unprecedented era of technology-driven transformation, it becomes more critical than ever that we aggressively invest in developing robust artificial intelligence (AI) for wargaming in support of decision-making. By advancing AI-enabled systems and pairing these with human judgment, we will be able to enhance all-domain awareness, improve the speed and quality of our decision cycles, offer recommendations for novel courses of action, and more rapidly counter our adversary's actions. It therefore becomes imperative that we accelerate the development of AI to help us better address the complexity of modern challenges and dilemmas that currently require human intelligence and, if possible, attempt to surpass human intelligence: not to replace humans, but to augment and better inform human decision-making at machine speed. Although deep reinforcement learning continues to show promising results in intelligent agent behavior development for the long-horizon, complex tasks typically found in combat modeling and simulation, further research is needed to enable the scaling of AI to deal with the intricate and expansive state-spaces characteristic of wargaming for concept development, education, or analysis. To help address this challenge, in our research we are developing and implementing a hierarchical reinforcement learning framework that includes a multi-model approach and dimension-invariant observation abstractions.

--------------------------------------------------------------------------------------------------------

Large Language Model Meets Graph Neural Network in Knowledge Distillation

Large Language Model Meets Graph Neural Network performs knowledge distillation with a large language model teacher to enhance graph neural networks. The student matches or exceeds the teacher's accuracy on node classification while allowing far more efficient deployment.

Authors:  Shengxiang Hu, Guobing Zou, Song Yang, Yanglan Gan, Bofeng Zhang, Yixin Chen

Link:  https://arxiv.org/abs/2402.05894v2

Date: 2024-02-09

Summary:

Despite recent community revelations about the advancements and potential applications of Large Language Models (LLMs) in understanding Text-Attributed Graphs (TAGs), deploying LLMs in production is hindered by their high computational and storage requirements, as well as long latencies during model inference. Simultaneously, although traditional Graph Neural Networks (GNNs) are lightweight and adept at learning structural features of graphs, their ability to grasp the complex semantics in TAGs is somewhat constrained for real applications. To address these limitations, we concentrate on the downstream task of node classification in TAGs and propose a novel graph knowledge distillation framework, termed Linguistic Graph Knowledge Distillation (LinguGKD), using LLMs as teacher models and GNNs as student models. It involves TAG-oriented instruction tuning of the LLM on tailored prompts, followed by propagating knowledge and aligning the hierarchically learned node features from the teacher LLM to the student GNN in latent space, employing a layer-adaptive contrastive learning strategy. Through extensive experiments on a variety of LLM and GNN models and multiple benchmark datasets, the proposed LinguGKD significantly boosts the student GNN's predictive accuracy and convergence rate, without the need for extra data or model parameters. Compared to the teacher LLM, the distilled GNN achieves far faster inference with much lower compute and storage demands, while surpassing the teacher LLM's classification accuracy on some benchmark datasets.
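
A minimal sketch of contrastive alignment between teacher-LLM and student-GNN node features, in the spirit of LinguGKD's latent-space alignment (the layer-adaptive part is omitted; shapes and names are ours):

```python
import torch
import torch.nn.functional as F

def contrastive_distill_loss(gnn_feats, llm_feats, tau=0.1):
    """InfoNCE-style loss: each node's student feature should be most
    similar to its own teacher feature. Inputs are (n_nodes, d) tensors
    already projected to a shared dimension."""
    s = F.normalize(gnn_feats, dim=-1)
    t = F.normalize(llm_feats, dim=-1)
    logits = s @ t.T / tau            # node-by-node similarity matrix
    labels = torch.arange(s.size(0))  # the diagonal is the positive pair
    return F.cross_entropy(logits, labels)

loss = contrastive_distill_loss(torch.randn(16, 64), torch.randn(16, 64))
```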

--------------------------------------------------------------------------------------------------------

PromptCrypt: Prompt Encryption for Secure Communication with Large Language Models

PromptCrypt leverages emoji encodings to encrypt user inputs to large language models, preventing discernment of sensitive information while maintaining task performance. Experiments highlight the practicality of prompt encryption for privacy.

Authors:  Guo Lin, Wenyue Hua, Yongfeng Zhang

Link:  https://arxiv.org/abs/2402.05868v1

Date: 2024-02-08

Summary:

Cloud-based large language models (LLMs) such as ChatGPT have increasingly become integral to daily operations, serving as vital tools across various applications. While these models offer substantial benefits in terms of accessibility and functionality, they also introduce significant privacy concerns: the transmission and storage of user data in cloud infrastructures pose substantial risks of data breaches and unauthorized access to sensitive information; even if the transmission and storage of data are encrypted, the LLM service provider itself still knows the real contents of the data, preventing individuals or entities from confidently using such LLM services. To address these concerns, this paper proposes a simple yet effective mechanism, PromptCrypt, to protect user privacy. It uses emoji to encrypt user inputs before sending them to the LLM, effectively rendering them indecipherable to human or LLM examination while retaining the original intent of the prompt, thus ensuring the model's performance remains unaffected. We conduct experiments on three tasks: personalized recommendation, sentiment analysis, and tabular data analysis. Experiment results reveal that PromptCrypt can encrypt personal information within prompts in such a manner that not only prevents the discernment of sensitive data by humans or the LLM itself, but also maintains or even improves precision without further tuning, achieving comparable or even better task accuracy than directly prompting the LLM without encryption. These results highlight the practicality of adopting encryption measures that safeguard user privacy without compromising the functional integrity and performance of LLMs. Code and dataset are available at https://github.com/agiresearch/PromptCrypt.
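
The flavor of the idea can be conveyed with a toy character-to-emoji substitution. PromptCrypt's actual encoding is richer, and a fixed mapping like this offers no real cryptographic security on its own:

```python
# Illustrative substitution table; any character not listed passes through.
TABLE = {"a": "🍎", "e": "🦅", "h": "🏠", "r": "🌈", "t": "🌴", " ": "⭐"}
INVERSE = {v: k for k, v in TABLE.items()}

def encode(text: str) -> str:
    return "".join(TABLE.get(ch, ch) for ch in text.lower())

def decode(emoji: str) -> str:
    return "".join(INVERSE.get(ch, ch) for ch in emoji)

msg = encode("heart rate")
print(msg)          # the cloud LLM sees only this emoji form
print(decode(msg))  # the client recovers the original text
```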

--------------------------------------------------------------------------------------------------------

You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement

You Only Need One Color Space introduces a novel color space and network architecture for low-light image enhancement. Experiments show state-of-the-art results, with reduced color and brightness artifacts.

Authors:  Yixu Feng, Cheng Zhang, Pei Wang, Peng Wu, Qingsen Yan, Yanning Zhang

Link:  https://arxiv.org/abs/2402.05809v1

Date: 2024-02-08

Summary:

The Low-Light Image Enhancement (LLIE) task aims to restore the details and visual information of corrupted low-light images. Most existing methods learn the mapping function between low/normal-light images with Deep Neural Networks (DNNs) in the sRGB and HSV color spaces. Nevertheless, enhancement involves amplifying image signals, and applying these color spaces to low-light images with a low signal-to-noise ratio can introduce sensitivity and instability into the enhancement process. Consequently, this results in color artifacts and brightness artifacts in the enhanced images. To alleviate this problem, we propose a novel trainable color space, named Horizontal/Vertical-Intensity (HVI). It not only decouples brightness and color from the RGB channels to mitigate instability during enhancement but also adapts to low-light images in different illumination ranges thanks to its trainable parameters. Further, we design a novel Color and Intensity Decoupling Network (CIDNet) with two branches dedicated to processing the decoupled image brightness and color in the HVI space. Within CIDNet, we introduce the Lightweight Cross-Attention (LCA) module to facilitate interaction between image structure and content information in both branches, while also suppressing noise in low-light images. Finally, we conducted 22 quantitative and qualitative experiments showing that the proposed CIDNet outperforms the state-of-the-art methods on 11 datasets. The code will be available at https://github.com/Fediory/HVI-CIDNet.
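
To convey the decoupling idea, here is a fixed (non-trainable) transform separating intensity from a two-axis chroma plane, loosely in the spirit of Horizontal/Vertical-Intensity; the paper's actual trainable transform differs:

```python
import numpy as np

def rgb_to_hvi_like(rgb):
    """rgb: float array in [0, 1] with shape (..., 3). Returns a stack of
    (horizontal chroma, vertical chroma, intensity). Our illustration."""
    i = rgb.max(axis=-1)  # intensity as the maximum channel
    s = np.where(i > 0, (i - rgb.min(axis=-1)) / np.maximum(i, 1e-8), 0.0)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    hue = np.arctan2(np.sqrt(3.0) * (g - b), 2.0 * r - g - b)
    # Polar chroma mapped onto Cartesian horizontal/vertical axes.
    return np.stack([s * np.cos(hue), s * np.sin(hue), i], axis=-1)

hvi = rgb_to_hvi_like(np.random.rand(4, 4, 3))
```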

--------------------------------------------------------------------------------------------------------

Offline Risk-sensitive RL with Partial Observability to Enhance Performance in Human-Robot Teaming

Offline Risk-sensitive RL incorporates physiological signals into human-robot teaming via offline reinforcement learning on a partially observable Markov decision process model. Experiments demonstrate improved human operator state estimation and overall performance.

Authors:  Giorgio Angelotti, Caroline P. C. Chanel, Adam H. M. Pinto, Christophe Lounis, Corentin Chauffaut, Nicolas Drougard

Link:  https://arxiv.org/abs/2402.05703v1

Date: 2024-02-08

Summary:

The integration of physiological computing into mixed-initiative human-robot interaction systems offers valuable advantages in autonomous task allocation by incorporating real-time features as human state observations into the decision-making system. This approach may alleviate the cognitive load on human operators by intelligently allocating mission tasks between agents. Nevertheless, accommodating a diverse pool of human participants with varying physiological and behavioral measurements presents a substantial challenge. To address this, resorting to a probabilistic framework becomes necessary, given the inherent uncertainty and partial observability of the human's state. Recent research suggests learning a Partially Observable Markov Decision Process (POMDP) model from a data set of previously collected experiences, which can be solved using Offline Reinforcement Learning (ORL) methods. In the present work, we not only highlight the potential of partially observable representations and physiological measurements to improve human operator state estimation and performance, but also enhance the overall mission effectiveness of a human-robot team. Importantly, as the fixed data set may not contain enough information to fully represent complex stochastic processes, we propose a method to incorporate model uncertainty, thus enabling risk-sensitive sequential decision-making. Experiments were conducted with a group of twenty-six human participants within a simulated robot teleoperation environment, yielding empirical evidence of the method's efficacy. The obtained adaptive task allocation policy led to statistically significantly higher scores than the one used to collect the data set, allowing for generalization across diverse participants while also taking risk-sensitive metrics into account.
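
Risk-sensitive criteria of this kind often optimize a tail statistic such as conditional value-at-risk (CVaR) rather than the mean return. A minimal sketch (our illustration; the paper's exact uncertainty-aware criterion may differ):

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Mean of the worst alpha-fraction of returns: a standard
    risk-sensitive objective for comparing policies."""
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

# A risk-sensitive planner prefers the policy with the higher CVaR,
# not merely the higher average score.
print(cvar([10, 12, -5, 11, 9, -8, 13], alpha=0.3))
```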

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.