Week Ending 11.19.2023

 

RESEARCH WATCH: 11.19.2023

SPONSORED BY

Digimarc digital watermarks invisibly guard your digital assets to protect against misuse, prove copyright ownership, and verify authenticity. In an era of artificial intelligence, don’t leave your images and other digital content exposed. Demand superior content protection and maintain trust in your brand with Digimarc.

Check out Digimarc - https://www.digimarc.com/

 

Chatbots as social companions: How people perceive consciousness, human likeness, and social health benefits in machines

Chatbots as social companions: This paper explores how companion chatbots affect human social health. Surprisingly, users reported benefits, while non-users expected harm. The findings suggest humanlike bots may support social health by providing safe, reliable interactions. These insights could aid the design of socially beneficial chatbots.

Authors:  Rose Guingrich, Michael S. A. Graziano

Link:  https://arxiv.org/abs/2311.10599v1

Date: 2023-11-17

Summary:

As artificial intelligence (AI) becomes more widespread, one question that arises is how human-AI interaction might impact human-human interaction. Chatbots, for example, are increasingly used as social companions, but little is known about how their use impacts human relationships. A common hypothesis is that these companion bots are detrimental to social health by harming or replacing human interaction. To understand how companion bots impact social health, we studied people who used companion bots and people who did not. Contrary to expectations, companion bot users indicated that these relationships were beneficial to their social health, whereas nonusers viewed them as harmful. Another common assumption is that people perceive conscious, humanlike AI as disturbing and threatening. Among both users and nonusers, however, we found the opposite: perceiving companion bots as more conscious and humanlike correlated with more positive opinions and better social health benefits. Humanlike bots may aid social health by supplying reliable and safe interactions, without necessarily harming human relationships.

--------------------------------------------------------------------------------------------------------

EduGym: An Environment Suite for Reinforcement Learning Education

EduGym for RL education: This work introduces EduGym, a toolkit of environments tailored for hands-on reinforcement learning education. EduGym focuses each environment on one RL concept, with interactive notebooks linking theory to implementation. Evaluations showed EduGym helps students connect ideas to code. EduGym can improve RL education.

Authors:  Thomas M. Moerland, Matthias Müller-Brockhausen, Zhao Yang, Andrius Bernatavicius, Koen Ponse, Tom Kouwenhoven, Andreas Sauter, Michiel van der Meer, Bram Renting, Aske Plaat

Link:  https://arxiv.org/abs/2311.10590v1

Date: 2023-11-17

Summary:

Due to the empirical success of reinforcement learning, an increasing number of students study the subject. However, from our practical teaching experience, we see students entering the field (bachelor, master and early PhD) often struggle. On the one hand, textbooks and (online) lectures provide the fundamentals, but students find it hard to translate between equations and code. On the other hand, public codebases do provide practical examples, but the implemented algorithms tend to be complex, and the underlying test environments contain multiple reinforcement learning challenges at once. Although this is realistic from a research perspective, it often hinders educational conceptual understanding. To solve this issue we introduce EduGym, a set of educational reinforcement learning environments and associated interactive notebooks tailored for education. Each EduGym environment is specifically designed to illustrate a certain aspect/challenge of reinforcement learning (e.g., exploration, partial observability, stochasticity, etc.), while the associated interactive notebook explains the challenge and its possible solution approaches, connecting equations and code in a single document. An evaluation among RL students and researchers shows 86% of them think EduGym is a useful tool for reinforcement learning education. All notebooks are available from https://sites.google.com/view/edu-gym/home, while the full software package can be installed from https://github.com/RLG-Leiden/edugym.
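
Since EduGym is distributed as a standard Python package, a typical interaction loop would presumably follow the common Gymnasium pattern sketched below; the environment id is a hypothetical placeholder, not necessarily EduGym's actual naming.

```python
import gymnasium as gym  # assumes EduGym registers Gymnasium-compatible envs

# Hypothetical environment id; check the EduGym docs for real names.
env = gym.make("EduGym/Exploration-v0")
observation, info = env.reset(seed=42)

for _ in range(100):
    action = env.action_space.sample()  # stand-in for a learned policy
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()
```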

--------------------------------------------------------------------------------------------------------

Towards Improving Robustness Against Common Corruptions using Mixture of Class Specific Experts

Mixture of Class Specific Experts for image robustness: This paper proposes using dedicated network segments for each image class to improve robustness against distortions. By training specialized modules and aggregating their outputs, the architecture aims to keep performance high despite corruptions. This design may enable computer vision systems, such as those in autonomous vehicles, to handle unpredictable environments.

Authors:  Shashank Kotyan, Danilo Vasconcellos Vargas

Link:  https://arxiv.org/abs/2311.10177v1

Date: 2023-11-16

Summary:

Neural networks have demonstrated significant accuracy across various domains, yet their vulnerability to subtle input alterations remains a persistent challenge. Conventional methods like data augmentation, while effective to some extent, fall short in addressing unforeseen corruptions, limiting the adaptability of neural networks in real-world scenarios. In response, this paper introduces a novel paradigm known as the Mixture of Class-Specific Expert Architecture. The approach involves disentangling feature learning for individual classes, offering a nuanced enhancement in scalability and overall performance. By training dedicated network segments for each class and subsequently aggregating their outputs, the proposed architecture aims to mitigate vulnerabilities associated with common neural network structures. The study underscores the importance of comprehensive evaluation methodologies, advocating for the incorporation of benchmarks like the common corruptions benchmark. This inclusion provides nuanced insights into the vulnerabilities of neural networks, especially concerning their generalization capabilities and robustness to unforeseen distortions. The research aligns with the broader objective of advancing the development of highly robust learning systems capable of nuanced reasoning across diverse and challenging real-world scenarios. Through this contribution, the paper aims to foster a deeper understanding of neural network limitations and proposes a practical approach to enhance their resilience in the face of evolving and unpredictable conditions.
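
As a rough illustration of the idea (not the paper's implementation), a mixture of class-specific experts can be sketched in PyTorch as one small network per class whose scalar outputs are aggregated into logits; all layer sizes below are assumptions.

```python
import torch
import torch.nn as nn

class MixtureOfClassExperts(nn.Module):
    """Sketch: one small expert network per class; each expert scores
    'its' class, and the scores are concatenated into logits."""
    def __init__(self, feature_dim: int, num_classes: int, hidden: int = 128):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(feature_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(num_classes)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Aggregate per-class expert outputs into a single logit vector.
        return torch.cat([expert(features) for expert in self.experts], dim=1)

model = MixtureOfClassExperts(feature_dim=512, num_classes=10)
logits = model(torch.randn(4, 512))  # shape: (4, 10)
```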

--------------------------------------------------------------------------------------------------------

Intelligent Generation of Graphical Game Assets: A Conceptual Framework and Systematic Review of the State of the Art

Intelligent graphical asset generation review: This paper reviews approaches to procedurally generating graphical game assets through a systematic literature review. It presents a conceptual framework to guide asset generation method selection. The insights could help developers efficiently create graphical content and reduce manual effort.

Authors:  Kaisei Fukaya, Damon Daylamani-Zad, Harry Agius

Link:  https://arxiv.org/abs/2311.10129v1

Date: 2023-11-16

Summary:

Procedural content generation (PCG) can be applied to a wide variety of tasks in games, from narratives, levels and sounds, to trees and weapons. A large amount of game content is comprised of graphical assets, such as clouds, buildings or vegetation, that do not require gameplay function considerations. There is also a breadth of literature examining the procedural generation of such elements for purposes outside of games. The body of research, focused on specific methods for generating specific assets, provides a narrow view of the available possibilities. Hence, it is difficult to have a clear picture of all approaches and possibilities, with no guide for interested parties to discover possible methods and approaches for their needs, and no facility to guide them through each technique or approach to map out the process of using them. Therefore, a systematic literature review has been conducted, yielding 200 accepted papers. This paper explores state-of-the-art approaches to graphical asset generation, examining research from a wide range of applications, inside and outside of games. Informed by the literature, a conceptual framework has been derived to address the aforementioned gaps.

--------------------------------------------------------------------------------------------------------

Learning interactions to boost human creativity with bandits and GPT-4

Boosting creativity with GPT-4 bandits: This work shows that algorithmically generated hints boosted human performance in a creativity task. Multi-armed bandits tailored the interactions by learning from simulated users. The findings demonstrate AI's potential to enhance human creativity through learned prompting strategies.

Authors:  Ara Vartanian, Xiaoxi Sun, Yun-Shiuan Chuang, Siddharth Suresh, Xiaojin Zhu, Timothy T. Rogers

Link:  https://arxiv.org/abs/2311.10127v1

Date: 2023-11-16

Summary:

This paper considers how interactions with AI algorithms can boost human creative thought. We employ a psychological task that demonstrates limits on human creativity, namely semantic feature generation: given a concept name, respondents must list as many of its features as possible. Human participants typically produce only a fraction of the features they know before getting "stuck." In experiments with humans and with a language AI (GPT-4) we contrast behavior in the standard task versus a variant in which participants can ask for algorithmically-generated hints. Algorithm choice is administered by a multi-armed bandit whose reward indicates whether the hint helped generating more features. Humans and the AI show similar benefits from hints, and remarkably, bandits learning from AI responses prefer the same prompting strategy as those learning from human behavior. The results suggest that strategies for boosting human creativity via computer interactions can be learned by bandits run on groups of simulated participants.
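
A minimal sketch of the hint-selection mechanism, using an epsilon-greedy bandit as a stand-in for whatever bandit algorithm the paper employs; the strategy names and epsilon are invented for illustration.

```python
import random

class EpsilonGreedyBandit:
    """Each arm is a prompting strategy; the reward indicates whether
    the resulting hint helped the participant produce more features."""
    def __init__(self, arms, epsilon=0.1):
        self.arms = arms
        self.epsilon = epsilon
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.arms)   # explore
        return max(self.arms, key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean of observed rewards for this arm.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

bandit = EpsilonGreedyBandit(["random-word", "category-cue", "rare-feature"])
arm = bandit.select()            # choose a hint strategy for this trial
bandit.update(arm, reward=1.0)   # 1.0 if the hint elicited new features
```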

--------------------------------------------------------------------------------------------------------

From Pretext to Purpose: Batch-Adaptive Self-Supervised Learning

Batch-adaptive self-supervised learning: This paper introduces a method to improve image embeddings from self-supervised contrastive learning. By reconstructing batch data, it enables communication between instances during training. This adaptive technique achieved state-of-the-art performance, advancing self-supervised techniques.

Authors:  Jiansong Zhang, Peizhong Liu

Link:  https://arxiv.org/abs/2311.09974v1

Date: 2023-11-16

Summary:

In recent years, self-supervised contrastive learning has emerged as a distinguished paradigm in the artificial intelligence landscape. It facilitates unsupervised feature learning through contrastive delineations at the instance level. However, crafting an effective self-supervised paradigm remains a pivotal challenge within this field. This paper delves into two crucial factors impacting self-supervised contrastive learning: batch size and pretext tasks, and from a data processing standpoint, proposes an adaptive technique of batch fusion. The proposed method, via dimensionality reduction and reconstruction of batch data, enables formerly isolated individual data to partake in intra-batch communication through the Embedding Layer. Moreover, it adaptively amplifies the self-supervised feature encoding capability as the training progresses. We conducted a linear classification test of this method based on the classic contrastive learning framework on ImageNet-1k. The empirical findings illustrate that our approach achieves state-of-the-art performance under equitable comparisons. Benefiting from its "plug-and-play" characteristics, we further explored other contrastive learning methods. On ImageNet-100, compared to the original performance, top-1 accuracy increased by up to 1.25%. We suggest that the proposed method may contribute to the advancement of data-driven self-supervised learning research, bringing a fresh perspective to this community.
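
The abstract suggests a module that reduces, mixes, and reconstructs batch embeddings so samples can communicate during training; a loose PyTorch interpretation (the shapes and the attention-based mixing are assumptions, not the paper's design) might look like this:

```python
import torch
import torch.nn as nn

class BatchFusion(nn.Module):
    """Loose interpretation: reduce each embedding, mix information across
    the batch dimension, then reconstruct to the original size."""
    def __init__(self, dim: int, reduced: int = 64):
        super().__init__()
        self.reduce = nn.Linear(dim, reduced)       # dimensionality reduction
        self.mix = nn.MultiheadAttention(reduced, num_heads=4, batch_first=True)
        self.reconstruct = nn.Linear(reduced, dim)  # back to the original dim

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Treat the batch as a sequence so samples can attend to each other.
        h = self.reduce(z).unsqueeze(0)             # (1, batch, reduced)
        h, _ = self.mix(h, h, h)
        return z + self.reconstruct(h.squeeze(0))   # residual fusion

z = torch.randn(32, 256)        # a batch of 32 instance embeddings
fused = BatchFusion(256)(z)     # same shape, now batch-aware
```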

--------------------------------------------------------------------------------------------------------

AutoPlanBench: Automatically generating benchmarks for LLM planners from PDDL

Automatically generating textual planning benchmarks: This paper automatically converts PDDL planning tasks into textual descriptions for evaluating large language models. Results showed strengths and limitations of LLMs on planning challenges. The benchmark generation process enables convenient testing of LLMs on reasoning tasks.

Authors:  Katharina Stein, Alexander Koller

Link:  https://arxiv.org/abs/2311.09830v1

Date: 2023-11-16

Summary:

LLMs are being increasingly used for planning-style tasks, but their capabilities for planning and reasoning are poorly understood. We present a novel method for automatically converting planning benchmarks written in PDDL into textual descriptions and offer a benchmark dataset created with our method. We show that while the best LLM planners do well on many planning tasks, other tasks remain out of reach of current methods.
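
As a toy illustration of the conversion idea (not the paper's actual pipeline), grounded PDDL-style actions can be rendered into natural language with simple templates:

```python
# Toy illustration: render grounded planning actions as English sentences.
TEMPLATES = {  # hypothetical templates for Blocksworld-style actions
    "pick-up": "Pick up {0}.",
    "stack": "Stack {0} on top of {1}.",
    "unstack": "Remove {0} from on top of {1}.",
}

def action_to_text(action: str, *args: str) -> str:
    return TEMPLATES[action].format(*args)

plan = [("unstack", "block-a", "block-b"), ("stack", "block-a", "block-c")]
print(" ".join(action_to_text(a, *args) for a, *args in plan))
# -> "Remove block-a from on top of block-b. Stack block-a on top of block-c."
```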

--------------------------------------------------------------------------------------------------------

Comparing Differentiable Logics for Learning Systems: A Research Preview

Comparing differentiable logics for ML systems: This preview compares differentiable logics that inject constraints into ML training for verifiable models. Findings showed a difficulty in tuning logics' impact, motivating future work. Differentiable logics may enable reliable ML for autonomous systems.

Authors:  Thomas Flinkow, Barak A. Pearlmutter, Rosemary Monahan

Link:  https://arxiv.org/abs/2311.09809v1

Date: 2023-11-16

Summary:

Extensive research on formal verification of machine learning (ML) systems indicates that learning from data alone often fails to capture underlying background knowledge. A variety of verifiers have been developed to ensure that a machine-learnt model satisfies correctness and safety properties; however, these verifiers typically assume a trained network with fixed weights. ML-enabled autonomous systems are required to not only detect incorrect predictions, but should also possess the ability to self-correct, continuously improving and adapting. A promising approach for creating ML models that inherently satisfy constraints is to encode background knowledge as logical constraints that guide the learning process via so-called differentiable logics. In this research preview, we compare and evaluate various logics from the literature in weakly-supervised contexts, presenting our findings and highlighting open problems for future work. Our experimental results are broadly consistent with results reported previously in the literature; however, learning with differentiable logics introduces a new hyperparameter that is difficult to tune and has a significant influence on the effectiveness of the logics.
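
To make the mechanism concrete, here is a minimal sketch of a differentiable logic loss using the Reichenbach translation of implication, one of several translations studied in this line of work; the constraint, the stand-in losses, and the weight are illustrative assumptions.

```python
import torch

def implies(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Fuzzy implication p -> q (Reichenbach form): 1 - p + p*q, p, q in [0, 1]."""
    return 1.0 - p + p * q

logits = torch.randn(8, requires_grad=True)   # stand-in model outputs
p = torch.sigmoid(logits)                     # truth value of the premise
q = torch.sigmoid(logits + 0.1)               # truth value of the conclusion

task_loss = p.mean()                          # stand-in for the usual task loss
logic_weight = 0.5                            # the hard-to-tune extra hyperparameter
logic_loss = (1.0 - implies(p, q)).mean()     # penalize violated constraints
(task_loss + logic_weight * logic_loss).backward()  # gradients flow through the logic
```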

--------------------------------------------------------------------------------------------------------

Breaking Boundaries: Balancing Performance and Robustness in Deep Wireless Traffic Forecasting

Breaking boundaries in time series forecasting: This paper develops defenses for robust time series forecasting under data corruption. A hybrid strategy detected and eliminated adversarial noise while retaining accuracy. The techniques optimize trade-offs between performance and robustness for applications like traffic forecasting.

Authors:  Romain Ilbert, Thai V. Hoang, Zonghua Zhang, Themis Palpanas

Link:  https://arxiv.org/abs/2311.09790v2

Date: 2023-11-17

Summary:

Balancing the trade-off between accuracy and robustness is a long-standing challenge in time series forecasting. While most existing robust algorithms accept somewhat suboptimal performance on clean data, sustaining the same performance level in the presence of data perturbations remains extremely hard. In this paper, we study a wide array of perturbation scenarios and propose novel defense mechanisms against adversarial attacks using real-world telecom data. We compare our strategy against two existing adversarial training algorithms under a range of maximal allowed perturbations, defined using the $\ell_{\infty}$-norm and taking values in $[0.1, 0.4]$. Our findings reveal that our hybrid strategy, which is composed of a classifier to detect adversarial examples, a denoiser to eliminate noise from the perturbed data samples, and a standard forecaster, achieves the best performance on both clean and perturbed data. Our optimal model can retain up to $92.02\%$ of the original forecasting model's performance in terms of Mean Squared Error (MSE) on clean data, while being more robust than the standard adversarially trained models on perturbed data. Its MSE is 2.71$\times$ and 2.51$\times$ lower than those of competing methods on normal and perturbed data, respectively. In addition, the components of our model can be trained in parallel, resulting in better computational efficiency. Our results indicate that we can optimally balance the trade-off between the performance and robustness of forecasting models by improving the classifier and denoiser, even in the presence of sophisticated and destructive poisoning attacks.
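
A structural sketch of the described hybrid strategy, with toy stand-ins for the three components (the paper's actual classifier, denoiser, and forecaster architectures are not specified here):

```python
import torch
import torch.nn as nn

class HybridForecaster(nn.Module):
    """Sketch: a detector flags perturbed input windows, a denoiser cleans
    the flagged ones, and a standard forecaster makes the prediction."""
    def __init__(self, window: int, horizon: int):
        super().__init__()
        self.detector = nn.Sequential(nn.Linear(window, 32), nn.ReLU(),
                                      nn.Linear(32, 1), nn.Sigmoid())
        self.denoiser = nn.Sequential(nn.Linear(window, 64), nn.ReLU(),
                                      nn.Linear(64, window))
        self.forecaster = nn.Linear(window, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        is_adv = self.detector(x)                 # probability of perturbation
        cleaned = torch.where(is_adv > 0.5, self.denoiser(x), x)
        return self.forecaster(cleaned)

model = HybridForecaster(window=48, horizon=12)
forecast = model(torch.randn(16, 48))             # (16, 12) predictions
```

Because the three components are independent modules, they can be trained separately and in parallel, which is consistent with the efficiency claim above.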

--------------------------------------------------------------------------------------------------------

New advancements, challenges and opportunities of nanophotonics for neuromorphic computing: A state-of-the-art review

Photonics for neuromorphic computing: This review covers recent photonic hardware innovations in high-speed, energy-efficient neuromorphic computing. Comparisons reveal challenges in achieving extreme scaleup. Ongoing advances in nanophotonic devices, materials, and integration may realize future ultra-efficient AI accelerators.

Authors:  Renjie Li, Yuanhao Gong, Hai Huang, Yuze Zhou, Sixuan Mao, Connie Chang-Hasnain, Zhaoyu Zhang

Link:  https://arxiv.org/abs/2311.09767v1

Date: 2023-11-16

Summary:

The expansion of optoelectronic devices on photonic integration platforms has led to significant growth in the field of photonic computing. Photonic integrated circuits have facilitated the creation of ultrafast artificial neural networks, forming the basis for a novel category of information processing devices. Their application extends to diverse domains such as medical diagnosis, language models, telecommunications, quantum computing, and the metaverse, addressing the escalating demands of machine learning and artificial intelligence (AI). In contrast, conventional electronics faces challenges in latency, crosstalk, and energy consumption. Neuromorphic photonics emerges as a compelling solution, featuring sub-nanosecond latencies, minimal heat dissipation, and high parallelism, expanding the scope of AI and Optical Neural Networks. This review explores recent advances in integrated photonic neuromorphic systems, focusing on materials and device engineering breakthroughs needed to overcome existing challenges. Examining various technologies in AI accelerators, from traditional optics to PICs, we assess energy efficiency through operations per joule and compute density in operations per square millimeter per second. A comparative analysis highlights crucial technical aspects, emphasizing nanophotonic components like VCSEL lasers, optical interconnects, nanocavity resonators, and frequency microcombs. These components showcase recent breakthroughs in photonic engineering and materials science, enabling the creation of customized neuromorphic systems for AI tasks. Despite progress, current technologies face obstacles in achieving photonic AI accelerators with computing speed and energy efficiencies reaching the petaOPS range. The review explores potential future approaches in new devices, fabrication, materials, scalability, and integration to enhance critical performance metrics.

--------------------------------------------------------------------------------------------------------

Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models

Chain-of-Noting for robust retrieval-augmented LMs: This work adds interpretability and reliability to retrieval-augmented language models. By generating sequential reading notes for retrieved documents, the method filters out irrelevant retrieved text. Experiments showed improvements in open-domain QA and in rejecting out-of-scope questions. The approach makes retrieval-augmented models more robust.

Authors:  Wenhao Yu, Hongming Zhang, Xiaoman Pan, Kaixin Ma, Hongwei Wang, Dong Yu

Link:  https://arxiv.org/abs/2311.09210v1

Date: 2023-11-15

Summary:

Retrieval-augmented language models (RALMs) represent a substantial advancement in the capabilities of large language models, notably in reducing factual hallucination by leveraging external knowledge sources. However, the reliability of the retrieved information is not always guaranteed. The retrieval of irrelevant data can lead to misguided responses, potentially causing the model to overlook its inherent knowledge, even when it possesses adequate information to address the query. Moreover, standard RALMs often struggle to assess whether they possess adequate knowledge, both intrinsic and retrieved, to provide an accurate answer. In situations where knowledge is lacking, these systems should ideally respond with "unknown" when the answer is unattainable. In response to these challenges, we introduce Chain-of-Noting (CoN), a novel approach aimed at improving the robustness of RALMs in facing noisy, irrelevant documents and in handling unknown scenarios. The core idea of CoN is to generate sequential reading notes for retrieved documents, enabling a thorough evaluation of their relevance to the given question and integrating this information to formulate the final answer. We employed ChatGPT to create training data for CoN, which was subsequently used to train a LLaMa-2 7B model. Our experiments across four open-domain QA benchmarks show that RALMs equipped with CoN significantly outperform standard RALMs. Notably, CoN achieves an average improvement of +7.9 in EM score given entirely noisy retrieved documents and +10.5 in rejection rates for real-time questions that fall outside the pre-training knowledge scope.
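
A rough sketch of what a Chain-of-Noting-style prompt might look like; the exact instructions used to build the ChatGPT training data may differ from this illustration.

```python
# Rough sketch of a reading-notes prompt (wording is an assumption).
def build_con_prompt(question: str, documents: list[str]) -> str:
    doc_block = "\n\n".join(
        f"Document {i + 1}: {doc}" for i, doc in enumerate(documents)
    )
    return (
        f"Question: {question}\n\n{doc_block}\n\n"
        "For each document, write a short note on whether it is relevant to "
        "the question and what it contributes. Then answer the question using "
        "only the relevant documents, or reply 'unknown' if none suffice."
    )

prompt = build_con_prompt(
    "Who wrote the paper introducing Chain-of-Noting?",
    ["Retrieved passage one...", "Retrieved passage two..."],
)
```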

--------------------------------------------------------------------------------------------------------

Fusion-Eval: Integrating Evaluators with LLMs

Fusion-Eval: Integrating evaluators with LLMs: This paper proposes using LLMs themselves to evaluate LLMs by fusing human-based, model-based, and metric-based assessments. Tests showed high correlation with human judgments, setting a new standard for matching human evaluation. The technique could enable more meaningful LLM evaluations.

Authors:  Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, Jindong Chen, Lei Meng

Link:  https://arxiv.org/abs/2311.09204v1

Date: 2023-11-15

Summary:

Evaluating Large Language Models (LLMs) is a complex task, especially considering the intricacies of natural language understanding and the expectations for high-level reasoning. Traditional evaluations typically lean on human-based, model-based, or automatic-metrics-based paradigms, each with its own advantages and shortcomings. We introduce "Fusion-Eval", a system that employs LLMs not solely for direct evaluations, but to skillfully integrate insights from diverse evaluators. This gives Fusion-Eval flexibility, enabling it to work effectively across diverse tasks and make optimal use of multiple references. In testing on the SummEval dataset, Fusion-Eval achieved a Spearman correlation of 0.96, outperforming other evaluators. The success of Fusion-Eval underscores the potential of LLMs to produce evaluations that closely align with human perspectives, setting a new standard in the field of LLM evaluation.
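
One way to picture the fusion step (a hypothetical sketch, not the paper's actual prompt or API): hand an LLM the scores from several existing evaluators alongside the text and ask for one integrated judgment. The prompt wording and the `call_llm` helper are illustrative assumptions.

```python
# Hypothetical sketch of fusing evaluator outputs via an LLM prompt.
def build_fusion_prompt(source: str, summary: str, evaluator_scores: dict) -> str:
    score_lines = "\n".join(f"- {name}: {score}"
                            for name, score in evaluator_scores.items())
    return (
        f"Source text:\n{source}\n\nSummary:\n{summary}\n\n"
        f"Scores from existing evaluators:\n{score_lines}\n\n"
        "Considering the text and these assessments, rate the summary's "
        "overall quality from 1 to 5 and briefly justify the rating."
    )

prompt = build_fusion_prompt(
    "Full article text...", "Candidate summary...",
    {"ROUGE-L": 0.41, "BARTScore": -2.3, "human-coherence": 4.0},
)
# response = call_llm(prompt)  # hypothetical LLM call
```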

--------------------------------------------------------------------------------------------------------

Assessing the Robustness of Intelligence-Driven Reinforcement Learning

Assessing robustness in intelligence-driven RL: This paper explores the vulnerability of mission-critical reinforcement learning using reward machine representations. Initial results indicate difficulties in handling noise, motivating further hardening of RL systems before deployment in high-stakes military contexts.

Authors:  Lorenzo Nodari, Federico Cerutti

Link:  https://arxiv.org/abs/2311.09027v1

Date: 2023-11-15

Summary:

Robustness to noise is of utmost importance in reinforcement learning systems, particularly in military contexts where high stakes and uncertain environments prevail. Noise and uncertainty are inherent features of military operations, arising from factors such as incomplete information, adversarial actions, or unpredictable battlefield conditions. In RL, noise can critically impact decision-making, mission success, and the safety of personnel. Reward machines offer a powerful tool to express complex reward structures in RL tasks, enabling the design of tailored reinforcement signals that align with mission objectives. This paper considers the problem of the robustness of intelligence-driven reinforcement learning based on reward machines. The preliminary results presented suggest the need for further research in evidential reasoning and learning to harden current state-of-the-art reinforcement learning approaches before they can be considered ready for mission-critical use.
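
For readers unfamiliar with reward machines, a minimal sketch: a finite-state machine whose transitions fire on high-level events and emit rewards. The states, events, and reward values below are illustrative, not taken from the paper.

```python
# Minimal reward machine sketch: (state, event) -> (next_state, reward).
class RewardMachine:
    def __init__(self):
        self.delta = {
            ("u0", "reached_waypoint"): ("u1", 0.1),
            ("u1", "target_acquired"):  ("u2", 1.0),
            ("u2", "returned_to_base"): ("u_final", 10.0),
        }
        self.state = "u0"

    def step(self, event: str) -> float:
        # Unknown events leave the state unchanged and give zero reward.
        next_state, reward = self.delta.get((self.state, event),
                                            (self.state, 0.0))
        self.state = next_state
        return reward

rm = RewardMachine()
print(rm.step("reached_waypoint"))  # 0.1; noisy event labels can mislead this
```

Because rewards hinge on discrete event detections, mislabeled events under noise can derail the machine, which is the vulnerability the paper probes.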

--------------------------------------------------------------------------------------------------------

Disentangling the Potential Impacts of Papers into Diffusion, Conformity, and Contribution Values

Disentangling paper impact dimensions with GNNs: This work predicts citation impact by modeling diffusion, conformity, and contribution. Comparisons against baselines on citation graphs showed accuracy gains, particularly for recently published papers. The nuanced predictions enable improved paper recommendation and evaluation.

Authors:  Zhikai Xue, Guoxiu He, Zhuoren Jiang, Yangyang Kang, Star Zhao, Wei Lu

Link:  https://arxiv.org/abs/2311.09262v1

Date: 2023-11-15

Summary:

The potential impact of an academic paper is determined by various factors, including its popularity and contribution. Existing models usually estimate original citation counts based on static graphs and fail to differentiate values from nuanced perspectives. In this study, we propose a novel graph neural network to Disentangle the Potential impacts of Papers into Diffusion, Conformity, and Contribution values (called DPPDCC). Given a target paper, DPPDCC encodes temporal and structural features within the constructed dynamic heterogeneous graph. Particularly, to capture the knowledge flow, we emphasize the importance of comparative and co-cited/citing information between papers and aggregate snapshots evolutionarily. To unravel popularity, we contrast augmented graphs to extract the essence of diffusion and predict the accumulated citation binning to model conformity. We further apply orthogonal constraints to encourage distinct modeling of each perspective and preserve the inherent value of contribution. To evaluate models' generalization for papers published at various times, we reformulate the problem by partitioning data based on specific time points to mirror real-world conditions. Extensive experimental results on three datasets demonstrate that DPPDCC significantly outperforms baselines for previously, freshly, and immediately published papers. Further analyses confirm its robust capabilities. We will make our datasets and codes publicly available.

--------------------------------------------------------------------------------------------------------

CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation

Multilingual multitask code benchmark: This paper introduces CodeScope, a comprehensive benchmark for evaluating code understanding across languages and tasks. An execution engine enables functionality testing. Experiments uncovered model strengths and weaknesses. CodeScope provides a rigorous testbed for coding LLMs.

Authors:  Weixiang Yan, Haitian Liu, Yunkun Wang, Yunzhe Li, Qian Chen, Wen Wang, Tingyu Lin, Weishan Zhao, Li Zhu, Shuiguang Deng, Hari Sundaram

Link:  https://arxiv.org/abs/2311.08588v1

Date: 2023-11-14

Summary:

Large Language Models (LLMs) have demonstrated remarkable performance on coding-related tasks, particularly in assisting humans with programming and facilitating programming automation. However, existing benchmarks for evaluating the code understanding and generation capacities of LLMs suffer from severe limitations. First, most benchmarks are deficient as they focus on a narrow range of popular programming languages and specific tasks, whereas real-world software development scenarios show a dire need for systems that support multilingual programming environments to satisfy diverse requirements. Practical programming scenarios also call for multi-task settings that test the coding capabilities of LLMs comprehensively and robustly. Second, most benchmarks fail to consider the actual executability and the consistency of execution results of the generated code. To bridge these gaps between existing benchmarks and the expectations of practical applications, we introduce CodeScope, an execution-based, multilingual, multi-task, multi-dimensional evaluation benchmark for comprehensively gauging LLM capabilities on coding tasks. CodeScope covers 43 programming languages and 8 coding tasks. It evaluates the coding performance of LLMs from three dimensions (perspectives): difficulty, efficiency, and length. To facilitate execution-based evaluations of code generation, we develop MultiCodeEngine, an automated code execution engine that supports 14 programming languages. Finally, we systematically evaluate and analyze 8 mainstream LLMs on CodeScope tasks and demonstrate the superior breadth and challenge of CodeScope for evaluating LLMs on code understanding and generation tasks compared to other benchmarks. The CodeScope benchmark and datasets are publicly available at https://github.com/WeixiangYAN/CodeScope.

--------------------------------------------------------------------------------------------------------

RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge

Robustness against counterfactual external knowledge: This paper evaluates LLMs on using unreliable external knowledge containing false information. Models struggled to discern reliable sources, motivating future work. Robust integration of external knowledge could improve chatbots' accuracy.

Authors:  Yi Liu, Lianzhe Huang, Shicheng Li, Sishuo Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun

Link:  https://arxiv.org/abs/2311.08147v1

Date: 2023-11-14

Summary:

LLMs and AI chatbots have improved people's efficiency in various fields. However, the necessary knowledge for answering the question may be beyond the models' knowledge boundaries. To mitigate this issue, many researchers try to introduce external knowledge, such as knowledge graphs and Internet contents, into LLMs for up-to-date information. However, the external information from the Internet may include counterfactual information that will confuse the model and lead to an incorrect response. Thus there is a pressing need for LLMs to possess the ability to distinguish reliable information from external knowledge. Therefore, to evaluate the ability of LLMs to discern the reliability of external knowledge, we create a benchmark from existing knowledge bases. Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context containing counterfactual information. Evaluation results show that existing LLMs are susceptible to interference from unreliable external knowledge with counterfactual information, and simple intervention methods make limited contributions to the alleviation of this issue.
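
To make the setup concrete, a counterfactual-context QA item might be shaped like the sketch below; the field names and robustness check are illustrative assumptions, not the benchmark's actual schema.

```python
# Illustrative shape of a counterfactual-context QA item.
item = {
    "question": "What is the boiling point of water at sea level?",
    "context": "Recent articles state that water boils at 150 degrees "
               "Celsius at sea level.",          # injected counterfactual
    "answer": "100 degrees Celsius",             # ground truth
}

def is_robust(model_answer: str, item: dict) -> bool:
    # A robust model should ignore the counterfactual context.
    return item["answer"].lower() in model_answer.lower()
```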

--------------------------------------------------------------------------------------------------------

Act-VIT: A Representationally Robust Attention Architecture for Skeleton Based Action Recognition Using Vision Transformer

Skeleton-based action recognition with transformers: This paper explores using vision transformers for skeleton action recognition. Experiments found transformers outperformed CNNs and were more robust across input representations. The work demonstrates the advantage of attention models in this application area.

Authors:  Ozge Oztimur Karadag

Link:  https://arxiv.org/abs/2311.08094v1

Date: 2023-11-14

Summary:

Skeleton-based action recognition receives the attention of many researchers as it is robust to viewpoint and illumination changes, and its processing is much more efficient than that of video frames. With the emergence of deep learning models, it has become very popular to represent the skeleton data in pseudo-image form and apply Convolutional Neural Networks for action recognition. Thereafter, studies concentrated on finding effective methods for forming pseudo-images. Recently, attention networks, more specifically transformers, have provided promising results in various vision problems. In this study, the effectiveness of vision transformers for skeleton-based action recognition is examined and its robustness to the pseudo-image representation scheme is investigated. To this end, a three-level architecture, Act-VIT, is proposed, which forms a set of pseudo-images, applies a classifier to each representation, and combines their results to find the final action class. The classifiers of Act-VIT are first realized by CNNs and then by VITs, and their performances are compared. Experimental studies reveal that the vision transformer is less sensitive to the initial pseudo-image representation compared to the CNN. Nevertheless, even with the vision transformer, the recognition performance can be further improved by a consensus of classifiers.
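
The consensus step can be sketched as one classifier per pseudo-image representation with averaged softmax outputs; the linear branches below are toy stand-ins for the CNN or ViT classifiers actually used, and all shapes are assumptions.

```python
import torch
import torch.nn as nn

class ConsensusClassifier(nn.Module):
    """Sketch: one classifier per pseudo-image representation, with
    softmax outputs averaged for the final action decision."""
    def __init__(self, num_views: int, num_classes: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, num_classes))
            for _ in range(num_views)
        )

    def forward(self, views: list) -> torch.Tensor:
        probs = [branch(v).softmax(dim=1)
                 for branch, v in zip(self.branches, views)]
        return torch.stack(probs).mean(dim=0)      # consensus over views

model = ConsensusClassifier(num_views=3, num_classes=60)
views = [torch.randn(8, 3, 32, 32) for _ in range(3)]  # pseudo-images
action_probs = model(views)                            # (8, 60)
```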

--------------------------------------------------------------------------------------------------------

Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method

Solving visual analogies with neural embeddings: This work applies word analogy techniques to visual reasoning, representing images as latent vectors. Vector arithmetic uncovered simple analogy rules, though complex items remained challenging. The findings reveal promise in generalizing verbal analogy solutions.

Authors:  Luca H. Thoms, Karel A. Veldkamp, Hannes Rosenbusch, Claire E. Stevenson

Link:  https://arxiv.org/abs/2311.08083v1

Date: 2023-11-14

Summary:

Analogical reasoning derives information from known relations and generalizes this information to similar yet unfamiliar situations. One of the first generalized ways in which deep learning models were able to solve verbal analogies was through vector arithmetic of word embeddings, essentially relating words that were mapped to a vector space (e.g., king - man + woman = __?). In comparison, most attempts to solve visual analogies are still predominantly task-specific and less generalizable. This project focuses on visual analogical reasoning and applies the initial generalized mechanism used to solve verbal analogies to the visual realm. Taking the Abstraction and Reasoning Corpus (ARC) as an example to investigate visual analogy solving, we use a variational autoencoder (VAE) to transform ARC items into low-dimensional latent vectors, analogous to the word embeddings used in the verbal approaches. Through simple vector arithmetic, underlying rules of ARC items are discovered and used to solve them. Results indicate that the approach works well on simple items with fewer dimensions (i.e., few colors used, uniform shapes), similar input-to-output examples, and high reconstruction accuracy on the VAE. Predictions on more complex items showed stronger deviations from expected outputs, although predictions still often approximated parts of the item's rule set. Error patterns indicated that the model works as intended. On the official ARC paradigm, the model achieved a score of 2% (cf. current world record is 21%) and on ConceptARC it scored 8.8%. Although the methodology proposed involves basic dimensionality reduction techniques and standard vector arithmetic, this approach demonstrates promising outcomes on ARC and can easily be generalized to other abstract visual reasoning tasks.
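
The core mechanism in one sketch: encode the example pair (A to B) and the test input C, apply the word-analogy arithmetic in latent space, and decode. The tiny untrained autoencoder below is a stand-in for the paper's trained VAE; grid sizes are assumptions.

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    """Untrained stand-in for the VAE trained on ARC grids."""
    def __init__(self, cells: int = 100, latent: int = 16):
        super().__init__()
        self.enc = nn.Linear(cells, latent)
        self.dec = nn.Linear(latent, cells)

    def encode(self, grid):
        return self.enc(grid.flatten(1))

    def decode(self, z):
        return self.dec(z).view(-1, 10, 10)

vae = TinyAE()
a, b, c = (torch.rand(1, 10, 10) for _ in range(3))     # toy 10x10 "grids"
z_d = vae.encode(c) + (vae.encode(b) - vae.encode(a))   # king - man + woman
prediction = vae.decode(z_d)                            # predicted output grid
```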

--------------------------------------------------------------------------------------------------------

Uplift Modeling based on Graph Neural Network Combined with Causal Knowledge

Uplift modeling with GNNs and causality: This paper estimates marketing campaign impact using graph networks and causal knowledge. Comparisons showed accuracy improvements. The method identifies the most effective campaigns and receptive customers.

Authors:  Haowen Wang, Xinyan Ye, Yangze Zhou, Zhiyi Zhang, Longhan Zhang, Jing Jiang

Link:  https://arxiv.org/abs/2311.08434v1

Date: 2023-11-14

Summary:

Uplift modeling is a fundamental component of marketing effect modeling, which is commonly employed to evaluate the effects of treatments on outcomes. Through uplift modeling, we can identify the treatment with the greatest benefit. On the other hand, we can identify clients who are likely to make favorable decisions in response to a certain treatment. In the past, uplift modeling approaches relied heavily on the difference-in-difference (DID) architecture, paired with a machine learning model as the estimation learner, while neglecting the link and confounding information between features. We propose a framework based on graph neural networks that combines causal knowledge with an estimate of uplift value. First, we present a causal representation technique based on CATE (conditional average treatment effect) estimation and adjacency matrix structure learning. Second, we propose a more scalable uplift modeling framework based on graph convolution networks for combining causal knowledge. Our findings demonstrate that this method works effectively for predicting uplift values, with small errors on typical simulated data, and its effectiveness has been verified on actual industry marketing data.
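
The paper's estimator is a GNN, but the uplift quantity itself (the CATE) can be illustrated with the classic two-model baseline on synthetic data; everything below is illustrative, not the paper's method.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Two-model uplift baseline: fit separate outcome models for treated and
# control customers, then score uplift as the difference in predictions.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))            # customer features
treated = rng.integers(0, 2, size=1000)   # 1 if the campaign was shown
y = (X[:, 0] + 0.5 * treated + rng.normal(size=1000) > 0).astype(int)

m_treat = GradientBoostingClassifier().fit(X[treated == 1], y[treated == 1])
m_ctrl = GradientBoostingClassifier().fit(X[treated == 0], y[treated == 0])

# Uplift = predicted outcome under treatment minus under control (CATE).
uplift = m_treat.predict_proba(X)[:, 1] - m_ctrl.predict_proba(X)[:, 1]
print("most receptive customers:", np.argsort(-uplift)[:5])
```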

--------------------------------------------------------------------------------------------------------

Towards Improving Robustness Against Common Corruptions in Object Detectors Using Adversarial Contrastive Learning

Adversarial contrastive learning for object detection: This paper boosts object detector robustness against distortions by optimizing contrastive loss on adversarial examples. The approach aligned representations between clean and perturbed images. It improves reliability in applications like autonomous driving.

Authors:  Shashank Kotyan, Danilo Vasconcellos Vargas

Link:  https://arxiv.org/abs/2311.07928v1

Date: 2023-11-14

Summary:

Neural networks have revolutionized various domains, exhibiting remarkable accuracy in tasks like natural language processing and computer vision. However, their vulnerability to slight alterations in input samples poses challenges, particularly in safety-critical applications like autonomous driving. Current approaches, such as introducing distortions during training, fall short in addressing unforeseen corruptions. This paper proposes an adversarial contrastive learning framework to enhance neural network robustness against both adversarial attacks and common corruptions. By generating instance-wise adversarial examples and optimizing a contrastive loss, our method strengthens the similarity between clean samples and their adversarial counterparts, fostering representations that resist adversarial perturbations and remain robust to real-world distortions. By focusing on improving performance under adversarial and real-world conditions, our approach aims to bolster the robustness of neural networks in safety-critical applications, such as autonomous vehicles navigating unpredictable weather conditions. We anticipate that this framework will contribute to advancing the reliability of neural networks in challenging environments, facilitating their widespread adoption in mission-critical scenarios.
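
A compressed sketch of the two ingredients, with a single-step FGSM attack standing in for the paper's instance-wise adversarial example generation and an InfoNCE-style loss as the contrastive objective; the toy encoder and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_view(encoder, x, eps=4 / 255):
    """One-step adversarial view: perturb x so its embedding disagrees
    with its own clean embedding (a stand-in for the paper's attack)."""
    with torch.no_grad():
        z_clean = F.normalize(encoder(x), dim=1)
    x_adv = x.clone().requires_grad_(True)
    z_adv = F.normalize(encoder(x_adv), dim=1)
    (-(z_adv * z_clean).sum()).backward()   # ascend disagreement
    grad = x_adv.grad.sign()
    encoder.zero_grad()  # discard gradients accumulated while crafting
    return (x + eps * grad).detach()

def nt_xent(z1, z2, tau=0.5):
    """InfoNCE-style loss pulling each clean/adversarial pair together."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = (z @ z.t() / tau).masked_fill(
        torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # toy
x = torch.rand(8, 3, 32, 32)
loss = nt_xent(encoder(x), encoder(fgsm_view(encoder, x)))
loss.backward()
```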

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.