Eye On AI

Week Ending 12.17.2023

RESEARCH WATCH: 12.17.2023

SPONSORED BY

Digimarc digital watermarks invisibly guard your digital assets to protect against misuse, prove copyright ownership, and verify authenticity. In an era of artificial intelligence, don’t leave your images and other digital content exposed. Demand superior content protection and maintain trust in your brand with Digimarc.

Check out Digimarc: https://www.digimarc.com/

One Self-Configurable Model to Solve Many Abstract Visual Reasoning Problems

The paper "One Self-Configurable Model to Solve Many Abstract Visual Reasoning Problems" proposes a unified neural network model that can solve various abstract visual reasoning tasks without making assumptions about the structure of the problems. This could enable more general artificial intelligence systems to perform complex reasoning.

Authors:  Mikołaj Małkiński, Jacek Mańdziuk

Link:  https://arxiv.org/abs/2312.09997v1

Date: 2023-12-15

Summary:

Abstract Visual Reasoning (AVR) comprises a wide selection of problems similar to those used in human IQ tests. Recent years have brought dynamic progress in solving particular AVR tasks; however, in the contemporary literature AVR problems are largely dealt with in isolation, leading to highly specialized task-specific methods. With the aim of developing universal learning systems in the AVR domain, we propose the unified model for solving Single-Choice Abstract visual Reasoning tasks (SCAR), capable of solving various single-choice AVR tasks without making any a priori assumptions about the task structure, in particular the number and location of panels. The proposed model relies on a novel Structure-Aware dynamic Layer (SAL), which adapts its weights to the structure of the considered AVR problem. Experiments conducted on Raven's Progressive Matrices, Visual Analogy Problems, and Odd One Out problems show that SCAR (and SAL-based models in general) effectively solves diverse AVR tasks, with performance on par with state-of-the-art task-specific baselines. What is more, SCAR demonstrates effective knowledge reuse in multi-task and transfer learning settings. To our knowledge, this work is the first successful attempt to construct a general single-choice AVR solver relying on a self-configurable architecture and a unified solving method. With this work we aim to stimulate and foster progress on task-independent research paths in the AVR domain, with the long-term goal of developing a general AVR solver.
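
A minimal sketch can make the self-configuration idea concrete: a small hypernetwork generates a layer's weights from a descriptor of the task structure (e.g., panel count and layout), so one model reconfigures itself per task. This is a speculative illustration of the general mechanism, not the authors' SAL implementation; all names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class StructureAwareLayer(nn.Module):
    """Hypothetical sketch: a layer whose weights adapt to task structure."""
    def __init__(self, feat_dim: int, struct_dim: int):
        super().__init__()
        # Hypernetwork: maps a task-structure descriptor to this layer's weights.
        self.weight_gen = nn.Linear(struct_dim, feat_dim * feat_dim)
        self.feat_dim = feat_dim

    def forward(self, panels: torch.Tensor, structure: torch.Tensor) -> torch.Tensor:
        # panels: (batch, n_panels, feat_dim); n_panels may vary across tasks.
        # structure: (batch, struct_dim), e.g., encoding panel count and layout.
        W = self.weight_gen(structure).view(-1, self.feat_dim, self.feat_dim)
        return torch.bmm(panels, W)  # same module, task-dependent weights

# Toy usage: a 3x3 RPM-style task with 8 context panels and 1 candidate.
layer = StructureAwareLayer(feat_dim=64, struct_dim=8)
out = layer(torch.randn(2, 9, 64), torch.randn(2, 8))  # -> (2, 9, 64)
```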

--------------------------------------------------------------------------------------------------------

Red AI? Inconsistent Responses from GPT3.5 Models on Political Issues in the US and China

The paper "Red AI? Inconsistent Responses from GPT3.5 Models on Political Issues in the US and China" finds that multilingual language models exhibit biases related to geopolitics and censorship. This highlights concerns around the objectivity of AI systems. The work could inform cross-cultural communication applications.

Authors:  Di Zhou, Yinxian Zhang

Link:  https://arxiv.org/abs/2312.09917v1

Date: 2023-12-15

Summary:

The rising popularity of ChatGPT and other AI-powered large language models (LLMs) has led to increasing studies highlighting their susceptibility to mistakes and biases. However, most of these studies focus on models trained on English texts. Taking an innovative approach, this study investigates political biases in GPT's multilingual models. We posed the same questions about high-profile political issues in the United States and China to GPT in both English and simplified Chinese, and our analysis of the bilingual responses revealed that the political "knowledge" (content) and political "attitude" (sentiment) of GPT's bilingual models are significantly more inconsistent on political issues in China than in the US. The simplified Chinese GPT models not only tended to provide pro-China information but also presented the least negative sentiment towards China's problems, whereas the English GPT was significantly more negative towards China. This disparity may stem from Chinese state censorship and US-China geopolitical tensions, which influence the training corpora of GPT bilingual models. Moreover, both the Chinese and English models tended to be less critical towards the issues of "their own" side, represented by the language used, than towards the issues of "the other." This suggests that GPT multilingual models could develop a "political identity" and an associated sentiment bias based on their training language. We discuss the implications of our findings for information transmission and communication in an increasingly divided world.

--------------------------------------------------------------------------------------------------------

Methodologies for Future Vehicular Digital Twins

The paper "Methodologies for Future Vehicular Digital Twins" reviews approaches for developing high-fidelity simulations of wireless vehicular communications. Accurate digital twins can enable cost-effective development of autonomous, connected vehicles and intelligent transportation infrastructure.

Authors:  Danilo Radovic, Markus Hofer, Faruk Pasic, Enrico M. Vitucci, Aleksei Fedorov, Thomas Zemen

Link:  https://arxiv.org/abs/2312.09902v1

Date: 2023-12-15

Summary:

The role of wireless communications in various domains of intelligent transportation systems is significant; it is evident that dependable message exchange between nodes (cars, bikes, pedestrians, infrastructure, etc.) has to be guaranteed to fulfill the stringent requirements for future transportation systems. A precise site-specific digital twin is seen as a key enabler for the cost-effective development and validation of future vehicular communication systems. Furthermore, achieving a realistic digital twin for dependable wireless communications requires accurate measurement, modeling, and emulation of wireless communication channels. However, contemporary approaches in these domains are not efficient enough to satisfy the foreseen needs. In this position paper, we overview the current solutions, indicate their limitations, and discuss the most prospective paths for future investigation.

--------------------------------------------------------------------------------------------------------

Learning in Online Principal-Agent Interactions: The Power of Menus

The paper “Learning in Online Principal-Agent Interactions: The Power of Menus” explores how a principal can efficiently learn the private information of an agent from historical interactions. This has implications for pricing, targeted marketing, and mechanism design applications leveraging reinforcement learning.

Authors:  Minbiao Han, Michael Albert, Haifeng Xu

Link:  https://arxiv.org/abs/2312.09869v1

Date: 2023-12-15

Summary:

We study a ubiquitous learning challenge in online principal-agent problems, in which the principal learns the agent's private information from the agent's revealed preferences in historical interactions. This paradigm includes important special cases such as pricing and contract design, which have been widely studied in recent literature. However, existing work considers the case where the principal can only choose a single strategy at every round to interact with the agent and then observe the agent's revealed preference through their actions. In this paper, we extend this line of study to allow the principal to offer a menu of strategies to the agent and learn additionally from observing the agent's selection from the menu. We provide a thorough investigation of several online principal-agent problem settings and characterize their sample complexities, accompanied by the corresponding algorithms we have developed. We instantiate this paradigm in several important design problems, including Stackelberg (security) games, contract design, and information design. Finally, we also explore the connection between our findings and existing results about online learning in Stackelberg games, and we offer a solution that can overcome a key hard instance of Peng et al. (2019).
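
As a toy illustration of why menus help, consider a principal screening an agent's private value v through menus of (quality, price) contracts: every observed selection implies linear inequalities on v, shrinking the uncertainty set. The sketch below is a hedged illustration of this paradigm, not the paper's algorithms or sample-complexity results.

```python
# Hypothetical screening example: a rational agent with private value v picks
# the contract maximizing v * quality - price; each pick bounds v linearly.
def agent_choice(menu, v):
    return max(range(len(menu)), key=lambda i: v * menu[i][0] - menu[i][1])

def update_interval(menu, choice, lo, hi):
    qc, pc = menu[choice]
    for j, (qj, pj) in enumerate(menu):
        if j == choice or qj == qc:
            continue
        # v*qc - pc >= v*qj - pj  =>  v*(qc - qj) >= pc - pj
        bound = (pc - pj) / (qc - qj)
        lo, hi = (max(lo, bound), hi) if qc > qj else (lo, min(hi, bound))
    return lo, hi

v_true, lo, hi = 0.62, 0.0, 1.0
for _ in range(5):
    mid = (lo + hi) / 2
    # Two contracts calibrated so the agent's pick reveals whether v > mid.
    menu = [(1.0, mid), (0.0, 0.0)]
    lo, hi = update_interval(menu, agent_choice(menu, v_true), lo, hi)
print(lo, hi)  # interval shrinks toward v_true = 0.62
```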

--------------------------------------------------------------------------------------------------------

Deep Unsupervised Domain Adaptation for Time Series Classification: a Benchmark

The paper “Deep Unsupervised Domain Adaptation for Time Series Classification: a Benchmark” introduces a comprehensive suite of datasets for evaluating domain adaptation techniques on time series data. This can catalyze progress in critical applications like predictive health monitoring and industrial process optimization.

Authors:  Hassan Ismail Fawaz, Ganesh Del Grosso, Tanguy Kerdoncuff, Aurelie Boisbunon, Illyyne Saffar

Link:  https://arxiv.org/abs/2312.09857v1

Date: 2023-12-15

Summary:

Unsupervised Domain Adaptation (UDA) aims to harness labeled source data to train models for unlabeled target data. Despite extensive research in domains like computer vision and natural language processing, UDA remains underexplored for time series data, which has widespread real-world applications ranging from medicine and manufacturing to earth observation and human activity recognition. Our paper addresses this gap by introducing a comprehensive benchmark for evaluating UDA techniques for time series classification, with a focus on deep learning methods. We provide seven new benchmark datasets covering various domain shifts and temporal dynamics, facilitating fair and standardized UDA method assessments with state-of-the-art neural network backbones (e.g., Inception) for time series data. This benchmark offers insights into the strengths and limitations of the evaluated approaches while preserving the unsupervised nature of domain adaptation, making it directly applicable to practical problems. Our paper serves as a vital resource for researchers and practitioners, advancing domain adaptation solutions for time series data and fostering innovation in this critical field. The implementation code of this benchmark is available at https://github.com/EricssonResearch/UDA-4-TSC.
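
The benchmark's own interface is documented in the linked repository. As a generic illustration of what a deep UDA method optimizes, the sketch below combines a supervised loss on labeled source series with a simple feature-alignment penalty (mean-feature matching, a linear-kernel MMD surrogate) on unlabeled target series; the architecture and loss weighting are placeholder assumptions, not the benchmark's API.

```python
import torch
import torch.nn as nn

def mmd_linear(f_src: torch.Tensor, f_tgt: torch.Tensor) -> torch.Tensor:
    # Mean-feature matching: a simple linear-kernel MMD surrogate.
    return (f_src.mean(0) - f_tgt.mean(0)).pow(2).sum()

# Placeholder 1D-CNN backbone and classifier for 3-class time series.
backbone = nn.Sequential(nn.Conv1d(1, 16, 5), nn.ReLU(),
                         nn.AdaptiveAvgPool1d(1), nn.Flatten())
classifier = nn.Linear(16, 3)
opt = torch.optim.Adam(list(backbone.parameters()) + list(classifier.parameters()), lr=1e-3)

x_src, y_src = torch.randn(8, 1, 128), torch.randint(0, 3, (8,))  # labeled source
x_tgt = torch.randn(8, 1, 128)                                    # unlabeled target

f_src, f_tgt = backbone(x_src), backbone(x_tgt)
loss = nn.functional.cross_entropy(classifier(f_src), y_src) + 0.5 * mmd_linear(f_src, f_tgt)
opt.zero_grad()
loss.backward()
opt.step()
```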

--------------------------------------------------------------------------------------------------------

Tracking Skiers from the Top to the Bottom

The paper "Tracking Skiers from the Top to the Bottom" introduces a new dataset and benchmark for evaluating visual tracking of skiers in video. This could help unlock computer vision applications for quantifying athlete performance in skiing.

Authors:  Matteo Dunnhofer, Luca Sordi, Niki Martinel, Christian Micheloni

Link:  https://arxiv.org/abs/2312.09723v1

Date: 2023-12-15

Summary:

Skiing is a popular winter sport discipline with a long history of competitive events. In this domain, computer vision has the potential to enhance the understanding of athletes' performance, but its application lags behind other sports due to limited studies and datasets. This paper makes a step forward in filling such gaps. A thorough investigation is performed on the task of tracking a skier across a video capturing his/her complete performance. Obtaining continuous and accurate skier localization is a prerequisite for further higher-level performance analyses. To enable the study, SkiTB, the largest and most richly annotated dataset for computer vision in skiing, is introduced. Several visual object tracking algorithms, including both established methodologies and a newly introduced skier-optimized baseline algorithm, are tested using the dataset. The results provide valuable insights into the applicability of different tracking methods for vision-based skiing analysis. SkiTB, code, and results are available at https://machinelearning.uniud.it/datasets/skitb.

--------------------------------------------------------------------------------------------------------

Style Generation in Robot Calligraphy with Deep Generative Adversarial Networks

The paper “Style Generation in Robot Calligraphy with Deep Generative Adversarial Networks” proposes a deep learning method for the automatic generation of artistic Chinese calligraphy. The techniques could be applied to education, cultural heritage preservation, and the creative arts.

Authors:  Xiaoming Wang, Zhiguo Gong

Link:  https://arxiv.org/abs/2312.09673v1

Date: 2023-12-15

Summary:

Robot calligraphy is an emerging exploration of artificial intelligence in the fields of art and education. Traditional calligraphy generation research mainly focuses on methods such as tool-based image processing, generative models, and style transfer. Unlike the English alphabet, Chinese characters number in the tens of thousands, which makes it difficult to generate a style-consistent Chinese calligraphic font covering over 6,000 characters. Due to the lack of high-quality datasets, formal definitions of calligraphy knowledge, and scientific art evaluation methods, generated results are frequently of low quality and fall short of professional-level requirements. To address these problems, this paper proposes an automatic calligraphy generation model based on deep generative adversarial networks (deepGAN) that can generate stylized calligraphy fonts to professional standards. The key highlights of the proposed method include: (1) the datasets use a high-precision calligraphy synthesis method to ensure high quality and sufficient quantity; (2) professional calligraphers are invited to conduct a series of Turing tests to evaluate the gap between the model's output and human artistic level; (3) experimental results indicate that the proposed model is the state of the art among current calligraphy generation methods. The Turing tests and similarity evaluations validate the effectiveness of the proposed method.
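
As a rough sketch of the conditional-GAN idea underlying such systems (not the paper's deepGAN architecture), the generator below is conditioned on a character embedding so that one model can, in principle, cover thousands of glyph classes in a single style. All dimensions are arbitrary placeholders; optimizer steps are omitted.

```python
import torch
import torch.nn as nn

N_CHARS, Z, IMG = 6000, 64, 32 * 32  # placeholder sizes

G = nn.Sequential(nn.Linear(Z + 32, 256), nn.ReLU(), nn.Linear(256, IMG), nn.Tanh())
D = nn.Sequential(nn.Linear(IMG + 32, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
char_emb = nn.Embedding(N_CHARS, 32)  # conditions both G and D on the character

chars = torch.randint(0, N_CHARS, (16,))
real = torch.rand(16, IMG) * 2 - 1            # stand-in for real calligraphy glyphs
fake = G(torch.cat([torch.randn(16, Z), char_emb(chars)], dim=1))

bce = nn.functional.binary_cross_entropy_with_logits
d_in = lambda img: torch.cat([img, char_emb(chars)], dim=1)
d_loss = bce(D(d_in(real)), torch.ones(16, 1)) + bce(D(d_in(fake.detach())), torch.zeros(16, 1))
g_loss = bce(D(d_in(fake)), torch.ones(16, 1))  # generator tries to fool D
```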

--------------------------------------------------------------------------------------------------------

Hierarchical Graph Pattern Understanding for Zero-Shot VOS

The paper “Hierarchical Graph Pattern Understanding for Zero-Shot VOS” develops a neural network architecture that combines optical flow and graph learning for video object segmentation. The work could benefit vision applications like video editing and autonomous vehicles.

Authors:  Gensheng Pei, Fumin Shen, Yazhou Yao, Tao Chen, Xian-Sheng Hua, Heng-Tao Shen

Link:  https://arxiv.org/abs/2312.09525v1

Date: 2023-12-15

Summary:

The optical flow guidance strategy is ideal for obtaining motion information of objects in video and is widely utilized in video segmentation tasks. However, existing optical flow-based methods have a significant dependency on optical flow, which results in poor performance when the optical flow estimation fails for a particular scene. The temporal consistency provided by the optical flow could be effectively supplemented by modeling in a structural form. This paper proposes a new hierarchical graph neural network (GNN) architecture, dubbed hierarchical graph pattern understanding (HGPU), for zero-shot video object segmentation (ZS-VOS). Inspired by the strong ability of GNNs in capturing structural relations, HGPU innovatively leverages motion cues (i.e., optical flow) to enhance the high-order representations from the neighbors of target frames. Specifically, a hierarchical graph pattern encoder with message aggregation is introduced to acquire different levels of motion and appearance features in a sequential manner. Furthermore, a decoder is designed for hierarchically parsing and understanding the transformed multi-modal contexts to achieve more accurate and robust results. HGPU achieves state-of-the-art performance on four publicly available benchmarks (DAVIS-16, YouTube-Objects, Long-Videos and DAVIS-17). Code and pre-trained model can be found at https://github.com/NUST-Machine-Intelligence-Laboratory/HGPU.
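
A minimal sketch of the message-passing idea can make the architecture easier to picture: frame nodes carry appearance features, edges carry motion (optical-flow) features, and each node aggregates messages from its neighbors. This is a hypothetical single layer for intuition, not the released HGPU code.

```python
import torch
import torch.nn as nn

class MotionMessageLayer(nn.Module):
    """Hypothetical message-passing layer fusing appearance and motion cues."""
    def __init__(self, dim: int):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)   # message from (neighbor, motion) pair
        self.upd = nn.GRUCell(dim, dim)      # node update from aggregated messages

    def forward(self, nodes, motion, adj):
        # nodes: (N, dim) appearance features per frame
        # motion: (N, N, dim) flow features per frame pair; adj: (N, N) 0/1 links
        N, dim = nodes.shape
        pair = torch.cat([nodes.unsqueeze(0).expand(N, N, dim), motion], dim=-1)
        messages = self.msg(pair) * adj.unsqueeze(-1)          # mask non-neighbors
        agg = messages.sum(dim=1) / adj.sum(dim=1, keepdim=True).clamp(min=1)
        return self.upd(agg, nodes)

layer = MotionMessageLayer(dim=32)
nodes, motion = torch.randn(5, 32), torch.randn(5, 5, 32)
adj = (torch.rand(5, 5) > 0.5).float()
out = layer(nodes, motion, adj)  # -> (5, 32)
```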

--------------------------------------------------------------------------------------------------------

Design, construction and evaluation of emotional multimodal pathological speech database

The paper “Design, construction and evaluation of emotional multimodal pathological speech database” constructs a unique multi-perspective dataset of emotional speech from patients with motor disorders. This resource could drive new assistive speech technologies.

Authors:  Ting Zhu, Shufei Duan, Huizhi Liang, Wei Zhang

Link:  https://arxiv.org/abs/2312.08998v1

Date: 2023-12-14

Summary:

The lack of an available emotional pathological speech database is one of the key obstacles to studying the emotional expression of patients with dysarthria. This paper constructs the first Chinese multimodal emotional pathological speech database containing multi-perspective information. It includes 29 controls and 39 patients with different degrees of motor dysarthria, expressing happy, sad, angry, and neutral emotions. All emotional speech was labeled for intelligibility, emotion type, and discrete dimensional emotions via a purpose-built WeChat mini-program. The subjective analysis covers emotion discrimination accuracy, speech intelligibility, valence-arousal spatial distribution, and the correlation between SCL-90 scores and disease severity. Automatic recognition was tested on speech and glottal data, with average accuracies of 78% for controls and 60% for patients on audio, versus 51% for controls and 38% for patients on glottal data, indicating an influence of the disease on emotional expression.

--------------------------------------------------------------------------------------------------------

LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers

The paper “LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers” demonstrates how large language models can provide supervision signals for agents learning skills interactively. The framework could enable more scalable and economical training of intelligent agents.

Authors:  Taewook Nam, Juyong Lee, Jesse Zhang, Sung Ju Hwang, Joseph J. Lim, Karl Pertsch

Link:  https://arxiv.org/abs/2312.08958v1

Date: 2023-12-14

Summary:

We propose a framework that leverages foundation models as teachers, guiding a reinforcement learning agent to acquire semantically meaningful behavior without human feedback. In our framework, the agent receives task instructions grounded in a training environment from large language models. Then, a vision-language model guides the agent in learning the multi-task language-conditioned policy by providing reward feedback. We demonstrate that our method can learn semantically meaningful skills in a challenging open-ended MineDojo environment while prior unsupervised skill discovery methods struggle. Additionally, we discuss observed challenges of using off-the-shelf foundation models as teachers and our efforts to address them.
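
Schematically, the teacher-student loop can be sketched as below, with mock stand-ins for the LLM instruction proposer and the VLM reward model (the real system grounds both in MineDojo observations; everything here is placeholder):

```python
import random

def llm_propose_task(context: str) -> str:
    # Stand-in for an LLM that grounds task instructions in the environment.
    return random.choice([f"collect wood near {context}", f"mine stone near {context}"])

def vlm_reward(frames: list, instruction: str) -> float:
    # Stand-in for image-text alignment scoring (e.g., CLIP-style similarity).
    return random.random()

def rollout(policy, instruction: str) -> list:
    return ["frame"] * 10  # placeholder trajectory frames

policy = object()  # placeholder language-conditioned policy
for step in range(3):
    task = llm_propose_task("a forest biome")
    frames = rollout(policy, task)
    reward = vlm_reward(frames, task)
    # ...update the multi-task policy with (task, frames, reward)...
    print(step, task, round(reward, 3))
```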

--------------------------------------------------------------------------------------------------------

Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning

The paper “Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning” substantially improves the mathematical reasoning capabilities of large language models via a reinforcement-learned prompt-pruning module. The techniques have implications for developing more robust and trustworthy AI.

Authors:  Xijie Huang, Li Lyna Zhang, Kwang-Ting Cheng, Mao Yang

Link:  https://arxiv.org/abs/2312.08901v1

Date: 2023-12-14

Summary:

Large language models (LLMs) have shown impressive capabilities in various tasks, yet they still struggle with math reasoning. Despite efforts to optimize Chain-of-Thought (CoT) prompts and fine-tune LLMs, the potential of few-shot learning remains underexplored. In this work, we propose CoT-Max, a novel approach pushing the boundaries of few-shot CoT learning to improve LLM math reasoning capabilities. CoT-Max addresses the challenges of selecting useful examples and of fitting enough examples within a restricted context window. Inspired by our observation that natural language inputs contain considerable redundancy, we propose a coarse-to-fine pruner as a plug-and-play module for LLMs, which first identifies crucial CoT examples from a large batch and then further prunes unimportant tokens. To train the pruner, we collect a math reasoning dataset with diverse difficulty levels and reasoning steps, introduce a reward that measures both the input's effectiveness for math reasoning and its compliance with token-length constraints, and propose a novel training approach based on reinforcement learning. As a result, CoT-Max significantly outperforms CoT and few-shot prompting baselines across various LLMs (LLaMA2-7B, 13B, 70B) and five mathematical datasets, achieving up to 4.55% absolute improvement. Remarkably, without any fine-tuning, LLaMA2-70B with CoT-Max surpasses GPT-3.5 and a wide range of larger LLMs (PaLM, Minerva, etc.) on GSM8K.
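
A toy sketch of the coarse-to-fine idea: first keep the most relevant CoT demonstrations, then drop low-value tokens so more demonstrations fit in a fixed context window. The relevance and token-importance scorers below are crude placeholders; CoT-Max instead trains its pruner with reinforcement learning.

```python
def select_examples(question, examples, scorer, k):
    # Coarse stage: keep the k demonstrations scored most relevant.
    return sorted(examples, key=lambda ex: scorer(question, ex), reverse=True)[:k]

def prune_tokens(text, token_scorer, keep_ratio=0.7):
    # Fine stage: drop the lowest-scoring tokens, preserving original order.
    tokens = text.split()
    ranked = sorted(range(len(tokens)), key=lambda i: token_scorer(tokens[i]), reverse=True)
    kept = sorted(ranked[: int(len(tokens) * keep_ratio)])
    return " ".join(tokens[i] for i in kept)

examples = [
    "Q: 2+3? Let's think step by step. 2+3=5. A: 5",
    "Q: capital of France? A: Paris",
]
overlap = lambda q, ex: len(set(q.split()) & set(ex.split()))  # crude relevance proxy
salience = lambda tok: len(tok)                                # crude importance proxy

question = "Q: 4+9? Let's think step by step."
chosen = select_examples(question, examples, overlap, k=1)
prompt = "\n".join(prune_tokens(ex, salience) for ex in chosen) + "\n" + question
print(prompt)
```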

--------------------------------------------------------------------------------------------------------

Knowledge-Driven Modulation of Neural Networks with Attention Mechanism for Next Activity Prediction

The paper "Knowledge-Driven Modulation of Neural Networks with Attention Mechanism for Next Activity Prediction" incorporates procedural process models to improve neural network predictions in business process monitoring applications. This could benefit anomaly detection and process optimization.

Authors:  Ivan Donadello, Jonghyeon Ko, Fabrizio Maria Maggi, Jan Mendling, Francesco Riva, Matthias Weidlich

Link:  https://arxiv.org/abs/2312.08847v1

Date: 2023-12-14

Summary:

Predictive Process Monitoring (PPM) aims at leveraging historic process execution data to predict how ongoing executions will continue up to their completion. In recent years, PPM techniques for predicting the next activities have matured significantly, mainly thanks to the use of Neural Networks (NNs) as predictors. While their performance is difficult to beat in the general case, there are specific situations where background process knowledge can be helpful. Such knowledge can be leveraged to improve the quality of predictions for exceptional process executions or when the process changes due to concept drift. In this paper, we present a Symbolic[Neuro] system that leverages background knowledge expressed in terms of a procedural process model to offset under-sampling in the training data. More specifically, we make predictions using NNs with an attention mechanism, an emerging technology in the NN field. The system has been tested on several real-life logs, showing an improvement in the performance of the prediction task.

--------------------------------------------------------------------------------------------------------

TigerBot: An Open Multilingual Multitask LLM

The paper “TigerBot: An Open Multilingual Multitask LLM” releases TigerBot, a family of large multilingual language models achieving state-of-the-art performance on benchmarks. The publicly available models could spur innovation in natural language processing across many languages.

Authors:  Ye Chen, Wei Cai, Liangmin Wu, Xiaowei Li, Zhanxuan Xin, Cong Fu

Link:  https://arxiv.org/abs/2312.08688v2

Date: 2023-12-15

Summary:

We release and introduce the TigerBot family of large language models (LLMs), consisting of base and chat models at 7, 13, 70, and 180 billion parameters. We develop our models starting from Llama-2 and BLOOM, and push the boundary further in data, training algorithms, infrastructure, and application tools. Our models yield meaningful performance gains over SOTA open-source models, e.g., Llama-2, specifically a 6% gain in English and a 20% gain in Chinese. The TigerBot family also achieves leading performance on major academic and industrial benchmarks and leaderboards. We believe that TigerBot represents just a snapshot of the lightning-fast progress in the LLM open-source community. Therefore, we are thrilled to give back by publicly releasing our models and reporting the approach behind them, with additional emphasis on building SOTA LLMs in a democratized way and making LLMs useful in real-world applications.

--------------------------------------------------------------------------------------------------------

Contractive error feedback for gradient compression

The paper “Contractive error feedback for gradient compression” proposes a technique to reduce memory usage during neural network training by compressing gradient updates. The method could enable more efficient distributed and on-device deep learning.

Authors:  Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, Georgios B. Giannakis

Link:  https://arxiv.org/abs/2312.08538v1

Date: 2023-12-13

Summary:

On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices with limited storage. In such settings, communication-efficient optimization methods are attractive alternatives, but they still struggle with memory issues. To tackle these challenges, we propose a communication-efficient method called contractive error feedback (ConEF). As opposed to SGD with error feedback (EFSGD), which manages memory inefficiently, ConEF obtains the sweet spot of convergence and memory usage, and achieves communication efficiency by leveraging biased and all-reducible gradient compression. We empirically validate ConEF on various learning tasks, including image classification, language modeling, and machine translation, and observe that ConEF saves 80%-90% of the extra memory in EFSGD with almost no loss in test performance, while also achieving a 1.3x-5x speedup over SGD. Through our work, we also demonstrate the feasibility and convergence of ConEF, clearing the theoretical barrier to integrating ConEF into popular memory-efficient frameworks such as ZeRO-3.
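
For context, the classic error-feedback mechanism that ConEF builds on can be sketched in a few lines: whatever mass the compressor discards is remembered locally and folded into the next round's gradient. ConEF's contribution, not reproduced here, is additionally compressing this error buffer itself to cut memory.

```python
import torch

def topk_compress(t: torch.Tensor, k: int) -> torch.Tensor:
    # Keep only the k largest-magnitude entries (a standard biased compressor).
    out = torch.zeros_like(t)
    idx = t.abs().topk(k).indices
    out[idx] = t[idx]
    return out

w = torch.randn(100)
error = torch.zeros(100)          # local memory of discarded gradient mass
lr = 0.1
for step in range(50):
    grad = 2 * w                  # toy objective: minimize ||w||^2
    corrected = grad + error      # fold the residual back in
    compressed = topk_compress(corrected, k=10)  # what would be communicated
    error = corrected - compressed               # remember what was dropped
    w -= lr * compressed
print(w.norm().item())  # decreases toward 0
```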

--------------------------------------------------------------------------------------------------------

Object-Centric Conformance Alignments with Synchronization (Extended Version)

The paper “Object-Centric Conformance Alignments with Synchronization” presents a process mining technique tailored for interdependent multi-object processes. The conformance checking approach could improve monitoring in supply chain, healthcare, and other complex system workflows.

Authors:  Alessandro Gianola, Marco Montali, Sarah Winkler

Link:  https://arxiv.org/abs/2312.08537v1

Date: 2023-12-13

Summary:

Real-world processes operate on objects that are inter-dependent. To accurately reflect the nature of such processes, object-centric process mining techniques are needed, notably conformance checking. However, while the object-centric perspective has recently gained traction, few concrete process mining techniques have been presented so far. Moreover, existing approaches are severely limited in their abilities to keep track of object identity and object dependencies. Consequently, serious problems in logs remain undetected. In this paper, we present a new formalism that combines the key modelling features of two existing approaches, in particular the ability of object-centric Petri nets to capture one-to-many relations and that of Petri nets with identifiers to compare and synchronize objects based on their identity. We call the resulting formalism 'object-centric Petri nets with identifiers', and define alignments and the conformance checking task for this setting. We propose a conformance checking approach for such nets based on an encoding in satisfiability modulo theories (SMT), and illustrate how it can be effectively used to overcome shortcomings of earlier work. To assess its practicality, we perform an evaluation on data from the literature.

--------------------------------------------------------------------------------------------------------

The connectivity degree controls the difficulty of RBN reservoir design

The paper “The connectivity degree controls the difficulty of RBN reservoir design” investigates how neural network hyperparameters affect performance in reservoir computing, clarifying design tradeoffs. This could inform efficient deployment for temporal signal processing tasks.

Authors:  Emmanuel Calvet, Bertrand Reulet, Jean Rouat

Link:  https://arxiv.org/abs/2312.08522v1

Date: 2023-12-13

Summary:

Reservoir Computing (RC) is a paradigm in artificial intelligence where a recurrent neural network (RNN) is used to process temporal data, leveraging the inherent dynamical properties of the reservoir to perform complex computations. In the realm of RC, the excitatory-inhibitory balance b has been shown to be pivotal for driving the dynamics and performance of Echo State Networks (ESNs) and, more recently, Random Boolean Networks (RBNs). However, the relationship between b and other parameters of the network is still poorly understood. This article explores how the interplay of the balance b, the connectivity degree K (i.e., the number of synapses per neuron), and the size of the network (i.e., the number of neurons N) influences the dynamics and performance (memory and prediction) of an RBN reservoir. Our findings reveal that K and b are strongly tied in optimal reservoirs. Reservoirs with high K have two optimal balances, one for globally inhibitory networks (b<0) and one for excitatory networks (b>0), both showing performance that is asymmetric about zero balance. In contrast, for moderate K (the optimal value being K=4), the best reservoirs are obtained when excitation and inhibition almost, but not exactly, balance each other. For almost all K, increasing the size N leads to better performance, even for large values of N. Our investigation provides clear directions for generating optimal reservoirs, or reservoirs with constraints on size or connectivity.
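
A schematic sketch of the setup helps fix the parameters: an RBN reservoir of size N, in-degree K, and balance b, where each synapse is excitatory with probability (1 + b) / 2 and inhibitory otherwise. This is a generic reading of the construction; the paper's exact update rule and readout may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, b = 100, 4, 0.1  # network size, in-degree, excitatory-inhibitory balance

# Each neuron receives K distinct random inputs; synapse signs follow balance b.
inputs = np.array([rng.choice(N, size=K, replace=False) for _ in range(N)])
signs = rng.choice([1.0, -1.0], size=(N, K), p=[(1 + b) / 2, (1 - b) / 2])

state = rng.integers(0, 2, size=N).astype(float)
for t in range(50):
    drive = (signs * state[inputs]).sum(axis=1)  # summed signed inputs per neuron
    state = (drive > 0).astype(float)            # threshold (Boolean) update
print(state.mean())  # fraction of active neurons after 50 steps
```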

--------------------------------------------------------------------------------------------------------

Revisiting Recommendation Loss Functions through Contrastive Learning (Technical Report)

The paper “Revisiting Recommendation Loss Functions through Contrastive Learning” systematically examines recommendation losses, finding that contrastive methods like InfoNCE+ and MINE+ achieve the best results. The analysis provides lessons for production recommender systems.

Authors:  Dong Li, Ruoming Jin, Bin Ren

Link:  https://arxiv.org/abs/2312.08520v1

Date: 2023-12-13

Summary:

Inspired by the success of contrastive learning, we systematically examine recommendation losses, including listwise (softmax), pairwise (BPR), and pointwise (MSE and CCL) losses. In this endeavor, we introduce InfoNCE+, an optimized generalization of InfoNCE with balance coefficients, and highlight its performance advantages, particularly when aligned with our new decoupled contrastive loss, MINE+. We also leverage debiased InfoNCE to debias the pointwise recommendation loss (CCL), yielding Debiased CCL. Interestingly, our analysis reveals that linear models like iALS and EASE are inherently debiased. Empirical results demonstrate the effectiveness of MINE+ and Debiased CCL.
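
For reference, plain InfoNCE in a recommendation setting looks like the sketch below, scoring one positive item against sampled negatives per user; InfoNCE+ adds balance coefficients on top of this form, which are not reproduced here.

```python
import torch
import torch.nn.functional as F

def infonce(user_emb, pos_item_emb, neg_item_emb, tau=0.1):
    # user_emb: (B, d); pos_item_emb: (B, d); neg_item_emb: (B, M, d)
    pos = (user_emb * pos_item_emb).sum(-1, keepdim=True) / tau     # (B, 1)
    neg = torch.einsum("bd,bmd->bm", user_emb, neg_item_emb) / tau  # (B, M)
    logits = torch.cat([pos, neg], dim=1)
    # The positive item sits at index 0 of each row of logits.
    return F.cross_entropy(logits, torch.zeros(len(logits), dtype=torch.long))

u, p, n = torch.randn(32, 16), torch.randn(32, 16), torch.randn(32, 50, 16)
print(infonce(u, p, n).item())
```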

--------------------------------------------------------------------------------------------------------

(Debiased) Contrastive Learning Loss for Recommendation (Technical Report)

The paper “(Debiased) Contrastive Learning Loss for Recommendation” examines recommendation losses through a contrastive learning lens. The authors introduce debiased losses that outperform biased alternatives. The techniques could improve recommendation quality in production systems.

Authors:  Ruoming Jin, Dong Li

Link:  https://arxiv.org/abs/2312.08517v1

Date: 2023-12-13

Summary:

In this paper, we perform a systematic examination of recommendation losses, including listwise (softmax), pairwise (BPR), and pointwise (mean-squared error, MSE, and Cosine Contrastive Loss, CCL) losses, through the lens of contrastive learning. We introduce and study both debiased InfoNCE and the mutual information neural estimator (MINE), for the first time, in the recommendation setting. We also relate and differentiate these two losses from the BPR loss through a lower-bound analysis. Furthermore, we present debiased pointwise losses (for both MSE and CCL) and theoretically certify that both iALS and EASE, two of the most popular linear models, are inherently debiased. Empirical results demonstrate that the debiased losses and the newly introduced mutual-information losses outperform the existing (biased) ones.

--------------------------------------------------------------------------------------------------------

Prompting LLMs with content plans to enhance the summarization of scientific articles

The paper “Prompting LLMs with content plans to enhance the summarization of scientific articles” proposes novel prompting methods that use keyword cues to improve summarization. Tailored prompts aid smaller models on long, complex documents. The approach introduces prompting as a tool to overcome model limitations.

Authors:  Aldan Creo, Manuel Lama, Juan C. Vidal

Link:  https://arxiv.org/abs/2312.08282v2

Date: 2023-12-15

Summary:

This paper presents novel prompting techniques to improve the performance of automatic summarization systems for scientific articles. Scientific article summarization is highly challenging due to the length and complexity of these documents. We conceive, implement, and evaluate prompting techniques that provide additional contextual information to guide summarization systems. Specifically, we feed summarizers with lists of key terms extracted from articles, such as author keywords or automatically generated keywords. Our techniques are tested with various summarization models and input texts. Results show performance gains, especially for smaller models summarizing sections separately. This shows that prompting is a promising approach to overcoming the limitations of less powerful systems. Our findings open a new research direction of using prompts to aid smaller models.
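
The mechanics are simple to sketch: prepend extracted key terms to the summarization request. The template below is a hypothetical illustration, not the paper's exact prompt.

```python
def build_prompt(section_text: str, key_terms: list) -> str:
    # Keyword-augmented summarization prompt (illustrative template).
    return (
        "Summarize the following section of a scientific article.\n"
        f"Focus on these key terms: {', '.join(key_terms)}.\n\n"
        f"Section:\n{section_text}\n\nSummary:"
    )

keywords = ["domain adaptation", "time series", "benchmark"]  # e.g., author keywords
prompt = build_prompt("(section text here)", keywords)
print(prompt)
# The prompt would then be sent to a summarization model of choice.
```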

--------------------------------------------------------------------------------------------------------

Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning

The paper “Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning” develops an interpretable prompting framework that embeds object, summary, and description context. The method improves few-shot performance across tasks, better utilizing knowledge in large LMs. Prompting is positioned as a way to stimulate embedded knowledge.

Authors:  Jinta Weng, Jiarui Zhang, Yue Hu, Daidong Fa, Xiaofeng Xu, Heyan Huang

Link:  https://arxiv.org/abs/2312.08027v1

Date: 2023-12-13

Summary:

Large language models (LLMs) can be used as accessible and intelligent chatbots by constructing natural language queries and directly inputting the prompt into the model. However, different prompt constructions often lead to uncertainty in the answers, making it hard to utilize the specific knowledge of LLMs (like ChatGPT). To alleviate this, we use an interpretable structure to explain the prompt learning principle in LLMs, which demonstrates that the effectiveness of language models is determined by position changes of task-related tokens. Therefore, we propose MTPrompt, a multi-dimensional task prompt learning method based on task-related object, summary, and task-description information. By automatically building and searching for appropriate prompts, MTPrompt achieves the best results in few-shot settings on five different datasets. In addition, we demonstrate the effectiveness and stability of our method in different experimental settings and ablation experiments. In interaction with large language models, embedding more task-related information into prompts makes it easier to stimulate the knowledge embedded in LLMs.

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.