Week Ending 3.10.2024

 

RESEARCH WATCH: 3.10.2024

 

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

Large language models continue growing, straining memory limits during inference. GEAR compresses the key-value cache required for efficient generation while maintaining near-lossless accuracy. By quantizing most entries to low precision, approximating quantization error with low-rank matrices, and correcting outliers with sparse matrices, GEAR achieves high compression ratios. This could enable deploying larger LLMs on resource-constrained devices.

Authors:  Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao

Link:  https://arxiv.org/abs/2403.05527v1

Date: 2024-03-08

Summary:

Key-value (KV) caching has become the de-facto standard for accelerating generation speed in large language model (LLM) inference. However, the growing cache demand with increasing sequence length has turned LLM inference into a memory-bound problem, significantly constraining system throughput. Existing methods rely on dropping unimportant tokens or quantizing all entries uniformly. Such methods, however, often incur high approximation errors when representing the compressed matrices. The autoregressive decoding process further compounds the error at each step, resulting in critical deviations in model generation and deterioration of performance. To tackle this challenge, we propose GEAR, an efficient KV cache compression framework that achieves near-lossless high-ratio compression. GEAR first quantizes the majority of entries of similar magnitudes to ultra-low precision. It then employs a low-rank matrix to approximate the quantization error, and a sparse matrix to remedy individual errors from outlier entries. By adeptly integrating the three techniques, GEAR is able to fully exploit their synergistic potential. Our experiments demonstrate that, compared to alternatives, GEAR achieves near-lossless 4-bit KV cache compression with up to 2.38x throughput improvement, while reducing peak memory size by up to 2.29x. Our code is publicly available at https://github.com/HaoKang-Timmy/GEAR.
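
To make the recipe above concrete, here is a minimal NumPy sketch of the general decomposition the abstract describes: a KV-cache matrix is approximated as quantized values plus a low-rank approximation of the quantization error plus a sparse outlier correction. The bit width, rank, outlier fraction, and the ordering of the steps are illustrative assumptions rather than the authors' exact algorithm (see their repository for that).

    import numpy as np

    def gear_like_compress(D, bits=4, rank=4, outlier_frac=0.01):
        # 1) Pull the largest-magnitude entries into a sparse outlier matrix S.
        k = max(1, int(outlier_frac * D.size))
        thresh = np.partition(np.abs(D).ravel(), -k)[-k]
        S = np.where(np.abs(D) >= thresh, D, 0.0)
        rest = D - S

        # 2) Uniformly quantize the remaining entries to ultra-low precision.
        lo, hi = rest.min(), rest.max()
        scale = (hi - lo) / (2 ** bits - 1) or 1.0
        codes = np.round((rest - lo) / scale)          # integer codes
        dequant = codes * scale + lo

        # 3) Approximate the quantization error with a rank-r matrix via SVD.
        err = rest - dequant
        U, s, Vt = np.linalg.svd(err, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

        return dequant + L + S                         # compressed reconstruction

    # Reconstruction error should be well below that of plain 4-bit quantization.
    D = np.random.randn(128, 64).astype(np.float32)
    print(np.abs(D - gear_like_compress(D)).mean())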

--------------------------------------------------------------------------------------------------------

A Deep Learning Method for Classification of Biophilic Artworks

Exposure to nature and naturalistic art has mental health benefits. This work develops an AI system to automatically classify different biophilic characteristics like plants, water, and animals present in artwork. Such a system could aid art curators, scientists studying biophilia's impacts, and users searching for art with desired natural elements. It utilizes unsupervised techniques to extract low-dimensional image representations capturing relevant features.

Authors:  Purna Kar, Jordan J. Bird, Yangang Xing, Alexander Sumich, Andrew Knight, Ahmad Lotfi, Benedict Carpenter van Barthold

Link:  https://arxiv.org/abs/2403.05394v1

Date: 2024-03-08

Summary:

Biophilia is an innate love for living things and nature itself that has been associated with a positive impact on mental health and well-being. This study explores the application of deep learning methods for the classification of Biophilic artwork, in order to learn and explain the different Biophilic characteristics present in a visual representation of a painting. Using the concept of Biophilia, which postulates the deep connection of human beings with nature, we use an artificially intelligent algorithm to recognise the different patterns underlying the Biophilic features in an artwork. Our proposed method uses a lower-dimensional representation of an image and a decoder model to extract salient features of each Biophilic trait, such as plants, water bodies, seasons, animals, etc., based on learnt factors such as shape, texture, and illumination. The proposed classification model is capable of retrieving Biophilic artwork, which not only helps artists, collectors, and researchers interpret and exploit the effects of exposure to nature-inspired visual aesthetics on mental well-being, but also enables a methodical exploration of Biophilia and Biophilic artwork for aesthetic preferences. Using the proposed algorithms, we have also created a gallery of Biophilic collections comprising famous artworks from different European and American art galleries, which will soon be published on the Vieunite@ online community.
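
As a rough illustration of the general recipe (a low-dimensional representation learned with a reconstruction objective, from which Biophilic traits are classified), the PyTorch sketch below uses a toy convolutional autoencoder and a multi-label trait head. The trait list, module sizes, and losses are placeholder assumptions, not the authors' pipeline.

    import torch
    import torch.nn as nn

    TRAITS = ["plants", "water", "animals", "seasons"]       # illustrative traits

    encoder = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
                            nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU(),
                            nn.AdaptiveAvgPool2d(1), nn.Flatten())   # 32-d code
    decoder = nn.Linear(32, 3 * 64 * 64)                     # reconstruction head
    trait_head = nn.Linear(32, len(TRAITS))                  # multi-label classifier

    x = torch.randn(4, 3, 64, 64)                            # a batch of artworks
    code = encoder(x)                                        # low-dimensional representation
    recon_loss = nn.functional.mse_loss(decoder(code).view_as(x), x)
    trait_probs = torch.sigmoid(trait_head(code))            # one score per Biophilic trait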

--------------------------------------------------------------------------------------------------------

ADROIT6G: DAI-driven Open and Programmable Architecture for 6G Networks

6G networks must support revolutionary applications like holographic telepresence with stringent performance, reliability, and security demands. ADROIT6G proposes transformative 6G architectural innovations: distributed AI/ML optimization, fully cloud-native software with built-in security, and automated zero-touch operations. Realizing ADROIT6G's vision could enable 6G's ambitious capabilities for emerging immersive applications across the technology landscape.

Authors:  Christophoros Christophorou, Iacovos Ioannou, Vasos Vassiliou, Loizos Christofi, John S Vardakas, Erin E Seder, Carla Fabiana Chiasserini, Marius Iordache, Chaouki Ben Issaid, Ioannis Markopoulos, Giulio Franzese, Tanel Järvet, Christos Verikoukis

Link:  https://arxiv.org/abs/2403.05277v1

Date: 2024-03-08

Summary:

In the upcoming 6G era, mobile networks must deal with more challenging applications (e.g., holographic telepresence and immersive communication) and meet far more stringent application requirements arising along the edge-cloud continuum. These new applications will create an elevated level of expectations on performance, reliability, ubiquity, trustworthiness, security, openness, and sustainability, pushing the boundaries of innovation and driving transformational change across the architecture of future mobile networks. Towards this end, ADROIT6G proposes a set of disruptive innovations with a clear vision of a 6G network architecture that can be tailored to the requirements of innovative applications and match the ambitious KPIs set for 6G networks. More specifically, the key transformations that ADROIT6G considers essential to 6G network evolution are: i) AI/ML-powered optimisations across the network, exploring solutions in the "Distributed Artificial Intelligence (DAI)" domain for high performance and automation; ii) transforming to fully cloud-native network software, which can be implemented across various edge-cloud platforms, with security built integrally into the network user plane; and iii) software-driven, zero-touch operations and ultimately automation of every aspect of the network and the services it delivers.

--------------------------------------------------------------------------------------------------------

ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models

Thoroughly evaluating large language models' capabilities remains challenging. ERBench constructs benchmarks by converting relational databases into natural language questions that can be automatically verified against the ground truth data. This allows probing LLMs' reasoning skills on queries of arbitrary complexity spanning multiple joined database tables. ERBench aims to provide more comprehensive analysis of LLM hallucinations than static benchmarks.

Authors:  Jio Oh, Soyeon Kim, Junseok Seo, Jindong Wang, Ruochen Xu, Xing Xie, Steven Euijong Whang

Link:  https://arxiv.org/abs/2403.05266v1

Date: 2024-03-08

Summary:

Large language models (LLMs) have achieved unprecedented performance in various applications, yet their evaluation remains a critical issue. Existing hallucination benchmarks are either static or lack adjustable complexity for thorough analysis. We contend that utilizing existing relational databases is a promising approach for constructing benchmarks due to their accurate knowledge description via functional dependencies. We propose ERBench to automatically convert any relational database into a benchmark based on the entity-relationship (ER) model. Our key idea is to construct questions using the database schema, records, and functional dependencies such that they can be automatically verified. In addition, we use foreign key constraints to join relations and construct multi-hop questions, which can be arbitrarily complex and used to debug the intermediate answers of LLMs. Finally, ERBench supports continuous evaluation, multimodal questions, and various prompt engineering techniques. In our experiments, we construct an LLM benchmark using databases of multiple domains and make an extensive comparison of contemporary LLMs. We observe that better LLMs like GPT-4 can handle a larger variety of question types, but are by no means perfect. Also, correct answers do not necessarily imply correct rationales, an aspect that ERBench evaluates better than other benchmarks across various question types. Code is available at https://github.com/DILAB-KAIST/ERBench.
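
The construction is straightforward to sketch. Below is a toy Python example of how a functional dependency yields an automatically verifiable single-hop question and how a foreign-key join yields a multi-hop one; the movie/director schema and the substring-matching verifier are illustrative assumptions, not ERBench's actual databases or checker.

    # Hypothetical schema: Movie(title -> director), Director(name -> birth_year),
    # with Movie.director a foreign key into Director.name.
    movies = {"Inception": {"director": "Christopher Nolan"}}
    directors = {"Christopher Nolan": {"birth_year": 1970}}

    def single_hop(title):
        # The functional dependency title -> director fixes the gold answer.
        return f"Who directed the movie {title}?", movies[title]["director"]

    def multi_hop(title):
        # Join Movie with Director through the foreign key to raise complexity.
        director = movies[title]["director"]
        return (f"In which year was the director of {title} born?",
                directors[director]["birth_year"])

    def verify(llm_answer, gold):
        # Automatic verification: does the gold value appear in the response?
        return str(gold).lower() in llm_answer.lower()

    question, gold = multi_hop("Inception")
    print(question, verify("I believe it was 1970.", gold))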

--------------------------------------------------------------------------------------------------------

MarkupLens: An AI-Powered Tool to Support Designers in Video-Based Analysis at Scale

Video is increasingly used in design research to study user interactions, but manually annotating large video datasets is labor-intensive. MarkupLens integrates state-of-the-art computer vision to semi-automate annotation, improving annotation quality, designer productivity, and user experience. Such AI-assistance could enhance design practices relying on video data at scale across industries.

Authors:  Tianhao He, Ying Zhang, Evangelos Niforatos, Gerd Kortuem

Link:  https://arxiv.org/abs/2403.05201v1

Date: 2024-03-08

Summary:

Video-Based Design (VBD) is a design methodology that utilizes video as a primary tool for understanding user interactions, prototyping, and conducting research to enhance the design process. Artificial Intelligence (AI) can be instrumental in video-based design by analyzing and interpreting visual data from videos to enhance user interaction, automate design processes, and improve product functionality. In this study, we explore how AI can enhance professional video-based design with a State-of-the-Art (SOTA) deep learning model. We developed a prototype annotation platform (MarkupLens) and conducted a between-subjects eye-tracking study with 36 designers, annotating videos with three levels of AI assistance. Our findings indicate that MarkupLens improved design annotation quality and productivity. Additionally, it reduced the cognitive load that designers exhibited and enhanced their User Experience (UX). We believe that designer-AI collaboration can greatly enhance the process of eliciting insights in video-based design.

--------------------------------------------------------------------------------------------------------

Tracing the Roots of Facts in Multilingual Language Models: Independent, Shared, and Transferred Knowledge

How multilingual LMs represent factual knowledge across languages impacts their reliability. This work analyzes multilingual BERT to identify patterns of shared representation versus cross-lingual transfer of facts from Wikipedia training data. Understanding these patterns could guide techniques for more consistent multilingual factual knowledge in LLMs, critical for trustworthy multilingual applications.

Authors:  Xin Zhao, Naoki Yoshinaga, Daisuke Oba

Link:  https://arxiv.org/abs/2403.05189v1

Date: 2024-03-08

Summary:

Acquiring factual knowledge for language models (LMs) in low-resource languages poses a serious challenge, so multilingual LMs (ML-LMs) often resort to cross-lingual transfer. In this study, we ask how ML-LMs acquire and represent factual knowledge. Using the multilingual factual knowledge probing dataset, mLAMA, we first conducted a neuron investigation of ML-LMs (specifically, multilingual BERT). We then traced the roots of facts back to the knowledge source (Wikipedia) to identify the ways in which ML-LMs acquire specific facts. We finally identified three patterns of acquiring and representing facts in ML-LMs: language-independent, cross-lingual shared, and cross-lingual transferred, and devised methods for differentiating them. Our findings highlight the challenge of maintaining consistent factual knowledge across languages, underscoring the need for better fact representation learning in ML-LMs.
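
The kind of probing the study builds on can be reproduced with a standard fill-mask pipeline. The snippet below, a small illustration rather than the paper's mLAMA setup, queries multilingual BERT with the same fact expressed in two languages and compares the predicted object.

    from transformers import pipeline

    # Fill-mask probing of multilingual BERT (downloads the model on first use).
    fill = pipeline("fill-mask", model="bert-base-multilingual-cased")
    prompts = ["Paris is the capital of [MASK].",
               "Paris est la capitale de la [MASK]."]      # same fact, two languages
    for prompt in prompts:
        print(prompt, "->", fill(prompt)[0]["token_str"])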

--------------------------------------------------------------------------------------------------------

On Protecting the Data Privacy of Large Language Models (LLMs): A Survey

While large language models enable powerful capabilities, exposure of their training data poses serious privacy risks. This survey comprehensively overviews the privacy vulnerabilities, attacks, and defenses related to LLMs. Mitigating these privacy issues is crucial for responsible development and deployment of LLMs in sensitive domains like healthcare and finance.

Authors:  Biwei Yan, Kun Li, Minghui Xu, Yueyan Dong, Yue Zhang, Zhaochun Ren, Xiuzheng Cheng

Link:  https://arxiv.org/abs/2403.05156v1

Date: 2024-03-08

Summary:

Large language models (LLMs) are complex artificial intelligence systems capable of understanding, generating and translating human language. They learn language patterns by analyzing large amounts of text data, allowing them to perform writing, conversation, summarizing and other language tasks. When LLMs process and generate large amounts of data, there is a risk of leaking sensitive information, which may threaten data privacy. This paper concentrates on elucidating the data privacy concerns associated with LLMs to foster a comprehensive understanding. Specifically, a thorough investigation is undertaken to delineate the spectrum of data privacy threats, encompassing both passive privacy leakage and active privacy attacks within LLMs. Subsequently, we conduct an assessment of the privacy protection mechanisms employed by LLMs at various stages, followed by a detailed examination of their efficacy and constraints. Finally, the discourse extends to delineate the challenges encountered and outline prospective directions for advancement in the realm of LLM privacy protection.

--------------------------------------------------------------------------------------------------------

ChatUIE: Exploring Chat-based Unified Information Extraction using Large Language Models

Though impressive at general conversation, large language models struggle with structured information extraction beyond simple prompting. ChatUIE proposes an approach to improve LLMs' capabilities on this task by combining chatbot-style interactions with techniques like reinforcement learning and generation constraints. Enhanced domain-specific information extraction could unlock new LLM applications.

Authors:  Jun Xu, Mengshu Sun, Zhiqiang Zhang, Jun Zhou

Link:  https://arxiv.org/abs/2403.05132v1

Date: 2024-03-08

Summary:

Recent advancements in large language models have shown impressive performance in general chat. However, their domain-specific capabilities, particularly in information extraction, have certain limitations. Extracting structured information from natural language that deviates from known schemas or instructions has proven challenging for previous prompt-based methods. This motivated us to explore domain-specific modeling in chat-based language models as a solution for extracting structured information from natural language. In this paper, we present ChatUIE, an innovative unified information extraction framework built upon ChatGLM. Simultaneously, reinforcement learning is employed to improve and align various tasks that involve confusing and limited samples. Furthermore, we integrate generation constraints to address the issue of generating elements that are not present in the input. Our experimental results demonstrate that ChatUIE can significantly improve the performance of information extraction with a slight decrease in chatting ability.

--------------------------------------------------------------------------------------------------------

Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

Multimedia datasets inevitably contain some incorrectly matched pairs across modalities like text and images. This work introduces a framework to robustly train cross-modal retrieval models by automatically remapping mismatched pairs based on potential cross-modal similarities, enhancing performance over simply downweighting errors. Such robustness could improve multimedia retrieval applications.

Authors:  Haochen Han, Qinghua Zheng, Guang Dai, Minnan Luo, Jingdong Wang

Link:  https://arxiv.org/abs/2403.05105v1

Date: 2024-03-08

Summary:

Collecting well-matched multimedia datasets is crucial for training cross-modal retrieval models. However, in real-world scenarios, massive multimodal data are harvested from the Internet, which inevitably contains Partially Mismatched Pairs (PMPs). Undoubtedly, such semantically irrelevant data will remarkably harm cross-modal retrieval performance. Previous efforts tend to mitigate this problem by estimating a soft correspondence to down-weight the contribution of PMPs. In this paper, we aim to address this challenge from a new perspective: the potential semantic similarity among unpaired samples makes it possible to excavate useful knowledge from mismatched pairs. To achieve this, we propose L2RM, a general framework based on Optimal Transport (OT) that learns to rematch mismatched pairs. In detail, L2RM aims to generate refined alignments by seeking a minimal-cost transport plan across different modalities. To formalize the rematching idea in OT, we first propose a self-supervised cost function that automatically learns an explicit similarity-to-cost mapping. Second, we model a partial OT problem while restricting the transport among false positives to further improve the refined alignments. Extensive experiments on three benchmarks demonstrate that L2RM significantly improves the robustness of existing models against PMPs. The code is available at https://github.com/hhc1997/L2RM.
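
To give a feel for OT-based rematching, here is a minimal NumPy sketch: a cosine-distance cost between image and text embeddings is fed to a plain entropic Sinkhorn solver, and the resulting transport plan is read off as refined alignments. The uniform marginals, the cosine cost, and the full (rather than partial) OT problem are simplifying assumptions; L2RM's self-supervised cost function and false-positive restriction are described in the paper.

    import numpy as np

    def sinkhorn(cost, reg=0.05, n_iters=200):
        # Entropic OT between uniform marginals (a bare-bones Sinkhorn loop).
        n, m = cost.shape
        K = np.exp(-cost / reg)
        a, b = np.ones(n) / n, np.ones(m) / m
        u, v = np.ones(n), np.ones(m)
        for _ in range(n_iters):
            u = a / (K @ v)
            v = b / (K.T @ u)
        return u[:, None] * K * v[None, :]              # transport plan

    # Toy data: image/text embeddings, some of which were paired incorrectly.
    rng = np.random.default_rng(0)
    img = rng.standard_normal((8, 32)); txt = rng.standard_normal((8, 32))
    img /= np.linalg.norm(img, axis=1, keepdims=True)
    txt /= np.linalg.norm(txt, axis=1, keepdims=True)
    plan = sinkhorn(1.0 - img @ txt.T)
    print(plan.argmax(axis=1))                          # refined image-to-text matches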

--------------------------------------------------------------------------------------------------------

How Culture Shapes What People Want From AI

For AI systems to be broadly adopted, accounting for diverse cultural perspectives is critical. This research studies how cultural models of independence/interdependence influence desired capabilities for hypothetical AI assistants. Incorporating such insights could guide developing more culturally-responsive and inclusive AI aligning with global stakeholders' needs.

Authors:  Xiao Ge, Chunchen Xu, Daigo Misaki, Hazel Rose Markus, Jeanne L Tsai

Link:  https://arxiv.org/abs/2403.05104v1

Date: 2024-03-08

Summary:

There is an urgent need to incorporate the perspectives of culturally diverse groups into AI developments. We present a novel conceptual framework for research that aims to expand, reimagine, and reground mainstream visions of AI using independent and interdependent cultural models of the self and the environment. Two survey studies support this framework and provide preliminary evidence that people apply their cultural models when imagining their ideal AI. Compared with European American respondents, Chinese respondents viewed it as less important to control AI and more important to connect with AI, and were more likely to prefer AI with capacities to influence. Reflecting both cultural models, findings from African American respondents resembled both European American and Chinese respondents. We discuss study limitations and future directions and highlight the need to develop culturally responsive and relevant AI to serve a broader segment of the world population.

--------------------------------------------------------------------------------------------------------

Aligning Large Language Models for Controllable Recommendations

Large language models show promise for developing the next generation of recommender systems that can converse naturally, provide explanations, and follow user preferences. However, current approaches focus mainly on improving accuracy by integrating domain knowledge, neglecting the ability to adhere to instructions. This work proposes techniques to enhance LLMs' skills in following recommendation-specific instructions through supervised tasks and reinforcement learning-based alignment. Enabling controllable recommendations could allow more personalized and user-aligned recommendation experiences.

Authors:  Wensheng Lu, Jianxun Lian, Wei Zhang, Guanghua Li, Mingyang Zhou, Hao Liao, Xing Xie

Link:  https://arxiv.org/abs/2403.05063v1

Date: 2024-03-08

Summary:

Inspired by the exceptional general intelligence of Large Language Models (LLMs), researchers have begun to explore their application in pioneering the next generation of recommender systems - systems that are conversational, explainable, and controllable. However, existing literature primarily concentrates on integrating domain-specific knowledge into LLMs to enhance accuracy, often neglecting the ability to follow instructions. To address this gap, we initially introduce a collection of supervised learning tasks, augmented with labels derived from a conventional recommender model, aimed at explicitly improving LLMs' proficiency in adhering to recommendation-specific instructions. Subsequently, we develop a reinforcement learning-based alignment procedure to further strengthen LLMs' aptitude in responding to users' intentions and mitigating formatting errors. Through extensive experiments on two real-world datasets, our method markedly advances the capability of LLMs to comply with instructions within recommender systems, while sustaining a high level of accuracy performance.

--------------------------------------------------------------------------------------------------------

Automatic and Universal Prompt Injection Attacks against Large Language Models

Though adept at following instructions, large language models are vulnerable to prompt injection attacks that manipulate their outputs maliciously. This paper presents a unified attack framework and an automated gradient-based method for crafting highly effective universal attack prompts, even against defenses. As LLMs become more widespread, understanding and mitigating such attacks is crucial to preventing misuse and maintaining integrity in language model applications.

Authors:  Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao

Link:  https://arxiv.org/abs/2403.04957v1

Date: 2024-03-07

Summary:

Large Language Models (LLMs) excel in processing and generating human language, powered by their ability to interpret and follow instructions. However, their capabilities can be exploited through prompt injection attacks. These attacks manipulate LLM-integrated applications into producing responses aligned with the attacker's injected content, deviating from the user's actual requests. The substantial risks posed by these attacks underscore the need for a thorough understanding of the threats. Yet, research in this area faces challenges due to the lack of a unified goal for such attacks and their reliance on manually crafted prompts, complicating comprehensive assessments of prompt injection robustness. We introduce a unified framework for understanding the objectives of prompt injection attacks and present an automated gradient-based method for generating highly effective and universal prompt injection data, even in the face of defensive measures. With only five training samples (0.3% relative to the test data), our attack can achieve superior performance compared with baselines. Our findings emphasize the importance of gradient-based testing, which can avoid overestimation of robustness, especially for defense mechanisms.

--------------------------------------------------------------------------------------------------------

Fooling Neural Networks for Motion Forecasting via Adversarial Attacks

Human motion prediction is vital for applications like autonomous vehicles, but existing models can be fooled by small adversarial perturbations, as this work demonstrates across architectures. It also finds that models are sensitive to simple 3D transformations, such as rotations and translations, that do not alter joint distances. As with image models, addressing such vulnerabilities is key for reliable and robust motion forecasting in safety-critical domains.

Authors:  Edgar Medina, Leyong Loh

Link:  https://arxiv.org/abs/2403.04954v1

Date: 2024-03-07

Summary:

Human motion prediction is still an open problem, which is extremely important for autonomous driving and safety applications. Although there are great advances in this area, the widely studied topic of adversarial attacks has not been applied to multi-regression models such as GCNs and MLP-based architectures in human motion prediction. This work intends to reduce this gap using extensive quantitative and qualitative experiments in state-of-the-art architectures similar to the initial stages of adversarial attacks in image classification. The results suggest that models are susceptible to attacks even on low levels of perturbation. We also show experiments with 3D transformations that affect the model performance, in particular, we show that most models are sensitive to simple rotations and translations which do not alter joint distances. We conclude that similar to earlier CNN models, motion forecasting tasks are susceptible to small perturbations and simple 3D transformations.
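
For readers unfamiliar with how such perturbations are produced, the sketch below applies a single FGSM-style step to a pose sequence fed to a dummy forecaster; the model, joint count, and sequence lengths are placeholders, and the paper's attacks and evaluation protocol are more elaborate.

    import torch

    def fgsm_perturb(model, poses, target, eps=0.01):
        # One gradient-sign step that increases the forecasting loss.
        poses = poses.clone().requires_grad_(True)
        loss = torch.nn.functional.mse_loss(model(poses), target)
        loss.backward()
        return (poses + eps * poses.grad.sign()).detach()

    # Dummy forecaster: 50 observed frames of 22 3-D joints -> 25 future frames.
    linear = torch.nn.Linear(50 * 22 * 3, 25 * 22 * 3)
    forecaster = lambda x: linear(x.flatten(1)).view(-1, 25, 22, 3)
    x, y = torch.randn(2, 50, 22, 3), torch.randn(2, 25, 22, 3)
    x_adv = fgsm_perturb(forecaster, x, y)              # perturbed input sequence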

--------------------------------------------------------------------------------------------------------

A Survey on Human-AI Teaming with Large Pre-Trained Models

Integrating large pre-trained AI models with human intelligence, known as human-AI teaming, holds transformative potential for enhancing decision-making and problem-solving. This survey explores how large language models can augment human capabilities beyond traditional AI, examining synergies in model improvement, effective teaming processes, ethical considerations, and broad applications. Understanding such human-AI collaboration is vital for responsibly harnessing these cutting-edge models' full potential across sectors.

Authors:  Vanshika Vats, Marzia Binta Nizam, Minghao Liu, Ziyuan Wang, Richard Ho, Mohnish Sai Prasad, Vincent Titterton, Sai Venkat Malreddy, Riya Aggarwal, Yanwen Xu, Lei Ding, Jay Mehta, Nathan Grinnell, Li Liu, Sijia Zhong, Devanathan Nallur Gandamani, Xinyi Tang, Rohan Ghosalkar, Celeste Shen, Rachel Shen, Nafisa Hussain, Kesav Ravichandran, James Davis

Link:  https://arxiv.org/abs/2403.04931v1

Date: 2024-03-07

Summary:

In the rapidly evolving landscape of artificial intelligence (AI), the collaboration between human intelligence and AI systems, known as Human-AI (HAI) Teaming, has emerged as a cornerstone for advancing problem-solving and decision-making processes. The advent of Large Pre-trained Models (LPtM) has significantly transformed this landscape, offering unprecedented capabilities by leveraging vast amounts of data to understand and predict complex patterns. This paper surveys the pivotal integration of LPtMs with HAI, emphasizing how these models enhance collaborative intelligence beyond traditional approaches. It examines the synergistic potential of LPtMs in augmenting human capabilities, discussing this collaboration for AI model improvements, effective teaming, ethical considerations, and their broad applied implications in various sectors. Through this exploration, the study sheds light on the transformative impact of LPtM-enhanced HAI Teaming, providing insights for future research, policy development, and strategic implementations aimed at harnessing the full potential of this collaboration for research and societal benefit.

--------------------------------------------------------------------------------------------------------

On the Markov Property of Neural Algorithmic Reasoning: Analyses and Methods

Neural algorithmic reasoning aims to emulate algorithmic execution in neural networks, but commonly uses historical information that contradicts the inherent Markov property of these tasks. This work proposes ForgetNet, avoiding historical embeddings to align with the Markov nature, and G-ForgetNet with a gating mechanism for selective history integration during training. Extensive evaluations demonstrate their improved generalization over existing methods, advancing more theoretically-grounded neural algorithmic reasoning.

Authors:  Montgomery Bohde, Meng Liu, Alexandra Saxton, Shuiwang Ji

Link:  https://arxiv.org/abs/2403.04929v1

Date: 2024-03-07

Summary:

Neural algorithmic reasoning is an emerging research direction that endows neural networks with the ability to mimic algorithmic executions step-by-step. A common paradigm in existing designs involves the use of historical embeddings in predicting the results of future execution steps. Our observation in this work is that such historical dependence intrinsically contradicts the Markov nature of algorithmic reasoning tasks. Based on this motivation, we present our ForgetNet, which does not use historical embeddings and thus is consistent with the Markov nature of the tasks. To address challenges in training ForgetNet at early stages, we further introduce G-ForgetNet, which uses a gating mechanism to allow for the selective integration of historical embeddings. Such an enhanced capability provides valuable computational pathways during the model's early training phase. Our extensive experiments, based on the CLRS-30 algorithmic reasoning benchmark, demonstrate that both ForgetNet and G-ForgetNet achieve better generalization capability than existing methods. Furthermore, we investigate the behavior of the gating mechanism, highlighting its degree of alignment with our intuitions and its effectiveness for robust performance.
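
The distinction the abstract draws can be pictured with a toy execution step. In the sketch below, use_history=False corresponds to the strictly Markov, ForgetNet-style step that sees only the current state, while the sigmoid gate loosely mimics how G-ForgetNet re-admits history selectively; the linear "processor" is a stand-in for an actual CLRS-style GNN processor.

    import torch
    import torch.nn as nn

    class ExecutionStep(nn.Module):
        def __init__(self, dim, use_history=True):
            super().__init__()
            self.processor = nn.Linear(2 * dim, dim)            # stand-in processor
            self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
            self.use_history = use_history

        def forward(self, current, history):
            if self.use_history:
                history = self.gate(current) * history          # gated history (G-ForgetNet-like)
            else:
                history = torch.zeros_like(history)             # strict Markov step (ForgetNet-like)
            return self.processor(torch.cat([current, history], dim=-1))

    step = ExecutionStep(dim=16, use_history=False)
    print(step(torch.randn(4, 16), torch.randn(4, 16)).shape)   # torch.Size([4, 16])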

--------------------------------------------------------------------------------------------------------

A Safe Harbor for AI Evaluation and Red Teaming

Responsible development of powerful generative AI necessitates rigorous independent evaluation and security testing. However, companies' terms of service and aggressive anti-misuse enforcement can disincentivize legitimate safety research for fear of account suspension or legal repercussions. The authors advocate AI firms provide a "safe harbor" protecting public interest research from such threats to enable more open, inclusive scrutiny of AI risks.

Authors:  Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

Link:  https://arxiv.org/abs/2403.04893v1

Date: 2024-03-07

Summary:

Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems. However, the terms of service and enforcement strategies used by prominent AI companies to deter model misuse also disincentivize good-faith safety evaluations. This causes some researchers to fear that conducting such research or releasing their findings will result in account suspensions or legal reprisal. Although some companies offer researcher access programs, they are an inadequate substitute for independent research access, as they have limited community representation, receive inadequate funding, and lack independence from corporate incentives. We propose that major AI developers commit to providing a legal and technical safe harbor, indemnifying public interest safety research and protecting it from the threat of account suspensions or legal reprisal. These proposals emerged from our collective experience conducting safety, privacy, and trustworthiness research on generative AI systems, where norms and incentives could be better aligned with public interests, without exacerbating model misuse. We believe these commitments are a necessary step towards more inclusive and unimpeded community efforts to tackle the risks of generative AI.

--------------------------------------------------------------------------------------------------------

A Modular End-to-End Multimodal Learning Method for Structured and Unstructured Data

While multimodal learning has advanced for unstructured data like images and text, structured data like tables and time series require more attention. This work proposes MAGNUM, a modular multimodal framework flexibly integrating specialized modules to jointly learn representations across any combination of structured and unstructured modalities. Such holistic multimodal learning could benefit numerous industry applications involving diverse data types.

Authors:  Marco D Alessandro, Enrique Calabrés, Mikel Elkano

Link:  https://arxiv.org/abs/2403.04866v1

Date: 2024-03-07

Summary:

Multimodal learning is a rapidly growing research field that has revolutionized multitasking and generative modeling in AI. While much of the research has focused on dealing with unstructured data (e.g., language, images, audio, or video), structured data (e.g., tabular data, time series, or signals) has received less attention. However, many industry-relevant use cases involve, or can benefit from, both types of data. In this work, we propose a modular, end-to-end multimodal learning method called MAGNUM, which can natively handle both structured and unstructured data. MAGNUM is flexible enough to employ any specialized unimodal module to extract, compress, and fuse information from all available modalities.
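
The modular idea, one specialized encoder per modality feeding a shared fusion layer, can be sketched in a few lines. The encoders, feature sizes, and concatenation-based fusion below are placeholder assumptions for illustration; MAGNUM's actual extraction, compression, and fusion modules are described in the paper.

    import torch
    import torch.nn as nn

    # One encoder per modality, structured or unstructured alike.
    encoders = nn.ModuleDict({
        "text":    nn.Sequential(nn.Linear(300, 64), nn.ReLU()),   # e.g. pooled text embedding
        "tabular": nn.Sequential(nn.Linear(12, 64), nn.ReLU()),    # structured features
        "series":  nn.Sequential(nn.Flatten(), nn.Linear(24, 64), nn.ReLU()),  # short time series
    })
    fusion = nn.Linear(3 * 64, 64)                                 # joint representation

    batch = {"text": torch.randn(8, 300),
             "tabular": torch.randn(8, 12),
             "series": torch.randn(8, 1, 24)}
    fused = fusion(torch.cat([encoders[m](batch[m]) for m in encoders], dim=-1))
    print(fused.shape)                                             # torch.Size([8, 64])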

--------------------------------------------------------------------------------------------------------

Beyond Multiple Instance Learning: Full Resolution All-In-Memory End-To-End Pathology Slide Modeling

Training AI models on high-resolution gigapixel pathology slides is computationally challenging, so current methods divide slides into small tiles, introducing discontinuities. This novel approach instead trains tile encoders and slide aggregators jointly end-to-end from entire slides, bridging input and supervision. While expensive, quantitative results show promise for pretraining powerful pathology foundation models on such massive datasets, potentially transforming clinical AI.

Authors:  Gabriele Campanella, Eugene Fluder, Jennifer Zeng, Chad Vanderbilt, Thomas J. Fuchs

Link:  https://arxiv.org/abs/2403.04865v1

Date: 2024-03-07

Summary:

Artificial Intelligence (AI) has great potential to improve health outcomes by training systems on vast digitized clinical datasets. Computational Pathology, with its massive amounts of microscopy image data and impact on diagnostics and biomarkers, is at the forefront of this development. Gigapixel pathology slides pose a unique challenge due to their enormous size and are usually divided into tens of thousands of smaller tiles for analysis. This results in a discontinuity in the machine learning process by separating the training of tile-level encoders from slide-level aggregators and the need to adopt weakly supervised learning strategies. Training models from entire pathology slides end-to-end has been largely unexplored due to its computational challenges. To overcome this problem, we propose a novel approach to jointly train both a tile encoder and a slide-aggregator fully in memory and end-to-end at high-resolution, bridging the gap between input and slide-level supervision. While more computationally expensive, detailed quantitative validation shows promise for large-scale pre-training of pathology foundation models.
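
A toy version of the end-to-end idea is shown below: gradients from the slide-level label flow through an attention-style aggregator back into the tile encoder, rather than training the two in separate stages. Real slides contain tens of thousands of tiles, which is exactly the memory problem the paper tackles; the tiny encoder and 16-tile "slide" here are illustrative only.

    import torch
    import torch.nn as nn

    class SlideModel(nn.Module):
        def __init__(self, dim=64, n_classes=2):
            super().__init__()
            self.tile_encoder = nn.Sequential(
                nn.Conv2d(3, 8, 3, 2, 1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, dim))
            self.attn = nn.Linear(dim, 1)                  # attention-style aggregator
            self.head = nn.Linear(dim, n_classes)

        def forward(self, tiles):                          # tiles: (n_tiles, 3, H, W)
            feats = self.tile_encoder(tiles)               # tile embeddings (trainable)
            weights = torch.softmax(self.attn(feats), dim=0)
            slide = (weights * feats).sum(dim=0)           # slide-level embedding
            return self.head(slide)

    model = SlideModel()
    logits = model(torch.randn(16, 3, 32, 32))             # one toy "slide" of 16 tiles
    loss = nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([1]))
    loss.backward()                                        # supervision reaches the tile encoder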

--------------------------------------------------------------------------------------------------------

Self-Supervision in Time for Satellite Images (S3-TSS): A novel method of SSL technique in Satellite images

With limited labeled satellite data across conditions, self-supervised techniques that avoid hand-crafted augmentation are valuable. S3-TSS leverages the natural augmentation provided by satellite imagery's high temporal frequency, outperforming baselines. This self-supervised representation learning could enhance remote sensing tasks when annotations are scarce.

Authors:  Akansh Maurya, Hewan Shrestha, Mohammad Munem Shahriar

Link:  https://arxiv.org/abs/2403.04859v1

Date: 2024-03-07

Summary:

With the limited availability of labeled data covering various atmospheric conditions in remote sensing images, it seems useful to work with self-supervised algorithms. A few pretext-based algorithms, including rotation, spatial context, and jigsaw puzzles, are not appropriate for satellite images. Satellite images, however, often have a high temporal frequency, so the temporal dimension of remote sensing data provides natural augmentation without requiring us to create artificial augmentations of images. Here, we propose S3-TSS, a novel self-supervised learning technique that leverages the natural augmentation occurring in the temporal dimension. We compare our results with current state-of-the-art methods and perform various experiments, observing that our method performs better than the SeCo baseline on four downstream datasets. Code for our work can be found here: https://github.com/hewanshrestha/Why-Self-Supervision-in-Time
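
The core intuition, that temporally adjacent acquisitions of the same location act as free augmentations, can be sketched with a standard contrastive objective. The sampling window, dummy encoder, and InfoNCE loss below are illustrative assumptions, not the exact S3-TSS training setup.

    import torch
    import torch.nn.functional as F

    def temporal_positive_pairs(series, max_gap=2):
        # series: (T, C, H, W) images of one location over time; the positive for
        # each anchor is a nearby acquisition, so no artificial augmentation is needed.
        T = series.shape[0]
        idx = torch.arange(T - max_gap)
        offset = torch.randint(1, max_gap + 1, (T - max_gap,))
        return series[idx], series[idx + offset]

    def info_nce(z1, z2, temp=0.1):
        # Standard InfoNCE over a batch of (anchor, positive) embeddings.
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temp
        return F.cross_entropy(logits, torch.arange(z1.shape[0]))

    series = torch.randn(12, 3, 64, 64)                   # toy time series of one tile
    anchors, positives = temporal_positive_pairs(series)
    encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 128))
    print(info_nce(encoder(anchors), encoder(positives)).item())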

--------------------------------------------------------------------------------------------------------

GNN-VPA: A Variance-Preserving Aggregation Strategy for Graph Neural Networks

Graph neural networks' ability to discriminate graphs depends critically on message aggregation and readout functions. This work proposes a variance-preserving aggregation (VPA) strategy based on signal propagation theory to maintain expressiveness while improving learning dynamics. Across domains, VPA enhances predictive performance of popular GNN architectures, paving the way for normalizer-free, self-normalizing models with theoretical grounding.

Authors:  Lisa Schneckenreiter, Richard Freinschlag, Florian Sestak, Johannes Brandstetter, Günter Klambauer, Andreas Mayr

Link:  https://arxiv.org/abs/2403.04747v1

Date: 2024-03-07

Summary:

Graph neural networks (GNNs), and especially message-passing neural networks, excel in various domains such as physics, drug discovery, and molecular modeling. The expressivity of GNNs with respect to their ability to discriminate non-isomorphic graphs critically depends on the functions employed for message aggregation and graph-level readout. By applying signal propagation theory, we propose a variance-preserving aggregation function (VPA) that maintains expressivity, but yields improved forward and backward dynamics. Experiments demonstrate that VPA leads to increased predictive performance for popular GNN architectures as well as improved learning dynamics. Our results could pave the way towards normalizer-free or self-normalizing GNNs.
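
The aggregation itself is a one-line change, which the toy comparison below illustrates: summing messages scales the variance with the node degree, averaging shrinks it, and dividing the sum by the square root of the degree keeps it roughly constant. This is a sketch of the variance-preserving idea under an i.i.d.-message assumption, not the paper's full GNN implementation.

    import torch

    def aggregate(messages, mode="vpa"):
        # messages: (num_neighbors, dim) incoming messages for one node.
        n = messages.shape[0]
        if mode == "sum":
            return messages.sum(dim=0)                    # variance grows with n
        if mode == "mean":
            return messages.mean(dim=0)                   # variance shrinks like 1/n
        return messages.sum(dim=0) / n ** 0.5             # variance-preserving

    msgs = torch.randn(1000, 64)                          # unit-variance toy messages
    for mode in ("sum", "mean", "vpa"):
        print(mode, aggregate(msgs, mode).var().item())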

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.