Week Ending 11.26.2023

 

RESEARCH WATCH: 11.26.2023

SPONSORED BY

Digimarc digital watermarks invisibly guard your digital assets to protect against misuse, prove copyright ownership, and verify authenticity. In an era of artificial intelligence, don’t leave your images and other digital content exposed. Demand superior content protection and maintain trust in your brand with Digimarc.

Check out Digimarc - https://www.digimarc.com/

 

Predicting Recovery or Decease of COVID-19 Patients with Clinical and RT-PCR Using Machine Learning Classification Algorithms

The COVID-19 prognosis paper investigates using machine learning on clinical data to predict patient recovery or death. Accurate prognosis tools could greatly assist healthcare decision-making, especially in a crisis like the pandemic.

Authors:  Mohammad Dehghani, Zahra Yazdanparast

Link:  https://arxiv.org/abs/2311.13925v1

Date: 2023-11-23

Summary:

The COVID-19 pandemic has disrupted the global economy and people's daily lives in unprecedented ways. To make appropriate decisions, it is necessary to diagnose COVID-19 rapidly and accurately. Clinical decision making is influenced by data collected from patients. With the aid of artificial intelligence, COVID-19 has been diagnosed quickly by analyzing symptoms, polymerase chain reaction (PCR), computed tomography scans, chest X-rays, routine laboratory blood tests and even cough sounds. Furthermore, these data can be used to predict a patient's mortality, although there is a question about which data makes the most accurate predictions. Therefore, this study consists of two parts. Our first objective is to examine whether machine learning algorithms can predict the outcome of COVID-19 cases (recovery or death) based on the features present in the dataset. In the second part of the research, we investigated the impact of clinical and RT-PCR data on prediction of recovery and decease to determine which is more reliable. We defined four stages with different feature sets and used six machine learning methods to build prediction models. With an accuracy of 78.7%, random forest showed promising results for predicting death and recovery of patients. Based on this, it appears that recovery and decease of patients are predictable using machine learning. For the second objective, results indicate that clinical data alone (without RT-PCR), trained with the AdaBoost algorithm, is the most accurate, with an accuracy of 82.1%. This study can provide guidance for medical professionals in the event of a crisis or outbreak similar to COVID-19.
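
To make the first objective concrete, here is a minimal sketch of outcome classification with a random forest, in the spirit of the paper's approach. The feature names and synthetic data are illustrative assumptions, not the authors' dataset.

```python
# Hypothetical sketch: predict recovery (0) vs. decease (1) from clinical-style
# features with a random forest. Features and labels are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(55, 15, n),  # age (years) -- assumed feature
    rng.normal(94, 4, n),   # oxygen saturation (%) -- assumed feature
    rng.normal(25, 6, n),   # RT-PCR cycle-threshold value -- assumed feature
])
y = (X[:, 1] < 92).astype(int)  # toy outcome label, for illustration only

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"test accuracy: {accuracy_score(y_te, clf.predict(X_te)):.3f}")
```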

--------------------------------------------------------------------------------------------------------

Algorithmic Transparency and Manipulation

The algorithmic transparency paper analyzes ethical concerns that transparency policies may enable manipulation. As algorithms influence more decisions, understanding the risks of manipulation will be critical.

Authors:  Michael Klenk

Link:  https://arxiv.org/abs/2311.13286v1

Date: 2023-11-22

Summary:

A series of recent papers raises worries about the manipulative potential of algorithmic transparency. But while the concern is apt and relevant, it is based on a fraught understanding of manipulation. Therefore, this paper draws attention to the indifference view of manipulation, which explains better than the vulnerability view why algorithmic transparency has manipulative potential. The paper also raises pertinent research questions for future studies of manipulation in the context of algorithmic transparency.

--------------------------------------------------------------------------------------------------------

DoubleAUG: Single-domain Generalized Object Detector in Urban via Color Perturbation and Dual-style Memory

The object detection paper proposes augmentation methods to improve model generalization across different weather conditions. This could significantly advance the ability of autonomous driving systems to handle diverse real-world environments.

Authors:  Lei Qi, Peng Dong, Tan Xiong, Hui Xue, Xin Geng

Link:  https://arxiv.org/abs/2311.13198v1

Date: 2023-11-22

Summary:

Object detection in urban scenarios is crucial for autonomous driving in intelligent traffic systems. However, unlike conventional object detection tasks, urban-scene images vary greatly in style. For example, images taken on sunny days differ significantly from those taken on rainy days. Therefore, models trained on sunny day images may not generalize well to rainy day images. In this paper, we aim to solve the single-domain generalizable object detection task in urban scenarios, meaning that a model trained on images from one weather condition should be able to perform well on images from any other weather conditions. To address this challenge, we propose a novel Double AUGmentation (DoubleAUG) method that includes image- and feature-level augmentation schemes. In the image-level augmentation, we consider the variation in color information across different weather conditions and propose a Color Perturbation (CP) method that randomly exchanges the RGB channels to generate various images. In the feature-level augmentation, we propose to utilize a Dual-Style Memory (DSM) to explore the diverse style information on the entire dataset, further enhancing the model's generalization capability. Extensive experiments demonstrate that our proposed method outperforms state-of-the-art methods. Furthermore, ablation studies confirm the effectiveness of each module in our proposed method. Moreover, our method is plug-and-play and can be integrated into existing methods to further improve model performance.
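
The Color Perturbation idea is simple enough to sketch directly: randomly exchanging the RGB channels yields color-shifted training views. This is a minimal interpretation of the abstract's description; the array convention (H x W x 3) is an assumption, and the Dual-Style Memory is not shown.

```python
# Minimal sketch of Color Perturbation (CP): randomly permute RGB channels
# to simulate color variation across weather conditions.
import numpy as np

def color_perturbation(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a view of `img` (H x W x 3) with its color channels permuted."""
    perm = rng.permutation(3)
    return img[..., perm]

rng = np.random.default_rng(42)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # dummy image
aug = color_perturbation(img, rng)
```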

--------------------------------------------------------------------------------------------------------

Building the Future of Responsible AI: A Pattern-Oriented Reference Architecture for Designing Large Language Model based Agents

The responsible AI agents paper proposes an architectural framework to guide development of autonomous systems powered by large language models. By establishing best practices early, it aims to enable trustworthy advancement as LLM-based planning/task execution capabilities approach human levels.

Authors:  Qinghua Lu, Liming Zhu, Xiwei Xu, Zhenchang Xing, Stefan Harrer, Jon Whittle

Link:  https://arxiv.org/abs/2311.13148v1

Date: 2023-11-22

Summary:

Large language models (LLMs) have been widely recognized as transformative technology due to their capabilities to understand and generate natural language text, including plans with some limited reasoning capabilities. LLM-based agents derive their autonomy from the capabilities of LLMs, which enable them to autonomously break down the given goal into a set of manageable tasks and orchestrate the task execution to fulfill the goal. Despite the huge efforts put into building LLM-based autonomous agents, the architecture design of the agents has not yet been systematically explored. Also, while there are significant benefits of using autonomous agents for planning and execution, there are serious considerations regarding responsible AI related software quality attributes, such as security and accountability. Therefore, this paper presents a pattern-oriented reference architecture that serves as architecture design guidelines and enables responsible-AI-by-design when designing LLM-based autonomous agents. We evaluate the completeness and utility of the proposed reference architecture by mapping it to the architecture of two real-world agents.

--------------------------------------------------------------------------------------------------------

Steering Responsible AI: A Case for Algorithmic Pluralism

The algorithmic pluralism paper examines governing AI systems through concepts like diversity and inclusiveness from media studies. Contrasting the transparency paradigm, algorithmic pluralism could better uphold democratic values as AI mediates society. Implemented responsibly, it may sustain multiplicity vital for healthy democracy.

Authors:  Stefaan G. Verhulst

Link:  https://arxiv.org/abs/2311.12010v1

Date: 2023-11-20

Summary:

In this paper, I examine questions surrounding AI neutrality through the prism of existing literature and scholarship about mediation and media pluralism. Such traditions, I argue, provide a valuable theoretical framework for how we should approach the (likely) impending era of AI mediation. In particular, I suggest examining further the notion of algorithmic pluralism. Contrasting this notion to the dominant idea of algorithmic transparency, I seek to describe what algorithmic pluralism may be, and present both its opportunities and challenges. Implemented thoughtfully and responsibly, I argue, algorithmic or AI pluralism has the potential to sustain the diversity, multiplicity, and inclusiveness that are so vital to democracy.

--------------------------------------------------------------------------------------------------------

Exploring Prompting Large Language Models as Explainable Metrics

The metrics prompting paper describes using prompt-based strategies to evaluate summarization systems with large language models. Experiments demonstrate promising potential for LLMs as explainable evaluation metrics in NLP. Achieving good correlation with human ratings, this approach could reduce the need for manual evaluation.

Authors:  Ghazaleh Mahmoudi

Link:  https://arxiv.org/abs/2311.11552v1

Date: 2023-11-20

Summary:

This paper describes the IUST NLP Lab submission to the Prompting Large Language Models as Explainable Metrics Shared Task at the Eval4NLP 2023 Workshop on Evaluation & Comparison of NLP Systems. We propose a zero-shot prompt-based strategy for explainable evaluation of the summarization task using Large Language Models (LLMs). Our experiments demonstrate the promising potential of LLMs as evaluation metrics in Natural Language Processing (NLP), particularly in the field of summarization. Both few-shot and zero-shot approaches are employed in these experiments. Our best prompts achieved a Kendall correlation of 0.477 with human evaluations in the text summarization task on the test data. Code and results are publicly available on GitHub.
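
A sketch of how such a metric might look, and how the reported agreement is computed: the prompt text and `score_summary` stub are placeholders, not the IUST submission's actual prompt; only the Kendall correlation step mirrors the paper's evaluation.

```python
# Hypothetical zero-shot prompt metric plus the Kendall-tau agreement check.
from scipy.stats import kendalltau

PROMPT = (
    "Rate the following summary from 0 (poor) to 100 (excellent) for "
    "coherence and consistency with the source.\n"
    "Source: {source}\nSummary: {summary}\nScore:"
)

def score_summary(source: str, summary: str) -> float:
    # Placeholder: send PROMPT.format(source=source, summary=summary)
    # to an LLM and parse the numeric score from its reply.
    raise NotImplementedError

# Given model scores and human ratings, agreement is one line:
model_scores = [72, 55, 90, 40]  # illustrative values
human_scores = [70, 60, 85, 30]
tau, _ = kendalltau(model_scores, human_scores)
print(f"Kendall tau: {tau:.3f}")
```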

--------------------------------------------------------------------------------------------------------

Minimizing Factual Inconsistency and Hallucination in Large Language Models

The factual inconsistency paper addresses incorrect or hallucinated responses from large language models. A multi-stage framework is proposed to improve faithfulness by generating and verifying rationales. Applied to drug-related inquiries, it improved the faithfulness and accuracy of LLMs by 14-42% across models and datasets, demonstrating its capability to enhance reliability.

Authors:  Muneeswaran I, Shreya Saxena, Siva Prasad, M V Sai Prakash, Advaith Shankar, Varun V, Vishal Vaddina, Saisubramaniam Gopalakrishnan

Link:  https://arxiv.org/abs/2311.13878v1

Date: 2023-11-23

Summary:

Large Language Models (LLMs) are widely used in critical fields such as healthcare, education, and finance due to their remarkable proficiency in various language-related tasks. However, LLMs are prone to generating factually incorrect responses or "hallucinations," which can lead to a loss of credibility and trust among users. To address this issue, we propose a multi-stage framework that generates rationales first, verifies and refines incorrect ones, and uses them as supporting references to generate the answer. The generated rationale enhances the transparency of the answer, and our framework provides insight into how the model arrived at the answer by tying the rationale to references in the context. In this paper, we demonstrate its effectiveness in improving the quality of responses to drug-related inquiries in the life sciences industry. Our framework improves traditional Retrieval Augmented Generation (RAG) by enabling OpenAI GPT-3.5-turbo to be 14-25% more faithful and 16-22% more accurate on two datasets. Furthermore, fine-tuning smaller open-access LLMs on samples generated with our framework improves their accuracy by 33-42%, competitive with RAG on commercial models.
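
A high-level sketch of the multi-stage flow described above: generate a rationale, verify and refine it against the retrieved context, then answer with the vetted rationale as support. The `llm` callable and prompt wording are placeholder assumptions, not the authors' implementation.

```python
# Hypothetical generate -> verify -> refine -> answer pipeline.
from typing import Callable, List

def answer_with_rationale(question: str, context: List[str],
                          llm: Callable[[str], str]) -> dict:
    ctx = "\n".join(context)
    rationale = llm(f"Context:\n{ctx}\nQuestion: {question}\n"
                    "Write a step-by-step rationale grounded in the context.")
    verdict = llm(f"Context:\n{ctx}\nRationale:\n{rationale}\n"
                  "Is every claim supported? Answer yes/no with corrections.")
    if verdict.strip().lower().startswith("no"):  # refine incorrect rationales
        rationale = llm("Revise the rationale using these corrections:\n"
                        f"{verdict}\nRationale:\n{rationale}")
    answer = llm(f"Context:\n{ctx}\nRationale:\n{rationale}\n"
                 f"Question: {question}\nAnswer concisely using the rationale.")
    return {"rationale": rationale, "answer": answer}
```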

--------------------------------------------------------------------------------------------------------

CoVOR-SLAM: Cooperative SLAM using Visual Odometry and Ranges for Multi-Robot Systems

The cooperative multi-robot SLAM paper presents an efficient collaborative mapping approach that enables robot teams to navigate environments. This has valuable real-world applications such as search-and-rescue operations.

Authors:  Young-Hee Lee, Chen Zhu, Thomas Wiedemann, Emanuel Staudinger, Siwei Zhang, Christoph Günther

Link:  https://arxiv.org/abs/2311.12580v1

Date: 2023-11-21

Summary:

A swarm of robots has advantages over a single robot, since it can explore larger areas much faster and is more robust to single-point failures. Accurate relative positioning is necessary to successfully carry out a collaborative mission without collisions. When Visual Simultaneous Localization and Mapping (VSLAM) is used to estimate the poses of each robot, inter-agent loop closing is widely applied to reduce the relative positioning errors. This technique can mitigate errors using the feature points commonly observed by different robots. However, it requires significant computing and communication capabilities to detect inter-agent loops, and to process the data transmitted by multiple agents. In this paper, we propose Collaborative SLAM using Visual Odometry and Range measurements (CoVOR-SLAM) to overcome this challenge. In the framework of CoVOR-SLAM, robots only need to exchange pose estimates, covariances (uncertainty) of the estimates, and range measurements between robots. Since CoVOR-SLAM does not require associating visual features and map points observed by different agents, the computational and communication loads are significantly reduced. The required range measurements can be obtained using pilot signals of the communication system, without requiring complex additional infrastructure. We tested CoVOR-SLAM using real images as well as real ultra-wideband-based ranges obtained with two rovers. In addition, CoVOR-SLAM is evaluated with a larger scale multi-agent setup exploiting public image datasets and ranges generated using a realistic simulation. The results show that CoVOR-SLAM can accurately estimate the robots' poses, requiring much less computational power and communication capabilities than the inter-agent loop closing technique.
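
The communication payload is the heart of the efficiency claim: agents exchange only pose estimates, their covariances, and inter-robot ranges. Below is a minimal sketch of that payload and a range residual an optimizer could consume; the actual pose-graph optimization is not shown, and the field layout is an assumption.

```python
# Hypothetical inter-robot message and a range residual for pose estimation.
from dataclasses import dataclass
import numpy as np

@dataclass
class AgentUpdate:
    robot_id: int
    pose: np.ndarray        # (x, y, z) position estimate
    covariance: np.ndarray  # 3x3 uncertainty of the estimate
    stamp: float            # measurement time

def range_residual(a: AgentUpdate, b: AgentUpdate, measured_range: float) -> float:
    """Measured inter-robot range minus the range implied by the estimates."""
    return measured_range - float(np.linalg.norm(a.pose - b.pose))
```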

--------------------------------------------------------------------------------------------------------

ALPHA: AnomaLous Physiological Health Assessment Using Large Language Models

The healthcare paper evaluates using LLMs to analyze physiological signals and assess anomalous health data. High performance suggests potential for personalized AI health assistants that provide users with meaningful health insights.

Authors:  Jiankai Tang, Kegang Wang, Hongming Hu, Xiyuxing Zhang, Peiyu Wang, Xin Liu, Yuntao Wang

Link:  https://arxiv.org/abs/2311.12524v1

Date: 2023-11-21

Summary:

This study concentrates on evaluating the efficacy of Large Language Models (LLMs) in healthcare, with a specific focus on their application in personal anomalous health monitoring. Our research primarily investigates the capabilities of LLMs in interpreting and analyzing physiological data obtained from FDA-approved devices. We conducted an extensive analysis using anomalous physiological data gathered in a simulated low-air-pressure plateau environment. This allowed us to assess the precision and reliability of LLMs in understanding and evaluating users' health status with notable specificity. Our findings reveal that LLMs exhibit exceptional performance in determining medical indicators, including a Mean Absolute Error (MAE) of less than 1 beat per minute for heart rate and less than 1% for oxygen saturation (SpO2). Furthermore, the Mean Absolute Percentage Error (MAPE) for these evaluations remained below 1%, with the overall accuracy of health assessments surpassing 85%. In image analysis tasks, such as interpreting photoplethysmography (PPG) data, our specially adapted GPT models demonstrated remarkable proficiency, achieving less than 1 bpm error in cycle count and 7.28 MAE for heart rate estimation. This study highlights LLMs' dual role as health data analysis tools and pivotal elements in advanced AI health assistants, offering personalized health insights and recommendations within the future health assistant framework.
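
For reference, the error metrics quoted above are straightforward to compute; the heart-rate values below are illustrative, not the study's data.

```python
# MAE and MAPE as used in the summary's reported results.
import numpy as np

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mape(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

hr_true = [62, 75, 88, 101]  # illustrative ground-truth heart rates (bpm)
hr_pred = [61, 76, 88, 100]  # illustrative model estimates
print(f"MAE:  {mae(hr_true, hr_pred):.2f} bpm")  # study reports < 1 bpm
print(f"MAPE: {mape(hr_true, hr_pred):.2f} %")   # study reports < 1%
```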

--------------------------------------------------------------------------------------------------------

InteraSSort: Interactive Assortment Planning Using Large Language Models

The interactive assortment planning paper introduces a framework for store managers to optimize inventory decisions through natural language conversations with LLMs. Retail operations could benefit from integrating such AI optimization tools.

Authors:  Saketh Reddy Karra, Theja Tulabandhula

Link:  https://arxiv.org/abs/2311.12241v1

Date: 2023-11-20

Summary:

Assortment planning, integral to multiple commercial offerings, is a key problem studied in e-commerce and retail settings. Numerous variants of the problem along with their integration into business solutions have been thoroughly investigated in the existing literature. However, the nuanced complexities of in-store planning and a lack of optimization proficiency among store planners with strong domain expertise remain largely overlooked. These challenges frequently necessitate collaborative efforts with multiple stakeholders which often lead to prolonged decision-making processes and significant delays. To mitigate these challenges and capitalize on the advancements of Large Language Models (LLMs), we propose an interactive assortment planning framework, InteraSSort that augments LLMs with optimization tools to assist store planners in making decisions through interactive conversations. Specifically, we develop a solution featuring a user-friendly interface that enables users to express their optimization objectives as input text prompts to InteraSSort and receive tailored optimized solutions as output. Our framework extends beyond basic functionality by enabling the inclusion of additional constraints through interactive conversation, facilitating precise and highly customized decision-making. Extensive experiments demonstrate the effectiveness of our framework and potential extensions to a broad range of operations management challenges.
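
As a stand-in for the optimization tools an LLM front-end like InteraSSort might invoke, here is a toy assortment selection under a shelf-capacity constraint (a 0/1 knapsack solved by dynamic programming). The products, revenues, and constraint are hypothetical; the paper's actual solver and choice model are not shown.

```python
# Toy assortment planning: maximize revenue subject to shelf capacity.
def plan_assortment(revenue, space, capacity):
    n = len(revenue)
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # skip product i-1
            if space[i - 1] <= c:        # or include it if it fits
                best[i][c] = max(best[i][c],
                                 best[i - 1][c - space[i - 1]] + revenue[i - 1])
    chosen, c = [], capacity             # backtrack to recover the selection
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= space[i - 1]
    return best[n][capacity], sorted(chosen)

value, items = plan_assortment(revenue=[9.0, 6.0, 5.0], space=[4, 3, 2], capacity=5)
print(value, items)  # 11.0 [1, 2]
```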

--------------------------------------------------------------------------------------------------------

Large Language Models and Explainable Law: a Hybrid Methodology

The legal explainability paper advocates using LLMs to translate rule-based legal system outputs into understandable language for laypeople. Enhancing accessibility could contribute to more democratic, inclusive AI adoption in law. The study also empowers users to execute complex legal tasks themselves via interactive prompting.

Authors:  Marco Billi, Alessandro Parenti, Giuseppe Pisano, Marco Sanchi

Link:  https://arxiv.org/abs/2311.11811v1

Date: 2023-11-20

Summary:

The paper advocates for LLMs to enhance the accessibility, usage and explainability of rule-based legal systems, contributing to a democratic and stakeholder-oriented view of legal technology. A methodology is developed to explore the potential use of LLMs for translating the explanations produced by rule-based systems, from high-level programming languages to natural language, allowing all users a fast, clear, and accessible interaction with such technologies. The study continues by building upon these explanations to empower laypeople with the ability to execute complex juridical tasks on their own, using a Chain of Prompts for the autonomous legal comparison of different rule-based inferences, applied to the same factual case.

--------------------------------------------------------------------------------------------------------

Sample as You Infer: Predictive Coding With Langevin Dynamics

The predictive coding paper presents a novel deep generative model training algorithm that improves on variational autoencoders. Injected noise facilitates optimization of a tight evidence lower bound, while an encoder provides warm starts. The method matches or exceeds VAE performance while converging in far fewer iterations; such advances could impact areas that rely on generative modeling.

Authors:  Umais Zahid, Qinghai Guo, Zafeirios Fountas

Link:  https://arxiv.org/abs/2311.13664v1

Date: 2023-11-22

Summary:

We present a novel algorithm for parameter learning in generic deep generative models that builds upon the predictive coding (PC) framework of computational neuroscience. Our approach modifies the standard PC algorithm to match or exceed the performance obtained from standard variational auto-encoder (VAE) training. By injecting Gaussian noise into the PC inference procedure, we re-envision it as overdamped Langevin sampling, which facilitates optimisation with respect to a tight evidence lower bound (ELBO). We improve the resultant encoder-free training method by incorporating an encoder network to provide an amortised warm-start to our Langevin sampling and test three different objectives for doing so. Finally, to increase robustness to the sampling step size and reduce sensitivity to curvature, we validate a lightweight and easily computable form of preconditioning, inspired by Riemann Manifold Langevin and adaptive optimizers from the SGD literature. We compare against VAEs by training like-for-like generative models using our technique against those trained with standard reparameterisation-trick-based ELBOs. We observe our method outperforms or matches performance across a number of metrics, including sample quality, while converging in a fraction of the number of SGD training iterations.
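
The core sampling idea admits a compact sketch: latents descend the energy gradient with injected Gaussian noise, i.e. an overdamped Langevin step z <- z - eta * grad E(z) + sqrt(2 * eta) * xi with xi ~ N(0, I). The quadratic energy below is a stand-in for the predictive-coding energy; no preconditioning or encoder warm-start is shown.

```python
# Overdamped Langevin sampling on a toy quadratic energy E(z) = ||z||^2 / 2.
import numpy as np

def langevin_step(z, grad_E, eta, rng):
    noise = rng.standard_normal(z.shape)
    return z - eta * grad_E(z) + np.sqrt(2.0 * eta) * noise

rng = np.random.default_rng(0)
grad_E = lambda z: z              # gradient of the toy energy
z = rng.standard_normal(8)
for _ in range(1000):             # iterates approach samples from N(0, I)
    z = langevin_step(z, grad_E, eta=0.01, rng=rng)
print(z)
```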

--------------------------------------------------------------------------------------------------------

Applying Large Language Models to Power Systems: Potential Security Threats

The power systems security paper analyzes potential threats of applying LLMs, which could enhance decision-making but also incur risks. Understanding these threats before widespread adoption is critical for urgent research and development of countermeasures.

Authors:  Jiaqi Ruan, Gaoqi Liang, Huan Zhao, Guolong Liu, Jing Qiu, Junhua Zhao, Zhao Xu, Fushuan Wen, Zhao Yang Dong

Link:  https://arxiv.org/abs/2311.13361v1

Date: 2023-11-22

Summary:

Applying large language models (LLMs) to power systems presents a promising avenue for enhancing decision-making and operational efficiency. However, this action may also incur potential security threats, which have not been fully recognized so far. To this end, this letter analyzes potential threats incurred by applying LLMs to power systems, emphasizing the need for urgent research and development of countermeasures.

--------------------------------------------------------------------------------------------------------

Trustworthy AI: Deciding What to Decide

The trustworthy AI paper proposes a framework to determine which information AI systems should trust for decision-making. Structuring key model components with trust properties could enable optimizing predictive models to satisfy strategic needs.

Authors:  Caesar Wu, Yuan-Fang Li, Jian Li, Jingjing Xu, Bouvry Pascal

Link:  https://arxiv.org/abs/2311.12604v1

Date: 2023-11-21

Summary:

When engaging in strategic decision-making, we are frequently confronted with overwhelming information and data. The situation can be further complicated when certain pieces of evidence contradict each other or become paradoxical. The primary challenge is how to determine which information can be trusted when we adopt Artificial Intelligence (AI) systems for decision-making. This issue is known as deciding what to decide or Trustworthy AI. However, the AI system itself is often considered an opaque black box. We propose a new approach to address this issue by introducing a novel framework of Trustworthy AI (TAI) encompassing three crucial components of AI: representation space, loss function, and optimizer. Each component is loosely coupled with four TAI properties. Altogether, the framework consists of twelve TAI properties. We aim to use this framework to conduct TAI experiments with quantitative and qualitative research methods to satisfy TAI properties for the decision-making context. The framework allows us to formulate an optimal prediction model trained on the given dataset for applying the strategic investment decision of credit default swaps (CDS) in the technology sector. Finally, we provide our view of the future direction of TAI research.

--------------------------------------------------------------------------------------------------------

SPOT! Revisiting Video-Language Models for Event Understanding

The event understanding paper benchmarks video-language models on distinguishing factual discrepancies about events. Finding that models fail on manipulated captions, the authors inject these as hard negatives, which could significantly advance video understanding and fine-grained analysis abilities.

Authors:  Gengyuan Zhang, Jinhe Bi, Jindong Gu, Volker Tresp

Link:  https://arxiv.org/abs/2311.12919v1

Date: 2023-11-21

Summary:

Understanding videos is an important research topic for multimodal learning. Leveraging large-scale datasets of web-crawled video-text pairs as weak supervision has become a pre-training paradigm for learning joint representations and showcased remarkable potential in video understanding tasks. However, videos can be multi-event and multi-grained, while these video-text pairs usually contain only broad-level video captions. This raises a question: with such weak supervision, can video representation in video-language models gain the ability to distinguish even factual discrepancies in textual description and understand fine-grained events? To address this, we introduce SPOT Prober to benchmark existing video-language models' capacity to distinguish event-level discrepancies as an indicator of the models' event understanding ability. Our approach involves extracting events as tuples (<Subject, Predicate, Object, Attribute, Timestamps>) from videos and generating false event tuples by manipulating tuple components systematically. We reevaluate the existing video-language models with these positive and negative captions and find they fail to distinguish most of the manipulated events. Based on our findings, we propose to plug in these manipulated event captions as hard negative samples and find them effective in enhancing models for event understanding.
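
The negative-caption construction lends itself to a short sketch: extract an event tuple and corrupt one slot to produce a hard negative. The tuple fields follow the abstract; the replacement vocabulary is an illustrative assumption.

```python
# Build a false event tuple by manipulating one component, per the abstract.
import random
from typing import NamedTuple

class Event(NamedTuple):
    subject: str
    predicate: str
    object: str
    attribute: str
    timestamps: str

def make_negative(ev: Event, replacements: dict, rng: random.Random) -> Event:
    slot = rng.choice([f for f in ev._fields if f in replacements])
    return ev._replace(**{slot: rng.choice(replacements[slot])})

ev = Event("a man", "opens", "a door", "red", "00:03-00:05")
neg = make_negative(ev, {"predicate": ["closes", "paints"],
                         "object": ["a window", "a box"]}, random.Random(0))
print(neg)  # the predicate or object swapped for a false one
```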

--------------------------------------------------------------------------------------------------------

Unraveling the Control Engineer's Craft with Neural Networks

The control engineering paper meta-learns controller tuning rules from simulated digital-twin data, replacing manual effort by experts. This methodology could greatly improve the efficiency of optimizing industrial processes.

Authors:  Braghadeesh Lakshminarayanan, Federico Dettù, Cristian R. Rojas, Simone Formentin

Link:  https://arxiv.org/abs/2311.11644v1

Date: 2023-11-20

Summary:

Many industrial processes require suitable controllers to meet their performance requirements. Often, a sophisticated digital twin is available: a highly complex virtual representation of a given physical process, whose parameters may not be properly tuned to capture the variations in the physical process. In this paper, we present a sim2real, direct data-driven controller tuning approach, where the digital twin is used to generate input-output data and suitable controllers for several perturbations in its parameters. State-of-the-art neural-network architectures are then used to learn the controller tuning rule that maps input-output data onto the controller parameters, based on artificially generated data from perturbed versions of the digital twin. In this way, as far as we are aware, we tackle for the first time the problem of re-calibrating the controller by meta-learning the tuning rule directly from data, thus practically replacing the control engineer with a machine learning model. The benefits of this methodology are illustrated via numerical simulations for several choices of neural-network architectures.
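
A toy version of the idea, under strong simplifying assumptions: a first-order plant stands in for the digital twin, the "ideal gain" rule is invented for illustration, and a small network regresses from input-output data to the controller parameter.

```python
# Hypothetical sketch: meta-learn a controller tuning rule from perturbed
# digital-twin simulations (plant and tuning rule are illustrative only).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def simulate(a, k, steps=50, dt=0.1):
    """First-order plant x' = -a*x + u with proportional control u = k*(1 - x)."""
    x, xs = 0.0, []
    for _ in range(steps):
        x += dt * (-a * x + k * (1.0 - x))
        xs.append(x)
    return np.array(xs)

a_vals = rng.uniform(0.5, 2.0, 200)       # perturbed twin parameters
k_vals = 2.0 * a_vals                     # toy "ideal gain" label per plant
X = np.stack([simulate(a, 1.0) for a in a_vals])  # IO data under a probe gain
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000,
                     random_state=0).fit(X, k_vals)
print(model.predict(simulate(1.2, 1.0)[None])[0])  # should be near 2.4
```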

--------------------------------------------------------------------------------------------------------

minimax: Efficient Baselines for Autocurricula in JAX

The reinforcement learning library paper introduces optimizations enabling over 120x faster training of decision-making agents. By compiling curriculum learning methods for hardware acceleration, it allows innovation in this critical area to progress faster.

Authors:  Minqi Jiang, Michael Dennis, Edward Grefenstette, Tim Rocktäschel

Link:  https://arxiv.org/abs/2311.12716v2

Date: 2023-11-23

Summary:

Unsupervised environment design (UED) is a form of automatic curriculum learning for training robust decision-making agents to zero-shot transfer into unseen environments. Such autocurricula have received much interest from the RL community. However, UED experiments, based on CPU rollouts and GPU model updates, have often required several weeks of training. This compute requirement is a major obstacle to rapid innovation for the field. This work introduces the minimax library for UED training on accelerated hardware. Using JAX to implement fully-tensorized environments and autocurriculum algorithms, minimax allows the entire training loop to be compiled for hardware acceleration. To provide a petri dish for rapid experimentation, minimax includes a tensorized grid-world based on MiniGrid, in addition to reusable abstractions for conducting autocurricula in procedurally-generated environments. With these components, minimax provides strong UED baselines, including new parallelized variants, which achieve over 120$\times$ speedups in wall time compared to previous implementations when training with equal batch sizes. The minimax library is available under the Apache 2.0 license at https://github.com/facebookresearch/minimax.
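
The speedups come from keeping environments on-device as pure functions of arrays, so entire rollouts can be jit-compiled and vectorized. The generic sketch below (a trivial one-dimensional "environment") illustrates that pattern; it is not the minimax library's API.

```python
# Generic JAX pattern: tensorized env step + scanned, jit-compiled rollout.
import jax
import jax.numpy as jnp

def env_step(state, action):
    """Pure step for many envs at once: move on a line, reward at +5."""
    new_state = jnp.clip(state + action, -5, 5)
    reward = (new_state == 5).astype(jnp.float32)
    return new_state, reward

@jax.jit
def rollout(states, actions):  # actions: [T, num_envs]
    return jax.lax.scan(env_step, states, actions)

key = jax.random.PRNGKey(0)
states = jnp.zeros(1024, dtype=jnp.int32)  # 1024 parallel environments
actions = jax.random.choice(key, jnp.array([-1, 1]), (100, 1024))
final_states, rewards = rollout(states, actions)
```

Note that `jax.lax.scan(env_step, ...)` works directly because `env_step` already has the `(carry, x) -> (carry, y)` shape that scan expects.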

--------------------------------------------------------------------------------------------------------

Learning Saliency From Fixations

The saliency prediction paper presents a novel transformer-based approach that treats saliency prediction as a set prediction task solved with learned fixation queries. Achieving metric scores on par with the state of the art, this direct optimization for fixations could advance computer vision and explainable AI.

Authors:  Yasser Abdelaziz Dahou Djilali, Kevin McGuiness, Noel O'Connor

Link:  https://arxiv.org/abs/2311.14073v1

Date: 2023-11-23

Summary:

We present a novel approach for saliency prediction in images, leveraging parallel decoding in transformers to learn saliency solely from fixation maps. Models typically rely on continuous saliency maps to overcome the difficulty of optimizing for the discrete fixation map; we instead attempt to replicate the experimental setup that generates saliency datasets. Our approach treats saliency prediction as a direct set prediction problem, via a global loss that enforces unique fixation predictions through bipartite matching and a transformer encoder-decoder architecture. By utilizing a fixed set of learned fixation queries, the cross-attention reasons over the image features to directly output the fixation points, distinguishing our method from other modern saliency predictors. Our approach, named Saliency TRansformer (SalTR), achieves metric scores on par with state-of-the-art approaches on the Salicon and MIT300 benchmarks.
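
The bipartite-matching loss at the heart of this formulation can be sketched in a few lines: match predicted fixations to ground-truth fixations one-to-one with the Hungarian algorithm, then penalize matched distances. This is purely illustrative of the idea, not the SalTR implementation.

```python
# Hungarian matching between predicted and ground-truth fixation points.
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred: [N, 2] predicted fixations; gt: [M, 2] ground truth (M <= N)."""
    cost = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # [N, M]
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return float(cost[rows, cols].mean())

pred = np.array([[0.20, 0.30], [0.80, 0.50], [0.50, 0.90]])
gt = np.array([[0.25, 0.30], [0.50, 0.85]])
print(f"matched loss: {matching_loss(pred, gt):.4f}")
```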

--------------------------------------------------------------------------------------------------------

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

The LLM survey comprehensively reviews model architecture upgrades for handling long-context inputs/outputs. As complex real-world queries strain current models, optimizing for length could unlock new capabilities and applications. The curated literature repository also enables further research.

Authors:  Yunpeng Huang, Jingwei Xu, Zixu Jiang, Junyu Lai, Zenan Li, Yuan Yao, Taolue Chen, Lijuan Yang, Zhou Xin, Xiaoxing Ma

Link:  https://arxiv.org/abs/2311.12351v1

Date: 2023-11-21

Summary:

Sparked by ChatGPT, Transformer-based Large Language Models (LLMs) have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents. However, a prevailing limitation exists: many current LLMs, constrained by resources, are primarily pre-trained on shorter texts, rendering them less effective for the longer-context prompts commonly encountered in real-world settings. In this paper, we present a comprehensive survey focusing on the advancement of model architecture in Transformer-based LLMs to optimize long-context capabilities across all stages from pre-training to inference. We first delineate and analyze the problems of handling long-context input and output with the current Transformer-based models. We then offer a holistic taxonomy to navigate the landscape of Transformer upgrades that address these problems. Afterward, we survey widely used evaluation necessities tailored for long-context LLMs, including datasets, metrics, and baseline models, as well as optimization toolkits such as libraries, systems, and compilers that augment LLMs' efficiency and efficacy across different stages. Finally, we discuss the predominant challenges and potential avenues for future research in this domain. Additionally, we have established a repository where we curate relevant literature with real-time updates at https://github.com/Strivin0311/long-llms-learning.

--------------------------------------------------------------------------------------------------------

Anyone Can Code: Algorithmic Thinking

The programming book provides foundations in algorithmic thinking beyond coding mechanics, as automated code generation advances. Transitioning to an architect's role through data-centered design is key for human programmers.

Authors:  Ali Arya

Link:  https://arxiv.org/abs/2311.14186v1

Date: 2023-11-23

Summary:

As the second book in the Anyone Can Code series, Algorithmic Thinking focuses on the logic behind computer programming and software design. With a data-centred approach, it starts with simple algorithms that work on simple data items and advances to more complex ones covering data structures and classes. Examples are given in C/C++ and Python and use both plain text and graphics applications to illustrate the concepts in different languages and forms. With the advances in artificial intelligence and automated code generators, it is essential to learn the logic of what a program needs to do, not just how to write the code. Anyone Can Code: Algorithmic Thinking is suitable for anyone who aims to improve their programming skills and go beyond the simple craft of programming, stepping into the world of algorithm design.

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.