Eye On AI


Week Ending 12.3.2023

RESEARCH WATCH: 12.3.2023

SPONSORED BY

Digimarc digital watermarks invisibly guard your digital assets to protect against misuse, prove copyright ownership, and verify authenticity. In an era of artificial intelligence, don’t leave your images and other digital content exposed. Demand superior content protection and maintain trust in your brand with Digimarc.

Check out Digimarc: https://www.digimarc.com/

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey

The Efficiency Spectrum of Large Language Models paper provides a comprehensive review of methods to improve the efficiency of large language models, covering areas such as scaling laws, data utilization, model compression, and inference techniques. As LLMs grow rapidly in size, efficiency innovations enable further progress in this pivotal AI field.

Authors:  Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang

Link:  https://arxiv.org/abs/2312.00678v1

Date: 2023-12-01

Summary:

The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains, reshaping the artificial general intelligence landscape. However, the increasing computational and memory demands of these models present substantial challenges, hindering both academic research and practical applications. To address these issues, a wide array of methods, including both algorithmic and hardware solutions, have been developed to enhance the efficiency of LLMs. This survey delivers a comprehensive review of algorithmic advancements aimed at improving LLM efficiency. Unlike other surveys that typically focus on specific areas such as training or model compression, this paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs. Specifically, it covers various topics related to efficiency, including scaling laws, data utilization, architectural innovations, training and tuning strategies, and inference techniques. This paper aims to serve as a valuable resource for researchers and practitioners, laying the groundwork for future innovations in this critical research area. Our repository of relevant references is maintained at https://github.com/tding1/Efficient-LLM-Survey.

--------------------------------------------------------------------------------------------------------

BCN: Batch Channel Normalization for Image Classification

The BCN paper introduces Batch Channel Normalization, an adaptive technique that normalizes layer inputs along both the batch and channel dimensions. BCN aims to boost accuracy and trainability and can be integrated into both CNN and Vision Transformer architectures, demonstrating potential to advance computer vision applications.

Authors:  Afifa Khaled, Chao Li, Jia Ning, Kun He

Link:  https://arxiv.org/abs/2312.00596v1

Date: 2023-12-01

Summary:

Normalization techniques have been widely used in deep learning because they enable higher learning rates and make training less sensitive to initialization. However, the effectiveness of popular normalization techniques is typically limited to specific areas. Unlike the standard Batch Normalization (BN) and Layer Normalization (LN), where BN computes the mean and variance along the (N, H, W) dimensions and LN computes the mean and variance along the (C, H, W) dimensions (N, C, H and W are the batch, channel, spatial height and width dimensions, respectively), this paper presents a novel normalization technique called Batch Channel Normalization (BCN). To exploit both channel and batch dependence and to adaptively combine the advantages of BN and LN for specific datasets or tasks, BCN separately normalizes inputs along the (N, H, W) and (C, H, W) axes, then combines the normalized outputs based on adaptive parameters. As a basic block, BCN can be easily integrated into existing models for various applications in computer vision. Empirical results show that the proposed technique can be seamlessly applied to various versions of CNN and Vision Transformer architectures. The code is publicly available at https://github.com/AfifaKhaled/BatchChannel-Normalization
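
To make the two normalization axes concrete, here is a minimal NumPy sketch of the idea described above. It is not the authors' implementation: mixing the BN-style and LN-style outputs with a single scalar weight `alpha`, and the per-channel affine `gamma`/`beta`, are assumptions for illustration; the paper's adaptive parameters may be shaped and learned differently.

```python
import numpy as np

def batch_channel_norm(x, alpha=0.5, gamma=None, beta=None, eps=1e-5):
    """Normalize an (N, C, H, W) array along (N, H, W) and (C, H, W), then mix.

    ASSUMPTION: alpha is a single scalar mixing weight in [0, 1]; in a trained
    layer it would be a learned parameter. gamma/beta are per-channel affine
    parameters (identity by default).
    """
    n, c, h, w = x.shape
    # BN-style statistics: one mean/variance per channel, pooled over batch and space.
    mu_bn = x.mean(axis=(0, 2, 3), keepdims=True)
    var_bn = x.var(axis=(0, 2, 3), keepdims=True)
    x_bn = (x - mu_bn) / np.sqrt(var_bn + eps)
    # LN-style statistics: one mean/variance per sample, pooled over channels and space.
    mu_ln = x.mean(axis=(1, 2, 3), keepdims=True)
    var_ln = x.var(axis=(1, 2, 3), keepdims=True)
    x_ln = (x - mu_ln) / np.sqrt(var_ln + eps)
    # Convex combination of the two normalized views.
    y = alpha * x_bn + (1.0 - alpha) * x_ln
    if gamma is None:
        gamma = np.ones((1, c, 1, 1))
    if beta is None:
        beta = np.zeros((1, c, 1, 1))
    return gamma * y + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 5, 5))
y = batch_channel_norm(x, alpha=0.7)
print(y.shape)  # (8, 4, 5, 5)
```

Setting `alpha=1.0` recovers plain BN statistics and `alpha=0.0` recovers plain LN statistics, which is the sense in which such a layer can interpolate between the two.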

--------------------------------------------------------------------------------------------------------

Generative artificial intelligence enhances individual creativity but reduces the collective diversity of novel content

The study on generative AI's impact on creativity finds that writers who draw on AI-generated story ideas produce stories rated as more creative, but those stories are also more similar to one another, reducing collective novelty. The findings have implications for leveraging generative models' creative potential while maintaining diversity.

Authors:  Anil R. Doshi, Oliver P. Hauser

Link:  https://arxiv.org/abs/2312.00506v1

Date: 2023-12-01

Summary:

Creativity is core to being human. Generative artificial intelligence (GenAI) holds promise for humans to be more creative by offering new ideas, or less creative by anchoring on GenAI ideas. We study the causal impact of GenAI ideas on the production of an unstructured creative output in an online experimental study where some writers could obtain ideas for a story from a GenAI platform. We find that access to GenAI ideas causes stories to be evaluated as more creative, better written and more enjoyable, especially among less creative writers. However, objective measures of story similarity within each condition reveal that GenAI-enabled stories are more similar to each other than stories by humans alone. These results point to an increase in individual creativity, but at the same time there is a risk of losing collective novelty: this dynamic resembles a social dilemma where individual writers are better off using GenAI to improve their own writing, but collectively a narrower scope of novel content may be produced with GenAI. Our results have implications for researchers, policy-makers and practitioners interested in bolstering creativity, but point to potential downstream consequences from over-reliance.
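
One common way to quantify "stories are more similar to each other" is the mean pairwise cosine similarity of text embeddings within a condition. The sketch below assumes that measure purely for illustration; the abstract does not specify which similarity measure the authors used, and the toy vectors are made up rather than taken from the study.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all unordered pairs; higher means less diverse."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Toy 3-d "embeddings" (made-up numbers, for illustration only).
human_only = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
genai_aided = [[1.0, 0.2, 0.0], [0.9, 0.3, 0.1], [1.0, 0.1, 0.1]]
print(mean_pairwise_similarity(human_only))  # 0.0
print(mean_pairwise_similarity(genai_aided))
```

A higher within-condition average in the GenAI arm would correspond to the "narrower scope of novel content" the authors describe.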

--------------------------------------------------------------------------------------------------------

Student Activity Recognition in Classroom Environments using Transfer Learning

The student activity recognition paper proposes a classroom monitoring system using computer vision and transfer learning models. Enabling automated student activity analysis could lead to smarter educational environments and teaching tools.

Authors:  Anagha Deshpande, Vedant Deshpande

Link:  https://arxiv.org/abs/2312.00348v1

Date: 2023-12-01

Summary:

The recent advances in artificial intelligence and deep learning facilitate automation in various applications including home automation, smart surveillance systems, and healthcare among others. Human Activity Recognition is one of its emerging applications, which can be implemented in a classroom environment to enhance safety, efficiency, and overall educational quality. This paper proposes a system for detecting and recognizing the activities of students in a classroom environment. The dataset has been structured and recorded by the authors since a standard dataset for this task was not available at the time of this study. Transfer learning, a widely adopted method within the field of deep learning, has proven to be helpful in complex tasks like image and video processing. Pretrained models including VGG-16, ResNet-50, InceptionV3, and Xception are used for feature extraction and classification tasks. Xception achieved an accuracy of 93% on the novel classroom dataset, outperforming the other three models under consideration. The system proposed in this study aims to introduce a safer and more productive learning environment for students and educators.

--------------------------------------------------------------------------------------------------------

Towards Accurate Differential Diagnosis with Large Language Models

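The differential diagnosis paper presents a large language model optimized for diagnostic reasoning that, on its own, achieved higher top-10 accuracy than unassisted clinicians and, when used as an aid, helped clinicians produce more accurate and comprehensive differential lists. LLM-assisted diagnosis could expand healthcare access and support physicians' decisions in challenging cases.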

Authors:  Daniel McDuff, Mike Schaekermann, Tao Tu, Anil Palepu, Amy Wang, Jake Garrison, Karan Singhal, Yash Sharma, Shekoofeh Azizi, Kavita Kulkarni, Le Hou, Yong Cheng, Yun Liu, S Sara Mahdavi, Sushant Prakash, Anupam Pathak, Christopher Semturs, Shwetak Patel, Dale R Webster, Ewa Dominowska, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S Corrado, Yossi Matias, Jake Sunshine, Alan Karthikesalingam, Vivek Natarajan

Link:  https://arxiv.org/abs/2312.00164v1

Date: 2023-11-30

Summary:

An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.
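
The headline numbers above are top-10 accuracies compared with McNemar's test on paired outcomes. The sketch below shows both computations on made-up toy data. It assumes a diagnosis "hits" when it appears verbatim in the first k entries (the study used clinician-assessed matching) and uses the textbook McNemar statistic; whether the reported values include a continuity correction is not stated in the abstract, so both variants are offered.

```python
def top_k_hit(ranked_ddx, true_dx, k=10):
    """True if the confirmed diagnosis appears among the first k entries.

    ASSUMPTION: exact string match; the study relied on expert judgment instead.
    """
    return true_dx in ranked_ddx[:k]

def top_k_accuracy(cases, k=10):
    """cases: list of (ranked_ddx, true_dx) pairs."""
    return sum(top_k_hit(d, t, k) for d, t in cases) / len(cases)

def mcnemar_statistic(hits_a, hits_b, continuity=True):
    """McNemar chi-square statistic for paired binary outcomes on the same cases."""
    b = sum(1 for a, x in zip(hits_a, hits_b) if a and not x)  # A right, B wrong
    c = sum(1 for a, x in zip(hits_a, hits_b) if x and not a)  # B right, A wrong
    if b + c == 0:
        return 0.0
    diff = max(abs(b - c) - 1, 0) if continuity else abs(b - c)
    return diff ** 2 / (b + c)

# Toy data (made up, not from the study).
cases = [(["influenza", "covid-19", "common cold"], "covid-19"),
         (["migraine", "stroke"], "brain tumor")]
print(top_k_accuracy(cases, k=10))  # 0.5

hits_llm = [True, True, True, False]
hits_search = [False, False, True, False]
print(mcnemar_statistic(hits_llm, hits_search))                    # 0.5
print(mcnemar_statistic(hits_llm, hits_search, continuity=False))  # 2.0
```

Only the discordant cases (one arm right, the other wrong) enter the statistic, which is why McNemar's test suits the paired design where the same case is read under two conditions.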

--------------------------------------------------------------------------------------------------------

AlignBench: Benchmarking Chinese Alignment of Large Language Models

The Chinese alignment benchmark paper presents a comprehensive suite for evaluating alignment in Chinese language models, to drive progress in helpful, human-like AI assistants. The public datasets and analysis tools enable advancement in this key area.

Authors:  Xiao Liu, Xuanyu Lei, Shengyuan Wang, Yue Huang, Zhuoer Feng, Bosi Wen, Jiale Cheng, Pei Ke, Yifan Xu, Weng Lam Tam, Xiaohan Zhang, Lichao Sun, Hongning Wang, Jing Zhang, Minlie Huang, Yuxiao Dong, Jie Tang

Link:  https://arxiv.org/abs/2311.18743v1

Date: 2023-11-30

Summary:

Alignment has become a critical step for instruction-tuned Large Language Models (LLMs) to become helpful assistants. However, effective evaluation of alignment for emerging Chinese LLMs is still significantly lacking, calling for real-scenario grounded, open-ended, challenging and automatic evaluations tailored for alignment. To fill in this gap, we introduce AlignBench, a comprehensive multi-dimensional benchmark for evaluating LLMs' alignment in Chinese. Equipped with a human-in-the-loop data curation pipeline, our benchmark employs a rule-calibrated multi-dimensional LLM-as-Judge with Chain-of-Thought to generate explanations and final ratings as evaluations, ensuring high reliability and interpretability. Furthermore, we developed a dedicated companion evaluator LLM, CritiqueLLM, which recovers 95% of GPT-4's evaluation ability and will be provided via public APIs to researchers for evaluation of alignment in Chinese LLMs. All evaluation codes, data, and LLM generations are available at https://github.com/THUDM/AlignBench.

--------------------------------------------------------------------------------------------------------

Semantic-Aware Frame-Event Fusion based Pattern Recognition via Large Vision-Language Models

The multimodal fusion paper introduces a technique that consolidates RGB frames, event streams, and language descriptions of semantic labels for enhanced pattern recognition. Leveraging pre-trained large vision-language models (CLIP) in this way demonstrates potential for advancing fusion-based perception.

Authors:  Dong Li, Jiandong Jin, Yuhao Zhang, Yanlin Zhong, Yaoyang Wu, Lan Chen, Xiao Wang, Bin Luo

Link:  https://arxiv.org/abs/2311.18592v1

Date: 2023-11-30

Summary:

Pattern recognition through the fusion of RGB frames and Event streams has emerged as a novel research area in recent years. Current methods typically employ backbone networks to individually extract the features of RGB frames and event streams, and subsequently fuse these features for pattern recognition. However, we posit that these methods may suffer from key issues like semantic gaps and small-scale backbone networks. In this study, we introduce a novel pattern recognition framework that consolidates the semantic labels, RGB frames, and event streams, leveraging pre-trained large-scale vision-language models. Specifically, given the input RGB frames, event streams, and all the predefined semantic labels, we employ a pre-trained large-scale vision model (CLIP vision encoder) to extract the RGB and event features. To handle the semantic labels, we initially convert them into language descriptions through prompt engineering, and then obtain the semantic features using the pre-trained large-scale language model (CLIP text encoder). Subsequently, we integrate the RGB/Event features and semantic features using multimodal Transformer networks. The resulting frame and event tokens are further amplified using self-attention layers. Concurrently, we propose to enhance the interactions between text tokens and RGB/Event tokens via cross-attention. Finally, we consolidate all three modalities using self-attention and feed-forward layers for recognition. Comprehensive experiments on the HARDVS and PokerEvent datasets fully substantiate the efficacy of our proposed SAFE model. The source code will be made available at https://github.com/Event-AHU/SAFE_LargeVLM.
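
The cross-attention step between text tokens and RGB/Event tokens can be illustrated with a single-head scaled dot-product sketch in NumPy. Assumptions: text tokens act as queries and visual tokens as keys/values (the abstract does not fix the direction), weights are random rather than learned, and token counts and width are arbitrary; SAFE itself uses multi-layer multimodal Transformers on CLIP features.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (Tq, D), e.g. text tokens; context: (Tk, D), e.g. RGB/event tokens.
    Returns the attended output (Tq, D) and the attention weights (Tq, Tk).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 16
text_tokens = rng.normal(size=(5, d))     # hypothetical: one token per label prompt
visual_tokens = rng.normal(size=(12, d))  # hypothetical: concatenated RGB and event tokens
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out, attn = cross_attention(text_tokens, visual_tokens, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (5, 16) (5, 12)
```

Each text token ends up as a weighted mix of visual features, which is one way semantic labels can be grounded in frame and event evidence before the final self-attention fusion.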

--------------------------------------------------------------------------------------------------------

Can Large Language Models Be Good Companions? An LLM-Based Eyewear System with Conversational Common Ground

The chatbot companionship paper proposes common ground-based dialogue to improve human-chatbot interaction. The wearable system builds user profiles over time for more natural conversations. This line of research drives progress in personalized AI assistants.

Authors:  Zhenyu Xu, Hailin Xu, Zhouyang Lu, Yingying Zhao, Rui Zhu, Yujiang Wang, Mingzhi Dong, Yuhu Chang, Qin Lv, Robert P. Dick, Fan Yang, Tun Lu, Ning Gu, Li Shang

Link:  https://arxiv.org/abs/2311.18251v1

Date: 2023-11-30

Summary:

Developing chatbots as personal companions has long been a goal of artificial intelligence researchers. Recent advances in Large Language Models (LLMs) have delivered a practical solution for endowing chatbots with anthropomorphic language capabilities. However, it takes more than LLMs to enable chatbots that can act as companions. Humans use their understanding of individual personalities to drive conversations. Chatbots also require this capability to enable human-like companionship. They should act based on personalized, real-time, and time-evolving knowledge of their owner. We define such essential knowledge as the common ground between chatbots and their owners, and we propose to build a common-ground-aware dialogue system from an LLM-based module, named OS-1, to enable chatbot companionship. Hosted by eyewear, OS-1 can sense the visual and audio signals the user receives and extract real-time contextual semantics. Those semantics are categorized and recorded to formulate historical contexts from which the user's profile is distilled and evolves over time, i.e., OS-1 gradually learns about its user. OS-1 combines knowledge from real-time semantics, historical contexts, and user-specific profiles to produce a common-ground-aware prompt input into the LLM module. The LLM's output is converted to audio, spoken to the wearer when appropriate. We conduct laboratory and in-field studies to assess OS-1's ability to build common ground between the chatbot and its user. The technical feasibility and capabilities of the system are also evaluated. OS-1, with its common-ground awareness, can significantly improve user satisfaction and potentially lead to downstream tasks such as personal emotional support and assistance.
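
The abstract describes OS-1 combining real-time semantics, historical contexts, and a distilled user profile into one prompt. The sketch below shows that assembly step only. The class name, fields, template wording, and `max_history` cutoff are all hypothetical; the paper's actual prompt format is not given in the abstract.

```python
from dataclasses import dataclass, field

@dataclass
class CommonGround:
    """Hypothetical container for the three knowledge sources named in the abstract."""
    realtime: list[str]  # semantics extracted from current audio/visual input
    history: list[str]   # summarized past contexts
    profile: dict[str, str] = field(default_factory=dict)  # distilled user profile

    def to_prompt(self, user_utterance: str, max_history: int = 3) -> str:
        # ASSUMPTION: a flat text template; keep only the most recent history items.
        profile = "; ".join(f"{k}: {v}" for k, v in sorted(self.profile.items()))
        recent = self.history[-max_history:]
        return "\n".join([
            "You are a personal companion. Use the shared context below.",
            f"User profile: {profile or 'unknown'}",
            "Recent history: " + (" | ".join(recent) or "none"),
            "Right now: " + (" | ".join(self.realtime) or "nothing observed"),
            f"User says: {user_utterance}",
        ])

cg = CommonGround(realtime=["at a cafe", "rain outside"],
                  history=["talked about marathon training"],
                  profile={"hobby": "running"})
print(cg.to_prompt("Should I still go for a run later?"))
```

Keeping the three sources as separate fields mirrors the paper's separation of real-time, historical, and profile knowledge, so each can be updated at its own rate.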

--------------------------------------------------------------------------------------------------------

Understanding Your Agent: Leveraging Large Language Models for Behavior Explanation

The behavior explanation paper generates natural language descriptions for opaque agent decisions using large language models. Explainability facilitates trust and effective human-AI teaming in real-world applications like search and rescue.

Authors:  Xijia Zhang, Yue Guo, Simon Stepputtis, Katia Sycara, Joseph Campbell

Link:  https://arxiv.org/abs/2311.18062v1

Date: 2023-11-29

Summary:

Intelligent agents such as robots are increasingly deployed in real-world, safety-critical settings. It is vital that these agents are able to explain the reasoning behind their decisions to human counterparts; however, their behavior is often produced by uninterpretable models such as deep neural networks. We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions, thus making our method independent from the underlying model's representation. For such models, we first learn a behavior representation and subsequently use it to produce plausible explanations with minimal hallucination while affording user interaction with a pre-trained large language model. We evaluate our method in a multi-agent search-and-rescue environment and demonstrate the effectiveness of our explanations for agents executing various behaviors. Through user studies and empirical experiments, we show that our approach generates explanations as helpful as those produced by a human domain expert while enabling beneficial interactions such as clarification and counterfactual queries.

--------------------------------------------------------------------------------------------------------

SODA: Bottleneck Diffusion Models for Representation Learning

The SODA paper demonstrates diffusion models can learn useful visual representations in a self-supervised fashion, enabling advances in editing, reconstruction and generation tasks. The model also reveals disentangled latent spaces to intuitively control image synthesis.

Authors:  Drew A. Hudson, Daniel Zoran, Mateusz Malinowski, Andrew K. Lampinen, Andrew Jaegle, James L. McClelland, Loic Matthey, Felix Hill, Alexander Lerchner

Link:  https://arxiv.org/abs/2311.17901v1

Date: 2023-11-29

Summary:

We introduce SODA, a self-supervised diffusion model, designed for representation learning. The model incorporates an image encoder, which distills a source view into a compact representation, that, in turn, guides the generation of related novel views. We show that by imposing a tight bottleneck between the encoder and a denoising decoder, and leveraging novel view synthesis as a self-supervised objective, we can turn diffusion models into strong representation learners, capable of capturing visual semantics in an unsupervised manner. To the best of our knowledge, SODA is the first diffusion model to succeed at ImageNet linear-probe classification, and, at the same time, it accomplishes reconstruction, editing and synthesis tasks across a wide range of datasets. Further investigation reveals the disentangled nature of its emergent latent space, that serves as an effective interface to control and manipulate the model's produced images. All in all, we aim to shed light on the exciting and promising potential of diffusion models, not only for image generation, but also for learning rich and robust representations.

--------------------------------------------------------------------------------------------------------

Maximum Entropy Model Correction in Reinforcement Learning

The reinforcement learning paper introduces a model correction technique that reduces the impact of inaccurate models during planning and, when the model is accurate enough, speeds up convergence to the true value function. This has potential for accelerating training in real-world RL systems like robotics and recommendation engines.

Authors:  Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, Amir-massoud Farahmand

Link:  https://arxiv.org/abs/2311.17855v1

Date: 2023-11-29

Summary:

We propose and theoretically analyze an approach for planning with an approximate model in reinforcement learning that can reduce the adverse impact of model error. If the model is accurate enough, it also accelerates convergence to the true value function. One of its key components is the MaxEnt Model Correction (MoCo) procedure, which corrects the model's next-state distributions based on a Maximum Entropy density estimation formulation. Based on MoCo, we introduce the Model Correcting Value Iteration (MoCoVI) algorithm and its sample-based variant, MoCoDyna. We show that MoCoVI and MoCoDyna can converge much faster than conventional model-free algorithms. Unlike traditional model-based algorithms, MoCoVI and MoCoDyna effectively utilize an approximate model and still converge to the correct value function.
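
The maximum-entropy correction idea can be illustrated with its textbook form: find the distribution closest in KL divergence to the model's next-state distribution p_hat whose expectation of a feature phi matches a target estimated from real transitions. The solution is an exponential tilt p(s') proportional to p_hat(s') * exp(lam * phi(s')). This is a minimal sketch assuming a single scalar feature (for example a value estimate) and a discrete state space; how MoCo chooses features and estimates targets may differ.

```python
import math

def tilt(p_hat, phi, lam):
    """Exponentially tilt p_hat: p(s') is proportional to p_hat(s') * exp(lam * phi(s'))."""
    w = [p * math.exp(lam * f) for p, f in zip(p_hat, phi)]
    z = sum(w)
    return [x / z for x in w]

def expect(p, phi):
    """Expectation of the feature phi under distribution p."""
    return sum(pi * fi for pi, fi in zip(p, phi))

def maxent_correct(p_hat, phi, target, lo=-50.0, hi=50.0, iters=100):
    """Closest distribution to p_hat (in KL) whose mean of phi equals target.

    E[phi] under the tilted distribution increases monotonically in lam, so
    bisection finds lam. ASSUMPTIONS: a single scalar feature of modest scale
    (to avoid overflow in exp), and target strictly between min(phi) and max(phi).
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expect(tilt(p_hat, phi, mid), phi) < target:
            lo = mid
        else:
            hi = mid
    return tilt(p_hat, phi, 0.5 * (lo + hi))

p_hat = [0.5, 0.3, 0.2]  # approximate model's next-state distribution (toy)
phi = [0.0, 1.0, 2.0]    # hypothetical feature values of the next states
p = maxent_correct(p_hat, phi, target=1.0)  # model mean is 0.7; "true" mean assumed 1.0
print([round(v, 3) for v in p])
```

If the model already matches the target, the tilt parameter is zero and the model is left unchanged, which is consistent with the abstract's point that an accurate model is used as-is while errors are corrected.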

--------------------------------------------------------------------------------------------------------

TaskWeaver: A Code-First Agent Framework

The TaskWeaver paper proposes a flexible, code-first framework for building LLM agents that handle complex analytics tasks. Enabling seamless integration of domain knowledge could drive progress in customizable intelligent assistants.

Authors:  Bo Qiao, Liqun Li, Xu Zhang, Shilin He, Yu Kang, Chaoyun Zhang, Fangkai Yang, Hang Dong, Jue Zhang, Lu Wang, Minghua Ma, Pu Zhao, Si Qin, Xiaoting Qin, Chao Du, Yong Xu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

Link:  https://arxiv.org/abs/2311.17541v2

Date: 2023-12-01

Summary:

Large Language Models (LLMs) have shown impressive abilities in natural language understanding and generation, leading to their use in applications such as chatbots and virtual assistants. However, existing LLM frameworks face limitations in handling domain-specific data analytics tasks with rich data structures. Moreover, they struggle with flexibility to meet diverse user requirements. To address these issues, TaskWeaver is proposed as a code-first framework for building LLM-powered autonomous agents. It converts user requests into executable code and treats user-defined plugins as callable functions. TaskWeaver provides support for rich data structures, flexible plugin usage, and dynamic plugin selection, and leverages LLM coding capabilities for complex logic. It also incorporates domain-specific knowledge through examples and ensures the secure execution of generated code. TaskWeaver offers a powerful and flexible framework for creating intelligent conversational agents that can handle complex tasks and adapt to domain-specific scenarios. The code is open-sourced at https://github.com/microsoft/TaskWeaver/.
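
The code-first pattern (requests become code, plugins become callable functions) can be sketched in plain Python. Everything here is hypothetical and is not TaskWeaver's API: the `plugin` decorator, the plugin names, the toy data, and the "generated" code string are invented for illustration. Note that stripping builtins from `exec` is only a convenience, not a security boundary; real systems isolate untrusted code in a separate process or container.

```python
PLUGINS = {}

def plugin(fn):
    """Hypothetical decorator: register a function so generated code can call it by name."""
    PLUGINS[fn.__name__] = fn
    return fn

@plugin
def load_sales(region: str) -> list[float]:
    # Hypothetical data source; a real plugin might query a database.
    data = {"emea": [120.0, 80.5, 99.5], "apac": [200.0, 150.0]}
    return data.get(region, [])

@plugin
def mean(values: list[float]) -> float:
    return sum(values) / len(values) if values else float("nan")

def run_generated_code(code: str) -> dict:
    """Execute LLM-written code with only registered plugins in scope.

    Removing builtins limits accidental misuse but is NOT a sandbox.
    Returns the variables the code defined.
    """
    namespace = {"__builtins__": {}, **PLUGINS}
    exec(code, namespace)
    return {k: v for k, v in namespace.items()
            if k not in PLUGINS and k != "__builtins__"}

# Code an LLM might emit for "What is the average EMEA sale?"
generated = 'result = mean(load_sales("emea"))'
print(run_generated_code(generated)["result"])  # 100.0
```

Returning rich Python objects (lists, and in practice data frames) between plugin calls is what lets a code-first agent handle structured data that would be awkward to pass through plain-text tool calls.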

--------------------------------------------------------------------------------------------------------

CLOMO: Counterfactual Logical Modification with Large Language Models

The logical modification paper assesses large language models' counterfactual reasoning abilities via a novel human-annotated benchmark. Methodology and findings provide foundations for improving LLMs' logical thinking to power applications like argument search engines.

Authors:  Yinya Huang, Ruixin Hong, Hongming Zhang, Wei Shao, Zhicheng Yang, Dong Yu, Changshui Zhang, Xiaodan Liang, Linqi Song

Link:  https://arxiv.org/abs/2311.17438v2

Date: 2023-11-30

Summary:

In this study, we delve into the realm of counterfactual reasoning capabilities of large language models (LLMs). Our primary objective is to cultivate the counterfactual thought processes within LLMs and rigorously assess these processes for their validity. Specifically, we introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark. In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship. To effectively evaluate a generation model's counterfactual capabilities, we propose an innovative evaluation metric, the LogicAware Counterfactual Score to directly evaluate the natural language output of LLMs instead of modeling the task as a multiple-choice problem. Analysis shows that the proposed automatic metric aligns well with human preference. Our experimental results show that while LLMs demonstrate a notable capacity for logical counterfactual thinking, there remains a discernible gap between their current abilities and human performance.

--------------------------------------------------------------------------------------------------------

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

The video understanding benchmark paper presents a diagnostic dataset to specifically evaluate temporal concept comprehension in video-language models. Tailored evaluation better reveals model deficiencies, informing research directions for enhanced video AI.

Authors:  Shicheng Li, Lei Li, Shuhuai Ren, Yuanxin Liu, Yi Liu, Rundong Gao, Xu Sun, Lu Hou

Link:  https://arxiv.org/abs/2311.17404v1

Date: 2023-11-29

Summary:

The ability to perceive how objects change over time is a crucial ingredient in human intelligence. However, current benchmarks cannot faithfully reflect the temporal understanding abilities of video-language models (VidLMs) due to the existence of static visual shortcuts. To remedy this issue, we present VITATECS, a diagnostic VIdeo-Text dAtaset for the evaluation of TEmporal Concept underStanding. Specifically, we first introduce a fine-grained taxonomy of temporal concepts in natural language in order to diagnose the capability of VidLMs to comprehend different temporal aspects. Furthermore, to disentangle the correlation between static and temporal information, we generate counterfactual video descriptions that differ from the original one only in the specified temporal aspect. We employ a semi-automatic data collection framework using large language models and human-in-the-loop annotation to obtain high-quality counterfactual descriptions efficiently. Evaluation of representative video-language understanding models confirms their deficiency in temporal understanding, revealing the need for greater emphasis on the temporal elements in video-language research.

--------------------------------------------------------------------------------------------------------

Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching

The data condensation paper introduces an algorithm for distilling datasets that retain comprehensive information across model architectures. Progress in generalized distillation techniques helps address scalability bottlenecks when training large CV models.

Authors:  Shitong Shao, Zeyuan Yin, Muxin Zhou, Xindong Zhang, Zhiqiang Shen

Link:  https://arxiv.org/abs/2311.17950v1

Date: 2023-11-29

Summary:

The lightweight "local-match-global" matching introduced by SRe2L successfully creates a distilled dataset with comprehensive information on the full 224x224 ImageNet-1k. However, this one-sided approach is limited to a particular backbone, layer, and set of statistics, which limits how well the distilled dataset generalizes. We argue that multiple, diverse "local-match-global" matchings are more precise and effective than a single one and can create a distilled dataset with richer information and better generalization. We call this perspective "generalized matching" and propose Generalized Various Backbone and Statistical Matching (G-VBSM) in this work, which aims to create a synthetic dataset with densities, ensuring consistency with the complete dataset across various backbones, layers, and statistics. As experimentally demonstrated, G-VBSM is the first algorithm to obtain strong performance across both small-scale and large-scale datasets. Specifically, G-VBSM achieves a performance of 38.7% on CIFAR-100 with 128-width ConvNet, 47.6% on Tiny-ImageNet with ResNet18, and 31.4% on the full 224x224 ImageNet-1k with ResNet18, under images per class (IPC) 10, 50, and 10, respectively. These results surpass all previous state-of-the-art methods by margins of 3.9%, 6.5%, and 10.1%, respectively.
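
The "match statistics across various backbones and layers" idea can be written as a loss that sums, over every (backbone, layer) pair, the gap between the synthetic batch's per-channel mean/variance and the statistics stored from the full dataset (for example BatchNorm running statistics). This NumPy sketch assumes equal weights and squared L2 gaps; the exact norms, weights, and statistics used by SRe2L and G-VBSM may differ. In practice the loss would be computed in an autodiff framework and minimized with respect to the synthetic pixels.

```python
import numpy as np

def stat_matching_loss(syn_feats, real_stats):
    """Sum of squared L2 gaps between synthetic-batch and stored channel statistics.

    syn_feats:  {(backbone, layer): array (N, C, H, W)} features of synthetic images.
    real_stats: {(backbone, layer): (mean (C,), var (C,))} from the full dataset.
    ASSUMPTION: every (backbone, layer) term has equal weight.
    """
    loss = 0.0
    for key, feats in syn_feats.items():
        mu_real, var_real = real_stats[key]
        mu_syn = feats.mean(axis=(0, 2, 3))
        var_syn = feats.var(axis=(0, 2, 3))
        loss += np.sum((mu_syn - mu_real) ** 2) + np.sum((var_syn - var_real) ** 2)
    return float(loss)

rng = np.random.default_rng(0)
# Hypothetical feature maps from two backbones for one synthetic batch.
syn = {("resnet18", "layer3"): rng.normal(1.0, 2.0, size=(10, 8, 4, 4)),
       ("convnet", "block2"): rng.normal(0.5, 1.0, size=(10, 16, 4, 4))}
# Hypothetical stored statistics: zero mean, unit variance per channel.
stored = {k: (np.zeros(v.shape[1]), np.ones(v.shape[1])) for k, v in syn.items()}
print(stat_matching_loss(syn, stored))
```

Adding more (backbone, layer) entries constrains the synthetic images from more "views" of the data, which is the intuition behind the claimed gain in generalization over single-backbone matching.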

--------------------------------------------------------------------------------------------------------

Anti-Sexism Alert System: Identification of Sexist Comments on Social Media Using AI Techniques

The sexism detection paper details an NLP system to identify sexist social media posts and analyze platform content. Automated identification of harmful speech facilitates safer online spaces and ethical AI.

Authors:  Rebeca P. Díaz Redondo, Ana Fernández Vilas, Mateo Ramos Merino, Sonia Valladares, Soledad Torres Guijarro, Manar Mohamed Hafez

Link:  https://arxiv.org/abs/2312.00053v1

Date: 2023-11-28

Summary:

Social relationships in the digital sphere are becoming more common and frequent, and they constitute a very important aspect of all our lives. Violent interactions in this sphere are very frequent and have serious effects on the victims. Within this global scenario, one kind of digital violence is becoming especially worrying: sexism against women. Sexist comments that are publicly posted in social media (newspaper comments, social networks, etc.) usually attract a lot of attention and go viral, with consequent damage to the people involved. In this paper, we introduce an anti-sexism alert system, based on natural language processing (NLP) and artificial intelligence (AI), that analyzes any public post and decides whether it could be considered a sexist comment. Additionally, this system analyzes all the public comments linked to any multimedia content (piece of news, video, tweet, etc.) and decides, using a color-based system similar to traffic lights, whether there is sexism in the global set of posts. Since the majority of studies focus on English, we have created a labeled data set in Spanish to train our system, which performs very well in the validation experiments.
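
The traffic-light summary of a comment thread can be sketched as a threshold rule on the share of comments the classifier flags as sexist. The thresholds `yellow_at` and `red_at` below are hypothetical; the abstract does not state how the system maps per-comment decisions to colors.

```python
def traffic_light(flags, yellow_at=0.1, red_at=0.3):
    """Map per-comment sexism flags (True/False) for one piece of content to a color.

    ASSUMPTION: colors depend only on the share of flagged comments, with
    hypothetical thresholds yellow_at and red_at.
    """
    if not flags:
        return "green"
    share = sum(flags) / len(flags)
    if share >= red_at:
        return "red"
    if share >= yellow_at:
        return "yellow"
    return "green"

print(traffic_light([False] * 9 + [True]))  # yellow
print(traffic_light([True, True, False]))   # red
print(traffic_light([False] * 20))          # green
```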

--------------------------------------------------------------------------------------------------------

RIS-Enhanced MIMO Channels in Urban Environments: Experimental Insights

The wireless networking paper measures the real-world impact of reconfigurable intelligent surfaces (RISs) on an urban sub-6 GHz MIMO channel, revealing trade-offs between capacity gains and the channel's spatial characteristics. Findings offer insights for advancing smart radio environment technologies.

Authors:  James Rains, Anvar Tukmanov, Qammer Abbasi, Muhammad Imran

Link:  https://arxiv.org/abs/2311.16985v2

Date: 2023-11-29

Summary:

Can the smart radio environment paradigm measurably enhance the performance of contemporary urban macrocells? In this study, we explore the impact of reconfigurable intelligent surfaces (RISs) on a real-world sub-6 GHz MIMO channel. A rooftop-mounted macrocell antenna has been adapted to enable frequency-domain channel measurements. A nature-inspired beam search algorithm has been employed to maximize channel gain at user positions, revealing a potential 50% increase in channel capacity in certain circumstances. Analysis reveals, however, that the spatial characteristics of the channel can be adversely affected through the introduction of a RIS in these settings. The RIS prototype schematics, Gerber files, and source code have been made available to aid in future experimental efforts of the wireless research community.
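
To see what a "capacity increase" means for a measured channel matrix, the standard equal-power MIMO capacity is C = log2 det(I + (SNR / N_tx) H H^H) bits/s/Hz. The sketch below computes it with NumPy. The assumption of equal transmit power and no transmitter-side channel knowledge is ours; the paper's exact capacity metric is not given in the abstract, and the channel matrices are toy values, not measurements.

```python
import numpy as np

def mimo_capacity(h, snr_linear):
    """Shannon capacity (bit/s/Hz) with equal power across transmit antennas.

    h: (n_rx, n_tx) complex channel matrix; snr_linear: total SNR (linear, not dB).
    C = log2 det(I + (snr / n_tx) H H^H)
    """
    n_rx, n_tx = h.shape
    gram = h @ h.conj().T
    m = np.eye(n_rx) + (snr_linear / n_tx) * gram
    sign, logdet = np.linalg.slogdet(m)  # stable log-determinant
    return float(logdet / np.log(2))

# Toy channels (made up): the "RIS" case simply scales every path by 1.5.
h_base = np.array([[1.0, 0.2], [0.1, 0.8]], dtype=complex)
h_ris = 1.5 * h_base
c0 = mimo_capacity(h_base, 10.0)
c1 = mimo_capacity(h_ris, 10.0)
print(f"capacity change: {100 * (c1 / c0 - 1):.1f}%")
```

Because capacity depends on the eigenvalues of H H^H, a surface that raises total gain while concentrating energy in fewer spatial directions can help one user position yet reduce the number of useful streams, which is one way to read the spatial trade-off the authors report.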

--------------------------------------------------------------------------------------------------------

CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models

This conversational AI paper presents customizable Chinese dialogue models with adjustable character profiles and behaviors. The ability to configure character-based assistants aids research on anthropomorphic language generation.

Authors:  Jinfeng Zhou, Zhuang Chen, Dazhen Wan, Bosi Wen, Yi Song, Jifan Yu, Yongkang Huang, Libiao Peng, Jiaming Yang, Xiyao Xiao, Sahand Sabour, Xiaohan Zhang, Wenjing Hou, Yijia Zhang, Yuxiao Dong, Jie Tang, Minlie Huang

Link:  https://arxiv.org/abs/2311.16832v1

Date: 2023-11-28

Summary:

In this paper, we present CharacterGLM, a series of models built upon ChatGLM, with model sizes ranging from 6B to 66B parameters. CharacterGLM is designed for generating Character-based Dialogues (CharacterDial), which aims to equip a conversational AI system with character customization to satisfy people's inherent social desires and emotional needs. On top of CharacterGLM, we can customize various AI characters or social agents by configuring their attributes (identities, interests, viewpoints, experiences, achievements, social relationships, etc.) and behaviors (linguistic features, emotional expressions, interaction patterns, etc.). According to manual evaluations, our model outperforms most mainstream closed-source large language models, including the GPT series, especially in terms of consistency, human-likeness, and engagement. We will release the 6B version of CharacterGLM and a subset of the training data to facilitate further research on character-based dialogue generation.
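As a rough illustration of configuring a character through attributes and behaviors, the sketch below bundles both into a profile and renders it as a plain-text system prompt. The `CharacterProfile` class, its fields, and the prompt wording are hypothetical; the summary does not specify CharacterGLM's actual input format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CharacterProfile:
    """Hypothetical character configuration (not CharacterGLM's real schema)."""
    name: str
    # e.g. identity, interests, viewpoints, experiences, social relationships
    attributes: Dict[str, str] = field(default_factory=dict)
    # e.g. linguistic features, emotional expression, interaction patterns
    behaviors: Dict[str, str] = field(default_factory=dict)

    def to_system_prompt(self) -> str:
        """Render the profile as one line per attribute or behavior."""
        lines: List[str] = [f"You are {self.name}."]
        for key, value in self.attributes.items():
            lines.append(f"Attribute ({key}): {value}")
        for key, value in self.behaviors.items():
            lines.append(f"Behavior ({key}): {value}")
        return "\n".join(lines)
```

Keeping attributes and behaviors as separate fields mirrors the paper's split between who a character is and how it talks, so either side can be edited independently.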

--------------------------------------------------------------------------------------------------------

Agent-Aware Training for Agent-Agnostic Action Advising in Deep Reinforcement Learning

This deep reinforcement learning paper proposes a framework that balances agent-specific and agent-agnostic action advising, using a proxy model for robust state feature extraction. By enabling effective teacher guidance while mitigating the learning agent's imperfections, it shows potential to improve sample efficiency and accelerate training in real-world DRL applications.

Authors:  Yaoquan Wei, Shunyu Liu, Jie Song, Tongya Zheng, Kaixuan Chen, Yong Wang, Mingli Song

Link:  https://arxiv.org/abs/2311.16807v1

Date: 2023-11-28

Summary:

Action advising leverages supplementary guidance from expert teachers to alleviate sample inefficiency in Deep Reinforcement Learning (DRL). Previous agent-specific action advising methods are hindered by imperfections in the agent itself, while agent-agnostic approaches adapt poorly to the learning agent. In this study, we propose a novel framework called Agent-Aware trAining yet Agent-Agnostic Action Advising (A7) to strike a balance between the two. A7 uses the similarity of state features as an indicator for soliciting advice. However, unlike prior methodologies, state feature similarity is measured by neither the error-prone learning agent nor the agent-agnostic advisor. Instead, we employ a proxy model to extract state features that are both discriminative (adaptive to the agent) and generally applicable (robust to agent noise). Furthermore, we use behavior cloning to train a model for reusing advice and introduce an intrinsic reward for the advised samples to incentivize the use of expert guidance. Experiments are conducted on GridWorld, LunarLander, and six prominent Atari game scenarios. The results demonstrate that A7 significantly accelerates learning and surpasses existing methods (both agent-specific and agent-agnostic) by a substantial margin. Our code will be made publicly available.
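One plausible reading of similarity-driven advice solicitation is sketched below: feature vectors (assumed to come from the proxy model, which is not modeled here) of previously advised states are stored, the teacher is queried only when a new state is dissimilar to all of them, and advised transitions receive an intrinsic bonus. The threshold, cosine similarity, nearest-neighbor reuse, and bonus value are all assumptions; A7 itself reuses advice through a behavior-cloning model rather than a lookup.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class AdviceMemory:
    """Stores (proxy feature, advised action) pairs; hypothetical helper."""

    def __init__(self, ask_threshold: float = 0.9) -> None:
        self.ask_threshold = ask_threshold  # hypothetical similarity cut-off
        self.entries: List[Tuple[Sequence[float], int]] = []

    def add(self, feature: Sequence[float], action: int) -> None:
        self.entries.append((feature, action))

    def nearest(self, feature: Sequence[float]) -> Tuple[float, Optional[int]]:
        """Return (best similarity, advised action) over stored advice."""
        best_sim, best_action = -1.0, None
        for stored, action in self.entries:
            sim = cosine_similarity(feature, stored)
            if sim > best_sim:
                best_sim, best_action = sim, action
        return best_sim, best_action

    def should_ask_teacher(self, feature: Sequence[float]) -> bool:
        """Ask only when no stored advised state is similar enough."""
        best_sim, _ = self.nearest(feature)
        return best_sim < self.ask_threshold


def shaped_reward(env_reward: float, advised: bool, bonus: float = 0.1) -> float:
    """Add a hypothetical intrinsic bonus to transitions that followed advice."""
    return env_reward + (bonus if advised else 0.0)
```

Querying the teacher only for novel states conserves a limited advice budget, while states close to earlier advice can reuse it without another query.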

--------------------------------------------------------------------------------------------------------

Large Language Models Meet Computer Vision: A Brief Survey

This survey paper reviews the intersection of large language models and computer vision, tracing model evolution, comparing performance metrics, and showcasing applications across vision tasks. By spotlighting this pivotal research area, it identifies open problems at the AI frontier and points to directions for advancing integrated language and visual understanding systems.

Authors:  Raby Hamadi

Link:  https://arxiv.org/abs/2311.16673v1

Date: 2023-11-28

Summary:

Recently, the intersection of Large Language Models (LLMs) and Computer Vision (CV) has emerged as a pivotal area of research, driving significant advancements in Artificial Intelligence (AI). As transformers have become the backbone of many state-of-the-art models in both Natural Language Processing (NLP) and CV, understanding their evolution and potential enhancements is crucial. This survey paper delves into the latest progress in transformers and their successors, emphasizing their potential to revolutionize Vision Transformers (ViTs) and LLMs. The survey also presents a comparative analysis of the performance metrics of several leading paid and open-source LLMs, shedding light on their strengths and areas for improvement, as well as a literature review of how LLMs are being used to tackle vision-related tasks. Furthermore, it presents a comprehensive collection of datasets used to train LLMs, offering insights into the diverse data available to achieve high performance in various pre-training and downstream tasks. The survey concludes by highlighting open directions in the field and suggesting potential avenues for future research and development. It aims to underscore the profound impact of LLMs on CV, leading to a new era of integrated and advanced AI models.

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.