Week Ending 4.28.2024

 

RESEARCH WATCH: 4.28.2024

 

Fuzzy Inference System for Test Case Prioritization in Software Testing

Test case prioritization optimizes software testing efficiency by executing the most critical test cases early. This paper introduces a novel fuzzy logic approach to automate test case prioritization, leveraging expert knowledge to link test case characteristics with prioritization. The proposed system demonstrates effectiveness through experimental validation, offering a practical solution to reduce software testing's resource demands.

Authors:  Aron Karatayev, Anna Ogorodova, Pakizar Shamoi

Link:  https://arxiv.org/abs/2404.16395v1

Date: 2024-04-25

Summary:

In the realm of software development, testing is crucial for ensuring software quality and adherence to requirements. However, it can be time-consuming and resource-intensive, especially when dealing with large and complex software systems. Test case prioritization (TCP) is a vital strategy to enhance testing efficiency by identifying the most critical test cases for early execution. This paper introduces a novel fuzzy logic-based approach to automate TCP, using fuzzy linguistic variables and expert-derived fuzzy rules to establish a link between test case characteristics and their prioritization. Our methodology utilizes two fuzzy variables - failure rate and execution time - alongside two crisp parameters: Prerequisite Test Case and Recently Updated Flag. Our findings demonstrate the proposed system's capacity to rank test cases effectively through experimental validation on a real-world software system. The results affirm the practical applicability of our approach in optimizing TCP and reducing the resource intensity of software testing.
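
To make the idea concrete, here is a minimal, hypothetical sketch of fuzzy prioritization in Python: triangular membership functions for the two fuzzy inputs, a handful of illustrative expert rules, and the two crisp flags applied as boosts. The membership ranges and rule weights are assumptions for illustration, not the authors' exact system.

    # Hypothetical fuzzy test-case prioritization sketch (illustrative rules, not the paper's exact system).
    def tri(x, a, b, c):
        """Triangular membership function peaking at b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def priority_score(failure_rate, exec_time, is_prerequisite, recently_updated):
        # Fuzzify the two fuzzy inputs (assumed ranges: failure_rate in [0, 1], exec_time in minutes [0, 60]).
        fr_high = tri(failure_rate, 0.3, 1.0, 1.7)
        fr_low = tri(failure_rate, -0.7, 0.0, 0.7)
        et_short = tri(exec_time, -30, 0, 30)
        et_long = tri(exec_time, 30, 60, 90)
        # Expert-style rules (AND = min), each mapped to a crisp priority level.
        rules = [
            (min(fr_high, et_short), 0.9),  # high failure rate, short runtime -> very high priority
            (min(fr_high, et_long), 0.7),
            (min(fr_low, et_short), 0.4),
            (min(fr_low, et_long), 0.1),
        ]
        strength = sum(w for w, _ in rules)
        score = sum(w * p for w, p in rules) / strength if strength else 0.0  # weighted-average defuzzification
        # The two crisp parameters act as simple boosts on the fuzzy score.
        if is_prerequisite:
            score = max(score, 0.95)
        if recently_updated:
            score = min(1.0, score + 0.1)
        return score

    tests = [("t1", 0.8, 5, False, True), ("t2", 0.1, 40, True, False)]
    ranked = sorted(tests, key=lambda t: priority_score(*t[1:]), reverse=True)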

--------------------------------------------------------------------------------------------------------

Pearls from Pebbles: Improved Confidence Functions for Auto-labeling

Auto-labeling techniques can produce labeled training data with minimal manual effort, but overconfident model predictions limit their performance. This work proposes a framework to derive optimal confidence functions specifically for auto-labeling systems. The resulting method, Colander, achieves significant improvements in coverage while maintaining low error rates, showcasing the potential to enhance auto-labeling applicability across domains.

Authors:  Harit Vishwakarma, Reid Chen, Sui Jiet Tay, Satya Sai Srinath Namburi, Frederic Sala, Ramya Korlakai Vinayak

Link:  https://arxiv.org/abs/2404.16188v1

Date: 2024-04-24

Summary:

Auto-labeling is an important family of techniques that produce labeled training sets with minimum manual labeling. A prominent variant, threshold-based auto-labeling (TBAL), works by finding a threshold on a model's confidence scores above which it can accurately label unlabeled data points. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, such methods still fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the optimal TBAL confidence function. We develop a tractable version of the framework to obtain Colander (Confidence functions for Efficient and Reliable Auto-labeling), a new post-hoc method specifically designed to maximize performance in TBAL systems. We perform an extensive empirical evaluation of Colander and compare it against methods designed for calibration. Colander achieves up to 60% improvements on coverage over the baselines while maintaining auto-labeling error below 5% and using the same amount of labeled data as the baselines.
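
For readers unfamiliar with TBAL, the following sketch shows the basic threshold-selection loop the paper builds on: pick the lowest confidence threshold whose validation error stays below a target, then pseudo-label unlabeled points above it. Colander's learned confidence function is not reproduced here; this assumes a generic model confidence score.

    import numpy as np

    # Sketch of plain threshold-based auto-labeling (TBAL).  Colander would replace the raw
    # model confidence with a learned confidence function, which is not reproduced here.
    def pick_threshold(conf_val, correct_val, max_error=0.05):
        """Lowest threshold whose empirical error on the validation set stays below max_error.
        conf_val: confidence scores (np.ndarray); correct_val: boolean np.ndarray of correctness."""
        order = np.argsort(-conf_val)                          # most confident first
        err_rate = np.cumsum(~correct_val[order]) / np.arange(1, len(order) + 1)
        ok = np.where(err_rate <= max_error)[0]
        return conf_val[order][ok[-1]] if len(ok) else np.inf  # np.inf -> auto-label nothing

    def auto_label(conf_unlab, preds_unlab, threshold):
        mask = conf_unlab >= threshold
        return preds_unlab[mask], mask                         # accepted pseudo-labels and coverage mask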

--------------------------------------------------------------------------------------------------------

Domain-Specific Improvement on Psychotherapy Chatbot Using Assistant

While large language models excel at generalizing from human-written instructions, their performance on specialized domains like psychotherapy remains limited. This paper proposes using domain-specific assistant instructions based on real therapy data, combined with adaptation techniques, to improve the linguistic quality and domain knowledge of psychotherapy chatbots generated by large language models.

Authors:  Cheng Kang, Daniel Novak, Katerina Urbanova, Yuqing Cheng, Yong Hu

Link:  https://arxiv.org/abs/2404.16160v1

Date: 2024-04-24

Summary:

Large language models (LLMs) have demonstrated impressive generalization capabilities on specific tasks with human-written instruction data. However, the limited quantity, diversity, and professional expertise of such instruction data raise concerns about the performance of LLMs in psychotherapy tasks when provided with domain-specific instructions. To address this, we first propose Domain-Specific Assistant Instructions based on Alexander Street therapy, and second, we use an adaptation fine-tuning method and a retrieval-augmented generation method to improve pre-trained LLMs. Through quantitative evaluation of linguistic quality using automatic and human evaluation, we observe that LLMs pre-trained on Psychotherapy Assistant Instructions outperform state-of-the-art LLM response baselines. Our Assistant-Instruction approach offers a half-annotation method to align pre-trained LLMs with instructions and to provide them with more psychotherapy knowledge.

--------------------------------------------------------------------------------------------------------

Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs

Building on the success of chain-of-thought prompting, this paper surveys emerging "chain-of-X" methods that structure the reasoning process of large language models across diverse domains and tasks. By categorizing and analyzing these approaches, the survey provides a valuable resource for researchers aiming to leverage structured reasoning with language models.

Authors:  Yu Xia, Rui Wang, Xu Liu, Mingyan Li, Tong Yu, Xiang Chen, Julian McAuley, Shuai Li

Link:  https://arxiv.org/abs/2404.15676v1

Date: 2024-04-24

Summary:

Chain-of-Thought (CoT) has been a widely adopted prompting method, eliciting impressive reasoning abilities of Large Language Models (LLMs). Inspired by the sequential thought structure of CoT, a number of Chain-of-X (CoX) methods have been developed to address various challenges across diverse domains and tasks involving LLMs. In this paper, we provide a comprehensive survey of Chain-of-X methods for LLMs in different contexts. Specifically, we categorize them by taxonomies of nodes, i.e., the X in CoX, and application tasks. We also discuss the findings and implications of existing CoX methods, as well as potential future directions. Our survey aims to serve as a detailed and up-to-date resource for researchers seeking to apply the idea of CoT to broader scenarios.

--------------------------------------------------------------------------------------------------------

Gallbladder Cancer Detection in Ultrasound Images based on YOLO and Faster R-CNN

Accurate detection of regions of interest is crucial for medical image analysis and disease diagnosis. This study explores fusing the YOLO and Faster R-CNN object detection algorithms to enhance gallbladder detection from ultrasound images, ultimately improving gallbladder cancer classification accuracy compared to using either technique individually.

Authors:  Sara Dadjouy, Hedieh Sajedi

Link:  https://arxiv.org/abs/2404.15129v1

Date: 2024-04-23

Summary:

Medical image analysis is a significant application of artificial intelligence for disease diagnosis. A crucial step in this process is the identification of regions of interest within the images. This task can be automated using object detection algorithms. YOLO and Faster R-CNN are renowned examples of such algorithms, each with its own strengths and weaknesses. This study aims to explore the advantages of both techniques to select more accurate bounding boxes for gallbladder detection from ultrasound images, thereby enhancing gallbladder cancer classification. A fusion method that leverages the benefits of both techniques is presented in this study. The proposed method demonstrated superior classification performance, with an accuracy of 92.62%, compared to the individual use of Faster R-CNN and YOLOv8, which yielded accuracies of 90.16% and 82.79%, respectively.
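
One simple way such a detector fusion could work is sketched below: if the two boxes overlap strongly, blend their coordinates weighted by confidence; otherwise keep the more confident detector's box before cropping for classification. The IoU threshold and blending rule are illustrative assumptions, not necessarily the paper's exact procedure.

    # Illustrative fusion of one YOLO box and one Faster R-CNN box for the gallbladder region.
    def iou(a, b):
        """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-9)

    def fuse_boxes(yolo_det, frcnn_det, iou_thr=0.5):
        """yolo_det / frcnn_det: (box, score).  Returns the box to crop for classification."""
        (by, sy), (bf, sf) = yolo_det, frcnn_det
        if iou(by, bf) >= iou_thr:
            # Detectors agree on the region: blend coordinates, weighted by confidence.
            w = sy / (sy + sf)
            return tuple(w * cy + (1 - w) * cf for cy, cf in zip(by, bf))
        return by if sy >= sf else bf  # disagreement: trust the more confident detector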

--------------------------------------------------------------------------------------------------------

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Reproducibility and transparency are essential for advancing open research on large language models. This paper releases OpenELM, an efficient state-of-the-art open language model, along with the complete framework for training, evaluation, and device optimization. This comprehensive release aims to empower the open research community and pave the way for future endeavors.

Authors:  Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari

Link:  https://arxiv.org/abs/2404.14619v1

Date: 2024-04-22

Summary:

The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2x fewer pre-training tokens. Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors. Our source code along with pre-trained model weights and training recipes is available at https://github.com/apple/corenet. Additionally, OpenELM models can be found on HuggingFace at https://huggingface.co/apple/OpenELM.
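
The layer-wise scaling idea can be illustrated with a short sketch that interpolates attention-head counts and FFN widths across transformer layers instead of keeping them uniform; the scaling bounds and head dimension below are illustrative, not OpenELM's published configuration.

    def layer_wise_dims(n_layers, d_model, head_dim=64, alpha=(0.5, 1.0), beta=(2.0, 4.0)):
        """Allocate attention heads and FFN widths per layer by linear interpolation
        (illustrative bounds), rather than using identical widths in every layer."""
        dims = []
        for i in range(n_layers):
            t = i / max(n_layers - 1, 1)
            a = alpha[0] + t * (alpha[1] - alpha[0])  # attention scaling factor for layer i
            b = beta[0] + t * (beta[1] - beta[0])     # FFN width multiplier for layer i
            dims.append({
                "layer": i,
                "n_heads": max(1, round(a * d_model / head_dim)),
                "ffn_dim": round(b * d_model),
            })
        return dims

    # Early layers get fewer heads and narrower FFNs; later layers get more capacity.
    config = layer_wise_dims(n_layers=28, d_model=2048)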

--------------------------------------------------------------------------------------------------------

Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics

Passive acoustic monitoring has revolutionized ecological assessments, but annotation and compute costs limit its efficacy. This work explores optimal pretraining strategies for coral reef bioacoustics by leveraging bird, reef, and unrelated audio data. The resulting pretrained network, SurfPerch, provides a strong foundation for automated analysis of marine data with minimal costs.

Authors:  Ben Williams, Bart van Merriënboer, Vincent Dumoulin, Jenny Hamer, Eleni Triantafillou, Abram B. Fleishman, Matthew McKown, Jill E. Munger, Aaron N. Rice, Ashlee Lillis, Clemency E. White, Catherine A. D. Hobbs, Tries B. Razak, Kate E. Jones, Tom Denton

Link:  https://arxiv.org/abs/2404.16436v1

Date: 2024-04-25

Summary:

Machine learning has the potential to revolutionize passive acoustic monitoring (PAM) for ecological assessments. However, high annotation and compute costs limit the field's efficacy. Generalizable pretrained networks can overcome these costs, but high-quality pretraining requires vast annotated libraries, limiting its current applicability primarily to bird taxa. Here, we identify the optimum pretraining strategy for a data-deficient domain using coral reef bioacoustics. We assemble ReefSet, a large annotated library of reef sounds, though modest compared to bird libraries at 2% of the sample count. Through testing few-shot transfer learning performance, we observe that pretraining on bird audio provides notably superior generalizability compared to pretraining on ReefSet or unrelated audio alone. However, our key findings show that cross-domain mixing which leverages bird, reef and unrelated audio during pretraining maximizes reef generalizability. SurfPerch, our pretrained network, provides a strong foundation for automated analysis of marine PAM data with minimal annotation and compute costs.

--------------------------------------------------------------------------------------------------------

Developing Acoustic Models for Automatic Speech Recognition in Swedish

This paper focuses on building acoustic models for continuous speech recognition in Swedish using hidden Markov models trained on the SpeechDat database. Various phone models were tested, including context-independent and context-dependent models, with evaluation on a digit and number recognition task, demonstrating improved performance over previous studies.

Authors:  Giampiero Salvi

Link:  https://arxiv.org/abs/2404.16547v1

Date: 2024-04-25

Summary:

This paper is concerned with automatic continuous speech recognition using trainable systems. The aim of this work is to build acoustic models for spoken Swedish. This is done employing hidden Markov models and using the SpeechDat database to train their parameters. Acoustic modeling has been worked out at a phonetic level, allowing general speech recognition applications, even though a simplified task (digit and natural number recognition) has been considered for model evaluation. Different kinds of phone models have been tested, including context-independent models and two variations of context-dependent models. Furthermore, many experiments have been done with bigram language models to tune some of the system parameters. System performance over various speaker subsets with different sex, age, and dialect has also been examined. Results are compared to previous similar studies, showing a remarkable improvement.

--------------------------------------------------------------------------------------------------------

On the Use of Large Language Models to Generate Capability Ontologies

Capability ontologies model system functionalities, but their creation is complex and typically requires ontology experts. This study investigates using large language models to generate capability ontologies from natural language input, exploring different prompting techniques and error analysis methods. The results show promising accuracy even for complex capabilities.

Authors:  Luis Miguel Vieira da Silva, Aljosha Köcher, Felix Gehlhoff, Alexander Fay

Link:  https://arxiv.org/abs/2404.17524v1

Date: 2024-04-26

Summary:

Capability ontologies are increasingly used to model functionalities of systems or machines. The creation of such ontological models with all properties and constraints of capabilities is very complex and can only be done by ontology experts. However, Large Language Models (LLMs) have shown that they can generate machine-interpretable models from natural language text input and thus support engineers / ontology experts. Therefore, this paper investigates how LLMs can be used to create capability ontologies. We present a study with a series of experiments in which capabilities with varying complexities are generated using different prompting techniques and with different LLMs. Errors in the generated ontologies are recorded and compared. To analyze the quality of the generated ontologies, a semi-automated approach based on RDF syntax checking, OWL reasoning, and SHACL constraints is used. The results of this study are very promising because even for complex capabilities, the generated ontologies are almost free of errors.
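
A minimal sketch of such a semi-automated quality check, assuming rdflib and pySHACL are available: parse the LLM output as Turtle for a syntax check, then validate it against a (placeholder) SHACL shapes file. The authors' actual shapes and OWL reasoning step are not reproduced here.

    from rdflib import Graph
    from pyshacl import validate

    def check_generated_ontology(turtle_text, shapes_path="capability_shapes.ttl"):
        """Syntax check via RDF parsing, then SHACL validation (shapes_path is a placeholder)."""
        g = Graph()
        try:
            g.parse(data=turtle_text, format="turtle")  # step 1: RDF syntax check
        except Exception as exc:
            return {"syntax_ok": False, "error": str(exc)}
        conforms, _, report = validate(g, shacl_graph=shapes_path, inference="rdfs")
        return {"syntax_ok": True, "shacl_conforms": conforms, "report": report}

    # result = check_generated_ontology(llm_output_turtle)  # llm_output_turtle: LLM-generated Turtle text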

--------------------------------------------------------------------------------------------------------

Leveraging AI to Generate Audio for User-generated Content in Video Games

As user-generated content becomes more prevalent in video games, pre-creating audio assets is impractical. This paper explores using generative AI to create music and sound effects on-the-fly based on text or image descriptions of user-generated game content, while discussing ethical implications of this approach.

Authors:  Thomas Marrinan, Pakeeza Akram, Oli Gurmessa, Anthony Shishkin

Link:  https://arxiv.org/abs/2404.17018v1

Date: 2024-04-25

Summary:

In video game design, audio (both environmental background music and object sound effects) plays a critical role. Sounds are typically pre-created assets designed for specific locations or objects in a game. However, user-generated content is becoming increasingly popular in modern games (e.g., building custom environments or crafting unique objects). Since the possibilities are virtually limitless, it is impossible for game creators to pre-create audio for user-generated content. We explore the use of generative artificial intelligence to create music and sound effects on-the-fly based on user-generated content. We investigate two avenues for audio generation: 1) text-to-audio: using a text description of user-generated content as input to the audio generator, and 2) image-to-audio: using a rendering of the created environment or object as input to an image-to-text generator, then piping the resulting text description into the audio generator. In this paper we discuss ethical implications of using generative artificial intelligence for user-generated content and highlight two prototype games where audio is generated for user-created environments and objects.

--------------------------------------------------------------------------------------------------------

MoDE: CLIP Data Experts via Clustering

Contrastive language-image pretraining models like CLIP rely on noisy web data, impacting their performance. This paper presents MoDE, which learns a system of CLIP data experts via clustering, with each expert trained on a specific data cluster to be more robust to noise. By ensembling these experts based on task metadata, MoDE achieves superior zero-shot image classification compared to existing CLIP models with reduced training costs.

Authors:  Jiawei Ma, Po-Yao Huang, Saining Xie, Shang-Wen Li, Luke Zettlemoyer, Shih-Fu Chang, Wen-Tau Yih, Hu Xu

Link:  https://arxiv.org/abs/2404.16030v1

Date: 2024-04-24

Summary:

The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of Data Experts (MoDE) and learn a system of CLIP data experts via clustering. Each data expert is trained on one data cluster, making it less sensitive to false-negative noise in other clusters. At inference time, we ensemble their outputs by applying weights determined through the correlation between task metadata and cluster conditions. To estimate the correlation precisely, the samples in one cluster should be semantically similar, but the number of data experts should still be reasonable for training and inference. As such, we consider the ontology in human language and propose to use fine-grained cluster centers to represent each data expert at a coarse-grained level. Experimental studies show that four CLIP data experts on ViT-B/16 outperform the ViT-L/14 by OpenAI CLIP and OpenCLIP on zero-shot image classification but with less (under 35%) training cost. Meanwhile, MoDE can train all data experts asynchronously and can flexibly include new data experts. The code is available at https://github.com/facebookresearch/MetaCLIP/tree/main/mode.
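
The routing idea can be sketched as follows: cluster caption embeddings to define the data experts, then at inference weight each expert's logits by the similarity between the task metadata (here, class-name embeddings) and that expert's cluster center. This is an illustrative simplification, not the released MetaCLIP/MoDE code.

    import numpy as np
    from sklearn.cluster import KMeans

    # Illustrative MoDE-style routing: cluster caption embeddings to define data experts, then
    # weight each expert's logits by the similarity between task metadata (class-name embeddings)
    # and that expert's cluster center.
    def fit_cluster_centers(caption_embeddings, n_experts=4):
        return KMeans(n_clusters=n_experts, n_init=10).fit(caption_embeddings).cluster_centers_

    def ensemble_logits(expert_logits, class_name_embeddings, centers, temperature=0.1):
        """expert_logits: (n_experts, n_images, n_classes) array of per-expert predictions."""
        meta = class_name_embeddings.mean(axis=0)  # summary of the task metadata
        sims = centers @ meta / (np.linalg.norm(centers, axis=1) * np.linalg.norm(meta) + 1e-9)
        weights = np.exp(sims / temperature)
        weights /= weights.sum()
        return np.tensordot(weights, expert_logits, axes=1)  # weighted sum over experts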

--------------------------------------------------------------------------------------------------------

SFMViT: SlowFast Meet ViT in Chaotic World

Spatiotemporal action localization in chaotic scenes is a challenging video understanding task. This paper proposes SFMViT, a high-performance dual-stream network combining ViT for global feature extraction and SlowFast for spatiotemporal modeling, along with an anchor pruning strategy. SFMViT achieves state-of-the-art performance on the Chaotic World dataset, paving the way for advanced video analysis in complex environments.

Authors:  Jiaying Lin, Jiajun Wen, Mengyuan Liu, Jinfu Liu, Baiqiao Yin, Yue Li

Link:  https://arxiv.org/abs/2404.16609v1

Date: 2024-04-25

Summary:

Spatiotemporal action localization in chaotic scenes is a challenging task on the path toward advanced video understanding. High-quality video feature extraction and improved precision of detector-predicted anchors can effectively improve model performance. To this end, we propose SFMViT, a high-performance dual-stream spatiotemporal feature extraction network with an anchor pruning strategy. The backbone of SFMViT is composed of ViT and SlowFast with prior knowledge of spatiotemporal action localization, fully utilizing ViT's excellent global feature extraction capabilities and SlowFast's spatiotemporal sequence modeling capabilities. Secondly, we introduce a confidence maximum heap to prune the anchors detected in each frame, retaining only the effective anchors. These designs enable SFMViT to achieve a mAP of 26.62% on the Chaotic World dataset, far exceeding existing models. Code is available at https://github.com/jfightyr/SlowFast-Meet-ViT.
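
The confidence-maximum-heap pruning step amounts to keeping only the top-scoring anchors per frame, which can be sketched with Python's heapq; the per-frame budget here is an assumed parameter.

    import heapq

    def prune_anchors(anchors, keep=16):
        """Keep only the `keep` highest-confidence anchors detected in one frame.
        anchors: list of (confidence, box); keep: assumed per-frame budget."""
        return heapq.nlargest(keep, anchors, key=lambda a: a[0])

    frame_anchors = [(0.91, (10, 20, 80, 160)), (0.12, (5, 5, 30, 40)), (0.77, (50, 60, 120, 200))]
    effective = prune_anchors(frame_anchors, keep=2)  # the two most confident anchors survive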

--------------------------------------------------------------------------------------------------------

Real-Time Compressed Sensing for Joint Hyperspectral Image Transmission and Restoration for CubeSat

This work tackles the challenges of hyperspectral image reconstruction from miniaturized satellites, which suffer from stripe effects and computational limitations. The proposed Real-Time Compressed Sensing network enables efficient and robust reconstruction under noisy conditions, with a lightweight architecture suitable for deployment on edge devices, offering vital capabilities for existing miniaturized satellite systems.

Authors:  Chih-Chung Hsu, Chih-Yu Jian, Eng-Shen Tu, Chia-Ming Lee, Guan-Lin Chen

Link:  https://arxiv.org/abs/2404.15781v1

Date: 2024-04-24

Summary:

This paper addresses the challenges associated with hyperspectral image (HSI) reconstruction from miniaturized satellites, which often suffer from stripe effects and are computationally resource-limited. We propose a Real-Time Compressed Sensing (RTCS) network designed to be lightweight and to require only relatively few training samples for efficient and robust HSI reconstruction in the presence of the stripe effect and under noisy transmission conditions. The RTCS network features a simplified architecture that reduces the required training samples and allows for easy implementation on integer-8-based encoders, facilitating rapid compressed sensing for stripe-like HSI that matches the push-broom scanning design of miniaturized satellites. This contrasts with optimization-based models that demand high-precision floating-point operations, making them difficult to deploy on edge devices. Our encoder employs an integer-8-compatible linear projection for stripe-like HSI data transmission, ensuring real-time compressed sensing. Furthermore, based on a novel two-streamed architecture, an efficient HSI restoration decoder is proposed for the receiver side, allowing for edge-device reconstruction without needing a sophisticated central server. This is particularly crucial as an increasing number of miniaturized satellites necessitates significant computing resources at the ground station. Extensive experiments validate the superior performance of our approach, offering new and vital capabilities for existing miniaturized satellite systems.
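
The integer-8-compatible encoder can be sketched as a fixed random projection matrix quantized to int8 and applied to each push-broom stripe; the matrix sizes and scaling below are illustrative assumptions, and the learned ground-station decoder is omitted.

    import numpy as np

    # Illustrative integer-8 encoder for stripe-like hyperspectral data: a fixed random projection
    # matrix is quantized to int8 so the on-board encoder needs no floating-point hardware.
    # Sizes and scaling are assumptions, not the paper's configuration.
    rng = np.random.default_rng(0)
    n_bands, n_measure = 172, 32                  # spectral bands per pixel, compressed measurement size
    A = rng.standard_normal((n_measure, n_bands))
    A_int8 = np.round(A * (127.0 / np.abs(A).max())).astype(np.int8)  # quantized sensing matrix

    def encode_stripe(stripe_uint8):
        """stripe_uint8: (n_pixels, n_bands) raw push-broom stripe; returns int32 measurements."""
        return stripe_uint8.astype(np.int32) @ A_int8.T.astype(np.int32)

    # The ground-station decoder (a learned two-stream network in the paper) reconstructs the
    # stripe from these measurements; that part is not sketched here.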

--------------------------------------------------------------------------------------------------------

MaGGIe: Masked Guided Gradual Human Instance Matting

Human matting, the extraction of foreground human pixels, is a fundamental task in image/video processing. This paper introduces MaGGIe, a framework that predicts alpha mattes progressively for multiple human instances while maintaining computational efficiency, precision, and consistency. By leveraging modern architectures, MaGGIe achieves robust performance on synthesized benchmarks, enhancing the generalization of matting models in real-world scenarios.

Authors:  Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava, Joon-Young Lee

Link:  https://arxiv.org/abs/2404.16035v1

Date: 2024-04-24

Summary:

Human matting is a foundational task in image and video processing, where human foreground pixels are extracted from the input. Prior works either improve accuracy through additional guidance or improve the temporal consistency of a single instance across frames. We propose a new framework, MaGGIe (Masked Guided Gradual Human Instance Matting), which predicts alpha mattes progressively for each human instance while maintaining computational cost, precision, and consistency. Our method leverages modern architectures, including transformer attention and sparse convolution, to output all instance mattes simultaneously without exploding memory and latency. While keeping inference costs constant in the multiple-instance scenario, our framework achieves robust and versatile performance on our proposed synthesized benchmarks. In addition to higher-quality image and video matting benchmarks, a novel multi-instance synthesis approach built from publicly available sources is introduced to increase the generalization of models in real-world scenarios.

--------------------------------------------------------------------------------------------------------

Bored to Death: Artificial Intelligence Research Reveals the Role of Boredom in Suicide Behavior

While AI has advanced suicide risk assessment, the theoretical understanding of suicidal behaviors remains limited. This study used AI methodologies to analyze Facebook data and uncovered boredom as a strong predictor of suicide risk, even after accounting for depression. The findings highlight boredom as an underexplored risk factor, signaling the need for further research and clinical attention.

Authors:  Shir Lissak, Yaakov Ophir, Refael Tikochinski, Anat Brunstein Klomek, Itay Sisso, Eyal Fruchter, Roi Reichart

Link:  https://arxiv.org/abs/2404.14057v2

Date: 2024-04-26

Summary:

Background: Recent advancements in Artificial Intelligence (AI) have contributed significantly to suicide assessment; however, our theoretical understanding of this complex behavior is still limited. Objective: This study aimed to harness AI methodologies to uncover hidden risk factors that trigger or aggravate suicide behaviors. Method: The primary dataset included 228,052 Facebook postings by 1,006 users who completed the gold-standard Columbia Suicide Severity Rating Scale. This dataset was analyzed using a bottom-up research pipeline without a priori hypotheses, and its findings were validated using a top-down analysis of a new dataset. This secondary dataset included responses by 1,062 participants to the same suicide scale as well as to well-validated scales measuring depression and boredom. Results: An almost fully automated, AI-guided research pipeline resulted in four Facebook topics that predicted the risk of suicide, of which the strongest predictor was boredom. A comprehensive literature review using APA PsycInfo revealed that boredom is rarely perceived as a unique risk factor for suicide. A complementing top-down path analysis of the secondary dataset uncovered an indirect relationship between boredom and suicide, which was mediated by depression. An equivalent mediated relationship was observed in the primary Facebook dataset as well. However, here, a direct relationship between boredom and suicide risk was also observed. Conclusions: Integrating AI methods allowed the discovery of an under-researched risk factor of suicide. The study signals boredom as a maladaptive 'ingredient' that might trigger suicide behaviors, regardless of depression. Further studies are recommended to direct clinicians' attention to this burdening, and sometimes existential, experience.

--------------------------------------------------------------------------------------------------------

Confronting the Diversity Problem: The Limits of Galaxy Rotation Curves as a tool to Understand Dark Matter Profiles

Galaxy rotation curves provide insights into dark matter profiles, but the observed diversity at dwarf scales challenges the expected universal profile from simulations. By analyzing FIRE simulations, this work demonstrates that factors like non-equilibrium behavior and non-circular motions can cause significant deviations between measured rotation curves and true circular velocities, potentially giving rise to "artificial" diversity unrelated to dark matter profiles.

Authors:  Isabel S. Sands, Philip F. Hopkins, Xuejian Shen, Michael Boylan-Kolchin, James Bullock, Claude-Andre Faucher-Giguere, Francisco J. Mercado, Jorge Moreno, Lina Necib, Xiaowei Ou, Sarah Wellons, Andrew Wetzel

Link:  https://arxiv.org/abs/2404.16247v1

Date: 2024-04-24

Summary:

While galaxy rotation curves provide one of the most powerful methods for measuring dark matter profiles in the inner regions of rotation-supported galaxies, at the dwarf scale there are factors that can complicate this analysis. Given the expectation of a universal profile in dark matter-only simulations, the diversity of observed rotation curves has become an often-discussed issue in Lambda Cold Dark Matter cosmology on galactic scales. We analyze a suite of Feedback in Realistic Environments (FIRE) simulations of halos of 10^10-10^12 solar masses with standard cold dark matter, and compare the true circular velocity to rotation curve reconstructions. We find that, for galaxies with well-ordered gaseous disks, the measured rotation curve may deviate from true circular velocity by at most 10% within the radius of the disk. However, non-equilibrium behavior, non-circular motions, and non-thermal and non-kinetic stresses may cause much larger discrepancies of 50% or more. Most rotation curve reconstructions underestimate the true circular velocity, while some reconstructions transiently overestimate it in the central few kiloparsecs due to dynamical phenomena. We further demonstrate that the features that contribute to these failures are not always visibly obvious in HI observations. If such dwarf galaxies are included in galaxy catalogs, they may give rise to the appearance of "artificial" rotation curve diversity that does not reflect the true variation in underlying dark matter profiles.

--------------------------------------------------------------------------------------------------------

Deep Models for Multi-View 3D Object Recognition: A Review

While human decision-making leverages multiple viewpoints, machine learning often relies on single-view object recognition, which may be insufficient for complex tasks. This review comprehensively covers deep learning and transformer-based multi-view 3D object recognition methods, providing detailed information on datasets, architectures, fusion strategies, and performance, along with insights into applications and future directions.

Authors:  Mona Alzahrani, Muhammad Usman, Salma Kammoun, Saeed Anwar, Tarek Helmy

Link:  https://arxiv.org/abs/2404.15224v1

Date: 2024-04-23

Summary:

Human decision-making often relies on visual information from multiple perspectives or views. In contrast, machine learning-based object recognition utilizes information from a single image of the object. However, the information conveyed by a single image may not be sufficient for accurate decision-making, particularly in complex recognition problems. The utilization of multi-view 3D representations for object recognition has thus far demonstrated the most promising results for achieving state-of-the-art performance. This review paper comprehensively covers recent progress in multi-view 3D object recognition methods for 3D classification and retrieval tasks. Specifically, we focus on deep learning-based and transformer-based techniques, as they are widely utilized and have achieved state-of-the-art performance. We provide detailed information about existing deep learning-based and transformer-based multi-view 3D object recognition models, including the most commonly used 3D datasets, camera configurations and number of views, view selection strategies, pre-trained CNN architectures, fusion strategies, and recognition performance on 3D classification and 3D retrieval tasks. Additionally, we examine various computer vision applications that use multi-view classification. Finally, we highlight key findings and future directions for developing multi-view 3D object recognition methods to provide readers with a comprehensive understanding of the field.
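
A representative late-fusion baseline from this literature is MVCNN-style view pooling: run a shared CNN over each view and max-pool the per-view features before classification. The sketch below assumes PyTorch and torchvision and is a simplified illustration rather than any specific surveyed model.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    class MultiViewClassifier(nn.Module):
        """MVCNN-style sketch: shared CNN per view, element-wise max over views, then classify."""
        def __init__(self, n_classes):
            super().__init__()
            backbone = resnet18(weights=None)
            self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop the FC head
            self.classifier = nn.Linear(512, n_classes)

        def forward(self, views):                    # views: (batch, n_views, 3, H, W)
            b, v = views.shape[:2]
            f = self.features(views.flatten(0, 1))   # (batch * n_views, 512, 1, 1)
            f = f.view(b, v, -1).max(dim=1).values   # view pooling: max over the view dimension
            return self.classifier(f)

    logits = MultiViewClassifier(n_classes=40)(torch.randn(2, 12, 3, 224, 224))  # 12 views per object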

--------------------------------------------------------------------------------------------------------

A review of deep learning-based information fusion techniques for multimodal medical image classification

Multimodal medical imaging combines various modalities for improved diagnosis and research. This review analyzes deep learning-based fusion techniques for multimodal image classification, exploring input, intermediate, and output fusion schemes. It evaluates their performance, suitability for different scenarios, and challenges, while highlighting the promising future of transformer-based fusion methods.

Authors:  Yihao Li, Mostafa El Habib Daho, Pierre-Henri Conze, Rachid Zeghlache, Hugo Le Boité, Ramin Tadayoni, Béatrice Cochener, Mathieu Lamard, Gwenolé Quellec

Link:  https://arxiv.org/abs/2404.15022v1

Date: 2024-04-23

Summary:

Multimodal medical imaging plays a pivotal role in clinical diagnosis and research, as it combines information from various imaging modalities to provide a more comprehensive understanding of the underlying pathology. Recently, deep learning-based multimodal fusion techniques have emerged as powerful tools for improving medical image classification. This review offers a thorough analysis of the developments in deep learning-based multimodal fusion for medical classification tasks. We explore the complementary relationships among prevalent clinical modalities and outline three main fusion schemes for multimodal classification networks: input fusion, intermediate fusion (encompassing single-level fusion, hierarchical fusion, and attention-based fusion), and output fusion. By evaluating the performance of these fusion techniques, we provide insight into the suitability of different network architectures for various multimodal fusion scenarios and application domains. Furthermore, we delve into challenges related to network architecture selection, handling incomplete multimodal data, and the potential limitations of multimodal fusion. Finally, we spotlight the promising future of Transformer-based multimodal fusion techniques and give recommendations for future research in this rapidly evolving field.
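
The three fusion schemes can be illustrated with a deliberately small two-modality sketch (PyTorch assumed); the tiny encoders and inline layers are placeholders chosen only to keep the example short.

    import torch
    import torch.nn as nn

    # Deliberately small two-modality illustration of the three fusion schemes; the encoders and
    # inline layers are placeholders kept tiny for brevity, not realistic medical-imaging networks.
    def make_encoder(in_ch):
        return nn.Sequential(nn.Conv2d(in_ch, 8, 3, padding=1), nn.ReLU(),
                             nn.AdaptiveAvgPool2d(1), nn.Flatten())

    enc_a, enc_b, enc_shared = make_encoder(1), make_encoder(1), make_encoder(2)

    def input_fusion(xa, xb):          # concatenate modalities at the input, one shared encoder
        return nn.Linear(8, 2)(enc_shared(torch.cat([xa, xb], dim=1)))

    def intermediate_fusion(xa, xb):   # separate encoders, fuse feature vectors before the classifier
        return nn.Linear(16, 2)(torch.cat([enc_a(xa), enc_b(xb)], dim=1))

    def output_fusion(xa, xb):         # separate pipelines, average the per-modality predictions
        pa, pb = nn.Linear(8, 2)(enc_a(xa)), nn.Linear(8, 2)(enc_b(xb))
        return (pa.softmax(-1) + pb.softmax(-1)) / 2

    xa, xb = torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64)  # two single-channel modalities
    probs = output_fusion(xa, xb)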

--------------------------------------------------------------------------------------------------------

Mechanistic Interpretability for AI Safety -- A Review

Ensuring the safety and alignment of AI systems requires understanding their inner workings. This review explores mechanistic interpretability, which reverse-engineers neural networks' learned mechanisms and representations into human-understandable concepts. It surveys methodologies, assesses relevance to AI safety, and advocates for scaling techniques to handle complex models and behaviors, potentially preventing catastrophic outcomes.

Authors:  Leonard Bereska, Efstratios Gavves

Link:  https://arxiv.org/abs/2404.14082v1

Date: 2024-04-22

Summary:

Understanding AI systems' inner workings is critical for ensuring value alignment and safety. This review explores mechanistic interpretability: reverse-engineering the computational mechanisms and representations learned by neural networks into human-understandable algorithms and concepts to provide a granular, causal understanding. We establish foundational concepts such as features encoding knowledge within neural activations and hypotheses about their representation and computation. We survey methodologies for causally dissecting model behaviors and assess the relevance of mechanistic interpretability to AI safety. We investigate challenges surrounding scalability, automation, and comprehensive interpretation. We advocate for clarifying concepts, setting standards, and scaling techniques to handle complex models and behaviors and expand to domains such as vision and reinforcement learning. Mechanistic interpretability could help prevent catastrophic outcomes as AI systems become more powerful and inscrutable.

--------------------------------------------------------------------------------------------------------

Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals

Large language models (LLMs) can advance the UN's Sustainable Development Goals (SDGs), but attitudinal disparities between LLMs and humans pose challenges. This study reviews literature on LLM attitudes toward the 17 SDGs, examining potential disparities, underlying causes, and associated risks. It proposes strategies to ensure LLM alignment with SDG principles for a just, inclusive, and sustainable future.

Authors:  Qingyang Wu, Ying Xu, Tingsong Xiao, Yunze Xiao, Yitong Li, Tianyang Wang, Yichi Zhang, Shanghai Zhong, Yuwei Zhang, Wei Lu, Yifan Yang

Link:  https://arxiv.org/abs/2404.13885v1

Date: 2024-04-22

Summary:

Large Language Models (LLMs) have emerged as potent tools for advancing the United Nations' Sustainable Development Goals (SDGs). However, the attitudinal disparities between LLMs and humans towards these goals can pose significant challenges. This study conducts a comprehensive review and analysis of the existing literature on the attitudes of LLMs towards the 17 SDGs, emphasizing the comparison between their attitudes and support for each goal and those of humans. We examine the potential disparities, primarily focusing on aspects such as understanding and emotions, cultural and regional differences, task objective variations, and factors considered in the decision-making process. These disparities arise from the underrepresentation and imbalance in LLM training data, historical biases, quality issues, lack of contextual understanding, and the skewed ethical values reflected in that data. The study also investigates the risks and harms that may arise from neglecting the attitudes of LLMs towards the SDGs, including the exacerbation of social inequalities, racial discrimination, environmental destruction, and resource wastage. To address these challenges, we propose strategies and recommendations to guide and regulate the application of LLMs, ensuring their alignment with the principles and goals of the SDGs, and therefore creating a more just, inclusive, and sustainable future.

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.