Week Ending 3.17.2024

 

RESEARCH WATCH: 3.17.2024

 

HawkEye: Training Video-Text LLMs for Grounding Text in Videos

Large language models can answer questions about simple videos, but struggle to temporally ground text queries in longer, complex videos. HawkEye is a video-text LLM that performs temporal video grounding in a text-to-text manner, leveraging new time-aware training objectives and segment representations. Applications include video search, analysis, and intelligent video assistants.

Authors:  Yueqian Wang, Xiaojun Meng, Jianxin Liang, Yuxuan Wang, Qun Liu, Dongyan Zhao

Link:  https://arxiv.org/abs/2403.10228v1

Date: 2024-03-15

Summary:

Video-text Large Language Models (video-text LLMs) have shown remarkable performance in answering questions and holding conversations on simple videos. However, they perform almost the same as random on grounding text queries in long and complicated videos, having little ability to understand and reason about temporal information, which is the most fundamental difference between videos and images. In this paper, we propose HawkEye, one of the first video-text LLMs that can perform temporal video grounding in a fully text-to-text manner. To collect training data that is applicable for temporal video grounding, we construct InternVid-G, a large-scale video-text corpus with segment-level captions and negative spans, with which we introduce two new time-aware training objectives to video-text LLMs. We also propose a coarse-grained method of representing segments in videos, which is more robust and easier for LLMs to learn and follow than other alternatives. Extensive experiments show that HawkEye is better at temporal video grounding and comparable on other video-text tasks with existing video-text LLMs, which verifies its superior video-text multi-modal understanding abilities.
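
The abstract mentions a coarse-grained way of representing video segments but does not spell out its form. As a purely illustrative sketch (the labels, thresholds, and function name below are assumptions, not HawkEye's scheme), a coarse representation might replace exact timestamps with a small vocabulary of text-friendly descriptions that a text-to-text model can emit:

    # Illustrative only: the abstract does not specify HawkEye's segment vocabulary;
    # the coarse labels below are assumptions for exposition.

    def coarse_segment_label(start_s: float, end_s: float, video_len_s: float) -> str:
        """Map an exact (start, end) span to a coarse, text-friendly description."""
        mid = (start_s + end_s) / 2.0
        pos = mid / video_len_s  # relative position of the segment centre
        if pos < 0.25:
            region = "at the beginning of the video"
        elif pos < 0.5:
            region = "in the first half of the video"
        elif pos < 0.75:
            region = "in the second half of the video"
        else:
            region = "at the end of the video"
        frac = (end_s - start_s) / video_len_s
        length = "a short segment" if frac < 0.2 else "a long segment"
        return f"{length} {region}"

    print(coarse_segment_label(12.0, 20.0, 120.0))  # -> "a short segment at the beginning of the video"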

--------------------------------------------------------------------------------------------------------

A Hybrid SNN-ANN Network for Event-based Object Detection with Spatial and Temporal Attention

Event cameras offer advantages like high temporal resolution, but their sparse, asynchronous output is difficult to exploit with conventional frame-based networks. This work proposes a hybrid spiking neural network (SNN) and artificial neural network (ANN) for event-based object detection, using an attention-based bridge that converts sparse SNN outputs into dense ANN inputs and enabling energy-efficient inference on neuromorphic hardware. Applications include robotics and computer vision.

Authors:  Soikat Hasan Ahmed, Jan Finkbeiner, Emre Neftci

Link:  https://arxiv.org/abs/2403.10173v1

Date: 2024-03-15

Summary:

Event cameras offer high temporal resolution and dynamic range with minimal motion blur, making them promising for object detection tasks. While Spiking Neural Networks (SNNs) are a natural match for event-based sensory data and enable ultra-energy efficient and low latency inference on neuromorphic hardware, Artificial Neural Networks (ANNs) tend to display more stable training dynamics and faster convergence resulting in greater task performance. Hybrid SNN-ANN approaches are a promising alternative, making it possible to leverage the strengths of both SNN and ANN architectures. In this work, we introduce the first Hybrid Attention-based SNN-ANN backbone for object detection using event cameras. We propose a novel Attention-based SNN-ANN bridge module to capture sparse spatial and temporal relations from the SNN layer and convert them into dense feature maps for the ANN part of the backbone. Experimental results demonstrate that our proposed method surpasses baseline hybrid and SNN-based approaches by significant margins, with results comparable to existing ANN-based methods. Extensive ablation studies confirm the effectiveness of our proposed modules and architectural choices. These results pave the way toward a hybrid SNN-ANN architecture that achieves ANN-like performance at a drastically reduced parameter budget. We implemented the SNN blocks on digital neuromorphic hardware to investigate latency and power consumption and demonstrate the feasibility of our approach.
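
The abstract does not describe the bridge module's internals. The PyTorch sketch below is an assumed illustration (module name, shapes, and the one-convolution scorer are placeholders, not the authors' design) of how sparse spike activity over time could be aggregated with attention into a dense feature map for an ANN head:

    import torch
    import torch.nn as nn

    class SpikeToDenseBridge(nn.Module):
        """Illustrative attention-based bridge: sparse spikes [T, B, C, H, W] -> dense [B, C, H, W]."""
        def __init__(self, channels: int):
            super().__init__()
            self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-timestep spatial attention score

        def forward(self, spikes: torch.Tensor) -> torch.Tensor:
            T = spikes.shape[0]
            scores = torch.stack([self.score(spikes[t]) for t in range(T)], dim=0)  # [T, B, 1, H, W]
            weights = torch.softmax(scores, dim=0)            # attend over the temporal axis
            return (weights * spikes).sum(dim=0)              # dense feature map [B, C, H, W]

    bridge = SpikeToDenseBridge(channels=16)
    spikes = (torch.rand(8, 2, 16, 32, 32) > 0.9).float()     # toy binary spike tensor
    dense = bridge(spikes)
    print(dense.shape)  # torch.Size([2, 16, 32, 32])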

--------------------------------------------------------------------------------------------------------

Functional Graph Convolutional Networks: A unified multi-task and multi-modal learning framework to facilitate health and social-care insights

Functional Graph Convolutional Networks unify multi-task and multi-modal learning for handling complex longitudinal data in digital health and social care through task-specific embeddings, classification/regression, and an interpretable knowledge graph. Applications include improving health solutions, care, and well-being across age groups.

Authors:  Tobia Boschi, Francesca Bonin, Rodrigo Ordonez-Hurtado, Cécile Rosseau, Alessandra Pascale, John Dinsmore

Link:  https://arxiv.org/abs/2403.10158v1

Date: 2024-03-15

Summary:

This paper introduces a novel Functional Graph Convolutional Network (funGCN) framework that combines Functional Data Analysis and Graph Convolutional Networks to address the complexities of multi-task and multi-modal learning in digital health and longitudinal studies. With the growing importance of health solutions to improve health care and social support, ensure healthy lives, and promote well-being at all ages, funGCN offers a unified approach to handle multivariate longitudinal data for multiple entities and ensures interpretability even with small sample sizes. Key innovations include task-specific embedding components that manage different data types, the ability to perform classification, regression, and forecasting, and the creation of a knowledge graph for insightful data interpretation. The efficacy of funGCN is validated through simulation experiments and a real-data application.

--------------------------------------------------------------------------------------------------------

Mind the GAP: Improving Robustness to Subpopulation Shifts with Group-Aware Priors

This paper introduces group-aware priors for neural networks that improve robustness to subpopulation distribution shifts. The simple approach outperforms standard training and can be combined with other techniques. Applications include safer real-world deployment across domains with population shifts like healthcare and facial recognition.

Authors:  Tim G. J. Rudner, Ya Shi Zhang, Andrew Gordon Wilson, Julia Kempe

Link:  https://arxiv.org/abs/2403.09869v1

Date: 2024-03-14

Summary:

Machine learning models often perform poorly under subpopulation shifts in the data distribution. Developing methods that allow machine learning models to better generalize to such shifts is crucial for safe deployment in real-world settings. In this paper, we develop a family of group-aware prior (GAP) distributions over neural network parameters that explicitly favor models that generalize well under subpopulation shifts. We design a simple group-aware prior that only requires access to a small set of data with group information and demonstrate that training with this prior yields state-of-the-art performance -- even when only retraining the final layer of a previously trained non-robust model. Group-aware priors are conceptually simple, complementary to existing approaches, such as attribute pseudo labeling and data reweighting, and open up promising new avenues for harnessing Bayesian inference to enable robustness to subpopulation shifts.
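
The abstract gives only the high-level recipe. As a loose illustration (not the paper's construction), the sketch below retrains a final layer under a MAP-style objective in which a small group-labelled set defines a group-balanced "prior" term alongside a Gaussian penalty; the tensor names, balancing scheme, and the hyperparameters lam/tau are all assumptions:

    import torch
    import torch.nn.functional as F

    def gap_style_loss(last_layer, feats, labels, g_feats, g_labels, g_ids, lam=1.0, tau=1e-3):
        """Rough MAP-style objective: main-task likelihood plus a group-aware prior term
        that favours last-layer weights doing equally well on every group (illustrative only)."""
        nll = F.cross_entropy(last_layer(feats), labels)           # likelihood on main data
        group_losses = []
        for g in torch.unique(g_ids):                              # balanced loss over groups
            m = g_ids == g
            group_losses.append(F.cross_entropy(last_layer(g_feats[m]), g_labels[m]))
        prior = torch.stack(group_losses).mean()
        l2 = sum((p ** 2).sum() for p in last_layer.parameters())  # Gaussian prior on weights
        return nll + lam * prior + tau * l2

    # Hypothetical tensors: main data plus a small group-labelled set defining the prior.
    last_layer = torch.nn.Linear(64, 2)
    feats, labels = torch.randn(128, 64), torch.randint(0, 2, (128,))
    g_feats, g_labels = torch.randn(32, 64), torch.randint(0, 2, (32,))
    g_ids = torch.randint(0, 4, (32,))
    print(gap_style_loss(last_layer, feats, labels, g_feats, g_labels, g_ids))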

--------------------------------------------------------------------------------------------------------

Towards Diverse Perspective Learning with Selection over Multiple Temporal Poolings

Time series classification models often use a fixed temporal pooling mechanism that can underperform on certain data. This work proposes an attention-based dynamic pooling selection approach that improves accuracy by leveraging an ensemble of pooling perspectives. Potential applications include time series classification across domains.

Authors:  Jihyeon Seong, Jungmin Kim, Jaesik Choi

Link:  https://arxiv.org/abs/2403.09749v1

Date: 2024-03-14

Summary:

In Time Series Classification (TSC), temporal pooling methods that consider sequential information have been proposed. However, we found that each temporal pooling has a distinct mechanism, and can perform better or worse depending on time series data. We term this fixed pooling mechanism a single perspective of temporal poolings. In this paper, we propose a novel temporal pooling method with diverse perspective learning: Selection over Multiple Temporal Poolings (SoM-TP). SoM-TP dynamically selects the optimal temporal pooling among multiple methods for each data instance via attention. The dynamic pooling selection is motivated by the ensemble concept of Multiple Choice Learning (MCL), which selects the best among multiple outputs. The pooling selection by SoM-TP's attention enables a non-iterative pooling ensemble within a single classifier. Additionally, we define a perspective loss and Diverse Perspective Learning Network (DPLN). The loss works as a regularizer to reflect all the pooling perspectives from DPLN. Our perspective analysis using Layer-wise Relevance Propagation (LRP) reveals the limitation of a single perspective and ultimately demonstrates diverse perspective learning of SoM-TP. We also show that SoM-TP outperforms CNN models based on other temporal poolings and state-of-the-art models in TSC on the extensive UCR/UEA repositories.
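
To make the selection idea concrete, here is a minimal PyTorch sketch (an assumption for exposition, not the SoM-TP implementation) that scores three candidate temporal poolings with attention and blends them into a single representation:

    import torch
    import torch.nn as nn

    class PoolingSelector(nn.Module):
        """Illustrative soft selection over multiple temporal poolings (not the authors' code)."""
        def __init__(self, dim: int):
            super().__init__()
            self.attn = nn.Linear(dim, 3)   # one score per candidate pooling

        def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: [B, T, D]
            pools = torch.stack([x.max(dim=1).values,            # max pooling
                                 x.mean(dim=1),                  # average pooling
                                 x[:, -1, :]], dim=1)            # last-step pooling -> [B, 3, D]
            scores = self.attn(x.mean(dim=1))                    # [B, 3], attention from a summary of x
            weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
            return (weights * pools).sum(dim=1)                  # [B, D] selected/blended representation

    selector = PoolingSelector(dim=32)
    out = selector(torch.randn(4, 50, 32))
    print(out.shape)  # torch.Size([4, 32])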

--------------------------------------------------------------------------------------------------------

Performance Analysis on RIS-Aided Wideband Massive MIMO OFDM Systems with Low-Resolution ADCs

Reconfigurable intelligent surfaces can improve massive MIMO communications, but integrating them with low-resolution ADCs is challenging. This paper analyzes the performance of such RIS-aided wideband systems to provide insights on power scaling laws and RIS phase shift optimization. Key applications are in 6G and beyond wireless networks.

Authors:  Xianzhe Chen, Hong Ren, Cunhua Pan, Zhangjie Peng, Kangda Zhi, Yong Liu, Xiaojun Xi, Ana Garcia Armada, Cheng-Xiang Wang

Link:  https://arxiv.org/abs/2403.09058v1

Date: 2024-03-14

Summary:

This paper investigates a reconfigurable intelligent surface (RIS)-aided wideband massive multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) system with low-resolution analog-to-digital converters (ADCs). Frequency-selective Rician fading channels are considered, and the OFDM data transmission process is presented in the time domain. This paper derives the closed-form approximate expression of the uplink achievable rate, based on which the asymptotic system performance is analyzed when the number of antennas at the base station and the number of reflecting elements at the RIS grow to infinity. Besides, the power scaling laws of the considered system are revealed to provide energy-saving insights. Furthermore, this paper proposes a gradient ascent-based algorithm to design the phase shifts of the RIS for maximizing the minimum user rate. Finally, numerical results are presented to verify the correctness of analytical conclusions and draw insights.
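
As a toy illustration of the gradient-ascent phase-shift design (the channel model and rate expression below are simplified stand-ins, not the paper's closed-form achievable rate with low-resolution ADCs), one can maximize the worst-user rate over the RIS phases numerically:

    import numpy as np

    rng = np.random.default_rng(0)
    N, K = 32, 4                                                            # RIS elements, users
    G = rng.standard_normal((N,)) + 1j * rng.standard_normal((N,))         # RIS -> BS channel (collapsed)
    H = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))     # user -> RIS channels

    def min_rate(theta):
        gains = np.abs((H * np.exp(1j * theta)) @ G) ** 2                  # cascaded channel gain per user
        return np.min(np.log2(1.0 + gains))                                # worst-user rate

    theta = rng.uniform(0, 2 * np.pi, size=N)
    eps, lr = 1e-4, 0.05
    for _ in range(200):                                                    # numerical gradient ascent
        grad = np.array([(min_rate(theta + eps * np.eye(N)[i]) -
                          min_rate(theta - eps * np.eye(N)[i])) / (2 * eps)
                         for i in range(N)])
        theta += lr * grad

    print(f"worst-user rate after optimisation: {min_rate(theta):.3f} bits/s/Hz")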

--------------------------------------------------------------------------------------------------------

AI coach for badminton

By analyzing video footage, this work aims to extract detailed player kinetics and biomechanics insights to suggest technique improvements and customized training programs for enhanced badminton performance. Potential uses include coaching tools and sports analytics for optimizing training.

Authors:  Dhruv Toshniwal, Arpit Patil, Nancy Vachhani

Link:  https://arxiv.org/abs/2403.08956v1

Date: 2024-03-13

Summary:

In the competitive realm of sports, optimal performance necessitates rigorous management of nutrition and physical conditioning. Specifically, in badminton, the agility and precision required make it an ideal candidate for motion analysis through video analytics. This study leverages advanced neural network methodologies to dissect video footage of badminton matches, aiming to extract detailed insights into player kinetics and biomechanics. Through the analysis of stroke mechanics, including hand-hip coordination, leg positioning, and the execution angles of strokes, the research aims to derive predictive models that can suggest improvements in stance, technique, and muscle orientation. These recommendations are designed to mitigate erroneous techniques, reduce the risk of joint fatigue, and enhance overall performance. Utilizing a vast array of data available online, this research correlates players' physical attributes with their in-game movements to identify muscle activation patterns during play. The goal is to offer personalized training and nutrition strategies that align with the specific biomechanical demands of badminton, thereby facilitating targeted performance enhancements.

--------------------------------------------------------------------------------------------------------

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Continual pre-training can efficiently update large language models on new data at a fraction of retraining costs. This accessible approach matches retraining performance through simple strategies like learning-rate re-warming and re-decaying plus data replay, enabling continual LLM iteration as more data becomes available over time.

Authors:  Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

Link:  https://arxiv.org/abs/2403.08763v1

Date: 2024-03-13

Summary:

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by final loss and language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English$\rightarrow$English) and a stronger distribution shift (English$\rightarrow$German) at the $405$M parameter model scale with large dataset sizes (hundreds of billions of tokens). Selecting the weak but realistic shift for larger-scale experiments, we also find that our continual learning strategies match the re-training baseline for a 10B parameter LLM. Our results demonstrate that LLMs can be successfully updated via simple and scalable continual learning strategies, matching the re-training baseline using only a fraction of the compute. Finally, inspired by previous work, we propose alternatives to the cosine learning rate schedule that help circumvent forgetting induced by LR re-warming and that are not bound to a fixed token budget.
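
The recipe in the abstract is simple enough to sketch. The snippet below is illustrative only, with assumed hyperparameters (peak/minimum learning rates, warmup length, a 5% replay fraction): a re-warmed cosine schedule for the new dataset and a generator that mixes replayed samples from the earlier corpus into the new data stream.

    import math
    import random

    def rewarmed_cosine_lr(step, total_steps, max_lr=3e-4, min_lr=3e-5, warmup_steps=1000):
        """Re-warm the LR from min_lr up to max_lr, then re-decay it with a cosine schedule
        over the new dataset (a sketch of the recipe described in the abstract)."""
        if step < warmup_steps:
            return min_lr + (max_lr - min_lr) * step / warmup_steps
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

    def replay_mixture(new_docs, old_docs, replay_frac=0.05):
        """Yield training documents, replaying a small fraction of the previous corpus."""
        for doc in new_docs:
            if random.random() < replay_frac:
                yield random.choice(old_docs)   # replayed sample from the earlier corpus
            yield doc

    # Usage sketch with placeholder corpora:
    old_docs = [f"old-{i}" for i in range(100)]
    new_docs = [f"new-{i}" for i in range(100)]
    batch = list(replay_mixture(new_docs, old_docs))
    print(rewarmed_cosine_lr(step=500, total_steps=10_000), len(batch))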

--------------------------------------------------------------------------------------------------------

Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation

This zero-shot semantic segmentation approach leverages visual consensus and language-driven class embeddings to improve object segmentation masks, generalization to unseen classes, and consistency within instances. Applications include open-vocabulary object segmentation for vision tasks.

Authors:  Zicheng Zhang, Tong Zhang, Yi Zhu, Jianzhuang Liu, Xiaodan Liang, QiXiang Ye, Wei Ke

Link:  https://arxiv.org/abs/2403.08426v1

Date: 2024-03-13

Summary:

The pre-trained vision-language model, exemplified by CLIP, advances zero-shot semantic segmentation by aligning visual features with class embeddings through a transformer decoder to generate semantic masks. Despite its effectiveness, prevailing methods within this paradigm encounter challenges, including overfitting on seen classes and small fragmentation in masks. To mitigate these issues, we propose a Language-Driven Visual Consensus (LDVC) approach, fostering improved alignment of semantic and visual information. Specifically, we leverage class embeddings as anchors due to their discrete and abstract nature, steering vision features toward class embeddings. Moreover, to circumvent noisy alignments from the vision part due to its redundant nature, we introduce route attention into self-attention for finding visual consensus, thereby enhancing semantic consistency within the same object. Equipped with a vision-language prompting strategy, our approach significantly boosts the generalization capacity of segmentation models for unseen classes. Experimental results underscore the effectiveness of our approach, showcasing mIoU gains of 4.5 on the PASCAL VOC 2012 and 3.6 on the COCO-Stuff 164k for unseen classes compared with the state-of-the-art methods.

--------------------------------------------------------------------------------------------------------

Stacking-based deep neural network for player scouting in football

Professional sports teams can benefit from this stacking deep learning model, which outperforms classical statistical methods at detecting high-potential players in large open-source databases, enabling more effective data scouting and recruitment.

Authors:  Simon Lacan

Link:  https://arxiv.org/abs/2403.08835v1

Date: 2024-03-13

Summary:

Data scouting is one of the best-known data applications in professional sport, and specifically in football. Its objective is to analyze huge databases of players in order to detect high potentials that can then be individually considered by human scouts. In this paper, we propose a stacking-based deep learning model to detect high-potential football players. Applied to an open-source database, our model obtains significantly better results than classical statistical methods.
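
The abstract does not describe the exact base learners or meta-learner, so the scikit-learn sketch below is a generic stacking setup on synthetic data; the model choices, class imbalance, and the "high potential" label are placeholders, not the paper's actual configuration.

    from sklearn.ensemble import StackingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # Synthetic, imbalanced stand-in for a player database (most players are not "high potential").
    X, y = make_classification(n_samples=2000, n_features=30, weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    stack = StackingClassifier(
        estimators=[("mlp", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)),
                    ("rf", RandomForestClassifier(n_estimators=200, random_state=0))],
        final_estimator=LogisticRegression(),   # meta-learner combines base-model predictions
        stack_method="predict_proba")

    stack.fit(X_tr, y_tr)
    print("held-out accuracy:", stack.score(X_te, y_te))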

--------------------------------------------------------------------------------------------------------

Rethinking ASTE: A Minimalist Tagging Scheme Alongside Contrastive Learning

Aspect Sentiment Triplet Extraction aims to extract structured sentiment triplets from text data, but existing approaches often overcomplicate the task. This work introduces a novel minimalist tagging scheme with contrastive learning that achieves comparable or superior performance to state-of-the-art methods, including outperforming large language models in few-shot scenarios.

Authors:  Qiao Sun, Liujia Yang, Minghao Ma, Nanyang Ye, Qinying Gu

Link:  https://arxiv.org/abs/2403.07342v1

Date: 2024-03-12

Summary:

Aspect Sentiment Triplet Extraction (ASTE) is a burgeoning subtask of fine-grained sentiment analysis, aiming to extract structured sentiment triplets from unstructured textual data. Existing approaches to ASTE often complicate the task with additional structures or external data. In this research, we propose a novel tagging scheme and employ a contrastive learning approach to mitigate these challenges. The proposed approach demonstrates comparable or superior performance in comparison to state-of-the-art techniques, while featuring a more compact design and reduced computational overhead. Notably, even in the era of Large Language Models (LLMs), our method exhibits superior efficacy compared to GPT 3.5 and GPT 4 in few-shot learning scenarios. This study also provides valuable insights for the advancement of ASTE techniques within the paradigm of large language models.
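
The contrastive component is not detailed in the abstract; the sketch below is a generic supervised contrastive loss over span/token representations (an assumed illustration of the kind of objective that could be paired with a tagging scheme, not the authors' exact formulation):

    import torch
    import torch.nn.functional as F

    def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
        """Pull together representations sharing a label, push apart the rest (illustrative)."""
        z = F.normalize(embeddings, dim=-1)
        sim = z @ z.T / temperature                                # pairwise similarities
        self_mask = torch.eye(len(z), dtype=torch.bool)
        sim = sim.masked_fill(self_mask, -1e9)                     # exclude self-similarity
        log_prob = sim - torch.logsumexp(sim, dim=-1, keepdim=True)
        positives = ((labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask).float()
        pos_counts = positives.sum(dim=-1).clamp(min=1)
        # average log-probability of the positives (same label) for each anchor
        return -((log_prob * positives).sum(dim=-1) / pos_counts).mean()

    emb = torch.randn(16, 128)                                      # e.g., span representations
    tags = torch.randint(0, 3, (16,))                               # e.g., sentiment polarity labels
    print(supervised_contrastive_loss(emb, tags))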

--------------------------------------------------------------------------------------------------------

CuentosIE: can a chatbot about "tales with a message" help to teach emotional intelligence?

This chatbot provides an interactive tool for emotional intelligence education using narrative "tales with a message." With a curated tale database, reading comprehension aids, and psychological monitoring indicators, CuentosIE aims to impart emotional lessons through engaging storytelling while tracking users' emotional development.

Authors:  Antonio Ferrández, Rocío Lavigne-Cerván, Jesús Peral, Ignasi Navarro-Soria, Ángel Lloret, David Gil, Carmen Rocamora

Link:  https://arxiv.org/abs/2403.07193v1

Date: 2024-03-11

Summary:

In this article, we present CuentosIE (TalesEI: chatbot of tales with a message to develop Emotional Intelligence), an educational chatbot on emotions that also provides teachers and psychologists with a tool to monitor their students/patients through indicators and data compiled by CuentosIE. The use of "tales with a message" is justified by their simplicity and easy understanding, thanks to their moral or associated metaphors. The main contributions of CuentosIE are the selection, collection, and classification of a set of highly specialized tales, as well as the provision of tools (searching, reading comprehension, chatting, recommending, and classifying) that are useful for both educating users about emotions and monitoring their emotional development. The preliminary evaluation of the tool has obtained encouraging results, which provides an affirmative answer to the question posed in the title of the article.

--------------------------------------------------------------------------------------------------------

Large, Small or Both: A Novel Data Augmentation Framework Based on Language Models for Debiasing Opinion Summarization

Existing opinion summarization datasets and models exhibit positive sentiment bias. This framework combines large and small language models to generate balanced synthetic review data for debiasing through data augmentation, alleviating emotional bias more economically than using only large models.

Authors:  Yanyue Zhang, Pengfei Li, Yilong Lai, Deyu Zhou

Link:  https://arxiv.org/abs/2403.07693v1

Date: 2024-03-12

Summary:

As more than 70$\%$ of reviews in the existing opinion summary data set are positive, current opinion summarization approaches are reluctant to generate negative summaries given the input of negative texts. To address such sentiment bias, a direct approach without over-reliance on a specific framework is to generate additional data based on large language models to balance the emotional distribution of the dataset. However, data augmentation based on large language models faces two disadvantages: 1) the potential issues or toxicity in the augmented data; 2) the expensive costs. Therefore, in this paper, we propose a novel data augmentation framework based on both large and small language models for debiasing opinion summarization. Specifically, a small set of synthesized negative reviews is obtained by rewriting positive texts via a large language model. Then, a disentanglement reconstruction model is trained on the generated data. After training, a large amount of synthetic data can be obtained by decoding new representations formed from combinations of different sample representations, followed by filtering based on confusion degree and sentiment classification. Experiments show that our framework can alleviate emotional bias as effectively as using only large models, but more economically.

--------------------------------------------------------------------------------------------------------

Perennial Semantic Data Terms of Use for Decentralized Web

As decentralized web frameworks like Solid aim to improve data privacy, users struggle with navigating complex cross-app data permissions. This formal Data Terms of Use (DToU) language with automated reasoning enables persistent, user-specified policies spanning apps/activities for enhanced privacy and usability.

Authors:  Rui Zhao, Jun Zhao

Link:  https://arxiv.org/abs/2403.07587v1

Date: 2024-03-12

Summary:

In today's digital landscape, the Web has become increasingly centralized, raising concerns about user privacy violations. Decentralized Web architectures, such as Solid, offer a promising solution by empowering users with better control over their data in their personal `Pods'. However, a significant challenge remains: users must navigate numerous applications to decide which application can be trusted with access to their data Pods. This often involves reading lengthy and complex Terms of Use agreements, a process that users often find daunting or simply ignore. This compromises user autonomy and impedes detection of data misuse. We propose a novel formal description of Data Terms of Use (DToU), along with a DToU reasoner. Users and applications specify their own parts of the DToU policy with local knowledge, covering permissions, requirements, prohibitions and obligations. Automated reasoning verifies compliance, and also derives policies for output data. This constitutes a ``perennial'' DToU language, where the policy authoring only occurs once, and we can conduct ongoing automated checks across users, applications and activity cycles. Our solution is built on Turtle, Notation 3 and RDF Surfaces, for the language and the reasoning engine. It ensures seamless integration with other semantic tools for enhanced interoperability. We have successfully integrated this language into the Solid framework and conducted performance benchmarks. We believe this work demonstrates the practicality of a perennial DToU language and the potential of a paradigm shift in how users interact with data and applications in a decentralized Web, offering both improved privacy and usability.

--------------------------------------------------------------------------------------------------------

CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean

To analyze large language models' grasp of Korean cultural and linguistic knowledge lacking in translated English benchmarks, CLIcK provides a comprehensive dataset of 1,995 QA pairs spanning 11 categories sourced from official exams and textbooks, enabling Korean-centric evaluation and insights.

Authors:  Eunsu Kim, Juyoung Suk, Philhoon Oh, Haneul Yoo, James Thorne, Alice Oh

Link:  https://arxiv.org/abs/2403.06412v3

Date: 2024-03-15

Summary:

Despite the rapid development of large language models (LLMs) for the Korean language, there remains an obvious lack of benchmark datasets that test the requisite Korean cultural and linguistic knowledge. Because many existing Korean benchmark datasets are derived from the English counterparts through translation, they often overlook the different cultural contexts. For the few benchmark datasets that are sourced from Korean data capturing cultural knowledge, only narrow tasks such as bias and hate speech detection are offered. To address this gap, we introduce a benchmark of Cultural and Linguistic Intelligence in Korean (CLIcK), a dataset comprising 1,995 QA pairs. CLIcK sources its data from official Korean exams and textbooks, partitioning the questions into eleven categories under the two main categories of language and culture. For each instance in CLIcK, we provide fine-grained annotation of which cultural and linguistic knowledge is required to answer the question correctly. Using CLIcK, we test 13 language models to assess their performance. Our evaluation uncovers insights into their performances across the categories, as well as the diverse factors affecting their comprehension. CLIcK offers the first large-scale comprehensive Korean-centric analysis of LLMs' proficiency in Korean culture and language.

--------------------------------------------------------------------------------------------------------

The reliability of the gender Implicit Association Test (gIAT) for high-ability careers

This study re-analyzes implicit gender bias data to argue that group differences in interests and cognitive abilities, rather than implicit biases measured by gender IATs, better explain the underrepresentation of women in STEM and other high-complexity career fields.

Authors:  S. Stanley Young, Warren B. Kindzierski

Link:  https://arxiv.org/abs/2403.10300v1

Date: 2024-03-15

Summary:

Males outnumber females in many high-ability careers, for example, in the fields of science, technology, engineering, and mathematics (STEM) and among professors in academic medicine. These differences are often attributed to implicit, subconscious bias. One objective of this study was to use statistical p-value plots to independently test the ability to support the claim of implicit bias made in a meta-analysis of gender bias studies. The meta-analysis examined correlations between implicit bias measures based on the gender Implicit Association Test (gIAT) and measures of intergroup (female and male) behavior. A second objective was to investigate general intelligence (g) and vocational (things-people) interests as explanatory factors for gender differences in high-ability careers. The p-value plots constructed using data sets from the meta-analysis did not support real associations between the tested variables. These findings reinforce the lack of correlation between gIAT (implicit bias) measures and real-world gender behaviors in high-ability careers. More demanding careers (attorneys, engineers, scientists, corporate executives) are recruited from people with higher g. One is dealing with gender groups, and the group of high-g females is smaller than that of high-g males. Regarding vocational interests, females prefer working with people and males prefer working with things, and STEM fields are typically things-oriented. Again, one is dealing with gender groups, and the group of females who prefer working with things is smaller than the group of males. These facts make it predictable that there are more males than females in high-complexity, things-oriented careers such as STEM and academic medicine positions. Implicit bias (gIAT) measures have little or no explanatory power for gender differences in high-ability careers relative to g and interests in working with things.

--------------------------------------------------------------------------------------------------------

Model-free Resilient Controller Design based on Incentive Feedback Stackelberg Game and Q-learning

For industrial cyber-physical systems threatened by compromised controllers, this game-theoretic approach designs a resilient leading controller that incentivizes alignment of follower strategies, deriving conditions for optimality when dynamics are known and Q-learning algorithms for model-free solutions.

Authors:  Jiajun Shen, Fengjun Li, Morteza Hashemi, Huazhen Fang

Link:  https://arxiv.org/abs/2403.08948v1

Date: 2024-03-13

Summary:

In the swift evolution of Cyber-Physical Systems (CPSs) within intelligent environments, especially in the industrial domain shaped by Industry 4.0, the surge in development brings forth unprecedented security challenges. This paper explores the intricate security issues of Industrial CPSs (ICPSs), with a specific focus on the unique threats presented by intelligent attackers capable of directly compromising the controller, thereby posing a direct risk to physical security. Within the framework of hierarchical control and incentive feedback Stackelberg game, we design a resilient leading controller (leader) that is adaptive to a compromised following controller (follower) such that the compromised follower acts cooperatively with the leader, aligning its strategies with the leader's objective to achieve a team-optimal solution. First, we provide sufficient conditions for the existence of an incentive Stackelberg solution when system dynamics are known. Then, we propose a Q-learning-based Approximate Dynamic Programming (ADP) approach, and corresponding algorithms for the online resolution of the incentive Stackelberg solution without requiring prior knowledge of system dynamics. Last but not least, we prove the convergence of our approach to the optimum.
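
For readers unfamiliar with the model-free ingredient, the snippet below shows a minimal tabular Q-learning update; it illustrates only the generic Q-learning step, not the paper's incentive-Stackelberg structure or its approximate dynamic programming recursion, and the state/action sizes and learning rates are arbitrary placeholders.

    import numpy as np

    n_states, n_actions = 5, 3
    Q = np.zeros((n_states, n_actions))
    alpha, gamma = 0.1, 0.95

    def q_update(s, a, r, s_next):
        """One Q-learning step: move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')."""
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

    q_update(s=0, a=1, r=1.0, s_next=2)
    print(Q[0, 1])  # 0.1 after a single update with a zero-initialised table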

--------------------------------------------------------------------------------------------------------

Leveraging Internal Representations of Model for Magnetic Image Classification

With limited training data, this deep learning approach generates informative samples by leveraging the internal representations of neural networks to overcome data scarcity for magnetic image classification tasks, enabling model training from minimal data in various domains.

Authors:  Adarsh N L, Arun P V, Alok Porwal, Malcolm Aranha

Link:  https://arxiv.org/abs/2403.06797v1

Date: 2024-03-11

Summary:

Data generated by edge devices has the potential to train intelligent autonomous systems across various domains. Despite the emergence of diverse machine learning approaches addressing privacy concerns and utilizing distributed data, security issues persist due to the sensitive storage of data shards in disparate locations. This paper introduces a potentially groundbreaking paradigm for machine learning model training, specifically designed for scenarios with only a single magnetic image and its corresponding label image available. We harness the capabilities of Deep Learning to generate concise yet informative samples, aiming to overcome data scarcity. Through the utilization of deep learning's internal representations, our objective is to efficiently address data scarcity issues and produce meaningful results. This methodology presents a promising avenue for training machine learning models with minimal data.

--------------------------------------------------------------------------------------------------------

Stimulate the Potential of Robots via Competition

Drawing inspiration from how competition drives humans to push their limits, this competitive learning framework allows robots to learn advantageous actions from race dynamics, with experiments showing superior performance over single-agent training across competitive multi-agent environments.

Authors:  Kangyao Huang, Di Guo, Xinyu Zhang, Xiangyang Ji, Huaping Liu

Link:  https://arxiv.org/abs/2403.10487v1

Date: 2024-03-15

Summary:

It is common for us to feel pressure in a competition environment, which arises from the desire to succeed in comparison with other individuals or opponents. Although we might get anxious under this pressure, it can also drive us to push our potential to its limit in order to keep up with others. Inspired by this, we propose a competitive learning framework that helps an individual robot acquire knowledge from the competition, fully stimulating its dynamic potential in the race. Specifically, competition information among competitors is introduced as an additional auxiliary signal for learning advantageous actions. We further build a Multiagent-Race environment, and extensive experiments demonstrate that robots trained in the competitive environment outperform those trained with SoTA algorithms in a single-robot environment.
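
One simple way to picture "competition information as an auxiliary signal" is a reward term based on relative progress against the other racers; the sketch below is a toy assumption for exposition, not the paper's actual auxiliary objective or coefficient.

    def competitive_reward(own_progress, opponents_progress, base_reward, beta=0.5):
        """Toy competition-aware reward: the task reward is augmented with a relative-progress
        signal against the other racers (illustrative assumption, not the paper's formulation)."""
        lead = own_progress - max(opponents_progress)   # positive when ahead of every opponent
        return base_reward + beta * lead

    # Usage: robot A has covered 12.0 m, opponents 11.5 m and 13.0 m, task reward 1.0
    print(competitive_reward(12.0, [11.5, 13.0], base_reward=1.0))  # 1.0 + 0.5 * (12.0 - 13.0) = 0.5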

--------------------------------------------------------------------------------------------------------

NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices

Optical flow is crucial for robotic tasks like SLAM, but existing learning-based methods trade off accuracy for efficiency. NeuFlow achieves real-time, high-accuracy optical flow estimation ideal for small robotic platforms through a global-to-local CNN architecture offering significant speed-ups over state-of-the-art approaches.

Authors:  Zhiyong Zhang, Huaizu Jiang, Hanumant Singh

Link:  https://arxiv.org/abs/2403.10425v1

Date: 2024-03-15

Summary:

Real-time high-accuracy optical flow estimation is a crucial component in various applications, including localization and mapping in robotics, object tracking, and activity recognition in computer vision. While recent learning-based optical flow methods have achieved high accuracy, they often come with heavy computation costs. In this paper, we propose a highly efficient optical flow architecture, called NeuFlow, that addresses both high accuracy and computational cost concerns. The architecture follows a global-to-local scheme. Given the features of the input images extracted at different spatial resolutions, global matching is employed to estimate an initial optical flow on the 1/16 resolution, capturing large displacement, which is then refined on the 1/8 resolution with lightweight CNN layers for better accuracy. We evaluate our approach on Jetson Orin Nano and RTX 2080 to demonstrate efficiency improvements across different computing platforms. We achieve a notable 10x-80x speedup compared to several state-of-the-art methods, while maintaining comparable accuracy. Our approach achieves around 30 FPS on edge computing platforms, which represents a significant breakthrough in deploying complex computer vision tasks such as SLAM on small robots like drones. The full training and evaluation code is available at https://github.com/neufieldrobotics/NeuFlow.
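
The coarse global-matching step can be sketched as a soft correspondence between low-resolution feature maps; the PyTorch snippet below is a simplified illustration of that idea (feature extraction, the 1/8-resolution CNN refinement, and NeuFlow's actual matching layer are omitted and the scaling is an assumption).

    import torch

    def global_matching_flow(feat1, feat2):
        """Coarse global matching between 1/16-resolution feature maps: feat1, feat2 are
        [B, C, H, W]; returns an initial flow [B, 2, H, W] via a soft argmax over matches."""
        B, C, H, W = feat1.shape
        f1 = feat1.flatten(2).transpose(1, 2)             # [B, H*W, C]
        f2 = feat2.flatten(2)                             # [B, C, H*W]
        corr = torch.softmax(f1 @ f2 / C ** 0.5, dim=-1)  # [B, H*W, H*W] correspondence weights
        ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                                torch.arange(W, dtype=torch.float32), indexing="ij")
        coords = torch.stack([xs, ys], dim=-1).view(1, H * W, 2)    # target-pixel coordinates
        matched = corr @ coords.expand(B, -1, -1)                    # soft-argmax match location
        flow = (matched - coords).transpose(1, 2).view(B, 2, H, W)   # displacement = match - source
        return flow

    flow = global_matching_flow(torch.randn(1, 64, 30, 40), torch.randn(1, 64, 30, 40))
    print(flow.shape)  # torch.Size([1, 2, 30, 40])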

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.