Distantly Supervised Multi-Task Learning

Research Watch


Buzzing and beeping monitors alert doctors and nurses to potentially life-threatening changes in patient conditions, but a lot of time is wasted responding to false alarms. Artificial intelligence might help, according to a recent paper. 



“Not to Cry Wolf: Distantly Supervised Multitask Learning in Critical Care,” published at ICML 2018, focuses on reducing the number of false alarms from the intensive care ward. 

Currently, hospital ICUs rely on very basic signals (changes in blood pressure, heart rate, etc.) to trigger an alarm. But sensor movement and other “artefacts” caused by technical glitches trigger frequent false alarms, desensitizing doctors and nurses, who may then not react quickly enough to real life-threatening events.

Applying machine learning to multivariate time series data can reduce false alarms. However, as with many real-world datasets, there is too little labeled data for fully supervised learning. To address this problem, the paper’s authors, a group of researchers in Switzerland led by Patrick Schwab, developed a new technique that learns several auxiliary tasks on the unlabeled data to improve performance on the primary task of detecting false alarms.

Fundamentally, the paper’s distantly supervised multitask network (DSMT) is built from multiple “blocks,” each of which is a neural network in its own right, with its own architecture, parameters, and job to perform. The initial blocks (for which the authors use ResNets) each take a single stream of time series data. For example, block 1, or P1, might take heart rate data, P2 might take oxygen saturation (SO2), and so on. Each of these blocks, according to the authors, extracts a feature representation, in other words a workable representation of its time series. The feature representations from these blocks (P1, P2, etc.) are then concatenated together with missing value indicators. Missing values appear because of differences and inconsistencies in the data streams; e.g., medical monitors may record at different intervals or not at all. DSMT addresses this problem by replacing missing values with zero and adding a missing value indicator that marks where a value was replaced. Next, the data travel to the multitask blocks, which are used for multitask learning.
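The zero-filling step with missing value indicators can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name impute_with_indicators and the use of NaN to mark a missing reading are assumptions made for the example.

```python
import numpy as np


def impute_with_indicators(features):
    """Zero-fill missing entries and append a 0/1 missing-value mask.

    `features` has shape (batch, n_features); NaN marks a missing value
    (an assumption of this sketch, not something the paper specifies).
    The result has shape (batch, 2 * n_features): the zero-filled values
    followed by one indicator per original feature.
    """
    features = np.asarray(features, dtype=float)
    missing = np.isnan(features)
    filled = np.where(missing, 0.0, features)
    return np.concatenate([filled, missing.astype(float)], axis=-1)


# One sample with three features; the second reading is missing.
example = impute_with_indicators([[72.0, np.nan, 0.97]])
```

The indicator columns let later layers tell a true zero apart from a value that was filled in.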


Multitask learning (MTL) is a type of learning in which a neural network learns several tasks simultaneously, which can improve overall performance because the network picks up patterns that the tasks share. To use MTL effectively, the authors use multitask blocks, each of which learns an auxiliary task and forwards its result to a head block that handles optimization for the main task. This differs from most prior MTL models, in which all tasks are learned together in a single neural network. The new design lets the head block focus solely on optimizing the main task. Finally, the head block computes the final output (in this instance, whether the alarm is an artefact, i.e. false, or true).
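The flow of data through the blocks can be sketched as a single forward pass. This is a rough illustration under several assumptions, not the authors' implementation: single dense layers with random weights stand in for the ResNet blocks, each multitask block is assumed to pass its hidden features on to the head, and all sizes and names (WINDOW, FEAT, HIDDEN, N_AUX, dsmt_forward) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for this sketch.
WINDOW, FEAT, HIDDEN, N_AUX = 32, 8, 16, 3
SIGNALS = ["heart_rate", "oxygen_saturation"]


def dense(in_dim, out_dim):
    """Randomly initialised (weights, bias) for one linear layer."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


def forward(layer, x, activation=np.tanh):
    weights, bias = layer
    return activation(x @ weights + bias)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# One per-signal block each (a dense layer standing in for a ResNet).
perception = {name: dense(WINDOW, FEAT) for name in SIGNALS}
# Concatenated features plus one missing-value flag per signal.
concat_dim = len(SIGNALS) * FEAT + len(SIGNALS)
# Each multitask block: a trunk plus a small output for its auxiliary task.
multitask = [(dense(concat_dim, HIDDEN), dense(HIDDEN, 1)) for _ in range(N_AUX)]
# The head block sees the hidden features of all multitask blocks.
head = dense(N_AUX * HIDDEN, 1)


def dsmt_forward(windows):
    """windows maps signal name -> (batch, WINDOW) array, or None if missing.

    At least one signal must be present so the batch size can be inferred.
    """
    batch = next(w.shape[0] for w in windows.values() if w is not None)
    parts, flags = [], []
    for name in SIGNALS:
        w = windows.get(name)
        if w is None:
            parts.append(np.zeros((batch, FEAT)))  # zero-fill missing stream
            flags.append(np.ones((batch, 1)))      # and flag it as missing
        else:
            parts.append(forward(perception[name], w))
            flags.append(np.zeros((batch, 1)))
    joint = np.concatenate(parts + flags, axis=1)

    hidden, aux_outputs = [], []
    for trunk, aux_out in multitask:
        h = forward(trunk, joint)
        hidden.append(h)
        aux_outputs.append(forward(aux_out, h, activation=lambda z: z))

    # Head block: probability that the alarm is an artefact (false alarm).
    p_false = forward(head, np.concatenate(hidden, axis=1), activation=sigmoid)
    return p_false, aux_outputs


p_false, aux = dsmt_forward(
    {"heart_rate": rng.normal(size=(4, WINDOW)), "oxygen_saturation": None}
)
```

In training, the auxiliary outputs would be fitted on unlabeled data while only the head's output is fitted to the labeled alarms; the sketch covers the forward pass only.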

The paper introduces an interesting new architecture and shows its utility at identifying false alarms when labeled data is limited. However, as the authors’ own results show, when more labeled data is available (more than 500 examples), a simple random forest (a basic machine learning model that has been around since the 1990s) outperforms DSMTs, and a feature-sharing GAN outperforms them when data is extremely limited (i.e., fewer than 25 labeled examples). DSMTs are also complicated, as they combine multiple neural networks and have a large total number of parameters. The authors do not show whether the DSMT architecture will generalize to other datasets and types of data. That said, multitask and distantly supervised learning will likely continue to play an important role in data-limited scenarios. The paper provides an interesting new structure that incorporates both, and it records a number of baseline results on a complex dataset. In terms of application, with just 100 labeled examples the network would be able to reduce the number of false alarms brought to the attention of staff by 63.30%.