Time series data are ubiquitous. The broad diffusion of the Internet of Things (IoT) and major advances in sensor technology are two reasons why such data have become pervasive. These technologies find applications in several domains, such as healthcare, finance, meteorology and transportation, where tasks such as predicting the health status of patients, analysing stock markets and forecasting traffic are of high importance. Deep Neural Networks (DNNs) have recently been used to create models that improve on the state of the art for some of these tasks. In time series classification and forecasting, Deep Learning (DL) has helped avoid heavy data pre-processing and feature engineering. Time series data influence both political and industrial decisions every day, yet there is surprisingly little Machine Learning (ML) research on time series, especially in situations where data are scarce or of low quality.
In many real-world applications, one of two scenarios holds: 1) the amount of available training data is limited, or 2) a huge amount of data is available, but it is sparsely labelled or unlabelled due to the high cost of collecting and annotating it. As a result, the future of Artificial Intelligence (AI) will be “about less data, not more”. There is thus a need to focus on modern AI techniques that can extract value from small datasets. These considerations also speak to the growing need to address the sustainability and privacy aspects of ML and AI.
The goal of this project is to overcome the issue of limited available or labelled data for (multivariate) time series modelling, where the heterogeneity of the data (e.g. non-stationarity, multi-resolution, irregular sampling), as well as noise, poses further challenges. ML4ITS’s main objective is to advance the state of the art in the analysis of “irregular” time series. We define a time series as “irregular” if it falls under one or more of the following categories:
- Short: univariate and multivariate time series with a limited amount of data and history.
- Multiresolution: multivariate time series where each signal has a different granularity or resolution in terms of sampling frequency.
- Noisy: univariate/multivariate time series affected by perturbations that appear in different forms. In this class, we also include time series with missing data.
- Heterogeneous: multivariate time series, usually collected from many physical systems, that exhibit different embedded statistical patterns and behaviours.
- Scarcely labelled and unlabelled: univariate/multivariate time series where only a small part of the data is labelled, or no labels are available at all.
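To make two of these categories concrete, the following sketch builds a small multiresolution series with missing values and aligns it on a common grid using pandas. The sensor names, sampling rates and interpolation strategy are illustrative assumptions, not part of the project.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Two signals with different sampling frequencies: a "multiresolution"
# multivariate series (1-minute vs. 5-minute sampling).
t_fast = pd.date_range("2024-01-01", periods=60, freq="1min")
t_slow = pd.date_range("2024-01-01", periods=12, freq="5min")
fast = pd.Series(rng.normal(size=60), index=t_fast, name="sensor_a")
slow = pd.Series(rng.normal(size=12), index=t_slow, name="sensor_b")

# Introduce gaps in the fast signal (the "noisy" category also
# covers missing data).
fast.iloc[[5, 17, 42]] = np.nan

# Align both signals on the union of their timestamps; the slow
# signal now has NaNs between its samples.
df = pd.concat([fast, slow], axis=1)

# Naive baseline: linear interpolation both to upsample the slow
# signal and to impute the gaps in the fast one.
df["sensor_b"] = df["sensor_b"].interpolate()
df["sensor_a"] = df["sensor_a"].interpolate()

print(df.shape)  # one row per 1-minute timestamp, two channels
```

Linear interpolation is only a baseline here; the imputation methods developed in the project would replace this step.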
The tasks that will be performed for irregular time series are:
- data imputation and denoising
- synthetic data generation
- time series forecasting and classification
- anomaly detection and failure prediction
- quantification of uncertainties for each task in question
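As a minimal illustration of the anomaly detection task above, the sketch below flags points whose rolling z-score exceeds a threshold. The window size, threshold and injected anomaly are arbitrary choices for illustration, not the project's method.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=500)
x[250] = 8.0  # injected anomaly

window = 50
anomalies = []
for i in range(window, len(x)):
    # z-score of x[i] relative to the preceding window
    mu = x[i - window:i].mean()
    sigma = x[i - window:i].std()
    z = (x[i] - mu) / sigma
    if abs(z) > 4.0:  # threshold is an arbitrary choice
        anomalies.append(i)

print(anomalies)  # the injected point at index 250 should be flagged
```

A rolling z-score is a classical baseline; it illustrates why irregular data are hard, since a single window statistic breaks down under multiresolution sampling or missing values.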
We plan to achieve these goals by developing novel A) Transfer Learning and B) Unsupervised Learning and Data Augmentation methods. These techniques remain largely unexplored in the time series domain.
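Two widely used time series augmentations are jittering (adding small Gaussian noise) and magnitude scaling (multiplying each channel by a random factor). The function names and parameters below are illustrative assumptions, not the project's implementation.

```python
import numpy as np

def jitter(x, sigma=0.03, rng=None):
    """Add small Gaussian noise to every time step (jittering)."""
    if rng is None:
        rng = np.random.default_rng()
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale(x, sigma=0.1, rng=None):
    """Multiply each channel by a random factor close to 1 (scaling)."""
    if rng is None:
        rng = np.random.default_rng()
    factors = rng.normal(1.0, sigma, size=(1, x.shape[1]))
    return x * factors

# Shape convention (an assumption): (time steps, channels).
series = np.sin(np.linspace(0, 4 * np.pi, 200)).reshape(-1, 1)
augmented = scale(jitter(series))

print(series.shape, augmented.shape)  # shapes are preserved
```

Such transformations cheaply enlarge a small training set, which is one way to attack the scarce-data scenarios described above.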
The project is structured in three main work packages:
- WP1 - Transfer Learning and Few-Shot Learning for Time Series Analysis
- WP2 - Unsupervised Methods and Data Augmentation for Time Series Analysis
- WP3 - Graph Signal Processing (GSP) for Time Series Analysis
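To hint at the GSP perspective of WP3: the channels of a multivariate series can be treated as nodes of a graph, and each time step smoothed with the graph Laplacian. The three-node path graph and step size below are illustrative assumptions.

```python
import numpy as np

# Adjacency of a path graph over 3 sensors (illustrative).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A  # combinatorial graph Laplacian

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))  # 100 time steps, 3 channels

# One step of Laplacian smoothing per time step: x <- x - tau * L x.
tau = 0.2
X_smooth = X - tau * (X @ L.T)

# Smoothing reduces the graph "roughness" tr(X L X^T) of the signal.
def roughness(M):
    return np.trace(M @ L @ M.T)

print(roughness(X_smooth) < roughness(X))
```

The roughness term tr(X L X^T) penalises signals that differ strongly across connected sensors, which is the basic regulariser behind many GSP denoising and imputation methods.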
We rely on a multidisciplinary approach combining the perspectives of the three main scientific communities involved in time series analysis. The consortium has therefore been composed with this complementarity in mind, including researchers from these different fields (IES, MATH, IDI NTNU and SINTEF Digital).