Mikko Kolehmainen, Heikki Junninen, Harri Niska, Toni Patama, Anna Ruuskanen, Kari Tuppurainen and Juhani Ruuskanen

Missing Data Toolbox for Air Quality Datasets

The objective of the study was to find a useful method for imputing missing data in air quality forecasting applications. The univariate methods studied were linear interpolation, spline interpolation and nearest neighbour interpolation. The multivariate methods studied were multivariate nearest neighbour (NN), Self-Organising Map (SOM) and Multi-Layer Perceptron (MLP). Additionally, a new approach was developed in which univariate methods were combined with multivariate methods in order to utilise the best properties of both. In general, the results showed that the best overall performance can be achieved by combining univariate and multivariate methods, and that the best way of combining them depends on the variable inspected. Based on these results, a Missing Data Toolbox (MDT) with a Graphical User Interface (GUI) was created in the Matlab environment. The MDT encapsulates the different algorithms and enables missing data to be treated in a coherent way. The MDT and GUI were tested in Windows and Linux environments.
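
The idea of combining a univariate and a multivariate method can be illustrated with a minimal Python sketch. It is not the MDT implementation, and the combination rule used here is an assumption: gaps of at most `max_gap` consecutive samples are filled by linear interpolation, and longer gaps (or gaps at the edge of the series) are filled by copying the value from the complete row that is nearest in the other variables. The function names, the `max_gap` parameter and the use of unscaled Euclidean distance are all hypothetical choices made for illustration.

```python
import math


def find_gaps(series):
    """Return (start, end) pairs, end exclusive, for each run of None."""
    gaps, start = [], None
    for i, value in enumerate(series):
        if value is None and start is None:
            start = i
        elif value is not None and start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(series)))
    return gaps


def linear_fill(series, start, end):
    """Univariate step: interpolate an interior gap in place.

    Returns False when the gap touches an edge of the series,
    because no value is available on one side.
    """
    if start == 0 or end == len(series):
        return False
    left, right = series[start - 1], series[end]
    span = end - start + 1
    for k in range(start, end):
        series[k] = left + (right - left) * (k - start + 1) / span
    return True


def nn_fill(rows, target, i):
    """Multivariate step: take the target value from the nearest complete row.

    Distance is the squared Euclidean distance over the non-target
    variables observed in row i (unscaled: an assumption of this sketch).
    """
    best_value, best_dist = None, math.inf
    for donor in rows:
        if any(v is None for v in donor):
            continue  # only fully observed rows may act as donors
        dist = sum(
            (donor[c] - v) ** 2
            for c, v in enumerate(rows[i])
            if c != target and v is not None
        )
        if dist < best_dist:
            best_value, best_dist = donor[target], dist
    return best_value


def hybrid_impute(rows, target, max_gap=2):
    """Fill column `target`: short gaps linearly, the rest by nearest neighbour."""
    series = [row[target] for row in rows]
    filled = list(series)
    for start, end in find_gaps(series):
        if end - start <= max_gap and linear_fill(filled, start, end):
            continue
        for i in range(start, end):
            filled[i] = nn_fill(rows, target, i)
    return filled


# Hypothetical data: column 0 is fully observed, column 1 has gaps.
rows = [
    [1.0, 10.0],
    [2.0, None],
    [3.0, 30.0],
    [4.0, None],
    [5.0, None],
    [6.0, None],
    [7.0, 70.0],
]
imputed = hybrid_impute(rows, target=1, max_gap=1)
```

In this sketch a fixed gap-length threshold decides which method is applied. Since the results indicate that the best way of combining depends on the variable inspected, any such threshold would in practice have to be chosen separately for each variable.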