Mikko Kolehmainen, Heikki Junninen, Harri Niska, Toni Patama, Anna
Ruuskanen, Kari Tuppurainen and Juhani Ruuskanen
Missing Data Toolbox for Air Quality Datasets
The objective of the study was to find a useful missing-data imputation
method for air quality forecasting applications. The univariate methods
studied were linear interpolation, spline interpolation and (univariate)
nearest neighbour interpolation. The multivariate methods studied were
multivariate nearest neighbour (NN), the Self-Organising Map (SOM) and the
Multi-Layer Perceptron (MLP). Additionally, a new approach was developed in
which univariate methods were combined with multivariate methods in order to
exploit the best properties of both approaches. In general, the results
showed that the best overall performance can be achieved by combining
univariate and multivariate methods, and that the best way of combining them
depends on the variable inspected. Based on these results, a Missing Data
Toolbox (MDT) with a Graphical User Interface (GUI) was created in the Matlab
environment. The MDT encapsulates the different algorithms and enables
missing data to be treated in a coherent way. The MDT and GUI were tested in
Windows and Linux environments.
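To make the idea of combining a univariate and a multivariate method concrete, the following Python sketch fills gaps in one variable of a time-ordered table. It is an illustration only, not the MDT implementation (which is written in Matlab), and the combination rule is a hypothetical assumption: short gaps (at most `max_gap` consecutive missing values) are filled by linear interpolation along time, and longer gaps by multivariate nearest-neighbour imputation using the other variables in the same row. The function names, the `max_gap` threshold and the use of `None` to mark missing values are all assumptions chosen for the example.

```python
from typing import List, Optional

Row = List[Optional[float]]  # one time step; None marks a missing value


def gap_lengths(col: List[Optional[float]]) -> List[int]:
    """For each index, the length of the run of missing values it lies in (0 if observed)."""
    lengths = [0] * len(col)
    i = 0
    while i < len(col):
        if col[i] is None:
            j = i
            while j < len(col) and col[j] is None:
                j += 1
            for k in range(i, j):
                lengths[k] = j - i
            i = j
        else:
            i += 1
    return lengths


def linear_fill(col: List[Optional[float]]) -> List[Optional[float]]:
    """Univariate linear interpolation of interior gaps; leading/trailing gaps stay None."""
    out = list(col)
    known = [i for i, v in enumerate(col) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = col[a] + (col[b] - col[a]) * (i - a) / (b - a)
    return out


def nn_fill(rows: List[Row], target: int) -> List[Optional[float]]:
    """Multivariate nearest neighbour: copy the target value from the complete row
    closest (squared Euclidean distance) on the other variables observed in this row."""
    donors = [r for r in rows if all(v is not None for v in r)]
    out: List[Optional[float]] = []
    for r in rows:
        if r[target] is not None or not donors:
            out.append(r[target])
            continue
        cols = [j for j, v in enumerate(r) if j != target and v is not None]
        if not cols:  # nothing to compare on
            out.append(None)
            continue
        best = min(donors, key=lambda d: sum((r[j] - d[j]) ** 2 for j in cols))
        out.append(best[target])
    return out


def hybrid_fill(rows: List[Row], target: int, max_gap: int = 2) -> List[Optional[float]]:
    """Hypothetical combination rule: linear interpolation for gaps of length
    <= max_gap, nearest-neighbour imputation for longer gaps."""
    col = [r[target] for r in rows]
    lin = linear_fill(col)
    nn = nn_fill(rows, target)
    gaps = gap_lengths(col)
    return [
        v if v is not None
        else (lin[i] if gaps[i] <= max_gap and lin[i] is not None else nn[i])
        for i, v in enumerate(col)
    ]


# Column 0: an auxiliary variable; column 1: the variable to impute (both hypothetical).
rows = [[1.0, 10.0], [2.2, None], [3.0, 30.0], [4.1, None],
        [5.9, None], [6.5, None], [7.0, 70.0]]
print(hybrid_fill(rows, target=1, max_gap=2))
```

Here the single missing value at index 1 is interpolated linearly (20.0), while the three-step gap is filled from the nearest complete rows on the auxiliary variable. Since the abstract reports that the best way of combining depends on the variable, a per-variable choice of rule (here, of `max_gap`) would be a natural thing to tune.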