Data
This section describes the data, and how it was collected, used to forecast air pollution in Belgium.
All data except the EEA air pollution data can be found in the data repository. The EEA data for Belgium can be found at this OSF repository. How the data files are used is described in the modelling and experimentation sections of this documentation.
EEA Air pollution
The data can be downloaded here 9-18. It is collected by the air quality measurement stations mandated by Directive 2008/50/EC of the European Parliament. This script scrapes the data for Belgium automatically. Note that the data in the OSF repository is the output of this script and can be fed directly to the VAR model or to the data processing script described below.
The air pollution data can be visualized with this script.
COVID-19
Belgian COVID-19 data is available from Sciensano. The variables are described in this code book, and other dataset information can be found here.
We provide the data in this repository.
Traffic volume
The data was provided by Bruxelles Mobilite upon request and is available here.
We use data from five tunnels in Brussels, as described in the thesis and visualized in this script. The tunnels of interest can be visualized here.
Data processing
The data processing scripts used in this work are provided in src/data_processing/R. They include the EEA data scraping script and plotting scripts that visualize the air pollutant, COVID-19, and traffic data.
Two scripts are especially important: station_selection_for_mvts.R and save_data_for_mvts.R. To reproduce our work with the transformer and regenerate the training/testing data we use, first run station_selection_for_mvts.R and then save_data_for_mvts.R. The first script selects the 41 air measuring stations that meet the data completeness criterion, and the second generates the training and testing data, as described in the thesis associated with this project. Both require the EEA data to be available first, either by running the scraping script or by downloading the OSF data (which is the output of the scraping script).
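The two-step idea (keep only stations with complete data, then build training and testing sets) can be illustrated with a small sketch. This is not the code in station_selection_for_mvts.R or save_data_for_mvts.R, which are written in R. The function names, the coverage threshold, the toy readings, and the chronological 80/20 split are all assumptions made for illustration; the actual criteria are those described in the thesis.

```python
from datetime import datetime, timedelta


def select_complete_stations(series_by_station, expected_timestamps, min_coverage=1.0):
    """Return the ids of stations that have a non-missing value for at least
    `min_coverage` (a fraction) of the expected timestamps.

    Hypothetical helper: the threshold and data layout are assumptions.
    """
    if not expected_timestamps:
        return []
    selected = []
    for station, series in series_by_station.items():
        present = sum(1 for t in expected_timestamps if series.get(t) is not None)
        if present / len(expected_timestamps) >= min_coverage:
            selected.append(station)
    return sorted(selected)


def chronological_split(timestamps, train_fraction=0.8):
    """Split timestamps into an earlier training part and a later testing part.

    A chronological split is assumed here; it is a common choice for time series.
    """
    ordered = sorted(timestamps)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy hourly readings for two made-up stations.
start = datetime(2020, 1, 1)
hours = [start + timedelta(hours=h) for h in range(10)]
readings = {
    "station_a": {t: 10.0 + i for i, t in enumerate(hours)},  # complete
    "station_b": {t: 5.0 for t in hours[:7]},                 # last 3 hours missing
}

stations = select_complete_stations(readings, hours)
train_hours, test_hours = chronological_split(hours)
print(stations, len(train_hours), len(test_hours))
```

With the toy data, only the station with a reading for every hour survives the selection, and the hours are divided into an earlier training block and a later testing block.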