Automatic METAR Data Collector


In this post I’ll show you how to make a program that automatically collects data from a website and then processes it further. More specifically, I’ll show you how I wrote a PHP script that periodically downloads half-hourly METAR reports from the website of the Slovak Hydrometeorological Institute, and how I then processed them to plot how temperature and pressure change over time. You can easily adapt this to any kind of data collection you might want to do, for example logging exchange rates over time. So, let’s start!

Introduction

If you’re reading this, there’s probably some data you would like to collect over time, whether it’s exchange rates, meteorological data such as temperature, or some other information that changes with time. Ideally, this information should be available on a website whose structure doesn’t change, which is the case, for example, with RSS feeds or with websites whose content is updated automatically.

In this case, we’ll use the website of the Slovak Hydrometeorological Institute (SHMU), which provides half-hourly meteorological reports in the so-called METAR format. Each report contains the day and time of the measurement and the measured data such as temperature, pressure, visibility, wind speed, etc. The institute has a dedicated webpage that is automatically updated every hour, which we’ll use to extract the data we want: http://www.shmu.sk/sk/?page=483. A screenshot of the site is shown below.

SHMU METAR screenshot

Notice that the website contains measurements for two consecutive times, half an hour apart. In this case, the measurements were taken at 16:30 and 17:00 UTC (see the image above), so we only need to download the reports once an hour to capture every half-hourly measurement.

Also notice that each set contains METAR reports measured at different places, denoted by the four-letter code (the ICAO location indicator) following the “METAR” directive. In this case, the codes “LZIB”, “LZKZ”, “LZPP”, etc., represent Bratislava, Kosice, Piestany, and so on. I only care about the measurements made in Bratislava, so we’ll only extract those; they’re shown in red rectangles in the screenshot above.

Next, notice that each METAR report starts with the directive “METAR” and ends with the equals sign “=”, with the various data stored in between. For example, one of the lines in this case is `METAR LZIB 151700Z 12003KT CAVOK 08/05 Q1031 NOSIG=`. This makes extracting the data easier, since we know that everything we want is stored between “METAR” and “=”.
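Because every report runs from “METAR” to “=”, a single regular expression can also pull out each complete report, even one that wraps onto a second line. The sketch below illustrates that idea. It is not part of the script that follows, `splitMetarReports` is a hypothetical name, and it assumes the input is plain text such as the contents of one `<pre>` block.

```php
<?php
// Hypothetical helper: return every "METAR ... =" report found in a block of text.
function splitMetarReports(string $text): array
{
    // [^=]* lets a report run across line breaks, because only "=" ends it.
    preg_match_all('/METAR\b[^=]*=/', $text, $matches);
    // Collapse line breaks and repeated spaces inside each report into single spaces.
    return array_map(
        fn (string $report): string => preg_replace('/\s+/', ' ', $report),
        $matches[0]
    );
}

// Two reports quoted in this post, the second one wrapped onto a new line.
$block = "METAR LZIB 151700Z 12003KT CAVOK 08/05 Q1031 NOSIG=\n"
       . "METAR LZIB 151600Z 14005KT 9999 FEW025\n      BKN042 09/04 Q1031 NOSIG=\n";

foreach (splitMetarReports($block) as $report) {
    echo $report, "\n";
}
```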

Finally, in order to extract the METAR report, we need to know where it sits in the page’s HTML structure (that is, which HTML tags surround it). To find out, I opened the SHMU website and used the Google Chrome Developer Tools (View » Developer » Developer Tools) to locate the line we want to extract. You can also just view the page source and search for the line, but I find the Developer Tools easier to use. The screenshot below shows the relevant HTML structure.

SHMU google developer tools

The screenshot shows that the first set of METAR measurements we want to extract (taken at 17:00 UTC in this case) is enclosed in the `<pre>` and `</pre>` HTML tags, and the same holds for the second set (taken at 16:30 UTC). This lets the PHP script we’ll write shortly identify the portions of the page we want, since (as long as the page layout stays the same) they’ll always be surrounded by `<pre>` tags.
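To see which blocks a parser will pick up, the sketch below collects the text of every `<pre>` element using PHP’s DOMDocument together with DOMXPath (an XPath query; the script below uses `getElementsByTagName()` instead, and both find the same elements). It runs on a tiny hand-written page standing in for the SHMU site, which is an assumption, since the real page contains far more markup; `preBlocks` is a hypothetical name.

```php
<?php
// Hypothetical helper: return the text content of every <pre> element, in document order.
function preBlocks(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely perfectly valid: collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ((new DOMXPath($dom))->query('//pre') as $pre) {
        $texts[] = $pre->textContent;
    }
    return $texts;
}

// A tiny stand-in for the SHMU page, built from two reports quoted in this post.
$html = '<html><body>'
      . '<p>Latest reports</p><pre>METAR LZIB 151600Z 14005KT 9999 FEW025 BKN042 09/04 Q1031 NOSIG=</pre>'
      . '<p>Previous reports</p><pre>METAR LZIB 151530Z 13008KT 9999 SCT034 BKN041 09/04 Q1031 NOSIG=</pre>'
      . '</body></html>';

print_r(preBlocks($html));
```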

Script for saving METAR data into a file

Now, to actually get this data, I wrote a PHP script that uses the CURL extension to download the SHMU page and then parses it using the DOM (Document Object Model) and a few PHP string functions (I used parts of the parsing example from http://htmlparsing.com/php.html as a starting point). The source code is below; save it as `metar-parser.php` (I’m assuming it’s in the directory `/path/to/your/metar/file/`).

```php
<!-- file /path/to/your/metar/file/metar-parser.php -->
<?php
// Use the CURL extension to query SHMU and get back a page of results
$url = "http://www.shmu.sk/sk/?page=483";
$ch = curl_init();
$timeout = 5; // connection timeout in seconds
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$html = curl_exec($ch); // false if the download failed
curl_close($ch);

// Create a DOM parser object
$dom = new DOMDocument();

// Parse the HTML (@ hides warnings about malformed markup); stop if nothing was downloaded
if (!$html || !@$dom->loadHTML($html)) exit("Could not download or parse the SHMU page\n");

// Load METAR data from the first two <pre> elements
$numberOfParsedContents = 0;
foreach ($dom->getElementsByTagName('pre') as $pre) {
    if ($numberOfParsedContents == 0) {
        $metar_data1 = $pre->textContent;
    }
    if ($numberOfParsedContents == 1) {
        $metar_data2 = $pre->textContent;
    }
    $numberOfParsedContents++;
}

// Extract METAR data for Bratislava (first report in each set); ?? covers a missing <pre>
$metar_data_bratislava1 = trim(explode("=", $metar_data1 ?? "")[0]);
$metar_data_bratislava2 = trim(explode("=", $metar_data2 ?? "")[0]);

// Replace newlines (and any other runs of whitespace) by single spaces
$metar_data_bratislava1 = preg_replace('/\s+/', ' ', $metar_data_bratislava1);
$metar_data_bratislava2 = preg_replace('/\s+/', ' ', $metar_data_bratislava2);

// Insert "METAR" if the report is empty
if ($metar_data_bratislava1 == "") $metar_data_bratislava1 = "METAR";
if ($metar_data_bratislava2 == "") $metar_data_bratislava2 = "METAR";

// Combine the download date/time (server's time zone) and METAR data
$lineToSave1 = date("Y-m-d H:i:s") . " " . $metar_data_bratislava1 . "=";
$lineToSave2 = date("Y-m-d H:i:s") . " " . $metar_data_bratislava2 . "=";

// Save the METAR measurements to a file (earlier report first)
$file = "/path/to/your/metar/file/metar-data.csv";
$current = is_file($file) ? file_get_contents($file) : "";
$current .= $lineToSave2 . "\n";
$current .= $lineToSave1 . "\n";
file_put_contents($file, $current);

?>
```

So, let’s now look at what each part of the code does (line numbers here count the `<?php` line as line 2). Lines 3-11 download the HTML source of the SHMU page into the variable `$html` using the CURL extension, so that we can process it later. CURL is an extension for transferring data over various protocols, and here we use it to transfer an HTML page from the SHMU server to my server. To download the page, we first initialize CURL (line 5), then specify the page’s URL (lines 4, 7), ask CURL to return the page as a string instead of printing it (line 8), set the connection timeout (lines 6, 9), and finally execute the request (line 10) and close the handle (line 11). Now the whole SHMU page is stored in the variable `$html`.

Next, we create a DOM document object (line 14) and parse the HTML of the downloaded page into it (line 17); the `@` operator suppresses the warnings that `loadHTML()` emits for HTML that isn’t perfectly valid. Once the HTML is parsed into `$dom`, we loop over all `<pre>` elements in the page (line 21), and for the first two (which correspond to the two sets of METAR measurements we want, see lines 22 and 25) we store the text contents of the `<pre>` tags in the variables `$metar_data1` and `$metar_data2` (lines 23 and 26). These two variables now contain the two sets of METAR measurements for all cities.

Now, since we only want the measurements for Bratislava, which is the first report in each set, we split each set into individual reports by “exploding” it on the equals sign “=” (lines 32, 33), since every METAR report ends with one, and keep only the first element of the resulting array. After this step, the variables `$metar_data_bratislava1` and `$metar_data_bratislava2` contain the two Bratislava reports taken half an hour apart (without their trailing “=”). In other words, they contain only the contents of the two red boxes in the first screenshot at the top of this post.
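Taking the first report relies on Bratislava staying at the top of each list. If you’d rather not depend on that ordering, you could select the report by its station code instead. The sketch below is a swapped-in regular-expression approach, not what the script above does; `reportForStation` is a hypothetical name, and it returns `null` if the station is missing. The other station’s data in the sample is elided as `...`.

```php
<?php
// Hypothetical helper: return the report for one ICAO station code, or null if it is absent.
function reportForStation(string $block, string $station): ?string
{
    // "METAR", whitespace, the station code, then everything up to the closing "=".
    $pattern = '/METAR\s+' . preg_quote($station, '/') . '\b[^=]*=/';
    if (!preg_match($pattern, $block, $match)) {
        return null;
    }
    // Join a report that wraps onto several lines into a single line.
    return preg_replace('/\s+/', ' ', $match[0]);
}

// Bratislava is deliberately not first here; the LZKZ report body is elided as "...".
$block = "METAR LZKZ 151700Z ...=\n"
       . "METAR LZIB 151700Z 12003KT CAVOK 08/05 Q1031 NOSIG=\n";

echo reportForStation($block, 'LZIB') ?? 'no report', "\n";
echo reportForStation($block, 'LZPP') ?? 'no report', "\n";
```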

Next, since a METAR report may be split across two lines on the SHMU website, we make sure that any newline characters are converted into spaces (lines 36, 37).

It can also happen that no METAR reports are shown on the SHMU website (this seems to happen mostly around midnight, probably because of maintenance), in which case these variables will be empty. To make such entries easy to recognize later on, we replace the empty string with the “METAR” directive (lines 40, 41), so that an empty report is saved as `METAR=`.
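The two clean-up steps above (joining a wrapped report into one line and substituting a bare “METAR” for a missing one) can be wrapped in a small function so they’re easy to test on their own. This is a sketch with a hypothetical name, `normalizeReport`; it collapses every run of whitespace rather than only newlines, and trims the ends.

```php
<?php
// Hypothetical helper mirroring the two clean-up steps: join a wrapped report into one line,
// and fall back to a bare "METAR" when the page showed no report.
function normalizeReport(string $raw): string
{
    // Turn every run of whitespace (including "\r\n" and "\n") into one space, then trim the ends.
    $report = trim(preg_replace('/\s+/', ' ', $raw));
    return $report === '' ? 'METAR' : $report;
}

var_dump(normalizeReport("METAR LZIB 151600Z 14005KT 9999 FEW025\r\nBKN042 09/04 Q1031 NOSIG"));
var_dump(normalizeReport("  \n"));
```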

Next, to make the measurements easier to look up later, we prefix each report with the date and time at which our script downloaded it, followed by the parsed Bratislava report and a closing equals sign (lines 44, 45). The variables `$lineToSave1` and `$lineToSave2` will now contain something like `2015-03-15 17:15:01 METAR LZIB 151600Z 14005KT 9999 FEW025 BKN042 09/04 Q1031 NOSIG=`, where `$lineToSave1` holds the measurement taken half an hour after the one in `$lineToSave2`.
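One thing to keep in mind: `date()` formats the time in the server’s configured time zone, while METAR times are always in UTC. If you’d prefer the download timestamp in UTC as well, `gmdate()` produces it directly. The sketch below uses a hypothetical `formatLine` helper that takes the Unix timestamp as a parameter; the fixed value `1426436101` is 17:15:01 Central European Time (UTC+1) on 15 March 2015, i.e. 16:15:01 UTC.

```php
<?php
// Hypothetical helper: prefix a report with its download time in UTC and close it with "=".
function formatLine(string $report, int $downloadedAt): string
{
    // gmdate() always formats in UTC, regardless of the server's time zone setting.
    return gmdate('Y-m-d H:i:s', $downloadedAt) . ' ' . $report . '=';
}

// In the collector you would pass time(); a fixed timestamp keeps the output predictable.
echo formatLine('METAR LZIB 151600Z 14005KT 9999 FEW025 BKN042 09/04 Q1031 NOSIG', 1426436101), "\n";
// prints: 2015-03-15 16:15:01 METAR LZIB 151600Z 14005KT 9999 FEW025 BKN042 09/04 Q1031 NOSIG=
```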

Finally, we save these two lines into a file on our server whose path is specified on line 48. So as not to overwrite the data saved previously, we first read the current contents of the file (line 49), append the two measurements to them, the earlier one (`$lineToSave2`) first (lines 50, 51), and write everything back to `metar-data.csv` (line 52). For example, the following two lines may be appended to the file:

```
2015-03-15 17:15:01 METAR LZIB 151530Z 13008KT 9999 SCT034 BKN041 09/04 Q1031 NOSIG=
2015-03-15 17:15:01 METAR LZIB 151600Z 14005KT 9999 FEW025 BKN042 09/04 Q1031 NOSIG=
```
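Reading the whole file back on every run works, but the file grows by two lines every hour, so each run re-reads more data. PHP’s `file_put_contents()` can append directly with the `FILE_APPEND` flag, and `LOCK_EX` takes an exclusive lock while writing in case two runs ever overlap. A sketch with a hypothetical `appendLines` helper, writing to a temporary file instead of `metar-data.csv`:

```php
<?php
// Hypothetical helper: append lines to a file without reading its current contents first.
function appendLines(string $file, array $lines): void
{
    $data = '';
    foreach ($lines as $line) {
        $data .= $line . "\n";
    }
    // FILE_APPEND writes at the end of the file (creating it if it doesn't exist);
    // LOCK_EX holds an exclusive lock while writing.
    if (file_put_contents($file, $data, FILE_APPEND | LOCK_EX) === false) {
        throw new RuntimeException("Could not write to $file");
    }
}

// A temporary file stands in for metar-data.csv; earlier report first, as in the script.
$file = tempnam(sys_get_temp_dir(), 'metar');
appendLines($file, [
    '2015-03-15 17:15:01 METAR LZIB 151530Z 13008KT 9999 SCT034 BKN041 09/04 Q1031 NOSIG=',
    '2015-03-15 17:15:01 METAR LZIB 151600Z 14005KT 9999 FEW025 BKN042 09/04 Q1031 NOSIG=',
]);
echo file_get_contents($file);
```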

CRON job

Now we need to make our script run automatically every hour so that we capture the METAR reports continuously. This can be done by setting up a so-called CRON job on our server, which is simply a way of telling the Linux OS to execute a command for us periodically. To do this, log in to your server and run the command `crontab -e` in the terminal, which opens your user’s CRON table (the list of your CRON jobs) in a text editor. It looks like the screenshot below.

CRON job for METAR parser

In order to add a CRON job, add the following line to the end of your CRON file, as shown in the screenshot above:

```
15 * * * * /usr/bin/php /path/to/your/metar/file/metar-parser.php >/dev/null
```

In this line, the first five fields (`15 * * * *`) specify the minute, hour, day of the month, month and day of the week at which to run the command, in that order (see the crontab(5) manual page, `man 5 crontab`, for details). Since a new pair of METAR reports appears on the SHMU website roughly every hour, we want our PHP script to run about 15 minutes after the hour (to make sure the data has already been uploaded), so we tell CRON to run it at minute 15 (`15`) of every hour (`*`), every day of the month (`*`), every month (`*`) and every day of the week (`*`). The rest of the line is the command to execute: since we want PHP to run our script `metar-parser.php`, the command launches the PHP interpreter (`/usr/bin/php`; run `which php` if yours is elsewhere) with the path to the script as its argument (`/path/to/your/metar/file/metar-parser.php`). Lastly, to suppress the script’s output, we redirect the command’s standard output to `/dev/null` (`>/dev/null`), a special file that discards everything written to it. This does not redirect standard error; append `2>&1` to the line if you want to discard error messages as well. And so now our script will run every hour!

Extract data from METAR reports

Now that our half-hourly METAR reports are being saved to a file, we’ll want to parse this file to get useful information such as temperature and pressure, which we’ll plot later on. For reference, here are some lines you might see in `metar-data.csv`:

```
2015-03-15 10:15:01 METAR LZIB 150830Z 12010KT 3400 -RA BR BKN008 BKN036 06/05 Q1031 BECMG SCT009 BKN020=
2015-03-15 10:15:01 METAR LZIB 150900Z 13012KT 4200 BR BKN008 06/05 Q1031 BECMG SCT009 BKN020=
2015-03-15 11:15:02 METAR LZIB 150930Z 13014KT 4800 BR SCT008 BKN012 07/05 Q1031 BECMG BKN015=
2015-03-15 11:15:02 METAR LZIB 151000Z 13014KT 5000 BR FEW008 SCT010 BKN014 07/05 Q1031 BECMG BKN015=
2015-03-15 12:15:01 METAR LZIB 151030Z 13012KT 6000 FEW008 BKN013 07/05 Q1031 BECMG BKN015=
2015-03-15 12:15:01 METAR LZIB 151100Z 13012KT 7000 FEW008 BKN015 08/06 Q1031 NOSIG=
```

To extract useful information from these reports, I created another PHP script, `metar-split.php`, which extracts the date and time of each measurement together with the temperature and pressure, and displays them in CSV format so that we can later import them into Excel for further processing. Here’s the script (save it in the same directory as `metar-data.csv`):

```php
<!-- file /path/to/your/metar/file/metar-split.php -->
<h1>METAR Splitter</h1>

<?php

echo "<b>Date,Time,Temperature,Pressure</b><br>";

$file = fopen(__DIR__ . "/metar-data.csv", "r");
if ($file) {
    while (($line = fgets($file)) !== false) {

        // if there's an empty METAR report (or a blank line), skip it
        if (strpos($line, "METAR=") !== false || trim($line) === "") continue;

        $output_line = "";

        // get date
        $date = explode(" ", $line)[0];
        $output_line .= $date . ",";

        // get time of measurement
        $time_of_measurement = explode(" ", $line)[4];
        // select only time in HH:MM
        $time_of_measurement = substr($time_of_measurement, 2, 4);
        // insert colon between HH and MM
        $time_of_measurement = substr_replace($time_of_measurement, ":", 2, 0);
        $output_line .= $time_of_measurement . ",";

        // get Temp/DewPoint (skip the report if the group is missing)
        if (!preg_match("/M?[0-9]{2}\/M?[0-9]{2}/", $line, $matches)) continue;
        $temperature = $matches[0];
        // get just Temp
        $temperature = explode("/", $temperature)[0];
        // replace M by minus sign
        $temperature = str_replace("M", "-", $temperature);
        $output_line .= $temperature . ",";

        // select pressure (Q followed by exactly four digits, in hPa)
        if (!preg_match("/Q[0-9]{4}/", $line, $matches)) continue;
        $pressure = $matches[0];
        $pressure = explode("Q", $pressure)[1];
        $output_line .= $pressure . "<br>";

        // display parsed data
        echo $output_line;
    }
    fclose($file);
} else {
    echo "Error reading METAR file.";
}

?>
```

The script first opens the file into which our CRON job saves the METAR reports (line 8; line numbers here count the `<h1>` line as line 2) and reads it line by line until the end of the file (line 10). Then, for each line (corresponding to one METAR report), it does the following.

First, it checks whether the current report is empty and, if so, skips it (line 13). Otherwise, it initializes an empty output string (line 15).

Then, the date of the measurement is extracted by exploding the line by spaces and taking the first element of the resulting array (line 18), and this date is appended to the output line (line 19). For example, if the line starts with `2015-03-15 10:15:01 METAR LZIB 150830Z …`, the date will be `2015-03-15`. (Note that the date is not taken from the `150830Z` group in the report itself, because that group contains only the day, not the year and month, unlike the date `2015-03-15` written by our PHP script.)

Next, the time of measurement is extracted by exploding the line by spaces and selecting the fifth element of the resulting array (line 22). For example, for a line starting with `2015-03-15 10:15:01 METAR LZIB 150830Z …`, the extracted group will be `150830Z`. (Note that, unlike before, here we take the time from the report itself rather than the time `10:15:01` added by our script, since the script’s time is when the report was downloaded, not when the measurement was made.) This `150830Z` group has the format `DDHHMMZ`, where `DD` is the day, `HH` the hour and `MM` the minute of the measurement, and `Z` indicates UTC. To extract just the hours and minutes, we take the 4-character substring starting at offset 2, i.e. the third character (line 24), which gives `HHMM` (here `0830`). Finally, we insert a colon between `HH` and `MM` (line 26), giving `08:30`, and append this time to the output line (line 27).
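The same `DDHHMMZ` group can also be decoded with one regular expression, which checks the format at the same time and keeps the day, in case you later want to combine it with the year and month of the download date. A sketch; `parseDayTimeGroup` is a hypothetical name:

```php
<?php
// Hypothetical helper: split a METAR day-time group such as "150830Z" into
// day of month, hour and minute (UTC). Returns null if the group is malformed.
function parseDayTimeGroup(string $group): ?array
{
    if (!preg_match('/^(\d{2})(\d{2})(\d{2})Z$/', $group, $m)) {
        return null;
    }
    return ['day' => (int) $m[1], 'hour' => (int) $m[2], 'minute' => (int) $m[3]];
}

$parts = parseDayTimeGroup('150830Z');
printf("day %d, %02d:%02d UTC\n", $parts['day'], $parts['hour'], $parts['minute']);
// prints: day 15, 08:30 UTC
```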

The third value we extract from the METAR report is the temperature. In METAR reports, temperature and dew point are given together in the format `MTT/MDD`, where `M` is an optional minus sign, `TT` is the temperature in °C and `DD` is the dew point in °C. We find this group with a regular expression (lines 30, 31). Since we’re only interested in the temperature, we then split the string `MTT/MDD` on the forward slash to get just `MTT` (line 33). Finally, to make later processing easier, we replace `M` with an actual minus sign (line 35) and append the temperature to the output line (line 36).
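If you want the dew point too, capturing groups let one regular expression return both values with the `M` prefix turned into a sign. The sketch below (hypothetical `parseTemperature` helper) also adds word boundaries (`\b`) so the pattern can’t match in the middle of a longer group, and, like the script, it assumes both values are present in the report.

```php
<?php
// Hypothetical helper: extract [temperature, dew point] in °C from a METAR report,
// turning the "M" prefix into a minus sign. Returns null if no such group is found.
function parseTemperature(string $report): ?array
{
    if (!preg_match('/\b(M?\d{2})\/(M?\d{2})\b/', $report, $m)) {
        return null;
    }
    // "08" -> 8, "M02" -> -2
    $toInt = fn (string $value): int => (int) str_replace('M', '-', $value);
    return [$toInt($m[1]), $toInt($m[2])];
}

print_r(parseTemperature('METAR LZIB 151700Z 12003KT CAVOK 08/05 Q1031 NOSIG='));
```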

The last thing we’ll extract from the report is pressure. Since the METAR reports give pressure in the form `QPPPP` where `PPPP` is the pressure in hPa, we extract this string by using regular expressions (lines 39,40), and remove the leading letter `Q` (line 41) to get just the pressure in hPa. This pressure is then appended to the output line (line 42).
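The pattern `Q.{4}` accepts any four characters after a `Q`, so in principle it could latch onto an earlier group containing that letter, such as the squall code `SQ` followed by a cloud group. Requiring a `Q` at the start of a group followed by exactly four digits avoids this. A sketch with a hypothetical `parsePressure` helper that returns the value as an integer:

```php
<?php
// Hypothetical helper: extract the pressure in hPa from a METAR report, e.g. "Q1031" -> 1031.
// \b before Q means "SQ" (squall) can't match; \d{4} requires exactly four digits.
function parsePressure(string $report): ?int
{
    return preg_match('/\bQ(\d{4})\b/', $report, $m) ? (int) $m[1] : null;
}

var_dump(parsePressure('METAR LZIB 151700Z 12003KT CAVOK 08/05 Q1031 NOSIG='));
```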

Finally, having extracted the date, time, temperature and pressure from the METAR report and separated them with commas (thus forming a CSV line), we display the whole output line (line 45). As the `while` loop repeats, this is done for every METAR report, and the output of `metar-split.php` might look like this:

```
Date,Time,Temperature,Pressure
2015-01-23,14:30,07,1014
2015-01-23,15:00,06,1014
2015-01-23,15:30,06,1014
2015-01-23,16:00,06,1014
2015-01-23,16:30,06,1014
2015-01-23,17:00,06,1015
2015-01-23,17:30,05,1015
2015-01-23,18:00,05,1015
2015-01-23,18:30,05,1015
```

Plot data in Excel

Now, let’s actually use the extracted data by plotting it in Excel, and see whether there’s any correlation between temperature and pressure over time. To do this, open `metar-split.php` in your web browser, copy its output (without the “METAR Splitter” heading), paste it into a file on your computer and save the file in CSV (comma-separated values) format, e.g. as `metar-parsed.csv`.
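Instead of copying the output from the browser, you could also have PHP write the CSV itself with `fputcsv()`, which takes care of the separators and of quoting fields when needed. The sketch below assumes the rows have already been parsed into arrays of strings (for example by code like `metar-split.php`), uses a hypothetical `rowsToCsv` helper, and writes to an in-memory stream so the result is easy to inspect; to create `metar-parsed.csv` directly, write to a handle from `fopen('metar-parsed.csv', 'w')` instead of `php://memory`.

```php
<?php
// Hypothetical helper: turn parsed rows into CSV text with a header line.
function rowsToCsv(array $rows): string
{
    $out = fopen('php://memory', 'w+');
    // Explicit separator, enclosure and (empty) escape character keep the output predictable.
    fputcsv($out, ['Date', 'Time', 'Temperature', 'Pressure'], ',', '"', '');
    foreach ($rows as $row) {
        fputcsv($out, $row, ',', '"', '');
    }
    rewind($out);
    $csv = stream_get_contents($out);
    fclose($out);
    return $csv;
}

echo rowsToCsv([
    ['2015-01-23', '14:30', '07', '1014'],
    ['2015-01-23', '15:00', '06', '1014'],
]);
```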

Then, open Excel and import this CSV file into your spreadsheet (File » Import » CSV File » Import).

When the data has been imported, select the data you want to plot and choose Marked Scatter from the Charts tab. This should plot the temperature and pressure against the measurement number, as shown below.

Temperature and Pressure vs Time

However, as you can see, not much can be deduced from this graph, because the scale is too large to show the changes in temperature and pressure. To reduce the scale, we can shift the pressure values down, since they are two to three orders of magnitude larger than the temperature values. To do this, we subtract the standard (mean sea-level) atmospheric pressure of about 1013 hPa from each pressure value, which brings the pressures down to magnitudes similar to the temperatures. So, as shown below, we create another column where we subtract 1013 hPa from the pressure in each row and plot that column instead.

Temperature and Pressure vs Time 2

Now the changes in both temperature and pressure are easy to see. The graph shows, against the measurement number on the x-axis, the temperature in °C and the pressure minus 1013 hPa (in hPa), both on the y-axis. In this case, we can see that the temperature (shown in blue) decreases over time while the pressure oscillates.
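The subtraction itself is simple enough to do before importing, too. The sketch below (hypothetical `pressureOffsets` helper) subtracts either a fixed reference, such as the 1013 hPa used here (the standard atmosphere is defined as 1013.25 hPa), or, if no reference is given, the mean of the series, which centers the pressure curve around zero.

```php
<?php
// Hypothetical helper: subtract a reference pressure (hPa) from every value.
// Without a reference, the mean of the series is used instead.
function pressureOffsets(array $pressures, ?float $reference = null): array
{
    if ($pressures === []) {
        return [];
    }
    $reference ??= array_sum($pressures) / count($pressures);
    return array_map(fn ($p) => $p - $reference, $pressures);
}

print_r(pressureOffsets([1014, 1015, 1013], 1013)); // offsets 1, 2, 0
print_r(pressureOffsets([1014, 1015, 1013]));       // mean is 1014, so offsets 0, 1, -1
```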

However, it’s much more interesting to see how temperature and pressure change over a longer period than just a few measurements, say one month. To do this, I let CRON on my Linux server run the PHP script for about a month, quietly gathering METAR reports every hour. After importing all this data into Excel and plotting it on a scatter plot, I got the following graph. The x-axis shows the measurement number and the y-axis shows the measured temperature (in °C) and the pressure minus 1 atm (in hPa, taking 1 atm ≈ 1013 hPa).

METAR graph of 1 month data

So, although I didn’t find the correlation between pressure and temperature that I had hoped for, the graph at least shows the mean temperature rising over the course of the month. This matches reality: the measurements were made from late January to early March, when temperatures in the northern hemisphere generally rise.
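Half-hourly readings swing with the day/night cycle, which can hide a slow trend like this one. Averaging them per day is one simple way to make it easier to see. The sketch below groups parsed rows by their date column with a hypothetical `dailyMeans` helper; it assumes rows shaped like the CSV output above (date, time, temperature, pressure), and note that the date column is the download date written by the collector, not the observation date.

```php
<?php
// Hypothetical helper: average temperature per date from rows of
// [date, time, temperature, pressure], as in the CSV output above.
function dailyMeans(array $rows): array
{
    $sums = [];
    $counts = [];
    foreach ($rows as [$date, , $temperature]) {
        $sums[$date] = ($sums[$date] ?? 0) + (float) $temperature;
        $counts[$date] = ($counts[$date] ?? 0) + 1;
    }
    $means = [];
    foreach ($sums as $date => $sum) {
        $means[$date] = $sum / $counts[$date];
    }
    return $means;
}

print_r(dailyMeans([
    ['2015-01-23', '14:30', '07', '1014'],
    ['2015-01-23', '15:00', '06', '1014'],
    ['2015-01-23', '15:30', '06', '1014'],
]));
```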

It might be even more interesting to make a similar plot for a whole year, to see how much the average temperature rises during the summer and falls during the winter. Another interesting experiment would be to collect data over several years, which might reveal long-term warming trends, although a few years of data from a single station is far too little to say anything definite about global warming.

So, I’ll leave my METAR downloading script running for some more time and see if I’ll discover any interesting patterns. Thanks for reading and I hope you enjoyed the post!