Bike Tour Business Intelligence: Part 1 [Reading GPX files with SSIS]

Last year I spent the summer riding my bike across Europe, clocking up 6100km through 7 different countries. Riding through the countryside with the wind blowing through your hair is a great feeling and I had a fantastic time. Now I’m back and, being a data professional, I wanted to analyse the data logged by my Garmin Edge 800 GPS.

The goal of this project is to read the approximately 120 files generated by the GPS into a SQL Server database and use BI tools to see if I can discover anything interesting in the data. This will be a series of posts, starting with extracting the data from a single GPX file and into SQL Server.

Analysing the GPX files

The Garmin Edge creates GPX files (which are XML) following the GPX Exchange Format although with a couple of Garmin-only extensions. When the timer is started the device records a series of data fields every few seconds:

  • Date and Time
  • Location (lat and long)
  • Elevation
  • Temperature*
  • Cadence*

(* From studying a GPX file these fields aren’t always captured)

The data comes from a <trkpt> tag in the GPX, itself a child of the <trkseg> tag.

Something apparent from the data is there’s no mention of speed or distance. We’re going to need to calculate that ourselves later.

Reading the XML in SSIS

The goal of this task is to get SSIS to read the XML from a single GPX file and output it to a SQL Server destination. This approach is straightforward to test and only a simple extension is required to iterate over a series of GPX files.

I couldn’t figure out how to output the trkpt elements using the built-in XML handling in SSIS (it kept sending 0 rows through the pipeline), so let’s use a Script Component and some C# instead.

Create a new ‘file’ connection manager pointing to the GPX file to be read. Give it a sensible name (I have chosen ‘gpx-file’) then create a new Data Flow and add a Script Component, being sure to choose the Data Source component type, and choose a name. Screen Shot 2015-02-13 at 12.25.49 Double click the Script Component to bring up the properties window, then on the ‘Connection Manager’ tab add a new connection manager named ‘conn’, linked to the connection manager created earlier. Add the output columns, one for each piece of data we need:

Column Name Type
lat numeric [DT_NUMERIC] – precision 9, scale 6
lon numeric [DT_NUMERIC] – precision 9, scale 6
ele four-byte signed integer [DT_I4]
cad four-byte signed integer [DT_I4]
atemp four-byte signed integer [DT_I4]
loggedtime database timestamp with precision [DT_DBTIMESTAMP2]

Bring up the script editor window. The key method is CreateNewOutputRows() which is responsible for adding rows to the output buffer. In our case this involves reading the XML document and running an XPath query to retrieve all <trkpt> elements. For each element the XML is parsed and a saved to a new row in the output buffer.

Saving to a SQL Server Destination

Next create a new connection to the SQL Server database where the processed GPX data will be stored, then in the data flow create a new SQL Server Destination. Link the destination to the Script Source then open the properties of the newly created destination. Create a table to store the data – edit the default table to something more memorable.

The data flow should look like this: workflow Now it’s time to test. Run your SSIS package. It’ll probably fail, saying that you need to run the package as an administrator. Save everything, reopen SSDT as an admin then try again.

All green ticks!

All green ticks!

And if everything’s worked you’ll see the data in the table in SQL Server as well. To download the source and sample GPX file head over to my github account or grab from here.

This entry was posted in SSIS and tagged , . Bookmark the permalink.

One Response to Bike Tour Business Intelligence: Part 1 [Reading GPX files with SSIS]

  1. Pingback: Bike Tour Business Intelligence: Part 2 [Geography] | Robin Watkins on SQL Server and more

Leave a Reply

Your email address will not be published. Required fields are marked *