Last year I spent the summer riding my bike across Europe, clocking up 6100km through 7 different countries. Riding through the countryside with the wind blowing through your hair is a great feeling and I had a fantastic time. Now I’m back and, being a data professional, I wanted to analyse the data logged by my Garmin Edge 800 GPS.
The goal of this project is to read the approximately 120 files generated by the GPS into a SQL Server database and use BI tools to see if I can discover anything interesting in the data. This will be a series of posts, starting with extracting the data from a single GPX file and into SQL Server.
Analysing the GPX files
The Garmin Edge creates GPX files (which are XML) following the GPX Exchange Format although with a couple of Garmin-only extensions. When the timer is started the device records a series of data fields every few seconds:
- Date and Time
- Location (lat and long)
(* From studying a GPX file these fields aren’t always captured)
The data comes from a <trkpt> tag in the GPX, itself a child of the <trkseg> tag.
<trkpt lat="46.272508939728141" lon="5.273465411737561">
Something apparent from the data is there’s no mention of speed or distance. We’re going to need to calculate that ourselves later.
Reading the XML in SSIS
The goal of this task is to get SSIS to read the XML from a single GPX file and output it to a SQL Server destination. This approach is straightforward to test and only a simple extension is required to iterate over a series of GPX files.
I couldn’t figure out how to output the trkpt elements using the built-in XML handling in SSIS (it kept sending 0 rows through the pipeline), so let’s use a Script Component and some C# instead.
Create a new ‘file’ connection manager pointing to the GPX file to be read. Give it a sensible name (I have chosen ‘gpx-file’) then create a new Data Flow and add a Script Component, being sure to choose the Data Source component type, and choose a name. Double click the Script Component to bring up the properties window, then on the ‘Connection Manager’ tab add a new connection manager named ‘conn’, linked to the connection manager created earlier. Add the output columns, one for each piece of data we need:
|lat||numeric [DT_NUMERIC] – precision 9, scale 6|
|lon||numeric [DT_NUMERIC] – precision 9, scale 6|
|ele||four-byte signed integer [DT_I4]|
|cad||four-byte signed integer [DT_I4]|
|atemp||four-byte signed integer [DT_I4]|
|loggedtime||database timestamp with precision [DT_DBTIMESTAMP2]|
Bring up the script editor window. The key method is CreateNewOutputRows() which is responsible for adding rows to the output buffer. In our case this involves reading the XML document and running an XPath query to retrieve all <trkpt> elements. For each element the XML is parsed and a saved to a new row in the output buffer.
public override void CreateNewOutputRows()
object txn = new object();
// conn will be the full filename of the GPX file to read
var conn = (string)Connections.conn.AcquireConnection(txn);
XmlDocument doc = new XmlDocument();
* Add a namespace manager as the Garmin GPX files refer to a
* number of different namespaces.
* Need to name the default (i.e. unqualified) namespace
* as otherwise the C# runtime will return no results
var nsmgr = new XmlNamespaceManager(doc.NameTable);
//Select all trackpoints and iterate over them
XmlNodeList nodes = doc.SelectNodes("//i:trkpt", nsmgr);
foreach (XmlNode node in nodes)
OutputBuffer.lat = Convert.ToDecimal(node.Attributes["lat"].Value);
OutputBuffer.lon = Convert.ToDecimal(node.Attributes["lon"].Value);
OutputBuffer.loggedtime = Convert.ToDateTime(node["time"].InnerText);
OutputBuffer.ele = Convert.ToInt32(Convert.ToDecimal(node["ele"].InnerText));
//if the temp/cadence element is available, get the value
var atemp = node.SelectSingleNode("i:extensions/gpxtpx:TrackPointExtension/gpxtpx:atemp", nsmgr);
if (atemp != null)
OutputBuffer.atemp = Convert.ToInt32(atemp.InnerText);
var cad = node.SelectSingleNode("i:extensions/gpxtpx:TrackPointExtension/gpxtpx:cad", nsmgr);
if (cad != null)
OutputBuffer.cad = Convert.ToInt32(cad.InnerText);
Saving to a SQL Server Destination
Next create a new connection to the SQL Server database where the processed GPX data will be stored, then in the data flow create a new SQL Server Destination. Link the destination to the Script Source then open the properties of the newly created destination. Create a table to store the data – edit the default table to something more memorable.
CREATE TABLE [GPXData] (
The data flow should look like this: Now it’s time to test. Run your SSIS package. It’ll probably fail, saying that you need to run the package as an administrator. Save everything, reopen SSDT as an admin then try again.