Bike Tour Business Intelligence: Part 2 [Geography]

My previous post covered using SSIS to read XML from a GPX file saving the data to a new table in a SQL Server Destination. This post will cover extending the SSIS workflow to create spatial data; the next on investigations into reverse-geocoding using SQL Server.

SQL Spatial

Spatial data has been part of SQL Server since 2008 with the introduction of two new data types: geometry and geography. These data types can be used to store points and shapes and SQL Server offers a number of built-in functions to reason with them, such as finding out the distance between points or if two shapes intersect. Much easier than implementing the code yourself!

Whether to use geometry or geography depends on what the data is being used for. There’s a good explanation of the difference on the SQL From the Trenches blog – essentially it’s flat earth vs round earth co-ordinate systems. As our data set relates to points on a map then the geography data type’s the right one.

Extending Data Flow to add geography data type

The XML data being imported already contains a latitude and longitude and we can use this to create an extra column in the SSIS data flow – a point of type geography.

This needs a new component to be added to the workflow. After the changes, the finished data flow should look like this:

Screen Shot 2015-02-14 at 12.19.13

(the eagle-eyed will notice that the destination type has changed since the previous post. This’ll be explained in due course)

Drag a new Script Component to the data flow and choose “Transformation” type.

Screen Shot 2015-02-14 at 11.35.09

Connect the new component to the “Read GPX XML” component then bring up the properties window. Under “Input Columns” select only the lat/lon columns then under the “Inputs and Outputs” page create a new output column named ‘location’. SSIS doesn’t natively support SQL Server’s geography data type so instead set the data type to ‘image [DT_IMAGE]‘.

Screen Shot 2015-02-14 at 12.18.40

Bring up the script editor and first add a reference to the CLR assembly which contains the definition for the geography type: Microsoft.SqlServer.Types. For some reason this isn’t in the GAC so you’ll need to browse to the DLL – on my SQL 2014 server it’s C:\Program Files\Microsoft SQL Server\120\Shared\Microsoft.SqlServer.Types.dll.

Next we need to write the code that adds the geography column to each row. Initialising the MemoryStream and BinaryWriter instances during the PreExecute() method is more efficient compared with unnecessarily creating a new instance for each input row and will help to make the workflow run as quickly as possible.

The Input0_ProcessInputRow() method is called once per row and adds a new instance of SqlGeography to the OutputBuffer.

#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using Microsoft.SqlServer.Types;
using System.IO;
using System.Data.SqlTypes;

public class ScriptMain : UserComponent

    private MemoryStream ms;
    private BinaryWriter bw;

    /// This method is called once, before rows begin to be processed in the data flow.
    public override void PreExecute()
        // Only allocate these once
        ms = new MemoryStream(10000); 
        bw = new BinaryWriter(ms);

  /// Called once for every row in the data set. Create SQlGeography instance and add to output column.
    public override void Input0_ProcessInputRow(Input0Buffer Row)
        // Create a SqlGeography object representing the given data
        SqlGeography g = SqlGeography.Point((double), (double)Row.lon, 4326);

        // Serialize to a memory stream

        // Copy data from memory stream to output column with DT_IMAGE format
        Row.location.AddBlobData(ms.GetBuffer(), (int)ms.Length);

Next delete the original SQL Server Destination and create an OLE DB Destination instead. The reason for this is using the existing SQL Server destination with a geography data type throws an error during execution:

An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 11.0"  Hresult: 0x80040E07  Description: "Operand type clash: image is incompatible with geometry".

Connect the “Add Geography Column” component to the new OLE DB Destination then bring up the properties window. Re-use the existing connection manager. As a new column’s been added, drop the destination table using SSMS then bring up the ‘New…’ window which shows the SQL code used to create a new version of the table. Give it a new name and change the data type of the location column from varbinary(max) to geography :

    [lat] numeric(9,6),
    [lon] numeric(9,6),
    [atemp] int,
    [cad] int,
    [ele] int,
    [loggedtime] datetime2(0),
    [location] geography

Map the columns in the ‘mappings’ page then save everything and run the workflow. If there are no errors then the extra column will appear when the table is queried in SQL Server.

Tip: Add an ‘Execute SQL task’ to the Control Flow, before the Data Flow Task, to truncate the GPXData table each time the workflow is run.

Viewing the data inside SSMS

Inside the results view in SSMS the location column looks like any other binary data, but when a resultset has a column containing spatial data SSMS adds a new ‘spatial results’ tab which renders all the points/shapes onto a map-like surface for visualisation.

The short track visualised inside SSMS

The short track visualised inside SSMS

In the next post we’ll be loading country geodata into SQL Server and writing queries to reverse-geocode our GPX files – in other words providing actual location data (such as country) for each of the points in our GPX file.

This entry was posted in Spatial, SQL Server, SSIS and tagged , , , . Bookmark the permalink.

One Response to Bike Tour Business Intelligence: Part 2 [Geography]

  1. Pingback: Bike tour Business Intelligence: Part 3 [Reverse Geocoding] | Robin Watkins on SQL Server and more

Leave a Reply

Your email address will not be published. Required fields are marked *