Bike Tour Business Intelligence: Part 6 [Building the Fact Table]

At this stage the BI database contains the Date, Time and Location Dimension tables as well as a staging table with all the points captured by my GPS. This post covers creating and populating the Fact table, as well as running some queries over the dataset.

To recap, the star schema design has three Dimension tables all related to the Fact table, as shown below:

Star schema GPX

Creating the Fact Table

The Fact table creation flow involves two steps – the creation and population of the Fact table itself, followed by the creation of a clustered index (on all key columns) and the foreign key relationships to the dimension tables.

part6_fact_design

The SQL SELECT .. INTO query makes it easy to create and populate the Fact table:

With some additional Execution SQL Tasks to prepare the database for the data load and to clear down staging tables, the final SSIS workflow is complete (click to view the full workflow):

part6_workflow_thumb

Click to see the workflow in detail

Running it against all my GPX files results in a Fact table of just over 350,000 rows.

Some sample queries

With all the data loaded it’s hard to resist writing some queries…. which I did, and then promptly found some outlying data points. Cycling in the Czech Republic was not a highlight and we took a shortcut through Prague by train. As I still had my GPS recording, it captured the journey leaving a couple of 50km+ segments which skew the data. I added a filter to the Fact table generation query so that only points under 10km apart are imported, ensuring only data generated by cycling is available for analysis.

What days did we cycle the farthest?

part6_longestdays

What’s the longest daily ride per country?

part6_longest_per_country

Did we ride further in the morning morning or the afternoon?

part6_AmPm

Which day of week did we have the highest moving average speed (assume moving is speed > 5kmh)?

Answer…. none in particular!

part6_longestday

Summary so far

The DW database now contains all the points recorded by my GPS – next up I will be creating an Analysis Services Cube and performing some analytics over the dataset.

This entry was posted in SQL Server, SSIS. Bookmark the permalink.

2 Responses to Bike Tour Business Intelligence: Part 6 [Building the Fact Table]

  1. Jorge says:

    Great post! thank you for sharing.
    Would it possible for you to share a copy (backup) of your database? it would be great if I could get a jump start. I started road biking about 2 years ago and I have collected gpx files since the beginning, and I thought it would be great to convert the data into information.
    Thank you!
    Jorge

    • Robin says:

      Hi Jorge,
      Sorry it took a while to get back to you. Here’s a link to the T-SQL scripts to create the database as well as the first 6 weeks of my imported GPX data. Hope you find it useful.

Leave a Reply

Your email address will not be published. Required fields are marked *