Bike Tour Business Intelligence: Part 6 [Building the Fact Table]

At this stage the BI database contains the Date, Time and Location Dimension tables as well as a staging table with all the points captured by my GPS. This post covers creating and populating the Fact table, as well as running some queries over the dataset.

To recap, the star schema design has three Dimension tables all related to the Fact table, as shown below:

Star schema GPX

Creating the Fact Table

The Fact table creation flow involves two steps – the creation and population of the Fact table itself, followed by the creation of a clustered index (on all key columns) and the foreign key relationships to the dimension tables.

part6_fact_design

The SQL SELECT .. INTO query makes it easy to create and populate the Fact table:

SELECT
	d1.DateSK AS StartDateSK,
    t1.TimeSK AS StartTimeSK,
    d2.DateSK AS EndDateSK,
    t2.TimeSK AS EndTimeSK,
    first_locationsk AS StartLocationSK,
    next_locationsk AS EndLocationSK,
    total_distance AS DistanceMetres, 
    duration AS DurationSecs, 
    speed_kmh AS SpeedKmh 
 INTO FactGPX 
 FROM GPX_Staging
INNER JOIN 
    DimDate d1 ON d1.Date = CONVERT(date,first_loggedtime)
INNER JOIN 
    DimDate d2 ON d2.Date = CONVERT(date,next_loggedtime)
INNER JOIN 
    DimTime t1 ON t1.Time = CONVERT(time, first_loggedtime)
INNER JOIN 
    DimTime t2 ON t2.Time = CONVERT(time, next_loggedtime)
ORDER BY 
    first_loggedtime ASC

With some additional Execution SQL Tasks to prepare the database for the data load and to clear down staging tables, the final SSIS workflow is complete (click to view the full workflow):

part6_workflow_thumb

Click to see the workflow in detail

Running it against all my GPX files results in a Fact table of just over 350,000 rows.

Some sample queries

With all the data loaded it’s hard to resist writing some queries…. which I did, and then promptly found some outlying data points. Cycling in the Czech Republic was not a highlight and we took a shortcut through Prague by train. As I still had my GPS recording, it captured the journey leaving a couple of 50km+ segments which skew the data. I added a filter to the Fact table generation query so that only points under 10km apart are imported, ensuring only data generated by cycling is available for analysis.

What days did we cycle the farthest?

SELECT 
	d.StandardDate AS [Date], 
	(SUM(f.DistanceMetres) / 1000) AS [TotalDistanceKm]
FROM FactGPX f
JOIN DimDate d ON d.DateSK=f.StartDateSK
GROUP BY d.StandardDate
ORDER BY (SUM(f.DistanceMetres) / 1000) DESC

part6_longestdays

What’s the longest daily ride per country?

;WITH CTE(Date, CountryName, TotalDistanceKm, Ranking) AS 
(
SELECT
	d.StandardDate AS [Date], 
	l.CountryName,
	(SUM(DistanceMetres) / 1000) AS [TotalDistanceKm],
	RANK() OVER (PARTITION BY CountryName 
		ORDER BY (SUM(DistanceMetres) / 1000) DESC) AS Ranking
FROM FactGPX f
JOIN DimDate d ON d.DateSK=f.StartDateSK
JOIN DimLocation l ON l.LocationSK=f.StartLocationSK
GROUP BY l.CountryName, StandardDate
)
SELECT Date, CountryName, TotalDistanceKm 
FROM CTE 
WHERE Ranking = 1 
ORDER BY TotalDistanceKm DESC

part6_longest_per_country

Did we ride further in the morning morning or the afternoon?

SELECT 
	t.AmPm AS [AmPm], 
	(SUM(f.DistanceMetres) / 1000) AS [TotalDistanceKm]
FROM FactGPX f
JOIN DimTime t ON t.TimeSK=f.StartTimeSK
GROUP BY t.AmPm
ORDER BY (SUM(f.DistanceMetres) / 1000) DESC

part6_AmPm

Which day of week did we have the highest moving average speed (assume moving is speed > 5kmh)?

SELECT 
	d.DayOfWeek,
	(SUM(f.DistanceMetres) / SUM(f.DurationSecs) * 3.8) AS [AverageSpeedKmh]
FROM FactGPX f
JOIN DimDate d ON d.DateSK=f.StartDateSK
WHERE f.SpeedKmh > 5
GROUP BY d.DayOfWeek
ORDER BY (SUM(f.DistanceMetres) / SUM(f.DurationSecs) * 3.8) DESC

Answer…. none in particular!

part6_longestday

Summary so far

The DW database now contains all the points recorded by my GPS – next up I will be creating an Analysis Services Cube and performing some analytics over the dataset.

This entry was posted in SQL Server, SSIS. Bookmark the permalink.

2 Responses to Bike Tour Business Intelligence: Part 6 [Building the Fact Table]

Leave a Reply

Your email address will not be published. Required fields are marked *