Data Source
Ookla Data, Statistics Canada Data
Organizer
Byron Chu, Barton Satchwill; Cybera


Background

The Government of Canada has committed to helping 95% of Canadian households and businesses access high-speed internet at minimum speeds of 50 Mbps download and 10 Mbps upload (hereinafter the “Commitment”) by 2026, and 100% by 2030. According to the CRTC, 45.6% of rural households currently have access to service at the Commitment speeds. That figure reflects what an Internet Service Provider (e.g. Shaw, Telus) makes available in a region, not the internet speeds a rural household actually experiences at home.

 

Research Question

For this case study, we would like to understand the state of internet connectivity in both rural and underserved Canadian communities using consumer-provided data. Data that comes directly from consumers lets us assess connectivity in these communities as measured in their own homes. Specifically, we are looking for:

  • A statistical analysis of the current realized and forecasted internet speeds (upload and download) for rural and underserved communities in terms of progress towards the Commitment;
  • A comparative analysis of rural and underserved communities in terms of progress towards the Commitment; and
  • The identification of statistically reliable methods to assess and compare rural and underserved communities' realized internet access.

For this study in particular, the identification of reliable and reproducible statistical methods to understand connectivity of rural and underserved Canadian communities is critical. Our consumer-centric data set will likely differ significantly from the CRTC's current assessment of broadband availability.

 

Data Description

Variables

In this dataset, internet performance metrics are aggregated by tile (an area of the earth measuring approximately 610.8 meters by 610.8 meters at the equator), quarter of the year, and type of internet connection (fixed broadband or mobile).

 

| Column | Type | Description | Example Values | Notes (see numbered list below) |
|---|---|---|---|---|
| quadkey | string | a unique number representing a tile | 0212131231203101 | |
| avg_d_kbps | integer | average download speed of all speed tests taken from a tile (in kbps) | 93283 | |
| avg_u_kbps | integer | average upload speed of all speed tests taken from a tile (in kbps) | 10108 | |
| avg_lat_ms | integer | average latency of all speed tests taken from a tile (in milliseconds) | 12 | |
| tests | integer | total number of speed tests conducted from a tile | 45 | |
| devices | integer | total number of unique devices from which speed tests were taken in a tile | 8 | |
| year | string | year | [2019, 2020, 2021] | 6 |
| quarter | string | a quarter of the year | [Q1, Q2, Q3, Q4] | 6 |
| conn_type | string | type of internet connection | [fixed, mobile] | 6 |
| PRUID | string | Province 2 digit id | 52 | 1 |
| PRNAME | string | Province name | Nova Scotia / Nouvelle-Écosse | 1 |
| CDUID | string | Census division id | 1208 | 1 |
| CDNAME | string | Census division name | Division No. 17 | 1 |
| DAUID | string | Dissemination area id | 48190283 | 1, 2 |
| SACTYPE | string | Statistical area classification | [1, 2, 3, 4, 5, 6, 7, 8] | 1, 3, 6 |
| DA_POP | float | Dissemination area population, integer or NaN (not a number) | 350.0 | 4 |
| PCUID | string | Population centre id | 0348 | 1, 5 |
| PCNAME | string | Population centre name | Halifax | 1, 5 |
| PCTYPE | string | Population centre type | [1, 2, 4, 6] | 1, 3, 5, 6 |
| PCCLASS | string | Population centre class | [4, 3, 2] | 1, 3, 5, 6 |
| geometry | polygon | Polygon object containing the geometry of the tile in WGS84 format and WKT representation | POLYGON ((-114.18 51.04 , ... , -114.18 51.04)) | |

 

  1. Geometry/geography joins were done using data from 2016 Census Boundary Files, in particular the Dissemination areas and Population centres shapefiles.
  2. Dissemination areas are the smallest area for which public population statistics are distributed by Statistics Canada.
  3. More information on the values that the SACTYPE, PCUID, PCTYPE, and PCCLASS columns can take is available in: Boundary Files, Reference Guide and Dictionary, Census of Population, 2016 (in particular tables 1.12 and 1.13).
  4. Data for the populations was taken from the 2016 census data table: Canada, Provinces and Territories, Census Divisions, Census Subdivisions and Dissemination Areas.
  5. Population centre labels (StatsCan definition) exist only where a population centre geometry is the largest overlapping area with the Ookla tile in the geometry overlay (done using GeoPandas).
  6. Instead of example values, the values indicated are all enumerated values for this column.
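
The quadkey column can be related back to a tile's position on the map. The following is a minimal sketch, assuming the quadkeys follow the standard Web Mercator quadkey scheme (one base-4 digit per zoom level, so a 16-digit quadkey is a zoom-16 tile). The function names `quadkey_to_tile` and `tile_bounds` are illustrative and are not part of the dataset or of any Ookla tooling.

```python
import math
from typing import Tuple


def quadkey_to_tile(quadkey: str) -> Tuple[int, int, int]:
    """Decode a quadkey into (tile_x, tile_y, zoom).

    Each digit contributes one bit to x (digit & 1) and one bit to y (digit >> 1).
    """
    zoom = len(quadkey)
    x = y = 0
    for i, digit in enumerate(quadkey):
        if digit not in "0123":
            raise ValueError(f"invalid quadkey digit: {digit!r}")
        mask = 1 << (zoom - i - 1)
        value = int(digit)
        if value & 1:
            x |= mask
        if value & 2:
            y |= mask
    return x, y, zoom


def tile_bounds(x: int, y: int, zoom: int) -> Tuple[float, float, float, float]:
    """Return (west, south, east, north) in degrees for a Web Mercator tile."""
    n = 2 ** zoom

    def lon(col: int) -> float:
        return col / n * 360.0 - 180.0

    def lat(row: int) -> float:
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    # Row numbers grow southwards, so row y is the northern edge of the tile.
    return lon(x), lat(y + 1), lon(x + 1), lat(y)


# The example quadkey from the table above; keep it as a string so the
# leading zero is not lost.
x, y, zoom = quadkey_to_tile("0212131231203101")
west, south, east, north = tile_bounds(x, y, zoom)
print(zoom, round(west, 2), round(north, 2))
```

Under that assumption, the example quadkey decodes to a tile whose north-west corner lies near the coordinates shown in the example geometry value.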

 

Data Access

Data for this case study challenge can be found HERE (CSV) or HERE (SHAPEFILE)
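
When reading the CSV, it helps to keep the identifier columns as strings, since values such as quadkey and PCUID can start with a zero. Below is a minimal sketch using only the Python standard library. The two-row sample is made up for illustration (it reuses example values from the variables table and only a subset of the columns), and how missing DA_POP values are written in the real file is an assumption: the sketch treats both an empty field and the text "NaN" as missing.

```python
import csv
import io
import math
from typing import Dict, List, TextIO

# Hypothetical sample in the layout of the variables table; not real data.
SAMPLE_CSV = """quadkey,avg_d_kbps,avg_u_kbps,avg_lat_ms,tests,devices,year,quarter,conn_type,DA_POP,PCUID
0212131231203101,93283,10108,12,45,8,2020,Q1,fixed,350.0,0348
0212131231203102,8200,900,48,3,2,2020,Q1,fixed,,
"""

INT_COLUMNS = ("avg_d_kbps", "avg_u_kbps", "avg_lat_ms", "tests", "devices")


def parse_row(row: Dict[str, str]) -> Dict[str, object]:
    """Convert numeric columns and leave identifier columns as strings."""
    parsed: Dict[str, object] = dict(row)
    for name in INT_COLUMNS:
        parsed[name] = int(row[name])
    # Assumption: a missing population is an empty field or the text "NaN".
    pop = row["DA_POP"].strip()
    parsed["DA_POP"] = float(pop) if pop else math.nan
    # The Commitment is stated in Mbps, while the dataset stores kbps.
    parsed["download_mbps"] = int(row["avg_d_kbps"]) / 1000
    parsed["upload_mbps"] = int(row["avg_u_kbps"]) / 1000
    return parsed


def load_tiles(handle: TextIO) -> List[Dict[str, object]]:
    """Read every row of an open CSV file into a list of parsed dictionaries."""
    return [parse_row(row) for row in csv.DictReader(handle)]


tiles = load_tiles(io.StringIO(SAMPLE_CSV))
print(tiles[0]["quadkey"], tiles[0]["download_mbps"], tiles[1]["DA_POP"])
```

For the real file, `load_tiles(open(path, newline=""))` would take the place of the in-memory sample, with `path` pointing at the downloaded CSV.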

The data provided for this case study combines datasets from Ookla and Statistics Canada. To create this subset and annotate it with Canada-specific boundaries and populations, the data was processed using GeoPandas: Ookla tiles were filtered to those that intersect with Canada and its coastal waters (digital boundary files), and overlays with dissemination areas and population centres were then calculated. Where a tile overlapped multiple Canadian geometries, it was labelled with the area with which it shares the largest fractional area.
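
The largest-overlap rule can be illustrated without any geospatial libraries. The sketch below is a simplification, not the processing code used for this dataset: it stands in axis-aligned boxes for the real polygons, takes "fractional area" to mean the share of the tile's own area, and uses hypothetical region ids. The actual overlay was computed with GeoPandas on the census boundary geometries.

```python
from typing import Dict, NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned box standing in for a real polygon (a simplification)."""
    west: float
    south: float
    east: float
    north: float


def box_area(box: Box) -> float:
    return max(0.0, box.east - box.west) * max(0.0, box.north - box.south)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes, or 0.0 if they do not overlap."""
    width = min(a.east, b.east) - max(a.west, b.west)
    height = min(a.north, b.north) - max(a.south, b.south)
    return width * height if width > 0 and height > 0 else 0.0


def label_tile(tile: Box, regions: Dict[str, Box]) -> Optional[str]:
    """Return the id of the region covering the largest fraction of the tile.

    Returns None when no region overlaps the tile at all.
    """
    best_id: Optional[str] = None
    best_fraction = 0.0
    for region_id, region in regions.items():
        fraction = overlap_area(tile, region) / box_area(tile)
        if fraction > best_fraction:
            best_id, best_fraction = region_id, fraction
    return best_id


# Hypothetical tile straddling two hypothetical dissemination areas:
# 30% of the tile falls in the first region and 70% in the second.
tile = Box(0.0, 0.0, 1.0, 1.0)
regions = {
    "48190283": Box(-1.0, -1.0, 0.3, 2.0),
    "48190284": Box(0.3, -1.0, 2.0, 2.0),
}
print(label_tile(tile, regions))
```

One consequence of this rule is that a tile straddling a boundary carries only one label, so speeds measured near the edge of a population centre may be attributed to the neighbouring area.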

For comparing rural and municipal internet speeds, it may be important to consider the following:

  • Whether the tile is labelled with a population centre or not;
  • SACTYPE - which provides information on the level of municipal influence as defined by Statistics Canada; and/or,
  • Whether a population centre is small, medium or large (PCCLASS), or its type classification (PCTYPE). For example, it may be interesting to contrast results from small population centres in rural areas against large population centres.
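
As a starting point for such a comparison, the sketch below groups tiles by whether they carry a population centre label and compares each group against the Commitment (50 Mbps download, 10 Mbps upload). It rests on assumptions that should be checked against the real data: tiles without a PCUID are treated as rural, an unlabelled tile has an empty PCUID, and the four sample tiles are invented for illustration. Because avg_d_kbps and avg_u_kbps are already per-tile averages, the group mean is weighted by the number of tests in each tile.

```python
from typing import Dict, List

COMMITMENT_DOWN_KBPS = 50_000  # 50 Mbps download
COMMITMENT_UP_KBPS = 10_000    # 10 Mbps upload

# Invented sample tiles; an empty PCUID is assumed to mean "no population centre".
sample_tiles = [
    {"avg_d_kbps": 93283, "avg_u_kbps": 10108, "tests": 45, "PCUID": "0348"},
    {"avg_d_kbps": 60000, "avg_u_kbps": 12000, "tests": 15, "PCUID": "0348"},
    {"avg_d_kbps": 8000, "avg_u_kbps": 1000, "tests": 6, "PCUID": ""},
    {"avg_d_kbps": 56000, "avg_u_kbps": 11000, "tests": 2, "PCUID": ""},
]


def weighted_mean(tiles: List[dict], column: str) -> float:
    """Mean of a per-tile average, weighted by the number of tests per tile."""
    total_tests = sum(t["tests"] for t in tiles)
    return sum(t[column] * t["tests"] for t in tiles) / total_tests


def meets_commitment(tile: dict) -> bool:
    """True when the tile's average speeds reach both Commitment thresholds."""
    return (tile["avg_d_kbps"] >= COMMITMENT_DOWN_KBPS
            and tile["avg_u_kbps"] >= COMMITMENT_UP_KBPS)


def summarize(tiles: List[dict]) -> Dict[str, Dict[str, float]]:
    """Summarize speeds for population-centre tiles and for all other tiles."""
    groups: Dict[str, List[dict]] = {"population_centre": [], "rural": []}
    for tile in tiles:
        groups["population_centre" if tile["PCUID"] else "rural"].append(tile)
    return {
        name: {
            "download_mbps": weighted_mean(members, "avg_d_kbps") / 1000,
            "upload_mbps": weighted_mean(members, "avg_u_kbps") / 1000,
            "share_of_tiles_meeting": sum(map(meets_commitment, members)) / len(members),
        }
        for name, members in groups.items()
        if members
    }


summary = summarize(sample_tiles)
for name, stats in summary.items():
    print(name, stats)
```

Tile averages hide the spread of individual tests, and tiles with very few tests are noisy, so a fuller analysis would likely also consider the tests and devices counts when judging reliability.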

Data Sources

Ookla Data

Around the globe, millions of internet speed tests are taken on Ookla platforms every day. Under the Ookla for Good initiative, massive datasets containing performance metrics for the internet speed tests taken on Ookla platforms in 2019 - 2021 are made publicly available under the CC BY-NC-SA 4.0 license. The raw data is published as the Speedtest by Ookla Global Fixed and Mobile Network Performance Maps dataset.

Statistics Canada Data

See the footnotes below the variables table above.

Sample Ookla Data Exploration 

To help you get started, Ookla has published a few tutorials in both the R and Python programming languages. Cybera has also put together a sample Jupyter notebook that explores data on connectivity in rural Alberta, Canada. The notebook is written in Python. It gives a brief introduction to requesting data via Ookla’s API, covers the libraries required to parse and process the data, and explores combining the Ookla dataset with shapefiles of the selected region to provide context.

 

References

Byron Chu, Cybera
Barton Satchwill, Cybera

 

If you run into difficulties or have any questions, feel free to contact us: datascience@cybera.ca.