Background
The Government of Canada has committed to helping 95% of Canadian households and businesses access high-speed internet at minimum speeds of 50 Mbps download and 10 Mbps upload (hereinafter referred to as the “Commitment”) by 2026, and 100% by 2030. According to the CRTC, 45.6% of rural households currently have access to service that meets the Commitment. That figure is based on what Internet Service Providers (e.g. Shaw, Telus) make available in a region, rather than on the internet speeds a rural household actually realizes at home.
Data Description
For this case study, we would like to understand the state of internet connectivity in both rural and underserved Canadian communities using consumer-provided data. Using data directly from the consumer, we will be able to understand connectivity in these communities as measured in their own homes. Specifically, we are looking for:
- A statistical analysis of the current realized and forecasted internet speeds (upload and download) for rural and underserved communities in terms of progress towards the Commitment;
- A comparative analysis of rural and underserved communities in terms of progress towards the Commitment; and
- The identification of statistically reliable methods to assess and compare rural and underserved communities' realized internet access.
For this study in particular, the identification of reliable and reproducible statistical methods to understand connectivity of rural and underserved Canadian communities is critical. Our consumer-centric data set will likely differ significantly from the CRTC's current assessment of broadband availability.
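One possible starting point for a reproducible comparison, offered only as a sketch, is a percentile bootstrap confidence interval for the difference in median download speed between two groups of tiles. The function name, the sample speeds, and the choice of the bootstrap itself are assumptions made for illustration; the case study does not prescribe a method.

```python
import random
import statistics

def bootstrap_median_diff(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for median(a) - median(b)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping the original sizes.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.median(resample_a) - statistics.median(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical download speeds in Mbps, invented for illustration only.
pc_down_mbps = [80, 95, 70, 110, 90, 85, 100, 75]
rural_down_mbps = [12, 25, 8, 30, 15, 20, 10, 18]

ci_low, ci_high = bootstrap_median_diff(pc_down_mbps, rural_down_mbps)
print(f"95% CI for the median difference: [{ci_low}, {ci_high}] Mbps")
```

This sketch resamples tiles as if they were independent. Neighbouring tiles are likely to be correlated, so an interval computed this way may be too narrow.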
In this dataset, internet performance metrics are aggregated by a tile (an area of the earth measuring approximately 610.8 meters by 610.8 meters), quarter of the year, and type of internet connection (fixed broadband or mobile).
[Table: dataset column descriptions (four columns); cell contents were not preserved in this copy.]
- Geometry/geography joins were done using data from 2016 Census Boundary Files, in particular the Dissemination areas and Population centres shapefiles.
- Dissemination areas are the smallest area for which public population statistics are distributed by Statistics Canada.
- More information on the values that the SACTYPE, PCUID, PCTYPE, and PCCLASS columns can take is available here: Boundary Files, Reference Guide and Dictionary, Census of Population, 2016 (in particular tables 1.12 and 1.13)
- Data for the populations was taken from the 2016 census data table: Canada, Provinces and Territories, Census Divisions, Census Subdivisions and Dissemination Areas.
- Population centre (StatsCan definition) labels exist only where the population centre geometry has the largest overlapping area with the Ookla tile in the geometry overlay (performed with GeoPandas).
- Instead of example values, the values indicated are all enumerated values for this column.
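Because each row is an aggregate for one tile, quarter and connection type, progress towards the Commitment can be summarized per quarter and connection type. The sketch below computes the share of tests that fall in tiles whose average speeds meet the 50/10 Mbps thresholds. The field names `avg_d_kbps`, `avg_u_kbps` and `tests` follow the public Ookla open data schema and are assumed, not confirmed, to match the CSV for this challenge; `quarter`, `conn_type` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Commitment thresholds from the Background section, converted to kbps.
MIN_DOWN_KBPS = 50_000
MIN_UP_KBPS = 10_000

def meets_commitment(row):
    """True if a tile's average speeds reach 50 Mbps down and 10 Mbps up."""
    return row["avg_d_kbps"] >= MIN_DOWN_KBPS and row["avg_u_kbps"] >= MIN_UP_KBPS

def weighted_share_meeting(rows):
    """Per (quarter, conn_type): share of tests located in tiles meeting the Commitment."""
    total = defaultdict(int)
    passing = defaultdict(int)
    for row in rows:
        key = (row["quarter"], row["conn_type"])
        total[key] += row["tests"]
        if meets_commitment(row):
            passing[key] += row["tests"]
    return {key: passing[key] / total[key] for key in total if total[key] > 0}

# Hypothetical sample rows, invented for illustration only.
sample_rows = [
    {"quarter": "2021-Q1", "conn_type": "fixed", "avg_d_kbps": 62_000, "avg_u_kbps": 12_000, "tests": 30},
    {"quarter": "2021-Q1", "conn_type": "fixed", "avg_d_kbps": 18_000, "avg_u_kbps": 3_000, "tests": 10},
    {"quarter": "2021-Q1", "conn_type": "mobile", "avg_d_kbps": 55_000, "avg_u_kbps": 8_000, "tests": 5},
]

shares = weighted_share_meeting(sample_rows)
for key, share in sorted(shares.items()):
    print(key, f"{share:.0%}")
```

Weighting by test count is a design choice, not a requirement. Tile averages also hide variation within a tile, so this share is not the same as the share of households meeting the Commitment.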
Data for this case study challenge can be found HERE (CSV) or HERE (SHAPEFILE).
The data provided for this case study combines datasets from Ookla and Statistics Canada. To create this subset and annotate it with Canada-specific boundaries and populations, the data was processed with GeoPandas: Ookla tiles were filtered to those that intersect Canada and its coastal waters (digital boundary files), and overlays with dissemination areas and population centres were then calculated. Where a tile overlapped multiple Canadian geometries, it was labelled with the area with which it shares the largest fractional area.
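The largest-fractional-area rule can be illustrated with a simplified sketch. It uses axis-aligned boxes in plain Python instead of the polygon overlays that were computed with GeoPandas, so it shows only the labelling logic, not the actual processing pipeline. The function names, area identifiers and coordinates are hypothetical.

```python
def overlap_area(a, b):
    """Intersection area of two axis-aligned boxes given as (minx, miny, maxx, maxy)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)

def label_tile(tile, areas):
    """Return the id of the area covering the largest fraction of the tile, or None."""
    tile_area = (tile[2] - tile[0]) * (tile[3] - tile[1])
    best_id, best_fraction = None, 0.0
    for area_id, box in areas.items():
        fraction = overlap_area(tile, box) / tile_area
        if fraction > best_fraction:
            best_id, best_fraction = area_id, fraction
    return best_id

# Hypothetical geometries: the tile straddles two neighbouring areas.
tile = (0.0, 0.0, 1.0, 1.0)
areas = {
    "DA_A": (-1.0, -1.0, 0.3, 2.0),  # covers 30% of the tile
    "DA_B": (0.3, -1.0, 3.0, 2.0),   # covers 70% of the tile
}

label = label_tile(tile, areas)
print(label)
```

A tile that overlaps no area is left unlabelled (`None`), which parallels how tiles outside any population centre carry no population centre label.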
For comparing rural and municipal internet speeds, it may be important to consider the following:
- Whether the tile is labelled with a population centre or not;
- SACTYPE - which provides information on the level of municipal influence as defined by Statistics Canada; and/or,
- Whether a population centre is small, medium or large (PCCLASS), or its type classification (PCTYPE). For example, it may be interesting to contrast results from small population centres in rural areas against large population centres.
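The considerations above could be turned into a grouping step along the following lines. This is a minimal sketch that assumes rows are read as strings (for example with Python's `csv` module), that an empty `PCUID` means the tile has no population centre label, and that `PCCLASS` codes 2, 3 and 4 denote small, medium and large population centres. That code mapping is an assumption and should be checked against table 1.13 of the reference guide. The `avg_d_kbps` field and the sample rows are also hypothetical.

```python
import statistics
from collections import defaultdict

# ASSUMED mapping of PCCLASS codes to size labels; verify against table 1.13.
PCCLASS_LABELS = {"2": "small", "3": "medium", "4": "large"}

def settlement_group(row):
    """Classify a tile as rural (no population centre label) or by population centre size."""
    if not row.get("PCUID"):
        return "rural"
    return PCCLASS_LABELS.get(str(row.get("PCCLASS")), "unknown class")

def median_download_by_group(rows):
    """Median of tile-level average download speed (kbps) for each settlement group."""
    speeds = defaultdict(list)
    for row in rows:
        speeds[settlement_group(row)].append(row["avg_d_kbps"])
    return {group: statistics.median(values) for group, values in speeds.items()}

# Hypothetical sample rows, invented for illustration only.
sample_rows = [
    {"PCUID": "", "PCCLASS": "", "avg_d_kbps": 9_000},
    {"PCUID": "", "PCCLASS": "", "avg_d_kbps": 15_000},
    {"PCUID": "0001", "PCCLASS": "2", "avg_d_kbps": 40_000},
    {"PCUID": "0002", "PCCLASS": "4", "avg_d_kbps": 120_000},
    {"PCUID": "0002", "PCCLASS": "4", "avg_d_kbps": 100_000},
]

medians = median_download_by_group(sample_rows)
print(medians)
```

Treating unlabelled tiles as rural follows the Statistics Canada convention that rural areas are the territory outside population centres. SACTYPE could be used as an alternative or additional grouping key in the same way.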
Data Sources
Ookla Data
Around the globe, millions of internet speed tests are taken on Ookla platforms every day. Under the Ookla for Good initiative, large datasets containing performance metrics for the speed tests taken on Ookla platforms from 2019 to 2021 are publicly available under the CC BY-NC-SA 4.0 license. Access the raw Speedtest by Ookla Global Fixed and Mobile Network Performance Maps dataset here.
Statistics Canada Data
See the footnotes below the table above.
Sample Ookla Data Exploration
To help you get started, here are a few tutorials published by Ookla in both the R and Python programming languages. There is also a sample Jupyter notebook, put together by Cybera, that explores data on connectivity in rural Alberta, Canada. The notebook is written in Python. It briefly introduces requesting data via Ookla’s API and the libraries required to parse and process the data, and it explores combining the Ookla dataset with shapefiles of the selected region to provide context.
Byron Chu, Cybera
Barton Satchwill, Cybera
If you run into any difficulties or have questions, please do not hesitate to contact us: datascience@cybera.ca.