CANSSI National Case Study Competition 2019
Predicting Delays in BC Ferries
Expand your statistical, collaboration, and problem-solving skills in the Canadian Statistical Sciences Institute National Case Study Competition (CANSSI NCSC). Students will apply their knowledge to a real-world problem using a dataset about BC Ferries. Organizations look for students with real-world problem-solving skills, and this competition gives you valuable experience that prepares you for a successful career in statistics.
The CANSSI NCSC is a competition for students enrolled in undergraduate or graduate programs at Canadian universities, in which they compete in a statistical prediction task. The data will be made available on September 3, and students will be able to submit their solutions online until October 3. Registration for the CANSSI NCSC also opens on September 3. Carleton University, Concordia University, MacEwan University, Simon Fraser University, and the University of New Brunswick will host regional competitions, and the best solutions from each host's participating students will be awarded cash prizes. Winners of the regional competitions will be invited to compete in a final national poster championship at the CANSSI headquarters at Simon Fraser University in Burnaby, BC, on November 2.
This competition is a unique opportunity to develop your problem-solving skills and build creative solutions to a real-world problem, skills that are highly valued by organizations of all kinds. Besides working collaboratively with your team, you’ll hone your presentation skills as you present your solutions to our judges. Winners also receive a guaranteed interview with Statistics Canada for a full-time or co-op position, as well as a cash prize.
- Winners will get guaranteed job interviews with Statistics Canada for full-time and co-op positions.
- Cash prizes will be awarded to the winning teams.
September 3, 2019—Dataset and registration available online
October 2, 2019—Online predictions due
September 30 to October 4, 2019—Regional competition (Carleton University, Concordia University, MacEwan University, University of New Brunswick, and Simon Fraser University)
November 2, 2019—Final competition at Simon Fraser University
Students enrolled in undergraduate or graduate programs at a Canadian university or college may participate in this competition. People who are not enrolled in an undergraduate or graduate program may still participate, but they will not be eligible for cash prizes or for judging in the regional competitions or the national poster championship.
This national case study competition is about predicting delays in BC Ferries sailings around the Vancouver and Victoria harbours. The dataset consists of 61,880 sailings between August 2016 and March 2018. It is split into a training dataset containing 80% of the sailings (49,504 sailings between August 2016 and November 2017) and a testing dataset containing the remaining 20% (12,376 sailings between November 2017 and March 2018). The task is to predict whether or not each sailing in the testing dataset was delayed. A variety of covariates are provided for each sailing (date, time of departure, departure terminal, arrival terminal, name of the vessel, and so on); these are described more fully in the Data section below. Weather data and traffic data are also provided.
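Because every training sailing precedes every testing sailing, a validation set carved out of the training data is likely to be more informative if it follows the same time order rather than a random shuffle. The sketch below shows one way to do that in Python, assuming each sailing is a record with a hypothetical `scheduled_departure` timestamp field (the real column names will come with the data release). It also checks that the stated counts match an 80/20 split.

```python
from datetime import datetime, timedelta

def chronological_split(sailings, train_fraction=0.8):
    """Split sailings in time order: the earliest go to training, the latest to testing."""
    # `scheduled_departure` is a hypothetical field name used for illustration.
    ordered = sorted(sailings, key=lambda s: s["scheduled_departure"])
    cutoff = round(len(ordered) * train_fraction)
    return ordered[:cutoff], ordered[cutoff:]

# The published counts are consistent with an 80/20 split of 61,880 sailings.
assert 61_880 * 8 // 10 == 49_504
assert 61_880 - 49_504 == 12_376

# Toy data: ten hourly sailings, listed in reverse order to show that sorting matters.
start = datetime(2016, 8, 1, 6, 0)
demo = [{"id": i, "scheduled_departure": start + timedelta(hours=i)}
        for i in reversed(range(10))]
train, test = chronological_split(demo)
print(len(train), len(test))  # 8 2
```

The same function can be applied to the 49,504 training sailings to hold out the most recent ones for validation.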
In the regional competitions and the national poster championship, students will be judged on the accuracy of their delay predictions (percent correct) and on a report discussing their methods, their results, and any additional insight their analysis provides about the data.
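The accuracy criterion is the share of testing sailings whose predicted delay indicator matches the actual one. A minimal Python version, assuming predictions and labels are coded as 1 (delayed) and 0 (not delayed), might look like this:

```python
def percent_correct(predicted, actual):
    """Return the percentage of sailings whose delay prediction matches the truth."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one sailing")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * matches / len(actual)

# Hypothetical example: 1 = delayed, 0 = not delayed.
predicted = [1, 0, 0, 1, 0]
actual = [1, 0, 1, 1, 0]
print(percent_correct(predicted, actual))  # 80.0
```

Because percent correct weighs both kinds of mistake equally, it can help to compare any model against the accuracy of always predicting the more common outcome.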
The ferry dataset contains records of 61,880 sailings between August 2016 and March 2018 on routes starting or ending at Horseshoe Bay, Swartz Bay, Tsawwassen, or Departure Bay. For each sailing, the following information is provided:
- Name of vessel
- Scheduled departure time
- Departure harbour
- Arrival harbour
- Date (including day of week and day of year)
For the 49,504 sailings in the training data, the actual sailing duration and an indicator of whether the sailing was delayed are provided. For the 12,376 sailings in the testing data, neither the actual duration nor the delay indicator is provided; instead, the delay indicator must be predicted.
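Since only the delay indicator has to be predicted, a simple reference point is to predict, for each route, whichever outcome was more common on that route in the training data. The sketch below assumes hypothetical field names `departure`, `arrival`, and `delayed` (coded 1 or 0); it is meant as a baseline for comparison, not a recommended model.

```python
from collections import Counter, defaultdict

def fit_route_baseline(training_rows):
    """Record the more common delay label for each (departure, arrival) route."""
    per_route = defaultdict(Counter)
    overall = Counter()
    for row in training_rows:
        route = (row["departure"], row["arrival"])
        per_route[route][row["delayed"]] += 1
        overall[row["delayed"]] += 1
    # Routes never seen in training fall back to the overall majority label.
    # On ties, Counter.most_common keeps the label that was encountered first.
    fallback = overall.most_common(1)[0][0]
    rule = {route: counts.most_common(1)[0][0] for route, counts in per_route.items()}
    return rule, fallback

def predict_route_baseline(model, rows):
    """Predict the stored majority label for each row's route."""
    rule, fallback = model
    return [rule.get((r["departure"], r["arrival"]), fallback) for r in rows]

# Toy training data; terminal names come from the competition description.
training_rows = [
    {"departure": "Tsawwassen", "arrival": "Swartz Bay", "delayed": 1},
    {"departure": "Tsawwassen", "arrival": "Swartz Bay", "delayed": 1},
    {"departure": "Tsawwassen", "arrival": "Swartz Bay", "delayed": 0},
    {"departure": "Horseshoe Bay", "arrival": "Departure Bay", "delayed": 0},
    {"departure": "Horseshoe Bay", "arrival": "Departure Bay", "delayed": 0},
]
model = fit_route_baseline(training_rows)

test_rows = [
    {"departure": "Tsawwassen", "arrival": "Swartz Bay"},
    {"departure": "Swartz Bay", "arrival": "Tsawwassen"},  # direction unseen in training
]
predictions = predict_route_baseline(model, test_rows)
print(predictions)  # [1, 0]
```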
A time series of temperature and humidity from Vancouver Harbour is also provided, along with a time series of temperature, humidity, pressure, wind speed, and wind direction from Victoria Harbour. There is also a time series of ordinal traffic volume data from the Lions Gate Bridge, in which traffic is ranked on a scale from 1 to 5. This bridge, which links downtown Vancouver and North Vancouver, is a major arterial route toward the Horseshoe Bay ferry terminal.
Further details about these data will be provided with the data release on September 3. These data are in the public domain and may be redistributed or modified.
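To use the weather and traffic time series described above, each sailing needs to be matched with the readings available around its scheduled departure. One simple option is to take the most recent reading at or before departure. The sketch below does this with Python's `bisect` module; the timestamps, hourly spacing, and traffic values are made up for illustration, since the actual format of these series will only be documented with the data release.

```python
from bisect import bisect_right
from datetime import datetime

def latest_reading_before(times, values, when):
    """Return the most recent reading at or before `when`, or None if there is none.

    `times` must be sorted in ascending order and aligned with `values`.
    """
    i = bisect_right(times, when)
    return values[i - 1] if i > 0 else None

# Hypothetical hourly Lions Gate Bridge traffic levels (ordinal, 1 to 5).
traffic_times = [datetime(2017, 5, 1, h) for h in range(6, 10)]
traffic_levels = [2, 4, 5, 3]

departure = datetime(2017, 5, 1, 8, 20)
print(latest_reading_before(traffic_times, traffic_levels, departure))  # 5
```

Looking only backward in time avoids using readings taken after a sailing left; whether a longer lag (for example, traffic an hour before departure) works better is a modelling choice left to each team.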
- You may work in teams of up to three people.
- You may use any libraries, software, programming languages, or methods in this contest.
- You may use any code you find on the internet, provided that
  - the code is available under an open license (e.g., anything from http://stackoverflow.com is fine);
  - you note the outside sources you’ve used as a comment in your code.
- You may use code written by other participants who aren’t on your team, provided that
  - you have their permission to use it;
  - you note them as a source in a comment in your code.
- You may ask professors, supervisors, or other people outside the contest for help and advice, but all of your work must be done by your team.
- Instructors are free to use this competition as a class assignment.