Can Google Flu Trends Predict the Frequency and Results of Tests for Influenza and Other Respiratory Illnesses?


Data Source: 

Google Flu Trends (GFT)


Dena Schanzer, Public Health Agency of Canada


For this case study, you will explore trends in tests for influenza and other respiratory illnesses in Canada. You will also assess the usefulness of Google search data for predicting these trends.

Data Sources

Study data are from the Public Health Agency of Canada’s FluWatch and from Google Flu Trends (GFT).

FluWatch is Canada's national surveillance system that monitors the spread of influenza (i.e., the “flu”) and flu-like illnesses on an ongoing basis. The Respiratory Virus Detection Surveillance System (RVDSS) collects data from selected laboratories across Canada on the number of tests performed and the number of positive tests for influenza and other respiratory viruses, and weekly reports are prepared. Currently, laboratory results are reported for influenza, respiratory syncytial virus (RSV), parainfluenza virus, adenovirus, human metapneumovirus (hMPV), rhinovirus and coronavirus, along with some results by virus type or sub-type.

Many of these viruses have been detected year-round, although there is some variation in the seasonal pattern. Rhinovirus, parainfluenza virus and adenovirus account for most of the viral detections over the summer, whereas influenza and RSV account for most of the annual detections. Influenza is also the most common respiratory virus detected among hospitalized adult patients. RSV is associated primarily with bronchiolitis and is the most common respiratory virus detected among pediatric inpatients.

As is typical of epidemics, the timing of peak influenza activity varies from year to year, though the peak usually occurs in the winter months. An epidemic passes through four phases: a pre-epidemic period, when the number of infections is still small and the circulating strain may die out without producing an epidemic; a period of exponential growth; a period of peak activity; and finally a period of epidemic decline.

Data for the 2009 H1N1 influenza pandemic are captured in the FluApos numbers. The pandemic (i.e., H1N1) 2009 virus, originally referred to as 'swine flu', originated in Mexico, and the outbreak was first detected in Mexico City on March 18, 2009. The first cases of the 2009 pandemic strain were confirmed in Canada in the last week of April 2009, and testing of patients with acute respiratory infections increased significantly during the spring wave. A plot of the virological data for Canada shows distinct spring and fall waves, with peaks corresponding to the weeks of June 7 and November 1, 2009. You may want to exclude the spring wave when trying to predict the timing of the seasonal peak. With the exception of the pandemic period, the influenza season runs from September of one year to August of the following year so that an epidemic is not divided across calendar years. Additional information about FluWatch can be found at the Government of Canada website.
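The September-to-August season convention can be illustrated with a short Python sketch. This is only one possible way to label weeks by season; the function name flu_season and the 'YYYY-YYYY' label format are assumptions made for illustration and are not part of the FluWatch data.

```python
from datetime import date


def flu_season(d: date, start_month: int = 9) -> str:
    """Return a season label such as '2009-2010' for the date d.

    Assumes a season runs from September (start_month) of one year
    to August of the following year, as described in the text.
    """
    # Dates from September onward belong to the season starting that year;
    # dates from January to August belong to the season that started the
    # previous September.
    start_year = d.year if d.month >= start_month else d.year - 1
    return f"{start_year}-{start_year + 1}"


# The two 2009 pandemic peaks fall in different seasons under this rule.
print(flu_season(date(2009, 6, 7)))   # spring-wave peak week
print(flu_season(date(2009, 11, 1)))  # fall-wave peak week
```

Grouping the weekly records by such a label keeps each winter epidemic within a single group, which makes it straightforward to locate one peak per season.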

Google provides a variety of data through its public data portal. GFT was launched in 2008 to explore the potential use of Google keyword searches for monitoring the spread of infectious diseases, including influenza. Although Google no longer publishes current GFT estimates on its public data portal, it does make these data available to research groups.

The FluWatch data cover the period from September 7, 2003 to August 23, 2015 and the GFT data cover the period from September 28, 2003 to August 9, 2015. The data include:

  • FluWatch weekly frequencies of virological lab tests and positive test results for Canada and selected provinces (Ontario, Quebec, British Columbia, Alberta). Note that only a small proportion of suspected cases are tested each week. This proportion will vary with provincial health policy, and is not necessarily reflective of the total provincial population.
  • GFT weekly data for all of Canada and each of the provinces (except Prince Edward Island).

Research Questions:

  • Are the seasonal trends in the GFT data more strongly associated with the number of influenza infections (i.e., number of positive tests) or the total number of tests? Does GFT lead or lag the number of positive influenza tests?
  • Are the seasonal trends in the GFT data associated with the number of tests and positive test results for other respiratory viruses?
  • How many weeks ahead can you predict the peak in the number of positive influenza tests? Do the data from GFT help to predict the peak?
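One way to approach the lead/lag question is a lagged cross-correlation between the two weekly series. The Python sketch below is a minimal illustration under stated assumptions: both series are already aligned on the same weeks and have equal length, the function names (pearson, lagged_correlations, best_lag) are hypothetical, and the numbers in the example are synthetic, not study data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlations(gft, flu_pos, max_lag=4):
    """Correlate gft[t] with flu_pos[t + lag] for lag in -max_lag..max_lag.

    A positive lag pairs each GFT value with positive tests that occur
    `lag` weeks later, so a high correlation at a positive lag suggests
    that GFT leads the positive tests.
    """
    n = len(gft)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = gft[:n - lag], flu_pos[lag:]
        else:
            x, y = gft[-lag:], flu_pos[:n + lag]
        result[lag] = pearson(x, y)
    return result


def best_lag(gft, flu_pos, max_lag=4):
    """Return the lag (in weeks) with the highest correlation."""
    corrs = lagged_correlations(gft, flu_pos, max_lag)
    return max(corrs, key=corrs.get)


# Synthetic example: the GFT series is the positive-test series
# shifted two weeks earlier.
flu_pos = [0, 0, 0, 0, 1, 3, 7, 12, 7, 3, 1, 0, 0, 0, 0, 0]
gft = flu_pos[2:] + [0, 0]
print(best_lag(gft, flu_pos))
```

The same function can be applied with FluTest in place of FluPos to compare how strongly GFT tracks testing volume versus positive results. Because both series share a strong seasonal cycle, high correlations are expected at many lags, so the relative ranking of lags is more informative than the absolute values.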


FluWatch Data Dictionary


Variable Name   Description
Date            Date of testing, based on the first day of the week
FluApos         Frequency of Influenza A positive tests
FluBpos         Frequency of Influenza B positive tests
FluPos          Frequency of Influenza A + B positive tests
FluTest         Frequency of Influenza A and B tests
RSVtest         Frequency of Respiratory Syncytial Virus (RSV) tests
RSVpos          Frequency of Respiratory Syncytial Virus (RSV) positive tests
adenot          Frequency of tests for adenovirus
adeno           Frequency of positive tests for adenovirus
parat           Frequency of tests for parainfluenza virus
para            Frequency of positive tests for parainfluenza virus
Rhinot          Frequency of tests for rhinovirus
Rhino           Frequency of positive tests for rhinovirus
hMPVt           Frequency of tests for Human Metapneumovirus (hMPV)
hMPV            Frequency of positive tests for Human Metapneumovirus (hMPV)
Coronat         Frequency of tests for coronavirus
Corona          Frequency of positive tests for coronavirus



Data Files: