Data Source:

GSK has provided this Case Study in cooperation with Edmee Franssen, BSc, MSc., Statistician Consultant


Dr. Peggy Ng, York University, or Nevin Chan, MA


Asthma is a chronic condition, and one measure of how well a subject’s asthma is controlled is the daily scoring of various asthma symptoms over a given time period. The symptoms commonly recorded are cough, wheeze and shortness of breath. Each symptom is usually rated on a scale from 0 to 3: no symptoms, mild, moderate and severe. Daytime symptoms can differ from night-time symptoms. Use of rescue medication is also a sign of poor control. These symptom scores may be summarised in several different ways and analysed in several different ways. This project uses data from two large, multi-centre, 12-week asthma trials to assess whether different methods of summary and analysis are more or less sensitive in detecting treatment differences, and whether any one method provides summary statistics that are easier to understand and interpret.

The project

The overall aim of the project is to explore the use of different models for repeated measurements of ordered categorical, multi-dimensional symptom data. Specific approaches that could be explored include, but are not limited to:

  1. appropriate summary measures, for example: individual or combined scores; raw scores; percentage of symptom-free days; percentage of days with only mild symptoms.
  2. logistic regression, with each symptom/combined symptoms as a binary variable.
  3. extension of logistic regression to a generalised estimating equation (GEE) to allow for the serial correlation between the repeated measures of symptoms.
  4. proportional odds or continuation ratio models for an ordered categorical response.
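
As a point of reference for item 4, the two ordinal models can be written as follows. This is one common parameterisation, not necessarily the one used in any particular software. The notation is an assumption introduced here for illustration: Y_it is a symptom score (0 to 3) for subject i on day t, and x_i is 1 for FP and 0 for Placebo.

```latex
\begin{align}
% Proportional odds (cumulative logit) model, j = 0, 1, 2
\operatorname{logit} P(Y_{it} \le j \mid x_i) &= \alpha_j + \beta x_i \\
% Continuation ratio model, j = 0, 1, 2
\operatorname{logit} P(Y_{it} = j \mid Y_{it} \ge j,\ x_i) &= \gamma_j + \delta x_i
\end{align}
```

Under the first model, exp(β) is a single odds ratio for having a score of j or lower on FP relative to Placebo, assumed to be the same for every cut-point j. Neither formula on its own accounts for the correlation between repeated days within a subject; that would need an extension such as the GEE in item 3.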

Note: it is not expected that a project would cover all the areas listed above. A detailed exploration of one or two of the areas could make a useful project. For any of the models, a discussion of the following practical issues would be beneficial: (a) does the model produce a meaningful estimate of treatment effect; (b) can the model be fitted successfully using standard software (e.g. SAS); (c) are the assumptions of the model met by the data. The following covariates are also of interest: country of recruitment, age and gender.



Research Question: 

To explore the use of different models for categorical symptoms data recorded in a daily diary card by subjects with a chronic condition.

This project is suitable for students interested in exploring and researching methods not necessarily covered in detail on an MSc course, but with important practical applications.



The dataset is available as a SAS binary file containing diary card symptom score data from 2 studies (labelled A and B). Below is a brief description of the variables:

ITT = Is subject in Intent-to-treat population?
ANA = Is subject in Analysis population? This is the more important of the two flags, as it defines the population used for the primary symptom score analyses. The analysis population is the same as ITT except that it excludes all subjects from one centre, which collected the data in an unsatisfactory way.
TRTMNT = Treatment, either FP or Placebo.

AMWHEEZ = Daytime Wheeze Symptom Score
AMCOUGH = Daytime Cough Symptom Score
AMBREATH = Daytime Shortness of Breath Symptom Score
PMWHEEZ = Night-time Wheeze Symptom Score
PMCOUGH = Night-time Cough Symptom Score
PMBREATH = Night-time Shortness of Breath Symptom Score

For the above the scores refer to:
0 = no symptoms
1 = mild symptoms, not troublesome
2 = moderate symptoms
3 = severe, troublesome symptoms
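
Because these scores are ordered categories, one informal way to look at the proportional odds assumption is to compare empirical cumulative logits between the two treatment groups. The sketch below is a minimal illustration only: the counts are hypothetical, not taken from either study, and the function name is made up for this example. It also pools observations and so ignores the repeated-measures correlation.

```python
import math

def cumulative_logits(counts):
    """Empirical logit of P(score <= j) for j = 0, 1, 2.

    counts[j] is the number of observations with score j (j = 0..3).
    Assumes every cumulative proportion is strictly between 0 and 1.
    """
    n = sum(counts)
    logits = []
    running = 0
    for c in counts[:-1]:          # the last cut-point (score <= 3) is always 1
        running += c
        p = running / n
        logits.append(math.log(p / (1 - p)))
    return logits

# Hypothetical counts of scores 0, 1, 2, 3 in each group (illustration only).
placebo_counts = [40, 30, 20, 10]
fp_counts = [60, 25, 10, 5]

# Difference in cumulative logits (FP minus Placebo) at each cut-point.
# Under proportional odds these differences should be roughly equal.
diffs = [
    f - p
    for f, p in zip(cumulative_logits(fp_counts), cumulative_logits(placebo_counts))
]
for j, d in enumerate(diffs):
    print(f"cut-point score <= {j}: log odds ratio = {d:.3f}")
```

Large differences between the three values would suggest that a single treatment odds ratio does not describe all cut-points well, although a formal assessment would need a fitted model.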

AMUSE = Daytime Rescue Medication Use
PMUSE = Night-time Rescue Medication Use

The above 2 variables give the number of occasions on which rescue Ventolin was used to relieve asthma symptoms.

DASMTDT = Date of diary card observation.
DAY = Day number relative to when the subject first received treatment (Day 0), i.e. Days –7 to –1 refer to the last week of baseline.
FREQBASE = Description of the subject’s severity of asthma prior to the trial, either Chronic Persistent or Episodic (less severe).
WDW = Did subject withdraw during the study?
WDWDT = Date of withdrawal (if applicable)
WDWREAS = Reason for Withdrawal (if applicable)
AGE = Age in years.
CENTRE = Investigational site.
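
As an illustration of one of the summary measures mentioned earlier, the percentage of symptom-free days per subject could be derived from these variables roughly as follows. This is a sketch under stated assumptions: the subject identifier SUBJID is hypothetical (no identifier is named in the variable list above), the rows are invented, a day counts as symptom-free only if all six symptom scores are 0, and days with any missing score are dropped.

```python
from collections import defaultdict

SYMPTOM_VARS = ["AMWHEEZ", "AMCOUGH", "AMBREATH", "PMWHEEZ", "PMCOUGH", "PMBREATH"]

def pct_symptom_free_days(records, first_day=0):
    """Return {subject: % of treatment-period days with all six scores equal to 0}.

    records: iterable of dicts with keys SUBJID (hypothetical), DAY and the
    six symptom variables. Days before first_day are treated as baseline.
    """
    free = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        if rec["DAY"] < first_day:
            continue                      # baseline day, not part of the summary
        scores = [rec[v] for v in SYMPTOM_VARS]
        if any(s is None for s in scores):
            continue                      # assumption: drop days with missing scores
        total[rec["SUBJID"]] += 1
        if all(s == 0 for s in scores):
            free[rec["SUBJID"]] += 1
    return {subj: 100.0 * free[subj] / total[subj] for subj in total}

def make_row(subj, day, **scores):
    """Build one invented diary row; unspecified symptom scores default to 0."""
    row = {"SUBJID": subj, "DAY": day}
    row.update({v: 0 for v in SYMPTOM_VARS})
    row.update(scores)
    return row

# Invented rows for two subjects (illustration only).
records = [
    make_row("S1", -1),               # baseline, excluded
    make_row("S1", 0),
    make_row("S1", 1, AMCOUGH=1),     # mild daytime cough: not symptom-free
    make_row("S1", 2),
    make_row("S1", 3),
    make_row("S2", 0, PMWHEEZ=2),
    make_row("S2", 1),
]

result = pct_symptom_free_days(records)
print(result)
```

Other choices are equally defensible, for example treating daytime and night-time symptoms separately or also requiring no rescue medication use (AMUSE and PMUSE equal to 0), and the choice may affect sensitivity to treatment differences.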


  • Ashby M et al. An annotated bibliography of methods for analysing correlated categorical data. Statistics in Medicine 11, 67-99 (1992).
  • Lindsey JK et al. Simple models for repeated ordinal responses with an application to a seasonal rhinitis clinical trial. Statistics in Medicine 16, 2873-2882 (1997).