Case Studies for the 2003 Annual Meeting

Blood Pressure

Last modified 2008-06-18
Please check this page regularly for updates, corrections, and answers to frequently asked questions!

Our appreciation goes out to Dr. Raymond Lam, GlaxoSmithKline, Toronto, Ontario, Canada for providing this case study.


Genes contribute to the development and progression of disease, and they also influence how individuals respond to medicines. At GlaxoSmithKline (GSK), we are conducting genetic and genomic research that will allow the medical community to prescribe the right medicine for the right patient.

Genetics research studies often collect hundreds to thousands of genetic markers, together with many clinical measurements. Statistical tools are useful for separating ‘true’ genes from ‘false’ alarms.

Data Description

The data file (an ASCII, comma-delimited file) contains 500 observations (subjects) and 501 variables. Of the 500 subjects, 250 had low blood pressure and 250 had high blood pressure (i.e., hypertension). The 501 variables consist of one response variable (systolic blood pressure) and 500 predictors (17 clinical covariates and 483 genetic markers). These variables are described below.
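As a rough sketch of how the dimensions described above could be checked after loading, the following reads comma-delimited text with Python's standard `csv` module. The presence of a header row, the column name `SBP`, the marker name `M1`, and the two sample rows are all assumptions made for illustration; the actual file's column names are not given here.

```python
import csv
import io


def read_case_study(handle, response="SBP"):
    """Read comma-delimited rows; return (rows, predictor column names).

    Assumes the first line is a header row and that the response column
    is named `response` (both hypothetical).
    """
    reader = csv.DictReader(handle)
    rows = list(reader)
    predictors = [name for name in reader.fieldnames if name != response]
    return rows, predictors


# Tiny made-up sample standing in for the real data file.
sample = io.StringIO("SBP,Gender,Age,M1\n142,M,57,0_1\n118,F,43,0_0\n")
rows, predictors = read_case_study(sample)
print(len(rows), len(predictors))  # prints "2 3"
```

Applied to the real file, the same check would be expected to report 500 rows and 500 predictors.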

Table 1: Attributes Used in This Study

Variable                         Description
Systolic Blood Pressure (SBP)    Continuous response variable
Gender                           Binary variable: M = Male, F = Female
Marital Status                   Binary variable: Y = Married, N = Not Married
Smoking Status                   Binary variable: Y = Smoker, N = Non-Smoker
Age                              Continuous variable (years)
Weight                           Continuous variable (lbs)
Height                           Continuous variable (inches)
Body Mass Index (BMI)            Continuous variable: (Weight / Height^2) x 703
Overweight                       Categorical variable: 1 = Normal, 2 = Overweight, 3 = Obese
Race                             Categorical variable taking values 1, 2, 3, or 4
Exercise Level                   Categorical variable: 1 = Low, 2 = Medium, 3 = High
Alcohol Use                      Categorical variable: 1 = Low, 2 = Medium, 3 = High
Stress Level                     Categorical variable: 1 = Low, 2 = Medium, 3 = High
Salt (NaCl) Intake Level         Categorical variable: 1 = Low, 2 = Medium, 3 = High
Childbearing Potential           Categorical variable: 1 = Male, 2 = Able Female, 3 = Unable Female
Income Level                     Categorical variable: 1 = Low, 2 = Medium, 3 = High
Education Level                  Categorical variable: 1 = Low, 2 = Medium, 3 = High
Treatment (for hypertension)     Binary variable: Y = Treated, N = Untreated
483 Genetic Markers              Each takes one of the values 0_0, 0_1, 1_1
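Two of the table entries lend themselves to a short worked example. The sketch below computes BMI from weight in pounds and height in inches using the formula in Table 1, and converts a marker value such as 0_1 into a count of "1" alleles. The allele-count (additive) coding is only one possible numeric encoding and is an assumption here, not something the case study prescribes; the function names are hypothetical.

```python
def bmi(weight_lbs, height_in):
    """Body mass index from Table 1: (Weight / Height^2) x 703."""
    return weight_lbs / height_in ** 2 * 703


def allele_count(genotype):
    """Count the '1' alleles in a marker value: 0_0 -> 0, 0_1 -> 1, 1_1 -> 2.

    Assumption: an additive coding is wanted; other encodings are possible.
    """
    alleles = genotype.split("_")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return alleles.count("1")


print(round(bmi(150, 65), 2))                            # prints 24.96
print([allele_count(g) for g in ("0_0", "0_1", "1_1")])  # prints [0, 1, 2]
```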


For this case study, a genetic data set was generated from a complex genetic model that we developed at GSK. There are 500 predictors (483 genetic markers and 17 clinical covariates). The goal is to identify the ‘true’ predictors among the 500 variables and, at the same time, to control the false discovery rate. Therefore, the objectives are:

  1. Identify ‘true’ genes and clinical covariates
  2. Control false discovery (the number of true X’s versus the number of false X’s identified)
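The case study does not prescribe a method for either objective. As one illustration of the second objective, the sketch below implements the Benjamini-Hochberg step-up procedure, a widely used way to control the false discovery rate given one p-value per predictor, together with a helper that computes the proportion of false discoveries once the true predictors are known. The p-values, predictor labels, and function names are made up for the example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of hypotheses rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * q.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff])


def false_discovery_proportion(selected, true_predictors):
    """Fraction of selected predictors that are not true predictors."""
    selected = set(selected)
    if not selected:
        return 0.0
    false_hits = selected - set(true_predictors)
    return len(false_hits) / len(selected)


# Made-up p-values for ten predictors.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))  # prints [0, 1]

# Made-up selection: four predictors chosen, two of them truly associated.
print(false_discovery_proportion(["a", "b", "c", "d"], ["a", "b", "x"]))  # prints 0.5
```

Because the data set is simulated, the set of true predictors is known to its authors, which is what makes a count of true versus false discoveries possible.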

