Blood Pressure

2003

Data Source: 

Dr. Raymond Lam, GlaxoSmithKline, Toronto, Ontario, Canada

Organizer: 

Dr. Peggy Ng, Associate Professor in Management Science and Applied Statistics, Atkinson Faculty of Liberal and Professional Studies, York University, Toronto

Introduction

Genes contribute to the development and progression of disease, and they also influence how individuals respond to medicines. At GlaxoSmithKline (GSK), we are conducting genetic and genomic research that will allow the medical community to prescribe the right medicine for the right patient.

Genetics research studies often collect hundreds to thousands of genetic markers, together with many clinical measurements. Statistical tools are useful for separating ‘true’ genes from ‘false’ alarms.

Research Question: 

For this case study, a genetic data set is generated based on a complex genetic model we developed at GSK. There are 500 predictors (483 genetic markers and 17 clinical covariates). The goal is to identify the ‘true’ predictors among the 500 variables and, at the same time, control the false discovery rate. Therefore, the objectives are:

  1. Identify ‘true’ genes and clinical covariates
  2. Control false discovery (the number of true X’s versus the number of false X’s among those identified)
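
The case study does not prescribe a method for the second objective. As one illustration only, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of per-predictor p-values and then computes the false discovery proportion of a selection against a known set of true predictors, which is possible here because the data are simulated. The function names, the level q = 0.05, and the example p-values are assumptions made for this sketch.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices rejected by the Benjamini-Hochberg procedure.

    p_values: one p-value per predictor (hypothetical input; how the
    p-values are obtained is left to the analyst).
    q: target false discovery rate (0.05 is an assumed default).
    """
    m = len(p_values)
    # Indices of the predictors, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under rank*q/m.
        if p_values[idx] <= rank * q / m:
            k = rank
    return sorted(order[:k])


def false_discovery_proportion(selected, true_predictors):
    """Fraction of the selected predictors that are not true predictors."""
    selected = set(selected)
    if not selected:
        return 0.0
    false_hits = selected - set(true_predictors)
    return len(false_hits) / len(selected)


# Illustrative p-values only; these are not taken from the case study data.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_selected = benjamini_hochberg(example_p, q=0.05)
```

With these ten illustrative p-values, only the first two predictors are selected. Any other selection method can be scored the same way with `false_discovery_proportion` once the true predictors are revealed.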

Variables: 

The data file (a comma-delimited ASCII file) contains 500 observations (subjects) and 501 variables. Of the 500 subjects, 250 had low blood pressure and 250 had high blood pressure (i.e., hypertension). The 501 variables consist of one response variable (systolic blood pressure) and 500 predictors (17 clinical covariates and 483 genetic markers). These variables are described below.
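
Before any modelling, it can help to confirm that the file really has the stated dimensions. The sketch below reads comma-delimited text with Python's standard `csv` module and checks for 500 rows and 501 columns. It assumes the first line is a header row, which the description above does not state, and the column names in the small demonstration are hypothetical.

```python
import csv
import io


def read_table(stream, expected_rows=500, expected_cols=501):
    """Read comma-delimited text and verify its dimensions.

    Assumes the first line of the stream is a header row (an assumption;
    the case study description does not say whether a header is present).
    """
    reader = csv.reader(stream)
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    if len(header) != expected_cols:
        raise ValueError(f"expected {expected_cols} columns, found {len(header)}")
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, found {len(rows)}")
    for line_no, row in enumerate(rows, start=2):
        if len(row) != len(header):
            raise ValueError(f"line {line_no} has {len(row)} fields")
    return header, rows


# Tiny in-memory demonstration with hypothetical column names.
demo_text = "SBP,Age,M1\n120,40,0_0\n150,55,0_1\n"
demo_header, demo_rows = read_table(io.StringIO(demo_text), expected_rows=2, expected_cols=3)
```

For the real file, the same function can be called on an opened file object, for example `open(path, newline="")`, where `path` is wherever the data file was saved.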

Table 1: Attributes Used in This Study

Variable                        Description
Systolic Blood Pressure (SBP)   Continuous response variable
Gender                          Binary variable: M = Male, F = Female
Marital Status                  Binary variable: Y = Married, N = Not Married
Smoking Status                  Binary variable: Y = Smoker, N = Non-Smoker
Age                             Continuous variable (years)
Weight                          Continuous variable (lbs)
Height                          Continuous variable (inches)
Body Mass Index (BMI)           Continuous variable: (Weight / Height^2) x 703
Overweight                      Categorical variable: 1 = Normal, 2 = Overweight, 3 = Obese
Race                            Categorical variable taking values 1, 2, 3, or 4
Exercise Level                  Categorical variable: 1 = Low, 2 = Medium, 3 = High
Alcohol Use                     Categorical variable: 1 = Low, 2 = Medium, 3 = High
Stress Level                    Categorical variable: 1 = Low, 2 = Medium, 3 = High
Salt (NaCl) Intake Level        Categorical variable: 1 = Low, 2 = Medium, 3 = High
Childbearing Potential          Categorical variable: 1 = Male, 2 = Able Female, 3 = Unable Female
Income Level                    Categorical variable: 1 = Low, 2 = Medium, 3 = High
Education Level                 Categorical variable: 1 = Low, 2 = Medium, 3 = High
Treatment (for hypertension)    Binary variable: Y = Treated, N = Untreated
483 Genetic Markers             0_0, 0_1, 1_1
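
Two entries in the table lend themselves to a short worked example: the BMI formula and the three-level marker values. The sketch below computes BMI from weight in pounds and height in inches exactly as in the table, and maps each marker value to a number. Reading 0_0, 0_1, and 1_1 as 0, 1, and 2 copies of allele "1" (an additive coding) is an assumption of this sketch, not something the case study specifies, and the function names are hypothetical.

```python
def bmi(weight_lbs, height_in):
    """Body Mass Index from the table: (Weight / Height^2) x 703."""
    return weight_lbs / height_in ** 2 * 703


# Assumed additive coding: count of allele "1" in each marker value.
GENOTYPE_TO_COUNT = {"0_0": 0, "0_1": 1, "1_1": 2}


def encode_marker(genotype):
    """Map a marker value such as '0_1' to its assumed allele count."""
    return GENOTYPE_TO_COUNT[genotype.strip()]


# A 150 lb subject who is 65 inches tall has a BMI of roughly 24.96.
example_bmi = bmi(150, 65)
example_codes = [encode_marker(g) for g in ["0_0", "0_1", "1_1"]]
```

Other codings, such as treating each marker as a three-level factor, are equally compatible with the description above.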

Data Files: 
