Longitudinal Data Analysis


Data Source: 

Marcelle Tremblay, Dafna Kohen, Georgia Roberts, Karla Nobrega, and Patricia Whitridge from Statistics Canada


Dr. Peggy Ng, Atkinson Faculty of Liberal and Professional Studies, Associate Professor in Management Science and Applied Statistics, York University, Toronto


This case study introduces students to the analysis of longitudinal data using several approaches (repeated measures, time-varying covariates and growth curves, to name a few) and to the differences between model-based and design-based approaches to estimation. A sub-sample of the synthetic data file from the National Longitudinal Survey of Children and Youth (NLSCY) is used to introduce the problems related to estimation and variance calculation in longitudinal surveys. Many issues relating to children’s behaviour and care are important concerns for Canadians, so the data set was created to confront the student with a realistic situation in a longitudinal study and to have the student assess the best way to analyze and present the data.

There are three sections to this case study. The student may choose to do any or all of the three components of this case study.

1. Analysis of outcomes over time

  • Design-based estimates: Study the trend over time of repeated measures of the hours in daycare, anxiety and/or aggression. Take note of issues such as missing data (due to attrition as well as other mechanisms) and the calculation of variance estimates in a design-based framework, then compare the estimates with those from a model-based approach.

2. Analysis of outcomes over time with covariates

Study the relationship over time between hours in daycare, anxiety and/or aggression and the child’s age, gender, family status, work status, province, urban or rural residence, number of siblings, maternal depression and mother’s education at the time of the first survey.

  • Fixed effects models: Several of the covariates can be considered fixed effects. A student who chooses to model only the fixed effects can compare the variance estimates from the two approaches (model-based and design-based) to illustrate the importance of the survey design in variance estimation. Additionally, the student may wish to look at the effects of attrition on the estimates.
    • Model-based estimates
    • Design-based estimates
  • Time-varying effects models and random effects models: Because the theory for variance calculation with time-varying and random effects is an area of ongoing development in the design-based framework, students wishing to model time-varying covariates will use model-based estimates in their analysis. Additionally, the student may wish to look at the effects of attrition on the estimates.
    • Model-based estimates
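Before fitting models with covariates, it usually helps to arrange the data with one row per child per cycle, so that covariates measured once are repeated on every row and time-varying covariates take their cycle-specific values. The sketch below illustrates that reshaping only. All variable names (`hours_c1`, `work_c2`, `mother_educ_c1` and so on) and values are hypothetical and are not taken from the NLSCY synthetic file.

```python
# Hypothetical wide-format records: one row per child, one column per cycle.
wide = [
    {"child_id": 1, "gender": "F", "mother_educ_c1": "secondary",
     "hours_c1": 20, "hours_c2": 15, "hours_c3": 5,
     "work_c1": "full-time", "work_c2": "full-time", "work_c3": "part-time"},
    {"child_id": 2, "gender": "M", "mother_educ_c1": "university",
     "hours_c1": 0, "hours_c2": 10, "hours_c3": None,   # lost to attrition
     "work_c1": "none", "work_c2": "part-time", "work_c3": None},
]

FIXED = ["gender", "mother_educ_c1"]   # measured once, at the first survey
TIME_VARYING = ["hours", "work"]       # measured again at every cycle
CYCLES = [1, 2, 3]


def to_person_period(rows):
    """Reshape wide records into one row per child per observed cycle."""
    long_rows = []
    for row in rows:
        for cycle in CYCLES:
            values = {name: row[f"{name}_c{cycle}"] for name in TIME_VARYING}
            if all(v is None for v in values.values()):
                continue  # child not observed in this cycle
            record = {"child_id": row["child_id"], "cycle": cycle}
            record.update({name: row[name] for name in FIXED})
            record.update(values)
            long_rows.append(record)
    return long_rows


long_format = to_person_period(wide)
for record in long_format:
    print(record)
```

In this layout the second child contributes two rows instead of three, which makes the effect of attrition on the available data visible before any model is fitted.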

3. Growth Curves 

Study the relationship over time, using growth curve models, between hours in daycare, anxiety and/or aggression and the child’s age, gender, family status, work status, province, urban or rural residence, number of siblings, maternal depression and mother’s education at the time of the first survey. As with other types of models, time-varying and random effects are a new area of research in design-based estimation, so the growth curve analysis is limited to model-based estimation. Additionally, the student may wish to look at the effects of attrition on the estimates.

  • Model-based estimates
    • Fixed effects
    • Time-varying effects
    • Random effects 
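To make the idea of a growth curve concrete, the sketch below fits a separate straight line to each child's repeated measures and then summarises the fitted intercepts and slopes. This is a simplified two-stage exploration under stated assumptions, not the mixed-effects model the student would fit in the analysis: a mixed model estimates the individual and average trajectories jointly. The records, ages and scores are hypothetical.

```python
from statistics import mean


def fit_line(ages, scores):
    """Ordinary least squares intercept and slope for one child's trajectory."""
    age_bar, score_bar = mean(ages), mean(scores)
    sxx = sum((a - age_bar) ** 2 for a in ages)
    sxy = sum((a - age_bar) * (s - score_bar) for a, s in zip(ages, scores))
    slope = sxy / sxx
    intercept = score_bar - slope * age_bar
    return intercept, slope


# Hypothetical long-format records: (child_id, age at interview, aggression score).
records = [
    (1, 4, 6.0), (1, 6, 5.0), (1, 8, 4.0), (1, 10, 3.0),
    (2, 4, 3.0), (2, 6, 3.5), (2, 8, 4.0), (2, 10, 4.5),
    (3, 4, 5.0), (3, 6, 4.0), (3, 8, 4.0), (3, 10, 3.0),
]

# Group the repeated measures by child.
by_child = {}
for child_id, age, score in records:
    ages, scores = by_child.setdefault(child_id, ([], []))
    ages.append(age)
    scores.append(score)

# Stage 1: one fitted line per child. Stage 2: summarise the fitted lines.
trajectories = {cid: fit_line(a, s) for cid, (a, s) in by_child.items()}
mean_intercept = mean(i for i, _ in trajectories.values())
mean_slope = mean(s for _, s in trajectories.values())

for cid, (intercept, slope) in sorted(trajectories.items()):
    print(f"child {cid}: intercept = {intercept:.2f}, slope = {slope:.2f}")
print(f"average trajectory: {mean_intercept:.2f} + {mean_slope:.2f} * age")
```

The spread of the individual slopes around the average slope is what the random effects in a growth curve model are meant to capture. The intercepts here refer to age zero, so centring age at a value inside the observed range would give a more interpretable starting level.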

Anxiety and/or Aggression:

Antisocial and violent behaviors and their associations with violent crimes are on the increase and receive a large share of media attention. Although many studies have examined predictors of antisocial and aggressive behaviors, fewer studies have examined the development of antisocial and aggressive behaviors over time. Predictors of antisocial and aggressive behaviors have included child factors such as age and gender and family socio-economic conditions such as poverty. Disorders are more prevalent for boys and tend to increase with age. Poverty has been shown to be related to behavior problems through numerous mechanisms: via the communities children live in (Sampson), and via punitive and harsh parenting behaviors, poor parental mental health and parental stress (Elder, McLoyd). Children who live in poor families and poor neighborhoods are also more likely to associate with deviant peer groups or gangs that are likely to be involved in delinquent activities.

Most cross-sectional and longitudinal studies show that the risk of committing a violent offense is highest during mid-adolescence and that adult violence is linked to a history of youth violence (Farrington, 1994; Huesmann et al., 1984; Serbin et al., 1991). Programs designed to prevent youth violence and programs to rehabilitate youth with behavior problems are often targeted at adolescents. However, other studies of preschool children tend to show that as children get older they generally resort to less physically aggressive behaviors (Cairns et al., 1989; Choquet, 1996). That is, for the majority of children, aggression peaks during the preschool years (a period in which most toddlers exhibit aggressive behaviors) and declines over time. Children who are physically aggressive and who do not learn appropriate behaviors tend to be at risk for additional problems such as anxiety, antisocial behaviors, and peer rejection (Tremblay et al., 1992). Moreover, children’s behavioral and emotional well-being is critically important to healthy development. Those with behavior problems are not only at increased risk of criminal and delinquent activities but are also at high risk of poor school performance, grade failure, unemployment, and poor physical and mental health (Power et al., 1991). Since competing theories exist about the development of aggressive behaviors in children and youth, we do not know what the pattern of aggressive behavior looks like as children age. Do aggressive behaviors increase or decrease as children get older? And what factors (child, family, etc.) predict differences in these trajectories?

Although emotional problems are somewhat less frequent than aggressive behavior problems, findings from cross-sectional studies suggest that symptoms increase with age, with the highest rate for boys aged 8-11 and the lowest for girls aged 4-7 (Offord & Lipman, 1996). However, we know less about the developmental trajectories of children who exhibit anxious behaviors, primarily because problems of anxiety appear to be less frequent, tend to go unnoticed by teachers and are not as detrimental to society.

Number of hours in daycare:

One of the most dramatic changes in Canada over the past 30 years has been the increase in single-parent households and dual-earner families. These changes have important implications for how young families care for their children, resulting in a dramatic increase in the number of families requiring childcare arrangements. Research has shown the benefits of participating in high quality, developmentally appropriate child care for the cognitive and behavioural development of preschoolers as well as school-aged children (Burchinal, Lee & Ramey, 1989; Kohen, Hertzman & Willms, 2002; McCartney, 1984; Andersson, 1989; Broberg et al., 1997; Rosenthal & Vandell, 1996). However, little research has examined the long-term effects of childcare arrangements on child outcomes such as behaviour. Results from studies examining associations with behaviour problems are mixed. Some studies find increases in behaviour problems for children who participate in childcare arrangements, while others do not. It has been difficult to determine whether children with more behaviour problems are more likely to be placed in childcare arrangements or whether children who participate in childcare arrangements are more likely to have behaviour problems. Moreover, we expect that patterns of care would vary with child age. In general, the number of hours in care should decrease over time (as children age), since the need for childcare decreases once formal education begins. However, we may still expect differences in the trajectories of child care use. For example, some children will have no care when young but may participate in after-school care arrangements once they enter school (e.g. where a non-working mother returns to work when the children reach school age), other children may be heavy users of care until adolescence, and still others may never use child care.

Are these groups accurate descriptions of the trajectories of child care or do the trajectories differ? Are trajectories differentially associated with child outcomes such as behaviour problems? Predictors of childcare use include socio-demographic factors such as family status, number of siblings, maternal education and employment (Kohen, Hertzman & Willms, 2002). Do these factors predict different trajectories of child care use?

Analysis Instructions (MS Word document)

Research Question: 


For this case study a survey example will be used to introduce students to the many issues associated with the analysis of longitudinal data. For lack of alternatives, longitudinal data are often analyzed with cross-sectional statistical methods, for instance t tests, ANOVA and ordinary least squares regression. These methods treat repeated measurements on the same child as if they were independent observations.
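One way to see why cross-sectional methods can mislead with repeated measures is to compare two standard errors for the same mean change: one that treats the two cycles as independent samples, and one that uses each child's own change score. The sketch below is illustrative only. The scores are hypothetical, and survey weights and design effects are ignored for simplicity.

```python
import math
from statistics import mean, variance

# Hypothetical anxiety scores for the same six children at two cycles.
cycle1 = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
cycle2 = [3.0, 4.0, 7.0, 8.0, 9.0, 11.0]
n = len(cycle1)

mean_change = mean(cycle2) - mean(cycle1)

# Cross-sectional treatment: the two cycles are handled as independent samples.
se_independent = math.sqrt(variance(cycle1) / n + variance(cycle2) / n)

# Longitudinal treatment: work with each child's own change score.
changes = [after - before for before, after in zip(cycle1, cycle2)]
se_paired = math.sqrt(variance(changes) / n)

print(f"mean change      = {mean_change:.2f}")
print(f"SE (independent) = {se_independent:.2f}")
print(f"SE (paired)      = {se_paired:.2f}")
```

Both treatments give the same mean change, but because a child's scores are positively correlated across cycles in this toy example, the independent-samples standard error is much larger than the paired one. The cross-sectional analysis therefore misstates the precision of the estimated change.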

Analytical Questions

Researchers with an interest in child development are often interested in the changes that children experience as they grow, as well as in the factors that contribute to change. To date, many researchers have been limited to cross-sectional data, with few having access to longitudinal data. Only in recent years have large longitudinal survey data sets focusing on child and youth development become available, necessitating the use of sophisticated longitudinal methodology to analyze change over time. One area that has made use of longitudinal data is the study of the development of children’s behavior problems over time.



This study uses a small sample from the synthetic data file of the National Longitudinal Survey of Children and Youth (NLSCY) to introduce the problems related to estimation and variance calculation in longitudinal surveys. The context of the data analysis problem is the set of issues relating to children’s behaviour and care that are currently important across Canada.

There are three data sets available in connection with the NLSCY SSC case study:

1. Primary file

2. Cycle 1 longitudinal weight bootstrap file

3. Cycle 4 funnel weight bootstrap file

For detailed descriptions of each of these files see data description (MS Word document).
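Bootstrap weight files of this kind are typically used for design-based variance estimation. As a minimal sketch, assuming each record carries one main survey weight and B bootstrap weights: the estimate is computed with the main weight, recomputed once with each bootstrap weight, and the variance is taken as the mean squared deviation of the replicate estimates from the full-sample estimate. The function names, variable names and numbers below are hypothetical and are not taken from the NLSCY files, so the data description should be consulted for the actual layout.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values; both sequences must have the same length."""
    total_w = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_w


def bootstrap_variance(values, main_weights, bootstrap_weights):
    """Design-based variance of a weighted mean from replicate weights.

    bootstrap_weights is a list of B weight vectors, one per replicate.
    """
    theta_hat = weighted_mean(values, main_weights)
    replicates = [weighted_mean(values, bw) for bw in bootstrap_weights]
    variance = sum((t - theta_hat) ** 2 for t in replicates) / len(replicates)
    return theta_hat, variance


# Hypothetical toy data: hours in daycare for five children.
hours = [10.0, 25.0, 0.0, 40.0, 15.0]
main_w = [120.0, 80.0, 150.0, 60.0, 90.0]
# Three illustrative bootstrap weight vectors (a real bootstrap file would
# normally hold many more replicates).
boot_w = [
    [130.0, 0.0, 160.0, 110.0, 100.0],
    [0.0, 170.0, 140.0, 65.0, 125.0],
    [125.0, 85.0, 0.0, 120.0, 170.0],
]

estimate, variance = bootstrap_variance(hours, main_w, boot_w)
standard_error = math.sqrt(variance)
print(f"weighted mean = {estimate:.2f}, bootstrap SE = {standard_error:.2f}")
```

The same pattern applies to other statistics, such as a regression coefficient: replace `weighted_mean` with a function that fits the weighted model and returns the coefficient of interest.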




Anderson, T.W. (1958), An Introduction to Multivariate Statistical Analysis, New York: John Wiley & Sons, Inc.

Andersson, B.E. 1989. Effects of public day-care: A longitudinal study. Child Development, 60(4), 857-66

Andersson, B.E. 1992. Effects of day care on cognitive and socio-emotional competence of thirteen year old Swedish school children. Child Development, 63, 20-36

Bijleveld, C. C. J. H., & van der Kamp, T. (1998). Longitudinal data analysis: Designs, models, and methods. Newbury Park: Sage.

Brown, C.H. & Liao, J. (1999). Principles for designing randomized preventive trials in mental health: An emerging development epidemiologic perspective. American Journal of Community Psychology, special issue on prevention science, 27, 673-709.

Brown, C.H., Indurkhya, A. & Kellam, S.K. (2000). Power calculations for data missing by design: applications to a follow up study of lead exposure and attention. Journal of the American Statistical Association, 95, 383-395.

Broberg, A., Wessels, H., Lamb, M.E, Hwang, C. 1997. Effects of day care on the development of cognitive abilities in 8 year olds: A longitudinal study. Developmental Psychology, 33(1), 62-69

Burchinal, M., Lee, M.W. & Ramey, C.T. 1989. Type of day care and preschool intellectual development in disadvantaged children. Child Development, 60, 128-37

Cairns R, Cairns B, Neckerman H, Ferguson L, Gariepy J. Growth and aggression: 1. Childhood to early adolescence. Developmental Psychology 1989;25:320-30.

Choquet M. La violence des jeunes: Donnees epidemiologiques. In Rey C, ed. Les adolescents face a la violence, pp 51-63. Paris: Syros, 1996

Collins, L.M. & Sayer, A. (Eds.) (2001). New Methods for the Analysis of Change. Washington, D.C.: APA.

Cochran, W.G. and Cox, G.M. (1957), Experimental Designs, Second edition. New York: John Wiley & Sons, Inc.

Curran, P.J., & Bollen, K.A. (2001). The best of both worlds: Combining autoregressive and latent curve models. In Collins, L. M. & Sayer, A. G. (Eds.), New Methods for the Analysis of Change (pp. 105-136). Washington, DC: American Psychological Association.

Duncan, T. E., Duncan, S. C., Strycker, L. A., Li, F., & Alpert, A. (1999). An Introduction to Latent Variable Growth Curve Modeling: Concepts, Issues, and Applications. Mahwah, NJ: Lawrence Erlbaum Associates.

Elder, G.H., Conger, R.D., Foster, E.M. & Ardelt, M. 1992. Families under economic pressure. Journal of Family Issues, 13(1), 5-37

Elder, G.H., Nguyen, T.V. & Caspi, A. 1985. Linking family hardship to children’s lives. Child Development 56, 361-75

Farrington D. Childhood, adolescent, and adult features of violent males. In Huesmann L, ed. Aggressive behaviour: current perspectives, pp 215-40. New York: Plenum, 1994.

Ferrer, E. & McArdle, J.J. (2003). Alternative structural models for multivariate longitudinal data analysis. Structural Equation Modeling, 10, 493-524.

Goldstein, H. (1995). Multilevel statistical models. Second edition. London: Edward Arnold.

Huesmann L, Eron L, Lefkowitz M, Walder L. Stability of aggression over time and generations. Developmental Psychology 1984;20:1120-34.

Jennrich, R.I., & Schluchter, M.D. (1986). Unbalanced repeated measures models with structured covariance matrices. Biometrics, 42, 805-820.

Khoo, S.T. & Muthén, B. (2000). Longitudinal data on families: Growth modeling alternatives. In Multivariate Applications in Substance Use Research, J. Rose, L. Chassin, C. Presson & J. Sherman (eds.), Hillsdale, N.J.: Erlbaum, pp. 43-78.

Kohen, D.E., Hertzman, C., & Willms, J.D. 2002. The importance of quality child care. In J.D. Willms (Ed.), Vulnerable Children. University of Alberta Press.

Laird, N.M., & Ware, J.H. (1982). Random effects models for longitudinal data. Biometrics, 38, 963-974.

Lange, N. and Ryan, L. (1989), Asymptotic Normality in Random Effects Models, Annals of Statistics, 17, 624-642

Liang, K and Zeger, S (1986) Longitudinal Data Analysis Using Generalized Linear Models, Biometrika, 73, 1, pp 13-22

Lindstrom, M.J., & Bates, D.M. (1988). Newton-Raphson and EM algorithms for linear mixed effects models for repeated measures data. Journal of the American Statistical Association, 83, 1014-1022.

Littell, R., Milliken, G.A., Stroup, W.W., & Wolfinger, R.D. (1996). SAS system for mixed models. Cary NC: SAS Institute.

Lohr, S.L. (1999) Sampling: Design and Analysis, Pacific Grove, CA: Duxbury

MacCallum, R.C., Kim, C., Malarkey, W.B., Kiecolt-Glaser, J.K. (1997). Studying multivariate change using multilevel models and latent curve models. Multivariate Behavioral Research, 32(3), 215-253

McArdle, J.J. & Epstein, D. (1987). Latent growth curves within developmental structural equation models. Child Development, 58, 110-133.

McArdle, J.J. & Hamagami, F. (2001). Latent difference score structural models for linear dynamic analyses with incomplete longitudinal data. In Collins, L. M. & Sayer, A. G. (Eds.), New Methods for the Analysis of Change (pp. 137-175). Washington, DC: American Psychological Association.

McCartney, K. 1984. Effect of quality of day care environment on children’s language development. Developmental Psychology, 20(2), 244-60

Miyazaki, Y. & Raudenbush, S.W. (2000). A test for linkage of multiple cohorts from an accelerated longitudinal design. Psychological Methods, 5, 44-63.

Meredith, W. & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107-122.

Moerbeek, M., Breukelen, G.J.P. & Berger, M.P.F. (2000). Design issues for experiments in multilevel populations. Journal of Educational and Behavioral Statistics, 25, 271-284.

Muthén, B. (1997). Latent variable modeling with longitudinal and multilevel data. In A. Raftery (ed), Sociological Methodology (pp. 453-480). Boston: Blackwell Publishers.

Muthén, B. & Curran, P. (1997). General longitudinal modeling of individual differences in experimental designs: A latent variable framework for analysis and power estimation. Psychological Methods, 2, 371-402.

Nagin, D.S. & Tremblay, R.E. (2001). Parental and early childhood predictors of persistent physical aggression in boys from Kindergarten to high school. Arch Gen Psychiatry, 58, 389-394

Nagin, D.S., Pagani, L, Tremblay, R.E. & Vitaro, F. (2003). Life course turning points: The effect of grade retention on physical aggression. Development and Psychopathology, 15, 343-361.

Nagin, D.S., & Tremblay, R.E. (1999). Trajectories of boys’ physical aggression, opposition, and hyperactivity on the path to physically violent and non-violent juvenile delinquency. Child Development, 70(5), 1181-1196

Power C, Manor O, Fox A. Health and class: the early years. London: Chapman and Hall, 1991.

Rao, C.R. (1958). Some statistical models for comparison of growth curves. Biometrics, 14, 1-17.

Ratkowsky, D. (1990), “Handbook of Nonlinear Regression Models,” Marcel Dekker: New York and Basel.

Raudenbush, S.W. (2001). Comparing personal trajectories and drawing causal inferences from longitudinal data. Annual Review of Psychology, 52, 501-25

Raudenbush, S.W. & Bryk, A.S. (2002). Hierarchical linear models: Applications and data analysis methods. Second edition. Newbury Park, CA: Sage Publications.

Raudenbush, S.W. (1997). Statistical analysis and optimal design for cluster randomized trials. Psychological Methods, 2, 173-185.

Raudenbush, S.W. & Liu, X. (2000). Statistical power and optimal design for multisite randomized trials. Psychological Methods, 5, 199-213.

Rosenthal, R. & Vandell, D. (1996). Quality of care at school-aged child care programs: Regulatable features, observed experiences, child perspectives and parent perspectives. Child Development, 67, 2434-45.

Satorra, A. & Saris, W. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83-90.

Serbin L, Schwartzman A, Moskowitz D, Ledingham J. Aggressive, withdrawn and aggressive-withdrawn children in adolescence: into the next generation. In Pepler D, Rubin K, eds. The development and treatment of childhood aggression, pp 51-70. Hillsdale: Erlbaum, 1991

Singer, J.D. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics, 23, 323-355.

Singer, J.D. & Willett, J.B. (2003). Applied longitudinal data analysis. Modeling change and event occurrence. New York, NY: Oxford University Press.

Snijders, T. & Bosker, R. (1999). Multilevel analysis. An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage Publications.

Tremblay RE, Masse B, Perron D, Leblanc M. Disruptive behaviour, poor school achievement, delinquent behaviour, and delinquent personality: longitudinal analyses. Journal of Consulting & Clinical Psychology 1992;60:64-72.

Tucker, L.R. (1958). Determination of parameters of a functional relation by factor analysis. Psychometrika, 23, 19-23.