Data Source
National Longitudinal Survey of Children and Youth (NLSCY)
Organizer
Dafna Kohen, Sander Post, Karla Nobrega, and Patricia Whitridge from Statistics Canada

Overview

The data for this study are taken from the synthetic file released for cycle three of the National Longitudinal Survey of Children and Youth (NLSCY). They represent only a subset of the data available: children aged 4, 5 or 6 living in one of 24 major metropolitan areas, for a total of 1,016 records. In addition to studying the relationship between child outcomes and determinants, you will learn about hierarchical methods and small area statistics.

Appendices

Introduction

Neighbourhood factors such as poverty and residential instability have been identified as important in explaining neighbourhood problems such as delinquency and crime encountered in many poor urban neighbourhoods (Sampson, 1992; Sampson & Groves, 1989; Sampson & Morenoff, 1997). Neighbourhood conditions of poverty and instability impede the establishment of the formal and informal institutions of neighbourhood organization that are believed to maintain and foster strong community relations as well as public order within a community. For example, neighbourhood safety and cohesion, or a sense of trust and belonging, are seen to strengthen the community and have positive effects on its members. Often these factors are spatially based, so that poverty conditions co-occur in similar areas (Massey, 1990; 1996; Massey & Denton, 1993). The geographic or spatial associations may be due in part to housing policies and housing affordability, as well as to conditions of ethnic and economic segregation (Wilson, 1987). For example, public housing is often found in predominantly low socio-economic neighbourhoods, leading to areas of isolated and concentrated poverty as well as separate areas of concentrated affluence. These differences, as well as the conditions of the neighbourhoods children reside in, may be important for child health and well-being. When discussing the associations of neighbourhood characteristics with child outcomes, it is important to note that both risk and protective factors occur at multiple levels (individual, family, and neighbourhood), and that it is not a single protective or risk factor but the accumulation of factors that results in negative or positive child and family outcomes.
 

The emerging literature on the effects of neighbourhood factors on children and youth has focused on structural characteristics of the neighbourhood, such as income/socio-economic conditions and residential instability, yet most of this literature is based on studies conducted in the United States. Most studies have focused on outcomes in early childhood or late adolescence (see Leventhal & Brooks-Gunn, 2000 for review). Some consistent findings have been reported. For example, neighbourhood effects for socio-economic factors are more common than effects of residential instability across all child outcomes, and neighbourhood effects are generally small (explaining 5-10% of the variability in outcomes). As would be expected, family-level factors tend to be more strongly associated with individual child outcomes than neighbourhood-level factors, but neighbourhood effects are consistently reported for outcomes of children, youth, and adolescents even after controlling for family-level factors.

Data Description

National Longitudinal Survey of Children and Youth

The National Longitudinal Survey of Children and Youth (NLSCY) is a long-term survey designed to measure child development and well-being. The first cycle of the survey was conducted by Statistics Canada in 1994-1995 on behalf of Human Resources Development Canada. The requirement for the NLSCY design was to select a representative sample of children in Canada and to follow and monitor these children over time into adulthood. All household information was collected in a face-to-face or telephone interview using computer-assisted interviewing (CAI). Questions were put to the respondent in the home or by telephone, and the answers were entered directly into a computer by the interviewer.
 

Before the NLSCY was undertaken there were few statistical studies describing a broad range of characteristics of children in Canada. Measures of health, well-being and life opportunities are needed, however, if governments and researchers hope to learn more about the ongoing life conditions of Canadian children and youth, and their developmental experiences. Longitudinal data are central to discovering developmental changes occurring in children over time, and studying the impacts of the social environment of the child and various family-related factors.
 

The primary objective of the NLSCY is to develop a national database on the characteristics and life experiences of children and youth in Canada as they grow from infancy to adulthood. The more specific objectives of the NLSCY are:

  • To determine the prevalence of various biological, social and economic characteristics and risk factors of children and youth in Canada,
  • To monitor the impact of such risk factors, life events and protective factors on the development of these children,
  • To provide this information to policy and program officials for use in developing effective policies and strategies to help young people live healthy, active and rewarding lives.

Underlying these objectives is the need to:

  • Fill an existing information gap regarding the characteristics and experiences of children in Canada, particularly in their early years,
  • Focus on all aspects of the child in a holistic manner (i.e., the child, his/her family, school, and community),
  • Provide national, and as far as possible, provincial-level data,
  • Explore subject areas that are amenable to policy intervention and which affect a significant segment of the population.

Background: Survey Weights

Suppose we have a finite population P of size N = 100 individuals. We are interested in estimating a total, mean or other quantity of interest for this population. In a simple random sample s of size n = 20 (in a simple random sample, each individual has the same probability of being selected), we observe y1, y2, …, y20. How can we estimate the population total Y of y1 to y100? Since the population size is N = 100, each individual in the sample represents 5 individuals in the population and is assigned a sampling weight of 5. If wi is the sampling weight of individual i in the sample (in this example, wi = 5 for i = 1, …, 20), the estimator of the total Y is:
 

$$\hat{Y} = \sum_{i \in s} w_i y_i = 5 \sum_{i=1}^{20} y_i$$
 

In the previous example all the individuals had the same sampling weight. In surveys it is common to select a sample with unequal probability of selection and hence unequal weights. In the data set for this case study, each individual has an associated sampling weight, but they are not all equal, since the survey was not a simple random sample. Using these in the analysis will help the results reflect the survey population, not just the survey sample.
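As a minimal sketch of how sampling weights enter an estimate, the functions below compute a weighted total (the sum of each weight times its observation) and a weighted mean. The function names and the sample values are hypothetical and are not taken from the NLSCY file; the equal weight of 5 mirrors the N = 100, n = 20 example above.

```python
def weighted_total(values, weights):
    """Estimate a population total as the sum of w_i * y_i over the sample."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * y for y, w in zip(values, weights))


def weighted_mean(values, weights):
    """Estimate a population mean as the weighted total over the sum of weights."""
    return weighted_total(values, weights) / sum(weights)


# Hypothetical sample of n = 20 observations from a population of N = 100,
# so every sampled unit carries the same weight N / n = 5.
y = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]
w = [5] * len(y)

total_hat = weighted_total(y, w)  # equals 5 * sum(y) because the weights are equal
mean_hat = weighted_mean(y, w)    # equals the plain sample mean for equal weights
```

With unequal weights, as in the case study file, the same two functions apply unchanged; only the weighted mean then differs from the unweighted sample mean.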

Background: Geo-Codes

Census Metropolitan Area (CMA)

A CMA is a very large urban area, together with adjacent urban and rural areas that have a high degree of economic and social integration with that urban area. A CMA comprises one or more contiguous census subdivisions (CSDs). CMAs are defined by Statistics Canada.
 

A CMA is delineated around an urban area (called the urbanized core and having a population of at least 100,000, based on the previous census). Census subdivisions are included in the CMA on the basis of decennial place-of-work commuting data. Once an area becomes a CMA, it is retained in the program even if its population subsequently declines.
 

Census Metropolitan Area (CMA) codes are listed in the data documentation.

Background: Linking data files

If you choose to include the CMA-level variables, you must first merge these data onto the NLSCY synthetic file by CMA. You are also free to add other macro-level variables from other sources.
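One way to carry out this merge is sketched below in plain Python. It assumes that the last three digits of the combined province-CMA code (PRCMA) in the macro file match the three-digit CMA code (CGEHbD06) in the NLSCY file, which is what the two code lists in this document suggest; verify this against the actual files. The function name and all numeric values are made up for illustration.

```python
def attach_cma_variables(children, macro_rows):
    """Left-join CMA-level rows onto child records using the 3-digit CMA code."""
    # Assumption: the last three digits of PRCMA are the CMA code in CGEHbD06.
    by_cma = {str(row["PRCMA"])[-3:]: row for row in macro_rows}
    merged = []
    for child in children:
        cma = str(child["CGEHbD06"]).zfill(3)
        macro = by_cma.get(cma, {})
        out = dict(child)
        for name in ("PRCMA", "MEDSHARE", "GINI", "POVPOP", "MEDINC"):
            out[name] = macro.get(name)  # None when the CMA has no macro row
        merged.append(out)
    return merged


# Made-up values for illustration only; these are not figures from either file.
children = [
    {"CHILDID": "000001", "CGEHbD06": "535"},
    {"CHILDID": "000002", "CGEHbD06": "001"},
    {"CHILDID": "000003", "CGEHbD06": "999"},  # hypothetical unmatched code
]
macro_rows = [
    {"PRCMA": 35535, "MEDSHARE": 0.20, "GINI": 0.40, "POVPOP": 0.15, "MEDINC": 40000},
    {"PRCMA": 10001, "MEDSHARE": 0.22, "GINI": 0.38, "POVPOP": 0.14, "MEDINC": 35000},
]
merged = attach_cma_variables(children, macro_rows)
```

Keeping unmatched children with empty macro values (a left join) makes it easy to check afterwards that every record found its CMA.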


Macro Level Data: National Longitudinal Survey of Children and Youth (NLSCY)
Data file sheet for download: Excel


The data for this study were taken from published results, and we would like to thank Nancy Ross of McGill University for allowing us to use the data in this case study. The file contains five variables: the combined province-CMA code, the median share of income, the Gini coefficient, the percentage of persons below the poverty line, and the median income for each of the CMAs.
 

Income inequality measures were calculated for households in 53 Canadian and 282 U.S. metropolitan areas with populations greater than 50,000 in 1991 (Canada). Income inequality measures for Canadian metropolitan areas were derived from a specially prepared micro data file of the 2B sample of the 1991 Census of Population. The 2B sample represents information gathered from 20% of Canadian households which includes detailed information regarding income sources and amounts. Income included income for all household members from wages and salaries, net self-employment income, government transfers and investment income. All of the measures were calculated with earned household income over 1,000 dollars.
 

Median Share:
 

A median share is a middle-sensitive measure of income inequality defined as the proportion of total or earned household income belonging to the less well-off 50 percent of households within a geographical area. To estimate the median share, first rank the population from low to high income; second, identify the income category containing the 50th percentile of the population, i.e., the median; and finally, calculate the proportion of total household income earned by the first half of the population.
 

The median income falls within the income category that contains the 50th percentile of the population ranked from low to high. The median income value can be linearly interpolated, assuming that the distribution of income within the income category is linear.
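The definition of the median share can be illustrated with a short sketch. Note the assumptions: the published measure was computed from income categories with interpolation, whereas this example works directly on a hypothetical list of individual household incomes, and the way it splits the middle household when the count is odd is a convention chosen here for illustration only.

```python
def median_share(incomes):
    """Share of total income held by the less well-off 50 percent of households."""
    ranked = sorted(incomes)  # rank households from low to high income
    total = sum(ranked)
    if not ranked or total <= 0:
        raise ValueError("need at least one household and a positive total income")
    half = len(ranked) // 2
    bottom = sum(ranked[:half])
    if len(ranked) % 2 == 1:
        # Odd count: split the middle household's income between the two halves.
        bottom += ranked[half] / 2
    return bottom / total


# Hypothetical incomes: the poorer half holds 10 + 20 = 30 of a total of 100.
share = median_share([40, 10, 30, 20])
```

A value of 0.5 would mean that the poorer half of households holds half of all income; smaller values indicate greater inequality.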
 

Gini Coefficient:
 

The Gini coefficient is an overall measure quantifying the degree of income inequality of a particular income distribution and can be derived directly from the Lorenz curve. The Lorenz curve represents the cumulative distribution of households (horizontal axis) against the cumulative distribution of income (vertical axis) (Figure 1). In situations of perfect equality, the shares of population and income will be equal and a 45-degree line on the graph represents this perfect equality. For example, in a situation of perfect equality, 10% of population has 10% of income. In reality, the actual cumulative shares of income possessed by the cumulative shares of the population will fall below this line of perfect equality. It is this Lorenz Curve that allows the estimation of the Gini Coefficient, a global income inequality measure explained below.
 

The Gini coefficient is calculated as follows:
 

$$\text{Gini} = \frac{A}{A + B}$$
 

where A is the area enclosed by the line of perfect equality and the Lorenz curve, and B is the area below the Lorenz curve (see Figure 1). It is clear from the figure that the Gini coefficient is a middle-sensitive inequality measure, since it is more sensitive to the middle range of the income distribution. As is true for any measure of proportions, the Gini coefficient lies between 0 and 1: a coefficient close to 0 indicates a more equal income distribution, while a coefficient close to 1 indicates a more unequal one. Using this measure in isolation can be somewhat misleading, however, since two quite different Lorenz curves can yield the same Gini coefficient.
 

Figure 1: Lorenz Curve for the State of Alabama, 1990
 

 

To simplify the calculation, the above equation can be re-expressed as Gini = 1 − 2B: the area under the line of perfect equality (see Figure 1) is A + B = 1/2, so A = 1/2 − B and Gini = (1/2 − B)/(1/2) = 1 − 2B. As such, only the area B needs to be calculated to estimate the Gini coefficient.
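The relation Gini = 1 − 2B can be sketched for grouped data as follows. This is an illustration under stated assumptions, not the procedure used to produce the published figures: the groups must already be ranked from lowest to highest average income, both sets of shares must sum to 1, and the area B is approximated with trapezoids under a piecewise-linear Lorenz curve. The function name and inputs are hypothetical.

```python
def gini_from_groups(pop_shares, income_shares):
    """Gini = 1 - 2B, with B the area under a piecewise-linear Lorenz curve."""
    if len(pop_shares) != len(income_shares):
        raise ValueError("pop_shares and income_shares must have the same length")
    area_b = 0.0
    cum_income = 0.0
    for p, s in zip(pop_shares, income_shares):
        # Trapezoid of width p between the Lorenz heights before and after this group.
        area_b += p * (cum_income + (cum_income + s)) / 2
        cum_income += s
    return 1 - 2 * area_b


# Perfect equality: each quarter of households holds a quarter of income.
gini_equal = gini_from_groups([0.25] * 4, [0.25] * 4)

# Hypothetical two-group case: the poorer half holds 20% of income.
gini_unequal = gini_from_groups([0.5, 0.5], [0.2, 0.8])
```

Under perfect equality the Lorenz curve coincides with the 45-degree line, so B = 1/2 and the coefficient is 0.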
 

Proportion of persons below the poverty threshold of half the median income
 

This measure is defined as the proportion of persons below half of the median income. Persons living under this threshold are considered to be living in poverty for the purposes of this study.
 

Coefficient of Variation
 

The coefficient of variation (CV) is a summary measure of income dispersion (illustrating also the degree of inequality in the income distribution) and is considered to be a “top-sensitive” income inequality measure. High incomes will result in a greater increase in the CV compared to low or average incomes. The CV is the standard deviation of income divided by the average income and can be written as:
 

$$CV = \frac{\sqrt{\sum_i p_i \,(y_i - \bar{y})^2}}{\bar{y}}$$

where

$$\bar{y} = \sum_i p_i \, y_i$$

is the overall average income, p_i is the proportion of the population within income category i and y_i is the average income within income category i.
 

 

The proportion pi is identical to the rectangle width used for the Gini coefficient calculation. This measure gives more weight to larger deviations and expresses the standard deviation as a proportion of the average income. The larger the CV, the greater the income inequality and the skewness in the income distribution.
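A minimal sketch of the CV for grouped data follows, taking the CV as the standard deviation of income divided by the average income, with both computed from the population proportion and average income of each income category. The function name and the numbers are hypothetical.

```python
import math


def coefficient_of_variation(pop_shares, mean_incomes):
    """CV of grouped incomes: weighted standard deviation over the weighted mean."""
    if len(pop_shares) != len(mean_incomes):
        raise ValueError("pop_shares and mean_incomes must have the same length")
    y_bar = sum(p * y for p, y in zip(pop_shares, mean_incomes))
    variance = sum(p * (y - y_bar) ** 2 for p, y in zip(pop_shares, mean_incomes))
    return math.sqrt(variance) / y_bar


# Hypothetical two-category distribution, in thousands of dollars.
cv_base = coefficient_of_variation([0.5, 0.5], [10, 30])

# Raising only the top category's income increases the CV ("top-sensitive").
cv_top = coefficient_of_variation([0.5, 0.5], [10, 50])
```

In this made-up example the CV rises from 0.5 to about 0.67 when the upper category's average income goes from 30 to 50, which illustrates the sensitivity to high incomes described above.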
 

Median Income
 

The median income of the city is the median household income, where income includes income for all household members from wages and salaries, net self-employment income, government transfers and investment income, restricted to incomes above 1,000 dollars.

 

Research Question

For this case study, we will use an example drawn from a survey to:
 

1. study the hierarchy of survey data (Problem 1: Hierarchical linear models), that is, understand the micro and macro levels of aggregation:

  • study the relationship between child outcomes (chronic health conditions (number of conditions the child has; 3 categories), child injuries (binary variable) or cognitive skills (continuous variable)) and the explanatory variables observed at the micro and macro levels, using a hierarchical linear model;
  • compare the hierarchical model with traditional regression models;

2. study small area issues with this data set (Problem 2: Small area statistics), that is, understand the problems involved:

  • choose a method for estimating results for areas in which individual-level data are sparse;
  • compare the results with those obtained by methods that do not account for the small area problem.

 

Variables

The data used for this case study come from the synthetic file released for cycle three of the National Longitudinal Survey of Children and Youth (NLSCY). All variables taken directly from the NLSCY file keep their original names.

The questions were asked of the person most knowledgeable (PMK) about the child. In most cases, this was the mother.

The data provided represent only a subset of the data available: children aged 4, 5 or 6 living in one of 24 major metropolitan areas. In all, 1,016 records are provided.

In general, the data descriptions given below have been shortened relative to the documentation supplied with the synthetic file, in order to remove values that do not appear in the file prepared for the case study.

Two original variables, CDMCD08 (number of siblings) and CSFHQ01 (number of years at the current residence), have been grouped because the observations are very sparse. One additional variable was created for the case study. Named chronic, it is a count of the chronic health conditions the child has.

The following pages describe the record layout of the flat file. The header of each section gives the variable name, the position of the first byte of data, and the length of the variable (in bytes). This is followed by a brief description of the data item, which often includes the question put to the respondent, and by the set of codes that appear in the data file together with their meaning.
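A position-and-length layout of this kind can be turned into a parser with simple string slicing. The sketch below uses a few of the positions listed in the variable descriptions in this section and treats positions as 1-based. The weight SHXSECWT is deliberately left out: its documented span (position 9, length 10) would reach byte 18, where CDMCD08 is documented to start, so its width should be checked against the actual file. The function name and the sample record are made up.

```python
# (name, position of first byte counted from 1, length in bytes)
LAYOUT = [
    ("CHILDID", 1, 6),
    ("CMMCQ01", 7, 1),
    ("CMMCQ02", 8, 1),
    ("CGEHbD06", 19, 3),
    ("Chronic", 37, 1),
]


def parse_record(line, layout=LAYOUT):
    """Slice one flat-file record into a dict of raw (string) field values."""
    record = {}
    for name, position, length in layout:
        start = position - 1  # documented positions are 1-based
        record[name] = line[start:start + length]
    return record


# Made-up 37-byte record for illustration; not a row from the NLSCY file.
sample = "0000015M" + "x" * 10 + "535" + "x" * 15 + "1"
parsed = parse_record(sample)
```

Keeping every field as a string at this stage preserves leading zeros in codes such as 001; numeric conversion can follow once reserved codes have been handled.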
 

Variable: PRCMA

Province-CMA code
10001 St. John's
12205 Halifax
13310 Saint John
24408 Chicoutimi
24421 Quebec
24442 Trois-Rivières
24462 Montreal
35532 Oshawa
35535 Toronto
35537 Hamilton
35539 St. Catharines
35541 Kitchener
35555 London
35559 Windsor
35580 Sudbury
35595 Thunder Bay
36505 Ottawa-Hull
46602 Winnipeg
47705 Regina
47725 Saskatoon
48825 Calgary
48835 Edmonton
59933 Vancouver
59935 Victoria

Variable: MEDSHARE
Median share
Continuous variable

Variable: GINI
Gini coefficient
Continuous variable

Variable: POVPOP
Proportion of persons with income below half of the median income
Continuous variable

Variable: MEDINC
Median income
Continuous variable

  1. Variable: CHILDID (Position: 1, Length: 6)
  2. Child identification number.
    This is a six-digit identification number with no intrinsic meaning. It is used only to identify a record.

  3. Variable: CMMCQ01 (Position: 7, Length: 1)
  4. Age of the child.
    Code Meaning
    4 4 YEARS
    5 5 YEARS
    6 6 YEARS
  5. Variable: CMMCQ02 (Position: 8, Length: 1)
  6. Sex of the child.
    Code Meaning
    F FEMALE
    M MALE
  7. Variable: SHXSECWT (Position: 9, Length: 10)
  8. Cross-sectional share weight applied to the child (xxxxx.xxxx).

  9. Variable: CDMCD08 (Position: 18, Length: 1)
  10. Total number of siblings (of the child) living in the household (including full, half, step, adopted and foster siblings, and excluding the child himself or herself). Includes siblings of all ages.
    Code Meaning
    00 No siblings
    01 One sibling
    02 Two or more siblings
    96 NOT APPLICABLE
    97 DON'T KNOW
    98 REFUSAL
    99 NOT STATED
  11. Variable: CGEHbD06 (Position: 19, Length: 3)
  12. Census metropolitan area (CMA) code.
    Code Meaning
    001 St. John's
    205 Halifax
    310 Saint John
    408 Chicoutimi-Jonquière
    421 Québec
    442 Trois-Rivières
    462 Montréal
    532 Oshawa
    535 Toronto
    537 Hamilton
    539 St. Catharines
    541 Kitchener
    555 London
    559 Windsor
    580 Sudbury
    595 Thunder Bay
    505 Ottawa-Hull
    602 Winnipeg
    705 Regina
    725 Saskatoon
    825 Calgary
    835 Edmonton
    933 Vancouver
    935 Victoria
  13. Variable: CGEHbD04 (Position: 22, Length: 1)
  14. Size of the child's area of residence, based on 1996 Census counts.
    Code Meaning
    4 Urban, population 100,000 to 499,999
    5 Urban, population 500,000 or over
    * Note: Other population size categories naturally exist for smaller towns and rural areas. However, since data are provided only for large urban centres, no record in the file relates to a city with a population below 100,000.
  15. Variable: CHLCQ37 (Position: 23, Length: 1)
  16. In the past 12 months, was he/she injured?
    Code Meaning
    1 YES
    2 NO
    7 DON'T KNOW
    9 NOT STATED
    * Note: A follow-up question asked about the nature of the injury, or of the most serious injury in the case of multiple injuries. The responses fell into the following categories:
    • BROKEN OR FRACTURED BONES
    • BURN OR SCALD
    • DISLOCATION
    • SPRAIN OR STRAIN
    • CUT, SCRAPE OR BRUISE
    • CONCUSSION
    • POISONING BY A SUBSTANCE OR LIQUID
    • INTERNAL INJURY
    • DENTAL INJURY
    • OTHER
    • MULTIPLE INJURIES
  17. Variable: CINHD08 (Position: 24, Length: 6)
  18. Socio-economic status - cross-sectional
    This variable is derived from five others: the PMK's level of education, the spouse's level of education, the prestige of the PMK's occupation, the prestige of the spouse's occupation, and household income. A full explanation of how this variable was constructed is given in Appendix A. In general, the higher the value of this variable, the higher the socio-economic status.
    Code Meaning
    -4.000 : 02.000 -4.000 : 02.000
    99.996 NOT APPLICABLE
    99.997 DON'T KNOW
    99.998 REFUSAL
    99.999 NOT STATED
  19. Variable: CPPCS01 (Position: 30, Length: 3)
  20. PPVT-R standardized score. This variable gives the child's score on the Peabody Picture Vocabulary Test - Revised, which is described in more detail in Appendix B. The score is standardized within two-month age groups, so that a score of 100 for a 5-year-old is equivalent to a score of 100 for a 6-year-old.

    Code Meaning
    040:160 040:160
    996 NOT APPLICABLE
    999 NOT STATED
  21. Variable: CSFHQ01 (Position: 33, Length: 2)
  22. The next questions are about the neighbourhood where you live. How many years have you lived at this address? (ENTER 0 IF LESS THAN ONE YEAR.)
    Code Meaning
    00 : 12 0 to 12 years
    13 13 years or more
    96 NOT APPLICABLE
    97 DON'T KNOW
    98 REFUSAL
    99 NOT STATED
  23. Variable: CSFHS6 (Position: 35, Length: 2)
  24. Neighbourhood score. This is a derived variable measuring neighbourhood cohesion, based on the weighted responses to the following questions: CSFHQ06A, CSFHQ06B, CSFHQ06C, CSFHQ06D and CSFHQ06E. The values were reversed to create the scale. No data were imputed to calculate this score, which ranges from 0 to 15, a high value indicating a high degree of neighbourhood cohesion. The questions on which it is based are described in Appendix C.
    Code Meaning
    00 00
    01 01
    02 02
    03 03
    04 04
    05 05
    06 06
    07 07
    08 08
    09 09
    10 10
    11 11
    12 12
    13 13
    14 14
    15 15
    96 NOT APPLICABLE
    97 DON'T KNOW
    98 REFUSAL
    99 NOT STATED
  25. Variable: Chronic (Position: 37, Length: 1)
  26. This variable, derived for the case study, gives the number of chronic health conditions the child has from among the set of conditions listed below. Only children with a Yes response on the record were counted as having the condition in question; no imputation was carried out to replace missing data:
    • Asthma
    • Allergies
    • Bronchitis
    • Heart condition or disease
    • Epilepsy
    • Cerebral palsy
    • Kidney condition or disease
    • Mental handicap
    • Learning disability
    • Emotional, psychological or nervous difficulties
    Code Meaning
    0 None of the chronic health conditions listed above.
    1 1 of the chronic health conditions listed above.
    2 2 or more of the chronic health conditions listed above.
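Several of the variables above reserve codes such as 96 to 99 for non-response, and these should not be treated as real values in an analysis. The sketch below shows one way to blank them out before modelling. The reserved codes are copied from the code tables above and kept as strings; for CSFHS6 the "not stated" code is assumed to be 99 by analogy with the other two-byte variables. The function name and the sample record are hypothetical.

```python
# Reserved (non-response) codes per variable, as strings as they appear in the file.
RESERVED = {
    "CDMCD08": {"96", "97", "98", "99"},
    "CHLCQ37": {"7", "9"},
    "CINHD08": {"99.996", "99.997", "99.998", "99.999"},
    "CPPCS01": {"996", "999"},
    "CSFHQ01": {"96", "97", "98", "99"},
    "CSFHS6": {"96", "97", "98", "99"},  # assumption: "not stated" is coded 99
}


def blank_reserved_codes(record, reserved=RESERVED):
    """Return a copy of the record with reserved codes replaced by None."""
    cleaned = dict(record)
    for name, codes in reserved.items():
        if name in cleaned and str(cleaned[name]).strip() in codes:
            cleaned[name] = None
    return cleaned


# Made-up record for illustration; not a row from the NLSCY file.
raw = {"CPPCS01": "996", "CINHD08": "-0.512", "CHLCQ37": "1", "CSFHS6": "12"}
clean = blank_reserved_codes(raw)
```

Whether to drop, impute or model the blanked values is an analysis decision that this sketch leaves open.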

Micro Level Data: NLSCY synthetic file

Text Excel SAS

 

References
  • Kreft, Ita and De Leeuw, Jan. Introducing Multilevel Modeling, London, Sage Publications Ltd., 1998
  • Goldstein, H. Multilevel Statistical Models, New York, John Wiley, 1995
  • Rao, J.N.K. Small Area Estimation, New York, John Wiley, 2003
  • Neter, Kutner, Nachtsheim, Wasserman. Applied Linear Statistical Models, McGraw-Hill Inc., 1996
  • Dunn, L. M. & Dunn, L. M. (1981). Peabody Picture Vocabulary Test-Revised. Circle Pines, MN: American Guidance Service.
  • Leventhal, T. & Brooks-Gunn, J. (2000). The neighbourhoods they live in: The effects of neighbourhood residence upon child and adolescent outcomes. Psychological Bulletin, 126, 309-337.
  • Massey, D. S. (1996). The age of extremes: Concentrated affluence and poverty in the twenty-first century. Demography, 33, 395-412.
  • Massey, D. S. & Denton, N. A. (1993). American apartheid: Segregation and the making of the underclass. Cambridge, MA: Harvard University Press.
  • Naglieri, J. & Pfeiffer, S. (1983). Stability, concurrent and predictive validity of the PPVT-R. Journal of Clinical Psychology, 39, 965-967.
  • Sampson, R. & Groves, W. (1989). Community structure and crime: Testing social disorganization theory. American Journal of Sociology, 94, 774-802.
  • Sampson, R. J. (1992). Family management and child development: Insights from social disorganization theory. In J.McCord (Ed.), Facts, Frameworks, and Forecasts (pp. 63-93). New Brunswick, U.S.A.: Transaction.
  • Sampson, R. J. & Morenoff, J. (1997). Ecological perspectives on the neighborhood context of urban poverty: Past and present. In J.Brooks-Gunn, G. J. Duncan, & J. L. Aber (Eds.), Neighborhood Poverty: Policy Implications in Studying Neighborhoods (pp. 1-22). New York: Russell Sage Foundation Press.
  • Wilson, W. J. (1987). The truly disadvantaged. Chicago, IL: University of Chicago Press.