Mediating Role of Education and Lifestyles in the Relationship between Early-Life Conditions and Health: Evidence from the 1958 British Cohort

The paper focuses on the long-term effects of early-life conditions, in comparison with lifestyles and current socioeconomic factors, on health status in a cohort of British people born in 1958. Using the longitudinal follow-up data at ages 23, 33, 42 and 46, we build a dynamic model to investigate the influence of each determinant on health and the mediating role of education and lifestyles in the relationship between early-life conditions and later health. Direct and indirect effects of early-life conditions on adult health are explored using auxiliary linear regressions of education and lifestyles and panel Probit specifications of self-assessed health with random effects addressing individual unexplained heterogeneity. Our study shows that early-life conditions are important determinants of adult health: their contribution to health disparities increases from 17.8% to 23% when mediating effects are identified. They also shape other health determinants: the contribution of lifestyles falls from 28% to 22% when indirect effects of early-life conditions are distinguished. Notably, the absence of a father at the time of birth and the experience of financial hardship are the leading factors for direct effects on health. The absence of obesity at 16 influences health both directly and indirectly, working through lifestyles.


Introduction
Numerous studies agree on the important role played by current individual social characteristics, such as income, education level, wealth, and social status [...] e.g. smoking (Kenkel 1991, de Walque 2007, Etilé and Jones 2010). It is therefore essential to understand the interrelationships between those various determinants of health in order to evaluate their respective contributions to the magnitude of health inequalities. The objective of this study is to explore the long-term effects of social and health-related early-life conditions, education, and lifestyles on health and to understand the interdependence between those three health determinants. Relying on a dynamic model of health status over the life-cycle, our empirical analysis investigates the contribution of each determinant to overall health inequality and determines whether early-life conditions influence health directly or indirectly, that is, by affecting education and lifestyles.
Our findings provide new evidence on the determinants of health inequalities that is relevant for policy makers and had yet to be empirically assessed. Firstly, the role of early-life conditions is explored in direct and indirect terms with a larger set of indicators than previous analyses, including parental social and health conditions in addition to the individual's initial health status. Secondly, this research analyses the evolution of unhealthy lifestyles, their changes over an extended period of time and their association with health status. Finally, the longitudinal dimension of the data allows us to use dynamic panel analysis to control for unexplained individual heterogeneity and to account for the impact of past health status.
The structure of the paper is as follows. Section 2 describes the model that is empirically tested. Section 3 describes the National Child Development Study (NCDS) data and the variables of interest. Section 4 presents the empirical results and Section 5 concludes.

General health production function
In contrast with Jusot et al., who focused on a reduced-form model of childhood circumstances and lifestyles, we use a full model specification including individual qualification. Our approach also differs from Contoyannis and Jones (2004), Häkkinen et al. (2006), and Balia and Jones (2008), as our health production function includes early-life conditions as a potential determinant of health in addition to education and lifestyles. Furthermore, we build a dynamic model of health using longitudinal data.
The individual health status can be written using the following health production function: [...]

Moreover, initial health, such as birth weight and health problems during childhood and adolescence, also significantly influences health in adulthood, and the most adverse health risks in adulthood tend to be experienced by people who had poor health in childhood and adolescence (Moser et al. 2003, Case et al. 2005).

The education vector represents the individual's education level, measured by the highest qualification achieved at age 46, and is time-invariant. Researchers in many countries have found a relevant and persistent association between education and health across various health measures (Grossman 2006) [...]

Nevertheless, we need to account for the potential endogeneity bias related to the respective correlations of early-life conditions, lifestyles and education with past health, which can be addressed by using a dynamic specification and introducing past health into the health production function. The introduction of past health status in our empirical model allows us to capture the state dependence in health reports and strongly reduces the impact of individual heterogeneity. In a dynamic context, initial health is likely to be correlated with the unobserved heterogeneity affecting health, and treating it as exogenous would lead to inconsistent estimators. We follow the alternative approach suggested by Wooldridge (2002) (see note 2), which requires specifying the distribution of the unobserved individual effect given initial health and the other exogenous variables, and so including at least the first value of the health variable among the regressors [...] (see note 1).

Note 1. Lifestyles could therefore be regarded both as a measure of lifestyle shocks on health, via the past lifestyle variables, and as a measure of long-term or "permanent" lifestyles on health, via the average lifestyle. Nevertheless, from our point of view, the follow-up of lifestyles being limited to four points in time and the use of binary lifestyle variables do not justify interpreting the effects of lifestyles on health in terms of permanent and transitory effects.

Note 2. Two other methods to address the initial conditions problem could have been considered: Heckman (1981) and Orme (2001). The former suggests approximating the reduced form of the initial health equation and then specifying the distribution of the individual effect given the regressors; the joint distribution of the health outcomes is then obtained by integrating out the individual effect. The two main difficulties with this method are specifying the distribution of initial health, and computing time. Orme (2001) suggested a two-step bias-corrected procedure that is locally valid when the correlation between the individual effect and the initial condition is close to zero. A couple of recent works compared the relative performance of the three methods. Miranda (2007) concluded that the Heckman method delivers estimators that are hardly subject to bias and are estimated with high precision, whereas Arulampalam and Stewart (2009) [...]

[...] early-life conditions, education, and lifestyles. Therefore, we complement this primary specification with a mediating specification that aims to describe whether early-life conditions influence health directly or indirectly, that is, by affecting education and lifestyles.
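As an illustration only, a dynamic random effects Probit with a Wooldridge-type treatment of the initial condition can be sketched as below. The notation is assumed here and is not taken from the original equations: h_it is the binary self-assessed health indicator, C_i the early-life conditions, E_i education, L_it lifestyles, and u_i the individual effect. Other controls, such as current socioeconomic status, are omitted for brevity.

```latex
% Hypothetical notation, introduced for illustration (not from the source).
\begin{align*}
% Latent health with state dependence (lagged health) and past lifestyles
h^{*}_{it} &= \rho\, h_{i,t-1} + C_i'\alpha + E_i'\beta + L_{i,t-1}'\gamma
              + u_i + \varepsilon_{it},
  \qquad h_{it} = \mathbf{1}\{h^{*}_{it} > 0\}, \\
% Individual effect modelled conditional on initial health h_{i0}
% and on the individual average of lifestyles
u_i &= \delta_0 + \delta_1 h_{i0} + \bar{L}_i'\delta_2 + \eta_i,
  \qquad \eta_i \sim N(0, \sigma_\eta^{2}), \quad \varepsilon_{it} \sim N(0, 1).
\end{align*}
```

Substituting the second line into the first yields a standard random effects Probit in which initial health and average lifestyles appear as additional regressors, which is the sense in which the first value of the health variable is included.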

Mediating effects identification
The mediating specification aims to identify whether explanatory variables influence health directly or indirectly, that is, by affecting or being affected by another explanatory variable. Let us first consider a more general case (Eq. 4) in which individual health status is defined according to a first set of variables, such as her early-life conditions, and a second set of variables, such as her education or lifestyles.

We consider that the second set potentially mediates the relationship between the first set and health. For example, the early-life condition mother's qualification may affect adult health through an effect on the individual's qualification attainment (see Figure 1), as exhibited in the pathway model, which has been well studied [...] However, as we are in a full model specification, we cannot ignore the role played by [...] on [...] (Eq. 5b).

Note 2 (continued). [...] all cases. Moreover, the authors found that it is advantageous to allow for correlated random effects using the approach of Mundlak (1978).

In our present study, following Figure 1, the first set of variables could represent early-life conditions and the second set both education and lifestyles. In addition, the dashed arrow in Figure 1 suggests that the first set could represent both early-life conditions and education, and the second set lifestyles only. Hence there are potentially two layers of mediating effects to distinguish: mediating effects between early-life conditions and health working through education and lifestyles (mediation 1), and mediating effects between education and health working through lifestyles (mediation 2). The two mediating specifications will be tested and compared.

In concrete terms, Eq. 5b corresponds to the general health production function (Eq. 3, model 1c), whereas the mediating specification is a two-step estimation: the auxiliary equations are estimated first, followed by the health production function described in Eq. 5c. The sets of auxiliary equations being estimated can be written as follows: [...]

If we substitute those auxiliary equations into Eq. 3, the health production function in the mediating specification becomes: [...]

Using Eq. 5d, we can express the respective direct, mediating (indirect) and total effects of early-life conditions and education on health as follows:

Direct effects of early-life conditions on health
Total mediating effects of early-life conditions on health
Mediating effects via education
Mediating effects via lifestyles
Mediating effects of education on health via lifestyles
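Because the formulas attached to these labels are not legible in this text, the following is a minimal sketch of how such a decomposition typically reads for mediation 1. The notation is an assumption made for illustration: C_i denotes early-life conditions, E_i education and L_it lifestyles (both treated as scalars for simplicity), the auxiliary equations are linear with residuals e, and all other regressors are omitted.

```latex
% Hypothetical notation, introduced for illustration (not from the source).
\begin{align*}
% Auxiliary linear equations for the two mediators
E_i &= C_i'\pi_E + e^{E}_{i}, \qquad L_{it} = C_i'\pi_L + e^{L}_{it}, \\
% Latent health index of the general specification
h^{*}_{it} &= C_i'\alpha + E_i\,\beta + L_{it}\,\gamma + \dots \\
% Substituting the auxiliary equations gives the mediating specification
h^{*}_{it} &= C_i'\,(\alpha + \pi_E\beta + \pi_L\gamma)
              + e^{E}_{i}\,\beta + e^{L}_{it}\,\gamma + \dots
\end{align*}
```

Under these assumptions the direct effect of early-life conditions is \alpha, the mediating effect via education is \pi_E\beta, the mediating effect via lifestyles is \pi_L\gamma, and the total effect is their sum. For mediation 2, the same logic would apply with education added to the right-hand side of the lifestyle equation, so that the mediating effect of education via lifestyles is the education coefficient in that auxiliary equation multiplied by \gamma.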

Health determinants decomposition
The second part of our empirical analysis examines the extent to which accounting for mediating effects changes the contribution of each determinant to health disparities. Shorrocks (1982) showed that if we are interested in an absolute measure of inequality, i.e. a measure invariant to translation, the variance is a good index and its natural decomposition has the desired properties.
The alternative specifications of the health production function considered in sections 2.1 and 2.2 are based on strictly identical regressors and so have the same variance. The variance of both models is estimated using a bootstrap method (300 replications) to assess the variability of the estimated coefficients from the panel random effect Probit.
Assuming that the variance of the error term [...] and [...] follows a normal distribution in this random effect Probit model, we can write: (Eq. 8)

Given the longitudinal data, the variance of the latent health variable can be decomposed directly using the explained variance of the latent health variable, as measured by the pseudo-R² based on all the waves. The share of inequality associated with a given variable at a given time can thus be written as: (Eq. 9)
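The natural decomposition of the variance can be sketched numerically as follows. This is a minimal illustration on simulated data, not the authors' code: the function name, the simulated regressors and the coefficient values are hypothetical, and the residual variance of the latent variable is passed in as an assumed constant (for a random effects Probit it would be the individual-effect variance plus one).

```python
import numpy as np

def variance_shares(X, beta, resid_var=1.0):
    """Natural (Shorrocks-type) decomposition of the variance of a latent index.

    X         : (n, k) array of regressors
    beta      : (k,) array of estimated coefficients
    resid_var : assumed variance of the unexplained part of the latent variable
    Returns the share of total latent variance attributed to each regressor
    and the share left unexplained.
    """
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    contrib = X * beta                      # contribution beta_k * x_k of each regressor
    index = contrib.sum(axis=1)             # explained part of the latent variable
    total_var = index.var() + resid_var     # assumes index and residual are uncorrelated
    centred = contrib - contrib.mean(axis=0)
    # cov(beta_k * x_k, index) for each k; these covariances sum to var(index)
    cov = (centred * (index - index.mean())[:, None]).mean(axis=0)
    return cov / total_var, resid_var / total_var

# Hypothetical example with three simulated determinants
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
beta = np.array([0.5, -0.3, 0.2])
shares, unexplained = variance_shares(X, beta, resid_var=1.4)
pseudo_r2 = shares.sum()                    # explained share of the latent variance
```

The shares of the individual regressors add up to the explained share of the latent variance, which plays the role of the pseudo-R² in this sketch.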

The National Child Development Study
The National Child Development Study (NCDS) is a continuing, multi-disciplinary longitudinal study which focuses on all the people born in one week in March 1958 in England, Scotland and Wales. Information was gathered on almost 17,500 babies. Following the initial birth survey in 1958, there have been seven attempts to trace all members of the birth cohort in order to monitor their physical, educational, social and economic development. These were carried out in 1965, 1969, 1974, 1981, 1991, 1999/2000 and 2004. For the birth survey, information was obtained from the mother and from medical records by the midwife. For the purpose of the first three NCDS surveys, information was obtained from parents, head teachers and class teachers, the school health service and the subjects themselves (who completed tests of ability and, latterly, questionnaires). In the 1981 and later surveys, information was gathered by professional survey research interviewers.
In 1981 information was obtained from cohort members and from the 1971 and 1981 Censuses. In the 1991 survey there was a professional interview with the cohort member along with self-completion questionnaires from NCDS subjects and husbands, wives, and cohabitees. For the 1999-2000 sweep, information was obtained from cohort members by interviewer and self-completion using CAPI. The 2004 survey was administered by telephone.

The sample
For the purpose of our study, we focus on the last four sweeps of the cohort (t = 0, ..., 3) [...] (see Table A.II).

Health variable
The NCDS includes only one repeated measure of the respondent's health in the cohort, namely self-assessed health (SAH). Respondents are asked to rate their own health on a four- or five-point categorical scale ranging from poor (sweeps 4, 5 and 6) or very poor (wave 7) to excellent health status. Given the changes in the scale of the variable over the different waves, we use SAH as a binary variable which takes the value one if the individual rates her health as good or better, and zero if she rates her health as less than "good". Self-assessed health has been shown to be a good predictor of mortality, morbidity and subsequent use of health care (Idler and Benyamini 1997). The distribution of health status in the balanced sample shows the age effect on health status over the life-cycle (see Table A.III). Whereas 92.7% of respondents report good health at 23 years old, the proportion declines to 78.3% at 46 years old.
Across the first three sweeps the mean declines at a constant rate of 4 percentage points per sweep. There is a break with a decrease of 6 percentage points between the last two sweeps, even though they are separated by only four years. This difference could be explained by an increasing effect of ageing on health when the cohort member enters her forties. The shift could also come from the change in the categorical scale of self-assessed health between sweep 6 and sweep 7, although this latter issue is mitigated by the dichotomisation of health.
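The dichotomisation described above can be sketched as a small helper. The category labels used here are assumptions made for illustration, since the exact wording of the four- and five-point scales is not given in the text; the function name is hypothetical.

```python
# Assumed labels (not from the source): the four-point scale is taken to run
# poor/fair/good/excellent and the five-point scale to add "very poor".
GOOD_OR_BETTER = {"good", "excellent"}

def sah_binary(rating: str) -> int:
    """Return 1 if self-assessed health is 'good' or better, 0 otherwise.

    The same rule applies to both scales, which is why the change of scale
    between sweeps matters less once the variable is dichotomised.
    """
    return int(rating.strip().lower() in GOOD_OR_BETTER)

# Hypothetical ratings for one respondent across the four sweeps
ratings = ["Excellent", "Good", "Fair", "Very poor"]
binary_health = [sah_binary(r) for r in ratings]
```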

Socioeconomic status
The NCDS provides several current social characteristics. Education is provided at each wave and we use the highest qualification achieved over the period, generating a three-category discrete variable: having a qualification lower than O-level, having O-level or A-level, and having a qualification higher than A-level. About one fifth of respondents have a qualification lower than secondary school.

Lifestyles variables
The NCDS includes a longitudinal follow-up of lifestyles and health records at age 23, age 33, age 42 and age 46. We consider four binary lifestyle variables (presented in Table A [...]) [...]

The absence of obesity is the fourth lifestyle that we consider. Obesity may appear as an intermediate or genetic health outcome rather than a pure lifestyle. Given that we can control for the genetic and family-transmitted effects on obesity using the respondent's obesity status when she was 16, the absence of obesity will thus capture the aggregated effects of lifestyles. Absence of obesity is constructed using the reported height and weight and calculating the individual's body mass index (BMI; see note 5). The absence of obesity is a binary variable taking the value one if the cohort member's BMI is strictly lower than 30 and zero otherwise.
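The construction of the adult absence-of-obesity indicator can be sketched as follows, using the definition given above (BMI strictly lower than 30). The function names and the example values are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by height squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2

def absence_of_obesity(weight_kg: float, height_m: float,
                       threshold: float = 30.0) -> int:
    """Return 1 if BMI is strictly below the threshold, 0 otherwise."""
    return int(bmi(weight_kg, height_m) < threshold)

# Hypothetical respondents: (weight in kg, height in m)
respondents = [(80.0, 1.75), (90.0, 1.70), (120.0, 2.00)]
indicators = [absence_of_obesity(w, h) for w, h in respondents]
```

The third respondent has a BMI of exactly 30 and is therefore coded zero, because the inequality is strict.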

Early-life conditions
The vector of early-life conditions that we consider has three main types of variables: social conditions in childhood, parents' health and health-related behaviours, and child [...] and adolescence, as well as a birth weight below 2.5 kg, as health indicators before adulthood using the same dataset.
Furthermore, we include obesity at 16 years old. We computed BMI using the medical assessment of height and weight and evaluated obesity using gender-specific threshold values found to be a good predictor of obesity at 18 (Lahti-Koski and Gill 2004).

Note 5. BMI in kg/m² = weight/height².

Random effect dynamic panel Probit results
The results of the random effect panel Probit of the general specification are presented in Table I. Three different models are reported: models 1a and 1b are static random effects models, with model 1b including the average individual lifestyles over the studied period, whereas model 1c is a dynamic random effects Probit model. The results show that several early-life conditions have a statistically significant effect on the probability of reporting good health regardless of the model. Individuals whose father belonged to the lowest social class, namely partly skilled and unskilled workers, as well as individuals who had no father at the time of their birth, are significantly less likely to report good health. Similarly, the experience of financial hardship during childhood has a significant and negative effect on reports of good health. The mother's education level is also found to be a statistically significant determinant of poor health reports, whereas the father's education is not significant in any of the models. This may be explained by father's social class being significant and so absorbing the effect of father's education on the descendant's health. Unlike mother's illness, father's illness significantly reduces the probability of reporting good adult health. Mother's smoking behaviour appears to be significant for the descendant's report of good health, but the significance level weakens in the dynamic model.

Auxiliary equations estimations
Prior to the mediating specification, the auxiliary equations are estimated. The results are presented in the Appendix (Table A.V to Table [...]). [...] birth weight appear to be positively associated with prudent drinking. Middle qualification is found to be negatively associated with drinking prudently when education is introduced in the auxiliary equation (Table A.VIII, column b). Finally, the absence of obesity at 16 is statistically associated with non-obesity in adulthood; in addition, mother's smoking, father's SES and mother's low education are found to be significantly associated with a lower probability of absence of obesity. When education is included in the auxiliary equation (Table A.IX, column b), low individual education appears to be significantly and negatively associated with the absence of obesity, and the introduction of individual education removes the significant effect of parents' education which was previously observed.

Random effect dynamic panel Probit results: mediating specification
The results of the mediating specifications of the health production function are presented in Table II. They have been obtained by replacing the actual variables of education and lifestyles with the estimated residual terms of the different auxiliary equations, whose results were described in the previous section. The two mediating specifications are presented. Notably, the estimated coefficients associated with education and lifestyles in mediation 1, and with lifestyles only in mediation 2, are strictly identical to the estimated coefficients in model 1c (see Table I) [...]

The decomposition in the baseline specification shows that the most important contribution to health inequalities comes from the state dependence of health and the initial health, which would [...]
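The reason why replacing a mediator by its auxiliary-equation residual leaves the mediator's coefficient unchanged can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' estimation: it uses ordinary least squares on simulated data as a linear stand-in for the panel Probit, and all variable names and coefficient values are hypothetical. The argument carries over because the residual substitution only re-expresses the same linear index.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
C = rng.normal(size=n)                        # hypothetical early-life condition
Z = 0.6 * C + rng.normal(size=n)              # hypothetical mediator (e.g. a lifestyle)
y = 0.3 * C + 0.5 * Z + rng.normal(size=n)    # hypothetical continuous health index

def ols(y, *cols):
    """Least-squares coefficients of y on a constant and the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# General specification: health on the early-life condition and the mediator
b_full = ols(y, C, Z)

# Auxiliary equation: mediator on the early-life condition, keep the residual
a = ols(Z, C)
Z_res = Z - (a[0] + a[1] * C)

# Mediating specification: the mediator is replaced by its residual
b_med = ols(y, C, Z_res)

direct = b_full[1]                 # direct effect of the early-life condition
indirect = a[1] * b_full[2]        # effect working through the mediator
total = b_med[1]                   # coefficient on C in the mediating specification
```

In this linear sketch the coefficient on the mediator is the same in both specifications, while the coefficient on the early-life condition in the mediating specification equals the direct effect plus the indirect effect.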

Conclusion
In this paper, we developed a model to evaluate the contribution of several essential [...] Finally, the dynamic panel analysis permits controlling for a large part of individual unexplained heterogeneity as well as the important effect of health state dependence over time.

Our study has some limitations. The inequality measure is based on the explained part of the variance that is allowed by the model specifications. According to the pseudo-R², which is built using the variance of the latent variable, we are able to explain about 18%. Therefore the unexplained health inequality remains very large. The panel data perspective also has several limitations. The first problem is attrition due to mortality in the cohort, which we have ignored in the analysis. This leads to an underestimation of the effect of early-life conditions, adult socioeconomic factors and lifestyles on health inequality, as we worked on a selected sample of British people still alive at age 46. We did investigate mortality in our data and found that the mortality rate is higher before age 23 than between ages 23 and 46. Finally, the NCDS cohort has a singular structure, as the waves are not equidistant in time. In particular, there is a four-year interval between the last two sweeps, whereas there were about ten years between the earlier sweeps. We tried to capture this effect by introducing a year dummy into the models. Therefore, the estimated coefficients in the models can be interpreted as a mean of the effects of lifestyles, education, and early-life conditions over time.