Introduction
As historians argue, a powerful way to predict the future is to study the past. In the same way, we have learned that health determinants arising in the perinatal period and early childhood are associated with health outcomes in adulthood,1-3 an association known as the “fetal origins” hypothesis.4 Birth data, such as birth weight or gestational age, have therefore become essential information in epidemiological research, and they have emerged as important risk factors for chronic diseases such as diabetes, cardiovascular disease or cancer, because different hormones and growth factors can affect stem cells and susceptible tissues during intrauterine life.5-7
To evaluate the effect that early life events and lifestyles can have on adult health, long-term follow-up studies are needed.8 But how far back do we have to go to gather key information on personal history? Some paediatric cohorts start at birth or during gestation; others recruit participants even before conception, at pre-gestational visits of the future mothers. The longitudinal prospective design is, without any doubt, the best suited for collecting data on both exposures and outcomes. However, cohort studies with long follow-up periods face several difficulties, such as high cost and the challenge of achieving sufficiently high long-term retention rates.9,10
A common method for evaluating the effects of early life factors consists of conducting longitudinal cohort studies that use self-reported retrospective information. This approach increases feasibility by cutting costs and partially mitigates some of the problems of long-term prospective studies by shortening the follow-up time needed. However, when self-reported or parent-reported retrospective data are used, the potential for recall bias increases.8
Birth weight is probably the most common early life factor collected retrospectively. The gold standard for birth weight is the weight of the newborn recorded by clinical staff at delivery on the birth certificate or in either the mother's or the newborn's medical record.11 Even though self-reported or parent-reported birth weight is frequently used and easy to validate, few studies have evaluated its validity. In general, parent-reported birth weight of offspring has shown better correlation with the gold standard than self-reported birth data.11-13 Moreover, most of the existing evidence evaluates the validity of self-reported birth data in adulthood, and shows disparate results.14,15 As many long-term studies rely on self-reported data, we consider that further examination of potential bias and reporting errors in early life factors is needed. In addition, the validity of self-reported or parent-reported retrospective data needs to be evaluated in different populations, time frames and circumstances.8
The aim of this study was to test the response rate and the validity of parent-reported birth data (such as birth weight, birth length and gestational age) in the SENDO project.
Method
Study population
The SENDO project (SEguimiento del Niño para un Desarrollo Óptimo) is a prospective paediatric cohort of Spanish children, with permanently open recruitment, initiated by the Department of Preventive Medicine and Public Health at the University of Navarra in collaboration with the Public Health System of Navarra.16 The SENDO project started in 2015 in Navarra. It was inspired by large paediatric cohorts such as the Growing Up Today Study from Harvard University, and was designed based on the experience from the Seguimiento Universidad de Navarra (SUN) cohort, a similar study by the same Department focused on the adult population.17,18
Participants in SENDO enter the cohort when they are 4 to 6 years old and are followed up through an annual online questionnaire, with no planned end of follow-up. Inclusion criteria were age and residence in Spain. Since all the questionnaires are online, the only exclusion criterion was lack of Internet access. Birth and perinatal data were obtained at baseline, along with socio-demographic, lifestyle and health-related information. The pilot study of the SENDO project was carried out between 2015 and 2016. Since 2017, recruitment has been permanently open, so new participants may enter the cohort at any time.
For this validation study we included 241 participants recruited between 2015 and 2017 who had completed the baseline questionnaire (Q0) by December 2017. To check that the sample size was sufficient, we performed a statistical power analysis using the formula suggested by Norman and Streiner.19,20 Assuming a correlation coefficient of 0.5 and an alpha risk of 0.05, the sample size was large enough to guarantee a power of 90%.
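The power calculation can be illustrated with a minimal Python sketch. It assumes the widely used Fisher z approximation for testing a correlation against zero with a two-sided test; this may not be the exact formula from Norman and Streiner that was applied in the study, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_n_for_correlation(r, alpha=0.05, power=0.90):
    """Approximate sample size needed to detect a correlation r
    (two-sided test of rho = 0), using the Fisher z approximation.
    Assumption: this is a standard textbook formula, not necessarily
    the one used by the study authors."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    c = math.atanh(r)  # Fisher z transform: 0.5 * ln((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)


# Inputs stated in the text: r = 0.5, alpha = 0.05, power = 90%
n_required = required_n_for_correlation(0.5)
print(n_required)
```

Under these assumptions the required sample is far smaller than the number of participants available, which is consistent with the statement that the sample size was sufficient.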
Variables
Information was collected through an online self-administered questionnaire. Birth weight, birth length and gestational age were gathered at baseline (when participants were 4 to 6 years old). Quantitative information on birth weight (in grams) and birth length (in centimetres) was collected through open-answer questions. Information about gestational age was collected through a single-answer multiple-choice question with the following options: less than 25 weeks, from 25 to less than 30 weeks, from 30 to less than 35 weeks, from 36 to less than 38 weeks, from 38 to less than 40 weeks, from 40 to less than 43 weeks, and 43 weeks or more. Participants with missing information on birth weight, birth length or gestational age were categorized as missing and therefore not included in the validation study.
Objectively measured birth weight (in grams), birth length (in centimetres) and gestational age (in weeks) were collected from participants’ medical records. Qualified nursing staff usually record birth weight and birth length at delivery. Gestational age (weeks and days) is also recorded in the offspring's medical record by qualified medical staff during the birth hospitalization.
Assessment of other variables: the baseline SENDO questionnaire also included socio-demographic information (age, sex, race/ethnic group, parental education), personal and family medical history, anthropometric measures, dietary habits, data on physical activity and a semi-quantitative food frequency questionnaire. Information about these variables was used only to describe the sample.
Statistical analysis
We used medians and percentages to describe quantitative and qualitative variables, respectively. Groups were compared using the Mann-Whitney U test for continuous variables and Fisher's exact test for qualitative ones. For the validation study we used intra-class correlation coefficients (ICC) for quantitative variables such as birth weight and birth length, and the weighted Kappa index for gestational age, as it was recorded as a qualitative variable in the questionnaire. Bland-Altman plots were drawn to compare parent-reported and objective data. These graphs plot the difference between the reported and the measured value, that is, the error (Y axis), against the arithmetic mean of the two values (X axis). This difference plot is useful to reveal a relationship between the magnitude of the measurement and the error, as well as to look for biases and to identify possible outliers.21,22
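The quantities behind a Bland-Altman plot can be sketched in a few lines of Python. This is an illustration only, not the Stata code used in the study; the function name is hypothetical and the birth weights are invented values.

```python
from statistics import mean, stdev


def bland_altman(reported, measured):
    """Return the points and summary lines of a Bland-Altman plot.
    X axis: mean of each pair; Y axis: reported minus measured."""
    diffs = [r - m for r, m in zip(reported, measured)]
    means = [(r + m) / 2 for r, m in zip(reported, measured)]
    bias = mean(diffs)   # average error (systematic bias)
    sd = stdev(diffs)    # sample SD of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - 1.96 * sd,  # lower limit of agreement
        "upper": bias + 1.96 * sd,  # upper limit of agreement
    }


# Invented birth weights in grams, for illustration only
reported = [3200, 3500, 2900, 4100]
measured = [3180, 3500, 2950, 4080]
result = bland_altman(reported, measured)
print(result["bias"], result["lower"], result["upper"])
```

Plotting `diffs` against `means`, with horizontal lines at `bias`, `lower` and `upper`, gives the graph described above; a trend along the X axis would suggest that the error depends on the magnitude of the measurement.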
STATA 12.0 was used for all the analyses. All p-values are two-tailed and statistical significance was set at p<0.05.
Ethical aspects
The SENDO project follows the Declaration of Helsinki on the ethical principles for medical research involving human subjects. The parents or legal guardians of all participants gave signed informed consent prior to entering the study. The Clinical Research Ethics Committee and the Institutional Review Board of Navarra approved the SENDO project protocol.
At recruitment, participants are informed that the SENDO project is focused on evaluating the association of diet and lifestyle with health-related outcomes in childhood and adolescence. None of the participants specifically knew, at the time they filled in the baseline questionnaire, that their answers would be validated in this study, but they had given permission to access their children's medical records.
Results
Of the 241 participants in the SENDO project at the time this study was done, those with information in their baseline questionnaire on birth weight, birth length and gestational weeks at delivery were included. The validation sample included 206 children, 56% males, with a median age of 5.3 years, mostly white and from highly educated families.
Table 1 shows the main characteristics of children in the SENDO project. Participants included in the validation study did not differ from the rest of the children in the SENDO project in terms of anthropometric measures (weight, length and body mass index), age or parental educational level. Questionnaires were more often answered by mothers, who were mostly young to middle-aged and highly educated women. Although both groups were mainly composed of white participants, we found statistically significant differences between groups for race and for number of siblings, but not for the position of the participant among siblings. We also found differences in sex, with a lower percentage of females in the validation sample. The mean age was 5.4 years (standard deviation [SD]: 0.8) in the validation sample versus 5.6 years (SD: 0.9) in the SENDO cohort. The time between birth and birth data collection ranged from 4 to 6 years. The SENDO parental response rate on birth information was over 99% for birth weight and gestational age, and 76% for birth length.
Characteristic | Participants in the validation study (a) | Rest of participants in the SENDO project (a) | p |
---|---|---|---|
Age (years) | 5.3 (4.1-7.3) | 5.6 (3.9-7.8) | 0.10 |
Sex | | | 0.02 |
Males | 56.80% | 37.13% | |
Females | 43.20% | 62.86% | |
Weight (kg) | 20 (13-38.4) | 19.75 (14-28) | 0.74 |
Length (cm) | 112.0 (95-130) | 111 (87-133) | 0.59 |
Body mass index (Z score) | −0.2 (−2.9 to 5.2) | −0.3 (−1.1 to 4.6) | 0.44 |
Race/ethnic group | | | <0.01 |
White | 99.49% | 92.28% | |
Other | 0.49% | 8.0% | |
Questionnaire responder (mother) | 96.6% | 94.2% | 0.38 |
Mother's education level | | | 0.66 |
Undergraduate | 3.40% | 5.71% | |
Graduate | 16.99% | 20.00% | |
University degree | 56.80% | 48.57% | |
Master | 22.82% | 25.71% | |
Number of siblings | 1 (0-10) | 2 (1-6) | <0.01 |
Position among siblings (first born) | 39.22% | 34.29% | 0.17 |

(a) Numbers are medians for quantitative variables and percentages for qualitative ones.
Birth weight validation
The ICC for birth weight was 0.95 (95% confidence interval [95%CI]: 0.94-0.96) when parent-reported birth weight was compared to objectively measured birth weight. The mean relative standard error, defined as the difference between the reported and the measured data relative to their arithmetic mean, was 0.7%.
We used Bland-Altman plots to represent the agreement between parent-reported and objectively measured data. In this kind of graph, the differences between the parent-reported and the objectively measured data (Y axis) are plotted against the arithmetic mean of the two values (X axis). We found that the average error for birth weight was small and that there was no trend along the X axis, showing no graphic evidence of any relation between the reporting error and the magnitude of the birth weight (Fig. 1).
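As an illustration of how an ICC of this kind is obtained, the following Python sketch computes a two-way, absolute-agreement, single-measurement ICC, often written ICC(2,1), from paired values. The text does not state which form of the ICC was used, so this choice is an assumption; the function name is hypothetical and the data are invented.

```python
def icc_absolute_agreement(reported, measured):
    """ICC(2,1): two-way random effects, absolute agreement, single
    measurement, for two 'raters' (parent report and medical record).
    Assumption: the ICC form is not specified in the study."""
    n, k = len(reported), 2
    pairs = list(zip(reported, measured))
    grand = (sum(reported) + sum(measured)) / (n * k)
    row_means = [(r + m) / 2 for r, m in pairs]
    col_means = [sum(reported) / n, sum(measured) / n]

    # Sums of squares from the two-way ANOVA decomposition
    ss_rows = k * sum((rm - grand) ** 2 for rm in row_means)
    ss_cols = n * sum((cm - grand) ** 2 for cm in col_means)
    ss_total = sum((x - grand) ** 2 for pair in pairs for x in pair)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Invented birth weights in grams, for illustration only
reported = [3200, 3500, 2900, 4100, 3300]
measured = [3180, 3500, 2950, 4080, 3300]
icc = icc_absolute_agreement(reported, measured)
print(round(icc, 3))
```

Because this form penalizes systematic differences between the two sources as well as random scatter, it approaches 1 only when parents report values that match the records both on average and child by child.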
Birth length validation
The ICC for birth length was 0.78 (95%CI: 0.73-0.83). The mean relative standard error was 2.7%. The Bland-Altman graph (Fig. 2) displays a scatter plot of the differences between parent-reported and objectively measured birth length (Y axis) against the arithmetic mean of these two values (X axis). The figure shows that most of the dots lie along the horizontal y=0 line, within the −1.96 SD and +1.96 SD limits, and with no trend suggestive of bias.
Gestational age
We used the weighted Kappa index to determine the agreement between the parent-reported duration of pregnancy and the gestational age recorded in participants’ medical records. We obtained a Kappa coefficient of 0.90 (95%CI: 0.89-0.90), with 97% agreement.
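A weighted Kappa for ordered categories can be sketched as follows. The text does not say whether linear or quadratic weights were used, so linear weights are assumed here; the function name is hypothetical, the categories are coded 0 to 6 in the order of the questionnaire options, and the data are invented.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's weighted Kappa with linear disagreement weights.
    Categories must be coded as integers 0 .. n_categories - 1.
    Assumption: the weighting scheme is not specified in the study."""
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts
    marg_a = Counter(rater_a)                  # marginal counts, source A
    marg_b = Counter(rater_b)                  # marginal counts, source B
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = abs(i - j) / (n_categories - 1)  # 0 on the diagonal
            obs_disagreement += w * observed[(i, j)] / n
            exp_disagreement += w * (marg_a[i] / n) * (marg_b[j] / n)
    return 1 - obs_disagreement / exp_disagreement


# Invented gestational-age categories, for illustration only
parent_report = [4, 4, 5, 3, 4, 5, 2, 4]
medical_record = [4, 4, 5, 3, 4, 4, 2, 4]
kappa = weighted_kappa(parent_report, medical_record, n_categories=7)
print(round(kappa, 3))
```

With linear weights, a report that falls in the category adjacent to the recorded one is penalized less than a report several categories away, which suits an ordinal variable such as categorized gestational age.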
Total agreement in evaluated data
We also determined the percentage of answers for each variable that were reported by parents exactly as they were recorded in the participant's medical record. The exact agreement between parent-reported and objectively measured data was 54% for birth weight, 46% for birth length, and 86% for categorized gestational age.
Discussion
In this validation study, we found very good agreement between parent-reported and objectively measured birth information of their offspring. Lately, the use of self-reported information in epidemiological research has been criticized. However, arguments against self-reported information sometimes show a lack of understanding of basic principles in epidemiology.23,24 We observed a high response rate on birth information (over 99% for birth weight and gestational age). The differences observed between the participants in the validation sample and the rest of the children in the SENDO project do not affect the interpretation and validity of the results, as they are not related to parents’ ability to report birth information.
The calculated ICC for birth weight revealed a very good level of correlation, and the calculated ICC for birth length, although lower, was also good.22 Our results show that parent-reported birth weight and birth length are valid tools and may be used to collect retrospective information in epidemiological research. The observed agreement and the calculated weighted Kappa index for gestational age were high, supporting the use of these parent-reported data in future investigations. We used the Kappa index because it is more robust than percent agreement, as it accounts for agreement occurring by chance.
Our results agree with previous studies that found that information about offspring's birth is usually well reported, better even than information about one's own birth. Adult-based studies report lower response rates on birth weight (around 25-28%) and poor agreement, while we found a response rate over 99% and high agreement.8,13-15
This can be explained by the fact that it is easier to recall data on important or stressful events. Giving birth to a child is, without doubt, a significant moment in life, and therefore data related to birth are recalled more accurately.25,26 Also, our results are similar to those of other cohorts based on highly educated participants,27,28 such as the US Nurses’ Health Study II, which showed a higher response rate, higher agreement and better validity than other studies in adult populations based on participants with low income or low educational level.12,14
The good results found in our validation study can be partly explained by some of the characteristics of the SENDO project, which is set in Spain, a developed European country with a strong universal health system that offers full coverage to all citizens. Our sample, formed mostly by white, highly educated participants, is not representative of the general Spanish population. Nevertheless, previous studies have shown that representativeness of the sample is not always necessary and that including highly educated participants raises the response rate and adds validity to their self-reported data. Some of the best cohort studies, such as the Nurses’ Health Study and the Growing Up Today Study, both from the Harvard T.H. Chan School of Public Health, are not based on representative samples of the population under study.29-31
Despite these results, our study has some limitations. First, the low variability in ethnic and socio-demographic data may limit the generalizability of our results. Nevertheless, samples based on highly educated and highly committed participants, despite not being representative, are the best scenario for studies based on self-reported data, as such participants are more likely to provide valid information over a long period of time. Second, all the participants were 4 to 6 years old at recruitment (and at the time the information was collected). Therefore, we acknowledge that correlation and agreement may be lower if the time between birth and recruitment is longer. The age at recruitment in the SENDO project was set with the expectation that, at that point, the parents of the participants would be able to recall birth and perinatal information and thus the collected information would be valid for use in epidemiologic research. Our results now support those assumptions. Also, the possibility of misclassification bias due to measurement error associated with the use of different devices cannot be ruled out. However, since it is unlikely that misclassification is associated with other variables in the study, the most likely misclassification is non-differential, which, in any case, would bias the association toward the null value. Our study also has important strengths, such as the high response rate and access to all participants' medical records.
In conclusion, parent-reported information on offspring's birth (including weight, length and gestational age), collected at recruitment in the SENDO project when children were 4 to 6 years old, is valid for use in epidemiological studies. We consider that, in general, birth information reported by highly educated parents within 4 to 6 years after delivery can be used with confidence in life-course studies.
What is known about the topic?
Birth information, such as birth weight or birth length, may be important to predict different health-related outcomes in adulthood. This information is usually collected retrospectively and mostly self-reported. Thus, the questionnaires used for the collection of that kind of information need to be validated.
What does this study add to the literature?
We found an excellent correlation between parent-reported and objectively measured birth data, including birth weight, birth length and gestational age. Parent-reported offspring's birth information in the SENDO project 4 to 6 years after delivery can be used with confidence in life-course studies.