Fertility and intelligence

I investigate the relationship between fertility and intelligence in a data set from Wisconsin

Jonatan Pallesen
05-21-2019

Introduction

A new study finds that among men born in Sweden from 1951-1967, less intelligent men tend to have fewer kids. I look into whether this pattern is also true in people in a data set of people born in Wisconsin in ~1939.

The data is from the Wisconsin Longitudinal Study. Data preparation is performed here.

read data


library(pacman)

p_load(tidyverse, magrittr, plotrix, broom, rsample, purrr, janitor, feather)

source('../../src/extra.R', echo = F, encoding="utf-8")

df <- read_feather("data/data.f") %>% 
  select(iq100, nkids, gender, rtype, income, eduyears, nkids75, nkids92) %>% 
  drop_na(iq100, nkids, gender)

levels = c("< 74", "74-81", "81-89", "89-96", "96-104", 
           "104-111", "111-119", "119-126", "> 126")

df %<>% drop_na(iq100, nkids, gender) 

df %<>% 
  mutate(
    iq_group = case_when(
      iq100 < 74 ~ "< 74",
      between(iq100, 74, 81) ~ "74-81",
      between(iq100, 81, 89) ~ "81-89",
      between(iq100, 89, 96) ~ "89-96",
      between(iq100, 96, 104) ~ "96-104",
      between(iq100, 104, 111) ~ "104-111",
      between(iq100, 111, 119) ~ "111-119",
      between(iq100, 119, 126) ~ "119-126",
      iq100 > 126 ~ "> 126"
    )
  ) %>% 
  mutate(iq_group = factor(iq_group, levels = levels))

sample size


df %>% tabyl(gender) %>% select(-percent)
gender n
female 5540
male 5001


Analysis

I divide the data set into the same IQ subgroups as in the Swedish study. We do not see the same relationship as in the Swedish study between low IQ and low fertility in men. In women we see a strong negative correlation between IQ and fertility.

code


df %>% group_by(gender, iq_group) %>% 
  summarise(
    mean = mean(nkids),
    se = std.error(nkids)
    ) %>% 
  ggplot(aes(x = iq_group, y = mean, group=gender, color=gender)) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width=.1) +
  geom_line(size=1.1) +
  geom_point(size=3) +
  labs(x = "IQ", y = "n_children")

Note that there are some issues with the reporting of number of children in the data set. The initial question was given in 1975, when the participants were ~36 years old. Thus there is still time to have more children, especially for men. This is likely the reason that men’s overall fertility is lower than women’s in the figure. The question was repeated again in 1992, and the above figure includes this update, but there was quite low response rate. It is also possible that especially more intelligent men tend to have kids later in life, which would make us question the dip in the graph for men at high IQ.

The gender difference in the influence of IQ on the number of offspring is significant in an interaction test:


df <- bind_rows(
  df %>% filter(gender == "male") %>% mutate(nkids_z = stdize(nkids)),
  df %>% filter(gender == "female") %>% mutate(nkids_z = stdize(nkids))
)

lm(nkids_z ~ iq100 * gender, df)
Fitting linear model: nkids_z ~ iq100 * gender
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.637 0.0945 6.74 1.67e-11
iq100 -0.00637 0.000936 -6.81 1.04e-11
gendermale -0.336 0.133 -2.53 0.0115
iq100:gendermale 0.00337 0.00131 2.57 0.0103


Dysgenics model

A subset of the data set also has polygenic scores available. Thus we can calculate the expected decline in IQ / education polygenic scores for the population caused by selection. This is the same method as used by Kong et al. We use the polygenic score for educational attainment as a proxy for the polygenic score for IQ, since they have very high correlation.

read data


df <- read_csv("data/data_nosibs.csv", guess_max=1500) %>% 
  mutate_at(vars(starts_with("z_cmkdb")), funs(. - 1900 - birthyear)) %>% 
  mutate(mean_birth_age = rowMeans(select(., starts_with("z_cmkdb")), na.rm = T)) %>%
  drop_na(pgs_edu)

sample size


df %>% tabyl(gender) %>% select(-percent)
gender n
female 2783
male 2496

Formulas

\[\begin{equation} X = \text{Education attainment polygenic score / IQ with mean 0 and std 1} \end{equation}\]

\[\begin{align} Regression 1:\nonumber \\ \text{Number of children} &\sim X \\ \text{Number of children} &= a + bX \\ \nonumber \\ Regression 2:\nonumber \\ \text{Average age at child birth} &\sim X \\ \text{Average age at child birth} &= c + dX \end{align}\]

\[\begin{equation} \beta = \frac{b}{a*c} - \frac{d * log(\frac{a}{2})}{c^2} \end{equation}\]

\[\begin{align} \text{If X is polygenic score:} \nonumber \\ \text{Mean change in genetic edutainment per year} &= \beta \\ \nonumber \\ \text{If X is IQ:} \nonumber \\ \text{Mean change in genetic intelligence per year} &= h^2 * \beta \end{align}\]

Calculate the mean change in edu polygenic score per year

calculate_beta


set.seed(0)

n_samples <- 1000

calc_bootstrap <- function(split, measure) {
  r1 <- lm(as.formula(glue("nkids ~ {measure}")), analysis(split)) %>% 
    tidy %$% estimate 
  r2 <- lm(as.formula(glue("mean_birth_age ~ {measure}")), analysis(split)) %>% 
    tidy %$% estimate 
  
  a <- r1[1]; b <- r1[2]; c <- r2[1]; d <- r2[2]
  
  return(b / (a * c) - d *log(a/2) / (c*c))
}

get_beta <- function(g) {
  boots <- bootstraps(df %>% filter(gender==g), times = n_samples)
  res <- boots %>% mutate(beta = map_dbl(splits, calc_bootstrap, "pgs_edu"))
  alpha <- .05
  res[3] %>% 
    summarise(gender = g,
              mean_beta = mean(beta), 
              high = quantile(beta, 1 - alpha / 2),
              low = quantile(beta, alpha / 2)
              )
}

betas <- bind_rows(get_beta("male"), get_beta("female"))

betas <- bind_rows(
  betas, 
  betas %>% summarise(
    mean_beta = mean(mean_beta), 
    high = mean(high), 
    low = mean(low), 
    gender = "combined"
    )
  ) %>% 
  rename("CI_95%_lower_bound" = high, "CI_95%_upper_bound" = low)

beta <- betas %>% filter(gender == "combined") %>% pull(mean_beta)

betas
gender mean_beta CI_95%_lower_bound CI_95%_upper_bound
male -0.000911 -0.000151 -0.00169
female -0.00137 -0.000619 -0.00208
combined -0.00114 -0.000385 -0.00188

IQ points lost per decade due to genetic selection

We use the combined beta in this estimate.

iq_decline


change_pgs = beta * 10

change_iq_per_pgs = coef(lm(iq100 ~ pgs_edu, df))[["pgs_edu"]]

change_iq_from_pgs = change_pgs * change_iq_per_pgs

h2_iq_high = 0.8

h2_iq_low = 0.4

explained_var_pgs = summary(lm(iq100 ~ pgs_edu, df))$adj.r.squared

change_iq_high = change_iq_from_pgs * h2_iq_high / explained_var_pgs

change_iq_low = change_iq_from_pgs * h2_iq_low / explained_var_pgs

tribble(
  ~Description, ~Value, ~Variable,
  "Change in EDU PGS per decade", change_pgs, "change_pgs", 
  "Change in IQ per std change in EDU PGS", change_iq_per_pgs, "change_iq_per_pgs", 
  "Change in IQ per decade estimated from the change in EDU PGS", 
  change_iq_from_pgs, "change_iq_from_pgs", 
  "Variance in IQ in the data that is explained by the EDU PGS", 
  explained_var_pgs, "explained_var_pgs")

tribble(~"IQ Heritability estimate", ~"Predicted change in IQ per decade",
        "0.8 (classical behavior genetics)", change_iq_high,
        "0.4 (GCTA GREML)", change_iq_low)
Description Value Variable
Change in EDU PGS per decade -0.0114 change_pgs
Change in IQ per std change in EDU PGS 4.65 change_iq_per_pgs
Change in IQ per decade estimated from the change in EDU PGS -0.053 change_iq_from_pgs
Variance in IQ in the data that is explained by the EDU PGS 0.0981 explained_var_pgs
IQ Heritability estimate Predicted change in IQ per decade
0.8 (classical behavior genetics) -0.432
0.4 (GCTA GREML) -0.216