📋 AE 12 - Multiple Logistic Regression
This data set is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. We want to examine the relationship between various health characteristics and the risk of having heart disease.
TenYearCHD: Whether the patient developed coronary heart disease within 10 years of the exam (1 = yes, 0 = no); this is the response variable
age: Age at exam time (in years)
education: 1 = Some High School, 2 = High School or GED, 3 = Some College or Vocational School, 4 = College
```r
library(broom)   # for tidy()
library(knitr)   # for kable()

risk_fit <- glm(TenYearCHD ~ age + education,
                data = heart_disease, family = "binomial")
risk_fit |> tidy() |> kable(digits = 3)
```

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -5.385 | 0.308 | -17.507 | 0.000 |
| age | 0.073 | 0.005 | 13.385 | 0.000 |
| education2 | -0.242 | 0.112 | -2.162 | 0.031 |
| education3 | -0.235 | 0.134 | -1.761 | 0.078 |
| education4 | -0.020 | 0.148 | -0.136 | 0.892 |
\[ \small{\log\Big(\frac{\hat{\pi}}{1-\hat{\pi}}\Big) = -5.385 + 0.073 ~ \text{age} - 0.242 ~ \text{ed2} - 0.235 ~ \text{ed3} - 0.020 ~ \text{ed4}} \]
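As a quick illustration (our own numbers, not part of the exercise), we can plug a 60-year-old patient with a college education into this equation:

```r
# Log-odds for a 60-year-old with a college education (ed4 = 1)
log_odds <- -5.385 + 0.073 * 60 - 0.020
# plogis() converts log-odds to a predicted probability
plogis(log_odds)
#> about 0.26
```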
As age increases by a year, the typical log-odds of developing coronary heart disease within the next 10 years increases by 0.073 for patients with the same level of education.
As age increases by a year, the typical odds of developing coronary heart disease within the next 10 years increases by a factor of \(\exp(0.073)\approx 1.08\) (i.e. 8%) for patients with the same level of education.
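We can verify this factor in R:

```r
exp(0.073)
#> [1] 1.075731
```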
For patients of the same age, the typical log-odds of developing coronary heart disease within the next 10 years is 0.242/0.235/0.020 lower for those with [a High School diploma or GED/Some College or Vocational School/a College education] than for patients with only some high school.

For patients of the same age, the typical odds of developing coronary heart disease within the next 10 years for those with [a High School diploma or GED/Some College or Vocational School/a College education] are 78.5%/79.1%/98.0% of the odds for patients with only some high school.
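These odds ratios come from exponentiating the education coefficients, for example:

```r
# Odds ratios relative to the baseline level (some high school)
exp(c(education2 = -0.242, education3 = -0.235, education4 = -0.020))
#> education2 education3 education4
#>      0.785      0.791      0.980
```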
Hypotheses: \(H_0: \beta_j = 0 \hspace{2mm} \text{ vs } \hspace{2mm} H_a: \beta_j \neq 0\), given the other variables in the model
Test Statistic: \[z = \frac{\hat{\beta}_j - 0}{SE_{\hat{\beta}_j}}\]
P-value: \(P(|Z| > |z|)\), where \(Z \sim N(0, 1)\), the Standard Normal distribution
We can calculate the C% confidence interval for \(\beta_j\) as the following:
\[ \Large{\hat{\beta}_j \pm z^* SE_{\hat{\beta}_j}} \]
where \(z^*\) is calculated from the \(N(0,1)\) distribution
Note
This is an interval for the change in the log-odds for every one unit increase in \(x_j\)
Exponentiating the endpoints gives an interval for the multiplicative change in odds for every one unit increase in \(x_j\):
\[ \Large{\exp\{\hat{\beta}_j \pm z^* SE_{\hat{\beta}_j}\}} \]
Interpretation: We are \(C\%\) confident that for every one unit increase in \(x_j\), the odds multiply by a factor of \(\exp\{\hat{\beta}_j - z^* SE_{\hat{\beta}_j}\}\) to \(\exp\{\hat{\beta}_j + z^* SE_{\hat{\beta}_j}\}\), holding all else constant.
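One way to produce intervals like these is tidy() with conf.int = TRUE. A sketch, assuming broom and knitr are loaded as in the earlier code; this matches the output shown below:

```r
risk_fit |>
  tidy(conf.int = TRUE) |>   # 95% confidence intervals by default
  kable(digits = 3)
```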
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | -5.385 | 0.308 | -17.507 | 0.000 | -5.995 | -4.788 |
| age | 0.073 | 0.005 | 13.385 | 0.000 | 0.063 | 0.084 |
| education2 | -0.242 | 0.112 | -2.162 | 0.031 | -0.463 | -0.024 |
| education3 | -0.235 | 0.134 | -1.761 | 0.078 | -0.501 | 0.023 |
| education4 | -0.020 | 0.148 | -0.136 | 0.892 | -0.317 | 0.266 |
Hypotheses:
\[ H_0: \beta_{age} = 0 \hspace{2mm} \text{ vs } \hspace{2mm} H_a: \beta_{age} \neq 0 \] given education is in the model
Test statistic:
\[z = \frac{0.07328 - 0}{0.00547} = 13.39\]
P-value:
\[ P(|Z| > |13.39|) \approx 0 \]
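A quick check of the test statistic and p-value in R:

```r
# Test statistic for age
z <- (0.07328 - 0) / 0.00547
# Two-sided p-value from the standard normal distribution
2 * pnorm(abs(z), lower.tail = FALSE)
#> effectively 0
```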
Conclusion:
The p-value is very small, so we reject \(H_0\). The data provide sufficient evidence that age is a statistically significant predictor of whether someone will develop heart disease in the next 10 years, after accounting for education.
Suppose there are two models: a reduced model with predictors \(x_1, \dots, x_q\) and a full model with predictors \(x_1, \dots, x_q, x_{q+1}, \dots, x_p\).
We want to test the hypotheses
\[ \begin{aligned} H_0&: \beta_{q+1} = \dots = \beta_p = 0 \\ H_A&: \text{ at least one }\beta_j \text{ is not } 0 \end{aligned} \]
To do so, we will use the Nested Likelihood Ratio Test (LRT), also known as the Drop-in-deviance test.
Hypotheses:
\[ \begin{aligned} H_0&: \beta_{q+1} = \dots = \beta_p = 0 \\ H_A&: \text{ at least one }\beta_j \text{ is not } 0 \end{aligned} \]
Test Statistic: \[G = (-2 \log L_{reduced}) - (-2 \log L_{full})\]
or sometimes
Test Statistic: \[G = (-2 \log L_{0}) - (-2 \log L)\]
P-value: \(P(\chi^2 > G)\), calculated using a \(\chi^2\) distribution with degrees of freedom equal to the difference in the number of parameters in the full and reduced models
Should we add education to a model with only age? The steps of the drop-in-deviance test are:

1. Fit the first (reduced) model with only age, and the second (full) model with age and education.
2. Calculate the deviance for each model.
3. Compute the drop-in-deviance test statistic.
4. Calculate the p-value using pchisq(), with degrees of freedom equal to the number of new model terms in the second model.

A sketch of these steps in R is shown below.
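A minimal sketch of these steps, assuming the heart_disease data frame and broom's glance(); the model names risk_fit_reduced and risk_fit_full are illustrative:

```r
# Reduced model: age only
risk_fit_reduced <- glm(TenYearCHD ~ age,
                        data = heart_disease, family = "binomial")

# Full model: age + education
risk_fit_full <- glm(TenYearCHD ~ age + education,
                     data = heart_disease, family = "binomial")

# Deviance (-2 log L) for each model
dev_reduced <- glance(risk_fit_reduced)$deviance
dev_full    <- glance(risk_fit_full)$deviance

# Drop-in-deviance test statistic
G <- dev_reduced - dev_full

# P-value: education adds 3 dummy variables, so df = 3
pchisq(G, df = 3, lower.tail = FALSE)
```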
Conclusion: The p-value is between 0.05 and 0.1, indicating mild but not strong evidence that education is a useful predictor when age is already in the model.
We can use the anova() function to conduct this test, adding test = "Chisq" to perform the drop-in-deviance test.
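For example, with the illustrative model names from the sketch above:

```r
anova(risk_fit_reduced, risk_fit_full, test = "Chisq")
```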