Biostatistics on the Cholesterol Case Study
- Griffen Herrera
- Feb 22, 2023
The data are based on the National Cooperative Gallstone Study, where one of the major interests was to study the safety of the drug chenodiol for the treatment of cholesterol gallstones. In this study, patients were randomly assigned to treatment or placebo. We focus on a subset of patients who were assigned to the high-dose treatment group (coded 1) or the placebo group (coded 0).
Variable list:
- Treatment Group (1 = high-dose chenodiol, 0 = placebo)
- Subject ID
- Response at Baseline
- Response at Month 6
- Response at Month 12
- Response at Month 20
- Response at Month 24
Assuming that the rate of increase in each group is approximately constant throughout the study, I will test whether the rate of increase differs between the groups, allowing for randomly varying intercepts and slopes. The model equation below includes both random intercepts and random slopes.
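A standard way to write a linear mixed model with random intercepts and slopes for this design is sketched below. It assumes $t_{ij}$ is the time in months of the $j$-th measurement on subject $i$ (0, 6, 12, 20, 24) and $\mathrm{Group}_i$ is the 0/1 treatment indicator; these symbol names are illustrative and may not match the parameterization in the SAS output. Under this model, the question of whether the rate of increase differs between the groups is the test of $H_0: \beta_3 = 0$.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{Group}_i + \beta_2\,t_{ij}
        + \beta_3\,\mathrm{Group}_i\,t_{ij} + b_{0i} + b_{1i}\,t_{ij} + \varepsilon_{ij},\\
\mathbf{b}_i &= (b_{0i},\, b_{1i})^{\top} \sim N(\mathbf{0},\, G),
  \qquad \varepsilon_{ij} \sim N(0,\, \sigma^2),\\
\mathrm{Var}(y_{ij} \mid \mathbf{b}_i) &= \sigma^2,
  \qquad \mathrm{Var}(y_{ij}) = \mathbf{z}_{ij}^{\top} G\, \mathbf{z}_{ij} + \sigma^2,
  \quad \mathbf{z}_{ij} = (1,\, t_{ij})^{\top}.
\end{aligned}
```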

I fit this model in SAS (the code will be at the bottom of the post).


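The output above shows how $\mathrm{Var}(y_{ij})$ and $\mathrm{Var}(y_{ij} \mid b_i)$ differ. The conditional variance $\mathrm{Var}(y_{ij} \mid b_i)$ corresponds to individual variation: the scatter of a subject's measurements around that subject's own trajectory. The marginal variance $\mathrm{Var}(y_{ij})$ corresponds to variation around the group (population) mean, so it also includes the between-subject variation contributed by the random effects $b_i$, which is why it is larger.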
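A small numerical sketch of this decomposition follows, assuming the random intercept and slope model with $\mathbf{z}_{ij} = (1, t_{ij})$. The variance components `G` and `sigma2` are hypothetical placeholders, not the estimates from the SAS output; the point is only that the marginal variance adds $\mathbf{z}^{\top} G \mathbf{z}$ on top of the residual variance, which a quick simulation confirms.

```python
import numpy as np

# Hypothetical variance components (NOT the estimates from the SAS output).
# G = Cov(b0i, b1i) for the random intercept and slope; sigma2 = residual variance.
G = np.array([[900.0, -5.0],
              [-5.0, 0.5]])
sigma2 = 400.0
months = np.array([0.0, 6.0, 12.0, 20.0, 24.0])


def conditional_variance(sigma2):
    """Var(y_ij | b_i): with the subject's random effects fixed, only residual variation remains."""
    return sigma2


def marginal_variance(t, G, sigma2):
    """Var(y_ij) = z' G z + sigma^2 with z = (1, t): adds between-subject variation."""
    z = np.array([1.0, t])
    return float(z @ G @ z) + sigma2


for t in months:
    print(f"month {t:4.0f}: Var(y|b) = {conditional_variance(sigma2):6.1f}, "
          f"Var(y) = {marginal_variance(t, G, sigma2):6.1f}")

# Monte Carlo check at month 12: simulate random effects and residuals and compare
# the sample variance of y (fixed effects are omitted; they do not change the variance).
rng = np.random.default_rng(0)
n = 200_000
b = rng.multivariate_normal(np.zeros(2), G, size=n)
eps = rng.normal(0.0, np.sqrt(sigma2), size=n)
y12 = b[:, 0] + b[:, 1] * 12.0 + eps
simulated_var = float(np.var(y12))
print(f"month 12: formula {marginal_variance(12.0, G, sigma2):.1f}, simulated {simulated_var:.1f}")
```

Because $G$ is positive semidefinite, $\mathbf{z}^{\top} G \mathbf{z} \ge 0$ at every time point, so the marginal variance can never fall below the conditional one.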
Suppose the investigators in this study classified a patient as having hyperlipidemia when the value of the outcome variable 𝑦 in the data set is greater than 230. I will create a new binary variable h:
h = 1 if 𝑦 > 230
h = 0 if 𝑦 ≤ 230
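The recoding can be sketched as follows. The values below are hypothetical, not the study data; the sketch assumes the responses are stored in wide format (one column per visit, matching the variable list) and also reshapes them to long format, one row per subject-visit, which is the layout repeated-measures fits such as GEE and GLMM typically expect.

```python
import numpy as np

# Hypothetical wide-format data (illustrative only, not the study data):
# one row per subject; columns = baseline, month 6, 12, 20, 24.
months = np.array([0, 6, 12, 20, 24])
group = np.array([1, 0, 1])            # 1 = high-dose chenodiol, 0 = placebo
subject_id = np.array([101, 102, 103])
y_wide = np.array([
    [225.0, 232.0, 240.0, 228.0, 245.0],
    [210.0, 215.0, 230.0, 231.0, 236.0],
    [250.0, 248.0, 229.0, 260.0, 255.0],
])

# h = 1 when y > 230 (hyperlipidemia), h = 0 when y <= 230.
h_wide = (y_wide > 230).astype(int)

# Reshape to long format: subject i's five visits appear consecutively.
n_subj, n_time = y_wide.shape
long_id = np.repeat(subject_id, n_time)
long_group = np.repeat(group, n_time)
long_month = np.tile(months, n_subj)
long_h = h_wide.ravel()
```

Note that a value of exactly 230 is coded as h = 0, since the rule uses a strict inequality.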
Essentially, the goal is to examine whether the risk of hyperlipidemia changes over time, and whether the pattern differs between the two treatment groups. The model equation is shown below, followed by the model fit using the generalized estimating equations (GEE) approach.
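A sketch of a marginal logistic model for this question is given below, assuming time enters linearly in months and a group-by-time interaction lets the trend differ between groups. This is a standard specification and may differ in detail from the equation used in the original analysis, for example in the choice of working correlation $R(\alpha)$.

```latex
\begin{aligned}
\operatorname{logit} P(h_{ij} = 1) &= \beta_0 + \beta_1\,\mathrm{Group}_i + \beta_2\,t_{ij}
  + \beta_3\,\mathrm{Group}_i\,t_{ij},\\
\sum_{i=1}^{N} D_i^{\top} V_i^{-1}\,(\mathbf{h}_i - \boldsymbol{\mu}_i) &= \mathbf{0},
  \qquad V_i = A_i^{1/2}\, R(\alpha)\, A_i^{1/2}.
\end{aligned}
```

Here $\mu_{ij} = P(h_{ij} = 1)$, $D_i = \partial \boldsymbol{\mu}_i / \partial \boldsymbol{\beta}^{\top}$, $A_i = \mathrm{diag}\{\mu_{ij}(1 - \mu_{ij})\}$, and $R(\alpha)$ is the working correlation matrix for the repeated measurements on subject $i$.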


The table above shows that Group is not significant in the model, since its p-value is greater than the 0.05 significance level. Next, I calculate the estimated probability of hyperlipidemia for each treatment group at the first and last time points.
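Given fitted coefficients, each estimated probability is the inverse logit of the linear predictor evaluated at the chosen group and month. The sketch below assumes the marginal model has the form logit P(h = 1) = b0 + b1·group + b2·month + b3·group·month; the coefficient values are hypothetical placeholders, not the estimates from the SAS table.

```python
import math


def expit(x):
    """Inverse logit: converts log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical GEE coefficients (NOT the estimates from the SAS output):
# logit P(h=1) = b0 + b1*group + b2*month + b3*group*month
b0, b1, b2, b3 = -0.60, 0.10, 0.04, -0.01


def prob_hyperlipidemia(group, month):
    """Estimated population-averaged probability for a group (0/1) at a given month."""
    eta = b0 + b1 * group + b2 * month + b3 * group * month
    return expit(eta)


# First (baseline) and last (month 24) time points for each group.
for group in (0, 1):
    for month in (0, 24):
        print(f"group={group} month={month:2d} P(h=1)={prob_hyperlipidemia(group, month):.3f}")
```

Substituting the actual estimates for `b0` to `b3` gives the four probabilities of interest.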

Now I turn to individual (subject-specific) effects. Below is a generalized linear mixed model (GLMM) with a random intercept, which assumes that the log odds of hyperlipidemia change linearly over time and allows the trend to differ between treatment groups, followed by the fit of this model.
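A sketch of such a random-intercept logistic model is given below, using the same illustrative symbols as a linear-in-months specification. The starred coefficients are subject-specific: they describe changes in the log odds for a given subject (conditional on $b_i$), and the group-by-time question is the test of $H_0: \beta_3^{*} = 0$. The exact parameterization in the SAS fit may differ.

```latex
\begin{aligned}
\operatorname{logit} P(h_{ij} = 1 \mid b_i) &= \beta_0^{*} + \beta_1^{*}\,\mathrm{Group}_i + \beta_2^{*}\,t_{ij}
  + \beta_3^{*}\,\mathrm{Group}_i\,t_{ij} + b_i,\\
b_i &\sim N(0,\, \sigma_b^2).
\end{aligned}
```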



Based on the results above, I conclude that the group-by-time interaction is not significant, since its p-value is greater than the 0.05 significance level.
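The decision rule can be sketched as a two-sided Wald test. This sketch uses the normal approximation, z = estimate / SE, so its p-value may differ slightly from SAS output that reports t or chi-square statistics; the estimate and standard error below are hypothetical, not taken from the fitted model.

```python
import math


def wald_test(estimate, std_error):
    """Two-sided Wald test of H0: coefficient = 0 using the normal approximation."""
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical group-by-time estimate and standard error (NOT from the SAS output).
z, p = wald_test(estimate=-0.010, std_error=0.015)
alpha = 0.05
print(f"z = {z:.3f}, p = {p:.3f}, significant at {alpha}: {p < alpha}")
```

A coefficient is declared significant at the 0.05 level only when |z| exceeds about 1.96.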
In conclusion, the marginal (GEE) model focuses on the population average, whereas the mixed model (GLMM) focuses on individual trajectories. Which model is more appropriate depends on the scientific question being asked. GEEs model a population-averaged response. Mixed-effects (multilevel) models are subject-specific, or conditional, models: they allow different parameters for each subject or cluster, so the parameter estimates are conditional on the subject/cluster. GLMMs are well suited to dependent variables that are binary, ordinal, counts, or quantitative but not normally distributed. They are also useful for repeated measures, because the random effects account for the correlation among measurements on the same subject. Comparing the two fits, the p-values are very similar, while the β estimates differ. This is expected for a logistic model: the GLMM coefficients describe changes in an individual's log odds, and averaging over the random intercepts pulls the population-averaged (GEE) coefficients toward zero.
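The attenuation can be illustrated numerically. The sketch below averages the subject-specific logistic curve over a normal random intercept using Gauss-Hermite quadrature (`numpy.polynomial.hermite.hermgauss`) and reads off the implied population-averaged slope. The slope `beta_ss` and the random-intercept SD `sigma_b` are hypothetical values, not estimates from either fitted model; the closed-form comparison uses the commonly cited approximation β_PA ≈ β_SS / sqrt(1 + c²σ_b²) with c = 16√3/(15π).

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    """Log odds of a probability."""
    return np.log(p / (1.0 - p))


def marginal_prob(eta, sigma_b, n_nodes=40):
    """Population-averaged P(h=1) = E_b[expit(eta + b)] with b ~ N(0, sigma_b^2),
    computed by Gauss-Hermite quadrature (weight function exp(-x^2))."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma_b * nodes  # change of variables to N(0, sigma_b^2)
    return float(np.sum(weights * expit(eta + b)) / np.sqrt(np.pi))


# Hypothetical values (NOT from the fitted models): subject-specific log-odds
# slope per month and the SD of the random intercept.
beta_ss = 0.08
sigma_b = 2.0

# Population-averaged slope: change in the marginal log odds over one month,
# starting from a subject-specific linear predictor of 0.
beta_pa = logit(marginal_prob(beta_ss, sigma_b)) - logit(marginal_prob(0.0, sigma_b))

# Commonly cited closed-form approximation for comparison.
c = 16.0 * np.sqrt(3.0) / (15.0 * np.pi)
beta_pa_approx = beta_ss / np.sqrt(1.0 + c**2 * sigma_b**2)

print(f"subject-specific {beta_ss:.4f}, population-averaged {beta_pa:.4f}, "
      f"approximation {beta_pa_approx:.4f}")
```

The quadrature result is smaller in magnitude than `beta_ss` and close to, though not identical with, the closed-form approximation. Because the standard errors tend to shrink along with the estimates, the two approaches can give similar p-values while reporting different β values.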