/*
Bellavia A. November 29, 2019
Stata commands for mediation and interaction analysis, KI
The script generates and analyses a sample population of 10,000 individuals
with information on race/ethnicity, fast-food consumption, exposure to a
certain chemical (e.g. DiNP, metabolite of diisononyl phthalate),
and diabetes. Although some of the associations are chosen
based on real data, the sample does not represent a real population but only a
simplified situation. The purpose of the data is to illustrate the estimation
and interpretation of interaction terms and direct/indirect effects.
*/
*********************
* 1. Data Simulation*
*********************
clear all
set more off
set obs 10000
set seed 12
/* Generate race/ethnicity as a binary covariate
with 19% Black Americans (x=1) */
gen x = rbinomial(1,.19)
/*Generate the binary mediator (yes/no) of fast-food consumption.
Use results from Zota et al, 2016, showing a proportion of 44% fast-food
consumers among Black Americans, and 33% among other groups*/
gen m1=.
replace m1=rbinomial(1,.44) if x==1
replace m1=rbinomial(1,.33) if x==0
/*The following lines generate a second, continuous mediator representing the urinary
concentration of a specific chemical. We assume that this covariate is associated
with both race/ethnicity and fast-food consumption, and that race/ethnicity and
fast-food consumption interact. The chosen coefficients give an expected DiNP
concentration of about 11.1 ug/l in the entire population, 10.8 ug/l among
non-Black Americans, and 12.6 ug/l among Black Americans (10.5 ug/l is the
expected value among non-Black Americans who do not consume fast food). */
*Constant
scalar beta0 =10.5
*Main effect of race/ethnicity
scalar beta1 =1.5
*Main effect of fast-food consumption
scalar beta2=0.9
*Additive interaction
scalar beta3=0.5
gen inter=m1*x
gen m2 = rnormal(beta0+beta1*x+beta2*m1+beta3*inter,2)
*(Please note that in real situations environmental chemicals are seldom normally distributed)
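* A minimal sketch (not part of the original script): the expected DiNP means
* implied by the coefficients can be computed directly from the
* data-generating equation. The chk_* scalar names are illustrative; the
* proportions .19, .44 and .33 are the ones used in the simulation above.
scalar chk_mean_x0 = beta0 + beta2*.33
scalar chk_mean_x1 = beta0 + beta1 + (beta2 + beta3)*.44
scalar chk_mean_all = (1 - .19)*chk_mean_x0 + .19*chk_mean_x1
di "Expected mean DiNP, x=0:     " %5.2f chk_mean_x0
di "Expected mean DiNP, x=1:     " %5.2f chk_mean_x1
di "Expected mean DiNP, overall: " %5.2f chk_mean_all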
/*Generate diabetes (yes/no) as a function of race/ethnicity (OR=1.1),
fast-food consumption (OR=1.2), and DiNP urinary concentration
(OR=1.2 for each unit increase of DiNP).*/
*Intercept, chosen so that the prevalence of diabetes is around 10%
scalar beta00 =-4.3
scalar beta11 =log(1.1)
scalar beta12 =log(1.2)
scalar beta13=log(1.2)
gen y = rbinomial(1,invlogit(beta00 + beta11 * x + beta12 * m1 + beta13 * m2))
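* A rough check, as a sketch only: evaluating the inverse logit at the expected
* covariate values suggests a diabetes prevalence near 10%. This plug-in
* calculation ignores the non-linearity of the logistic function, so it is
* only an approximation. The chk_* scalar names are illustrative.
scalar chk_pm1 = (1 - .19)*.33 + .19*.44
scalar chk_m2 = beta0 + beta1*.19 + beta2*chk_pm1 + beta3*.19*.44
scalar chk_prev = invlogit(beta00 + beta11*.19 + beta12*chk_pm1 + beta13*chk_m2)
di "Approximate diabetes prevalence: " %5.3f chk_prev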
**********************
* 2. Data description*
**********************
tab y
tab x y
tab m1 y
tab m1 x, col
sum m2, d
tabstat m2, by(x) stat(mean sd)
tabstat m2, by(y) stat(mean sd)
tab y x, col
*****************
* 3. Interaction*
*****************
*Q1
reg m2 m1
*Q2a
reg m2 m1 if x==0
reg m2 m1 if x==1
*Q2b
reg m2 m1 x inter
lincom _b[m1]+_b[inter]
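* A sketch of an equivalent route (not part of the original exercise): with
* factor-variable notation, -margins- reports the difference in mean DiNP
* between fast-food consumers and non-consumers within each level of x,
* without building the interaction term by hand.
quietly regress m2 i.m1##i.x
margins x, dydx(m1)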
*Q3
*Dichotomize DiNP concentration at 11.5 ug/l (1 = above the cut-off)
capture drop m2cat
gen m2cat=0
replace m2cat=1 if m2>11.5
logit m2cat m1, or
*Q4
logit m2cat m1 if x==0, or
logit m2cat m1 if x==1, or
logit m2cat m1 x inter, or
*Code for RERI
nlcom exp(_b[m1]+_b[x]+_b[inter])-exp(_b[m1])-exp(_b[x])+1
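* A sketch showing the three odds ratios that enter the RERI
* (RERI = OR11 - OR10 - OR01 + 1), each relative to non-Black Americans who
* do not consume fast food. This assumes the interaction model
* "logit m2cat m1 x inter" is still the active estimation result.
* OR10: fast-food consumption only
lincom m1, or
* OR01: race/ethnicity only
lincom x, or
* OR11: both
lincom m1 + x + inter, or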
/* There is a positive interaction between race/ethnicity and fast-food consumption
in predicting DiNP urinary concentration.
Reducing fast-food consumption will reduce the average DiNP
concentration in the population. Public health recommendations aimed at
reducing fast-food consumption should be directed primarily at the Black American
population. */
********************************
* 4. Mediation - binary outcome*
********************************
* Q1a) X->Y
logit y x, or
matrix a=e(b)
* Black-Americans have 60% higher odds of diabetes
* Q1b) X->M2
reg m2 x
matrix b=e(b)
* DiNP concentration is 1.80 ug/l higher among Black-Americans
* Q2) X, M2 -> Y
logit y x m2, or
matrix c=e(b)
* When adjusting for DiNP concentration, the odds ratio goes down to 1.2
* Q3-Q4-Q5)
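*Retrieve coefficients to calculate direct and indirect effects
*(the scalars below use names distinct from the matrices a, b, and c
* to avoid name clashes)
scalar logtotaleff=a[1,1]
scalar totaleff=exp(logtotaleff)
scalar logdirecteff=c[1,1]
scalar directeff=exp(logdirecteff)
*Effect of m2 on y, adjusted for x (log odds ratio per unit of DiNP)
scalar theta_m=c[1,2]
*Effect of x on m2
scalar beta_x=b[1,1]
*Product method
scalar logindirect_product=theta_m*beta_x
scalar indirecteff_product=exp(logindirect_product)
*Difference method
scalar logindirect_difference=logtotaleff-logdirecteff
scalar indirecteff_difference=exp(logindirect_difference)
*Proportion mediated on the log odds scale
scalar pm=logindirect_product/logtotaleff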
di totaleff
di directeff
di indirecteff_product
di indirecteff_difference
di pm
/* The difference and product methods give slightly different results because
the outcome is binary, and their equivalence requires a rare outcome. */
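* A sketch of an alternative summary (an addition, not part of the original
* exercise): on the odds ratio scale, the proportion mediated is commonly
* approximated as OR_direct*(OR_indirect - 1)/(OR_direct*OR_indirect - 1).
* It uses the scalars directeff and indirecteff_product computed above;
* pm_or is an illustrative name.
scalar pm_or = directeff*(indirecteff_product - 1)/(directeff*indirecteff_product - 1)
di "Proportion mediated (odds ratio scale): " %5.3f pm_or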
* The user-written paramed command (ssc install paramed) gives the same results, with standard errors
paramed y, avar(x) mvar(m2) a0(0) a1(1) m(7) yreg(logistic) mreg(linear) nointer
* Higher exposure to DiNP explains 65% of the racial/ethnic disparity in diabetes
************************************
* 5. Mediation - continuous outcome*
************************************
* Q1a) X->Y
reg m2 x
matrix a=e(b)
* DiNP concentration is 1.8 ug/l higher among Black-Americans
* Q1b) X->M
logit m1 x, or
matrix b=e(b)
* Black-Americans have 60% higher odds of consuming fast-food
* Q2) X, M -> Y
reg m2 m1 x
matrix c=e(b)
* When adjusting for fast-food consumption, the difference in DiNP concentration goes down to 1.7 ug/l
* Q3-Q4)
*Retrieve coefficients to calculate direct and indirect effects
scalar totaleff=a[1,1]
scalar directeff=c[1,2]
scalar indirecteff=totaleff-directeff
scalar pm=indirecteff/totaleff
di totaleff
di directeff
di indirecteff
di pm
paramed m2, avar(x) mvar(m1) a0(0) a1(1) m(0) yreg(linear) mreg(logistic) nointer
/* A small proportion of the disparity in DiNP exposure is explained by fast-food
consumption (6%)*/
* Q5)
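*Effect of m1 on m2, adjusted for x
scalar theta_m=c[1,1]
*Effect of x on m1 (log odds ratio from the logistic mediator model)
scalar beta_x=b[1,1]
scalar indirecteff_product=theta_m*beta_x
di indirecteff_product
scalar pm=indirecteff_product/totaleff
di pm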
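/* The product and difference methods differ strongly here. The mediator model
is logistic, so its coefficient is a log odds ratio and the product is not on
the same scale as the difference. In addition, we are not taking into account
the interaction between x and m1 in predicting m2. The standard approach to
mediation analysis does not allow including interactions and may provide
biased results when an interaction is present. */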
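* A sketch illustrating the role of the mediator model. It assumes a linear
* probability model for m1, which the original script does not use: when both
* the mediator and the outcome model are fitted by linear regression, the
* product of coefficients reproduces the difference-method estimate.
* The lpm_* scalar names are illustrative.
quietly regress m1 x
scalar lpm_b = _b[x]
quietly regress m2 m1 x
scalar lpm_direct = _b[x]
scalar lpm_product = _b[m1]*lpm_b
quietly regress m2 x
scalar lpm_difference = _b[x] - lpm_direct
di "Indirect effect, product method:    " %6.4f lpm_product
di "Indirect effect, difference method: " %6.4f lpm_difference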
* The counterfactual approach to mediation allows for exposure-mediator interaction
paramed m2, avar(x) mvar(m1) a0(0) a1(1) m(0) yreg(linear) mreg(logistic)
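* A minimal sketch of the counterfactual effects computed by hand, assuming
* the standard closed-form expressions for a continuous outcome with a binary
* mediator and an exposure-mediator interaction, comparing x=1 with x=0:
*   NDE = theta_x + theta_inter*P(m1=1|x=0)
*   NIE = (theta_m1 + theta_inter)*[P(m1=1|x=1) - P(m1=1|x=0)]
* The scalar names are illustrative. Standard errors are not computed here.
quietly logit m1 x
scalar pm1_x0 = invlogit(_b[_cons])
scalar pm1_x1 = invlogit(_b[_cons] + _b[x])
quietly regress m2 x m1 inter
scalar nde_manual = _b[x] + _b[inter]*pm1_x0
scalar nie_manual = (_b[m1] + _b[inter])*(pm1_x1 - pm1_x0)
di "Natural direct effect:   " %6.4f nde_manual
di "Natural indirect effect: " %6.4f nie_manual
di "Proportion mediated:     " %6.4f nie_manual/(nde_manual + nie_manual)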