Given the following model:
m <- lm(mpg ~ hp + wt + gear, data = mtcars)
suppose that we wish to examine the effect of hp
by calculating expected values at several levels of hp
while averaging over the observed values of wt
and gear
.
Which do you like better?
smargins_v1(m, hp = quantile(hp))
## hp .sim_number .smargin_qi
## 1 52 1 24.08801
## 2 52 2 25.04764
## 3 52 3 24.43555
## 4 52 4 24.30619
## 5 52 5 24.73331
## 6 52 6 23.40831
## # - - - - - - - - - - - - - - - -
## 24995 335 4995 13.02764
## 24996 335 4996 13.23298
## 24997 335 4997 11.66770
## 24998 335 4998 13.32230
## 24999 335 4999 16.47527
## 25000 335 5000 14.11117
or
smargins_v2(m, "hp")
## hp .sim_number .smargin_qi
## 1 52 1 22.57997
## 2 52 2 23.56909
## 3 52 3 24.91277
## 4 52 4 22.35991
## 5 52 5 22.69978
## 6 52 6 23.41944
## # - - - - - - - - - - - - - - - -
## 24995 335 4995 11.08558
## 24996 335 4996 10.73280
## 24997 335 4997 12.42628
## 24998 335 4998 14.58362
## 24999 335 4999 14.22267
## 25000 335 5000 17.25174
?
Now suppose that we want to choose the values of hp ourselves:
smargins_v1(m, hp = seq(50, 350, by = 50))
## hp .sim_number .smargin_qi
## 1 50 1 23.67599
## 2 50 2 23.94712
## 3 50 3 23.86978
## 4 50 4 25.25884
## 5 50 5 22.10601
## 6 50 6 24.62478
## # - - - - - - - - - - - - - - - -
## 34995 350 4995 13.01436
## 34996 350 4996 15.58402
## 34997 350 4997 13.68577
## 34998 350 4998 10.90554
## 34999 350 4999 10.84421
## 35000 350 5000 12.99449
or
smargins_v2(m, "hp", xvalues = list(hp = seq(50, 350, by = 50)))
## hp .sim_number .smargin_qi
## 1 50 1 24.39110
## 2 50 2 23.12868
## 3 50 3 22.31790
## 4 50 4 22.94109
## 5 50 5 23.00245
## 6 50 6 22.22947
## # - - - - - - - - - - - - - - - -
## 34995 350 4995 13.152172
## 34996 350 4996 7.793063
## 34997 350 4997 12.345081
## 34998 350 4998 13.447767
## 34999 350 4999 15.909270
## 35000 350 5000 11.656213
Now which do you prefer?
smargins_v1
is more verbose for the simple case, but less so when the user does wish to specify values. Which one should we pick? Or is there some combination or alternative that would be even better?
Currently there is a scompare
function that calculates pairwise differences for each combination of values on a particular predictor variable. For example, we can compare the differences between expected values for each level of gear
as follows:
summary(scompare(smargins(m, gear = sort(unique(gear))), var="gear"))
## gear mean sd median lower_2.5 upper_97.5
## 1 3 vs 4 -1.011286 0.8398293 -1.002223 -2.67395 0.6376987
## 2 3 vs 5 -2.022573 1.6796587 -2.004445 -5.34790 1.2753973
## 3 4 vs 5 -1.011286 0.8398293 -1.002223 -2.67395 0.6376987
It would be nice to offer some flexibility here so that comparisons other than pairwise can be made. There are a few different interfaces we might choose from.
Some options include:
car::lht
, and https://github.com/goshevs/effcomp
car::lht
emmeans::contrast
A few examples follow below.
summary(scompare_v1(smargins(m,
gear = sort(unique(gear))),
var = "gear",
contrast = c(-1, 0, 1)))
## gear mean sd median lower_2.5 upper_97.5
## 1 -1 0 1 2.006667 1.711305 1.995972 -1.320703 5.319751
summary(scompare_v1(smargins(m,
gear = sort(unique(gear))),
var = "gear",
contrast = c(-0.5, -0.5, 1.0)))
## gear mean sd median lower_2.5 upper_97.5
## 1 -0.5 -0.5 1 1.49402 1.27138 1.480393 -0.9675313 4.073178
summary(scompare_v2(smargins(m,
gear = sort(unique(gear))),
var = "gear",
contrast.fun = function(a, b, c) a - b + 0*c))
## gear mean sd median lower_2.5 upper_97.5
## 1 comparison -1.038076 0.8496704 -1.037681 -2.693875 0.621474
Naming is a problem…
summary(scompare_v2b(smargins(m,
gear = sort(unique(gear))),
var = "gear",
contrast.funs = list("3 vs 4" = function(a, b, c) a - b + 0*c)))
## gear mean sd median lower_2.5 upper_97.5
## 1 3 vs 4 -1.010048 0.8244786 -1.015673 -2.612933 0.6068117
summary(scompare_v2b(smargins(m,
gear = sort(unique(gear))),
var = "gear",
contrast.funs = list("3 and 4 vs 5" = function(a, b, c) .5*a + .5*b + - c)))
## gear mean sd median lower_2.5 upper_97.5
## 1 3 and 4 vs 5 -1.541021 1.298242 -1.534998 -4.101106 0.9724547
After looking at these examples I’m not sure we have a clear winner yet. Perhaps there is another way that I haven’t thought of yet? If you think of one please let me know by opening an issue at https://github.com/izahn/smargins/issues