Specify x variable names or values?

Given the following model:

m <- lm(mpg ~ hp + wt + gear, data = mtcars)

suppose that we wish to examine the effect of hp by calculating expected values at several levels of hp while averaging over the observed values of wt and gear.

Which do you like better?

smargins_v1(m, hp = quantile(hp))

##       hp .sim_number .smargin_qi
## 1     52           1    24.08801
## 2     52           2    25.04764
## 3     52           3    24.43555
## 4     52           4    24.30619
## 5     52           5    24.73331
## 6     52           6    23.40831
## # - - - - - - - - - - - - - - - -                                 
## 24995 335        4995    13.02764
## 24996 335        4996    13.23298
## 24997 335        4997    11.66770
## 24998 335        4998    13.32230
## 24999 335        4999    16.47527
## 25000 335        5000    14.11117

smargins_v2(m, "hp")

##       hp .sim_number .smargin_qi
## 1     52           1    22.57997
## 2     52           2    23.56909
## 3     52           3    24.91277
## 4     52           4    22.35991
## 5     52           5    22.69978
## 6     52           6    23.41944
## # - - - - - - - - - - - - - - - -                                 
## 24995 335        4995    11.08558
## 24996 335        4996    10.73280
## 24997 335        4997    12.42628
## 24998 335        4998    14.58362
## 24999 335        4999    14.22267
## 25000 335        5000    17.25174

Now suppose that we want to choose the values of hp ourselves:

smargins_v1(m, hp = seq(50, 350, by = 50))

##       hp .sim_number .smargin_qi
## 1     50           1    23.67599
## 2     50           2    23.94712
## 3     50           3    23.86978
## 4     50           4    25.25884
## 5     50           5    22.10601
## 6     50           6    24.62478
## # - - - - - - - - - - - - - - - -                                 
## 34995 350        4995    13.01436
## 34996 350        4996    15.58402
## 34997 350        4997    13.68577
## 34998 350        4998    10.90554
## 34999 350        4999    10.84421
## 35000 350        5000    12.99449

smargins_v2(m, "hp", xvalues = list(hp = seq(50, 350, by = 50)))

##       hp .sim_number .smargin_qi
## 1     50           1    24.39110
## 2     50           2    23.12868
## 3     50           3    22.31790
## 4     50           4    22.94109
## 5     50           5    23.00245
## 6     50           6    22.22947
## # - - - - - - - - - - - - - - - -                                 
## 34995 350        4995   13.152172
## 34996 350        4996    7.793063
## 34997 350        4997   12.345081
## 34998 350        4998   13.447767
## 34999 350        4999   15.909270
## 35000 350        5000   11.656213

Now which do you prefer?

smargins_v1 is more verbose for the simple case, but less so when the user does wish to specify values. Which one should we pick? Or is there some combination or alternative that would be even better?

Interface for comparisons

Currently there is a scompare function that calculates pairwise differences for each combination of values on a particular predictor variable. For example, we can compare the differences between expected values for each level of gear as follows:

summary(scompare(smargins(m, gear = sort(unique(gear))), var="gear"))

##     gear      mean        sd    median lower_2.5 upper_97.5
## 1 3 vs 4 -1.011286 0.8398293 -1.002223  -2.67395  0.6376987
## 2 3 vs 5 -2.022573 1.6796587 -2.004445  -5.34790  1.2753973
## 3 4 vs 5 -1.011286 0.8398293 -1.002223  -2.67395  0.6376987

It would be nice to offer some flexibility here so that comparisons other than pairwise can be made. There are a few different interfaces we might choose from.

Some options include:

matrix based, like car::lht, and https://github.com/goshevs/effcomp
character representation, like car::lht
function based, like emmeans::contrast
function based, but function does computation directly

A few examples follow below.

Matrix interface for comparisons

summary(scompare_v1(smargins(m,
                             gear = sort(unique(gear))),
                    var = "gear",
                    contrast = c(-1, 0, 1)))

##     gear     mean       sd   median lower_2.5 upper_97.5
## 1 -1 0 1 2.006667 1.711305 1.995972 -1.320703   5.319751

summary(scompare_v1(smargins(m,
                             gear = sort(unique(gear))),
                    var = "gear",
                    contrast = c(-0.5, -0.5, 1.0)))

##          gear    mean      sd   median  lower_2.5 upper_97.5
## 1 -0.5 -0.5 1 1.49402 1.27138 1.480393 -0.9675313   4.073178

Simple function based interface to comparisons

summary(scompare_v2(smargins(m,
                             gear = sort(unique(gear))),
                    var = "gear",
                    contrast.fun = function(a, b, c) a - b + 0*c))

##         gear      mean        sd    median lower_2.5 upper_97.5
## 1 comparison -1.038076 0.8496704 -1.037681 -2.693875   0.621474

Naming is a problem…

summary(scompare_v2b(smargins(m,
                             gear = sort(unique(gear))),
                    var = "gear",
                    contrast.funs = list("3 vs 4" = function(a, b, c) a - b + 0*c)))

##     gear      mean        sd    median lower_2.5 upper_97.5
## 1 3 vs 4 -1.010048 0.8244786 -1.015673 -2.612933  0.6068117

summary(scompare_v2b(smargins(m,
                             gear = sort(unique(gear))),
                    var = "gear",
                    contrast.funs = list("3 and 4 vs 5" = function(a, b, c) .5*a + .5*b + - c)))

##           gear      mean       sd    median lower_2.5 upper_97.5
## 1 3 and 4 vs 5 -1.541021 1.298242 -1.534998 -4.101106  0.9724547

After looking at these examples I’m not sure we have a clear winner yet. Perhaps there is another way that I haven’t thought of yet? If you think of one please let me know by opening an issue at https://github.com/izahn/smargins/issues

smargins Questions and Discussion

2017-12-06

Specify x variable names or values?

Interface for comparisons

Matrix interface for comparisons

Simple function based interface to comparisons

Contents