Computes dominance statistics for predictive modeling functions that accept a formula.
Usage
domin(
  formula_overall,
  reg,
  fitstat,
  sets = NULL,
  all = NULL,
  conditional = TRUE,
  complete = TRUE,
  consmodel = NULL,
  reverse = FALSE,
  ...
)
Arguments
- formula_overall
An object of class formula or that can be coerced to class formula for use in the modeling function in reg. The terms on the right hand side of this formula are used as separate entries to the dominance analysis.
A valid formula_overall entry is necessary, even if only submitting entries in sets, to define a valid left hand side of the prediction equation (see examples). The function called in reg must accept one or more responses on the left hand side.
- reg
A function implementing the predictive (or "reg"ression) model called.
String function names (e.g., "lm"), function names (e.g., lm), or anonymous functions (e.g., function(x) lm(x)) are acceptable entries. This argument's contents are passed to do.call and thus any function call do.call would accept is valid.
The predictive model in reg must accept a formula object as its first argument or must be adapted to do so with a wrapper function.
- fitstat
List providing arguments to call a fit statistic extracting function (see details). The fitstat list must be of at least length two.
The first element of fitstat must be a function implementing the fit statistic extraction. String function names (e.g., "summary"), function names (e.g., summary), or anonymous functions (e.g., function(x) summary(x)) are acceptable entries. This element's contents are passed to do.call and thus any function call do.call would accept is valid.
The second element of fitstat must be the name of the element of the list or vector produced by the fit extractor function called in the first element of fitstat. This element must be a string (e.g., "r.squared").
All list elements beyond the second are submitted as additional arguments to the fit extractor function call.
The fit statistic extractor function in the first list element of fitstat must accept the model object produced by the predictive modeling function in reg as its first argument or be adapted to do so with a wrapper function.
The fit statistic produced must be scalar valued (i.e., vector of length 1).
- sets
A list with each element comprised of vectors containing variable/factor names or formula coercible strings.
Each separate list element-vector in sets is concatenated (when the list element-vector is of length > 1) and used as an entry to the dominance analysis along with the terms in formula_overall.
- all
A vector of variable/factor names or formula coercible strings. The entries in this vector are concatenated (when of length > 1) but are not used in the dominance analysis. Rather the value of the fit statistic associated with these terms is removed from the dominance analysis; this vector is used like a set of covariates.
The fit statistic value associated with the entries in all is removed from the dominance statistics and considered an additional component that explains the fit metric. As a result, the general dominance statistics will no longer sum to the overall fit metric and the standardized vector will no longer sum to 1.
- conditional
Logical. If FALSE then the conditional dominance matrix is not computed.
If conditional dominance is not desired as an importance criterion, avoiding computing the conditional dominance matrix can save computation time.
- complete
Logical. If FALSE then the complete dominance matrix is not computed.
If complete dominance is not desired as an importance criterion, avoiding computing complete dominance designations can save computation time.
- consmodel
A vector of variable/factor names, formula coercible strings, or other formula terms (i.e., 1 to indicate an intercept). The entries in this vector are concatenated (when of length > 1) and, like the entries of all, are not used in the dominance analysis; this vector is used as an adjustment to the baseline value of the overall fit statistic.
The use of consmodel changes the interpretation of the general and conditional dominance statistics. When consmodel is used, the general and conditional dominance statistics reflect the difference between the constant model and the overall fit statistic values.
Typical usage of consmodel is to pass "1" to set the intercept as the baseline and control for its value when the baseline model's fit statistic value is not 0 (e.g., if using the AIC or BIC as a fit statistic; see examples).
As such, this vector is used to set a baseline for the fit statistic when it is non-0.
- reverse
Logical. If TRUE then the standardized vector, ranks, and complete dominance designations are reversed in their interpretation.
This argument should be changed to TRUE if the fit statistic used decreases with better fit to the data (e.g., AIC, BIC).
- ...
Additional arguments passed to the function call in the reg argument.
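The reg argument requires a formula as the first argument of the modeling function, or a wrapper that adapts it. As a minimal sketch of such a wrapper, the code below adapts stats::lm.fit(), which takes a design matrix and a response rather than a formula. The name lm_fit_wrapper and the attached r.squared element are illustrative assumptions, not part of domin.

```r
# Hypothetical wrapper: gives lm.fit() a formula-first interface.
lm_fit_wrapper <- function(formula, data, ...) {
  mf <- model.frame(formula, data = data)        # resolve the formula's variables
  X  <- model.matrix(attr(mf, "terms"), mf)      # design matrix, including intercept
  y  <- model.response(mf)                       # left hand side of the formula
  fit <- lm.fit(X, y, ...)
  # Attach an R-squared so a fit statistic can later be pulled out by name.
  fit$r.squared <- 1 - sum(fit$residuals^2) / sum((y - mean(y))^2)
  fit
}

fit <- lm_fit_wrapper(mpg ~ am + vs, data = mtcars)
fit$r.squared
```

In principle a wrapper of this shape could be supplied as reg, with data passed through the ... argument; whether a particular wrapper suits a given analysis depends on the fit statistic it exposes.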
Value
Returns an object of class "domin".
An object of class "domin" is a list composed of the following elements:
General_Dominance
Vector of general dominance statistics.
Standardized
Vector of general dominance statistics normalized to sum to 1.
Ranks
Vector of ranks applied to the general dominance statistics.
Conditional_Dominance
Matrix of conditional dominance statistics. Each row represents a term; each column represents an order of terms.
Complete_Dominance
Logical matrix of complete dominance designations. The term represented in each row indicates dominance status; the term represented in each column indicates dominated-by status.
Fit_Statistic_Overall
Value of fit statistic for the full model.
Fit_Statistic_All_Subsets
Value of fit statistic associated with terms in all.
Fit_Statistic_Constant_Model
Value of fit statistic associated with terms in consmodel.
Call
The matched call.
Subset_Details
List containing the full model and descriptions of terms in the full model by source.
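The relationship between General_Dominance, Standardized, Ranks, and Fit_Statistic_Overall can be checked by hand. The sketch below copies the general dominance statistics from the basic linear model example in the Examples section and assumes no all or consmodel entries and reverse = FALSE. The use of rank() here is an assumption about how ranks might be derived, not a description of domin's internals.

```r
# General dominance statistics copied from the basic linear model example.
general <- c(am = 0.1774892, vs = 0.2027032, cyl = 0.3817849)

overall      <- sum(general)        # recovers the overall fit statistic
standardized <- general / overall   # normalized to sum to 1
ranks        <- rank(-general)      # largest statistic is ranked first

round(cbind(general, standardized, ranks), 7)
```

When all or consmodel is used, the sum of the general dominance statistics instead matches the overall fit statistic minus the corresponding subset or constant model value, as the printed examples show.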
Details
domin automates the computation of all possible combinations of entries to the dominance analysis (DA), the creation of formula objects based on those entries, the modeling calls/fit statistic capture, and the computation of all the dominance statistics for the user.
domin accepts only a "deconstructed" set of inputs and "reconstructs" them prior to formulating a coherent predictive modeling call.
One specific instance of this deconstruction is in generating the number of entries to the DA. The number of entries is taken as all the terms from formula_overall and the separate list element vectors from sets. The entries themselves are concatenated into a single formula, combined with the entries in all, and submitted to the predictive modeling function in reg. Each different combination of entries to the DA forms a different formula and thus a different model to estimate.
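To make the automated quantities concrete, the following base R sketch recomputes conditional and general dominance statistics for mpg ~ am + vs + cyl in mtcars by brute force. It assumes the usual definitions: the conditional dominance statistic at a given order is the mean increment to the fit statistic across subsets of that size, and the general dominance statistic is the mean of those values across orders. The helper r2 and all object names are illustrative and are not part of domin.

```r
ivs <- c("am", "vs", "cyl")

# R-squared for a model containing `vars`; 0 when no entries are included.
r2 <- function(vars) {
  if (length(vars) == 0) return(0)
  summary(lm(reformulate(vars, response = "mpg"), data = mtcars))$r.squared
}

# For each entry, average its increment to R-squared over all subsets of the
# other entries, separately by subset size (0, 1, 2 other entries).
cond_dom <- t(sapply(ivs, function(iv) {
  others <- setdiff(ivs, iv)
  sapply(0:length(others), function(k) {
    subsets <- if (k == 0) list(character(0)) else combn(others, k, simplify = FALSE)
    mean(sapply(subsets, function(s) r2(c(s, iv)) - r2(s)))
  })
}))
colnames(cond_dom) <- paste("IVs:", seq_along(ivs))

general <- rowMeans(cond_dom)   # average across orders
round(cond_dom, 7)
round(general, 7)
```

This loop fits each sub-model several times; it is written for readability rather than efficiency.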
For example, consider this domin call:
domin(y ~ x1 + x2, lm, list(summary, "r.squared"), sets = list(c("x3", "x4")), all = c("c1", "c2"), data = mydata)
This call records three entries and results in seven (i.e., \(2^3 - 1\)) different combinations:
x1
x2
x3, x4
x1, x2
x1, x3, x4
x2, x3, x4
x1, x2, x3, x4
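The seven combinations listed above can be enumerated in base R with combn(). This is only a sketch of the counting logic; the entries list and its names are assumptions made for illustration and do not reflect how domin stores its inputs.

```r
# Three entries: two terms from the formula and one set of two variables.
entries <- list(x1 = "x1", x2 = "x2", set1 = c("x3", "x4"))

# All non-empty subsets of the entries, ordered by subset size.
combos <- unlist(
  lapply(seq_along(entries), function(k) {
    combn(names(entries), k, simplify = FALSE)
  }),
  recursive = FALSE
)

n_combos <- length(combos)   # 2^3 - 1

# Expand each subset of entries into the variables it contributes.
combo_labels <- vapply(
  combos,
  function(cmb) paste(unlist(entries[cmb]), collapse = ", "),
  character(1)
)
combo_labels
```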
domin parses formula_overall to obtain all the terms in it and combines them with sets. When parsing formula_overall, only the processing that is available in the stats package is applied. Note that domin is not programmed to process terms of order > 1 (i.e., interactions/products) appropriately; that is, it does not ensure they are included only in the presence of their lower order component terms. domin also does not allow offset terms.
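As a brief illustration of the term parsing described above, the sketch below uses stats::terms() to list the term labels of a formula. It assumes that terms() is representative of the stats processing being referred to; the source does not name the specific function.

```r
# Main effects only: one label per term.
main_terms <- attr(terms(y ~ x1 + x2), "term.labels")
main_terms

# The * operator expands to both main effects plus their product term.
prod_terms <- attr(terms(y ~ x1 * x2), "term.labels")
prod_terms
```

Under this reading, the product label x1:x2 would be treated as a separate entry and could appear in sub-models without x1 or x2, which is the behavior the note about terms of order > 1 cautions against.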
From these combinations, the predictive models are constructed and called. The predictive model call includes the entries in all, applies the appropriate formula, and reconstructs the function itself. The seven combinations above imply the following series of predictive model calls:
lm(y ~ x1 + c1 + c2, data = mydata)
lm(y ~ x2 + c1 + c2, data = mydata)
lm(y ~ x3 + x4 + c1 + c2, data = mydata)
lm(y ~ x1 + x2 + c1 + c2, data = mydata)
lm(y ~ x1 + x3 + x4 + c1 + c2, data = mydata)
lm(y ~ x2 + x3 + x4 + c1 + c2, data = mydata)
lm(y ~ x1 + x2 + x3 + x4 + c1 + c2, data = mydata)
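The series of calls above can be reproduced with reformulate() and do.call(). The sketch below uses simulated data as a stand-in for mydata, since the original data are not given; the object names are illustrative and the construction is an assumption about one reasonable way to build the calls, not domin's own code.

```r
set.seed(42)
# Simulated stand-in for `mydata`; the values carry no meaning.
mydata <- as.data.frame(matrix(rnorm(60 * 7), ncol = 7))
names(mydata) <- c("y", "x1", "x2", "x3", "x4", "c1", "c2")

entries  <- list("x1", "x2", c("x3", "x4"))
all_vars <- c("c1", "c2")

# Index sets for every non-empty combination of entries.
idx <- unlist(
  lapply(seq_along(entries),
         function(k) combn(length(entries), k, simplify = FALSE)),
  recursive = FALSE
)

# One formula per combination, always appending the `all` terms.
formulas <- lapply(idx, function(i) {
  reformulate(c(unlist(entries[i]), all_vars), response = "y")
})

# The modeling function and its extra arguments are forwarded via do.call().
models <- lapply(formulas, function(f) do.call(lm, list(f, data = mydata)))

vapply(formulas, deparse, character(1))
```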
It is possible to use domin with only sets (i.e., no IVs in formula_overall; see examples below). There must be at least two entries to the DA for domin to run.
All the called predictive models are submitted to the fit extractor function implied by the entries in fitstat. Again applying the example above, all seven predictive models' objects would be individually passed as follows:
summary(lm_obj)["r.squared"]
where lm_obj is the model object returned by lm.
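The extraction step can be sketched with do.call(), using a fitstat list of the form given in the Arguments section. The indexing shown is an assumption about how the list elements might be consumed; mtcars is used so that the result can be compared with the overall fit statistic printed in the first example.

```r
lm_obj  <- lm(mpg ~ am + vs + cyl, data = mtcars)
fitstat <- list("summary", "r.squared")

# Call the extractor on the model object, forwarding any list elements
# beyond the second as additional arguments (none in this case).
extracted <- do.call(fitstat[[1]], c(list(lm_obj), fitstat[-(1:2)]))

# Pull out the named element to obtain the scalar fit statistic.
fit_value <- extracted[[fitstat[[2]]]]
fit_value
```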
The entries to fitstat must be supplied as a list and follow a specific structure:
list(fit_function, element_name, ...)
fit_function
First element; the function to be applied to the object produced by the reg function.
element_name
Second element; the name of the element from the object returned by fit_function to be used as a fit statistic. The fit statistic must be scalar-valued/length 1.
...
Subsequent elements; additional arguments passed to fit_function.
In the case that the model object returned by reg includes its own fit statistic without the need for an extractor function, the user can apply an anonymous function following the required format to extract it.
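As a sketch of the anonymous-function case just described, a glm object already stores an aic element, so the "extractor" can simply return the model object and let the second fitstat element pick the statistic by name. The model below is an arbitrary illustration, and the way the list is indexed is an assumption about how the elements might be consumed.

```r
glm_obj <- glm(vs ~ mpg, data = mtcars, family = binomial)

# The model object is a list that already carries `aic`, so the first
# element only needs to hand the object back unchanged.
fitstat <- list(function(x) x, "aic")

aic_value <- do.call(fitstat[[1]], list(glm_obj))[[fitstat[[2]]]]
aic_value
```

Because the AIC decreases with better fit, a fitstat entry like this would be paired with reverse = TRUE, and typically with consmodel = "1", as described in the Arguments section.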
Notes
domin is an R port of the Stata command with the same name (see Luchman, 2021).
domin has been superseded by domir.
References
Luchman, J. N. (2021). Relative importance analysis in Stata using dominance analysis: domin and domme. The Stata Journal, 21, 2. doi: 10.1177/1536867X211025837.
Examples
## Basic linear model with r-square
domin(mpg ~ am + vs + cyl,
      lm,
      list("summary", "r.squared"),
      data = mtcars)
#> Overall Fit Statistic: 0.7619773
#>
#> General Dominance Statistics:
#>     General Dominance Standardized Ranks
#> am          0.1774892    0.2329324     3
#> vs          0.2027032    0.2660226     2
#> cyl         0.3817849    0.5010450     1
#>
#> Conditional Dominance Statistics:
#>        IVs: 1    IVs: 2      IVs: 3
#> am  0.3597989 0.1389842 0.033684441
#> vs  0.4409477 0.1641982 0.002963748
#> cyl 0.7261800 0.3432799 0.075894823
#>
#> Complete Dominance Designations:
#>             Dmnated?am Dmnated?vs Dmnated?cyl
#> Dmnates?am          NA         NA       FALSE
#> Dmnates?vs          NA         NA       FALSE
#> Dmnates?cyl       TRUE       TRUE          NA
#>
## Linear model including sets
domin(mpg ~ am + vs + cyl,
      lm,
      list("summary", "r.squared"),
      data = mtcars,
      sets = list(c("carb", "gear"), c("disp", "wt")))
#> Overall Fit Statistic: 0.851596
#>
#> General Dominance Statistics:
#>      General Dominance Standardized Ranks
#> am          0.09446712    0.1109295     5
#> vs          0.10957434    0.1286694     4
#> cyl         0.19767129    0.2321186     3
#> set1        0.20978183    0.2463396     2
#> set2        0.24010141    0.2819429     1
#>
#> Conditional Dominance Statistics:
#>         IVs: 1     IVs: 2     IVs: 3      IVs: 4       IVs: 5
#> am   0.3597989 0.07688044 0.01944026 0.010342235 0.0058737118
#> vs   0.4409477 0.09276443 0.01167477 0.001799976 0.0006848322
#> cyl  0.7261800 0.19877978 0.04304518 0.015251991 0.0050994979
#> set1 0.7343966 0.20916653 0.05695739 0.030887988 0.0175006137
#> set2 0.7809306 0.24778381 0.08623238 0.051865275 0.0336949874
#>
#> Complete Dominance Designations:
#>              Dmnated?am Dmnated?vs Dmnated?cyl Dmnated?set1 Dmnated?set2
#> Dmnates?am           NA         NA          NA        FALSE        FALSE
#> Dmnates?vs           NA         NA       FALSE        FALSE        FALSE
#> Dmnates?cyl          NA       TRUE          NA        FALSE        FALSE
#> Dmnates?set1       TRUE       TRUE        TRUE           NA        FALSE
#> Dmnates?set2       TRUE       TRUE        TRUE         TRUE           NA
#>
#> Components of sets:
#> set1 : carb gear
#> set2 : disp wt
#>
## Multivariate linear model with custom multivariate r-square function
## and all subsets variable
Rxy <- function(obj, names, data) {
  # Squared first canonical correlation between the fitted values and the
  # observed responses named in `names`
  return(list("r2" = cancor(predict(obj),
    as.data.frame(mget(names, as.environment(data))))[["cor"]][1]^2))
}
domin(cbind(wt, mpg) ~ vs + cyl + am,
      lm,
      list(Rxy, "r2", c("mpg", "wt"), mtcars),
      data = mtcars,
      all = c("carb"))
#> Overall Fit Statistic: 0.8384378
#> All Subsets Fit Statistic: 0.3137993
#>
#> General Dominance Statistics:
#>     General Dominance Standardized Ranks
#> vs         0.07169528   0.08551056     3
#> cyl        0.21206213   0.25292531     2
#> am         0.24088099   0.28729741     1
#>
#> Conditional Dominance Statistics:
#>        IVs: 1     IVs: 2      IVs: 3
#> vs  0.1761112 0.03768107 0.001293536
#> cyl 0.4307370 0.17804791 0.027401478
#> am  0.4311156 0.20686678 0.084660605
#>
#> Complete Dominance Designations:
#>             Dmnated?vs Dmnated?cyl Dmnated?am
#> Dmnates?vs          NA       FALSE      FALSE
#> Dmnates?cyl       TRUE          NA      FALSE
#> Dmnates?am        TRUE        TRUE         NA
#>
#> All subsets variables: carb
## Sets only
domin(mpg ~ 1,
      lm,
      list("summary", "r.squared"),
      data = mtcars,
      sets = list(c("am", "vs"), c("cyl", "disp"), c("qsec", "carb")))
#> Overall Fit Statistic: 0.8344278
#>
#> General Dominance Statistics:
#>      General Dominance Standardized Ranks
#> set1         0.3197289    0.3831715     2
#> set2         0.3677706    0.4407459     1
#> set3         0.1469282    0.1760826     3
#>
#> Conditional Dominance Statistics:
#>         IVs: 1     IVs: 2     IVs: 3
#> set1 0.6860825 0.24773691 0.02536743
#> set2 0.7595658 0.29577856 0.04796742
#> set3 0.3092531 0.07493621 0.05659539
#>
#> Complete Dominance Designations:
#>              Dmnated?set1 Dmnated?set2 Dmnated?set3
#> Dmnates?set1           NA        FALSE           NA
#> Dmnates?set2         TRUE           NA           NA
#> Dmnates?set3           NA           NA           NA
#>
#> Components of sets:
#> set1 : am vs
#> set2 : cyl disp
#> set3 : qsec carb
#>
## Constant model using AIC
domin(mpg ~ am + carb + cyl,
      lm,
      list(function(x) list(aic = extractAIC(x)[[2]]), "aic"),
      data = mtcars,
      reverse = TRUE, consmodel = "1")
#> Overall Fit Statistic: 68.57997
#> Constant Model Fit Statistic: 115.9434
#>
#> General Dominance Statistics:
#>      General Dominance Standardized Ranks
#> am          -11.392847    0.2405407     2
#> carb         -8.862961    0.1871265     3
#> cyl         -27.107674    0.5723328     1
#>
#> Conditional Dominance Statistics:
#>          IVs: 1    IVs: 2     IVs: 3
#> am   -12.271136 -13.71691  -8.190499
#> carb  -9.574847 -11.18702  -5.827017
#> cyl  -39.449099 -29.43173 -12.442191
#>
#> Complete Dominance Designations:
#>              Dmnated?am Dmnated?carb Dmnated?cyl
#> Dmnates?am           NA         TRUE       FALSE
#> Dmnates?carb      FALSE           NA       FALSE
#> Dmnates?cyl        TRUE         TRUE          NA
#>