Computes dominance statistics for predictive modeling functions that accept a formula.
Usage
domin(
  formula_overall,
  reg,
  fitstat,
  sets = NULL,
  all = NULL,
  conditional = TRUE,
  complete = TRUE,
  consmodel = NULL,
  reverse = FALSE,
  ...
)
Arguments
- formula_overall
An object of class formula, or one that can be coerced to class formula, for use in the modeling function in reg. The terms on the right hand side of this formula are used as separate entries to the dominance analysis.
A valid formula_overall entry is necessary, even if only submitting entries in sets, to define a valid left hand side of the prediction equation (see examples). The function called in reg must accept one or more responses on the left hand side.
- reg
A function implementing the predictive (or "reg"ression) model called.
String function names (e.g., "lm"), function names (e.g., lm), or anonymous functions (e.g., function(x) lm(x)) are acceptable entries. This argument's contents are passed to do.call and thus any function call do.call would accept is valid.
The predictive model in reg must accept a formula object as its first argument or must be adapted to do so with a wrapper function.
- fitstat
List providing arguments to call a fit statistic extracting function (see details). The fitstat list must be of at least length two.
The first element of fitstat must be a function implementing the fit statistic extraction. String function names (e.g., "summary"), function names (e.g., summary), or anonymous functions (e.g., function(x) summary(x)) are acceptable entries. This element's contents are passed to do.call and thus any function call do.call would accept is valid.
The second element of fitstat must be the name of the element, in the list or vector produced by the fit extractor function called in the first element of fitstat, that holds the fit statistic. This element must be a string (e.g., "r.squared").
All list elements beyond the second are submitted as additional arguments to the fit extractor function call.
The fit statistic extractor function in the first list element of fitstat must accept the model object produced by the predictive modeling function in reg as its first argument or be adapted to do so with a wrapper function.
The fit statistic produced must be scalar valued (i.e., vector of length 1).
- sets
A list with each element comprised of vectors containing variable/factor names or formula coercible strings.
Each separate list element-vector in sets is concatenated (when the list element-vector is of length > 1) and used as an entry to the dominance analysis along with the terms in formula_overall.
- all
A vector of variable/factor names or formula coercible strings. The entries in this vector are concatenated (when of length > 1) but are not used in the dominance analysis. Rather, the value of the fit statistic associated with these terms is removed from the dominance analysis; this vector is used like a set of covariates.
The fit statistic value associated with the entries in all is removed from the dominance statistics and treated as an additional component that explains the fit metric. As a result, the general dominance statistics will no longer sum to the overall fit metric and the standardized vector will no longer sum to 1.
- conditional
Logical. If FALSE then the conditional dominance matrix is not computed.
If conditional dominance is not desired as an importance criterion, avoiding computing the conditional dominance matrix can save computation time.
- complete
Logical. If FALSE then the complete dominance matrix is not computed.
If complete dominance is not desired as an importance criterion, avoiding computing complete dominance designations can save computation time.
- consmodel
A vector of variable/factor names, formula coercible strings, or other formula terms (i.e., 1 to indicate an intercept). The entries in this vector are concatenated (when of length > 1) and, like the entries of all, are not used in the dominance analysis; this vector is used as an adjustment to the baseline value of the overall fit statistic.
The use of consmodel changes the interpretation of the general and conditional dominance statistics. When consmodel is used, the general and conditional dominance statistics reflect the difference between the constant model and the overall fit statistic values.
Typical usage of consmodel is to pass "1" to set the intercept as the baseline and control for its value when the baseline model's fit statistic value is not 0 (e.g., if using the AIC or BIC as a fit statistic; see examples).
As such, this vector is used to set a baseline for the fit statistic when it is non-0.
- reverse
Logical. If TRUE then the standardized vector, ranks, and complete dominance designations are reversed in their interpretation.
This argument should be changed to TRUE if the fit statistic used decreases with better fit to the data (e.g., AIC, BIC).
- ...
Additional arguments passed to the function call in the reg argument.
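The wrapper requirement described for reg and fitstat can be illustrated with a minimal sketch. It adapts stats::lm.fit(), which takes a design matrix and a response rather than a formula, so that a formula comes first. The names lm_fit_wrapper and r2_from_fit are hypothetical, and the commented domin call assumes that data reaches reg through `...` as the argument descriptions state.

```r
# Hypothetical wrapper: lm.fit() expects a design matrix `x` and a response `y`,
# so the wrapper accepts a formula first and builds those inputs itself.
lm_fit_wrapper <- function(formula, data) {
  mf <- model.frame(formula, data)
  lm.fit(x = model.matrix(formula, mf), y = model.response(mf))
}

# Hypothetical fit extractor: takes the model object as its first argument and
# returns a named list holding a scalar R-squared (valid for models with an
# intercept).
r2_from_fit <- function(fit) {
  y <- fit$fitted.values + fit$residuals
  list(r2 = 1 - sum(fit$residuals^2) / sum((y - mean(y))^2))
}

fit <- lm_fit_wrapper(mpg ~ am + vs + cyl, data = mtcars)
r2_wrapped <- r2_from_fit(fit)[["r2"]]

# Under these assumptions the pair could be supplied to domin as:
# domin(mpg ~ am + vs + cyl, lm_fit_wrapper, list(r2_from_fit, "r2"), data = mtcars)
```

The wrapper keeps the formula as the first argument and leaves everything else to named arguments, which is the shape the reg and fitstat descriptions ask for.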
Value
Returns an object of class "domin".
An object of class "domin" is a list composed of the following elements:
- General_Dominance
Vector of general dominance statistics.
- Standardized
Vector of general dominance statistics normalized to sum to 1.
- Ranks
Vector of ranks applied to the general dominance statistics.
- Conditional_Dominance
Matrix of conditional dominance statistics. Each row represents a term; each column represents an order of terms.
- Complete_Dominance
Logical matrix of complete dominance designations. The term represented in each row indicates dominance status; the term represented in each column indicates dominated-by status.
- Fit_Statistic_Overall
Value of fit statistic for the full model.
- Fit_Statistic_All_Subsets
Value of fit statistic associated with terms in all.
- Fit_Statistic_Constant_Model
Value of fit statistic associated with terms in consmodel.
- Call
The matched call.
- Subset_Details
List containing the full model and descriptions of terms in the full model by source.
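How several of these elements relate to one another can be sketched with plain arithmetic. The values below are copied from the printed output of the first example in the Examples section. The relationship shown (general dominance as the average of a term's conditional dominance statistics) is the standard dominance analysis definition and is checked here only against that printed output, not against domin's internals.

```r
# Conditional dominance statistics from the first example's printed output.
conditional <- rbind(
  am  = c(0.3597989, 0.1389842, 0.033684441),
  vs  = c(0.4409477, 0.1641982, 0.002963748),
  cyl = c(0.7261800, 0.3432799, 0.075894823)
)
colnames(conditional) <- c("IVs: 1", "IVs: 2", "IVs: 3")

# General dominance: the average of each term's conditional dominance statistics.
general <- rowMeans(conditional)

# Without `all` or `consmodel`, these sum to the overall fit statistic.
overall <- sum(general)

# Standardized: general dominance rescaled to sum to 1.
standardized <- general / overall

# Ranks: the largest general dominance statistic gets rank 1 (reverse = FALSE).
ranks <- rank(-general)
```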
Details
domin automates the computation of all possible combinations of entries to the dominance analysis (DA), the creation of formula objects based on those entries, the modeling calls/fit statistic capture, and the computation of all the dominance statistics for the user.
domin accepts only a "deconstructed" set of inputs and "reconstructs" them prior to formulating a coherent predictive modeling call.
One specific instance of this deconstruction is in generating the number of entries to the DA. The number of entries is taken as all the terms from formula_overall and the separate list element vectors from sets. The entries themselves are concatenated into a single formula, combined with the entries in all, and submitted to the predictive modeling function in reg. Each different combination of entries to the DA forms a different formula and thus a different model to estimate.
For example, consider this domin call:
domin(y ~ x1 + x2, lm, list(summary, "r.squared"), sets = list(c("x3", "x4")), all = c("c1", "c2"), data = mydata)
This call records three entries and results in seven (i.e., \(2^3 - 1\)) different combinations:
x1
x2
x3, x4
x1, x2
x1, x3, x4
x2, x3, x4
x1, x2, x3, x4
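The enumeration above can be reproduced with a short base R sketch. The object names entries, combos, and combo_labels are hypothetical, and this only illustrates the counting; it is not a description of how domin enumerates subsets internally.

```r
# Entries to the DA: two terms from formula_overall and one set from `sets`.
entries <- list(x1 = "x1", x2 = "x2", set1 = c("x3", "x4"))

# All non-empty subsets of the entries, ordered by subset size.
combos <- unlist(
  lapply(seq_along(entries),
         function(k) combn(names(entries), k, simplify = FALSE)),
  recursive = FALSE
)

# Three entries give 2^3 - 1 = 7 combinations.
n_combos <- length(combos)

# Expand each subset of entries into the variables it contributes.
combo_labels <- vapply(
  combos,
  function(nms) paste(unlist(entries[nms], use.names = FALSE), collapse = ", "),
  character(1)
)
print(combo_labels)
```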
domin parses formula_overall to obtain all the terms in it and combines them with sets. When parsing formula_overall, only the processing that is available in the stats package is applied. Note that domin is not programmed to process terms of order > 1 (i.e., interactions/products) appropriately (i.e., to include them only in the presence of their lower order component terms). domin also does not allow offset terms.
From these combinations, the predictive models are constructed and called. The predictive model call includes the entries in all, applies the appropriate formula, and reconstructs the function itself. The seven combinations above imply the following series of predictive model calls:
lm(y ~ x1 + c1 + c2, data = mydata)
lm(y ~ x2 + c1 + c2, data = mydata)
lm(y ~ x3 + x4 + c1 + c2, data = mydata)
lm(y ~ x1 + x2 + c1 + c2, data = mydata)
lm(y ~ x1 + x3 + x4 + c1 + c2, data = mydata)
lm(y ~ x2 + x3 + x4 + c1 + c2, data = mydata)
lm(y ~ x1 + x2 + x3 + x4 + c1 + c2, data = mydata)
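The "reconstruction" step can be sketched in base R as follows. Because mydata is not defined in this page, the sketch simulates a stand-in data frame with the same column names; the object names (subsets, formulas, models, r2) are hypothetical and the code is an illustration of the idea rather than domin's actual implementation.

```r
# Simulated stand-in for `mydata` (an assumption; any data frame with these
# column names would do).
set.seed(1)
mydata <- as.data.frame(matrix(
  rnorm(50 * 7), ncol = 7,
  dimnames = list(NULL, c("y", "x1", "x2", "x3", "x4", "c1", "c2"))
))

entries   <- list("x1", "x2", c("x3", "x4"))  # terms plus one set
all_terms <- c("c1", "c2")                    # the `all` covariates

# Index every non-empty subset of the three entries.
subsets <- unlist(
  lapply(seq_along(entries),
         function(k) combn(seq_along(entries), k, simplify = FALSE)),
  recursive = FALSE
)

# Rebuild one formula per subset, always appending the `all` terms.
formulas <- lapply(subsets, function(idx) {
  reformulate(c(unlist(entries[idx]), all_terms), response = "y")
})

# Call the modeling function on each formula and capture the fit statistic.
models <- lapply(formulas, function(f) do.call(lm, list(f, data = mydata)))
r2 <- vapply(models, function(m) summary(m)[["r.squared"]], numeric(1))
```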
It is possible to use domin with only sets (i.e., no IVs in formula_overall; see examples below). There must be at least two entries to the DA for domin to run.
All the called predictive models are submitted to the fit extractor function implied by the entries in fitstat. Again applying the example above, all seven predictive models' objects would be individually passed as follows:
summary(lm_obj)["r.squared"]
where lm_obj is the model object returned by lm.
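As a rough sketch of this extraction step, the code below applies a fitstat list to a single fitted model with do.call. It uses mtcars and the model from the first example so the result can be compared with that example's printed overall fit statistic; how domin performs the step internally may differ.

```r
lm_obj  <- lm(mpg ~ am + vs + cyl, data = mtcars)
fitstat <- list("summary", "r.squared")

# First element: the extractor, called with the model object as its first
# argument; any elements beyond the second are passed as extra arguments.
fit_obj <- do.call(fitstat[[1]], c(list(lm_obj), fitstat[-(1:2)]))

# Second element: the name of the scalar fit statistic in the returned object.
fit_value <- fit_obj[[fitstat[[2]]]]
```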
The entries to fitstat must be supplied as a list and follow a specific structure:
list(fit_function, element_name, ...)
- fit_function
First element: the function to be applied to the object produced by the reg function.
- element_name
Second element: the name of the element from the object returned by fit_function to be used as a fit statistic. The fit statistic must be scalar-valued/length 1.
- ...
Subsequent elements: additional arguments passed to fit_function.
In the case that the model object returned by reg includes its own fit statistic without the need for an extractor function, the user can apply an anonymous function following the required format to extract it.
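A minimal sketch of that case: a glm object already stores its residual deviance, so an anonymous function can simply return it in a named list. The element name "dev" and the choice of model are illustrative assumptions, and the commented domin call assumes that family and data are forwarded to glm through `...`.

```r
glm_obj <- glm(am ~ hp + wt, data = mtcars, family = binomial())

# The anonymous function wraps the stored statistic in a named list so that the
# second fitstat element can select it by name.
fitstat <- list(function(x) list(dev = x$deviance), "dev")

dev_value <- do.call(fitstat[[1]], list(glm_obj))[[fitstat[[2]]]]

# Deviance decreases with better fit and is non-0 for the intercept-only model,
# so reverse = TRUE and consmodel = "1" would likely be appropriate:
# domin(am ~ hp + wt, glm, fitstat, family = binomial(), data = mtcars,
#       reverse = TRUE, consmodel = "1")
```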
Notes
domin is an R port of the Stata command with the same name (see Luchman, 2021).
domin has been superseded by domir.
References
Luchman, J. N. (2021). Relative importance analysis in Stata using dominance analysis: domin and domme. The Stata Journal, 21, 2. doi: 10.1177/1536867X211025837.
Examples
## Basic linear model with r-square
domin(mpg ~ am + vs + cyl,
      lm,
      list("summary", "r.squared"),
      data = mtcars)
#> Overall Fit Statistic: 0.7619773
#>
#> General Dominance Statistics:
#> General Dominance Standardized Ranks
#> am 0.1774892 0.2329324 3
#> vs 0.2027032 0.2660226 2
#> cyl 0.3817849 0.5010450 1
#>
#> Conditional Dominance Statistics:
#> IVs: 1 IVs: 2 IVs: 3
#> am 0.3597989 0.1389842 0.033684441
#> vs 0.4409477 0.1641982 0.002963748
#> cyl 0.7261800 0.3432799 0.075894823
#>
#> Complete Dominance Designations:
#> Dmnated?am Dmnated?vs Dmnated?cyl
#> Dmnates?am NA NA FALSE
#> Dmnates?vs NA NA FALSE
#> Dmnates?cyl TRUE TRUE NA
#>
## Linear model including sets
domin(mpg ~ am + vs + cyl,
      lm,
      list("summary", "r.squared"),
      data = mtcars,
      sets = list(c("carb", "gear"), c("disp", "wt")))
#> Overall Fit Statistic: 0.851596
#>
#> General Dominance Statistics:
#> General Dominance Standardized Ranks
#> am 0.09446712 0.1109295 5
#> vs 0.10957434 0.1286694 4
#> cyl 0.19767129 0.2321186 3
#> set1 0.20978183 0.2463396 2
#> set2 0.24010141 0.2819429 1
#>
#> Conditional Dominance Statistics:
#> IVs: 1 IVs: 2 IVs: 3 IVs: 4 IVs: 5
#> am 0.3597989 0.07688044 0.01944026 0.010342235 0.0058737118
#> vs 0.4409477 0.09276443 0.01167477 0.001799976 0.0006848322
#> cyl 0.7261800 0.19877978 0.04304518 0.015251991 0.0050994979
#> set1 0.7343966 0.20916653 0.05695739 0.030887988 0.0175006137
#> set2 0.7809306 0.24778381 0.08623238 0.051865275 0.0336949874
#>
#> Complete Dominance Designations:
#> Dmnated?am Dmnated?vs Dmnated?cyl Dmnated?set1 Dmnated?set2
#> Dmnates?am NA NA NA FALSE FALSE
#> Dmnates?vs NA NA FALSE FALSE FALSE
#> Dmnates?cyl NA TRUE NA FALSE FALSE
#> Dmnates?set1 TRUE TRUE TRUE NA FALSE
#> Dmnates?set2 TRUE TRUE TRUE TRUE NA
#>
#> Components of sets:
#> set1 : carb gear
#> set2 : disp wt
#>
## Multivariate linear model with custom multivariate r-square function
## and all subsets variable
Rxy <- function(obj, names, data) {
  return(list("r2" = cancor(predict(obj),
    as.data.frame(mget(names, as.environment(data))))[["cor"]][1]^2))
}
domin(cbind(wt, mpg) ~ vs + cyl + am,
      lm,
      list(Rxy, "r2", c("mpg", "wt"), mtcars),
      data = mtcars,
      all = c("carb"))
#> Overall Fit Statistic: 0.8384378
#> All Subsets Fit Statistic: 0.3137993
#>
#> General Dominance Statistics:
#> General Dominance Standardized Ranks
#> vs 0.07169528 0.08551056 3
#> cyl 0.21206213 0.25292531 2
#> am 0.24088099 0.28729741 1
#>
#> Conditional Dominance Statistics:
#> IVs: 1 IVs: 2 IVs: 3
#> vs 0.1761112 0.03768107 0.001293536
#> cyl 0.4307370 0.17804791 0.027401478
#> am 0.4311156 0.20686678 0.084660605
#>
#> Complete Dominance Designations:
#> Dmnated?vs Dmnated?cyl Dmnated?am
#> Dmnates?vs NA FALSE FALSE
#> Dmnates?cyl TRUE NA FALSE
#> Dmnates?am TRUE TRUE NA
#>
#> All subsets variables: carb
## Sets only
domin(mpg ~ 1,
      lm,
      list("summary", "r.squared"),
      data = mtcars,
      sets = list(c("am", "vs"), c("cyl", "disp"), c("qsec", "carb")))
#> Overall Fit Statistic: 0.8344278
#>
#> General Dominance Statistics:
#> General Dominance Standardized Ranks
#> set1 0.3197289 0.3831715 2
#> set2 0.3677706 0.4407459 1
#> set3 0.1469282 0.1760826 3
#>
#> Conditional Dominance Statistics:
#> IVs: 1 IVs: 2 IVs: 3
#> set1 0.6860825 0.24773691 0.02536743
#> set2 0.7595658 0.29577856 0.04796742
#> set3 0.3092531 0.07493621 0.05659539
#>
#> Complete Dominance Designations:
#> Dmnated?set1 Dmnated?set2 Dmnated?set3
#> Dmnates?set1 NA FALSE NA
#> Dmnates?set2 TRUE NA NA
#> Dmnates?set3 NA NA NA
#>
#> Components of sets:
#> set1 : am vs
#> set2 : cyl disp
#> set3 : qsec carb
#>
## Constant model using AIC
domin(mpg ~ am + carb + cyl,
      lm,
      list(function(x) list(aic = extractAIC(x)[[2]]), "aic"),
      data = mtcars,
      reverse = TRUE, consmodel = "1")
#> Overall Fit Statistic: 68.57997
#> Constant Model Fit Statistic: 115.9434
#>
#> General Dominance Statistics:
#> General Dominance Standardized Ranks
#> am -11.392847 0.2405407 2
#> carb -8.862961 0.1871265 3
#> cyl -27.107674 0.5723328 1
#>
#> Conditional Dominance Statistics:
#> IVs: 1 IVs: 2 IVs: 3
#> am -12.271136 -13.71691 -8.190499
#> carb -9.574847 -11.18702 -5.827017
#> cyl -39.449099 -29.43173 -12.442191
#>
#> Complete Dominance Designations:
#> Dmnated?am Dmnated?carb Dmnated?cyl
#> Dmnates?am NA TRUE FALSE
#> Dmnates?carb FALSE NA FALSE
#> Dmnates?cyl TRUE TRUE NA
#>
