Title: | Bootstrapped Basis Regression with Oracle Model Selection |
---|---|
Description: | Routines for flexible functional form estimation via basis regression, with model selection via the adaptive LASSO or SCAD to prevent overfitting. |
Authors: | Brenton Kenkel and Curtis S. Signorino |
Maintainer: | Brenton Kenkel <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.4-1 |
Built: | 2025-01-23 03:54:46 UTC |
Source: | https://github.com/brentonk/polywog |
A package for flexible functional form estimation via bootstrapped basis regression with oracle model selection. This version of the software should be considered in beta. For bug reports and feature requests, please email Brenton Kenkel ([email protected]) or file an issue at https://github.com/brentonk/polywog-package/issues.
We are grateful to Tyson Chatangier for many helpful suggestions about an earlier version of the package. We also thank the Wallis Institute of Political Economy and the Theory and Statistics Research Lab at the University of Rochester for financial support during the writing of the package.
Brenton Kenkel and Curtis S. Signorino. 2012. "A Method for Flexible Functional Form Estimation: Bootstrapped Basis Regression with Variable Selection." Typescript, University of Rochester.
Nonparametric bootstrap of the polywog
regression procedure.
Can be run on a fitted model of class "polywog"
, or within the
original procedure via the boot
argument. The function
control.bp
can be used to pass options to bootPolywog
when
bootstrapping within polywog
.
bootPolywog(model, nboot = 100, .parallel = FALSE, reuse.lambda = FALSE, reuse.penwt = FALSE, nlambda = 100, lambda.min.ratio = 1e-04, nfolds = 10, thresh = NULL, maxit = NULL, maxtries = 1000, min.prop = 0, report = FALSE, .matrixOnly = FALSE) control.bp(.parallel = FALSE, reuse.lambda = FALSE, reuse.penwt = FALSE, nlambda = 100, lambda.min.ratio = 1e-04, nfolds = 10, thresh = NULL, maxit = NULL, maxtries = 1000, min.prop = 0, report = FALSE)
bootPolywog(model, nboot = 100, .parallel = FALSE, reuse.lambda = FALSE, reuse.penwt = FALSE, nlambda = 100, lambda.min.ratio = 1e-04, nfolds = 10, thresh = NULL, maxit = NULL, maxtries = 1000, min.prop = 0, report = FALSE, .matrixOnly = FALSE) control.bp(.parallel = FALSE, reuse.lambda = FALSE, reuse.penwt = FALSE, nlambda = 100, lambda.min.ratio = 1e-04, nfolds = 10, thresh = NULL, maxit = NULL, maxtries = 1000, min.prop = 0, report = FALSE)
model |
a fitted model of class |
nboot |
number of bootstrap iterations. |
.parallel |
logical: whether to perform computations in parallel
using a backend registered with |
reuse.lambda |
logical: whether to use the penalization parameter from
the original fit ( |
reuse.penwt |
logical: whether to use the penalty weights from the
original fit ( |
nlambda |
number of values of the penalty factor to examine in
cross-validation, as in |
lambda.min.ratio |
ratio of the smallest value of the penalty factor
to the largest, as in |
nfolds |
number of cross-validation folds to use. |
thresh |
convergence threshold, as in |
maxit |
iteration limit for fitting, as in |
maxtries |
maximum number of attempts to generate a bootstrap sample with a non-collinear model matrix (often problematic with lopsided binary regressors) before stopping and issuing an error message. |
min.prop |
for models with a binary response variable, minimum proportion of
non-modal outcome to ensure is included in each bootstrap iteration (for
example, set |
report |
logical: whether to print a status bar. Not available if
|
.matrixOnly |
logical: whether to return just the matrix of bootstrap
coefficients ( |
Parallel computation via the .parallel
argument requires
registation of a backend for %dopar%
, as
in polywog
. In the case of bootPolywog
, bootstrap
fitting is carried out in parallel, while cross-validation to choose the
penalization factor (assuming reuse.lambda = FALSE
) is carried
out sequentially within each iteration.
If .matrixOnly = FALSE
, the returned object is model
with the bootstrap matrix included as its boot.matrix
element. If
.matrixOnly = TRUE
, just the matrix is returned. In either case, the
bootstrap matrix is a sparse matrix of class
"dgCMatrix"
.
Brenton Kenkel and Curtis S. Signorino
## Using occupational prestige data data(Prestige, package = "carData") Prestige <- transform(Prestige, income = income / 1000) ## Fit a polywog model without bootstrapping ## (note: using low convergence threshold to shorten computation time of the ## example, *not* recommended in practice!) fit1 <- polywog(prestige ~ education + income + type, data = Prestige, degree = 2, thresh = 1e-4) summary(fit1) ## Bootstrap the fitted model fit2 <- bootPolywog(fit1, nboot = 5) summary(fit2) ## Example of parallel processing on Mac/Unix via 'doMC' ## Not run: library(doMC) registerDoMC() fit2 <- bootPolywog(fit1, nboot = 100, .parallel = TRUE) ## End(Not run) ## Example of parallel processing on Windows via 'doSMP' ## Not run: library(doSMP) w <- startWorkers() registerDoSMP(w) fit2 <- bootPolywog(fit1, nboot = 100, .parallel = TRUE) stopWorkers(w) ## End(Not run)
## Using occupational prestige data data(Prestige, package = "carData") Prestige <- transform(Prestige, income = income / 1000) ## Fit a polywog model without bootstrapping ## (note: using low convergence threshold to shorten computation time of the ## example, *not* recommended in practice!) fit1 <- polywog(prestige ~ education + income + type, data = Prestige, degree = 2, thresh = 1e-4) summary(fit1) ## Bootstrap the fitted model fit2 <- bootPolywog(fit1, nboot = 5) summary(fit2) ## Example of parallel processing on Mac/Unix via 'doMC' ## Not run: library(doMC) registerDoMC() fit2 <- bootPolywog(fit1, nboot = 100, .parallel = TRUE) ## End(Not run) ## Example of parallel processing on Windows via 'doSMP' ## Not run: library(doSMP) w <- startWorkers() registerDoSMP(w) fit2 <- bootPolywog(fit1, nboot = 100, .parallel = TRUE) stopWorkers(w) ## End(Not run)
k-fold cross-validation to select the polynomial degree and penalization
factor for a polywog
model.
cv.polywog(formula, ..., degrees.cv = 1:3, nfolds = 10, model = TRUE, X = FALSE, y = FALSE)
cv.polywog(formula, ..., degrees.cv = 1:3, nfolds = 10, model = TRUE, X = FALSE, y = FALSE)
formula |
model formula specifying the response and input variables. |
... |
other arguments to be passed to |
degrees.cv |
vector of polynomial degrees to examine via cross-validation. |
nfolds |
number of folds to use in cross-validation. |
model |
logical: whether to include the model frame in the
|
X |
logical: whether to include the raw model matrix (i.e., the
matrix of input variables prior to taking their polynomial expansion) in
the |
y |
logical: whether to include the response variable in the
|
When fitting with method = "scad"
, different fold assignments are
used for each polynomial degree specified, because cv.ncvreg
does not allow for custom fold assignments. This may affect the accuracy
of the estimated cross-validation error for each degree. When
method = "scad"
, the calls to polywog
made by
cv.polywog
will issue warnings that the foldid
argument is
being ignored.
An object of class "cv.polywog"
, a list containing:
results
A table of each degree tested, the optimal
penalization factor for that degree, and its
cross-validation error.
degree.min
The polynomial degree giving the lowest cross-validation error.
polywog.fit
A polywog
model, fit at the
polynomial degree giving the lowest cross-validation error.
Because the returned object contains the fitted polywog model for the
optimal degree, no additional runs of polywog
are necessary
to estimate coefficients or the penalization factor .
However, bootstrap results must be obtained by running
bootPolywog
on the "polywog.fit"
element of the
returned object, as in the examples below.
Brenton Kenkel and Curtis S. Signorino
## Using occupational prestige data data(Prestige, package = "carData") Prestige <- transform(Prestige, income = income / 1000) ## Examine degrees 1 through 4 ## (note: using low convergence threshold to shorten computation time of the ## example, *not* recommended in practice!) set.seed(39) cv1 <- cv.polywog(prestige ~ education + income + type, data = Prestige, degrees.cv = 1:4, nfolds = 10, thresh = 1e-4) print(cv1) ## Extract best model and bootstrap fit1 <- cv1$polywog.fit fit1 <- bootPolywog(fit1, nboot = 5) summary(fit1)
## Using occupational prestige data data(Prestige, package = "carData") Prestige <- transform(Prestige, income = income / 1000) ## Examine degrees 1 through 4 ## (note: using low convergence threshold to shorten computation time of the ## example, *not* recommended in practice!) set.seed(39) cv1 <- cv.polywog(prestige ~ education + income + type, data = Prestige, degrees.cv = 1:4, nfolds = 10, thresh = 1e-4) print(cv1) ## Extract best model and bootstrap fit1 <- cv1$polywog.fit fit1 <- bootPolywog(fit1, nboot = 5) summary(fit1)
Computes average and observationwise marginal effects from a fitted
polywog
model.
## S3 method for class 'polywog' margEff(object, xvar = NULL, drop = FALSE, ...)
## S3 method for class 'polywog' margEff(object, xvar = NULL, drop = FALSE, ...)
object |
a fitted model of class |
xvar |
a character string containing the name of a raw input variable
(from |
drop |
logical: whether to convert one-column matrices in the output to vectors. |
... |
other arguments, currently ignored. |
For input variables that are binary, logical, or factors,
margEff.polywog
computes a first difference with comparison to a
reference category. All other variables are treated as continuous:
the function computes the partial derivative of the fitted value with
respect to the selected variable.
If xvar
is specified, a numeric object containing
the marginal effect of the chosen variable at each observation in
object$model
. For factor variables, if there are more than two
levels or drop = FALSE
, the returned object is a matrix; otherwise it
is a vector.
If xvar
is NULL
, a list of such results for each raw input
variable in the model is returned.
In either case, the returned object is of class "margEff.polywog"
.
Brenton Kenkel and Curtis S. Signorino
To plot the density of the observationwise marginal effects, see
plot.margEff.polywog
. For a table of average marginal effects
and order statistics, summary.margEff.polywog
.
To compute fitted values, see predict.polywog
and
predVals
.
## Using occupational prestige data data(Prestige, package = "carData") Prestige <- transform(Prestige, income = income / 1000) ## Fit a polywog model ## (note: using low convergence threshold to shorten computation time of the ## example, *not* recommended in practice!) set.seed(22) fit1 <- polywog(prestige ~ education + income | type, data = Prestige, degree = 2, thresh = 1e-4) ## Compute marginal effects for all variables me1 <- margEff(fit1) summary(me1) # type was included linearly, hence constant effects ## Plotting density of the results plot(me1) ## Can do the same when just examining a single variable me2 <- margEff(fit1, xvar = "income") summary(me2) plot(me2)
## Using occupational prestige data data(Prestige, package = "carData") Prestige <- transform(Prestige, income = income / 1000) ## Fit a polywog model ## (note: using low convergence threshold to shorten computation time of the ## example, *not* recommended in practice!) set.seed(22) fit1 <- polywog(prestige ~ education + income | type, data = Prestige, degree = 2, thresh = 1e-4) ## Compute marginal effects for all variables me1 <- margEff(fit1) summary(me1) # type was included linearly, hence constant effects ## Plotting density of the results plot(me1) ## Can do the same when just examining a single variable me2 <- margEff(fit1, xvar = "income") summary(me2) plot(me2)
Extracts the model frame from a fitted polywog
model, as
model.frame.lm
does for a fitted lm
model.
## S3 method for class 'polywog' model.frame(formula, ...)
## S3 method for class 'polywog' model.frame(formula, ...)
formula |
a fitted model of class |
... |
other arguments, currently ignored (but may later be adapted
for use as in |
A data frame containing the variables used to fit the model, with
additional attributes (e.g., "terms"
) used to construct a model
matrix.
Brenton Kenkel and Curtis S. Signorino
model.matrix.polywog
for constructing the design
matrix.
Constructs the design matrix used to fit a polywog
model,
similar to model.matrix.lm
.
## S3 method for class 'polywog' model.matrix(object, type = c("raw", "expanded"), ...)
## S3 method for class 'polywog' model.matrix(object, type = c("raw", "expanded"), ...)
object |
a fitted model of class |
type |
|
... |
other arguments to be passed to further methods (typically only used internally) |
There are two types of model matrix a user might want to construct.
First, there is the matrix of the raw input terms that go into the
eventual polynomial expansion. Such a matrix can be obtained by using
type = "raw"
(the default). The other form of the model matrix is
the full polynomial expansion, where each column contains some power of
the raw inputs. This can be obtained by using type = "expanded"
.
The design matrix of the specified model, consisting either of raw
terms or the full polynomial expansion depending on the type
argument.
Brenton Kenkel and Curtis S. Signorino
Generates density plots of the observationwise marginal effects computed by
margEff.polywog
.
## S3 method for class 'margEff.polywog' plot(x, ...)
## S3 method for class 'margEff.polywog' plot(x, ...)
x |
output of |
... |
plotting parameters to be passed to |
Data frame containing the variables whose densities were plotted, invisibly.
Brenton Kenkel and Curtis S. Signorino
Generates plots of the relationship between input variables and the expected
value of the outcome, using predVals
as a backend.
## S3 method for class 'polywog' plot(x, which = NULL, ask = FALSE, auto.set.par = TRUE, interval = TRUE, level = 0.95, FUN3D = c("contour", "filled.contour", "wireframe", "persp3d"), control.plot = list(), ...)
## S3 method for class 'polywog' plot(x, which = NULL, ask = FALSE, auto.set.par = TRUE, interval = TRUE, level = 0.95, FUN3D = c("contour", "filled.contour", "wireframe", "persp3d"), control.plot = list(), ...)
x |
a fitted model of class |
which |
selection of variables to plot: a character vector containing
one or two names of raw input variables (see |
ask |
logical: whether to display an interactive menu of terms to select. |
auto.set.par |
logical: whether to temporarily change the graphics
parameters so that multiple plots are displayed in one window (e.g., each
univariate plot when |
interval |
logical: whether to display bootstrap confidence intervals
around each fitted value. Not available for bivariate plots unless
|
level |
confidence level for the intervals. |
FUN3D |
which plotting function to use to generate bivariate plots.
Valid options include |
control.plot |
list of arguments to be passed to the underlying plotting functions (e.g., axis labels and limits). |
... |
additional arguments to be passed to |
By default, a univariate plot generated by plot.polywog
shows the
relationship between the selected input variable and the expected outcome
while holding all other covariates at "central" values (as in
predVals
). The values that the other variables are held out
can be changed by supplying additional arguments to ...
, as in the
examples below.
Similarly, a bivariate plot shows the relationship between two input variables and the expected outcome while holding all else fixed. If either variable is binary or categorical, the plot will show the relationship between one variable and the expected outcome across each value/level of the other.
An object of class preplot.polywog
, invisibly. This is a
data frame generated by predVals
that contains all information
used in plotting.
Brenton Kenkel and Curtis S. Signorino
## Using occupational prestige data data(Prestige, package = "carData") Prestige <- transform(Prestige, income = income / 1000) ## Fit a polywog model with bootstrap iterations ## (note: using low convergence threshold to shorten computation time of the ## example, *not* recommended in practice!) set.seed(22) fit1 <- polywog(prestige ~ education + income + type, data = Prestige, degree = 2, boot = 5, thresh = 1e-4) ## All univariate relationships plot(fit1, n = 20) ## Predicted prestige across occupational categories plot(fit1, which = "type", control.plot = list(xlab = "occupational category")) ## Predicted prestige by education across occupational categories plot(fit1, which = c("education", "type"), n = 20) ## Joint effect of education and income plot(fit1, which = c("education", "income"), n = 10) ## Bring up interactive menu ## Not run: plot(fit1, ask = TRUE) # displays menu: # Select one or two variable numbers (separated by spaces), or 0 to exit: # 1: education # 2: income # 3: type ## End(Not run)
## Using occupational prestige data data(Prestige, package = "carData") Prestige <- transform(Prestige, income = income / 1000) ## Fit a polywog model with bootstrap iterations ## (note: using low convergence threshold to shorten computation time of the ## example, *not* recommended in practice!) set.seed(22) fit1 <- polywog(prestige ~ education + income + type, data = Prestige, degree = 2, boot = 5, thresh = 1e-4) ## All univariate relationships plot(fit1, n = 20) ## Predicted prestige across occupational categories plot(fit1, which = "type", control.plot = list(xlab = "occupational category")) ## Predicted prestige by education across occupational categories plot(fit1, which = c("education", "type"), n = 20) ## Joint effect of education and income plot(fit1, which = c("education", "income"), n = 10) ## Bring up interactive menu ## Not run: plot(fit1, ask = TRUE) # displays menu: # Select one or two variable numbers (separated by spaces), or 0 to exit: # 1: education # 2: income # 3: type ## End(Not run)
Fits a regression model using a polynomial basis expansion of the input variables, with penalization via the adaptive LASSO or SCAD to provide oracle variable selection.
polywog(formula, data, subset, weights, na.action, degree = 3, family = c("gaussian", "binomial"), method = c("alasso", "scad"), penwt.method = c("lm", "glm"), unpenalized = character(0), .parallel = FALSE, boot = 0, control.boot = control.bp(.parallel = .parallel), lambda = NULL, nlambda = 100, lambda.min.ratio = 1e-04, nfolds = 10, foldid = NULL, thresh = ifelse(method == "alasso", 1e-07, 0.001), maxit = ifelse(method == "alasso", 1e+05, 5000), model = TRUE, X = FALSE, y = FALSE)
polywog(formula, data, subset, weights, na.action, degree = 3, family = c("gaussian", "binomial"), method = c("alasso", "scad"), penwt.method = c("lm", "glm"), unpenalized = character(0), .parallel = FALSE, boot = 0, control.boot = control.bp(.parallel = .parallel), lambda = NULL, nlambda = 100, lambda.min.ratio = 1e-04, nfolds = 10, foldid = NULL, thresh = ifelse(method == "alasso", 1e-07, 0.001), maxit = ifelse(method == "alasso", 1e+05, 5000), model = TRUE, X = FALSE, y = FALSE)
formula |
model formula specifying the response and input variables. See "Details" for more information. |
data |
a data frame, list or environment containing the variables specified in the model formula. |
subset |
an optional vector specifying a subset of observations to be used in fitting. |
weights |
an optional vector specifying weights for each observation to be used in fitting. |
na.action |
a function specifying what to do with observations
containing |
degree |
integer specifying the degree of the polynomial expansion of the input variables. |
family |
|
method |
variable selection method: |
penwt.method |
estimator for obtaining first-stage estimates in
logistic models when |
unpenalized |
names of model terms to be exempt from the adaptive
penalty (only available when |
.parallel |
logical: whether to perform k-fold cross-validation in
parallel (only available when |
boot |
number of bootstrap iterations (0 for no bootstrapping). |
control.boot |
list of arguments to be passed to
|
lambda |
a vector of values from which the penalty factor is to be
selected via k-fold cross-validation. |
nlambda |
number of values of the penalty factor to examine via
cross-validation if |
lambda.min.ratio |
ratio of the lowest value to the highest in the
generated sequence of values of the penalty factor if |
nfolds |
number of folds to use in cross-validation to select the penalization factor. |
foldid |
optional vector manually assigning fold numbers to each
observation used for fitting (only available when |
thresh |
convergence threshold, passed as the |
maxit |
maximum number of iterations to allow in adaptive LASSO or SCAD fitting. |
model |
logical: whether to include the model frame in the returned object. |
X |
logical: whether to include the raw design matrix (i.e., the matrix of input variables prior to taking their polynomial expansion) in the returned object. |
y |
logical: whether to include the response variable in the returned object. |
The design matrix for the regression is a polynomial basis expansion of the
matrix of raw input variables. This includes all powers and interactions of
the input variables up to the specified degree
. For example, the
following terms will be included in polywog(y ~ x1 + x2, degree = 3,
...)
:
terms of degree 0: intercept
terms of degree 1: x1
, x2
terms of degree 2: x1^2
, x2^2
, x1*x2
terms of degree 3: x1^3
, x2^3
, x1*x2^2
,
x1^2*x2
To exclude certain terms from the basis expansion, use a model formula like
y ~ x1 + x2 | z1 + z2
. Only the degree 1 terms of z1
and
z2
will be included.
It is possible that the "raw" basis expansion will be rank-deficient, such
as if there are binary input variables (in which case for
all
). The procedure detects collinearity via
qr
and
removes extraneous columns before fitting.
For both the adaptive LASSO and SCAD, the penalization factor
is chosen by k-fold cross-validation. The selected value minimizes the
average mean squared error of out-of-sample fits. (To select both
and the polynomial degree simultaneously via cross-validation,
see
cv.polywog
.)
The cross-validation process may be run in parallel via
foreach
by registering an appropriate backend and specifying
.parallel = TRUE
. The appropriate backend is system-specific; see
foreach
for information on selecting and registering a
backend. The bootstrap iterations may also be run in parallel by
specifying control.boot = control.bp(.parallel = TRUE)
.
An object of class "polywog"
, a list containing:
coefficients
the estimated coefficients.
lambda
value of the penalty factor used to
fit the final model.
lambda.cv
a list containing the results of the cross-validation procedure used to select the penalty factor:
lambda
values of the penalty factor tested in cross-validation.
cvError
out-of-fold prediction error corresponding to
each value of lambda
.
lambdaMin
value of lambda
with the minimal
cross-validation error.
errorMin
minimized value of the cross-validation error.
fitted.values
the fitted mean values for each observation used in fitting.
lmcoef
coefficients from an unpenalized least-squares regression of the response variable on the polynomial expansion of the input variables.
penwt
adaptive weight given to each term in the LASSO
penalty (NULL
for models fit via SCAD).
formula
model formula, as a Formula
object.
degree
degree of the polynomial basis expansion.
family
model family, "gaussian"
or
"binomial"
.
weights
observation weights if specified.
method
the specified regularization method.
penwt.method
the specified method for calculating
the adaptive LASSO weights (NULL
for models fit via SCAD).
unpenalized
logical vector indicating which terms were not included in the LASSO penalty.
thresh
convergence threshold used in fitting.
maxit
iteration limit used in fitting.
terms
the terms
object used to construct the
model frame.
polyTerms
a matrix indicating the power of each raw input term (columns) in each term of the polynomial expansion used in fitting (rows).
nobs
the number of observations used to fit the model.
na.action
information on how NA
values in the input
data were handled.
xlevels
levels of factor variables used in fitting.
varNames
names of the raw input variables included in the model formula.
call
the original function call.
model
if model = TRUE
, the model frame used in
fitting; otherwise NULL
.
X
if X = TRUE
, the raw model matrix (i.e., prior to
taking the polynomial expansion); otherwise NULL
. For calculating
the expanded model matrix, see model.matrix.polywog
.
y
if y = TRUE
, the response variable used in
fitting; otherwise NULL
.
boot.matrix
if boot > 0
, a sparse matrix of class
"dgCMatrix"
where each column is the estimate from a
bootstrap replicate. See bootPolywog
for more information
on bootstrapping.
Brenton Kenkel and Curtis S. Signorino
Brenton Kenkel and Curtis S. Signorino. 2012. "A Method for Flexible Functional Form Estimation: Bootstrapped Basis Regression with Variable Selection." Typescript, University of Rochester.
To estimate variation via the bootstrap, see
bootPolywog
. To generate fitted values, see
predVals
(and the underlying method
predict.polywog
). For plots, see plot.polywog
.
The polynomial degree may be selected via cross-validation using
cv.polywog
.
Adaptive LASSO estimates are provided via glmnet
and
cv.glmnet
from the glmnet package. SCAD estimates are
via ncvreg
and cv.ncvreg
in the ncvreg
package.
## Using occupational prestige data data(Prestige, package = "carData") Prestige <- transform(Prestige, income = income / 1000) ## Fit a polywog model with bootstrap iterations ## (note: using low convergence threshold to shorten computation time of the ## example, *not* recommended in practice!) set.seed(22) fit1 <- polywog(prestige ~ education + income + type, data = Prestige, degree = 2, boot = 5, thresh = 1e-4) ## Basic information print(fit1) summary(fit1) ## See how fitted values change with education holding all else fixed predVals(fit1, "education", n = 10) ## Plot univariate relationships plot(fit1) ## Use SCAD instead of adaptive LASSO fit2 <- update(fit1, method = "scad", thresh = 1e-3) cbind(coef(fit1), coef(fit2))
## Using occupational prestige data data(Prestige, package = "carData") Prestige <- transform(Prestige, income = income / 1000) ## Fit a polywog model with bootstrap iterations ## (note: using low convergence threshold to shorten computation time of the ## example, *not* recommended in practice!) set.seed(22) fit1 <- polywog(prestige ~ education + income + type, data = Prestige, degree = 2, boot = 5, thresh = 1e-4) ## Basic information print(fit1) summary(fit1) ## See how fitted values change with education holding all else fixed predVals(fit1, "education", n = 10) ## Plot univariate relationships plot(fit1) ## Use SCAD instead of adaptive LASSO fit2 <- update(fit1, method = "scad", thresh = 1e-3) cbind(coef(fit1), coef(fit2))
Generates fitted values, including bootstrap confidence intervals, for in- and out-of-sample data from a fitted polywog model.
## S3 method for class 'polywog' predict(object, newdata, type = c("link", "response"), interval = FALSE, level = 0.95, bag = FALSE, na.action = na.pass, ...)
## S3 method for class 'polywog' predict(object, newdata, type = c("link", "response"), interval = FALSE, level = 0.95, bag = FALSE, na.action = na.pass, ...)
object |
a fitted model of class |
newdata |
an optional data frame containing observations for which fitted values should be computed. If not specified, fitted values are generated for the data used to fit the model. |
type |
specifies whether the fitted values should be generated on the
link scale ( |
interval |
logical: whether to calculate bootstrap confidence intervals for each fitted value. |
level |
confidence level for the intervals. |
bag |
logical: whether to use "bootstrap aggregation" to generate the
main fitted values (if |
na.action |
a function specifying what to do with observations in
|
... |
other arguments, currently ignored. |
If interval = TRUE
, a matrix containing each fitted value and
its confidence interval. Otherwise, a vector containing the fitted values.
Brenton Kenkel and Curtis S. Signorino
For more user-friendly generation of fitted values, see
predVals
. To compute marginal effects, see
margEff.polywog
.
User-friendly generation of fitted values and their confidence intervals
from models of class "polywog"
, using the "observed-value approach"
advocated by Hanmer and Kalkan (2013).
predVals(model, xvars, data = model$model, xlims = list(), n = 100, interval = TRUE, level = 0.95, maxrows = 10000, report = FALSE, .parallel = FALSE, ...)
predVals(model, xvars, data = model$model, xlims = list(), n = 100, interval = TRUE, level = 0.95, maxrows = 10000, report = FALSE, .parallel = FALSE, ...)
model |
a fitted model of class |
xvars |
a character vector containing names of raw input variables
(from |
data |
data frame to treat as the observed sample (defaults to the data used to fit the supplied model) |
xlims |
named list of limits for the evaluation grid for each
continuous variable in |
n |
number of grid points at which to evaluate each continuous variable
in |
interval |
logical: whether to compute bootstrap confidence intervals for each fitted value. |
level |
confidence level for the intervals. |
maxrows |
maximum number of rows of output. Used to prevent accidental
memory overruns when |
report |
logical: whether to print a status bar. Not available if
|
.parallel |
logical: whether to perform bootstrap iterations in
parallel using |
... |
other arguments, currently ignored |
predVals
allows users to examine the estimated effects of input
variables on the expected outcome using the coefficients returned by
polywog
. The procedure is designed so that, for a preliminary
analysis, the user can simply specify the fitted model and the independent
variable of interest, and quickly obtain predicted values.
The predicted values are generated according to Hanmer and Kalkan's (2013)
observed-value approach, which takes the form of a nested loop. When
xvars
contains a single variable , the procedure is as
follows:
For each level of
in
data
(if
is discrete) or each element
of a grid over the range of
in
data
(if is continuous):
For each observation of
data
:
Set , while holding all other variables
at their observed levels
Compute the predicted value of for the modified
observation
, using the estimated model coefficients (as in
predict.polywog
)
The predicted value of given
is the average of
the predictions computed in the previous step
This observed-value approach provides a better estimate of population
average effects for nonlinear models than does the traditional approach,
which is to vary across its levels/range while holding each
other covariate to its mean or median in
data
(Hanmer and Kalkan
2013).
When xvars
consists of multiple variables , the
predVals
procedure is the same, except the
outer loop is over every combination of their levels in
data
.
All confidence intervals are generated via the bootstrap. Specifically,
predVals
repeats the above procedure for each set of bootstrap
coefficients and computes order statistics of the resulting set of
averages (for each combination of levels of xvars
). If
model
does not have a boot.matrix
element (see
bootPolywog
), confidence intervals will not be computed.
A data frame containing the fitted values and confidence intervals (if requested) for each combination of covariate values.
The returned data frame also inherits from class "preplot.polywog"
.
This is used by plot.polywog
, which calls predVals
to
compute the values to plot.
Brenton Kenkel and Curtis S. Signorino
Michael J. Hanmer and Kerem Ozan Kalkan. 2013. "Behind the Curve: Clarifying the Best Approach to Calculating Predicted Probabilities and Marginal Effects from Limited Dependent Variable Models." American Journal of Political Science 57(1):263–277.
predict.polywog
for more flexible (but less
user-friendly) computation of fitted values. plot.polywog
for
plotting fitted values and their confidence intervals.
## Using occupational prestige data data(Prestige, package = "carData") Prestige <- transform(Prestige, income = income / 1000) ## Fit a polywog model with bootstrap iterations ## (note: using low convergence threshold to shorten computation time of the ## example, *not* recommended in practice!) set.seed(22) fit1 <- polywog(prestige ~ education + income + type, data = Prestige, degree = 2, boot = 5, thresh = 1e-4) ## Predicted prestige across occupational categories predVals(fit1, "type") ## Predicted prestige by education predVals(fit1, "education", n = 10) ## Plotting pred_income <- predVals(fit1, "income", n = 10) plot(pred_income)
## Using occupational prestige data data(Prestige, package = "carData") Prestige <- transform(Prestige, income = income / 1000) ## Fit a polywog model with bootstrap iterations ## (note: using low convergence threshold to shorten computation time of the ## example, *not* recommended in practice!) set.seed(22) fit1 <- polywog(prestige ~ education + income + type, data = Prestige, degree = 2, boot = 5, thresh = 1e-4) ## Predicted prestige across occupational categories predVals(fit1, "type") ## Predicted prestige by education predVals(fit1, "education", n = 10) ## Plotting pred_income <- predVals(fit1, "income", n = 10) plot(pred_income)
Generates a table of the average marginal effects and quartiles (or other
order statistics if requested) from a "margEff.polywog"
object.
## S3 method for class 'margEff.polywog' summary(object, probs = seq(0, 1, by = 0.25), ...)
## S3 method for class 'margEff.polywog' summary(object, probs = seq(0, 1, by = 0.25), ...)
object |
output of |
probs |
order statistics to display. |
... |
other arguments, currently ignored. |
Table of results.
Brenton Kenkel and Curtis S. Signorino
Generates a "regression table" to summarize the fitted model, including
coefficients along with their bootstrapped standard errors and confidence
intervals. If the fitted model does not have a boot.matrix
element,
the output will contain NA
s for the standard errors, and confidence
intervals will not be displayed.
## S3 method for class 'polywog' summary(object, level = 0.95, prop0 = FALSE, ...)
## S3 method for class 'polywog' summary(object, level = 0.95, prop0 = FALSE, ...)
object |
a fitted model of class |
level |
width of the bootstrap confidence interval to compute for the model coefficients. |
prop0 |
logical: whether to print the proportion of bootstrap iterations in which each coefficient was estimated as exactly 0. This may be informative but should not be interpreted as a p-value. |
... |
other arguments, currently ignored. |
An object of class "summary.polywog"
whose elements are the
"regression table" (coefficients
) and additional information from the
original fitted model.
Brenton Kenkel and Curtis S. Signorino