Package 'polywog' reference manual

Title:	Bootstrapped Basis Regression with Oracle Model Selection
Description:	Routines for flexible functional form estimation via basis regression, with model selection via the adaptive LASSO or SCAD to prevent overfitting.
Authors:	Brenton Kenkel and Curtis S. Signorino
Maintainer:	Brenton Kenkel <[email protected]>
License:	GPL (>= 2)
Version:	0.4-1
Built:	2025-03-24 04:08:00 UTC
Source:	https://github.com/brentonk/polywog

Bootstrapped basis regression with oracle model selection

Description

A package for flexible functional form estimation via bootstrapped basis regression with oracle model selection. This version of the software should be considered in beta. For bug reports and feature requests, please email Brenton Kenkel ([email protected]) or file an issue at https://github.com/brentonk/polywog-package/issues.

Acknowledgements

We are grateful to Tyson Chatangier for many helpful suggestions about an earlier version of the package. We also thank the Wallis Institute of Political Economy and the Theory and Statistics Research Lab at the University of Rochester for financial support during the writing of the package.

References

Brenton Kenkel and Curtis S. Signorino. 2012. "A Method for Flexible Functional Form Estimation: Bootstrapped Basis Regression with Variable Selection." Typescript, University of Rochester.

Bootstrap a fitted polywog model

Description

Nonparametric bootstrap of the polywog regression procedure. Can be run on a fitted model of class "polywog", or within the original procedure via the boot argument. The function control.bp can be used to pass options to bootPolywog when bootstrapping within polywog.

Usage

bootPolywog(model, nboot = 100, .parallel = FALSE, reuse.lambda = FALSE,
  reuse.penwt = FALSE, nlambda = 100, lambda.min.ratio = 1e-04,
  nfolds = 10, thresh = NULL, maxit = NULL, maxtries = 1000,
  min.prop = 0, report = FALSE, .matrixOnly = FALSE)

control.bp(.parallel = FALSE, reuse.lambda = FALSE, reuse.penwt = FALSE,
  nlambda = 100, lambda.min.ratio = 1e-04, nfolds = 10, thresh = NULL,
  maxit = NULL, maxtries = 1000, min.prop = 0, report = FALSE)
bootPolywog(model, nboot = 100, .parallel = FALSE, reuse.lambda = FALSE,
  reuse.penwt = FALSE, nlambda = 100, lambda.min.ratio = 1e-04,
  nfolds = 10, thresh = NULL, maxit = NULL, maxtries = 1000,
  min.prop = 0, report = FALSE, .matrixOnly = FALSE)

control.bp(.parallel = FALSE, reuse.lambda = FALSE, reuse.penwt = FALSE,
  nlambda = 100, lambda.min.ratio = 1e-04, nfolds = 10, thresh = NULL,
  maxit = NULL, maxtries = 1000, min.prop = 0, report = FALSE)

Arguments

`model`	a fitted model of class `"polywog"`, typically the output of `polywog` or the `"polywog.fit"` element of the output of `cv.polywog`.
`nboot`	number of bootstrap iterations.
`.parallel`	logical: whether to perform computations in parallel using a backend registered with `foreach`.
`reuse.lambda`	logical: whether to use the penalization parameter from the original fit (`TRUE`), or to cross-validate within each iteration (`FALSE`, default).
`reuse.penwt`	logical: whether to use the penalty weights from the original fit (`TRUE`), or to re-calculate them within each iteration (`FALSE`, default).
`nlambda`	number of values of the penalty factor to examine in cross-validation, as in `polywog`.
`lambda.min.ratio`	ratio of the smallest value of the penalty factor to the largest, as in `polywog`.
`nfolds`	number of cross-validation folds to use.
`thresh`	convergence threshold, as in `polywog`. If `NULL`, use the same value as in the original model.
`maxit`	iteration limit for fitting, as in `polywog`. If `NULL`, use the same value as in the original model.
`maxtries`	maximum number of attempts to generate a bootstrap sample with a non-collinear model matrix (often problematic with lopsided binary regressors) before stopping and issuing an error message.
`min.prop`	for models with a binary response variable, minimum proportion of non-modal outcome to ensure is included in each bootstrap iteration (for example, set `min.prop = 0.1` to throw out any bootstrap iteration where less than 10 percent or more than 90 percent of the observations are 1's).
`report`	logical: whether to print a status bar. Not available if `.parallel = TRUE`.
`.matrixOnly`	logical: whether to return just the matrix of bootstrap coefficients (`TRUE`), or the originally supplied model with the bootstrap matrix as the `boot.matrix` element (`FALSE`, default).

Details

Parallel computation via the .parallel argument requires registation of a backend for %dopar%, as in polywog. In the case of bootPolywog, bootstrap fitting is carried out in parallel, while cross-validation to choose the penalization factor (assuming reuse.lambda = FALSE) is carried out sequentially within each iteration.

Value

If .matrixOnly = FALSE, the returned object is model with the bootstrap matrix included as its boot.matrix element. If .matrixOnly = TRUE, just the matrix is returned. In either case, the bootstrap matrix is a sparse matrix of class "dgCMatrix".

Author(s)

Brenton Kenkel and Curtis S. Signorino

Examples

## Using occupational prestige data
data(Prestige, package = "carData")
Prestige <- transform(Prestige, income = income / 1000)

## Fit a polywog model without bootstrapping
## (note: using low convergence threshold to shorten computation time of the
## example, *not* recommended in practice!)
fit1 <- polywog(prestige ~ education + income + type,
                data = Prestige,
                degree = 2,
                thresh = 1e-4)
summary(fit1)

## Bootstrap the fitted model
fit2 <- bootPolywog(fit1, nboot = 5)
summary(fit2)

## Example of parallel processing on Mac/Unix via 'doMC'
## Not run: 
library(doMC)
registerDoMC()

fit2 <- bootPolywog(fit1, nboot = 100, .parallel = TRUE)

## End(Not run)

## Example of parallel processing on Windows via 'doSMP'
## Not run: 
library(doSMP)
w <- startWorkers()
registerDoSMP(w)

fit2 <- bootPolywog(fit1, nboot = 100, .parallel = TRUE)

stopWorkers(w)

## End(Not run)
## Using occupational prestige data
data(Prestige, package = "carData")
Prestige <- transform(Prestige, income = income / 1000)

## Fit a polywog model without bootstrapping
## (note: using low convergence threshold to shorten computation time of the
## example, *not* recommended in practice!)
fit1 <- polywog(prestige ~ education + income + type,
                data = Prestige,
                degree = 2,
                thresh = 1e-4)
summary(fit1)

## Bootstrap the fitted model
fit2 <- bootPolywog(fit1, nboot = 5)
summary(fit2)

## Example of parallel processing on Mac/Unix via 'doMC'
## Not run: 
library(doMC)
registerDoMC()

fit2 <- bootPolywog(fit1, nboot = 100, .parallel = TRUE)

## End(Not run)

## Example of parallel processing on Windows via 'doSMP'
## Not run: 
library(doSMP)
w <- startWorkers()
registerDoSMP(w)

fit2 <- bootPolywog(fit1, nboot = 100, .parallel = TRUE)

stopWorkers(w)

## End(Not run)

Cross-validation of polynomial degree and penalization factor

Description

k-fold cross-validation to select the polynomial degree and penalization factor for a polywog model.

Usage

cv.polywog(formula, ..., degrees.cv = 1:3, nfolds = 10, model = TRUE,
  X = FALSE, y = FALSE)
cv.polywog(formula, ..., degrees.cv = 1:3, nfolds = 10, model = TRUE,
  X = FALSE, y = FALSE)

Arguments

`formula`	model formula specifying the response and input variables.
`...`	other arguments to be passed to `polywog`. Arguments related to the bootstrap will be ignored, as bootstrapping must be performed separately.
`degrees.cv`	vector of polynomial degrees to examine via cross-validation.
`nfolds`	number of folds to use in cross-validation.
`model`	logical: whether to include the model frame in the `"polywog"` object included in the output.
`X`	logical: whether to include the raw model matrix (i.e., the matrix of input variables prior to taking their polynomial expansion) in the `"polywog"` object included in the output.
`y`	logical: whether to include the response variable in the `"polywog"` object included in the output.

Details

When fitting with method = "scad", different fold assignments are used for each polynomial degree specified, because cv.ncvreg does not allow for custom fold assignments. This may affect the accuracy of the estimated cross-validation error for each degree. When method = "scad", the calls to polywog made by cv.polywog will issue warnings that the foldid argument is being ignored.

Value

An object of class "cv.polywog", a list containing:

results: A table of each degree tested, the optimal penalization factor $\lambda$ for that degree, and its cross-validation error.
degree.min: The polynomial degree giving the lowest cross-validation error.
polywog.fit: A polywog model, fit at the polynomial degree giving the lowest cross-validation error.

Because the returned object contains the fitted polywog model for the optimal degree, no additional runs of polywog are necessary to estimate coefficients or the penalization factor $\lambda$ . However, bootstrap results must be obtained by running bootPolywog on the "polywog.fit" element of the returned object, as in the examples below.

Author(s)

Brenton Kenkel and Curtis S. Signorino

Examples

## Using occupational prestige data
data(Prestige, package = "carData")
Prestige <- transform(Prestige, income = income / 1000)

## Examine degrees 1 through 4
## (note: using low convergence threshold to shorten computation time of the
## example, *not* recommended in practice!)
set.seed(39)
cv1 <- cv.polywog(prestige ~ education + income + type,
                  data = Prestige,
                  degrees.cv = 1:4,
                  nfolds = 10,
                  thresh = 1e-4)
print(cv1)

## Extract best model and bootstrap
fit1 <- cv1$polywog.fit
fit1 <- bootPolywog(fit1, nboot = 5)
summary(fit1)
## Using occupational prestige data
data(Prestige, package = "carData")
Prestige <- transform(Prestige, income = income / 1000)

## Examine degrees 1 through 4
## (note: using low convergence threshold to shorten computation time of the
## example, *not* recommended in practice!)
set.seed(39)
cv1 <- cv.polywog(prestige ~ education + income + type,
                  data = Prestige,
                  degrees.cv = 1:4,
                  nfolds = 10,
                  thresh = 1e-4)
print(cv1)

## Extract best model and bootstrap
fit1 <- cv1$polywog.fit
fit1 <- bootPolywog(fit1, nboot = 5)
summary(fit1)

Marginal effects for polywog models

Description

Computes average and observationwise marginal effects from a fitted polywog model.

Usage

## S3 method for class 'polywog'
margEff(object, xvar = NULL, drop = FALSE, ...)
## S3 method for class 'polywog'
margEff(object, xvar = NULL, drop = FALSE, ...)

Arguments

`object`	a fitted model of class `"polywog"`, typically the output of `polywog`. The object must have a `model` element, meaning it was fit with `model = TRUE`.
`xvar`	a character string containing the name of a raw input variable (from `object$varNames`). Partial matches are allowed.
`drop`	logical: whether to convert one-column matrices in the output to vectors.
`...`	other arguments, currently ignored.

Details

For input variables that are binary, logical, or factors, margEff.polywog computes a first difference with comparison to a reference category. All other variables are treated as continuous: the function computes the partial derivative of the fitted value with respect to the selected variable.

Value

If xvar is specified, a numeric object containing the marginal effect of the chosen variable at each observation in object$model. For factor variables, if there are more than two levels or drop = FALSE, the returned object is a matrix; otherwise it is a vector.

If xvar is NULL, a list of such results for each raw input variable in the model is returned.

In either case, the returned object is of class "margEff.polywog".

Author(s)

Brenton Kenkel and Curtis S. Signorino

Examples

## Using occupational prestige data
data(Prestige, package = "carData")
Prestige <- transform(Prestige, income = income / 1000)

## Fit a polywog model
## (note: using low convergence threshold to shorten computation time of the
## example, *not* recommended in practice!)
set.seed(22)
fit1 <- polywog(prestige ~ education + income | type,
                data = Prestige,
                degree = 2,
                thresh = 1e-4)

## Compute marginal effects for all variables
me1 <- margEff(fit1)
summary(me1)  # type was included linearly, hence constant effects

## Plotting density of the results
plot(me1)

## Can do the same when just examining a single variable
me2 <- margEff(fit1, xvar = "income")
summary(me2)
plot(me2)
## Using occupational prestige data
data(Prestige, package = "carData")
Prestige <- transform(Prestige, income = income / 1000)

## Fit a polywog model
## (note: using low convergence threshold to shorten computation time of the
## example, *not* recommended in practice!)
set.seed(22)
fit1 <- polywog(prestige ~ education + income | type,
                data = Prestige,
                degree = 2,
                thresh = 1e-4)

## Compute marginal effects for all variables
me1 <- margEff(fit1)
summary(me1)  # type was included linearly, hence constant effects

## Plotting density of the results
plot(me1)

## Can do the same when just examining a single variable
me2 <- margEff(fit1, xvar = "income")
summary(me2)
plot(me2)

Model frame of a polywog model

Description

Extracts the model frame from a fitted polywog model, as model.frame.lm does for a fitted lm model.

Usage

## S3 method for class 'polywog'
model.frame(formula, ...)
## S3 method for class 'polywog'
model.frame(formula, ...)

Arguments

`formula`	a fitted model of class `"polywog"` (the argument is named `formula` for consistency with the generic function `model.frame`)
`...`	other arguments, currently ignored (but may later be adapted for use as in `model.frame.lm`)

Value

A data frame containing the variables used to fit the model, with additional attributes (e.g., "terms") used to construct a model matrix.

Author(s)

Brenton Kenkel and Curtis S. Signorino

Model matrix of a polywog model

Description

Constructs the design matrix used to fit a polywog model, similar to model.matrix.lm.

Usage

## S3 method for class 'polywog'
model.matrix(object, type = c("raw", "expanded"), ...)
## S3 method for class 'polywog'
model.matrix(object, type = c("raw", "expanded"), ...)

Arguments

`object`	a fitted model of class `"polywog"`
`type`	`"raw"`, the default, returns the non-expanded model matrix with no intercept (same number of columns as `object$polyTerms`). `"expanded"` returns the polynomial expansion used in fitting (number of columns equals `length(object$coefficients)`).
`...`	other arguments to be passed to further methods (typically only used internally)

Details

There are two types of model matrix a user might want to construct. First, there is the matrix of the raw input terms that go into the eventual polynomial expansion. Such a matrix can be obtained by using type = "raw" (the default). The other form of the model matrix is the full polynomial expansion, where each column contains some power of the raw inputs. This can be obtained by using type = "expanded".

Value

The design matrix of the specified model, consisting either of raw terms or the full polynomial expansion depending on the type argument.

Author(s)

Brenton Kenkel and Curtis S. Signorino

Plot marginal effects

Description

Generates density plots of the observationwise marginal effects computed by margEff.polywog.

Usage

## S3 method for class 'margEff.polywog'
plot(x, ...)
## S3 method for class 'margEff.polywog'
plot(x, ...)

Arguments

`x`	output of `margEff.polywog`.
`...`	plotting parameters to be passed to `plot.density`.

Value

Data frame containing the variables whose densities were plotted, invisibly.

Author(s)

Brenton Kenkel and Curtis S. Signorino

Univariate and bivariate fitted value plots

Description

Generates plots of the relationship between input variables and the expected value of the outcome, using predVals as a backend.

Usage

## S3 method for class 'polywog'
plot(x, which = NULL, ask = FALSE, auto.set.par = TRUE,
  interval = TRUE, level = 0.95, FUN3D = c("contour", "filled.contour",
  "wireframe", "persp3d"), control.plot = list(), ...)
## S3 method for class 'polywog'
plot(x, which = NULL, ask = FALSE, auto.set.par = TRUE,
  interval = TRUE, level = 0.95, FUN3D = c("contour", "filled.contour",
  "wireframe", "persp3d"), control.plot = list(), ...)

Arguments

`x`	a fitted model of class `"polywog"`, typically the output of `polywog`.
`which`	selection of variables to plot: a character vector containing one or two names of raw input variables (see `x$varNames`). May also be a numeric vector corresponding to indices of `x$varNames`. If `which = NULL`, a plot of each individual term will be generated.
`ask`	logical: whether to display an interactive menu of terms to select.
`auto.set.par`	logical: whether to temporarily change the graphics parameters so that multiple plots are displayed in one window (e.g., each univariate plot when `which = NULL`).
`interval`	logical: whether to display bootstrap confidence intervals around each fitted value. Not available for bivariate plots unless `FUN3d = "persp3d"`.
`level`	confidence level for the intervals.
`FUN3D`	which plotting function to use to generate bivariate plots. Valid options include `"contour"` (the default) and `"filled.contour"`; `"wireframe"`, which requires the lattice package; and `"persp3d"`, which requires the rgl package.
`control.plot`	list of arguments to be passed to the underlying plotting functions (e.g., axis labels and limits).
`...`	additional arguments to be passed to `predVals`.

Details

By default, a univariate plot generated by plot.polywog shows the relationship between the selected input variable and the expected outcome while holding all other covariates at "central" values (as in predVals). The values that the other variables are held out can be changed by supplying additional arguments to ..., as in the examples below.

Similarly, a bivariate plot shows the relationship between two input variables and the expected outcome while holding all else fixed. If either variable is binary or categorical, the plot will show the relationship between one variable and the expected outcome across each value/level of the other.

Value

An object of class preplot.polywog, invisibly. This is a data frame generated by predVals that contains all information used in plotting.

Author(s)

Brenton Kenkel and Curtis S. Signorino

Examples

## Using occupational prestige data
data(Prestige, package = "carData")
Prestige <- transform(Prestige, income = income / 1000)

## Fit a polywog model with bootstrap iterations
## (note: using low convergence threshold to shorten computation time of the
## example, *not* recommended in practice!)
set.seed(22)
fit1 <- polywog(prestige ~ education + income + type,
                data = Prestige,
                degree = 2,
                boot = 5,
                thresh = 1e-4)

## All univariate relationships
plot(fit1, n = 20)

## Predicted prestige across occupational categories
plot(fit1, which = "type",
     control.plot = list(xlab = "occupational category"))

## Predicted prestige by education across occupational categories
plot(fit1, which = c("education", "type"), n = 20)

## Joint effect of education and income
plot(fit1, which = c("education", "income"), n = 10)

## Bring up interactive menu
## Not run: 
plot(fit1, ask = TRUE)

  # displays menu:
  # Select one or two variable numbers (separated by spaces), or 0 to exit:

  # 1: education
  # 2: income
  # 3: type

## End(Not run)
## Using occupational prestige data
data(Prestige, package = "carData")
Prestige <- transform(Prestige, income = income / 1000)

## Fit a polywog model with bootstrap iterations
## (note: using low convergence threshold to shorten computation time of the
## example, *not* recommended in practice!)
set.seed(22)
fit1 <- polywog(prestige ~ education + income + type,
                data = Prestige,
                degree = 2,
                boot = 5,
                thresh = 1e-4)

## All univariate relationships
plot(fit1, n = 20)

## Predicted prestige across occupational categories
plot(fit1, which = "type",
     control.plot = list(xlab = "occupational category"))

## Predicted prestige by education across occupational categories
plot(fit1, which = c("education", "type"), n = 20)

## Joint effect of education and income
plot(fit1, which = c("education", "income"), n = 10)

## Bring up interactive menu
## Not run: 
plot(fit1, ask = TRUE)

  # displays menu:
  # Select one or two variable numbers (separated by spaces), or 0 to exit:

  # 1: education
  # 2: income
  # 3: type

## End(Not run)

Polynomial regression with oracle variable selection

Description

Fits a regression model using a polynomial basis expansion of the input variables, with penalization via the adaptive LASSO or SCAD to provide oracle variable selection.

Usage

polywog(formula, data, subset, weights, na.action, degree = 3,
  family = c("gaussian", "binomial"), method = c("alasso", "scad"),
  penwt.method = c("lm", "glm"), unpenalized = character(0),
  .parallel = FALSE, boot = 0, control.boot = control.bp(.parallel =
  .parallel), lambda = NULL, nlambda = 100, lambda.min.ratio = 1e-04,
  nfolds = 10, foldid = NULL, thresh = ifelse(method == "alasso", 1e-07,
  0.001), maxit = ifelse(method == "alasso", 1e+05, 5000), model = TRUE,
  X = FALSE, y = FALSE)
polywog(formula, data, subset, weights, na.action, degree = 3,
  family = c("gaussian", "binomial"), method = c("alasso", "scad"),
  penwt.method = c("lm", "glm"), unpenalized = character(0),
  .parallel = FALSE, boot = 0, control.boot = control.bp(.parallel =
  .parallel), lambda = NULL, nlambda = 100, lambda.min.ratio = 1e-04,
  nfolds = 10, foldid = NULL, thresh = ifelse(method == "alasso", 1e-07,
  0.001), maxit = ifelse(method == "alasso", 1e+05, 5000), model = TRUE,
  X = FALSE, y = FALSE)

Arguments

`formula`	model formula specifying the response and input variables. See "Details" for more information.
`data`	a data frame, list or environment containing the variables specified in the model formula.
`subset`	an optional vector specifying a subset of observations to be used in fitting.
`weights`	an optional vector specifying weights for each observation to be used in fitting.
`na.action`	a function specifying what to do with observations containing `NA`s (default `na.omit`).
`degree`	integer specifying the degree of the polynomial expansion of the input variables.
`family`	`"gaussian"` (default) or `"binomial"` for logistic regression (binary response only).
`method`	variable selection method: `"alasso"` (default) for adaptive LASSO or `"scad"` for SCAD. You can also select `method = "none"` to return the model matrix and other information without fitting.
`penwt.method`	estimator for obtaining first-stage estimates in logistic models when `method = "alasso"`: `"lm"` (default) for a linear probability model, `"glm"` for logistic regression.
`unpenalized`	names of model terms to be exempt from the adaptive penalty (only available when `method = "alasso"`).
`.parallel`	logical: whether to perform k-fold cross-validation in parallel (only available when `method = "alasso"`). See "Details" below for more information on parallel computation.
`boot`	number of bootstrap iterations (0 for no bootstrapping).
`control.boot`	list of arguments to be passed to `bootPolywog` when bootstrapping; see `control.bp`.
`lambda`	a vector of values from which the penalty factor is to be selected via k-fold cross-validation. `lambda` is left unspecified by default, in which case a sequence of values is generated automatically, controlled by the `nlambda` and `lambda.min.ratio` arguments. Naturally, k-fold cross-validation is skipped if `lambda` contains exactly one value.
`nlambda`	number of values of the penalty factor to examine via cross-validation if `lambda` is not specified in advance; see "Details".
`lambda.min.ratio`	ratio of the lowest value to the highest in the generated sequence of values of the penalty factor if `lambda` is not specified; see "Details".
`nfolds`	number of folds to use in cross-validation to select the penalization factor.
`foldid`	optional vector manually assigning fold numbers to each observation used for fitting (only available when `method = "alasso"`).
`thresh`	convergence threshold, passed as the `thresh` argument to `glmnet` when `method = "alasso"` and as the `eps` argument to `ncvreg` when `method = "scad"`.
`maxit`	maximum number of iterations to allow in adaptive LASSO or SCAD fitting.
`model`	logical: whether to include the model frame in the returned object.
`X`	logical: whether to include the raw design matrix (i.e., the matrix of input variables prior to taking their polynomial expansion) in the returned object.
`y`	logical: whether to include the response variable in the returned object.

Details

The design matrix for the regression is a polynomial basis expansion of the matrix of raw input variables. This includes all powers and interactions of the input variables up to the specified degree. For example, the following terms will be included in polywog(y ~ x1 + x2, degree = 3, ...):

terms of degree 0: intercept
terms of degree 1: x1, x2
terms of degree 2: x1^2, x2^2, x1*x2
terms of degree 3: x1^3, x2^3, x1*x2^2, x1^2*x2

To exclude certain terms from the basis expansion, use a model formula like y ~ x1 + x2 | z1 + z2. Only the degree 1 terms of z1 and z2 will be included.

It is possible that the "raw" basis expansion will be rank-deficient, such as if there are binary input variables (in which case $x_i = x_i^n$ for all $n > 0$ ). The procedure detects collinearity via qr and removes extraneous columns before fitting.

For both the adaptive LASSO and SCAD, the penalization factor $\lambda$ is chosen by k-fold cross-validation. The selected value minimizes the average mean squared error of out-of-sample fits. (To select both $\lambda$ and the polynomial degree simultaneously via cross-validation, see cv.polywog.)

The cross-validation process may be run in parallel via foreach by registering an appropriate backend and specifying .parallel = TRUE. The appropriate backend is system-specific; see foreach for information on selecting and registering a backend. The bootstrap iterations may also be run in parallel by specifying control.boot = control.bp(.parallel = TRUE).

Value

An object of class "polywog", a list containing:

coefficients

the estimated coefficients.

lambda

value of the penalty factor $\lambda$ used to fit the final model.

lambda.cv

a list containing the results of the cross-validation procedure used to select the penalty factor:

lambda: values of the penalty factor tested in cross-validation.
cvError: out-of-fold prediction error corresponding to each value of lambda.
lambdaMin: value of lambda with the minimal cross-validation error.
errorMin: minimized value of the cross-validation error.

fitted.values

the fitted mean values for each observation used in fitting.

lmcoef

coefficients from an unpenalized least-squares regression of the response variable on the polynomial expansion of the input variables.

penwt

adaptive weight given to each term in the LASSO penalty (NULL for models fit via SCAD).

formula

model formula, as a Formula object.

degree

degree of the polynomial basis expansion.

family

model family, "gaussian" or "binomial".

weights

observation weights if specified.

method

the specified regularization method.

penwt.method

the specified method for calculating the adaptive LASSO weights (NULL for models fit via SCAD).

unpenalized

logical vector indicating which terms were not included in the LASSO penalty.

thresh

convergence threshold used in fitting.

maxit

iteration limit used in fitting.

terms

the terms object used to construct the model frame.

polyTerms

a matrix indicating the power of each raw input term (columns) in each term of the polynomial expansion used in fitting (rows).

nobs

the number of observations used to fit the model.

na.action

information on how NA values in the input data were handled.

xlevels

levels of factor variables used in fitting.

varNames

names of the raw input variables included in the model formula.

call

the original function call.

model

if model = TRUE, the model frame used in fitting; otherwise NULL.

X

if X = TRUE, the raw model matrix (i.e., prior to taking the polynomial expansion); otherwise NULL. For calculating the expanded model matrix, see model.matrix.polywog.

y

if y = TRUE, the response variable used in fitting; otherwise NULL.

boot.matrix

if boot > 0, a sparse matrix of class "dgCMatrix" where each column is the estimate from a bootstrap replicate. See bootPolywog for more information on bootstrapping.

Author(s)

Brenton Kenkel and Curtis S. Signorino

References

Brenton Kenkel and Curtis S. Signorino. 2012. "A Method for Flexible Functional Form Estimation: Bootstrapped Basis Regression with Variable Selection." Typescript, University of Rochester.

Examples

## Using occupational prestige data
data(Prestige, package = "carData")
Prestige <- transform(Prestige, income = income / 1000)

## Fit a polywog model with bootstrap iterations
## (note: using low convergence threshold to shorten computation time of the
## example, *not* recommended in practice!)
set.seed(22)
fit1 <- polywog(prestige ~ education + income + type,
                data = Prestige,
                degree = 2,
                boot = 5,
                thresh = 1e-4)

## Basic information
print(fit1)
summary(fit1)

## See how fitted values change with education holding all else fixed
predVals(fit1, "education", n = 10)

## Plot univariate relationships
plot(fit1)

## Use SCAD instead of adaptive LASSO
fit2 <- update(fit1, method = "scad", thresh = 1e-3)
cbind(coef(fit1), coef(fit2))
## Using occupational prestige data
data(Prestige, package = "carData")
Prestige <- transform(Prestige, income = income / 1000)

## Fit a polywog model with bootstrap iterations
## (note: using low convergence threshold to shorten computation time of the
## example, *not* recommended in practice!)
set.seed(22)
fit1 <- polywog(prestige ~ education + income + type,
                data = Prestige,
                degree = 2,
                boot = 5,
                thresh = 1e-4)

## Basic information
print(fit1)
summary(fit1)

## See how fitted values change with education holding all else fixed
predVals(fit1, "education", n = 10)

## Plot univariate relationships
plot(fit1)

## Use SCAD instead of adaptive LASSO
fit2 <- update(fit1, method = "scad", thresh = 1e-3)
cbind(coef(fit1), coef(fit2))

Predict method for polywog objects

Description

Generates fitted values, including bootstrap confidence intervals, for in- and out-of-sample data from a fitted polywog model.

Usage

## S3 method for class 'polywog'
predict(object, newdata, type = c("link", "response"),
  interval = FALSE, level = 0.95, bag = FALSE, na.action = na.pass, ...)
## S3 method for class 'polywog'
predict(object, newdata, type = c("link", "response"),
  interval = FALSE, level = 0.95, bag = FALSE, na.action = na.pass, ...)

Arguments

`object`	a fitted model of class `"polywog"`, typically the output of `polywog`.
`newdata`	an optional data frame containing observations for which fitted values should be computed. If not specified, fitted values are generated for the data used to fit the model.
`type`	specifies whether the fitted values should be generated on the link scale ( $X \beta$ ) or in terms of the expected value of the response variable. These only differ for binomial family models.
`interval`	logical: whether to calculate bootstrap confidence intervals for each fitted value.
`level`	confidence level for the intervals.
`bag`	logical: whether to use "bootstrap aggregation" to generate the main fitted values (if `FALSE`, they are calculated from the main model fit).
`na.action`	a function specifying what to do with observations in `newdata` containing `NA`s (default `na.pass`). See "Details".
`...`	other arguments, currently ignored.

Value

If interval = TRUE, a matrix containing each fitted value and its confidence interval. Otherwise, a vector containing the fitted values.

Author(s)

Brenton Kenkel and Curtis S. Signorino

Easy computation of fitted values

Description

User-friendly generation of fitted values and their confidence intervals from models of class "polywog", using the "observed-value approach" advocated by Hanmer and Kalkan (2013).

Usage

predVals(model, xvars, data = model$model, xlims = list(), n = 100,
  interval = TRUE, level = 0.95, maxrows = 10000, report = FALSE,
  .parallel = FALSE, ...)
predVals(model, xvars, data = model$model, xlims = list(), n = 100,
  interval = TRUE, level = 0.95, maxrows = 10000, report = FALSE,
  .parallel = FALSE, ...)

Arguments

`model`	a fitted model of class `"polywog"`, typically the output of `polywog`.
`xvars`	a character vector containing names of raw input variables (from `model$varNames`). Partial matches are allowed.
`data`	data frame to treat as the observed sample (defaults to the data used to fit the supplied model)
`xlims`	named list of limits for the evaluation grid for each continuous variable in `xvars`. If not given, the variable's observed range is used.
`n`	number of grid points at which to evaluate each continuous variable in `xvars`.
`interval`	logical: whether to compute bootstrap confidence intervals for each fitted value.
`level`	confidence level for the intervals.
`maxrows`	maximum number of rows of output. Used to prevent accidental memory overruns when `xvars` contains more than two continuous variables.
`report`	logical: whether to print a status bar. Not available if `.parallel = TRUE`.
`.parallel`	logical: whether to perform bootstrap iterations in parallel using `foreach`. See the "Details" section of the `bootPolywog` documentation page for more on parallel computation.
`...`	other arguments, currently ignored

Details

predVals allows users to examine the estimated effects of input variables on the expected outcome using the coefficients returned by polywog. The procedure is designed so that, for a preliminary analysis, the user can simply specify the fitted model and the independent variable of interest, and quickly obtain predicted values.

The predicted values are generated according to Hanmer and Kalkan's (2013) observed-value approach, which takes the form of a nested loop. When xvars contains a single variable $X_m$ , the procedure is as follows:

For each level $x$ of $X_m$ in data (if $X_m$ is discrete) or each element $x$ of a grid over the range of $X_m$ in data (if $X_m$ is continuous):
1. For each observation $i$ of data:
  1. Set $X_{mi} = x$ , while holding all other variables $X_{-mi}$ at their observed levels
  2. Compute the predicted value of $Y_i$ for the modified observation $i$ , using the estimated model coefficients (as in predict.polywog)
2. The predicted value of $Y$ given $X_m = x$ is the average of the predictions computed in the previous step

This observed-value approach provides a better estimate of population average effects for nonlinear models than does the traditional approach, which is to vary $X_m$ across its levels/range while holding each other covariate to its mean or median in data (Hanmer and Kalkan 2013).

When xvars consists of multiple variables $X_1, \ldots, X_M$ , the predVals procedure is the same, except the outer loop is over every combination of their levels in data.

All confidence intervals are generated via the bootstrap. Specifically, predVals repeats the above procedure for each set of bootstrap coefficients and computes order statistics of the resulting set of averages (for each combination of levels of xvars). If model does not have a boot.matrix element (see bootPolywog), confidence intervals will not be computed.

Value

A data frame containing the fitted values and confidence intervals (if requested) for each combination of covariate values.

The returned data frame also inherits from class "preplot.polywog". This is used by plot.polywog, which calls predVals to compute the values to plot.

Author(s)

Brenton Kenkel and Curtis S. Signorino

References

Michael J. Hanmer and Kerem Ozan Kalkan. 2013. "Behind the Curve: Clarifying the Best Approach to Calculating Predicted Probabilities and Marginal Effects from Limited Dependent Variable Models." American Journal of Political Science 57(1):263–277.

Examples

## Using occupational prestige data
data(Prestige, package = "carData")
Prestige <- transform(Prestige, income = income / 1000)

## Fit a polywog model with bootstrap iterations
## (note: using low convergence threshold to shorten computation time of the
## example, *not* recommended in practice!)
set.seed(22)
fit1 <- polywog(prestige ~ education + income + type,
                data = Prestige,
                degree = 2,
                boot = 5,
                thresh = 1e-4)

## Predicted prestige across occupational categories
predVals(fit1, "type")

## Predicted prestige by education
predVals(fit1, "education", n = 10)

## Plotting
pred_income <- predVals(fit1, "income", n = 10)
plot(pred_income)
## Using occupational prestige data
data(Prestige, package = "carData")
Prestige <- transform(Prestige, income = income / 1000)

## Fit a polywog model with bootstrap iterations
## (note: using low convergence threshold to shorten computation time of the
## example, *not* recommended in practice!)
set.seed(22)
fit1 <- polywog(prestige ~ education + income + type,
                data = Prestige,
                degree = 2,
                boot = 5,
                thresh = 1e-4)

## Predicted prestige across occupational categories
predVals(fit1, "type")

## Predicted prestige by education
predVals(fit1, "education", n = 10)

## Plotting
pred_income <- predVals(fit1, "income", n = 10)
plot(pred_income)

Summarize marginal effects

Description

Generates a table of the average marginal effects and quartiles (or other order statistics if requested) from a "margEff.polywog" object.

Usage

## S3 method for class 'margEff.polywog'
summary(object, probs = seq(0, 1, by = 0.25), ...)
## S3 method for class 'margEff.polywog'
summary(object, probs = seq(0, 1, by = 0.25), ...)

Arguments

`object`	output of `margEff.polywog`.
`probs`	order statistics to display.
`...`	other arguments, currently ignored.

Value

Table of results.

Author(s)

Brenton Kenkel and Curtis S. Signorino

Summarize a fitted polywog model

Description

Generates a "regression table" to summarize the fitted model, including coefficients along with their bootstrapped standard errors and confidence intervals. If the fitted model does not have a boot.matrix element, the output will contain NAs for the standard errors, and confidence intervals will not be displayed.

Usage

## S3 method for class 'polywog'
summary(object, level = 0.95, prop0 = FALSE, ...)
## S3 method for class 'polywog'
summary(object, level = 0.95, prop0 = FALSE, ...)

Arguments

`object`	a fitted model of class `"polywog"`, typically the output of `polywog`.
`level`	width of the bootstrap confidence interval to compute for the model coefficients.
`prop0`	logical: whether to print the proportion of bootstrap iterations in which each coefficient was estimated as exactly 0. This may be informative but should not be interpreted as a p-value.
`...`	other arguments, currently ignored.

Value

An object of class "summary.polywog" whose elements are the "regression table" (coefficients) and additional information from the original fitted model.

Author(s)

Brenton Kenkel and Curtis S. Signorino

Package 'polywog'

Help Index

Bootstrapped basis regression with oracle model selection

Description

Acknowledgements

References

Bootstrap a fitted polywog model

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Cross-validation of polynomial degree and penalization factor

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Marginal effects for polywog models

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Model frame of a polywog model

Description

Usage

Arguments

Value

Author(s)

See Also

Model matrix of a polywog model

Description

Usage

Arguments

Details

Value

Author(s)

Plot marginal effects

Description

Usage

Arguments

Value

Author(s)

Univariate and bivariate fitted value plots

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Polynomial regression with oracle variable selection

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Predict method for polywog objects

Description

Usage

Arguments

Value

Author(s)

See Also

Easy computation of fitted values

Description

Usage

Arguments