Package 'mctest'

Title: Multicollinearity Diagnostic Measures
Description: The package computes popular and widely used multicollinearity diagnostic measures <doi:10.17576/jsm-2019-4809-26> and <doi:10.32614/RJ-2016-062>. It also indicates which regressors may be the cause of collinearity.
Authors: Imdad Ullah Muhammad [aut, cre], Aslam Muhammad [aut, ctb]
Maintainer: Imdad Ullah Muhammad
License: GPL (>= 2)
Version: 1.3.1
Built: 2024-11-01 05:39:21 UTC
Source: https://github.com/cran/mctest

Overall and Individual Multicollinearity Diagnostic Measures

Description

R package for computing popular and widely used multicollinearity diagnostic measures.

Details

This package contains functions for computing overall and individual multicollinearity diagnostic measures. The overall measures are the determinant of the correlation matrix, the R-squared from the regression of y on all x's, the Farrar and Glauber chi-square test for the strength of collinearity over the complete set of regressors, the condition index, the sum of reciprocals of the eigenvalues, Theil's indicator and the Red indicator. The individual measures are Klein's rule, the variance inflation factor (VIF), tolerance (TOL), corrected VIF (CVIF), Leamer's method, the F and R^2 relation, the Farrar and Glauber F-test, and the IND1 and IND2 indicators proposed by the authors. The package also indicates which regressors may be the cause of collinearity. VIF values and eigenvalues can be plotted, and other statistics such as the correlation matrix, eigenvalues and condition indices are also available.

For a complete list of functions, use library(help="mctest").

Author(s)

Muhammad Imdad Ullah, Muhammad Aslam


Eigenvalues and Variance Decomposition Proportion

Description

Computes eigenvalues, condition indices and variance decomposition proportions of X'X or its related correlation matrix R (see Belsley et al. (1980) <doi: 10.1007/BF00426854>; Belsley, 1991; Kendall, 1957 and Silvey, 1969).

Usage

eigprop(mod, na.rm = TRUE, Inter = TRUE, prop = 0.5, ...)

Arguments

mod

A model object, not necessarily of class lm.

na.rm

Whether to remove missing observations.

Inter

Whether to include or exclude the intercept term; by default Inter = TRUE.

prop

Default threshold for variance proportions, prop = 0.5.

...

Extra argument(s) if used will be ignored.

Details

The eigprop function can be used to detect multicollinearity among regressors. It computes the eigenvalues, condition indices and variance decomposition proportions of the regression coefficients. To identify the linear dependencies associated with each eigenvalue, eigprop compares the variance proportions with a threshold (0.5 by default) and displays, for each row and column, the proportions that exceed it, if any. If Inter = FALSE, the eigenvalues, condition indices and variance proportions are computed without the intercept term. A list object of class "eigp" is returned:

Value

The returned "eigp" object contains:

ev

A vector of eigenvalues. With the default Inter = TRUE, the eigenvalues are computed with the intercept term included in the X matrix.

ci

A vector of condition indices. With the default Inter = TRUE, the condition indices are computed with the intercept term included in the X matrix.

call

The matched call.

Inter

Logical; if TRUE (the default), eigenvalues, condition indices and variance proportions are computed with the intercept term included.

pi

A matrix of variance decomposition proportions. With the default Inter = TRUE, the proportions are computed with the intercept term included in the X matrix.

prop

Threshold proportion used for comparison (default 0.5).
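The sketch below shows, in base R, how eigenvalues, condition indices and variance decomposition proportions of the kind eigprop reports can be computed, using the unit-length column scaling of Belsley et al. (1980). It is an illustration under assumptions, not the package's code: the helper belsley_diag is hypothetical, the Hald values are typed in from the published table so the block runs without mctest, and eigprop's internals (for example its exact scaling) may differ. The intercept column is included, matching the default Inter = TRUE.

```r
# Hald cement data, typed in so the sketch runs without mctest installed
hald <- data.frame(
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4),
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12)
)

# Hypothetical helper: Belsley-style diagnostics of a column-scaled design matrix
belsley_diag <- function(X, prop = 0.5) {
  Xs  <- sweep(X, 2, sqrt(colSums(X^2)), "/")  # scale columns to unit length
  s   <- svd(Xs)                               # singular values sorted decreasing
  ev  <- s$d^2                                 # eigenvalues of Xs'Xs
  ci  <- sqrt(max(ev) / ev)                    # condition indices
  phi <- sweep(s$v^2, 2, ev, "/")              # phi[k, j] = v_kj^2 / ev_j
  pi_mat <- t(phi / rowSums(phi))              # rows: components, cols: coefficients
  colnames(pi_mat) <- colnames(X)
  list(ev = ev, ci = ci, pi = pi_mat,
       flagged = which(pi_mat > prop, arr.ind = TRUE))
}

X  <- model.matrix(y ~ X1 + X2 + X3 + X4, data = hald)  # includes "(Intercept)"
bd <- belsley_diag(X)
round(cbind(ci = bd$ci, bd$pi), 3)
bd$flagged  # (component, coefficient) pairs whose proportion exceeds 0.5
```

Each column of the proportion matrix sums to 1, so a large condition index paired with two or more large proportions in the same row points to the coefficients involved in that near dependency.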

Note

Missing values are removed by default (na.rm = TRUE). The collinearity diagnostics cannot be computed if missing values remain in the data set.

Author(s)

Muhammad Imdad Ullah, Muhammad Aslam

References

Belsley, D. A. A Guide to Using the Collinearity Diagnostics. Computer Science in Economics and Management, 4(1): 33–50, 1991.

Belsley, D. A., Kuh, E., and Welsch, R. E. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, New York, 1980.

Imdad, M. U. Addressing Linear Regression Models with Correlated Regressors: Some Package Development in R (Doctoral Thesis, Department of Statistics, Bahauddin Zakariya University, Multan, Pakistan), 2017.

Imdadullah, M., Aslam, M., and Altaf, S. mctest: An R Package for Detection of Collinearity Among Regressors. The R Journal, 8(2):499–509, 2016.

Silvey, S. D. Multicollinearity and imprecise estimation. Journal of the Royal Statistical Society, Series B (Methodological), 31(3):539–552, 1969.

See Also

Overall collinearity diagnostics omcdiag, Individual collinearity diagnostics imcdiag

Examples

## Hald Cement data
data(Hald)
model <- lm(y~X1+X2+X3+X4, data = as.data.frame(Hald))

# with Intercept term
eigprop(model)

# without Intercept term
eigprop(model, Inter = FALSE)

# different proportion threshold
eigprop(model, prop = 0.45)

# only variance proportions
eigprop(model)$pi

# only condition indices
eigprop(model)$ci

# only eigenvalues
eigprop(model)$ev

Portland Cement benchmark of Hald (1952)

Description

Heat evolved during setting of 13 cement mixtures of four basic ingredients. Each ingredient percentage appears to be rounded down to a full integer. The sum of the four mixture percentages varies from a maximum of 99% to a minimum of 95%. If all four regressor X-variables always summed to 100%, the centered X-matrix would then be of rank only 3. Thus, the regression of heat on four X-percentages is ill-conditioned, with an approximate rank deficiency of MCAL = 1. The first column is the response and the remaining four columns are the predictors.

These are the Hald data as used by Hoerl, Kennard and Baldwin (1975). The data are also available in package wle.
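A quick base-R check of the near rank deficiency described above: the four percentages sum to between 95 and 99 for every mixture, and the correlation matrix of the regressors has one eigenvalue close to zero. The 13 rows are typed in from the published Hald table (the column names y and X1 to X4 are assumed to match data(Hald)) so the sketch runs without the package.

```r
# Hald cement data, typed in from the published table
hald <- data.frame(
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4),
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12)
)
regs <- c("X1", "X2", "X3", "X4")

# total percentage of the four ingredients in each mixture
pct_sum <- rowSums(hald[, regs])
range(pct_sum)  # 95 99

# eigenvalues of the regressor correlation matrix; the smallest is close to
# zero because X1 + X2 + X3 + X4 is nearly constant across mixtures
ev <- eigen(cor(hald[, regs]), symmetric = TRUE, only.values = TRUE)$values
ev
```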

Usage

data(Hald)

Format

A data frame with 13 observations on the following 5 variables.

y

Heat evolved in setting (cal/g), recorded to the nearest tenth.

X1

Integer percentage of 3CaO.Al2O3 in the mixture.

X2

Integer percentage of 3CaO.SiO2 in the mixture.

X3

Integer percentage of 4CaO.Al2O3.Fe2O3 in the mixture.

X4

Integer percentage of 2CaO.SiO2 in the mixture.

Source

Woods, H., Steinour, H. H., and Starke, H. R. Effect of Composition of Portland Cement on Heat Evolved During Hardening. Industrial and Engineering Chemistry, 24: 1207–1214, 1932.

References

Hoerl, A. E., Kennard, R. W., and Baldwin, K. F. Ridge Regression: Some Simulations. Communications in Statistics: Theory and Methods, 4: 105, 1975.

Examples

data(Hald)
y <- Hald[ , 1]
x <- Hald[ , -1]

Individual Multicollinearity Diagnostic Measures

Description

Computes several multicollinearity diagnostic measures for each regressor in the design matrix X. The individual measures include the variance inflation factor (VIF) (Marquardt, 1970), the Farrar F-test for determining multicollinearity (Farrar and Glauber, 1967), the auxiliary F-test for the relationship between F and R-squared (Gujarati and Porter, 2008), Leamer's method (Greene, 2000), the corrected VIF (CVIF) (Curto and Pinto, 2011) <doi: 10.1080/02664763.2010.505956>, Klein's rule (Klein, 1962), and the IND1 and IND2 indicators (Imdad et al., 2019) <https://doi.org/10.17576/jsm-2019-4809-26> proposed by the package authors.

Usage

imcdiag(mod, method = NULL, na.rm = TRUE, corr = FALSE, 
              vif = 10, tol = 0.1, conf = 0.95, cvif = 10, ind1 = 0.02, 
              ind2 = 0.7, leamer = 0.1, all = FALSE, ...)

Arguments

mod

A model object, not necessarily of class lm.

na.rm

Whether to remove missing observations.

method

A specific individual collinearity measure, such as VIF, CVIF or Leamer; for example, method="VIF".

corr

Whether to display correlation matrix or not, by default corr=FALSE.

vif

Default threshold for VIF measure, vif=10.

tol

Default threshold for TOL measure, tol=0.10.

conf

Default confidence level for Farrar's Wi test, conf=0.95.

cvif

Default threshold for CVIF measure, cvif=10.

ind1

Default threshold for IND1 indicator, ind1=0.02

ind2

Default threshold for IND2 indicator, ind2=0.7

leamer

Default threshold for Leamer's method, leamer=0.1.

all

If TRUE, returns all individual collinearity measures as a matrix of 0 (not detected) or 1 (detected).

...

Extra argument(s) if used will be ignored.

Details

The imcdiag function detects multicollinearity attributable to each x-variable, which is why these are called individual diagnostic measures. They include VIF, TOL, Klein's rule, the Farrar and Glauber F-test, the F and R^2 relation, Leamer's method, CVIF, IND1 and IND2. If the method argument is used (e.g. method="VIF"), the values of that measure for each regressor are displayed together with a decision on whether collinearity exists, indicated by 0 (collinearity is not detected by the method for that regressor) or 1 (collinearity is detected by the method for that regressor). If all=TRUE, all individual measures are displayed in a matrix of 0 (collinearity not detected) or 1 (collinearity detected).
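To make some of the individual measures concrete, this sketch computes VIF, TOL, Leamer's measure and Klein's rule from auxiliary regressions in base R. It is an illustration under stated assumptions, not imcdiag's implementation: Leamer's measure is taken in the textbook form that reduces to sqrt(1 - R_j^2), and the comparison directions used for the 0/1 decisions (VIF >= 10, TOL <= 0.1, Leamer <= 0.1) are assumed rather than taken from the package source. CVIF, IND1, IND2 and the F-based tests are omitted.

```r
# Hald cement data, typed in so the sketch runs without mctest installed
hald <- data.frame(
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4),
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12)
)
X <- as.matrix(hald[, c("X1", "X2", "X3", "X4")])
r2_model <- summary(lm(y ~ X1 + X2 + X3 + X4, data = hald))$r.squared

# R_j^2 from the auxiliary regression of each regressor on all the others
aux_r2 <- sapply(colnames(X), function(j) {
  summary(lm(X[, j] ~ X[, colnames(X) != j]))$r.squared
})

vif    <- 1 / (1 - aux_r2)    # variance inflation factor
tol    <- 1 - aux_r2          # tolerance, equal to 1 / VIF
leamer <- sqrt(tol)           # Leamer's measure written through R_j^2 (assumed form)
klein  <- aux_r2 > r2_model   # Klein's rule: R_j^2 exceeds the model R^2

# 0/1 decisions; the comparison directions are assumptions of this sketch
decisions <- cbind(
  VIF    = as.integer(vif >= 10),
  TOL    = as.integer(tol <= 0.1),
  Leamer = as.integer(leamer <= 0.1),
  Klein  = as.integer(klein)
)
rownames(decisions) <- colnames(X)
cbind(VIF = round(vif, 2), decisions)
```

A useful cross-check is that the VIFs equal the diagonal of the inverse correlation matrix of the regressors, diag(solve(cor(X))), which avoids fitting the auxiliary regressions.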

Value

This function detects multicollinearity using diagnostic measures available in the literature. It returns the value of each diagnostic measure together with a decision on whether that measure detects collinearity: 1 indicates that collinearity is detected and 0 that the measure could not detect it. A list object of class "imc" is returned:

x

A numeric matrix of regressors.

y

A vector of response variable.

idiags

The values of the requested individual measure when method is supplied (e.g. method="CVIF"); if method is not used, all individual diagnostics are returned.

method

Specific individual collinearity measure, such as VIF, TOL, CVIF, IND1, and IND2 etc.

corr

Logical, if FALSE (the default value) a correlation matrix will not be displayed.

R2

R-squared from the regression of the response variable y on all regressors X.

call

The matched call.

pval

The indices of the regressors found significant when their p-values from summary.lm are compared with 1 - conf.

all

If TRUE individual collinearity measures will be returned as a matrix of 0 or 1.

alldiag

Matrix of all individual collinearity measures indicated as either 0 (collinearity not detected) or 1 (collinearity detected) for each diagnostic measure and each regressor.

Note

Missing values are removed by default (na.rm = TRUE). The collinearity diagnostics cannot be computed if missing values remain in the data set.

Author(s)

Muhammad Imdad Ullah, Muhammad Aslam

References

Belsley, D. A. A Guide to Using the Collinearity Diagnostics. Computer Science in Economics and Management, 4(1): 33–50, 1991.

Belsley, D. A., Kuh, E., and Welsch, R. E. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, New York, 1980.

Chatterjee, S. and Hadi, A. S. Regression Analysis by Example. John Wiley & Sons, 4th edition, New York, 2006.

Curto, J. D. and Pinto, J. C. The Corrected VIF (CVIF). Journal of Applied Statistics, 38(7): 1499–1507, 2011.

Greene, W. H. Econometric Analysis. Prentice–Hall, Upper Saddle River, New Jersey, 4th edition, 2000.

Imdad, M. U. Addressing Linear Regression Models with Correlated Regressors: Some Package Development in R (Doctoral Thesis, Department of Statistics, Bahauddin Zakariya University, Multan, Pakistan), 2017.

Imdadullah, M., Aslam, M., and Altaf, S. mctest: An R Package for Detection of Collinearity Among Regressors. The R Journal, 8(2):499–509, 2016.

Imdad, M. U., Aslam, M., Altaf, S., and Ahmed, M. Some New Diagnostics of Multicollinearity in Linear Regression Model. Sains Malaysiana, 48(9): 2051–2060, 2019.

See Also

Overall collinearity diagnostic omcdiag, collinearity plot mc.plot

Examples

## Hald Cement data
data(Hald)
model <- lm(y~X1+X2+X3+X4, data = as.data.frame(Hald))

## all Individual measures
id<-imcdiag(model); id$idiags[,1]

# VIF measure with custom VIF threshold
imcdiag(model, method = "VIF", vif = 5)

# IND1 measure with custom IND1 threshold and correlation matrix
imcdiag(model, method="IND1", ind1=0.01, corr=TRUE)

# CVIF measure with custom CVIF threshold and correlation matrix
imcdiag(model, method = "CVIF", cvif = 5, corr = TRUE)

# Collinearity Diagnostic measures in matrix of 0 or 1
imcdiag(model, all = TRUE)
imcdiag(model, method = "VIF", all = TRUE)

## only VIF values without collinearity detection indication
imcdiag(model, method = "VIF")[[1]][,1]
plot(imcdiag(model, method = "VIF")[[1]][,1]) # vif plot

Plot of VIF and Eigenvalues

Description

Plots VIF values and eigenvalues for detecting multicollinearity among regressors. The VIF values and eigenvalues are also printed on the graphs. The eigenvalue plot can be drawn with or without the intercept term.

Usage

mc.plot(mod, Inter = FALSE, vif = 10, ev = 0.01, ...)

Arguments

mod

A model object, not necessarily of class lm.

Inter

Whether to include or exclude Intercept term, by default Inter=FALSE.

vif

VIF threshold, drawn as a horizontal line on the VIF plot. The default value is vif=10.

ev

Eigenvalue threshold, drawn as a horizontal line on the eigenvalue plot. The default value is ev=0.01.

...

Extra argument(s) if used will be ignored.

Details

The mc.plot function draws graphs of VIF values and eigenvalues for graphical detection of collinearity among regressors. Horizontal lines mark the VIF and eigenvalue thresholds used to judge multicollinearity.
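A base-graphics approximation of the two panels described above, assuming no intercept (the default Inter = FALSE) and the default thresholds vif = 10 and ev = 0.01. The layout, labels and colours are choices of this sketch, not the package's; eigenvalues are taken from the regressor correlation matrix, which may not match how mc.plot scales its matrix.

```r
# Hald cement data, typed in so the sketch runs without mctest installed
hald <- data.frame(
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4),
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12)
)
X <- as.matrix(hald[, c("X1", "X2", "X3", "X4")])

vif <- diag(solve(cor(X)))  # VIFs from the inverse correlation matrix
ev  <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values

op <- par(mfrow = c(1, 2))
# VIF panel: values printed above the points, dashed threshold line at 10
plot(vif, type = "b", xaxt = "n", ylim = c(0, 1.15 * max(vif)),
     xlab = "Regressor", ylab = "VIF", main = "VIF")
axis(1, at = seq_along(vif), labels = names(vif))
text(seq_along(vif), vif, labels = round(vif, 1), pos = 3, cex = 0.8)
abline(h = 10, lty = 2, col = "red")
# eigenvalue panel: dashed threshold line at 0.01
plot(ev, type = "b", ylim = c(0, 1.15 * max(ev)),
     xlab = "Eigenvalue index", ylab = "Eigenvalue", main = "Eigenvalues")
text(seq_along(ev), ev, labels = signif(ev, 3), pos = 3, cex = 0.8)
abline(h = 0.01, lty = 2, col = "red")
par(op)
```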

Value

Returns nothing; the function is called for its plots.

Author(s)

Muhammad Imdad Ullah, Muhammad Aslam

References

Belsley, D. A. A Guide to Using the Collinearity Diagnostics. Computer Science in Economics and Management, 4(1): 33–50, 1991.

Belsley, D. A., Kuh, E., and Welsch, R. E. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, New York, 1980.

Chatterjee, S. and Hadi, A. S. Regression Analysis by Example. John Wiley & Sons, 4th edition, New York, 2006.

Greene, W. H. Econometric Analysis. Prentice–Hall, Upper Saddle River, New Jersey, 4th edition, 2000.

Imdad, M. U. Addressing Linear Regression Models with Correlated Regressors: Some Package Development in R (Doctoral Thesis, Department of Statistics, Bahauddin Zakariya University, Multan, Pakistan), 2017.

Imdadullah, M., Aslam, M., and Altaf, S. mctest: An R Package for Detection of Collinearity Among Regressors. The R Journal, 8(2):499–509, 2016.

See Also

Overall collinearity diagnostic measures omcdiag, Individual collinearity diagnostic measures imcdiag

Examples

## Hald Cement data
data(Hald)
model <- lm(y~X1+X2+X3+X4, data = as.data.frame(Hald))

## plot with default threshold of VIF and Eigenvalues with no intercept
mc.plot(model)

## plot with default threshold of VIF and Eigenvalues with intercept
mc.plot(model, Inter = TRUE)

## plot with specific threshold of VIF and Eigenvalues with no intercept
mc.plot(model, vif = 5, ev = 20)

## plot with specific threshold of VIF and Eigenvalues with intercept
mc.plot(model, vif = 5, ev = 20, Inter = TRUE)

Multicollinearity diagnostic measures

Description

The mctest function displays overall, individual or both types of multicollinearity diagnostic measures, computed by the omcdiag and imcdiag functions, respectively.

Usage

mctest(mod, type=c("o","i","b"), na.rm=TRUE, Inter=TRUE, method=NULL,
        corr=FALSE, detr=0.01, red=0.5, theil=0.5, cn=30, vif=10, tol=0.1,
        conf=0.95, cvif=10, ind1=0.02, ind2=0.7, leamer=0.1, all=FALSE, ...)

Arguments

mod

A model object, not necessarily of class lm.

na.rm

Whether to remove missing observations.

Inter

Whether to include or exclude Intercept term. By default Inter=TRUE.

type

Displays overall ("o"), individual ("i") or both ("b") types of collinearity diagnostics. When neither method nor type is supplied, overall diagnostics are displayed by default together with eigenvalues and condition indices.

method

A specific individual collinearity measure, such as VIF, TOL, CVIF, Leamer, IND1 or IND2; for example, method="VIF".

corr

Whether to display the correlation matrix; by default corr=FALSE.

detr

Determinant default threshold, detr=0.01.

red

Red indicator default threshold, red=0.5.

theil

Theil's indicator default threshold, theil=0.5.

cn

Condition number default threshold, cn=30.

vif

Default threshold for VIF measure, vif=10.

conf

Default confidence level for Farrar's test, conf=0.95.

cvif

Default threshold for CVIF measure, cvif=10.

tol

Default threshold for TOL measure, tol=0.1.

ind1

Default threshold for IND1 indicator, ind1=0.02.

ind2

Default threshold for IND2 indicator, ind2=0.7.

leamer

Default threshold for Leamer's method, leamer=0.1.

all

If TRUE, returns all individual collinearity measures as a matrix of 0 (not detected) or 1 (detected).

...

Extra argument(s) if used will be ignored.

Note

Missing values are removed by default (na.rm = TRUE). The collinearity diagnostics cannot be computed if missing values remain in the data set.

Author(s)

Muhammad Imdad Ullah, Muhammad Aslam

References

Belsley, D. A. A Guide to Using the Collinearity Diagnostics. Computer Science in Economics and Management, 4(1): 33–50, 1991.

Belsley, D. A., Kuh, E., and Welsch, R. E. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, New York, 1980.

Chatterjee, S. and Hadi, A. S. Regression Analysis by Example. John Wiley & Sons, 4th edition, New York, 2006.

Greene, W. H. Econometric Analysis. Prentice–Hall, Upper Saddle River, New Jersey, 4th edition, 2000.

Imdad, M. U. Addressing Linear Regression Models with Correlated Regressors: Some Package Development in R (Doctoral Thesis, Department of Statistics, Bahauddin Zakariya University, Multan, Pakistan), 2017.

Imdadullah, M., Aslam, M., and Altaf, S. mctest: An R Package for Detection of Collinearity Among Regressors. The R Journal, 8(2):499–509, 2016.

Imdad, M. U., Aslam, M., Altaf, S., and Ahmed, M. Some New Diagnostics of Multicollinearity in Linear Regression Model. Sains Malaysiana, 48(9): 2051–2060, 2019.

See Also

overall collinearity diagnostics omcdiag, individual collinearity diagnostics imcdiag, collinearity plots mc.plot

Examples

## Hald Cement data
data(Hald)
model <- lm(y~X1+X2+X3+X4, data = as.data.frame(Hald))

## Overall diagnostic measures and eigenvalues with intercept term
mctest(model)

## Overall diagnostic measures and eigenvalues without intercept term
mctest(model, Inter=FALSE)

## all individual diagnostic measures
mctest(model, type="i")

## certain individual diagnostic measures with collinearity detection indication
VIF<-mctest(model, type="i", method="VIF")
VIF[[1]][,1] # named VIF values only

IND1<-mctest(model, type="i", method="IND1")
IND1[[1]][,1] # named IND1 values only

## all individual diagnostic measures with correlation matrix
mctest(model, type="i", corr=TRUE)

## VIF and correlation matrix with collinearity detection indication
mctest(model, type="i", method="VIF", corr=TRUE)

## both overall and individual collinearity diagnostics
mctest(model, type="b")
mctest(model, type="b", method="VIF", corr=TRUE)

## all overall measures and VIF
## VIF and CN desired threshold
## eigenvalues without intercept term
mctest(model, type="b", method="VIF", Inter=FALSE, vif=15, cn=35)

## Individual collinearity diagnostic measures in matrix of 0 or 1
mctest(model, all = TRUE)
mctest(model, method = "VIF", all = TRUE)
mctest(model, type="b", all = TRUE)

Overall Multicollinearity Diagnostics Measures

Description

Computes several overall measures of multicollinearity for the matrix of regressors. The overall measures include the determinant of the correlation matrix (Cooley and Lohnes, 1971), the Farrar chi-square test for the presence of multicollinearity (Farrar and Glauber, 1967), the Red indicator (Kovacs et al., 2005) <doi: 10.1111/j.1751-5823.2005.tb00156.x>, the sum of lambda-inverse values (Chatterjee and Price, 1977), Theil's indicator (Theil, 1971) and the condition number (Belsley et al., 1980) <doi: 10.1007/BF00426854>, with or without the intercept term.

Usage

omcdiag(mod, na.rm = TRUE, Inter = TRUE, detr = 0.01, red = 0.5,
                     conf = 0.95, theil = 0.5, cn = 30,...)

Arguments

mod

A model object, not necessarily of class lm.

na.rm

Whether to remove missing observations.

Inter

Whether to include or exclude Intercept term, by default Inter=TRUE.

detr

Determinant default threshold, detr=0.01.

red

red indicator default threshold, red=0.5.

conf

confidence level of Farrar Chi-Square test, conf=0.95.

theil

Theil's indicator default threshold, theil=0.5.

cn

condition number default threshold, cn=30.

...

Extra argument(s) if used will be ignored.

Details

This function detects multicollinearity using diagnostic measures available in the literature: the determinant of the correlation matrix, the Farrar chi-square test, the Red indicator, the sum of lambda-inverse values, Theil's indicator and the condition number.
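The overall measures listed above can be reproduced approximately from the correlation matrix R of the regressors and its eigenvalues. This sketch uses standard textbook formulas and is not omcdiag's code: the Farrar and Glauber statistic -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, the Red indicator sqrt(sum((lambda_i - 1)^2) / (p(p - 1))), the sum of 1/lambda_i, Theil's m = R^2 - sum_j (R^2 - R^2_{-j}), and the condition number sqrt(lambda_max / lambda_min). It works without an intercept, so its condition number will differ from omcdiag's default Inter = TRUE value, and the decision thresholds are left out.

```r
# Hald cement data, typed in so the sketch runs without mctest installed
hald <- data.frame(
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4),
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12)
)
X <- as.matrix(hald[, c("X1", "X2", "X3", "X4")])
n <- nrow(X); p <- ncol(X)
R <- cor(X)
lambda <- eigen(R, symmetric = TRUE, only.values = TRUE)$values

det_R    <- det(R)                                   # determinant of the correlation matrix
fg_chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det_R)  # Farrar-Glauber chi-square
fg_df    <- p * (p - 1) / 2
fg_pval  <- pchisq(fg_chisq, df = fg_df, lower.tail = FALSE)
red      <- sqrt(sum((lambda - 1)^2) / (p * (p - 1)))  # Red indicator, between 0 and 1
sli      <- sum(1 / lambda)                          # sum of lambda-inverse values
cn       <- sqrt(max(lambda) / min(lambda))          # condition number

# Theil's indicator: model R^2 minus the sum of each regressor's incremental R^2
r2 <- function(cols) summary(lm(hald$y ~ X[, cols, drop = FALSE]))$r.squared
r2_full <- r2(seq_len(p))
theil   <- r2_full - sum(sapply(seq_len(p), function(j) r2_full - r2(-j)))

round(c(det = det_R, chisq = fg_chisq, p.value = fg_pval, red = red,
        sli = sli, theil = theil, cn = cn), 4)
```

Two identities make useful sanity checks: the determinant equals the product of the eigenvalues, and the sum of lambda-inverse values equals the sum of the individual VIFs, since both are the trace of the inverse correlation matrix.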

The function also displays the value of each diagnostic measure with a decision on whether it detects multicollinearity: 1 indicates that multicollinearity is detected and 0 that the measure could not detect it. A list object of class "omc" is returned:

Value

odiags

Listing of all overall diagnostic measures.

Inter

Logical; if TRUE (the default), the condition number is computed with the intercept term included.

x

matrix of regressors.

call

The matched call.

Note

Missing values are removed by default (na.rm = TRUE). The collinearity diagnostics cannot be computed if missing values remain in the data set.

Author(s)

Muhammad Imdad Ullah, Muhammad Aslam

References

Belsley, D. A. A Guide to Using the Collinearity Diagnostics. Computer Science in Economics and Management, 4(1): 33–50, 1991.

Belsley, D. A., Kuh, E., and Welsch, R. E. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, New York, 1980.

Chatterjee, S. and Hadi, A. S. Regression Analysis by Example. John Wiley & Sons, 4th edition, New York, 2006.

Greene, W. H. Econometric Analysis. Prentice–Hall, Upper Saddle River, New Jersey, 4th edition, 2000.

Imdad, M. U. Addressing Linear Regression Models with Correlated Regressors: Some Package Development in R (Doctoral Thesis, Department of Statistics, Bahauddin Zakariya University, Multan, Pakistan), 2017.

Imdadullah, M., Aslam, M., and Altaf, S. mctest: An R Package for Detection of Collinearity Among Regressors. The R Journal, 8(2):499–509, 2016.

Kovacs, P., Petres, T., and Toth, L. A New Measure of Multicollinearity in Linear Regression Models. International Statistical Review / Revue Internationale de Statistique, 73(3): 405–412, 2005.

See Also

Individual collinearity diagnostic measure imcdiag, Eigenvalues and variance decomposition proportion eigprop

Examples

## Hald Cement data
data(Hald)
model <- lm(y~X1+X2+X3+X4, data = as.data.frame(Hald))

## all overall diagnostic measures and eigenvalues with intercept
od<-omcdiag(model)

## all overall diagnostic measures and eigenvalues without intercept
omcdiag(model, Inter=FALSE)

## all overall diagnostic measures and eigenvalues with intercept
## with different determinant and confidence level threshold

omcdiag(model, detr=0.001, conf=0.99)

## returns the determinant of correlation matrix |X'X|
omcdiag(model)[1]