
Statistical methods for variable selection and prediction can be challenging when covariates are missing. We propose an elastic net (MI-WENet) method that is based on stacked multiple imputation (MI) data and a weighting scheme for each observation in the stacked data set. In the MI-WENet method, MI accounts for sampling and imputation uncertainty for missing values, and the weight accounts for the observed information. Extensive numerical simulations are carried out to compare the proposed MI-WENet method with competing alternatives such as the SPLS and ENet. In addition, we applied the MI-WENet method to examine the predictor variables for endothelial function, which can be characterized by the median effective dose (ED50) and maximum effect (Emax) in an ex-vivo phenylephrine-induced extension and acetylcholine-induced relaxation experiment.

MI handles missing data by replacing each missing value with a set of plausible values and then applying the standard analysis to each imputed data set. The final estimates of the parameters and their variances are obtained from the sets of estimates using Rubin’s rules, which account for the uncertainty among MIs.[3,4] The objective of the MI method is not to predict missing values as close as possible to the true values but to handle missing data so that valid statistical inferences can be made.[3,4] Rubin’s rules have become the gold standard when data are missing at random (MAR).[6-8] By the definition of Little and Rubin [1], the three general types of missing mechanism are: (1) missing completely at random (MCAR); (2) MAR; and (3) not missing at random (NMAR).[1-3] Standard implementation of MI relies on the assumption that missing data are either MCAR or MAR, while the MI procedure may also be extended to cases where missing data are NMAR.[7,9,10]

Variable selection is increasingly important in modern data analysis.
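
The pooling step of Rubin's rules mentioned in the introduction can be sketched as follows. This is a minimal illustration of the standard scalar-parameter form (pooled mean, within-imputation variance, between-imputation variance, total variance), not the authors' code; the function name `pool_rubin` and the numerical values are hypothetical.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)  # one estimate per imputed data set
    u = np.asarray(variances, dtype=float)  # matching squared standard errors
    m = q.shape[0]
    if m < 2:
        raise ValueError("need at least two imputed data sets")
    q_bar = q.mean(axis=0)                  # pooled point estimate
    w = u.mean(axis=0)                      # within-imputation variance
    b = q.var(axis=0, ddof=1)               # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b             # total variance
    return q_bar, t


# Hypothetical estimates of one regression coefficient from m = 5 imputations.
est = [0.52, 0.47, 0.55, 0.50, 0.46]
var = [0.010, 0.012, 0.011, 0.009, 0.013]
pooled, total_var = pool_rubin(est, var)
print(pooled, total_var)
```

The total variance exceeds the average within-imputation variance whenever the estimates differ across imputed data sets, which is how the pooled result reflects imputation uncertainty.
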
Many techniques, such as the least absolute shrinkage and selection operator (LASSO) [11], the elastic net (ENet) [12], and the sparse partial least squares (SPLS) [13], have been developed to select important variables that are associated with outcome variables. LASSO minimizes the restricted least squares with a constraint on the absolute values of the parameters (i.e. the $L_1$ norm of the coefficients). [...]

Let $y_i$ denote the outcome variable and $x_{ij}$ be the $j$th covariate ($j = 1, \dots, p$) for the $i$th subject ($i = 1, \dots, n$), where the variables are standardized to have zero mean and unit standard deviation. For simplicity, we consider the following linear regression model:

$$y_i = \sum_{j=1}^{p} x_{ij} \beta_j + e_i,$$

where the errors $e_i$ are independently and identically distributed. [...]

The first direction vector is obtained by maximizing the correlation between the response variable and a linear combination of the covariates. Each subsequent direction vector is obtained in the same way, with the covariates replaced by their orthogonal projection onto the complement of the column space of the known direction vectors. This process is repeated to obtain a small number of direction vectors. Regressing the original outcome on those direction vectors results in a relationship between the outcome and the covariates, because each direction vector is a linear combination of the covariates. [...]

[...] is the ENet penalty, a compromise between the ridge regression penalty ($\alpha = 0$) [21] and the LASSO penalty ($\alpha = 1$).[11] Ridge regression is known to shrink the coefficients of correlated predictor variables, allowing them to borrow strength from each other.[14,21] The ENet penalty with $\alpha = 1 - \varepsilon$ for some small $\varepsilon > 0$ performs much like the LASSO but removes any degeneracies and wild behaviour caused by extreme correlations.[17] For a given $\lambda$, as $\alpha$ increases from 0 to 1, the sparsity of the solution to Equation (4), i.e. the number of coefficients equal to zero, increases monotonically from 0 to the sparsity of the LASSO solution.
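
To make the role of the mixing parameter concrete, the sketch below fits a naïve elastic net by coordinate descent. It assumes the parametrization used in [17], where the objective is $(1/(2n))\|y - X\beta\|_2^2 + \lambda[(1-\alpha)\|\beta\|_2^2/2 + \alpha\|\beta\|_1]$; whether this matches Equations (4) and (5) term by term is an assumption. The function names, the simulated data, and the value $\lambda = 0.3$ are hypothetical and chosen only for illustration.

```python
import numpy as np


def enet_penalty(beta, alpha):
    """Elastic net penalty: alpha = 0 gives ridge, alpha = 1 gives LASSO."""
    beta = np.asarray(beta, dtype=float)
    return np.sum(0.5 * (1.0 - alpha) * beta**2 + alpha * np.abs(beta))


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, returning 0 when |z| <= gamma."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def naive_enet(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for
    (1/(2n)) * ||y - X b||^2 + lam * enet_penalty(b, alpha),
    assuming every column of X has mean 0 and mean square 1."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()  # equals y - X @ beta while beta is zero
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of column j with the residual that excludes j.
            z = X[:, j] @ resid / n + beta[j]
            new = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            resid -= X[:, j] * (new - beta[j])  # keep the residual in sync
            beta[j] = new
    return beta


# Simulated, standardized data (hypothetical): two active covariates out of 8.
rng = np.random.default_rng(0)
n, p = 50, 8
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_beta = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_beta + rng.normal(size=n)
y = y - y.mean()

for alpha in (0.0, 0.5, 1.0):
    b = naive_enet(X, y, lam=0.3, alpha=alpha)
    print(alpha, int(np.sum(b == 0.0)))  # number of coefficients exactly zero
```

With $\alpha = 0$ the soft-threshold step never sets a coefficient exactly to zero, so the fit is a pure ridge solution; as $\alpha$ moves toward 1 the threshold $\lambda\alpha$ grows and more coefficients are expected to be set to zero.
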
The naïve ENet estimator obtained from Equations (4) and (5) does not perform satisfactorily [12], while the ENet estimator, which undoes the shrinkage of the naïve ENet, performs much better, even compared with LASSO and ridge regression. The ENet estimator is obtained as [...] the number of predictors is greater than the number of observations and there are many correlated predictors [12], which has also been shown in our simulation studies.

2.2 MI-SPLS and MI-WENet

Both the SPLS and ENet methods assume that all covariates and outcome variables are fully observed. When there are missing values, Rubin’s rules provide a general framework for handling them, provided the missing data are MAR or MCAR.[1-4] However, Rubin’s rules cannot be directly applied to SPLS or ENet because the variables selected for one imputed data set may be quite different from those selected for another imputed data set. To the best of our knowledge, there is no standard rule for combining the selected variables resulting from different imputed data sets.[8,9,16,22] To overcome the.