Computational efficiency is important for learning algorithms operating in the “large p, small n” setting. Our proposed algorithm bypasses the need for expensive tuning parameter optimization via cross-validation by employing Bayesian model averaging over a grid of tuning parameters. Additional computational efficiency is achieved by adopting the singular value decomposition re-parametrization of the ridge-regression model, replacing computationally expensive inversions of large $p \times p$ matrices by efficient inversions of small and diagonal $n \times n$ matrices. We show in simulation studies and in the analysis of two large cancer cell line data panels that our algorithm achieves slightly better predictive performance than cross-validated ridge-regression while requiring only a fraction of the computation time. Furthermore, in comparisons based on the cell line data sets, our algorithm systematically out-performs the lasso in both predictive performance and computation time, and shows comparable predictive performance but considerably smaller computation time than the elastic-net.

The second source of efficiency comes from replacing expensive $p \times p$ matrix inversions by efficient inversions of small and diagonal $n \times n$ matrices derived from the singular value decomposition (SVD)7 of the feature matrix. Note that the SVD re-parametrization is a common practice for improving the computational efficiency of ridge-regression model fitting.8 We point out that both improvements are made possible by the analytical tractability of the Bayesian hierarchical formulation of ridge-regression, where the marginal posterior distribution of the regression coefficients and the prior predictive distribution of the data are readily available, leading to a fully analytical expression for the BMA estimate of the regression coefficients. Furthermore, the quantities that need to be evaluated, namely the model-specific posterior expectations and marginal likelihoods, can be efficiently computed under the SVD re-parametrization.

The rest of the paper is organized as follows. In Section 2.1 we present the Stream algorithm, and in Section 2.2 we present its re-parametrization in terms of the singular value decomposition of the feature data matrix. Section 3.1 presents a simulation study comparing the predictive performance and computation time of Stream against the standard cross-validated ridge-regression model. Section 3.2 presents real data illustrations using two compound screening data sets performed on large panels of cancer cell lines. Finally, in Section 4 we discuss our results and point out strengths and weaknesses of our proposed algorithm.

2 Statistical model

In the next subsections we present the Stream-regression model and its re-parametrization in terms of the SVD of the feature data matrix. First we introduce some notation. Throughout the text we consider the regression model $y = X\beta + \varepsilon$, where $y$ represents the $n \times 1$ vector of responses, $X$ corresponds to the $n \times p$ matrix of features, $\beta$ corresponds to the $p \times 1$ vector of regression coefficients, and $\varepsilon$ represents an $n \times 1$ vector of independent and identically distributed Gaussian error terms with expectation 0 and a common precision parameter. We write $N(\mu, \Sigma)$ for a Gaussian distribution with mean vector $\mu$ and covariance matrix $\Sigma$, and $\det(A)$ for the determinant of a matrix $A$. The ridge-regression estimate of the regression coefficients is $\hat{\beta}_\lambda = (X^\top X + \lambda I_p)^{-1} X^\top y$. Let $\lambda_1, \ldots, \lambda_K$ represent the grid of ridge-regression tuning parameters, and let $M_k$ represent the ridge-regression model that uses $\lambda = \lambda_k$, $k = 1, \ldots, K$. The prediction for the testing set is then $\hat{y}^\ast = X^\ast \hat{\beta}$, where $X^\ast$ represents the feature data on the testing set and $\hat{\beta}$ represents the regression coefficients estimate learned from the training set.
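To make the overall recipe concrete, the sketch below (our own illustration, not the authors' implementation of Stream) combines ridge-regression fits over a grid of $\lambda$ values by Bayesian model averaging, computing every per-model quantity from a single SVD of the feature matrix as developed in the next subsection. The helper name bma_ridge_svd, the uniform prior over the grid, and the assumption of a known noise variance sigma2 are simplifications introduced here for brevity.

```python
# A minimal sketch (not the authors' implementation) of Bayesian model
# averaging over a grid of ridge tuning parameters, computed through the SVD
# re-parametrization so that no p x p matrix is ever inverted.  The known
# noise variance sigma2, the uniform prior over the grid, and the helper
# name are simplifying assumptions made for this illustration only.
import numpy as np

def bma_ridge_svd(X, y, lambda_grid, sigma2=1.0):
    """BMA estimate of the ridge coefficients; assumes n <= p ('large p, small n')."""
    n, p = X.shape
    # Compact SVD: X = U diag(d) Vt with U (n x n), d (n,), Vt (n x p) when n <= p.
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    z = U.T @ y                                   # responses rotated by the left singular vectors

    log_marg = np.empty(len(lambda_grid))
    betas = np.empty((len(lambda_grid), p))
    for k, lam in enumerate(lambda_grid):
        # Ridge estimate via the SVD: beta_k = V diag(d / (d^2 + lam)) U' y,
        # i.e. a diagonal n x n inversion instead of a p x p inversion.
        betas[k] = Vt.T @ (d / (d ** 2 + lam) * z)
        # Marginal (prior predictive) log-likelihood under the conjugate prior
        # beta ~ N(0, sigma2 / lam * I_p): y ~ N(0, sigma2 * (I_n + X X' / lam)),
        # whose covariance is diagonal after rotation by U'.
        var = sigma2 * (1.0 + d ** 2 / lam)
        log_marg[k] = -0.5 * (n * np.log(2.0 * np.pi)
                              + np.sum(np.log(var)) + np.sum(z ** 2 / var))

    w = np.exp(log_marg - log_marg.max())         # BMA weights, uniform prior over the grid
    w /= w.sum()
    return w @ betas                              # model-averaged coefficient vector
```

Because the SVD is computed only once and shared across the whole grid, the per-$\lambda$ work in this sketch reduces to operations on the $n$ singular values plus one back-transformation to the $p$-dimensional coefficient scale, which is where the saving over refitting the model for every fold and grid point under cross-validation comes from.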
In our Bayesian model we are interested in the expectation of the response's posterior predictive distribution given the observed data.

The SVD of the $n \times p$ feature data matrix $X$ of rank $n$ is given by $X = U D V^\top$, where $U$ is an $n \times n$ orthogonal matrix of left singular vectors, $D$ is an $n \times n$ diagonal matrix of singular values, and $V$ is a $p \times n$ matrix of right singular vectors. An alternative representation is $X = \tilde{U} \tilde{D} \tilde{V}^\top$, where $\tilde{U}$ is the $n \times p$ matrix obtained by augmenting $U$ with $p - n$ extra columns of zeros; $\tilde{D}$ is the $p \times p$ diagonal matrix with the first $n$ diagonal entries given by the singular values and the remaining $p - n$ diagonal entries set to zero; and $\tilde{V}$ is the $p \times p$ orthogonal matrix obtained by augmenting $V$ with $p - n$ additional right singular vectors. Exploiting these re-parametrizations we can, after some algebra, re-express the quadratic form in the computationally more efficient form involving $(D^2 + \lambda I_n)^{-1}$, replacing a $p \times p$ matrix inversion by an $n \times n$ diagonal matrix inversion in its computation. Next consider the determinant. From the application of the Woodbury matrix inversion formula10 we have that $\det(X^\top X + \lambda I_p) = \lambda^{p-n} \prod_{i=1}^{n} (d_i^2 + \lambda)$, where $d_1, \ldots, d_n$ denote the singular values, so that the determinant of a $p \times p$ matrix is obtained from the $n$ singular values alone.

The grid of tuning parameters consisted of $K = 100$ values. For each data set we used the same.
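The dimension-reduction identities described in this subsection can be checked numerically. The toy script below (our own example, with arbitrary choices of n, p, and λ) verifies that the determinant and a quadratic form of the kind appearing in the marginal likelihood can be computed from the $n$ singular values alone, and that the Woodbury formula connects the $p \times p$ and $n \times n$ versions of the computation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 20, 200, 2.5                         # arbitrary toy dimensions and tuning parameter
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

U, d, _ = np.linalg.svd(X, full_matrices=False)  # d holds the n singular values
z = U.T @ y

# Determinant: det(X'X + lam I_p) = lam^(p-n) * prod_i (d_i^2 + lam),
# so the p x p determinant is obtained from the n singular values alone.
_, logdet_pxp = np.linalg.slogdet(X.T @ X + lam * np.eye(p))
logdet_svd = (p - n) * np.log(lam) + np.sum(np.log(d ** 2 + lam))
assert np.isclose(logdet_pxp, logdet_svd)

# Quadratic form y' (I_n + X X'/lam)^{-1} y: by the Woodbury formula,
# (I_n + X X'/lam)^{-1} = I_n - X (X'X + lam I_p)^{-1} X', and after the SVD
# rotation the computation reduces to a weighted sum over the singular values.
quad_nxn = y @ np.linalg.solve(np.eye(n) + X @ X.T / lam, y)
quad_pxp = y @ y - y @ X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
quad_svd = np.sum(z ** 2 / (1.0 + d ** 2 / lam))
assert np.isclose(quad_nxn, quad_pxp) and np.isclose(quad_nxn, quad_svd)
```

In this small example the $n \times n$ and diagonal routes never form or factorize the $200 \times 200$ matrix, which is exactly the saving the re-parametrization delivers when $p$ runs into the tens of thousands.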