Background The half-life of a protein is regulated by a variety of system properties, such as the abundance of the components of the degradative machinery and of protein modifiers. [...] these features into a predictive model with promising accuracy. At a 20% false positive rate, the model achieves an 80% true positive rate, outperforming the only previously proposed stability predictor. We also investigate the influence of N-terminal protein tagging, as used to generate the data set, specifically the influence it could have on the measurements for secreted and transmembrane proteins; we train and test our model on a subset of the data with those proteins removed, and show that the model sustains high accuracy. Finally, we estimate system-wide metabolic stability by surveying the whole human proteome.
Conclusions We describe a variety of protein features that are significantly over- or under-represented in stable and unstable proteins, including phosphorylation, acetylation and destabilizing N-terminal residues. Bayesian networks are ideal for combining these features into a predictive model with superior accuracy and transparency compared to the only other proposed stability predictor. Furthermore, our stability predictions of the human proteome will find application in the analysis of functionally related proteins, shedding new light on regulation by protein synthesis and degradation.
[...] sp. red (DsRed), which are expressed on a single mRNA transcript. The DsRed protein acts as a control, while EGFP is expressed as an N-terminal fusion with a protein of interest. Coupling this approach with fluorescence-activated cell sorting (FACS) and microarray analysis, the authors were able to measure the stability of approximately 8000 human proteins, and it is this data set that we use in our study.
An important consideration of N-terminal fusion is the interference that the EGFP tag could have on the function of N-terminal signal sequences. A recent review on the use of fluorescent protein tagging points out that approximately one third of human protein-coding genes contain position-dependent sequence information [6]. In the case of proteins with N-terminal signal peptides, or signal anchors, the fusion of a fluorescent protein to the N-terminus is likely to interfere with normal localization. Indeed, Yen and colleagues [1] found that unstable proteins were enriched in membrane protein gene ontology (GO) terms, but remarked that it is unclear what effect fluorescent tagging has on the measurement of global degradation rates.
Huang and colleagues recently explored a range of predictive features in the GPSP data set and indicated that a simple associative model can classify protein stability with reasonable accuracy, as evaluated using the same data set [7]. However, a computational model built without attention to the potential bias caused by N-terminal tagging may inherit that bias. Our paper therefore presents a protein stability model based on the largest of the present protein degradation data sets, with emphasis on minimising experimental bias. Indeed, it may be possible to discount the influence of experimental artefacts by first exploring and understanding their impact on models.
We created a method for classifying proteins as having a high metabolic stability (i.e. long half-life) or low stability. We developed this method using the GPSP stability data set, which is by far the most extensive available, and thus the easiest to cross-reference to other complementary data resources. We considered that this data set may contain a bias portraying proteins with N-terminal signal peptides and anchors as metabolically unstable due to interference caused by the experimental technique.
Consequently, we developed and tested models on two sets of proteins: a full set, and a trimmed set with secreted and transmembrane proteins removed. Using complementary resources, including the Human Protein Reference Database (HPRD), we explored a wide range of predictive features. We identified groups of features that are statistically enriched in stable or in unstable proteins, ultimately to understand whether they can be used to infer metabolic stability levels. We subsequently designed a model that explicitly recognizes and integrates known factors of the relevant processes, and employed machine learning to optimise its ability to generalize to novel proteins. Finally, to illustrate metabolic stability on a system scale, we used the [...]
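The comparison between the full and trimmed protein sets can be illustrated with a minimal sketch. It assumes that enrichment of a binary feature (for example, an annotated phosphorylation site) among stable proteins is scored with a one-sided hypergeometric tail probability, which is equivalent to a one-sided Fisher's exact test. The text does not state which statistical test was used, so this choice is an assumption. The `Protein` record, its field names and the toy data are likewise hypothetical and are not taken from the GPSP or HPRD resources.

```python
from math import comb
from typing import List, NamedTuple


class Protein(NamedTuple):
    # Hypothetical record; field names are illustrative only.
    name: str
    stable: bool        # True = long half-life class
    has_feature: bool   # e.g. carries an annotated phosphorylation site
    signal_or_tm: bool  # secreted or transmembrane (N-terminal signal/anchor)


def hypergeom_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n proteins are drawn from N, of which K carry the feature."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment_p(proteins: List[Protein]) -> float:
    """One-sided p-value for over-representation of the feature in stable proteins."""
    N = len(proteins)
    K = sum(p.has_feature for p in proteins)
    stable = [p for p in proteins if p.stable]
    k = sum(p.has_feature for p in stable)
    return hypergeom_tail(k, len(stable), K, N)


def trimmed(proteins: List[Protein]) -> List[Protein]:
    """Drop secreted and transmembrane proteins, mirroring the trimmed set."""
    return [p for p in proteins if not p.signal_or_tm]


def make(prefix: str, count: int, stable: bool, feat: bool, sig: bool) -> List[Protein]:
    """Build `count` toy proteins sharing the same attribute values."""
    return [Protein(f"{prefix}{i}", stable, feat, sig) for i in range(count)]


# Toy data, invented purely for illustration.
full_set = (
    make("A", 6, stable=True, feat=True, sig=False)
    + make("B", 2, stable=True, feat=False, sig=False)
    + make("C", 1, stable=False, feat=True, sig=False)
    + make("D", 5, stable=False, feat=False, sig=False)
    + make("E", 4, stable=False, feat=False, sig=True)
    + make("F", 2, stable=False, feat=True, sig=True)
)
trimmed_set = trimmed(full_set)

p_full = enrichment_p(full_set)
p_trimmed = enrichment_p(trimmed_set)
print(f"full set:    n={len(full_set)}, p={p_full:.4f}")
print(f"trimmed set: n={len(trimmed_set)}, p={p_trimmed:.4f}")
```

Running the same test on both sets gives a rough indication of whether an apparent enrichment depends on the proteins most likely to be affected by N-terminal tagging. With real data, many features would be tested at once, so a multiple-testing correction would also be needed.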