Beijing Biomedical Engineering

Prediction algorithm of phosphokinase based on PU-learning

Authors: Wang Yiqi (王藝琪), Wang Mingju (王明舉), Zhang Jin (張進), Peng Zhicai (彭智才), Wei Sen (魏森), Xie Duoshuang (謝多雙)

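Affiliation: Taihe Hospital, Shiyan, Hubei 442000, China

Keywords: protein phosphorylation; bioinformatics; semi-supervised learning; PU-learning; phosphokinase prediction

Classification number: R318

Year, volume, issue (pages): 2019, 38(4): 360-368

Abstract: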

Objective: Protein phosphorylation is the process by which a kinase catalyzes the transfer of a phosphate group to an amino acid residue at a specific site of a substrate protein, and it is an important mechanism for studying protein activity and function. Most of the thousands of phosphorylation sites identified so far lack kinase information. To address this, a phosphokinase prediction algorithm based on PU-learning is proposed; by iteratively labeling phosphorylation sites, it predicts the phosphokinase that catalyzes each phosphopeptide. Methods: The algorithm uses PU-learning as its framework and applies maximum entropy variance to automatically select the optimal threshold for each kind of phosphokinase, thereby extracting the potential phosphorylation sites on each phosphopeptide. The kinase corresponding to each phosphorylation site is then determined by statistical analysis. Finally, prediction performance is evaluated by five-fold cross-validation on the Phospho.ELM database and compared with existing algorithms. Results: In cross-validation, the specificity and sensitivity of the proposed algorithm (SLKSL) exceed those of the best existing algorithm by up to 4% and 10%, respectively, on a single data set, and its prediction accuracy on Phospho.ELM data reaches 79.52%. Conclusions: The PU-learning-based phosphokinase prediction algorithm performs significantly better than existing algorithms and can accurately predict kinases for phosphopeptides in the Phospho.ELM database that lack kinase information, which gives it strong guiding value for phosphorylation experiments.
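The iterative-labeling idea described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' SLKSL implementation: the position-frequency scoring, the fixed `threshold` (the paper instead selects a threshold per kinase via maximum entropy variance, which is not reproduced here), the toy peptides, and all function names are assumptions made only for illustration.

```python
from collections import Counter


def build_profile(peptides):
    """Per-position residue frequencies from equal-length peptides."""
    length = len(peptides[0])
    profile = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


def score(peptide, profile):
    """Mean frequency of the peptide's residues under the profile (0 to 1)."""
    return sum(col.get(aa, 0.0) for aa, col in zip(peptide, profile)) / len(profile)


def pu_iterative_label(positives, unlabeled, threshold=0.5, max_iter=10):
    """Toy PU-learning loop: promote unlabeled peptides that resemble the positives.

    `threshold` is a hypothetical fixed cutoff, not the paper's
    maximum-entropy-variance threshold.
    """
    positives = list(positives)
    unlabeled = list(unlabeled)
    for _ in range(max_iter):
        profile = build_profile(positives)
        promoted = [p for p in unlabeled if score(p, profile) >= threshold]
        if not promoted:
            break  # no new labels, so the labeling has converged
        positives.extend(promoted)
        unlabeled = [p for p in unlabeled if p not in promoted]
    return positives, unlabeled


if __name__ == "__main__":
    known = ["RRASV", "RRGSL", "KRASL"]    # peptides with a known kinase (toy data)
    unknown = ["RRASL", "DEESP", "KRGSV"]  # peptides lacking kinase information
    labeled, remaining = pu_iterative_label(known, unknown)
    print("labeled as positive:", labeled)
    print("still unlabeled:", remaining)
```

In this toy run, the two unlabeled peptides that share the basic-residue pattern of the known positives are promoted, while the dissimilar one stays unlabeled. A real pipeline would train one such model per kinase and evaluate it with five-fold cross-validation, as the abstract describes.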

