Prediction of ncRNA-protein interactions based on machine learning methods

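Authors: Cheng Shuping, Tan Jianjun, Men Jingrui
Affiliation: College of Life Science and Bioengineering, Beijing University of Technology; Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation (Beijing 100124)
Keywords: noncoding RNA-protein interactions; LightGBM; random forest; extreme gradient boosting; convolutional autoencoder
Classification number: R318.01; Q51
Year·Volume·Issue (pages): 2019·38·4 (353-359)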
Abstract:

Objective Noncoding RNA-protein interactions (ncRPI) are biologically important, and predicting them has become one of the main routes to studying the functions of noncoding RNA (ncRNA) and proteins. Methods We extracted features from the sequences of ncRNAs and proteins, preprocessed the raw data with a convolutional autoencoder (CAE), and trained three machine learning models, LightGBM (LBM), random forest (RF), and extreme gradient boosting (XGB), to predict ncRPI. Results Under 5-fold cross-validation (CV) on the RPI369 and RPI488 datasets, the three models reached accuracies of 0.757 (LBM), 0.791 (RF), and 0.791 (XGB) on RPI369, and 0.918 (LBM), 0.908 (RF), and 0.918 (XGB) on RPI488. The three models also achieved high area under the curve (AUC) values on the larger datasets RPI1807, RPI2241, and RPI13254: all three obtained an AUC of 0.99 on RPI1807, and the lowest AUC among the three models was 0.87 on RPI2241 and 0.81 on RPI13254. Conclusions Machine learning methods can predict whether an ncRNA and a protein interact.
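
The evaluation protocol named in the abstract (5-fold cross-validation, reported as accuracy and AUC) can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' code: the function names, the interleaved fold assignment, the toy one-feature data, and the `fit_threshold` stand-in model are all assumptions introduced here. In the study itself the classifiers are LightGBM, random forest, and XGBoost trained on CAE-processed sequence features.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k (train, test) index pairs using interleaved folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))
    return splits


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_threshold(xs: Sequence[float], ys: Sequence[int]) -> Callable[[float], float]:
    """Hypothetical stand-in model: threshold one feature at the midpoint of class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x - mid  # score > 0 is read as "interacting"


def cross_validate(xs, ys, fit, k: int = 5) -> Tuple[float, float]:
    """Return mean accuracy and mean AUC over k folds."""
    accs, aucs = [], []
    for train, test in k_fold_indices(len(xs), k):
        score = fit([xs[i] for i in train], [ys[i] for i in train])
        test_y = [ys[i] for i in test]
        test_scores = [score(xs[i]) for i in test]
        accs.append(accuracy(test_y, [int(s > 0) for s in test_scores]))
        aucs.append(auc(test_y, test_scores))
    return sum(accs) / k, sum(aucs) / k


# Toy, perfectly separable data (assumed for illustration only).
ys = [i % 2 for i in range(20)]
xs = [y + 0.01 * i for i, y in enumerate(ys)]
mean_acc, mean_auc = cross_validate(xs, ys, fit_threshold, k=5)
print(mean_acc, mean_auc)
```

With real data, a stratified split is usually preferable so that every test fold contains both interacting and non-interacting pairs; the interleaved split above only guarantees that for the toy labels.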

Address: Editorial Office of Beijing Biomedical Engineering, Anzhen Hospital, Andingmenwai, Beijing
Tel: 010-64456508  Fax: 010-64456661
Email: [email protected]