Beijing Biomedical Engineering

A proteome analysis method based on deep learning

Authors: Liu Koulong, Zheng Haoran

Affiliation: School of Computer Science and Technology, University of Science and Technology of China (Hefei 230027)

Corresponding author: Zheng Haoran, Associate Professor. E-mail: [email protected]

Keywords: proteomics; deep learning; data-independent acquisition; relative quantification; mass spectrometry

Classification number: R318

Publication year·volume·issue (pages): 2022·41·6 (569-575)

Abstract:

Objective: Data-independent acquisition (DIA) based on liquid chromatography-tandem mass spectrometry is one of the main ways of acquiring proteomic data. Each mixed MS/MS spectrum it collects arises from several peptides fragmenting at the same time, which complicates peptide identification and quantification. Current mainstream methods based on ion chromatograms require preprocessing, construction of chromatographic peaks, and extraction of chromatographic peak features. These pipelines are complex and error-prone, and their identification and quantification accuracy varies with chromatogram complexity and chromatographic time. To address these shortcomings, we propose a deep learning method that identifies and quantifies peptides directly.

Methods: Unlike methods based on ion chromatograms, our approach does not use information from the chromatographic dimension, so it is not affected by factors such as chromatogram complexity and chromatographic time. The preprocessed mass spectrometry data are fed into two models based on convolutional neural networks (CNNs). Identification is treated as a binary classification problem and quantification as a regression problem.

Results: We conducted experiments on a public dataset. Compared with FIGS, a method with relatively high accuracy, our method improved the reproducibility of the identification results and, while maintaining quantification accuracy, increased the number of peptides quantified at different abundance levels.

Conclusions: The proposed deep learning model does not use information from the chromatographic dimension and can effectively identify and quantify peptides.
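The two-model setup described in the Methods (one CNN for identification as binary classification, one for quantification as regression) can be illustrated with a minimal forward-pass sketch. The abstract does not give the network architecture, the input encoding, or the training procedure, so everything below is an assumption made for illustration: the class `TinyCNN`, the functions `identify` and `quantify`, the single convolutional layer, the binned-spectrum input, and the random untrained weights are all hypothetical and are not the authors' models.

```python
import numpy as np


def conv1d_relu(x, kernels, bias):
    """Valid 1D cross-correlation followed by ReLU.

    x: (length,) vector, assumed here to be a binned spectrum.
    kernels: (n_filters, width); bias: (n_filters,).
    Returns an array of shape (n_filters, length - width + 1).
    """
    windows = np.lib.stride_tricks.sliding_window_view(x, kernels.shape[1])
    return np.maximum(kernels @ windows.T + bias[:, None], 0.0)


class TinyCNN:
    """Hypothetical model: one conv layer, global max pooling, one linear unit."""

    def __init__(self, n_filters=8, width=5, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights stand in for parameters that would be learned in training.
        self.kernels = rng.normal(scale=0.1, size=(n_filters, width))
        self.bias = np.zeros(n_filters)
        self.w_out = rng.normal(scale=0.1, size=n_filters)
        self.b_out = 0.0

    def score(self, x):
        # Global max pooling reduces each filter's response to one feature.
        features = conv1d_relu(x, self.kernels, self.bias).max(axis=1)
        return float(features @ self.w_out + self.b_out)


def identify(model, x):
    """Binary classification head: sigmoid of the score, read as P(peptide present)."""
    return 1.0 / (1.0 + np.exp(-model.score(x)))


def quantify(model, x):
    """Regression head: the raw score, read as an abundance estimate."""
    return model.score(x)


rng = np.random.default_rng(42)
spectrum = rng.random(200)  # stand-in for a preprocessed, binned MS/MS spectrum
classifier = TinyCNN(seed=1)  # separate model for identification
regressor = TinyCNN(seed=2)   # separate model for quantification
p_present = identify(classifier, spectrum)
abundance = quantify(regressor, spectrum)
print(f"p(present) = {p_present:.3f}, abundance estimate = {abundance:.3f}")
```

The only difference between the two heads in this sketch is the output transform: the classifier squashes its score into a probability, while the regressor returns the score unchanged. In practice the two models would also be trained with different losses (for example, cross-entropy and squared error), which the sketch omits.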
