Beijing Biomedical Engineering

NetRD: a tool to supplement the evidence sentence set from literature mining with Bing search results

Authors: ZOU Xiongfeng, ZHENG Haoran

Affiliation: School of Computer Science and Technology, University of Science and Technology of China (Hefei 230027)

Keywords: biological literature mining; Bing web search API; evidence set; bio-entity association pairs; evidence supplementation

Classification number: R318

Year·Volume·Issue (pages): 2019·38·4 (377-383)

Abstract:

Objective: Current work on biological literature mining concentrates on improving the performance of individual mining modules in order to raise confidence in the mined results. However, a large proportion of mined results are supported by very few evidence sentences from the literature. To alleviate this problem, we propose a tool that uses the Bing search engine to gather supplemental evidence from massive web data for associations between biological entities obtained by literature mining.

Methods: First, existing text mining techniques are applied to PubMed literature to mine a batch of associations between bio-entities. Then a Bing web search module is introduced: the names of the bio-entities are used as keywords to query the web through the open Bing web search API, and the returned results are collected into a new data source. Finally, this new data source is mined to obtain a batch of supplemental evidence sentences from the web.

Results: NetRD (http://bioinfo.ustc.edu.cn/NetRD) provides effective supplemental evidence for associations between bio-entities, such as disease-related genes, that have few evidence sentences from the literature, and it enriches the final evidence sentence set of the literature mining results.

Conclusions: Using web data from Bing search results as a supplementary data source, NetRD can effectively supply evidence for associations between bio-entities that have few evidence sentences mined from the literature, giving researchers a useful reference when confirming whether two bio-entities are associated.
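The final step of the Methods, mining evidence sentences from collected search results, can be illustrated with a minimal sketch. It assumes a plain sentence-level co-occurrence filter, which is a simplification and not necessarily the method NetRD uses. The function names and the snippets are hypothetical, and the snippets stand in for text that would come back from a web search; no Bing API call is made here.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(sentence, name):
    """Case-insensitive whole-word match of an entity name in a sentence."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.search(pattern, sentence, re.IGNORECASE) is not None


def supplemental_evidence(entity_a, entity_b, snippets):
    """Return de-duplicated sentences that mention both entities (assumed criterion)."""
    seen = set()
    evidence = []
    for snippet in snippets:
        for sentence in split_sentences(snippet):
            if mentions(sentence, entity_a) and mentions(sentence, entity_b):
                key = sentence.lower()
                if key not in seen:  # skip sentences repeated across search results
                    seen.add(key)
                    evidence.append(sentence)
    return evidence


# Hypothetical snippets standing in for text returned by a web search.
snippets = [
    "APOE is a major genetic risk factor for Alzheimer disease. It is located on chromosome 19.",
    "Variants in APOE are studied widely. Alzheimer disease affects memory.",
    "APOE is a major genetic risk factor for Alzheimer disease.",
]

evidence = supplemental_evidence("APOE", "Alzheimer disease", snippets)
print(evidence)
```

Only sentences in which both entity names co-occur are kept, and repeated sentences are collapsed, so the same sentence returned by several search results counts as one piece of evidence. A real system would likely need entity normalization (synonyms, gene aliases) rather than exact name matching.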
