Beijing Biomedical Engineering

Extraction of epidemiologic risk factors based on text mining

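Authors: Lu Yanxin, Yao Xufeng

Affiliation: National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Key Laboratory of Parasite and Vector Biology, Ministry of Health; WHO Collaborating Centre for Malaria, Schistosomiasis and Filariasis (Shanghai 200025)

Keywords: text mining; risk factors; information extraction; epidemiology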


Published: 2013, Vol. 32, No. 2, pp. 160-163

Abstract:

Objective To design a system, based on text mining techniques, that automatically extracts epidemiologic risk factors. Methods The system consists of two subsystems: a text mining engine and a rule-based information extraction subsystem. The text mining engine first identifies all noun phrases and collects their semantic and other information; a rule-based text classifier then labels the epidemiologic risk factors among them. Results The system was evaluated on text manually annotated by an epidemiologist. The best result was an F-measure of 64.6% (precision 61.0%, recall 68.8%), which is better than the results of other related studies; some of the remaining errors are avoidable. Conclusions Text mining methods are helpful for automatically extracting risk factor information from the epidemiologic literature.
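The two-stage design described in the abstract (noun phrases and their semantic information from a text mining engine, then a rule-based classifier, then evaluation by precision, recall and F-measure) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: the abstract names neither the engine nor the rules, so the `NounPhrase` schema, the semantic type labels in `CANDIDATE_TYPES`, and the cue regex `CUE_PATTERN` are all hypothetical. The F-measure used here is the balanced harmonic mean F = 2PR / (P + R).

```python
import re
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class NounPhrase:
    """A noun phrase as a text mining engine might emit it (hypothetical schema)."""
    text: str
    semantic_type: str  # illustrative labels only, e.g. "Food", "Behavior", "Disease"
    sentence: str       # the sentence the phrase was found in


# Hypothetical rule inputs; the paper's actual rules are not given in the abstract.
CANDIDATE_TYPES = {"Food", "Vitamin", "Behavior", "Substance"}
CUE_PATTERN = re.compile(
    r"\b(risk (factor|of)|associated with|increase[sd]? (the )?risk|predict(s|ed)?)\b",
    re.IGNORECASE,
)


def is_risk_factor(phrase: NounPhrase) -> bool:
    """Rule: the phrase has a candidate semantic type AND its sentence contains a risk cue."""
    return phrase.semantic_type in CANDIDATE_TYPES and bool(CUE_PATTERN.search(phrase.sentence))


def extract_risk_factors(phrases: Iterable[NounPhrase]) -> set[str]:
    """Return the texts of all noun phrases the rules label as risk factors."""
    return {p.text for p in phrases if is_risk_factor(p)}


def evaluate(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Precision, recall and balanced F-measure against expert-annotated phrases."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    s1 = "High salt intake is associated with an increased risk of hypertension."
    s2 = "Smoking is a known risk factor for cardiovascular disease."
    s3 = "Coffee consumption was not associated with stroke in this cohort."
    s4 = "Physical inactivity contributed to weight gain."
    phrases = [
        NounPhrase("high salt intake", "Food", s1),
        NounPhrase("hypertension", "Disease", s1),
        NounPhrase("smoking", "Behavior", s2),
        NounPhrase("cardiovascular disease", "Disease", s2),
        NounPhrase("coffee consumption", "Food", s3),
        NounPhrase("physical inactivity", "Behavior", s4),
    ]
    gold = {"high salt intake", "smoking", "physical inactivity"}  # toy annotations
    predicted = extract_risk_factors(phrases)
    print(sorted(predicted))  # ['coffee consumption', 'high salt intake', 'smoking']
    print(evaluate(predicted, gold))
```

In this toy data, "coffee consumption" is labeled despite the negation "not associated with", and "physical inactivity" is missed because its sentence contains no cue phrase. Adding negation handling is one plausible example of an avoidable error, though the abstract does not say which errors the authors actually observed.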

