Beijing Biomedical Engineering

Study on named entity recognition in Chinese radiology reports

Authors: Zhang Zhiqiang, Xu Yan, Huang Yanqun, Wang Ni, Yang Zhenghan, Chen Hui, Liu Honglei

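Affiliations: School of Biomedical Engineering, Capital Medical University (Beijing 100069); Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University (Beijing 100069); Department of Radiology, Beijing Friendship Hospital, Capital Medical University (Beijing 100050)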

Keywords: radiology report; natural language processing; conditional random field; named entity recognition; information extraction

Classification number: R318; TP31

Year·Volume·Issue (pages): 2020·39·6 (609-614)

Abstract:

Objective: To explore methods for named entity recognition in Chinese radiology reports, in particular the recognition performance of the conditional random field (CRF) algorithm. Methods: Ninety-eight abdominal CT radiology reports were collected at random. Five entity types in the imaging findings section, [location], [shape], [size], [density], and [enhancement], were defined together with experienced radiologists and annotated manually. The 98 reports were randomly divided into a training set and a test set at a ratio of 7:3. Three CRF feature templates were used for named entity recognition, and their results were compared. Results: The imaging findings of the 98 CT reports contained 32332 Chinese characters and other characters in total, with 19151 characters in the training set and 7418 characters in the test set. Across the three CRF feature templates, the average overall F1-score for entity recognition was 0.9487, and the entity [size] had the highest F1-score, reaching 0.9818. Conclusions: The CRF algorithm achieved high accuracy in named entity recognition on Chinese radiology reports. The recognized entities can be used in downstream natural language processing tasks such as information extraction.
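The evaluation setup described in the abstract (a random 7:3 split, context features for a CRF, and an F1-score) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The BIO tagging scheme, the entity-level exact-match scoring, the fixed random seed, and the one-character context window in `char_features` are all assumptions made for illustration; the abstract does not say how the three feature templates were defined or whether F1 was computed per entity or per character.

```python
import random


def split_reports(reports, train_ratio=0.7, seed=0):
    """Randomly split reports into training and test sets (7:3 by default).

    The seed is an assumption, used here only to make the split repeatable.
    """
    shuffled = list(reports)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def char_features(chars, i, window=1):
    """Context-window features for character i (a hypothetical template)."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        feats[f"char[{offset}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
    return feats


def extract_entities(tags):
    """Convert a BIO tag sequence into a set of (start, end, type) spans."""
    spans, start, etype = set(), None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def f1_score(gold_tags, pred_tags):
    """Entity-level F1: a predicted span counts only on an exact match."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    train, test = split_reports(range(98))
    print(len(train), len(test))

    gold = ["B-location", "I-location", "O", "B-size", "I-size", "I-size"]
    pred = ["B-location", "I-location", "O", "B-size", "I-size", "O"]
    print(f1_score(gold, pred))
```

With 98 reports, rounding 98 × 0.7 gives 69 training reports and 29 test reports under this sketch. In the toy tag sequences, the predicted [size] span is one character short, so only the [location] span counts as correct under exact-match scoring.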

