Beijing Biomedical Engineering

Entity disambiguation algorithm for domain literature based on context features

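Authors: Wang Jing (王靜), Tan Shaofeng (譚紹峰), He Dongdong (賀東東), Chen Jianhui (陳建輝), Yan Jianzhuo (閆健卓)

Affiliations: Faculty of Information Technology, Beijing University of Technology (Beijing 100124); Pinggu Hospital, Beijing Friendship Hospital, Capital Medical University (Beijing 101200)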

Keywords: entity disambiguation; context features; probabilistic model

Classification number: R318

Year·Volume·Issue (pages): 2018·37·4 (398-402)

Abstract:

Objective: To meet the needs of literature-based knowledge learning and application in the biomedical domain, and to resolve the word ambiguity that arises in entity recognition, we propose an entity disambiguation algorithm based on context features.

Methods: Entity disambiguation is usually divided into two stages: candidate generation and entity disambiguation. In the candidate generation stage, a knowledge-base-based method generates candidate entities for each entity mention, and the candidates are filtered by their prior probability in the knowledge base. This preserves recall of the target entity while reducing computational complexity and noise in the disambiguation stage. In the disambiguation stage, a probabilistic model computes the similarity between the context of each candidate entity and the context of the entity mention, and the candidate with the highest similarity is selected as the target entity. Disambiguation experiments were run on mentions recognized from the literature; domain experts judged the correctness of each result, and accuracy was compared across algorithms.

Results: On the selected dataset, the proposed method reached an entity disambiguation accuracy of 83%, higher than the other algorithms compared.

Conclusions: The context-feature-based entity disambiguation algorithm performed best among the compared algorithms for entity disambiguation in this domain.
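The two stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the form of the probabilistic model, so everything below is an assumption made for illustration: the toy knowledge base `KB_COUNTS`, the entity contexts in `ENTITY_CONTEXT`, the `min_prior` threshold, and the use of an add-alpha smoothed unigram model as the context similarity. It is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: mention -> {entity: link count}.
KB_COUNTS = {
    "cold": {"Common cold": 70, "Hypothermia": 25, "Cold (band)": 5},
}

# Hypothetical entity contexts, e.g. words taken from each entity's description.
ENTITY_CONTEXT = {
    "Common cold": "viral infection nose throat cough fever virus".split(),
    "Hypothermia": "body temperature low exposure shivering heat loss".split(),
    "Cold (band)": "rock band album music".split(),
}


def generate_candidates(mention, min_prior=0.1):
    """Stage 1: return {entity: prior} for candidates with prior >= min_prior."""
    counts = KB_COUNTS.get(mention.lower(), {})
    total = sum(counts.values())
    if total == 0:
        return {}
    return {e: c / total for e, c in counts.items() if c / total >= min_prior}


def context_log_likelihood(mention_context, entity_context, vocab_size, alpha=1.0):
    """Log P(mention context | entity) under an add-alpha smoothed unigram model.

    This is an assumed stand-in for the paper's probabilistic similarity.
    """
    counts = Counter(entity_context)
    denom = len(entity_context) + alpha * vocab_size
    return sum(math.log((counts[w] + alpha) / denom) for w in mention_context)


def disambiguate(mention, mention_context, min_prior=0.1):
    """Stage 2: pick the surviving candidate whose context best explains the mention's."""
    candidates = generate_candidates(mention, min_prior)
    if not candidates:
        return None
    # Shared vocabulary so the smoothed scores are comparable across candidates.
    vocab = set(mention_context)
    for entity in candidates:
        vocab.update(ENTITY_CONTEXT[entity])
    scores = {
        entity: context_log_likelihood(
            mention_context, ENTITY_CONTEXT[entity], len(vocab)
        )
        for entity in candidates
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(generate_candidates("cold"))
    print(disambiguate("cold", ["patient", "cough", "fever", "virus"]))
```

In this sketch the prior is used only to prune unlikely candidates, as the abstract describes, and the final choice depends on context similarity alone. Whether the paper also folds the prior into the final score is not stated in the abstract.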
