Beijing Biomedical Engineering

Analysis of patient similarity measurement based on semi-supervised learning

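Authors: Wang Ni, Huang Yanqun, Liu Honglei, Fei Xiaolu, Wei Lan, Zhao Xiangkun, Chen Hui

Affiliations: School of Biomedical Engineering, Capital Medical University (Beijing 100069); Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application (Beijing 100069); Information Center, Xuanwu Hospital, Capital Medical University (Beijing 100053)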

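Keywords: semi-supervised learning; cluster analysis; patient similarity; electronic medical records; Mahalanobis distance

Classification number: R318; TP31

Publication year·volume·issue (pages): 2020·39·2 (152-157)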

Abstract:

Objective: To assess the feasibility and effectiveness of patient similarity measurement based on semi-supervised learning on electronic medical record (EMR) data with heterogeneous data types, and to provide cohorts of similar patients ("patients like me") for subsequent personalized prediction studies.

Methods: Using real-world EMR data, feature-level similarities (age, sex, disease, laboratory tests) were first calculated with a measure tailored to each feature. A subset of patient pairs, each described by its feature similarities together with a single similarity score assigned by experts, formed the labeled set, on which an optimal distance metric was learned in a supervised manner. Mahalanobis distances between the labeled and unlabeled samples were then computed, where each sample is the set of age, sex, disease, and laboratory-test similarities for one pair of patients. Each unlabeled sample was assigned the similarity score of its nearest labeled sample as its predicted patient similarity. Finally, the learned patient similarity was used as the measure of closeness between patients in cluster analysis, and the results were compared with clustering based on the conventional Euclidean and cosine distances.

Results: Compared with Euclidean and cosine distance, clustering based on the learned patient similarity produced clusters whose patients were more similar to one another, and the clustering performance was better.

Conclusions: Semi-supervised learning is an effective approach to patient similarity measurement on electronic medical record data.
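The nearest-neighbor step described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the matrix `M` is a made-up stand-in for the learned metric (the abstract does not give its form), the function names `mahalanobis_sq` and `propagate_similarity` are hypothetical, and the feature similarities and expert scores are toy values.

```python
import numpy as np


def mahalanobis_sq(x, Y, M):
    """Squared Mahalanobis distance from vector x to every row of Y under matrix M."""
    diff = Y - x
    # For each row i: diff[i] @ M @ diff[i]
    return np.einsum("ij,jk,ik->i", diff, M, diff)


def propagate_similarity(X_lab, s_lab, X_unlab, M):
    """Assign each unlabeled sample the score of its nearest labeled sample."""
    preds = np.empty(len(X_unlab))
    for i, x in enumerate(X_unlab):
        nearest = np.argmin(mahalanobis_sq(x, X_lab, M))
        preds[i] = s_lab[nearest]
    return preds


# Toy labeled set: each row holds the (age, sex, disease, lab-test)
# similarities of one patient pair; values are invented for illustration.
X_lab = np.array([
    [0.9, 1.0, 0.8, 0.7],
    [0.2, 0.0, 0.1, 0.3],
    [0.6, 1.0, 0.4, 0.5],
])
s_lab = np.array([0.9, 0.1, 0.5])  # hypothetical expert similarity scores

# Assumed stand-in for the learned metric: a diagonal weighting of the
# four feature similarities. A real learned M would come from the
# supervised metric-learning step on the labeled set.
M = np.diag([1.0, 0.5, 2.0, 1.5])

# Unlabeled patient pairs whose similarity score is to be predicted.
X_unlab = np.array([
    [0.85, 1.0, 0.75, 0.65],
    [0.25, 0.0, 0.20, 0.30],
])

pred = propagate_similarity(X_lab, s_lab, X_unlab, M)
print(pred)
```

With `M` set to the identity matrix the distance reduces to the squared Euclidean distance, which is the baseline the abstract compares against. The predicted scores could then be converted to dissimilarities (for example, one minus the score) and passed to a clustering routine.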
