A mislabeled sample recognition method for small-sample data

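Authors: Qin Ruibin (秦瑞斌), Zheng Haoran (鄭浩然), Zhou Hong (周宏)
Affiliation: School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027
Keywords: mislabeling; small-sample data; microarray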
Year, volume (issue), pages: 2012, 31(6): 574-578
Abstract:

Objective To address the problem of mislabeled samples in small-sample data, this paper proposes UCL-stability, a weighted mislabeled-sample recognition algorithm built on the CL-stability algorithm. Methods UCL-stability flips a sample's label and counts the significant differential features that can be selected from the relabeled data; from this count it defines a voting weight that measures how strongly flipping that particular sample's label affects classification. Results Experiments on two cancer gene expression (microarray) data sets show that both UCL-stability and CL-stability effectively identify suspect samples. Further experiments with artificially mislabeled samples show that UCL-stability achieves higher precision and recall than the unweighted CL-stability algorithm. Conclusions UCL-stability accounts not only for the effect that a single mislabeled sample in small-sample data has on classifier design, but also for the fact that mislabeling different samples affects classification results to different degrees. By quantifying this difference with feature information, UCL-stability obtains the improved precision and recall reported above.
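The weighting step described in the Methods can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the feature-selection test, the exact weight formula, or how CL-stability casts its votes. Here a Welch t-statistic with a fixed cutoff (`t_threshold`) is assumed as a stand-in for the significance test, weights are assumed proportional to the per-flip feature counts, and the vote matrix `votes` is simply taken as given. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def welch_t(X, y):
    """Welch two-sample t statistic for each column of X (labels y in {0, 1})."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (b.mean(axis=0) - a.mean(axis=0)) / se


def flip_counts(X, y, t_threshold=7.0):
    """For each sample j, flip its label and count features with |t| > t_threshold.

    Assumption: a fixed |t| cutoff stands in for the paper's (unspecified)
    test for "significant differential features".
    """
    counts = np.zeros(len(y), dtype=int)
    for j in range(len(y)):
        y_flip = y.copy()
        y_flip[j] = 1 - y_flip[j]
        counts[j] = int(np.sum(np.abs(welch_t(X, y_flip)) > t_threshold))
    return counts


def voting_weights(counts):
    """Hypothetical normalization: weights proportional to the counts, summing to 1."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return np.full(len(counts), 1.0 / len(counts))  # fall back to equal votes
    return counts / total


def weighted_scores(votes, weights):
    """Weighted vote: score_i = sum_j w_j * votes[j, i] (1 = flip j flags sample i)."""
    return np.asarray(weights, dtype=float) @ np.asarray(votes, dtype=float)


def precision_recall(flagged, mislabeled):
    """Precision and recall of flagged sample indices against known mislabeled ones."""
    flagged, mislabeled = set(flagged), set(mislabeled)
    hits = len(flagged & mislabeled)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(mislabeled) if mislabeled else 0.0
    return precision, recall


def make_demo_data(seed=0):
    """Synthetic 20 x 200 'expression' matrix: 30 informative features, sample 0 mislabeled."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(20, 200))
    y_true = np.repeat([0, 1], 10)
    X[y_true == 1, :30] += 4.0  # class 1 is shifted on the first 30 features
    y_obs = y_true.copy()
    y_obs[0] = 1  # artificially mislabel sample 0
    return X, y_obs
```

On the synthetic data, flipping the deliberately mislabeled sample 0 back to its true class restores the class separation, so `flip_counts(X, y_obs)` should give it by far the largest count and `voting_weights` the largest weight; flipping any correctly labeled sample adds a second error and shrinks the count. `precision_recall` mirrors the evaluation in the artificial-mislabeling experiments: the fraction of flagged samples that were truly mislabeled, and the fraction of mislabeled samples that were flagged.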

Address: Editorial Office of Beijing Biomedical Engineering, Beijing Anzhen Hospital, Andingmenwai, Beijing
Tel: 010-64456508; Fax: 010-64456661