Recently, numerous data mining methods have been developed in the bioinformatics field for processing biological data. Using gene expression data from The Cancer Genome Atlas (TCGA) covering 60,483 genes in 1,157 patients with kidney cancer, we extracted significant genes for prognosis prediction and applied data-mining-based classification methods. Significant genes were extracted using the least absolute shrinkage and selection operator (LASSO) and principal component analysis (PCA), and the resulting classification accuracy and performance were compared using classification algorithms. Clinical data from the patients were combined with the gene data to determine the optimal classification model and to estimate classification accuracy for four risk factors describing the patients' condition: sample type, primary diagnosis, tumor stage, and vital status. Classification by sample type achieved the best performance, particularly with the logistic regression and support vector machine (SVM) algorithms. These results can be applied to extract biomarkers for the prognosis prediction of kidney cancer arising from various causes, and to support the prevention and early diagnosis of kidney cancer.
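The gene-selection-then-classification pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses synthetic Gaussian data in place of the TCGA expression matrix, a hand-written LASSO (cyclic coordinate descent) applied to the centred 0/1 label as a screening step, and plain gradient-descent logistic regression. The function names, the penalty `alpha=0.05`, and the data sizes are all hypothetical. The PCA branch and the SVM comparison are omitted.

```python
import numpy as np


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/(2n))*||y - X w||^2 + alpha*||w||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardised and y is centred (no intercept fitted).
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * w[j]          # partial residual without feature j
            rho = X[:, j] @ residual / n
            # Soft-thresholding: weakly correlated genes are set exactly to zero.
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            residual -= X[:, j] * w[j]          # put the updated contribution back
    return w


def fit_logistic(X, y, lr=0.1, n_iter=3000):
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        w -= lr * Xb.T @ (prob - y) / len(y)
    return w


def predict_logistic(w, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (Xb @ w > 0).astype(int)


# Hypothetical synthetic stand-in for an expression matrix (samples x genes).
rng = np.random.default_rng(0)
n_samples, n_genes = 300, 200
X = rng.standard_normal((n_samples, n_genes))
true_genes = [0, 1, 2, 3, 4]                     # only these genes affect the label
beta = np.array([3.0, -3.0, 2.0, -2.0, 2.0])
y = (X[:, true_genes] @ beta + rng.standard_normal(n_samples) > 0).astype(int)

X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# Standardise with training statistics only, then apply them to the test split.
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
Z_train, Z_test = (X_train - mu) / sd, (X_test - mu) / sd

# Step 1: LASSO screening on the centred 0/1 label (linear-probability simplification).
w_lasso = lasso_coordinate_descent(Z_train, y_train - y_train.mean(), alpha=0.05)
selected = np.flatnonzero(w_lasso)

# Step 2: logistic regression restricted to the selected genes.
w_clf = fit_logistic(Z_train[:, selected], y_train)
accuracy = (predict_logistic(w_clf, Z_test[:, selected]) == y_test).mean()
print(f"selected {len(selected)} of {n_genes} genes; test accuracy = {accuracy:.2f}")
```

Raising `alpha` shrinks more coefficients to exactly zero, so it trades the size of the candidate biomarker set against the risk of discarding informative genes; in practice it would usually be chosen by cross-validation rather than fixed by hand.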