Important Features Identification for Prostate Cancer Patients Stratification Using Isolation Forest and Interactive Clustering Method
E. A. Mohammed, E. Shakeri, H. A. Z. Shakeri, T. Crump and B. Far, “Important Features Identification for Prostate Cancer Patients Stratification Using Isolation Forest and Interactive Clustering Method,” 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI), 2021, pp. 334-341, doi: 10.1109/IRI51335.2021.00052.
Date of Conference: 10-12 Aug. 2021
Date Added to IEEE Xplore: 17 November 2021
Prostate-specific antigen (PSA) levels are commonly used to screen patients for prostate cancer. However, because PSA levels vary widely among men, PSA-based classification produces many false positives and false negatives, which may affect patient treatment. This paper presents a method to cluster prostate cancer patients' clinical and demographic data into homogeneous groups to support high-accuracy classification of prostate cancer patients. The proposed method is based on the isolation forest and an interactive (two-step) clustering algorithm. We further analyze each group for commonalities and differences. The dataset used in this paper was collected from participants enrolled in the Alberta Prostate Cancer Research Initiative (APCaRI) study and includes (after pre-processing) 2,878 patients with 20 clinical and demographic variables. The APCaRI study enrolled men undergoing prostate cancer diagnosis in Calgary and Edmonton, Canada. These patients were referred for a diagnostic biopsy based on conventional clinical guidelines (e.g., elevated PSA or abnormal digital rectal examination). The data contain three different PSA levels measured at three follow-up times, as well as the initial screening PSA level. The analysis results show that PSA level is a significant factor within each group, but PSA levels overlap substantially between groups, so PSA may not be the best factor for classifying prostate cancer patients. The majority group in the data has PSA levels (10.83%, 10.44%, and 10.14%) smaller than those of the remaining groups. This paper concludes that it may be better to design an independent classifier per group to identify prostate cancer patients from clinical and demographic data.
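The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: flag anomalous records with an isolation forest, then group the remaining records. It assumes scikit-learn and NumPy. The two-step clustering is approximated here by BIRCH (a pre-clustering pass followed by a global agglomerative step), which is a stand-in and not necessarily the algorithm the authors used. The function name `stratify`, the parameter values, and the synthetic data are all hypothetical; the APCaRI data are not reproduced.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler


def stratify(X, contamination=0.05, n_groups=3, seed=0):
    """Flag anomalies with an isolation forest, then cluster the remaining rows.

    Returns one label per row: -1 for rows flagged as anomalous,
    otherwise a group index assigned by the clustering step.
    All parameter defaults are illustrative assumptions.
    """
    forest = IsolationForest(contamination=contamination, random_state=seed)
    is_inlier = forest.fit_predict(X) == 1  # fit_predict returns 1 (inlier) or -1 (outlier)

    # Scale the retained rows so no single variable dominates the distances.
    X_scaled = StandardScaler().fit_transform(X[is_inlier])

    # BIRCH builds small subclusters first, then merges them into n_groups
    # with agglomerative clustering (a stand-in for two-step clustering).
    groups = Birch(threshold=0.5, n_clusters=n_groups).fit_predict(X_scaled)

    labels = np.full(len(X), -1)
    labels[is_inlier] = groups
    return labels


# Synthetic stand-in data: 300 rows, 4 numeric variables, 3 loose blobs.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [6, 6, 0, 0], [0, 6, 6, 0]], dtype=float)
X = np.vstack([rng.normal(c, 1.0, size=(100, 4)) for c in centers])

labels = stratify(X)
for g in np.unique(labels):
    print(f"group {g}: {(labels == g).sum()} rows")
```

Each resulting group could then be profiled separately (for example, by comparing per-group summaries of the PSA variables), in line with the per-group analysis the abstract describes.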
Published in: 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI)
INSPEC Accession Number: 21299876
Conference Location: Las Vegas, NV, USA