A Cluster-Based Machine Learning Model for Large Healthcare Data Analysis [abstract]

Fatemeh Sharifi, Emad Mohammed, Trafford Crump, Behrouz H. Far


The amount of patient survey data generated by hospitals and the wider healthcare industry is growing rapidly. The curse of dimensionality is a barrier to extracting useful information from these data, information that could help in the treatment and care of patients. Methods are therefore needed to determine which features, among such large volumes of stored information, matter most for a desired output. Health-related quality of life (HRQOL) is a powerful paradigm for defining such an output, serving as a measure of patient satisfaction. With high-dimensional data, it is difficult to identify the features that best represent the desired output and to explain them so that they can be used in the future at the point of care. In this paper we propose a Cluster-Based Random Forest (CB-RF) method to identify the most important features for the desired output, which is the set of Expanded Prostate Cancer Index Composite-26 (EPIC-26) domain scores. EPIC-26 is used to assess a range of HRQOL issues related to the diagnosis and treatment of prostate cancer. Several feature extraction methods are applied and compared; the proposed CB-RF model performs best, finding the most important features (10 or fewer) out of more than 1500. These features can be used to estimate patients' EPIC-26 values accurately, with an average coefficient of correlation of 85% between predicted and observed values on a real dataset of 5093 patients.
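The evaluation metric quoted in the abstract, the coefficient of correlation between predicted and observed scores, can be illustrated with a minimal sketch. It assumes the metric is the Pearson correlation coefficient, which the abstract does not state explicitly. The function name `pearson_r` and the sample scores are hypothetical and are not taken from the study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    # Sums of squared deviations (unnormalised variances).
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


# Hypothetical domain scores, for illustration only (not study data).
observed = [55.0, 70.0, 82.5, 90.0, 100.0]
predicted = [60.0, 68.0, 80.0, 93.0, 97.0]
r = pearson_r(observed, predicted)
print(f"r = {r:.3f}")
```

Under this reading, a reported value of 85% would correspond to r = 0.85 averaged over the EPIC-26 domains.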


Machine learning, Big data, Patient quality of life, Dimension reduction

Part of the Communications in Computer and Information Science book series (CCIS, volume 1054)

The Calgary Prostate Cancer Centre has the highest accrual for a novel ultrasound study in prostate cancer

“We have enrolled over 400 patients at our site, reaching our enrollment goal much faster than all other sites across North America. We are now planning on adding in 250 more patients to this trial because of the encouraging results found with the first arm of the trial. Our site tied with the highest accrual goal and surpassed all other sites to meet our enrollment goal.”

The study is a “Multi-Center trial of high-resolution transrectal ultrasound versus standard low-resolution transrectal ultrasound for the identification of clinically significant prostate cancer.”

The only definitive method for diagnosing prostate cancer is a prostate biopsy. This procedure uses an ultrasound machine to guide both freezing needles and biopsy needles into the prostate. The machine currently in use is low-resolution: although it is good at imaging the entire prostate gland to guide the needles, it often cannot show the prostate in enough detail to reveal individual lesions and areas of concern within it. Thus, many biopsy samples are taken systematically, with two samples from each section of the prostate. Recently, a new ultrasound machine has been developed that produces images of the prostate at much higher resolution, allowing the radiologist performing the biopsy to see details within the prostate that were previously not visible. A study using this new high-resolution machine is under way at the Prostate Cancer Centre to compare its ability to detect prostate cancer with that of the standard low-resolution machine. Over 650 patients will be enrolled in this study!


- Eric Hyndman