Reporting and analyzing alternative clustering solutions by employing multi-objective genetic algorithm and conducting experiments on cancer data

Contributors: Reda Alhajj, PhD, Mohamad Elzohbi, Peter Peng

Knowledge-Based Systems 56 (2014) 108-122

Peter Peng, Omer Addam, Mohamad Elzohbi, Sibel T. Özyer, Ahmad Elhajj, Shang Gao, Yimin Liu, Tansel Özyer, Mehmet Kaya, Mick Ridley, Jon Rokne, Reda Alhajj


Clustering is an essential research problem that has received considerable attention in the research community for decades. It is a challenge because there is no unique solution that fits all problems and satisfies all applications. We aim to find the most appropriate clustering solution for a given application domain. In particular, clustering algorithms in general need prior specification of the number of clusters, and this is hard to estimate even for domain experts, especially in a dynamic environment where the data changes and/or becomes available incrementally. In this paper, we describe and analyze the effectiveness of a robust clustering algorithm that integrates a multi-objective genetic algorithm into a framework capable of producing alternative clustering solutions; it is called Multi-objective K-Means Genetic Algorithm (MOKGA). We investigate its application to clustering a variety of datasets, including microarray gene expression data. The reported results are promising. Though we concentrate on gene expression and mostly cancer data, the proposed approach is general enough to work equally well on other datasets, as demonstrated by the two datasets Iris and Ruspini. Running MOKGA yields a Pareto-optimal front, which gives the optimal number of clusters as a solution set. The clustering results are then analyzed and validated using several cluster validity techniques proposed in the literature. As a result, the optimal clusters are ranked for each validity index. We apply majority voting to decide on the most appropriate set of validity indexes applicable to every tested dataset. The proposed clustering approach is tested by conducting experiments using seven well-cited benchmark datasets. The obtained results are compared with those reported in the literature to demonstrate the applicability and effectiveness of the proposed approach.
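
The idea of a Pareto-optimal front of alternative clustering solutions can be illustrated with a minimal Python sketch. It is not the MOKGA implementation: it assumes, for illustration only, that each candidate solution is scored by two objectives to be minimized (the number of clusters and a within-cluster variation value), and the names `Candidate`, `dominates`, and `pareto_front`, as well as the sample values, are hypothetical.

```python
from typing import List, Tuple

# Assumption: a candidate is (number_of_clusters, within_cluster_variation),
# and both objectives are minimized. The abstract does not specify this.
Candidate = Tuple[int, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, sorted by number of clusters."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(set(front))


# Made-up scores, for illustration only.
candidates = [(2, 150.0), (3, 80.0), (3, 95.0), (4, 60.0), (5, 62.0), (6, 40.0)]
front = pareto_front(candidates)
print(front)
```

Each entry that remains in `front` is an alternative solution with a different trade-off between the two objectives; choosing among them is left to a separate step, such as the validity-index ranking the abstract describes.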



Annual Terwillegar Trail Run and Walk Fundraiser

It was a beautiful, crisp fall morning for a 10 km trail run or 7.5 km walk through the Terwillegar ravine on Saturday, September 29th. The run/walk, hosted by the Terwillegar Trail Run/Walk and the Alberta Cancer Foundation, is in its 7th year. Its goal is to bring families and friends together to enjoy the outdoors and ultimately raise funds for prostate cancer research.

John Lewis’ research group was out in force, represented by John Lewis, Catalina Vasquez, Arun Raturi, Perrin Beatty and Abbie Coros. Although John ran in 15-year-old tennis shoes, as Doug Mitchell, one of the run/walk organizers, pointed out to the participants, the Lewis group runners ran well and had a great time!

Funds raised by the Terwillegar Trail Run and Walk go to support cancer research in Alberta. Check out the Alberta Cancer Foundation’s “Dollars at Work” to read about how these funds have been used to support research in the labs of APCaRI members Dr. Frank Wuest and Dr. John Lewis!

With just over 100 participants this year, the 2018 Terwillegar Trail Run/Walk raised over $21,000 for prostate cancer research! You can still donate to this awesome fundraiser: go to Alberta Cancer Foundation TTRW and click on the Donate Now button!

- Perrin Beatty