Cancer classification from high-throughput gene expression profiles is widely used in biomedical research. We recently presented a multivariate, information-theoretic method for large-scale gene selection that centers on feature interdependence, and validated its effectiveness on a colon cancer benchmark. The present paper extends that work. First, we refine the method and propose a complete framework for selecting a gene signature to predict a given disease phenotype from high-throughput data. Second, we apply the framework to a brain cancer gene expression profile derived from the Affymetrix Human Genome U95Av2 Array, in which the number of interrogated genes is six times larger than in the previously studied colon cancer data set. Three information-theory-based filters were used for comparison. Our experimental results show that the framework outperforms them in classification performance on three performance measures. To assess how effectively the framework handles feature interdependence, we also performed two sets of enrichment analyses, which found more statistically significant gene sets and regulatory interactions in our gene signature. The framework is therefore a promising approach to high-throughput gene selection that accounts for gene synergy.
Proceedings of the 6th International Conference on Agents and Artificial Intelligence.