Towards Integrating the Gene Ontology and the Hierarchical Bayesian Network Classification Model: An Empirical Case Study

Hasanein Alharbi


Data Mining (DM) is a knowledge-intensive process that can be significantly enhanced by integrating domain knowledge. Recent research suggests that ontologies can play various roles in the DM process and can facilitate several steps of the Bayesian Network (BN) construction task. To this end, this paper investigates the advantages of consolidating the Gene Ontology (GO) and the Hierarchical Bayesian Network (HBN) classifier. The proposed Semantically Aware Hierarchical Bayesian Network (SAHBN) classification model introduces a flexible framework that systematically consolidates domain knowledge, in the form of an ontology, with the DM process while preserving the advantages of both ontologies and Bayesian theory. Furthermore, it establishes a foundation for exploring the integration of more comprehensive ontological knowledge into the DM process. SAHBN is tested on three biomedical datasets to predict the effect of DNA repair genes on the human ageing process: DNA repair genes are classified as either ageing-related or non-ageing-related based on their GO biological process terms. Evaluated with six performance criteria, the SAHBN classifier shows very competitive performance compared with existing Bayesian-based classification algorithms, outperforming them in more than 50% of the implemented experiments.
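The abstract does not specify the internal structure of SAHBN, so the following is only a minimal sketch of the general idea of combining the GO hierarchy with Bayesian classification, using a deliberately simpler, swapped-in technique: each gene's GO annotations are propagated up the ontology (the GO "true path rule", under which a gene annotated to a term is also implicitly annotated to all of that term's ancestors), and a Bernoulli naive Bayes classifier is trained on the resulting binary term features. It is not the SAHBN or HBN model. The toy term IDs (GO:A to GO:D), the class labels, the training data, the smoothing constant `alpha`, and all function names are hypothetical.

```python
from math import log

# Hypothetical toy fragment of a GO-like DAG: term -> set of direct parent terms.
# Real GO term IDs have the form GO:0000000; these placeholders are illustrative only.
GO_PARENTS = {
    "GO:A": set(),             # stands in for a root biological-process term
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B", "GO:C"},  # a term with two parents, as a DAG allows
}


def ancestors(term, parents):
    """Return the term together with all of its ancestors in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def propagate(annotations, parents):
    """True-path rule: a gene annotated to a term is implicitly annotated to every ancestor."""
    result = set()
    for term in annotations:
        result |= ancestors(term, parents)
    return result


def train(genes, labels, terms, parents, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on propagated binary GO-term features.

    alpha is a Laplace smoothing constant (an assumption, not taken from the paper).
    """
    prior, cond = {}, {}
    for cls in sorted(set(labels)):
        members = [propagate(g, parents) for g, y in zip(genes, labels) if y == cls]
        prior[cls] = len(members) / len(genes)
        # Smoothed estimate of P(term present | class) for every term.
        cond[cls] = {
            t: (sum(t in m for m in members) + alpha) / (len(members) + 2 * alpha)
            for t in terms
        }
    return prior, cond


def predict(annotations, prior, cond, terms, parents):
    """Return the class with the highest log-posterior for one gene."""
    features = propagate(annotations, parents)
    best_cls, best_score = None, float("-inf")
    for cls, p_cls in prior.items():
        score = log(p_cls)
        for t in terms:
            p = cond[cls][t]
            # Present terms contribute P(t=1|class); absent terms contribute P(t=0|class).
            score += log(p) if t in features else log(1.0 - p)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


# Hypothetical toy training data: each gene is a set of direct GO annotations.
TOY_GENES = [{"GO:D"}, {"GO:B"}, {"GO:C"}, {"GO:C"}]
TOY_LABELS = ["ageing", "ageing", "non-ageing", "non-ageing"]
TERMS = sorted(GO_PARENTS)

prior, cond = train(TOY_GENES, TOY_LABELS, TERMS, GO_PARENTS)
print(predict({"GO:B"}, prior, cond, TERMS, GO_PARENTS))  # ageing
```

Naive Bayes treats every propagated term as conditionally independent given the class, even though a term and its ancestors are strongly correlated by construction; that redundancy is one motivation for hierarchy-aware Bayesian models such as the HBN.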


Keywords: DNA Repair Gene, Hierarchical Bayesian Network, Human Ageing Process, Ontology, Semantic Data Mining.






This work is licensed under a Creative Commons Attribution 3.0 License.


IT in Industry. ISSN (Online): 2203-1731; ISSN (Print): 2204-0595