GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based on Genetic Algorithms

Mir Md. Jahangir Kabir, Shuxiang Xu, Byeong Ho Kang, Zongyuan Zhao

Abstract


This paper presents a new approach based on genetic algorithms (GAs) to generate maximal frequent itemsets (MFIs) from large datasets. This new algorithm, GeneticMax, is heuristic which mimics natural selection approaches for finding MFIs in an efficient way. The search strategy of this algorithm uses a lexicographic tree that avoids level by level searching which reduces the time required to mine the MFIs in a linear way. Our implementation of the search strategy includes bitmap representation of the nodes in a lexicographic tree and identifying frequent itemsets (FIs) from superset-subset relationships of nodes. This new algorithm uses the principles of GAs to perform global searches. The time complexity is less than many of the other algorithms since it uses a non-deterministic approach. We separate the effect of each step of this algorithm by experimental analysis on real datasets such as Tic-Tac-Toe, Zoo, and a 10000x8 dataset. Our experimental results showed that this approach is efficient and scalable for different sizes of itemsets. It accesses a major dataset to calculate a support value for fewer number of nodes to find the FIs even when the search space is very large, dramatically reducing the search time. The proposed algorithm shows how evolutionary method can be used on real datasets to find all the MFIs in an efficient way.

Keywords


Data Mining; Genetic Algorithm; Maximal Frequent Itemset; Lexicographic Tree;

References


R. Agrawal and R. Srikant, "Fast algorithms for mining association rules," in 20th International Conference on Very Large Data Bases, 1994, pp. 487-499.

R. Agrawal, T. Imieliński, and A. Swami, "Mining association rules between sets of items in large databases," Proc. 1993 ACM SIGMOD Int. Conf. Manag. Data - SIGMOD '93, May 1993, pp. 207-216.

R. C. Agarwal, C. C. Aggarwal, and V. V. V. Prasad, "A tree projection algorithm for generation of frequent itemsets," Parallel Distrib. Comput. Spec. Issue High Perform. Data Min., vol. 61, no. 3, pp. 350-371, 2001.

J. J. Cameron and C. K. Leung, "Mining frequent patterns from precise and uncertain data," Comput. Syst., vol. 1, no. 1, pp. 3-22, 2011.

M. M. J. Kabir, S. Xu, B. H. O. Kang, and Z. Zhao, "Association rule mining for both frequent and infrequent items using particle swarm optimization algorithm," Int. J. Comput. Sci. Eng., vol. 6, no. 7, pp. 221-231, 2014.

J. Han, H. Cheng, D. Xin, and X. Yan, "Frequent pattern mining: current status and future directions," Data Min. Knowl. Discov., vol. 15, no. 1, pp. 55-86, Jan. 2007.

Q. Mei, D. Xin, H. Cheng, J. Han, and C. Zhai, "Generating semantic annotations for frequent patterns with context analysis," Proc. 12th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. - KDD '06, 2006, pp. 337-346.

D.-I. Lin and Z. M. Kedem, "Pincer-Search: a new algorithm for discovering the maximal frequent set," in Proc. 6th International Conference on Extending Database Technology, 1998, pp. 103-119.

R. J. Bayardo, "Efficiently mining long patterns from databases," ACM SIGMOD, pp. 85-93, 1998.

R. C. Agarwal, C. C. Aggarwal, and V. V. V. Prasad, "Depth first generation of long patterns," Proc. Sixth ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. - KDD '00, 2000, pp. 108-118.

D. Burdick, M. Calimlim, and J. Gehrke, "MAFIA: a maximal frequent itemset algorithm for transactional databases," Proc. 17th Int. Conf. Data Eng., 2001, pp. 443-452.

K. Gouda and M. J. Zaki, "GenMax : an efficient algorithm for mining," Data Min. Knowl. Discov., vol. 11, no. 3, pp. 223-242, 2005.

B. Alataş and E. Akin, "An efficient genetic algorithm for automated mining of both positive and negative quantitative association rules," Soft Comput., vol. 10, no. 3, pp. 230-237, Apr. 2005.

H. Lu, R. Setiono, and H. Liu, "Effective data mining using neural networks," IEEE Trans. Knowl. Data Eng., vol. 8, no. 6, pp. 957-961.

S. Ghosh, A. Nag, D. Biswas, J. P. Singh, S. Biswas, D. Sarkar, and P. P. Sarkar, "Weather data mining using artificial neural network," 2011 IEEE Recent Adv. Intell. Comput. Syst., pp. 192-195, Sep. 2011.

W. Dou, J. Hu, K. Hirasawa, and G. Wu, "Quick response data mining model using genetic algorithm," 2008 SICE Annu. Conf., Aug. 2008, pp. 1214-1219.

A. Salleb-Aouissi, C. Vrain, C. Nortet, X. Kong, and D. Cassard, "QuantMiner for mining quantitative association rules," Mach. Learn. Res., vol. 14, no. 1, pp. 3153-3157, 2013.

A. Salleb-Aouissi, C. Vrain, and C. Nortet, "QuantMiner : a Genetic Algorithm for mining quantitative association rules," Proc. 20th International Joint Conference on Artificial Intelligence, 2007, pp. 1035-1040.

R. J. Kuo and C. W. Shih, "Association rule mining through the ant colony system for national health insurance research database in Taiwan," Comput. Math. with Appl., vol. 54, no. 11-12, pp. 1303-1318, Dec. 2007.

R. J. Kuo, C. M. Chao, and Y. T. Chiu, "Application of particle swarm optimization to association rule mining," Appl. Soft Comput., vol. 11, no. 1, pp. 326-336, 2011.

J. P. Huang, C. T. Yang, and C. H. Fu, "A Genetic Algorithm based searching of maximal frequent itemsets," in International Conference on Artificial Intelligence, 2004, pp. 548-554.

J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press, 1975.

D. Beasley, D. R. Bull, and R. R. Martin, "An overview of Genetic Algorithms : Part 1 , Fundamentals," Univ. Comput., vol. 15, no. 2, pp. 58-69, 1993.

M. Srinivas and L. M. Patnaik, "Genetic Algorithms: a survey," Computer (Long. Beach. Calif)., vol. 27, no. 6, pp. 17-26, 1994.

D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman Publishing Co., Inc. Boston, MA, USA, 1989.

Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. Berlin: Springer, 1992.

A. A. Freitas, "A survey of evolutionary algorithms for data mining and knowledge discovery," in Advances in Evolutionary Computing, Springer Berlin-Heidelberg, 2003, pp. 819-845.


Full Text: PDF

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.

IT in Innovation IT in Business IT in Engineering IT in Health IT in Science IT in Design IT in Fashion

IT in Industry http://www.it-in-industry.com ISSN (Online): 2203-1731; ISSN (Print): 2204-0595