The D-basis Algorithm for Association Rules of High Confidence

Oren Segal, Justin Cabot-Miller, Kira Adaricheva and J.B.Nation

Abstract


We develop a new approach for distributed computing of the association rules of high confidence on the attributes/columns of a binary table. It is derived from the D-basis algorithm developed by K.Adaricheva and J.B.Nation (Theoretical Computer Science, 2017), which runs multiple times on sub-tables of a given binary table, obtained by removing one or more rows. The sets of rules retrieved at these runs are then aggregated. This allows us to obtain a basis of association rules of high confidence, which can be used for ranking all attributes of the table with respect to a given fixed attribute. This paper focuses on some algorithmic details and the technical implementation of the new algorithm. Results are given for tests performed on random, synthetic and real data.

Keywords


Binary table, association rules, implications, the D- basis algorithm, parallel computing, the relevance

References


K. Adaricheva, and J.B. Nation, Discovery of the D-basis in binary tables based on hypergraph dualization, v.658 (2017), Theoretical Computer Science, Part B, 307–315.

K. Adaricheva, J.B. Nation, G. Okimoto, V. Adarichev, A. Amanbekkyzy, S. Sarkar, A. Sailanbayev, N. Seidalin, and K. Alibek, Measuring the Implications of the D-basis in Analysis of Data in Biomedical Studies, Proceedings of ICFCA-15, Nerja, Spain; Springer, 2015, 39–57.

K. Adaricheva, J.B. Nation and R. Rand, Ordered direct implicational basis of a finite closure system, Disc. Appl. Math. 161 (2013), 707–723.

K.Adaricheva and T. Ninesling, Direct and binary-direct bases for oneset updates of a closure system, manuscript; presented in poster session of ICFCA-2018 http://icfca2017.irisa.fr/program/accepted-papers/

Adaricheva K., Turan G., Sloan R. and Szorenyi B., Horn belief contraction: remainders, envelopes and computational aspect, in Proceedings of KR-2012, Italy, 107–115.

R. Agrawal, H. Mannila, R. Srikant, H. Toivonen and A.I. Verkamo, Fast discovery of association rules, Advances in Knowledge discovery and data mining, AAAI Press, Menlo Park, California (1996), 307–328.

Guillaume Bosc, Marc Plantevit, Jean-Franois Boulicaut, Moustafa Bensafi, and Mehdi Kaytoue, h(odor): Interactive discovery of hypotheses on the structure-odor relationship in neuroscience, in ECML/PKDD 2016 (Demo), 2016.

J.L. Balc´azar, Redundancy, deduction schemes, and minimum-size bases for association rules, Log. Meth. Comput. Sci. 6 (2010), 2:3, 1–33.

Gordon Bell, Jim Gray, and Alex Szalay. Petascale computational systems.Computer, 39(1):110–112, 2006.

Raffaele Bolla, Roberto Bruschi, Franco Davoli, and Flavio Cucchietti. Energy efficiency in the future internet: A survey of existing approaches and trends in energy-aware fixed network infrastructures. IEEE Communications Surveys & Tutorials, 13(2):223–244, 2011.

George Bosilca, Thomas Herault, Ala Rezmerita, and Jack Dongarra. On scalability for mpi runtime systems. In 2011 IEEE International Conference on Cluster Computing, pages 187–195. IEEE, 2011.

M. Fredman and L. Khachiyan, On the complexity of dualization of monotone disjunctive normal forms, J. Algorithms 21 (1996), 618–628.

Edgar Gabriel, Graham E Fagg, George Bosilca, Thara Angskun, Jack J Dongarra, Jeffrey M Squyres, Vishal Sahay, Prabhanjan Kambadur, Brian Barrett, Andrew Lumsdaine, et al. Open mpi: Goals, concept, and design of a next generation mpi implementation. In European Parallel Virtual Machine/Message Passing Interface Users Group Meeting, pages 97–104. Springer, 2004.

B. Ganter, and R.Wille, Formal concept Analysis: Mathematical Foundations, Springer, 1999.

E.L. Kaplan and P. Meier, Nonparametric estimation from incomplete observations, J. Amer. Statist. Assn. 53 N282 (1958), 457–481.

Jonathan Koomey, Stephen Berard, Marla Sanchez, and Henry Wong. Implications of historical trends in the electrical efficiency of computing. IEEE Annals of the History of Computing, 33(3):46–54, 2011.

M. Kryszkiewicz, Concise representation of association rules, Proceedings of the ESF Exploratory Workshop on Pattern Detection and Discovery, Springer-Verlag, London, UK, 92–109.

K. Murakami and T. Uno, Efficient algorithms for dualizing large scale hypergraphs, Disc. Appl. Math. 170 (2014), 83–94.

J. B. Nation, G. Okimoto, T. Wenska, A. Achari, J. Maligro, T. Yoshioka, and E. Zitello, A Comparative analysis of MRNA expression for sixteen different cancers, preprint, http : www:math:hawaii:edu jb=lust 2017 615:pdf

D. Prajapati, S. Garg and N.C.Chanhan Interesting association rule mining with consistent and inconsistent rule detection from big sales data in distributed environment, Future Computing and Informatics Journal 2 (2017), 19–30.

O. Segal, J. Cabot-Miller, K. Adaricheva, J.B. Nation, A. Sharafudinov, The Bases of Association Rules of High Confidence, Proceedings of SIGPRO - 2018, Sydney, David C.Wyld et al. (Eds) pp. 39-51, 2018.

Arnaud Soulet and Bruno Crmilleux, Mining constraint-based patterns using automatic relaxation, Intell. Data Anal., 13(1):109–133, 2009.

Arnaud Soulet, Chedy Rassi, Marc Plantevit, and Bruno Cremilleux,Mining dominant patterns in the sky In IEEE 11th Int. Conf on Data Mining (ICDM 2011), pages 655–664. IEEE, 2011.

T. Rauber and G. R¨unger. Parallel Programming: for Multicore and Cluster Systems. Springer Berlin Heidelberg, 2013.

The Cancer Genome Atlas Research Network, The Cancer Genome Atlas Pan-Cancer analysis project, Nature Genetics 45 (2013), 1113–1120.

Frequent Itemset Mining Dataset Repository, publicly available at http://fimi. ua. ac.be /data/retail.dat


Full Text: PDF

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.

IT in Innovation IT in Business IT in Engineering IT in Health IT in Science IT in Design IT in Fashion

IT in Industry @ (2012 - ) . http://www.it-in-industry.com . ISSN (Online): 2203-1731; ISSN (Print): 2204-0595