A Statistical Theory for Quantitative Association Rules

Yonatan Aumann and Yehuda Lindell

Abstract:

The goal of data mining is to discover knowledge and reveal new, interesting and previously unknown information to the user. A central data-mining tool is association rules. For events X and Y, an association rule is a rule of the type "X implies Y" with a certain probability. The classical use of association rules is with market-basket data, resulting in rules of the type "70% of people who buy beer also buy diapers".
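To make the "X implies Y with a certain probability" reading concrete, the following is a minimal sketch of how that probability (usually called the rule's confidence) could be computed over market-basket data. The function name `rule_confidence` and the four toy baskets are hypothetical and are used only for illustration; they do not come from the paper.

```python
def rule_confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`.

    Returns None when no basket contains the antecedent, since the
    conditional probability is undefined in that case.
    """
    with_x = [b for b in baskets if antecedent in b]
    if not with_x:
        return None
    with_both = [b for b in with_x if consequent in b]
    return len(with_both) / len(with_x)


# Hypothetical toy data: each basket is the set of items in one purchase.
baskets = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"beer", "milk"},
    {"milk", "bread"},
]

# Estimated probability that a basket with beer also contains diapers.
confidence = rule_confidence(baskets, "beer", "diapers")
print(confidence)
```

On this toy data, two of the three baskets that contain beer also contain diapers, so the rule "beer implies diapers" holds with confidence 2/3.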

Association rules discover patterns and correlations that may be buried deep inside a database. They have therefore become a key data-mining tool and as such have been well researched. So far, this research has focused predominantly on databases containing categorical data only. However, many, if not most, real-world databases contain quantitative attributes, and current solutions for this case are inadequate.

We introduce a new definition of quantitative association rules based on statistical inference theory. Our definition reflects the intuition that the goal of association rules is to find extraordinary and therefore interesting phenomena in databases.
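One possible way to read "extraordinary" in statistical terms is that a quantitative attribute behaves significantly differently on a subset of the database than on the rest. The sketch below illustrates that reading with a two-sample z statistic on the difference of means. It is an assumption made for illustration, not the definition given in the paper; the function name `mean_difference_z`, the 1.96 threshold (a two-sided 5% level) and the wage figures are all hypothetical.

```python
import math
from statistics import mean, variance


def mean_difference_z(subset, rest):
    """Two-sample z statistic for the difference of means.

    Uses the sample variance of each group separately (no pooling).
    Each group needs at least two values.
    """
    standard_error = math.sqrt(
        variance(subset) / len(subset) + variance(rest) / len(rest)
    )
    return (mean(subset) - mean(rest)) / standard_error


# Hypothetical toy data: hourly wages for a subset of records and for the rest.
subset_wages = [7.5, 8.0, 7.9, 8.2, 7.7, 8.1]
rest_wages = [9.0, 9.4, 8.8, 9.1, 9.3, 8.9]

z = mean_difference_z(subset_wages, rest_wages)

# Illustrative threshold only: flag the subset if |z| exceeds 1.96.
is_extraordinary = abs(z) > 1.96
print(round(z, 2), is_extraordinary)
```

With samples this small a z statistic is only a rough guide; the point of the sketch is the shape of the comparison (subset versus complement), not the specific test.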

We present a rigorous experimental evaluation on real-world datasets, demonstrating the usefulness and characteristics of rules mined according to our definition.

My Master's thesis: Postscript, gzipped Postscript (recommended only for Sections 4 and 5; for other sections, see the full version).

The paper that appeared in KDD'99: Postscript, gzipped Postscript.

A full version of the paper: Postscript, gzipped Postscript.
