Mining frequent itemsets is one of the most fundamental problems in data mining, but the sheer volume of data in large databases makes traditional frequent-pattern mining algorithms difficult to apply. Building on an analysis of the FP-growth algorithm and targeting the characteristics of large databases, this paper proposes EFP-growth (Equivalent Classes Frequent Patterns-Growth), an equivalence-class-based algorithm for mining frequent patterns in large databases. EFP-growth exploits the property that itemset equivalence classes partition the itemset space of association-rule mining into mutually disjoint subspaces. It decomposes a large database into multiple projected databases and performs constrained frequent itemset mining on each projected database in turn. The algorithm is especially well suited to mining large databases at low support thresholds. Analysis and experiments show that EFP-growth outperforms FP-growth in both running time and memory usage when mining large databases, and that its advantage becomes more pronounced as the database grows, giving it good time and space scalability.
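The decomposition into equivalence classes and projected databases can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the authors' EFP-growth. It assumes transactions are in-memory Python sets and orders frequent items by descending support, as FP-growth does. It also defines the equivalence class of an item `a` as all frequent itemsets whose least frequent member is `a`. The abstract does not say how each projected database is mined beyond building on FP-growth, so the sketch substitutes a plain recursive projection (pattern growth without an FP-tree) for the per-class constrained mining step. All function names are hypothetical.

```python
from collections import Counter
from itertools import chain


def rank_frequent_items(transactions, min_support):
    """Count single items and rank the frequent ones by descending support."""
    counts = Counter(chain.from_iterable(set(t) for t in transactions))
    frequent = [i for i, c in counts.items() if c >= min_support]
    # Ties are broken by the item itself so the order is deterministic.
    ordered = sorted(frequent, key=lambda i: (-counts[i], i))
    return {item: r for r, item in enumerate(ordered)}, counts


def projected_databases(transactions, rank):
    """Yield (item, projected database), one equivalence class at a time.

    The class of item `a` holds every frequent itemset whose least frequent
    member (largest rank) is `a`. Its projected database keeps only the
    transactions containing `a`, restricted to frequent items ranked before
    `a`. The classes are disjoint and together cover all frequent itemsets.
    """
    for a in sorted(rank, key=rank.get):
        yield a, [
            {b for b in t if b in rank and rank[b] < rank[a]}
            for t in transactions
            if a in t
        ]


def mine_constrained(db, min_support, suffix, rank, out):
    """Record frequent itemsets that must contain `suffix` (recursive projection)."""
    counts = Counter(chain.from_iterable(db))
    for item, count in counts.items():
        if count < min_support:
            continue
        pattern = suffix | {item}
        out[pattern] = count
        # Project again on `item`, keeping only items ranked before it,
        # so every itemset is generated exactly once.
        sub = [{b for b in t if rank[b] < rank[item]} for t in db if item in t]
        mine_constrained(sub, min_support, pattern, rank, out)


def equivalence_class_mining(transactions, min_support):
    """Mine all frequent itemsets class by class; returns {frozenset: support}."""
    rank, counts = rank_frequent_items(transactions, min_support)
    out = {}
    for a, proj in projected_databases(transactions, rank):
        out[frozenset({a})] = counts[a]
        # Only this class's projected database is needed at this point.
        mine_constrained(proj, min_support, frozenset({a}), rank, out)
    return out


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
result = equivalence_class_mining(db, min_support=3)
# 6 itemsets: {a}, {b}, {c} with support 4; {a,b}, {a,c}, {b,c} with support 3
```

Because each projected database is generated and mined on its own, only one class's data has to be held at a time. In a genuinely large database these projections would presumably be written to disk rather than kept in memory, which this sketch does not attempt.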