Zhang Yanqing. A Frequent Closed Graph Mining Algorithm in Large Scale Internet based on Hadoop[J]. Advanced Engineering Sciences, 2016,48(4):136-143.DOI:
中文摘要: 针对当前单机模式下频繁闭图挖掘算法无法处理大规模Internet数据集的问题,通过改进Apriori算法,提出了基于Hadoop的迭代式频繁闭图挖掘算法AMR(Apriori based on MapReduce)。首先将动态网络的边集存储在键值表中,并设计了序列化子图编码方案以确保频繁子图的唯一性;然后提出了一种传递子图编码的通信机制,通过整合每个分片的支持度得到全局支持度,从而确保了频繁闭图的准确性;最后通过剪枝得到动态网络的频繁闭图。将AMR算法分别运用于国家级和AS级Internet的动态网络中,结果表明,频繁闭图能够准确表征Internet骨干网络的拓扑结构,说明AMR算法能够快速且有效地挖掘大规模动态网络的频繁闭图。
Abstract
Abstract:The existing frequent closed graph(FCG) mining methods under standalone mode can not process massive Internet datasets
by improving the Apriori algorithm
an iterative algorithm AMR(Apriori based on MapReduce) was proposed to mine FCGs of large scale dynamic networks based on Hadoop. First of all
the edge sets of dynamic networks were stored in the key-value table
and a serialized subgraph coding mechanism was designed to ensure the uniqueness of frequent subgraphs(FSG). Secondly
a communication mechanism was proposed to pass subgraph codes
and to ensure the accuracy of FCG
local supports in each partition were aggregated to a global one. Finally
FCGs were achieved by pruning the FSGs. AMR was used in the dynamic networks of both country and AS level Internet
and experimental results showed that FCG can accurately characterize the topology of backbone Internet
thus verifying the efficiency and effectiveness of AMR in mining FCG of large scale dynamic networks.