Aclosely related work to mining Frequent Items over streams is the mining offrequent items over data streams.
However, mining Frequent Items is far morechallenging than counting singleton items due to the combinatorial explosion ofthe number of itemsets. Extensive research has been conducted on queryingprocessing and data management in data streams. The issues of data models,query languages in the context of data streams are the main research topics. Werefer readers to the excellent surveys by Babcock et al.
4 and Golab and¨Ozsu 19. Some prior work on streaming algorithms for mining Frequent Itemsinclude Carma 1 and SWF 3. Carma scans a transaction database twice to minethe set of all Frequent Items, where the first scan of Carma can be applied tocompute a false-positive set of Frequent Items over a landmark window. SWFmines the exact set of all Frequent Items over a time-based sliding window. SWFfirst computes the set of all Frequent Items over the first window and then itrequires only one scan of the database for each incremental update. However,scanning the entire window for each slide may not be efficient enough to handlehigh-speed data streams. For the recent development of Frequent Item/FCI miningover data streams, we are aware of the following two studies.Jin andAgrawal 20 develop the Stream Mining Algorithm using potential frequent 2-itemsetsand the Apriori property to reduce the number of candidate itemsets.
They alsodesign a memory-resident summary data structure that implements a compact prefixtree using a hash table. Their algorithm is approximate and false-positive, whichhas deterministic bounds on the accuracy. The window model they adopt is the landmarkwindow and they do not distinguish recent data from old data. Cheng et al. 21adopt a sliding window model to approximately mine Frequent Items over datastreams. The main contribution of this work is that it proposes a progressivelyincreasing minimum support function to address the dilemma caused by ? in mostof the existing algorithms. When an itemset is retained in the window longer,the technique requires its minimum support threshold to approach the minimumsupport of a Frequent Item.
As a result, it is able to increase ? tosignificantly improve the mining efficiency and save memory consumption, at theexpense of only slightly degraded accuracy. The proposed algorithm of thiswork, called MineSW, is shown by Cheng et al. to outperform an improved versionof the algorithm proposed in 22 that apply Lossy Counting 23 algorithm tomine Frequent Items over a sliding window.