CN113792099B - Data flow high-utility item set mining system based on historical utility table pruning - Google Patents

Data flow high-utility item set mining system based on historical utility table pruning

Info

Publication number
CN113792099B
Authority
CN
China
Prior art keywords
item
data
utility
global
mining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110922923.2A
Other languages
Chinese (zh)
Other versions
CN113792099A (en)
Inventor
闫凤麒
陈欣如
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xiye Information Technology Co ltd
Original Assignee
Shanghai Xiye Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xiye Information Technology Co ltd
Priority to CN202110922923.2A
Publication of CN113792099A
Application granted
Publication of CN113792099B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data flow high-utility item set mining system based on historical utility table pruning. High-utility item set mining over data streams with a sliding window is one of the most challenging problems in data mining. Current algorithms generate large numbers of candidate item sets and redundant items, which degrades performance on large-scale data streams, and they rarely draw on historical mining results while mining the stream. The innovation of the invention is to build a historical utility value table, use historical data to prune the search space of the data stream and thereby reduce candidates and redundant items, and construct the data mining system on a distributed architecture, so that the historical utility value table is created and updated without affecting data stream mining. This effectively improves the efficiency of high-utility item set mining over data streams.

Description

Data flow high-utility item set mining system based on historical utility table pruning
Technical Field
The invention relates to a frequent pattern mining algorithm and a data stream mining system.
High-utility item set mining is an important branch of frequent pattern mining.
Background
Frequent item set mining is an important branch of data mining that finds, among all transactions in a data set, the item sets whose frequency of occurrence exceeds a user-set threshold. As frequent item sets came into wide use, it was found that some infrequent item sets can create more value than frequent ones. To address this, researchers proposed high-utility item set mining, which overcomes the failure of frequent item set mining to account for factors such as occurrence count, price, profit and regional distribution, and evaluates the importance of an item set through a comprehensive utility index.
Pattern-growth methods are currently effective for high-utility item set mining over data streams. The HUM-UT algorithm builds a global header table for the data in the sliding window, estimates the utility values of the data stream, and mines high-utility item sets using the global header table and a global utility tree; however, the global header table and utility tree still contain many redundant data items and low-utility item sets. To address this, the IHUM-UT algorithm improves time efficiency by compressing the global header table, the SHUGROWth algorithm optimizes the mining process by constructing an SHU-Tree structure, and the HUISW algorithm optimizes the global header table by constructing a HUIL-Tree.
However, too many candidates and redundant items often give the constructed data structures (especially tree structures) high space complexity, so that the mining process recurses frequently, leading to memory overflow and reduced efficiency. Pruning and filtering redundant item sets is therefore one of the main optimization objectives of current algorithms.
Algorithms based on sliding windows mostly focus on building better global structures. They overlook the fact that, in practical data analysis, long-term historical data offers guidance for mining future data streams and can help the algorithm filter out redundant items and candidates. High-utility item set mining algorithms built on distributed frameworks are also scarce, and as data streams keep growing, improving the real-time performance and efficiency of stream mining algorithms is challenging.
Disclosure of Invention
Current pattern-growth algorithms inevitably suffer from too many candidate item sets and redundant items and from wasted processing of low-utility data. This often gives the constructed high-utility tree a high space complexity, so that subtrees are created recursively and frequently during mining, which eventually causes memory overflow and low efficiency. How to screen candidate sets effectively is one of the main optimization goals of high-utility item set mining algorithms.
With the development of distributed systems and data stream engines, many solutions now exist for handling large-scale data streams, including a number of excellent stream engines (Spark Streaming, Storm, Flink). In practical data mining and analysis, the analysis of long-term historical data has reference value for mining future data streams. The invention therefore mines historical data effectively and, with the help of a distributed data processing framework, uses the result to assist and optimize current stream mining algorithms. This moves the algorithm from a single machine to a distributed setting, reduces the time and storage cost of mining, and gives better scalability and stability on large data sets.
The invention designs a distributed high-utility item set mining system that analyzes historical mining data stably while preserving the real-time performance of high-utility item set mining on the current data stream. The invention also makes effective use of historical mining results: it constructs a historical utility value table, uses the table to reduce the redundant items handled by the stream mining algorithm, and thereby improves the efficiency of that algorithm.
In order to achieve the above object, the present invention provides the following solution:
step 1, creating and updating a historical utility value table;
step 2, constructing, updating and optimizing a global header table and a global tree;
step 3, performing high-utility item set mining on the optimized global data structure;
step 4, constructing a distributed high-utility item set mining system.
Advantageous Effects
The invention reduces the number of low-utility data items in the optimized global header table. Suppose there are currently N data items and tn transactions of average length L, with window size WinSize and batch size BatchSize. When all data items are used, the space complexity can reach O(WinSize × BatchSize × L). With the window size and batch size unchanged, removing low-utility data items shortens the average transaction length and also reduces the number of subtrees generated in the global tree and the number of recursions, which in turn lowers the time and space complexity of the algorithm.
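As a purely illustrative calculation of the bound above (the function name and all numeric values are hypothetical and are not taken from the patent's experiments), the effect of shortening the average transaction length can be sketched as:

```python
def utility_cells(win_size, batch_size, avg_len):
    """Upper bound on stored utility values: WinSize x BatchSize x L."""
    return win_size * batch_size * avg_len


# Hypothetical window of 4 batches with 1000 transactions per batch.
before = utility_cells(win_size=4, batch_size=1000, avg_len=10)
# Same window after pruning shortens the average transaction from 10 to 7 items.
after = utility_cells(win_size=4, batch_size=1000, avg_len=7)
print(before, after)
```

With WinSize and BatchSize fixed, the bound scales linearly with L, so pruning three of ten items per transaction on average removes 30% of the stored values in this example.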
Comparison experiments were carried out on four classical mining data sets, and a marked improvement in performance was observed. This indicates that constructing the historical utility value table and the distributed high-utility item set mining system improves algorithm efficiency. The invention is significant for improving the efficiency of high-utility item set mining on current data streams, preserving the real-time performance of the algorithm, and widening the application of high-utility item set mining algorithms.
Drawings
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification, illustrate the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a block diagram of a distributed high utility item set mining system;
FIG. 2 is a technical roadmap of a data stream high utility item set mining algorithm based on historical utility table pruning;
FIG. 3 is a flowchart of step 1, creation and updating of the historical utility value table;
FIG. 4 is a flowchart of step 2, construction, updating and optimization of the global header table and the global tree;
FIG. 5 is a diagram of step 3, high-utility item set mining on the optimized global data structure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the following detailed description of the present invention will be made with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
The implementation process of the system of the invention is shown in fig. 2:
step 1, creating and updating a history utility value table;
step 2, constructing, updating and optimizing a global header table and a global tree;
step 3, performing high-utility item set mining on the optimized global data structure;
step 4, constructing a distributed high-utility item set mining system;
the individual steps are described in detail below.
Step 1: creation and updating of a historical utility value table, as shown in FIG. 3
1.1 initialization of a historical utility value Table
If no utility value table exists in the current cache, the table is initialized. Different historical utility value tables are built for different thresholds, and each table entry has the following fields:
item_index: index of the current data item
item_profile: external utility value of the current data item
item_utility: mean utility value of the current data item
item_level: level of the current data item (for reference during mining)
The table is initialized from the data items: after initialization, item_index of each entry is the name of the data item, item_profile is set to the external utility value, and item_utility and item_level are both initialized to 0. The resulting historical utility value table is then used in step 1.2.
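The initialization described above could be sketched as follows. This is only an illustration under stated assumptions: the table is held as a Python dict keyed by item_index, and the names HistoryEntry and init_history_table as well as the sample external utility values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    """One entry of the historical utility value table (fields from step 1.1)."""
    item_index: str            # name of the data item
    item_profile: float        # external utility value of the data item
    item_utility: float = 0.0  # mean utility value, initialized to 0
    item_level: int = 0        # level used during mining, initialized to 0


def init_history_table(external_utility):
    """Build the initial table from a mapping of item name to external utility."""
    return {name: HistoryEntry(name, value)
            for name, value in external_utility.items()}


# Hypothetical external utility values for three data items.
table = init_history_table({"a": 5.0, "b": 2.0, "c": 1.0})
print(table["a"])
```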
1.2 creation of historic utility value tables
After the first sliding window has been mined, item_utility of each data item is updated from the mining result and the computed mean utility value. Data items marked as low-utility item sets whose utility is clearly lower than the window's transaction-weighted utility value have item_level set to -1; data items in mined high-utility item sets of more than one item have item_level set to 2; data items that form high-utility 1-item sets have item_level set to 1. All of these marks and updates are applied to the historical utility value table initialized in step 1.1.
After this first initialization and creation, the table is stored in the cache with the data item index as the key and the remaining fields as the value. The table is named by the transaction data set name and the minimum utility threshold, so a historical utility value table produced by steps 1.1 and 1.2 corresponds to exactly one minimum utility threshold on one data set.
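The labeling and naming rules of step 1.2 might look like the sketch below. The function names, the "dataset:threshold" key format, the plain dict standing in for the cache, and the rule that an item already marked 1 is not downgraded to 2 are all assumptions made for illustration; the patent does not specify them.

```python
def table_name(dataset, min_util):
    """Cache name built from the data set name and the minimum utility threshold."""
    return f"{dataset}:{min_util}"


def label_items(table, high_utility_itemsets, mean_utility, low_utility_items):
    """Apply the item_level rules of step 1.2 to a table of plain dicts."""
    for item, mean in mean_utility.items():
        table[item]["item_utility"] = mean
    for item in low_utility_items:
        table[item]["item_level"] = -1
    for itemset in high_utility_itemsets:
        if len(itemset) == 1:
            table[itemset[0]]["item_level"] = 1  # high-utility 1-item set
        else:
            for item in itemset:
                # Assumption: keep level 1 if the item already has it.
                if table[item]["item_level"] != 1:
                    table[item]["item_level"] = 2
    return table


# Hypothetical items and mining result of the first window.
table = {name: {"item_profile": p, "item_utility": 0.0, "item_level": 0}
         for name, p in {"a": 5.0, "b": 2.0, "c": 1.0, "d": 3.0}.items()}
label_items(table,
            high_utility_itemsets=[("a",), ("a", "b")],
            mean_utility={"a": 40.0, "b": 12.0, "c": 1.5, "d": 6.0},
            low_utility_items=["c"])
cache = {table_name("retail", 1000): table}  # dict used in place of a real cache
```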
1.3 updating of a historical utility value Table
After the window slides forward and mining of the new window completes, the historical utility value table to be updated is fetched from the cache by data set name and minimum utility threshold, and its item_utility and item_level fields are updated by the same rules used at creation.
Step 2: construction, updating and optimization of global header table and global tree as shown in fig. 4
2.1 initialization of Global header tables and Global trees
During initialization the global header table must contain all data items, and each header table entry holds the transaction-weighted utility (TWU) of the data item in the current batch. The global tree consists of several TN-Tree subtrees. A TN-Tree has three kinds of nodes: the root node, general nodes and tail nodes. The root node is an empty node under which all child nodes are merged. General nodes and tail nodes hold the current data item name, a pointer to the parent node and pointers to the child nodes. A tail node additionally stores all utility values of the current transaction in a two-dimensional array: n arrays are created according to the window size, and each array stores the utility values of the data items in order. Step 2.1 must produce an empty global header table containing all data items and the root of the global tree.
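One possible in-memory layout for the three node kinds is sketched below. The class names and the choice of a dict for child pointers are assumptions; the patent specifies only which information each node kind carries.

```python
class Node:
    """General node: item name, parent pointer and child pointers."""

    def __init__(self, item=None, parent=None):
        self.item = item      # None for the empty root node
        self.parent = parent
        self.children = {}    # item name -> child node
        if parent is not None:
            parent.children[item] = self


class TailNode(Node):
    """Tail node: also stores utility values, one array per batch in the window."""

    def __init__(self, item, parent, window_size):
        super().__init__(item, parent)
        # Two-dimensional array: utilities[b] lists the utilities of the
        # items on the path, in path order, for batch b of the window.
        self.utilities = [[] for _ in range(window_size)]


# Hypothetical transaction {a: 5.0, b: 4.0} seen in batch 0 of a 3-batch window.
root = Node()
a = Node("a", root)
tail = TailNode("b", a, window_size=3)
tail.utilities[0] = [5.0, 4.0]
```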
2.2 building updates of Global header tables and Global trees
The data stream is read in batch by batch according to the window size and batch size and used to fill the global header table; subtrees are built by the TN-Tree rules and merged into the global tree. The header table filled here is the empty table initialized in step 2.1, and the subtrees are merged under the root node generated in step 2.1. Note that transactions with the same prefix share the same tree nodes. After the window slides forward, the global header table and the global tree are updated by removing the oldest batch and adding the newest batch, following the same rules.
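The header table side of this update can be sketched as below, assuming each transaction is a dict from item name to utility and that per-batch TWU contributions are kept so the oldest batch can be subtracted. The class and function names are hypothetical, and the tree update is omitted.

```python
from collections import Counter, deque


def batch_twu(batch):
    """TWU contribution of one batch: every item in a transaction is
    credited with the total utility of that transaction."""
    twu = Counter()
    for transaction in batch:
        transaction_utility = sum(transaction.values())
        for item in transaction:
            twu[item] += transaction_utility
    return twu


class SlidingHeader:
    """Global header table (item -> TWU) over the last win_size batches."""

    def __init__(self, win_size):
        self.win_size = win_size
        self.batches = deque()  # per-batch TWU contributions, oldest first
        self.twu = Counter()

    def push(self, batch):
        if len(self.batches) == self.win_size:
            self.twu.subtract(self.batches.popleft())  # remove the oldest batch
        contribution = batch_twu(batch)
        self.batches.append(contribution)
        self.twu.update(contribution)                  # add the newest batch


# Hypothetical stream of three single-transaction batches, window of two.
header = SlidingHeader(win_size=2)
header.push([{"a": 5, "b": 2}])
header.push([{"b": 3, "c": 1}])
header.push([{"a": 4}])
```

After the third push the first batch has left the window, so only the second and third batches contribute to the stored TWU values.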
2.3 optimization of Global header tables and Global trees
When the cache holds a historical utility value table corresponding to the current mining window, the global header table is rebuilt from the data in that table and optimized mainly according to item_level. The historical utility value table is produced by steps 1.1 and 1.2, and step 1.3 is triggered to update it whenever the sliding window slides forward.
The algorithm moves absolute high-utility data items (item_level = 1) and potential high-utility data items (item_level = 2) to the head of the table so that they are mined first during tree construction, keeps ordinary data items (item_level = 0) in dictionary order, and prunes low-utility data items (item_level = -1). To preserve recall as far as possible, the TWU value of each low-utility data item (item_level = -1) is calculated, and the item is retained if that value is significantly higher than the minimum utility value of the current window. The structure of the global tree is then adjusted according to the optimized global header table: absolute high-utility data items (item_level = 1) and potential high-utility data items (item_level = 2) are moved to the tail nodes so that they are mined first.
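A minimal sketch of the header table optimization follows. The patent does not quantify "significantly higher", so the margin parameter is a hypothetical stand-in; the dictionary ordering inside the priority group and the placement of retained level -1 items among the ordinary items are also assumptions.

```python
def optimize_header(twu, levels, min_util, margin=2.0):
    """Reorder and prune header items using item_level from the history table.

    twu: item -> TWU in the current window
    levels: item -> item_level (missing items are treated as level 0)
    margin: hypothetical factor deciding when a level -1 item is retained
    """
    kept = []
    for item, value in twu.items():
        if levels.get(item, 0) == -1 and value <= min_util * margin:
            continue  # prune low-utility items whose TWU is not clearly above the threshold
        kept.append(item)

    def sort_key(item):
        priority = 0 if levels.get(item, 0) in (1, 2) else 1
        return (priority, item)  # priority group first, then dictionary order

    return sorted(kept, key=sort_key)


# Hypothetical window statistics and history levels.
twu = {"a": 120, "b": 40, "c": 300, "d": 90, "e": 500}
levels = {"a": 0, "b": -1, "c": 2, "d": 1, "e": -1}
order = optimize_header(twu, levels, min_util=100)
print(order)
```

In this example item b is pruned, while item e is retained despite its level of -1 because its TWU exceeds the assumed margin.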
Step 3: high-utility item set mining on the optimized global data structure, as shown in FIG. 5
3.1 preprocessing before mining
Mining first reads in the data of one complete sliding window and optimizes the global tree and the global header table of that window as in step 2.3. A utility cache value is then added to each leaf node, link pointers from the header table to the corresponding tree nodes are established, and the items are mined one by one in header table order.
3.2 the mining process
From the global data structure and the mining order obtained in step 3.1, the TWU value, the utility cache values, the node positions in the global tree and the transaction path information of a given data item can be obtained. Because mining proceeds from the tail node table, the data item being mined corresponds to leaf nodes; once it has been mined, its nodes pass their utility cache values up to their parent nodes.
Once mining starts, a data item whose utility value is greater than or equal to the minimum utility value is a high-utility item set. In addition, whenever the TWU value of the data item is greater than or equal to the minimum utility value, a sub-header table and a subtree are created for it. If the TWU value of the data item is less than the minimum utility value, then by the downward-closure property of TWU no superset of the data item can be a high-utility item set, so mining of that data item ends.
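The two threshold tests in this paragraph can be written as a small decision function. The function name and the example numbers are hypothetical; the logic follows the rule stated above.

```python
def mining_decision(utility, twu, min_util):
    """Return (is_high_utility, expand) for one data item or item set.

    is_high_utility: the exact utility reaches the minimum utility value.
    expand: the TWU reaches the minimum utility value, so a sub-header table
            and subtree are created; otherwise no superset can be high-utility
            and mining of this item stops.
    """
    return utility >= min_util, twu >= min_util


# Hypothetical values with a minimum utility value of 100.
print(mining_decision(utility=120, twu=300, min_util=100))  # reported and expanded
print(mining_decision(utility=60, twu=150, min_util=100))   # not reported, still expanded
print(mining_decision(utility=30, twu=80, min_util=100))    # pruned
```

The second case shows why TWU rather than exact utility gates expansion: an item below the threshold on its own may still have high-utility supersets.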
If a sub-header table and a subtree must be created for the current data item, the utility cache values of the nodes reached through its link pointers are read, which gives all paths in the global tree that contain the current data item. All data items on those paths are read and their TWU values calculated; a data item is added to the sub-header table when its TWU value is greater than or equal to the minimum utility value. The utility value of the current data item is then added to the tail nodes of the subtree as a base utility value, which completes the subtree. Subtrees and sub-header tables are then created and mined recursively, finally yielding all high-utility item sets of the current sliding window.
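The sub-header table construction can be sketched as below. It assumes the paths containing the current data item have already been collected as (prefix items, path utility) pairs, with the path utility used as each prefix item's TWU contribution; this representation and the function name are assumptions, and subtree construction is omitted.

```python
from collections import Counter


def build_sub_header(paths, min_util):
    """Build a sub-header table from the paths that contain the current item.

    paths: list of (prefix_items, path_utility) pairs
    Returns item -> TWU for prefix items whose TWU reaches min_util.
    """
    twu = Counter()
    for prefix_items, path_utility in paths:
        for item in prefix_items:
            twu[item] += path_utility
    return {item: value for item, value in twu.items() if value >= min_util}


# Hypothetical paths for the item currently being mined.
paths = [(["a", "b"], 30), (["a", "c"], 25), (["b"], 10)]
sub_header = build_sub_header(paths, min_util=40)
print(sub_header)
```

Here item c is left out of the sub-header table because its TWU over the collected paths is below the threshold.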
After mining completes, the sliding window slides forward. Step 1.3 is then executed to update the historical utility value table, and steps 2.2 and 2.3 are executed to update, rebuild and optimize the global data structure.
Step 4: construction of the distributed high-utility item set mining system, as shown in FIG. 1
The high-utility item set mining system is divided into three modules. The historical batch data processing module handles the processing of step 1 and runs on worker 1. The data stream processing module handles the processing of steps 2 and 3 and runs on worker 2. The historical utility value table caching module uses a lightweight caching system (Redis) to store the historical utility value table, which is the result of processing the historical batch data and is used to prune the high-utility item set search space in the data stream, thereby improving the efficiency of the high-utility item set mining algorithm.
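The division of work between the three modules can be illustrated with the sketch below. It is not a distributed implementation: an in-memory DictCache class (hypothetical) stands in for Redis, the two workers are ordinary functions, and the stream worker only shows the lookup and pruning step rather than the full mining of steps 2 and 3.

```python
import json


class DictCache:
    """In-memory stand-in for the caching module; stores strings by key."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def history_worker(cache, table_key, item_levels):
    """Worker 1: publish the historical utility value table (levels only here)."""
    cache.set(table_key, json.dumps(item_levels))


def stream_worker(cache, table_key, window_items):
    """Worker 2: prune window items using the table if one is cached."""
    raw = cache.get(table_key)
    if raw is None:
        return sorted(window_items)  # no history yet, nothing to prune
    levels = json.loads(raw)
    return sorted(i for i in window_items if levels.get(i, 0) != -1)


cache = DictCache()
key = "retail:1000"  # hypothetical data set name and threshold
first = stream_worker(cache, key, ["a", "b", "c"])
history_worker(cache, key, {"a": 1, "b": -1})
second = stream_worker(cache, key, ["a", "b", "c"])
```

Because the two workers communicate only through the cache, the stream worker never waits on the history worker; it simply mines without pruning until a table appears.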
Innovation point
The invention optimizes current data stream high-utility item set mining algorithms on the basis of a historical utility value table and builds a distributed high-utility item set mining system. Because damped-window and landmark-window stream processing put heavy pressure on the data warehouse and the message queue, the invention processes the data stream with a sliding window and focuses on real-time processing. On this basis a distributed framework splits the work into historical data processing and data stream processing, so that historical mining data is analyzed stably while the real-time performance of high-utility item set mining on the current data stream is preserved. Data is stored in a lightweight cache, which reduces the storage pressure on the data warehouse, and the search space of data stream mining is pruned using the historical data as a reference, which improves the overall efficiency of the data stream mining algorithm.
The system performs well on data sets such as retail, connect and TLC_trip, improving the time and space efficiency of the mining algorithm while maintaining high recall.

Claims (2)

1. A data flow high utility item set mining system based on historical utility table pruning, characterized by comprising the following implementation processes:
step 1, creating and updating a history utility value table;
step 2, constructing, updating and optimizing a global header table and a global tree;
step 3, performing high-utility item set mining on the optimized global data structure;
step 4, constructing a distributed high-utility item set mining system;
wherein, step 1: creation and updating of historical utility value tables
1.1 initialization of a historical utility value Table
Initializing when no utility value table exists in the current cache, and constructing different historical utility value tables according to different thresholds, wherein the design of each table item is as follows:
item_index: index representing current data item
item_profile: external utility value representing current data item
item_utility: utility value mean representing current data item
item_level: representing the level of the current data item for reference during mining;
initializing according to the data item to obtain a history utility value table;
1.2 creation of a historical utility value Table
After the first sliding window has been mined, item_utility of each data item is updated according to the mining result and the calculated utility value mean; data items marked as low-utility item sets whose utility is clearly lower than the window transaction-weighted utility value have item_level marked as -1, data items in mined high-utility item sets of more than one item have item_level marked as 2, and data items forming high-utility 1-item sets have item_level marked as 1;
after the first initialization and creation are completed, the index of the data item is used as the key, the remaining fields are used as the value and stored in the cache, and the name of the historical utility value table is the name of the transaction item set and the minimum utility threshold;
1.3 updating the historical utility value table
After the window slides forward and the mining process is completed, the two fields item_utility and item_level are updated according to the rules used at creation; after the update is completed, the historical utility value table is written back to the cache, and the table name is still the name of the transaction item set and the minimum utility threshold;
step 2: construction, updating and optimization of global head table and global tree
2.1 initialization of Global header tables and Global trees
The global header table needs to contain all data items in the process of initialization, and each table item in the header table is an estimated value of the utility value of the data item in the current batch, wherein the estimated value is twu value of the data item in the batch; the global Tree consists of a plurality of TN-Tree subtrees, and three types of nodes are respectively root nodes, general nodes and tail nodes in the TN-Tree; the root node is an empty node which is used for merging all child nodes, and the general node and the tail node comprise the current data item name, the pointer of the father node and the pointer of the child node; the tail node is special, and besides the content, all utility values of the current transaction need to be stored, and the tail node is composed of a two-dimensional array; creating n arrays according to the window size, and storing utility values of the data items according to the sequence in each array;
2.2 building updates of Global header tables and Global trees
The data stream is read in sequentially according to the window size and the batch size and fills the global header table, and subtrees are constructed according to the TN-Tree rules and merged into the global tree, wherein transaction items with the same prefix in the tree structure share the same tree node; after the window slides forward, the global header table and the global tree are updated by removing the data of the oldest batch and adding the data of the latest batch, according to the same rules;
2.3 optimization of Global header tables and Global trees
When the cache is provided with a corresponding historical utility value table in the current mining window, the global header table needs to be reconstructed according to the data of the historical utility value table, and the global header table is optimized according to item_level;
step 3, performing high-utility item set mining on the optimized global data structure
The first step of mining is to add a utility_cache to each leaf node in the global tree; as described above, every leaf node stores the utility values on its current path, the data of each batch is enclosed in "{ }", and the data of all batches is stored into the utility_cache; after optimization with the historical utility value table, the header table is constructed from the screened header table, the order of its data items follows the order of the optimized tail node table, and each header table entry stores a link pointer to the position of the corresponding data item in the global tree; mining then proceeds item by item in header table order;
the position of the data item in the global tree is obtained from the link pointer of the data item currently being mined; because the data item is the last item and the global header table and the global tree have been adjusted, the system starts mining from the data items at the tail nodes, so the current data item corresponds to leaf nodes, which necessarily contain a utility_cache; when mining of the item is completed, each child node passes its utility_cache to its parent node, so that the nodes of the next data item to be mined also have a utility_cache; since the last value in each utility_cache is the utility value of the current data item, the last values of all utility_cache units owned by the data item together give the utility value of the data item in the current window; when the utility value is greater than or equal to the minimum utility value, the data item is a high-utility item set; meanwhile, as long as the twu value of the data item is greater than or equal to the minimum utility value, a sub-header table and a subtree are created for the data item; when the twu value of the data item is smaller than the minimum utility value, by the downward-closure property of twu no superset of the data item can be a high-utility item set, and mining of the data item ends;
if the current data item needs a sub-header table and a subtree, the utility_cache values of the nodes corresponding to the link pointers of the current data item are read, whereby all paths in the global tree where the current data item is located are obtained; all data items in the paths are read and their twu values calculated, and a data item is added to the sub-header table when its twu value is greater than or equal to the minimum utility value; the utility value of the current data item is then added to the tail node of the subtree as a base utility value, completing the construction of the subtree; subtrees and sub-header tables are then created and mined recursively, finally obtaining all high-utility item sets of the current sliding window;
step 4, distributed high-utility item set mining system
The whole high-utility item set mining system is divided into three modules: the historical batch data processing module is responsible for the processing in step 1 and runs on worker 1; the data stream processing module is responsible for the processing in step 2 and step 3 and runs on worker 2; the historical utility value table caching module uses a lightweight caching system Redis to store the historical utility value table, which is the processing result of the historical batch data and is used to prune the high-utility item set search space in the data stream, thereby improving the efficiency of the high-utility item set mining algorithm.
2. The system of claim 1, wherein in step 2.3:
the system sorts the absolute high-utility data items with item_level=1 and the potential high-utility data items with item_level=2 to the head of the table, so that they are mined preferentially when the tree is constructed in the later step; the common data items with item_level=0 are still sorted in dictionary order, and the low-utility data items with item_level=-1 are pruned;
to ensure system recall, the twu value is calculated for each low-utility data item with item_level=-1, and the item is retained if this value is significantly higher than the minimum utility value of the current window; meanwhile, the structure of the global tree is adjusted according to the optimized global header table, and the system moves the absolute high-utility data items with item_level=1 and the potential high-utility data items with item_level=2 to the tail nodes so that they are mined preferentially.
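The header table ordering of step 2.3 can be sketched as below. Several details are assumptions made for illustration only: the function name `order_header_table`, the `keep_factor` parameter that stands in for "significantly higher", placing item_level=1 ahead of item_level=2, and treating retained low-utility items as ordinary items.

```python
def order_header_table(items, min_util, keep_factor=1.5):
    """Order data items for the global header table.

    items: dict item -> (item_level, twu_in_current_window).
    keep_factor: hypothetical multiplier; an item with item_level=-1 is
    retained only if its twu exceeds keep_factor * min_util.
    Returns the list of kept items in header-table order.
    """
    kept = {}
    for item, (level, twu) in items.items():
        if level == -1:
            if twu > keep_factor * min_util:
                kept[item] = 0  # assumption: retained items become ordinary
            continue  # otherwise the low-utility item is pruned
        kept[item] = level

    def sort_key(item):
        # Levels 1 and 2 go to the head; everything else follows in
        # dictionary order. The 1-before-2 order is an assumption.
        rank = {1: 0, 2: 1}.get(kept[item], 2)
        return (rank, item)

    return sorted(kept, key=sort_key)


example = {
    "b": (0, 5), "a": (0, 5),
    "x": (2, 50), "y": (1, 80),
    "z": (-1, 3), "w": (-1, 40),
}
ordered = order_header_table(example, min_util=20)
```

In this example `z` is pruned because its twu of 3 is far below the threshold, while `w` is kept for recall because 40 exceeds 1.5 times the minimum utility value of 20.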
CN202110922923.2A 2021-08-12 2021-08-12 Data flow high-utility item set mining system based on historical utility table pruning Active CN113792099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110922923.2A CN113792099B (en) 2021-08-12 2021-08-12 Data flow high-utility item set mining system based on historical utility table pruning

Publications (2)

Publication Number Publication Date
CN113792099A CN113792099A (en) 2021-12-14
CN113792099B true CN113792099B (en) 2023-08-25

Family

ID=78875879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110922923.2A Active CN113792099B (en) 2021-08-12 2021-08-12 Data flow high-utility item set mining system based on historical utility table pruning

Country Status (1)

Country Link
CN (1) CN113792099B (en)

Also Published As

Publication number Publication date
CN113792099A (en) 2021-12-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant