CN116662934A - Early warning target association relation analysis method, system, storage medium and terminal - Google Patents

Info

Publication number
CN116662934A
Authority
CN
China
Prior art keywords
algorithm
discretized
data set
identifying
association
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211568128.9A
Other languages
Chinese (zh)
Inventor
李宏权
梁复台
孙合敏
方昆
汤景棉
李灵芝
陈旸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Air Force Early Warning Academy
Original Assignee
Air Force Early Warning Academy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Air Force Early Warning Academy
Priority to CN202211568128.9A
Publication of CN116662934A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The method, the system, the storage medium and the terminal for analyzing the association relation of the early warning targets are applied to the technical field of information, and can acquire continuous attribute information in a transaction data set which is created in advance and contains preset targets; discretizing the attribute information to obtain a discretized transaction data set; and analyzing the discretized transaction data set through at least one of an Apriori algorithm, an FP-growth algorithm, a GSP algorithm and a PrefixSpan algorithm to obtain an association relation aiming at the preset target. According to the scheme provided by the embodiment of the invention, the association relation of the early warning targets can be automatically acquired without manual analysis, so that the analysis speed of the association relation of the early warning targets is improved.

Description

Early warning target association relation analysis method, system, storage medium and terminal
Technical Field
The invention relates to the technical field of information, in particular to a method, a system, a storage medium and a terminal for analyzing early warning target association relation.
Background
The early warning target association relationship is the relationship of causality, restriction and the like among all target activity events, and is an important embodiment of the early warning target activity rule. At present, the extraction of the association relation of the early warning targets is generally carried out manually.
However, the manual method has the problems of low processing speed and low accuracy in mining and analyzing the historical data of the early warning target activity, and is more difficult to perform especially when facing a large amount of data.
Disclosure of Invention
The embodiment of the invention aims to provide a method, a system, a storage medium and a terminal for analyzing early warning target association relations, which are used for solving the problem that the analysis speed of the early warning target association relations is low in a manual mode.
The specific technical scheme is as follows:
in a first aspect of the embodiment of the present invention, there is provided a method for analyzing a correlation between early warning targets, the method including:
acquiring continuous attribute information in a pre-established transaction dataset containing a preset target;
discretizing the attribute information to obtain a discretized transaction data set;
and analyzing the discretized transaction data set through at least one of an Apriori algorithm, an FP-growth algorithm, a GSP algorithm and a PrefixSpan algorithm to obtain an association relation aiming at the preset target.
In a possible implementation manner, the analyzing the discretized transaction data set by at least one of Apriori algorithm, FP-growth algorithm, GSP algorithm, and PrefixSpan algorithm to obtain the association relationship for the preset target includes:
identifying, by the Apriori algorithm, a first set of frequent items in the discretized transaction dataset;
and identifying a strong association relation aiming at the preset target according to the first frequent item set.
In a possible implementation manner, the analyzing the discretized transaction data set by at least one of Apriori algorithm, FP-growth algorithm, GSP algorithm, and PrefixSpan algorithm to obtain the association relationship for the preset target includes:
identifying a target data set with the support degree of the discretized transaction data set being smaller than a preset threshold value through the FP-growth algorithm;
identifying a second set of frequent items in the target dataset;
and identifying the strong association relation of the preset target according to the second frequent item set.
In a possible implementation manner, the analyzing the discretized transaction data set by at least one of Apriori algorithm, FP-growth algorithm, GSP algorithm, and PrefixSpan algorithm to obtain the association relationship for the preset target includes:
identifying the support degree of each data item in the discretized transaction data set through the PrefixSpan algorithm;
and identifying the strong association relation of the preset target according to the identified support.
In a second aspect of the embodiment of the present invention, there is provided an early warning target association relationship analysis apparatus, including:
the information acquisition module is used for acquiring continuous attribute information in a transaction data set which is created in advance and contains a preset target;
the discretization module is used for discretizing the attribute information to obtain a discretized transaction data set;
the association relation acquisition module is used for analyzing the discretized transaction data set through at least one of an Apriori algorithm, an FP-growth algorithm, a GSP algorithm and a PrefixSpan algorithm to obtain the association relation aiming at the preset target.
In a possible implementation manner, the association relationship obtaining module is specifically configured to:
identifying, by the Apriori algorithm, a first set of frequent items in the discretized transaction dataset;
and identifying a strong association relation aiming at the preset target according to the first frequent item set.
In a possible implementation manner, the association relationship obtaining module is specifically configured to:
identifying a target data set with the support degree of the discretized transaction data set being smaller than a preset threshold value through the FP-growth algorithm;
identifying a second set of frequent items in the target dataset;
and identifying the strong association relation of the preset target according to the second frequent item set.
In a possible implementation manner, the association relationship obtaining module is specifically configured to:
identifying the support degree of each data item in the discretized transaction data set through the PrefixSpan algorithm;
and identifying the strong association relation of the preset target according to the identified support.
The embodiment of the invention also provides electronic equipment, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor and the communication interface, and the memory are communicated with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any one of the early warning target association relation analysis methods when executing the program stored in the memory.
The embodiment of the invention also provides a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and the computer program realizes any one of the early warning target association relation analysis methods when being executed by a processor.
The embodiment of the invention also provides a computer program product containing instructions, which when run on a computer, cause the computer to execute any one of the early warning target association relation analysis methods.
The embodiment of the invention has the beneficial effects that:
the method, the system, the storage medium and the terminal for analyzing the association relation of the early warning targets can acquire continuous attribute information in a transaction data set which is created in advance and contains preset targets; discretizing the attribute information to obtain a discretized transaction data set; and analyzing the discretized transaction data set through at least one of an Apriori algorithm, an FP-growth algorithm, a GSP algorithm and a PrefixSpan algorithm to obtain an association relation aiming at the preset target. According to the scheme provided by the embodiment of the invention, the association relation of the early warning targets can be automatically acquired without manual analysis, so that the analysis speed of the association relation of the early warning targets is improved.
Of course, it is not necessary for any one product or method of practicing the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other embodiments may be obtained according to these drawings to those skilled in the art.
FIG. 1 is a schematic flow chart of an early warning target association analysis method provided by an embodiment of the invention;
FIG. 2 is an exemplary diagram of an early warning target association analysis method according to an embodiment of the present invention;
FIG. 3 is another exemplary diagram of an early warning target association analysis method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an early warning target association relationship analysis device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by the person skilled in the art based on the present invention are included in the scope of protection of the present invention.
In a first aspect of the embodiment of the present invention, a method for analyzing a correlation between early warning targets is provided, referring to fig. 1, where the method includes:
step S11, acquiring continuous attribute information in a transaction dataset which is created in advance and contains a preset target;
step S12, discretizing the attribute information to obtain a discretized transaction data set;
step S13, analyzing the discretized transaction data set through at least one of an Apriori algorithm, an FP-growth algorithm, a GSP algorithm and a PrefixSpan algorithm to obtain an association relation aiming at the preset target.
The early warning target association relation in the embodiment of the invention mainly describes the relations between targets and event factors such as social activities, military operations and the weather environment. The aim is to find the causal or associative relations hidden between targets and these event factors, as well as the association patterns among targets. Such relations are often hidden in a large amount of data, are difficult to judge directly by hand, and need to be found through data mining. The target activity patterns that these relations and rules represent often reflect certain regularities of target activity and are of great significance for air defense early warning and air defense operations. Early warning target activities are highly associated with other types of targets, social activities, meteorological factors and other things and factors, for example special meteorological phenomena and the seasonal migration of large birds. Mining the factors associated with early warning targets, particularly non-cooperative targets, makes it possible to better predict their activity rules and movement trends, and thereby to assist commanders in decision-making and to guide air defense. The early warning target association relation can be divided into an association co-occurrence pattern and an association sequence pattern. The former mainly represents the association relation between early warning targets at the same time, and the latter mainly represents the association relation shown by the time series data of early warning targets.
Acquiring the continuous attribute information in the pre-created transaction data set containing the preset target may include data cleaning and attribute mapping.
1. Data cleansing
Most association rule mining algorithms handle only Boolean data, so a transaction data set is typically represented by a 0-1 binary matrix. The data set should also contain some additional transaction information, such as the transaction ID and timestamp, which helps in analyzing the data set. When constructing the 0-1 binary matrix of the transaction data set, abnormal data, abnormal attributes and blank values are encountered, and data cleaning is needed. Abnormal data in the data set are often found and removed by relying on the experience of domain experts, aided by visualization and similar means, for example by visually exploring the matrix structure for anomalies.
Fig. 2 is a visualization of a binary matrix, in which black dots represent the 1 values in the matrix. The figure shows that the distribution of items in the data set is irregular. Rows that are noticeably too dense indicate transactions containing too many items, which may be problematic; such problems may arise in the data collection stage and need further examination. These overlong transactions are rejected if necessary. To identify and reject abnormal values of a continuous attribute in the data set, a common statistical identification method, the Grubbs criterion, is adopted. First, the standard deviation s and the residuals δ are obtained from all measured values of the continuous attribute; a measured value satisfying |δ|/s > g(n) is an abnormal value, where g(n) is the critical coefficient. After the abnormal values are deleted, the calculation is repeated in the same way until no abnormal value remains. Blank values of an attribute in the data set are filled according to the two types of item value, category attributes and quantitative attributes. For a category attribute, a value chosen uniformly according to the specific scenario is filled in, and a blank value of a binary category attribute is usually set to 0. A continuous quantitative attribute cannot simply be set to 0 or 1; an appropriate value should instead be filled in according to the variation pattern of the data, and classical polynomial interpolation is used here.
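The iterative screening by the Grubbs criterion described above can be sketched as follows. This is only an illustrative sketch: the function name `grubbs_filter` and the callable `g` that supplies the critical coefficient g(n) are assumptions, and the constant threshold used in the demonstration is a placeholder rather than a value taken from a Grubbs table.

```python
import statistics

def grubbs_filter(values, g):
    """Repeatedly reject the most extreme value while |delta| / s > g(n).

    `g` is a caller-supplied function returning the critical coefficient
    for a sample of size n (normally looked up in a Grubbs table).
    """
    kept = list(values)
    removed = []
    while len(kept) >= 3:
        mean = statistics.mean(kept)
        s = statistics.stdev(kept)  # sample standard deviation
        if s == 0:
            break
        # The value with the largest absolute residual is the only candidate.
        worst = max(kept, key=lambda v: abs(v - mean))
        if abs(worst - mean) / s > g(len(kept)):
            kept.remove(worst)
            removed.append(worst)
        else:
            break
    return kept, removed

# Hypothetical measurements; the constant 2.0 stands in for a table lookup.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 25.0]
kept, removed = grubbs_filter(readings, g=lambda n: 2.0)
print(removed)
```

Rejecting one value per pass and recomputing s mirrors the repeated calculation in the text; in practice `g` would return the tabulated critical coefficient for the chosen significance level.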
2. Attribute mapping
In association rule mining, a continuous attribute cannot be characterized by a simple yes or no. Take the daily rainfall at an airport as an example: across different transactions, the rainfall item takes continuous values. Association rule algorithms such as Apriori and FP-growth require data in binary form, so continuous attributes in the original data need to be mapped into category attributes. After the attribute mapping is completed, each category attribute with k intervals is represented by k binary pseudo attributes, which is equivalent to adding k binary items to the data set, as shown in Table 1. This completes the formal construction of the binary matrix of the transaction data.
TABLE 1 transaction data binary matrix
When discretizing the attribute information to obtain a discretized transaction data set, attribute mapping requires mapping each continuous attribute into binary data, which is a process of discretizing the continuous attribute. The basic idea of discretizing a continuous attribute is to divide its value range into several intervals according to a certain strategy, thereby mapping the continuous variable to a discrete variable. Based on this idea, the method improves an entropy-based algorithm: after the number of intervals is specified, the cumulative entropy loss rate of each step is calculated, that is, the difference between the original entropy and the current entropy divided by the number of steps, and the partition corresponding to the minimum cumulative loss rate is taken as the optimal mapping interval, so that the algorithm can select the optimal discrete intervals automatically. The specific steps are as follows:
Step 1: divide the continuous attribute into initial intervals and calculate the entropy:
$$H(m_0) = -\sum_{i=1}^{m_0} p_i \log p_i$$
where $m_0$ is the initial number of intervals, $i$ is the interval index, and $p_i$ is the probability that an attribute value falls within interval $i$, taken here as the frequency.
Step 2: calculate the entropy after merging each pair of adjacent intervals, select the merge with the minimum entropy loss as the new interval, and calculate the entropy at this point.
Step 3: calculate the cumulative entropy loss rate:
$$R_n = \frac{H(m_0) - H(m_n)}{n}$$
where $m_n$ is the specified number of intervals, $H(m_n)$ is the corresponding entropy, and $n$ is the number of merging steps performed.
Step 4: repeat Step 2 and Step 3, and take the partition with the minimum cumulative entropy loss rate as the optimal mapping intervals of the continuous attribute.
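The four steps above can be sketched in Python as follows. This is an illustrative reading rather than the exact procedure of the embodiment: equal-width initial intervals, base-2 logarithms, and the parameters `m0`, `max_bins` and `min_bins` (only partitions with at most `max_bins` intervals are considered as candidates) are all assumptions introduced for the sketch.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of interval frequencies; empty intervals add 0."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def merge_pair(counts, i):
    """Return the counts after merging interval i with interval i + 1."""
    return counts[:i] + [counts[i] + counts[i + 1]] + counts[i + 2:]

def entropy_discretize(values, m0=10, max_bins=5, min_bins=2):
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("attribute is constant; nothing to discretize")
    # Step 1: equal-width initial intervals and the initial entropy H(m0).
    width = (hi - lo) / m0
    edges = [lo + k * width for k in range(m0)] + [hi]
    counts = [0] * m0
    for v in values:
        counts[min(int((v - lo) / width), m0 - 1)] += 1
    h0 = entropy(counts)
    best_edges, best_rate, step = list(edges), math.inf, 0
    while len(counts) > min_bins:
        # Step 2: merge the adjacent pair that loses the least entropy.
        i = max(range(len(counts) - 1), key=lambda j: entropy(merge_pair(counts, j)))
        counts = merge_pair(counts, i)
        del edges[i + 1]
        step += 1
        # Step 3: cumulative entropy loss rate after `step` merges.
        rate = (h0 - entropy(counts)) / step
        # Step 4: remember the candidate partition with the lowest rate.
        if len(counts) <= max_bins and rate < best_rate:
            best_rate, best_edges = rate, list(edges)
    return best_edges

# Hypothetical rainfall readings in millimetres.
rain = [5, 7, 9, 12, 20, 25, 30, 31, 33, 40, 45, 50, 52, 60, 61]
print(entropy_discretize(rain, m0=8, max_bins=4))
```

The returned list holds the interval boundaries, so a list of length k + 1 describes k intervals.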
In one example, referring to FIG. 3, the adaptive continuous attribute discretization algorithm yields three intervals, less than 28 mm, 29 to 46 mm and greater than 47 mm, which are labeled light, medium and strong. After the continuous attribute discretization is completed, each category attribute with k intervals is represented by k binary pseudo attributes, which is equivalent to adding k binary items to the data set. This completes the formal construction of the binary matrix of the transaction data.
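As a small illustration of how one continuous value becomes k binary pseudo attributes, the sketch below maps a rainfall reading to three 0-1 items. The function name, the item names and the choice of 28 and 46 as cut points (values up to 28 count as light, values up to 46 as medium) are assumptions made for the example.

```python
def to_pseudo_attributes(rain_mm, cuts=(28, 46), labels=("light", "medium", "strong")):
    """Map a continuous rainfall value to k binary pseudo attributes."""
    # Index of the interval: number of cut points the value exceeds.
    idx = sum(rain_mm > c for c in cuts)
    return {f"rain={name}": int(k == idx) for k, name in enumerate(labels)}

print(to_pseudo_attributes(35))
```

Exactly one of the k pseudo attributes is 1 for each transaction, so the resulting columns can be appended directly to the 0-1 transaction matrix.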
In the embodiment of the invention, association rule mining is one kind of data mining and aims to find the relations between data items in a data set. From the two aspects of mining the association co-occurrence pattern and the association sequence pattern, the transaction data set is mined based on the Apriori algorithm, the FP-growth algorithm, the GSP algorithm and the PrefixSpan algorithm, and frequent item sets and strong association rules are obtained through analysis.
In a possible implementation manner, the analyzing the discretized transaction data set by at least one of Apriori algorithm, FP-growth algorithm, GSP algorithm, and PrefixSpan algorithm to obtain the association relationship for the preset target includes: identifying, by the Apriori algorithm, a first set of frequent items in the discretized transaction dataset; and identifying a strong association relation aiming at the preset target according to the first frequent item set.
In one example, the Apriori algorithm is a commonly used algorithm for mining strong association rules. Its core is a recursive algorithm based on the two-stage frequent item set idea. The algorithm first finds all frequent item sets and then generates strong association rules from the frequent item sets.
The first step is to find all frequent item sets. All transactions are scanned to generate the candidate 1-item set C1. The items that satisfy the minimum support threshold are selected from C1 to obtain the frequent 1-item set L1. The set generated by joining L1 with itself is pruned to generate the candidate 2-item set C2; all transactions are scanned, and the items that satisfy the minimum support threshold are screened from C2 to obtain the frequent 2-item set L2. The set generated by joining L2 with itself is pruned to generate the candidate 3-item set C3; all transactions are scanned, and the items that satisfy the minimum support threshold are screened from C3 to obtain the frequent 3-item set L3. Similarly, the frequent k-item set Lk is obtained by iterating on L(k-1). Frequent item sets are searched for in the transactions listed in Table 1 with the minimum support set to 0.3, and four frequent item sets with a support of 0.5 are finally obtained.
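The level-wise search of the first step can be sketched as follows. The transactions are hypothetical toy data (Table 1 is not reproduced here), and the function name `apriori` is chosen only for this illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent item set: support} using level-wise candidate generation."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset):
        return sum(itemset <= t for t in tx) / n

    # C1 -> L1: single items that meet the minimum support.
    candidates = {frozenset([i]) for t in tx for i in t}
    level = {c: support(c) for c in candidates}
    level = {c: s for c, s in level.items() if s >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Self-join L(k-1) to build candidates of size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Scan the transactions and keep candidates meeting the threshold.
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical transactions over items a, b, c.
toy = [{"a", "b", "c"}, {"a", "c"}, {"b"}, {"a", "b", "c"}]
for itemset, sup in sorted(apriori(toy, 0.5).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Each pass over the `while` loop corresponds to one Ck to Lk iteration in the text, and every candidate requires a scan of all transactions, which is the cost the FP-growth algorithm later avoids.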
The second step is to generate strong association rules from the frequent item sets. For each frequent item set Ln, all of its non-empty subsets (which are themselves frequent item sets) are generated. For each non-empty subset P of Ln, if:
$$\frac{\mathrm{support}(L_n)}{\mathrm{support}(P)} \ge \mathrm{min\_conf}$$
then the strong association rule $P \Rightarrow (L_n - P)$ is output, where min_conf is the minimum confidence threshold that has been set.
For the frequent item sets obtained by the iteration in the first step, the minimum confidence threshold is set to 0.8, and four strong association rules are obtained.
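The rule generation of the second step can be sketched as follows. The support values are hypothetical and serve only to make the example self-contained; the function name `strong_rules` is likewise an assumption.

```python
from itertools import combinations

def strong_rules(frequent, min_conf):
    """Generate rules lhs => rhs with confidence support(L) / support(lhs) >= min_conf."""
    rules = []
    for itemset, sup in frequent.items():
        # Every non-empty proper subset of a frequent item set is a possible left-hand side.
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

fs = frozenset
# Hypothetical frequent item sets with their supports.
frequent = {
    fs("a"): 0.75, fs("b"): 0.75, fs("c"): 0.75,
    fs("ab"): 0.5, fs("ac"): 0.75, fs("bc"): 0.5,
    fs("abc"): 0.5,
}
for lhs, rhs, conf in strong_rules(frequent, 0.8):
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
```

The lookup `frequent[lhs]` is safe because every subset of a frequent item set is itself frequent, so its support has already been recorded in the first step.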
In a possible implementation manner, the analyzing the discretized transaction data set by at least one of Apriori algorithm, FP-growth algorithm, GSP algorithm, and PrefixSpan algorithm to obtain the association relationship for the preset target includes: identifying a target data set with the support degree of the discretized transaction data set being smaller than a preset threshold value through the FP-growth algorithm; identifying a second set of frequent items in the target dataset; and identifying the strong association relation of the preset target according to the second frequent item set.
To address the defect that the Apriori algorithm generates a large number of candidate item sets while running, which slows it down, Han et al. proposed FP-growth, an improved algorithm based on the FP-tree, which greatly improves the efficiency of association rule mining. FP-growth mines frequent item sets without generating candidate sets. It adopts a divide-and-conquer idea: after two scans, the frequent items in the database are compressed into a frequent pattern tree (FP-tree) that still retains the association information among them, and the FP-tree is then divided into several conditional subtrees. Each subtree is associated with a frequent item set of length 1, and the conditional subtrees are then mined separately. Specifically, the algorithm steps are as follows. The first step is to scan the transaction data set D to find the frequent 1-item sets, filter out the items whose support is below the predefined minimum support threshold, and sort the remaining items in descending order of support to obtain the list L. The second step is to perform the second scan, create the header table and build the FP-tree: starting from a root node labeled "null", the frequent items of each transaction in D are selected, ordered according to L, and inserted into the tree. The third step is to mine the frequent item sets: starting from each frequent pattern of length 1, its conditional pattern base is constructed, then its conditional FP-tree is constructed, and the tree is mined recursively, with pattern growth achieved by concatenating the suffix pattern with the frequent patterns generated by the conditional FP-tree. The FP-growth method converts the problem of finding long patterns into recursively finding shorter patterns and then concatenating suffixes. It uses the least frequent items as suffixes, which greatly reduces the search cost.
In one example, frequent item sets are searched for in the transactions listed in Table 1 with the minimum support set to 0.3; four frequent item sets are likewise obtained, also with a support of 0.5.
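The three FP-growth steps can be sketched compactly as follows. This is a minimal illustration under stated assumptions: the class `Node`, the functions `build_tree` and `fp_growth`, the use of absolute support counts, and the toy transactions are all introduced for the example, and each transaction is assumed to contain no repeated items.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(weighted_tx, min_count):
    """Two scans: count items, then insert each transaction's frequent items."""
    counts = defaultdict(int)
    for items, w in weighted_tx:
        for i in items:
            counts[i] += w
    counts = {i: c for i, c in counts.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, w in weighted_tx:
        # Order frequent items by descending support (ties broken by name).
        ordered = sorted((i for i in items if i in counts), key=lambda i: (-counts[i], i))
        node = root
        for i in ordered:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += w
    return header, counts

def fp_growth(weighted_tx, min_count, suffix=frozenset()):
    """Return {frequent item set: support count} by recursive pattern growth."""
    header, counts = build_tree(weighted_tx, min_count)
    result = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        result[pattern] = counts[item]
        # Conditional pattern base: the prefix path of every node holding `item`.
        base = []
        for node in nodes:
            path, p = [], node.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        # Mine the conditional FP-tree built from that base.
        result.update(fp_growth(base, min_count, pattern))
    return result

# Hypothetical transactions, each with weight 1.
toy = [{"a", "b", "c"}, {"a", "c"}, {"b"}, {"a", "b", "c"}]
patterns = fp_growth([(t, 1) for t in toy], min_count=2)
print(len(patterns))
```

Only two passes over the original transactions are made; all later work happens on the conditional pattern bases, which are much smaller than the full data set.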
In a possible implementation manner, the analyzing the discretized transaction data set by at least one of Apriori algorithm, FP-growth algorithm, GSP algorithm, and PrefixSpan algorithm to obtain the association relationship for the preset target includes: identifying the support degree of each data item in the discretized transaction data set through the PrefixSpan algorithm; and identifying the strong association relation of the preset target according to the identified support.
The prefix-projected pattern mining algorithm (Prefix-Projected Pattern Growth, PrefixSpan) is a sequential pattern mining algorithm based on "pattern growth". It does not need to generate candidates; instead it recursively creates "projection data sets", which shrink the search space into smaller partitions. In the PrefixSpan algorithm, a "prefix" is simply a subsequence formed by the leading portion of a sequence, and the "prefix projection" is the corresponding suffix; a prefix and its suffix together form the sequence. The collection of all suffixes corresponding to the same prefix is called the "projection data set" of that prefix. The goal of the PrefixSpan algorithm is to mine the frequent sequences that meet the minimum support. Like the Apriori algorithm, it starts mining sequence patterns from prefixes of length 1: it searches the corresponding prefix projection data set to obtain the frequent sequences for each prefix of length 1, then recursively mines the frequent sequences for prefixes of length 2, and so on, until the prefix projection data set of a given prefix is empty. Optionally, the algorithm steps are as follows:
The first step: scan the whole data set, count the occurrences of each item in the records (item sets or transactions) to obtain the frequent items, and delete the items whose support is below the minimum support from the data set to obtain a new data set.
The second step: for the new data set, take each frequent item obtained in the first step in turn as a prefix and compute its prefix projection, obtaining the prefix projection data sets of all frequent items.
The third step: scan each frequent item prefix projection data set, and recursively obtain the frequent items of each branch and their prefix projection data sets.
The fourth step: if a projection data set is empty, the recursion on that branch ends; return from the branch and count its frequent sequences.
The fifth step: when the projection data sets of all branches are empty, output all frequent sequences that meet the support requirement.
In one example, frequent item sets are searched for in the transactions listed in Table 1 with the minimum support count set to 2; the frequent 4-item sets obtained likewise have a support count of 2.
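The recursion described in these steps can be sketched as follows. The sketch makes a simplifying assumption that each element of a sequence is a single item rather than an item set; the function name `prefixspan` and the toy sequences are hypothetical.

```python
def prefixspan(sequences, min_count, prefix=()):
    """Return {frequent sequence (tuple): support count} by prefix projection."""
    results = {}
    # Count each item at most once per sequence of the projected data set.
    counts = {}
    for seq in sequences:
        for item in set(seq):
            counts[item] = counts.get(item, 0) + 1
    for item, c in sorted(counts.items()):
        if c < min_count:
            continue
        new_prefix = prefix + (item,)
        results[new_prefix] = c
        # Projection: the suffix after the first occurrence of `item`.
        projected = [seq[seq.index(item) + 1:] for seq in sequences if item in seq]
        projected = [s for s in projected if s]  # an empty suffix ends the branch
        results.update(prefixspan(projected, min_count, new_prefix))
    return results

# Hypothetical time-ordered event sequences.
toy = [("a", "b", "c"), ("a", "c"), ("b", "a", "c"), ("a", "b")]
for pattern, count in sorted(prefixspan(toy, min_count=2).items()):
    print(pattern, count)
```

Because the order of items matters, ("a", "b") and ("b", "a") are counted separately, which is why the frequent sequences differ from the frequent item sets of the co-occurrence pattern.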
The inventors found that:
1) The Apriori algorithm mines frequent patterns on a simple principle and has many variants, but it scans the database multiple times while running and generates a large number of candidate item sets, which slows it down. The FP-growth algorithm is significantly faster than the Apriori algorithm because it generates no candidate sets, performs no candidate tests, and uses a compact data structure that avoids repeated database scans. Studies have shown that the FP-growth algorithm is an order of magnitude faster than the Apriori algorithm.
2) The FP-growth algorithm and the Apriori algorithm generate a set of frequent items based on depth-first and breadth-first searches, respectively. The FP-growth algorithm can only generate frequent item sets, while the Apriori algorithm can not only obtain frequent item sets, but also obtain strong association rules, which has more practical significance in association analysis.
3) The GSP algorithm is very useful in association sequence pattern mining, but its drawbacks are also significant. The algorithm scans the data set multiple times during execution, so its time complexity is high; like the Apriori algorithm, it generates a very large candidate list, so its space complexity is high, it occupies a large amount of resources, and it is not suitable for mining long sequence patterns.
4) The PrefixSpan algorithm does not need to generate candidate sequences, its projection data sets shrink very quickly, its memory consumption is relatively stable, and it mines frequent sequence patterns very efficiently. Compared with sequence mining algorithms that generate candidate sequences, such as the GSP algorithm, the PrefixSpan algorithm has great advantages, so it is more suitable for information analysis applications. The largest run-time cost of PrefixSpan is the recursive construction of the projection databases. If the sequence data set is large and the number of items is large, the running speed of the algorithm drops noticeably.
In order to illustrate the method according to the embodiment of the present invention, the following description is given with reference to specific embodiments:
and obtaining association rules meeting the conditions of minimum support and minimum confidence through experimental data mining analysis to form an association rule set.
1. Experimental environmental data sources: the experimental platform is a 32-bit Windows7 system, CPU3.6GHz, 8GB memory and programming by using Python language. The experiment is properly reconstructed according to the real data to obtain 1000 transaction data. Taking mining association rules as an example, mining analysis is performed. Meanwhile, to verify the performance of the associated co-occurrence pattern algorithm, a public dataset kosarak is employed. The data set contains a total of 99 ten thousand sample transactions, each of which records a news story page that has been viewed by an internet user, wherein the news stories are encoded into an index value.
2. Co-occurrence pattern association analysis effect: generally, association rule mining is demand-driven. For an association rule LHS ⇒ RHS, the LHS and RHS may each be an item set containing multiple items; once the requirement is determined, the RHS is often limited to a single item, while the number of items contained in the LHS is not fixed. Since users usually care most about rules with high confidence, outputting the high-confidence rules makes it easier for users to read the association rules and analyze activity patterns. The Apriori algorithm is called to mine association rules with a minimum support of 0.3 and a minimum confidence of 0.8, and the results are screened to obtain an association rule set. After the rule set is sorted by confidence, the three rules with the highest confidence are examined.
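The mining step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the experimental code: the function names `apriori` and `rules`, the item labels such as `type=A`, and the ten toy transactions are all hypothetical; only the thresholds (minimum support 0.3, minimum confidence 0.8), the single-item RHS, and the sorting by confidence come from the text.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for every item set with support >= min_support."""
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in tsets for i in t}  # 1-item candidates
    frequent, k = {}, 1
    while current:
        # Count the support of each candidate by scanning the data set.
        level = {}
        for c in current:
            s = sum(1 for t in tsets if c <= t) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join frequent k-item sets into (k+1)-item candidates and prune
        # any candidate that has an infrequent k-item subset.
        k += 1
        current = set()
        for a, b in combinations(list(level), 2):
            u = a | b
            if len(u) == k and all(frozenset(s) in level for s in combinations(u, k - 1)):
                current.add(u)
    return frequent

def rules(frequent, min_confidence):
    """Rules LHS => RHS with a single-item RHS, sorted by descending confidence."""
    out = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for rhs in itemset:
            lhs = itemset - {rhs}
            conf = supp / frequent[lhs]  # every subset of a frequent set is frequent
            if conf >= min_confidence:
                out.append((sorted(lhs), rhs, supp, conf))
    out.sort(key=lambda r: (-r[3], r[0], r[1]))
    return out

# Hypothetical discretized transactions, for illustration only.
transactions = [
    ["type=A", "alt=high", "speed=fast"],
    ["type=A", "alt=high", "speed=fast"],
    ["type=A", "alt=high"],
    ["type=B", "alt=low", "speed=slow"],
    ["type=A", "alt=high", "speed=fast"],
    ["type=B", "alt=low"],
    ["type=A", "speed=fast"],
    ["type=B", "alt=high", "speed=slow"],
    ["type=A", "alt=high", "speed=fast"],
    ["type=B", "alt=low", "speed=slow"],
]

frequent = apriori(transactions, min_support=0.3)
ranked = rules(frequent, min_confidence=0.8)
for lhs, rhs, supp, conf in ranked[:3]:  # the three highest-confidence rules
    print(lhs, "=>", rhs, f"support={supp:.2f}", f"confidence={conf:.2f}")
```

The repeated pass over `tsets` at each level is the multiple-scan cost that the text attributes to Apriori-type algorithms.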
The FP-growth algorithm is then called to mine association rules, searching for frequent item sets in the transactions listed in Table 2 to obtain the frequent items.
It can be seen that the resulting frequent item sets are consistent with those of the Apriori algorithm. The Apriori algorithm, however, can additionally derive strong association rules from the frequent item sets. To verify the mining efficiency of the algorithms, the public data set kosarak is mined with both the Apriori algorithm and the FP-growth algorithm with the support threshold set to 0.1, and the same frequent item sets are obtained. FP-growth takes 7 seconds, whereas the Apriori algorithm takes more than 10 minutes, so the FP-growth algorithm has a great speed advantage in frequent item mining.
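For comparison, a compact FP-growth can be sketched as below. It is an illustrative sketch only: the names `Node`, `build_tree`, and `fp_growth`, the absolute `min_count` threshold, and the six toy transactions are assumptions, and the block does not attempt to reproduce the kosarak timings reported above. It shows why no candidate list is needed: each frequent item's conditional pattern base is read from the prefix tree and mined recursively.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted, min_count):
    """Build an FP-tree from (items, count) pairs; return (header table, item counts)."""
    counts = defaultdict(int)
    for items, c in weighted:
        for i in set(items):
            counts[i] += c
    keep = {i: c for i, c in counts.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, c in weighted:
        # Insert the frequent items of each transaction in descending count order.
        ordered = sorted((i for i in set(items) if i in keep), key=lambda i: (-keep[i], i))
        node = root
        for i in ordered:
            child = node.children.get(i)
            if child is None:
                child = node.children[i] = Node(i, node)
                header[i].append(child)
            child.count += c
            node = child
    return header, keep

def fp_growth(weighted, min_count, suffix=frozenset()):
    """Return {frozenset: support count} for all frequent item sets."""
    header, keep = build_tree(weighted, min_count)
    result = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        result[pattern] = keep[item]
        # Conditional pattern base: the prefix path of every node holding `item`.
        base = []
        for n in nodes:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, n.count))
        result.update(fp_growth(base, min_count, pattern))
    return result

# Hypothetical transactions, for illustration only.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"], ["a", "d"]]
frequent = fp_growth([(t, 1) for t in transactions], min_count=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The original data set is read only while the top-level tree is built; all further work is done on the much smaller conditional trees.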
3. Sequence pattern association analysis effect: with the minimum support count set to 2, the GSP algorithm is called to mine association rules and search for frequent item sets in the previously listed sequence data set. Because sequence patterns involve the concept of timestamps, the resulting frequent item sets differ from those of the co-occurrence pattern, taking the frequent 3-item sets as an example.
It can be seen that, compared with the results of the Apriori algorithm, the resulting frequent item sets contain antecedents and consequents that are ordered in time, as well as items that occur simultaneously. With the minimum support count set to 2, the PrefixSpan algorithm is called to mine association rules and search for frequent item sets in the transactions listed in Table 5, and the same frequent items are obtained.
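The prefix-projection idea behind PrefixSpan can be sketched as follows. This is a simplified illustration, not the experimental code: it assumes every sequence element is a single item, so the simultaneous items mentioned above are not handled, and the function name `prefixspan` and the four toy sequences are hypothetical. The minimum support count of 2 follows the text.

```python
def prefixspan(sequences, min_count):
    """Return {pattern tuple: support count} for all frequent subsequences."""
    result = {}

    def mine(prefix, projected):
        # `projected` holds (sequence index, start offset) pairs, i.e. the
        # suffixes that remain after matching `prefix` in each sequence.
        counts = {}
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] = counts.get(item, 0) + 1
        for item, c in sorted(counts.items()):
            if c < min_count:
                continue
            pattern = prefix + (item,)
            result[pattern] = c
            # Project: keep what follows the first occurrence of `item`.
            nxt = []
            for sid, start in projected:
                seq = sequences[sid]
                try:
                    pos = seq.index(item, start)
                except ValueError:
                    continue
                nxt.append((sid, pos + 1))
            mine(pattern, nxt)

    mine((), [(i, 0) for i in range(len(sequences))])
    return result

# Hypothetical time-ordered sequences, for illustration only.
sequences = [["a", "b", "c"], ["a", "c", "b"], ["a", "b"], ["b", "c"]]
patterns = prefixspan(sequences, min_count=2)
for pattern, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(pattern, count)
```

Storing offsets instead of copying suffixes keeps the projected databases small, while the recursive projection remains the dominant runtime cost, in line with the remark on PrefixSpan above.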
4. Analysis of the application range of the algorithms: typically, data sets can be divided into dense data sets and sparse data sets. Dense data sets contain a large number of long frequent patterns with high support, and many of their events are similar, as in DNA analysis or stock sequence analysis. Sparse data sets consist mainly of short patterns; long patterns exist, but their support is small, as in the sequence of pages a user browses on a website. The early warning target association relation data set is a typical sparse data set. Apriori-type algorithms are suitable for sparse data sets and unsuitable for dense data sets. The GSP algorithm is better suited to sequence pattern mining with constraints (e.g., time interval constraints between neighboring transactions). The PrefixSpan algorithm works well on both kinds of data set, and its advantages are more pronounced on dense data sets. The performance of PrefixSpan is better than that of GSP. Apriori-type algorithms are simpler to use, but their performance is poor. PrefixSpan, although efficient, is difficult to implement. Therefore, improved variants of the Apriori algorithm are more often adopted in early warning target association rule mining analysis, in order to overcome the low execution efficiency of the Apriori algorithm.
In a second aspect of the embodiment of the present invention, there is provided an early warning target association relationship analysis apparatus, referring to fig. 4, the apparatus includes:
an information obtaining module 401, configured to acquire continuous attribute information in a pre-created transaction data set containing a preset target;
a discretization module 402, configured to discretize the attribute information to obtain a discretized transaction data set;
the association relationship obtaining module 403 is configured to analyze the discretized transaction data set through at least one of an Apriori algorithm, an FP-growth algorithm, a GSP algorithm, and a PrefixSpan algorithm, so as to obtain an association relationship for the preset target.
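The discretization step that precedes mining can be sketched as below. The binning scheme is an assumption: equal-width binning is used here only for illustration and is not stated in this part of the text to be the method of the embodiments. The function names, the attribute names `speed` and `altitude`, and the sample records are likewise hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def discretize(records, continuous, n_bins=3):
    """Turn attribute records into transactions of 'attribute=value' items."""
    binned = {a: equal_width_bins([r[a] for r in records], n_bins) for a in continuous}
    transactions = []
    for idx, record in enumerate(records):
        items = []
        for attr, value in record.items():
            if attr in continuous:
                # Continuous attribute: replace the value by its bin label.
                items.append(f"{attr}=bin{binned[attr][idx]}")
            else:
                # Discrete attribute: keep the value as it is.
                items.append(f"{attr}={value}")
        transactions.append(items)
    return transactions

# Hypothetical target records, for illustration only.
records = [
    {"type": "A", "speed": 900.0, "altitude": 10000.0},
    {"type": "B", "speed": 300.0, "altitude": 2000.0},
    {"type": "A", "speed": 600.0, "altitude": 6000.0},
]
transactions = discretize(records, continuous={"speed", "altitude"})
for t in transactions:
    print(t)
```

Each resulting transaction is a list of discrete items, which is the input form that the Apriori, FP-growth, GSP, and PrefixSpan algorithms operate on.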
In a possible implementation manner, the association relationship obtaining module is specifically configured to:
identifying, by the Apriori algorithm, a first set of frequent items in the discretized transaction dataset;
and identifying a strong association relation aiming at the preset target according to the first frequent item set.
In a possible implementation manner, the association relationship obtaining module is specifically configured to:
identifying a target data set with the support degree of the discretized transaction data set being smaller than a preset threshold value through the FP-growth algorithm;
identifying a second set of frequent items in the target dataset;
and identifying the strong association relation of the preset target according to the second frequent item set.
In a possible implementation manner, the association relationship obtaining module is specifically configured to:
identifying the support degree of each data item in the discretized transaction data set through the PrefixSpan algorithm;
and identifying the strong association relation of the preset target according to the identified support.
The embodiment of the invention also provides an electronic device, as shown in fig. 5, which comprises a processor 501, a communication interface 502, a memory 503 and a communication bus 504, wherein the processor 501, the communication interface 502 and the memory 503 complete communication with each other through the communication bus 504,
a memory 503 for storing a computer program;
the processor 501 is configured to execute the program stored in the memory 503, and implement the following steps:
acquiring continuous attribute information in a pre-established transaction dataset containing a preset target;
discretizing the attribute information to obtain a discretized transaction data set;
and analyzing the discretized transaction data set through at least one of an Apriori algorithm, an FP-growth algorithm, a GSP algorithm and a PrefixSpan algorithm to obtain an association relation aiming at the preset target.
The communication bus mentioned above for the electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, etc. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In still another embodiment of the present invention, a computer readable storage medium is provided, where a computer program is stored, where the computer program, when executed by a processor, implements the steps of any of the foregoing early warning target association relationship analysis methods.
In yet another embodiment of the present invention, a computer program product containing instructions that, when executed on a computer, cause the computer to perform any of the pre-warning target association analysis methods of the above embodiments is also provided.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, electronic device, storage medium, and computer program product embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (10)

1. An early warning target association relation analysis method, characterized by comprising the following steps:
acquiring continuous attribute information in a pre-established transaction dataset containing a preset target;
discretizing the attribute information to obtain a discretized transaction data set;
and analyzing the discretized transaction data set through at least one of an Apriori algorithm, an FP-growth algorithm, a GSP algorithm and a PrefixSpan algorithm to obtain an association relation aiming at the preset target.
2. The method of claim 1, wherein the analyzing the discretized transaction data set by at least one of an Apriori algorithm, an FP-growth algorithm, a GSP algorithm, and a PrefixSpan algorithm to obtain the association relation for the preset target comprises:
identifying, by the Apriori algorithm, a first set of frequent items in the discretized transaction dataset;
and identifying a strong association relation aiming at the preset target according to the first frequent item set.
3. The method of claim 1, wherein the analyzing the discretized transaction data set by at least one of an Apriori algorithm, an FP-growth algorithm, a GSP algorithm, and a PrefixSpan algorithm to obtain the association relation for the preset target comprises:
identifying a target data set with the support degree of the discretized transaction data set being smaller than a preset threshold value through the FP-growth algorithm;
identifying a second set of frequent items in the target dataset;
and identifying the strong association relation of the preset target according to the second frequent item set.
4. The method of claim 1, wherein the analyzing the discretized transaction data set by at least one of an Apriori algorithm, an FP-growth algorithm, a GSP algorithm, and a PrefixSpan algorithm to obtain the association relation for the preset target comprises:
identifying the support degree of each data item in the discretized transaction data set through the PrefixSpan algorithm;
and identifying the strong association relation of the preset target according to the identified support.
5. An early warning target association analysis device, characterized in that the device comprises:
the information acquisition module is used for acquiring continuous attribute information in a transaction data set which is created in advance and contains a preset target;
the discretization module is used for discretizing the attribute information to obtain a discretized transaction data set;
the association relation acquisition module is used for analyzing the discretized transaction data set through at least one of an Apriori algorithm, an FP-growth algorithm, a GSP algorithm and a PrefixSpan algorithm to obtain the association relation aiming at the preset target.
6. The apparatus of claim 5, wherein the association acquisition module is specifically configured to:
identifying, by the Apriori algorithm, a first set of frequent items in the discretized transaction dataset;
and identifying a strong association relation aiming at the preset target according to the first frequent item set.
7. The apparatus of claim 5, wherein the association acquisition module is specifically configured to:
identifying a target data set with the support degree of the discretized transaction data set being smaller than a preset threshold value through the FP-growth algorithm;
identifying a second set of frequent items in the target dataset;
and identifying the strong association relation of the preset target according to the second frequent item set.
8. The apparatus of claim 5, wherein the association acquisition module is specifically configured to:
identifying the support degree of each data item in the discretized transaction data set through the PrefixSpan algorithm;
and identifying the strong association relation of the preset target according to the identified support.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-4 when executing a program stored on a memory.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-4.
CN202211568128.9A 2022-12-08 2022-12-08 Early warning target association relation analysis method, system, storage medium and terminal Pending CN116662934A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211568128.9A CN116662934A (en) 2022-12-08 2022-12-08 Early warning target association relation analysis method, system, storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211568128.9A CN116662934A (en) 2022-12-08 2022-12-08 Early warning target association relation analysis method, system, storage medium and terminal

Publications (1)

Publication Number Publication Date
CN116662934A true CN116662934A (en) 2023-08-29

Family

ID=87717713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211568128.9A Pending CN116662934A (en) 2022-12-08 2022-12-08 Early warning target association relation analysis method, system, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN116662934A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117933762A (en) * 2024-03-22 2024-04-26 西安道法数器信息科技有限公司 Data acquisition method based on energy Internet marketing service system
CN118171891A (en) * 2024-05-11 2024-06-11 南方电网调峰调频发电有限公司 Work task scheduling method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination