CN103020109A - Analytic method for relativity of civil aviation messages based on interview information digging - Google Patents
- Publication number
- CN103020109A
- Authority
- CN
- China
- Prior art keywords
- message
- frequent
- collection
- messages
- civil aviation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of civil aviation information and aims to analyze the correlation between messages so as to obtain the direct or indirect relationships between them. According to the technical scheme, the method for analyzing the correlation of civil aviation messages based on access-information mining comprises the following steps: 1, obtaining the access records of civil aviation messages; 2, analyzing the message content on the basis of the access records of step 1 and putting messages that carry consistent information into the same set; 3, setting a minimum support, the frequency of occurrence of a message being required to be greater than this support; 4, generating the one-dimensional frequent message sets on the basis of the message sets of step 2 and the minimum support of step 3; 5, iteratively processing the generated one-dimensional frequent message sets until no frequent message set of higher dimension can be generated; and 6, taking the frequent message sets of all dimensions obtained in steps 4 and 5 and analyzing them to obtain the correlation of the messages. The method is mainly applied to the processing of civil aviation information.
Description
Technical field
The present invention relates to the field of civil aviation information technology and to a method for analyzing the correlation of civil aviation messages. Specifically, it relates to a civil aviation message correlation analysis method based on access-information mining.
Background technology
With the growth of airline and airport traffic and the development of civil aviation information technology, civil aviation messages are used more and more frequently and their importance is increasingly evident.
Civil aviation messages are of many kinds. Broadly, they are divided into air traffic control flight-movement AFTN messages, airline SITA messages, civil airport meteorological messages, and so on. The various messages are correlated to different degrees. For example, after a civil airport meteorological message is sent, air traffic control analyzes from it whether conditions are suitable for take-off and landing, and the result is reflected in the subsequent take-off and landing messages of the flights; the correlation between these two kinds of message is therefore very high. Analyzing the correlation of messages and obtaining the direct or indirect relationships between them is extremely important to the civil aviation information technology field.
The access-information mining technique adopted for message correlation analysis is mainly the Apriori algorithm. Access-information mining uses data mining techniques to extract interesting, useful patterns and implicit information from the relevant resources and behavior. It draws on data mining, computational linguistics, information science, and other fields, and is an integrated technology.
Summary of the invention
The present invention aims to overcome the deficiencies of the prior art by analyzing the correlation of messages and obtaining the direct or indirect relationships between them. To this end, the technical scheme adopted by the invention is a civil aviation message correlation analysis method based on access-information mining, comprising the following steps:
1) obtaining the access records of civil aviation messages;
2) based on the access records of step 1), analyzing the message content and putting messages that carry consistent information into the same set;
3) setting a minimum support; the frequency with which a message occurs must be greater than this support;
4) based on the message sets of step 2) and the minimum support of step 3), counting the frequency of occurrence of every message, finding the messages whose frequency is greater than or equal to the minimum support, and producing the one-dimensional frequent message sets;
5) based on the minimum support of step 3) and the one-dimensional frequent message sets of step 4), iteratively processing the generated frequent message sets until no frequent message set of higher dimension can be produced;
6) based on steps 4) and 5), obtaining the frequent message sets of all dimensions and analyzing them to obtain the correlation of the messages.
The access records of step 1) are obtained from database logs, web logs, and the like.
The consistent information in step 2) is identical information such as flight number and airport; messages with consistent content are placed in the same set to ensure the validity of the analysis result.
The minimum support of step 3) ranges from 0.01 to 0.99; the actual value is determined by the user.
The multidimensional frequent message set of step 5) is interpreted as follows: if a message set contains k associated messages, it is a k-dimensional frequent message set, and its frequency is the number of times those messages occur together in the records.
The implementation of step 5) should satisfy the following conditions:
1) The frequency of occurrence of every frequent message set obtained must be greater than the minimum support.
2) When the (k+1)-dimensional frequent message sets are produced from the k-dimensional frequent message sets, all combinations within the k-dimensional frequent message sets must be considered.
The first pass of the iterative procedure simply counts the frequency of occurrence of every message set containing a single element in order to determine the large one-dimensional message itemsets; this is the work done in step 4). Pass k consists of two phases. First, the large itemsets generated in pass (k-1) are used to generate the candidate message itemsets. Then the database is searched to compute the support of each candidate itemset. If the support of a candidate itemset is greater than the minimum support set in step 3), that itemset is placed in the k-dimensional frequent message sets.
The message correlation of step 6) is produced from the frequent message sets; the correlation coefficient is the frequency of the frequent message set.
Technical features and effects of the present invention:
The invention makes full use of existing research and implementation results in data mining, so that the access records of messages can be analyzed conveniently and the correlation of messages computed. Its application does not depend on how the messages are stored or transmitted; users can choose the most suitable storage and transmission method according to their application requirements to obtain the best results.
The data processed are flight-operation message data. Classifying them by the method of the invention improves the efficiency and accuracy of flight message analysis and raises the level of flight operation management.
Description of drawings
Fig. 1 is a structural diagram of the system of the present invention.
Embodiment
The steps of the technical scheme are as follows:
1) Obtain the access records of civil aviation messages.
2) Based on the access records of step 1), analyze the message content and put messages that carry consistent information into the same set.
3) Set a minimum support; the frequency with which a message occurs must be greater than this support.
4) Based on the message sets of step 2) and the minimum support of step 3), count the frequency of occurrence of every message, find the messages whose frequency is greater than or equal to the minimum support, and produce the one-dimensional frequent message sets.
5) Based on the minimum support of step 3) and the one-dimensional frequent message sets of step 4), iteratively process the generated frequent message sets until no frequent message set of higher dimension can be produced.
6) Based on steps 4) and 5), obtain the frequent message sets of all dimensions and analyze them to obtain the correlation of the messages.
The access records of step 1) can be obtained from database logs, web logs, and the like.
The consistent information in step 2) comprises identical information such as flight number and airport. Messages with consistent content are placed in the same set to ensure the validity of the analysis result. The record sets are shown in Table 1.
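The grouping in step 2) can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify a record layout, so the field names `flight_no`, `airport`, and `msg_type`, the function `group_messages`, and the sample entries are all hypothetical.

```python
from collections import defaultdict

def group_messages(entries):
    """Group message types by their shared (flight number, airport) key."""
    groups = defaultdict(set)
    for entry in entries:
        key = (entry["flight_no"], entry["airport"])
        groups[key].add(entry["msg_type"])
    return dict(groups)

# Hypothetical parsed log entries, for illustration only.
entries = [
    {"flight_no": "CA1234", "airport": "PEK", "msg_type": "PLN"},
    {"flight_no": "CA1234", "airport": "PEK", "msg_type": "DEP"},
    {"flight_no": "MU5678", "airport": "SHA", "msg_type": "FPL"},
]
message_sets = group_messages(entries)
```

Each value of `message_sets` then plays the role of one row of the message-set column in Table 1.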
The minimum support of step 3) ranges from 0.01 to 0.99; the actual value is determined by the user. The larger the support, the more accurate the correlation analysis, but some messages with implicit associations may be missed.
The one-dimensional frequent message sets of step 4) are determined by the frequency of occurrence of the messages and the support. An example of one-dimensional frequent sets is shown in Table 2.
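A minimal sketch of steps 3) and 4), assuming that "frequency" means the fraction of access records containing a message and that a message is kept when this fraction is greater than or equal to the minimum support. The records are the message sets of Table 1; the function name `frequent_one_sets` is hypothetical, and the values it produces need not match the illustrative figures printed in Table 2.

```python
from collections import Counter

# Message sets from Table 1 (one set per access record).
records = [
    {"PLN", "COR", "FPL", "CHG", "DEP"},
    {"FPL", "CHG", "DEP"},
    {"PLN", "COR", "ABS", "FPL"},
    {"PLN", "COR", "ABS", "CHG", "DEP"},
    {"PLN", "COR", "ABS", "FPL", "CHG", "DEP"},
    {"PLN", "DEP"},
    {"PLN", "COR"},
    {"PLN", "COR", "ABS", "CHG", "DEP"},
]

def frequent_one_sets(records, min_sup):
    """Return {message: relative frequency} for messages meeting min_sup."""
    if not 0.01 <= min_sup <= 0.99:
        raise ValueError("min_sup is expected to lie in 0.01-0.99")
    # Each record is a set, so a message is counted at most once per record.
    counts = Counter(msg for rec in records for msg in rec)
    n = len(records)
    return {msg: c / n for msg, c in counts.items() if c / n >= min_sup}

l1 = frequent_one_sets(records, min_sup=0.6)
```

Lowering `min_sup` to 0.5 would also admit ABS and FPL, which illustrates the trade-off described above: a smaller support keeps more messages at the cost of more work in later passes.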
The multidimensional frequent message set of step 5) is interpreted as follows: if a message set contains k associated messages, it is a k-dimensional frequent message set, and its frequency is the number of times those messages occur together in the records. Three-dimensional frequent message sets are shown in Table 3.
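The co-occurrence count that defines the frequency of a k-dimensional message set can be illustrated with a short sketch. The function name `cooccurrence_count` is hypothetical, and the records are the message sets of Table 1.

```python
def cooccurrence_count(records, message_set):
    """Count the records in which every message of message_set appears."""
    target = frozenset(message_set)
    return sum(1 for rec in records if target <= rec)

# Message sets from Table 1 (one set per access record).
records = [
    {"PLN", "COR", "FPL", "CHG", "DEP"},
    {"FPL", "CHG", "DEP"},
    {"PLN", "COR", "ABS", "FPL"},
    {"PLN", "COR", "ABS", "CHG", "DEP"},
    {"PLN", "COR", "ABS", "FPL", "CHG", "DEP"},
    {"PLN", "DEP"},
    {"PLN", "COR"},
    {"PLN", "COR", "ABS", "CHG", "DEP"},
]

count = cooccurrence_count(records, {"PLN", "CHG", "DEP"})
frequency = count / len(records)
```

Dividing the count by the number of records gives a relative frequency that can be compared with a minimum support in the 0.01-0.99 range; that normalization is an assumption of this sketch.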
The implementation of step 5) should satisfy the following conditions:
1) The frequency of occurrence of every frequent message set obtained must be greater than the minimum support.
2) When the (k+1)-dimensional frequent message sets are produced from the k-dimensional frequent message sets, all combinations within the k-dimensional frequent message sets must be considered.
The message correlation of step 6) is produced from the frequent message sets; the correlation coefficient is the frequency of the frequent message set.
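Since the correlation coefficient is simply the frequency of a frequent message set, reading off the strongest correlations amounts to sorting. A small sketch, using the figures of Table 3 as input; the function name `rank_correlations` is hypothetical.

```python
def rank_correlations(frequent_sets):
    """Sort frequent message sets by frequency, highest first."""
    return sorted(frequent_sets.items(), key=lambda kv: kv[1], reverse=True)

# Frequencies copied from Table 3.
table3 = {
    ("PLN", "CHG", "DEP"): 0.10,
    ("FPL", "CHG", "DEP"): 0.09,
    ("ABS", "FPL", "CHG"): 0.08,
    ("PLN", "COR", "CHG"): 0.20,
    ("PLN", "FPL", "DEP"): 0.30,
    ("COR", "ABS", "FPL"): 0.20,
    ("ABS", "FPL", "DEP"): 0.05,
}
ranked = rank_correlations(table3)
```

Python's sort is stable, so sets with equal frequency keep their order from Table 3.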
The invention is now further described with reference to the accompanying drawing and an embodiment.
As shown in Table 1, Table 2, Table 3, and the accompanying drawing, the specific implementation process and working principle of the invention are as follows:
1) According to the concrete application environment of the civil aviation messages, obtain the access records from storage such as database logs or web logs.
2) According to the message log, within one access record place the messages that carry related information in the same set. One access record can yield one or more such message sets. The message sets are shown in Table 1.
3) Set the minimum support according to actual conditions. The smaller the minimum support, the more message correlation information is obtained, but the lower the efficiency, and vice versa.
4) Count the frequency of occurrence of every message, find the messages whose frequency is greater than or equal to the minimum support, and produce the one-dimensional frequent message sets, as shown in Table 2.
5) Iteratively process the generated one-dimensional frequent message sets until no frequent message set of higher dimension can be produced. The flow chart is shown in the accompanying drawing. The iterative procedure adopts the idea of the Apriori algorithm.
The first pass of the Apriori algorithm simply counts the frequency of occurrence of every message set containing a single element in order to determine the large one-dimensional message itemsets; this is the work done in step 4). Pass k consists of two phases. First, the large itemsets generated in pass (k-1) are used to generate the candidate message itemsets. Then the database is searched to compute the support of each candidate itemset. If the support of a candidate itemset is greater than the minimum support set in step 3), that itemset is placed in the k-dimensional frequent message sets.
The iterative procedure is described algorithmically as follows:
(1) L_1 = {frequent one-dimensional message sets};
(2) for (k = 2; L_{k-1} ≠ ∅; k++) {
(3)     C_k = apriori_gen(L_{k-1}, min_sup);
(4)     for each record r ∈ R {
(5)         C_r = subset(C_k, r);
(6)         for each candidate c ∈ C_r
(7)             c.count++;
(8)     }
(9)     L_k = {c ∈ C_k | c.count ≥ min_sup}
(10) }
(11) return L = ∪_k L_k;
Here L_k is the set of k-dimensional frequent message sets, C_k is the set of candidate frequent message sets, min_sup is the minimum support, R is the collection of message records generated in step 2), and C_r is the set of candidates contained in record r. Lines (2) and (3) generate the candidate frequent message sets C_k from the (k-1)-dimensional frequent message sets. Line (4) scans the message records. Line (5) finds the candidates of C_k that are contained in record r. Lines (6) and (7) increase the count of each candidate found in the record by 1, where c ∈ C_r is a candidate message set and c.count is its number of occurrences. Line (9) adds a candidate message set to the k-dimensional frequent message sets if its frequency of occurrence is no less than the minimum support. Line (11) returns the frequent message sets of all dimensions, that is, the collection of message sets whose correlation reaches the minimum support.
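The loop in lines (1) to (11) can be sketched in Python as below. This is a simplified illustration, not the patent's implementation: it measures support as the fraction of records containing a set, generates candidates with plain set unions followed by the subset check, and scans the records directly instead of using the hash tree behind `subset`. The function name `apriori` is hypothetical, and the records are the message sets of Table 1.

```python
from itertools import combinations

def apriori(records, min_sup):
    """Return {frozenset: relative frequency} for all frequent message sets."""
    records = [frozenset(r) for r in records]
    n = len(records)

    def support(itemset):
        return sum(1 for r in records if itemset <= r) / n

    # Line (1): frequent one-dimensional sets.
    items = {m for r in records for m in r}
    level = {frozenset([m]) for m in items if support(frozenset([m])) >= min_sup}
    frequent = {}
    k = 2
    while level:  # line (2): continue while L_{k-1} is non-empty
        frequent.update({s: support(s) for s in level})
        # Line (3): unions of two (k-1)-sets that have exactly k elements ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... keeping only those whose every (k-1)-subset is frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Lines (4)-(9): count each candidate and keep the frequent ones.
        level = {c for c in candidates if support(c) >= min_sup}
        k += 1
    return frequent  # line (11): union of all L_k

records = [
    {"PLN", "COR", "FPL", "CHG", "DEP"},
    {"FPL", "CHG", "DEP"},
    {"PLN", "COR", "ABS", "FPL"},
    {"PLN", "COR", "ABS", "CHG", "DEP"},
    {"PLN", "COR", "ABS", "FPL", "CHG", "DEP"},
    {"PLN", "DEP"},
    {"PLN", "COR"},
    {"PLN", "COR", "ABS", "CHG", "DEP"},
]
frequent = apriori(records, min_sup=0.5)
```

On the Table 1 records with a minimum support of 0.5, the largest frequent set found by this sketch is {PLN, COR, CHG, DEP}, which occurs in four of the eight records.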
The apriori_gen function is described as follows:
The Apriori candidate-generation function apriori_gen takes as its argument L_{k-1}, the set of all large (k-1)-itemsets, and returns a superset of the set of all large k-itemsets. First, in the join step, L_{k-1} is joined with L_{k-1} to obtain C_k, a superset of the final set of candidates.
Then, in the prune step, every itemset c ∈ C_k that has some (k-1)-subset not in L_{k-1} is deleted. To see why this generation procedure preserves completeness, note that for any itemset in L_k with minimum support, every subset of size k-1 must also have minimum support. Therefore, if each itemset in L_{k-1} were extended with all possible items and all itemsets having a (k-1)-subset not in L_{k-1} were then deleted, what remains would be a superset of the itemsets in L_k.
After the join operation, C_k ⊇ L_k. By similar reasoning, the prune operation, which deletes from C_k the itemsets having a (k-1)-subset not in L_{k-1}, likewise does not delete any itemset contained in L_k.
(1) for all itemsets c ∈ C_k do
(2)     for all (k-1)-subsets s of c do
(3)         if (s ∉ L_{k-1}) then
(4)             delete c from C_k
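The join and prune steps can be sketched as follows. The join condition used here (two itemsets are merged when they agree on their first k-2 items) is the one commonly given for the Apriori algorithm and is an assumption, since the text above does not spell it out. Itemsets are represented as sorted tuples, and the input set is made up for illustration.

```python
from itertools import combinations

def apriori_gen(prev_level):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets.

    prev_level is a set of sorted tuples, all of the same length k-1.
    """
    prev = sorted(prev_level)
    if not prev:
        return set()
    size = len(prev[0])  # this is k-1
    candidates = set()
    # Join step: merge two itemsets that share their first k-2 items.
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            if p[:-1] == q[:-1]:
                candidates.add(p + (q[-1],))
    # Prune step: drop any candidate that has a (k-1)-subset outside prev_level.
    return {c for c in candidates
            if all(s in prev_level for s in combinations(c, size))}

# Hypothetical frequent two-dimensional message sets.
l2 = {("ABS", "CHG"), ("ABS", "COR"), ("CHG", "DEP"),
      ("CHG", "PLN"), ("DEP", "PLN")}
c3 = apriori_gen(l2)
```

The join produces (ABS, CHG, COR) and (CHG, DEP, PLN); the prune step then removes the first because its subset (CHG, COR) is not in `l2`.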
The subset function is described as follows:
The candidate itemsets C_k are stored in a hash tree. A node of the hash tree contains either a list of itemsets (a leaf node) or a hash table (an interior node). In an interior node, each bucket of the hash table points to another node. The root of the hash tree is defined to be at depth 1, and an interior node at depth d points to nodes at depth d+1. Itemsets are stored in the leaves. To add an itemset c, start from the root and descend until a leaf is reached. At an interior node at depth d, the branch to follow is chosen by applying a hash function to the d-th item of the itemset and following the pointer in the corresponding bucket. All nodes are initially created as leaf nodes. When the number of itemsets in a leaf node exceeds a specified threshold, the leaf node is converted into an interior node.
Starting from the root node, the subset function finds all the candidates contained in a record r as follows. At a leaf, it finds which itemsets in the leaf are contained in r and adds references to them to the answer set. At an interior node that was reached by hashing item i, it hashes each item that comes after i in r and applies the procedure recursively to the node in the corresponding bucket. At the root node, it hashes every item in r.
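A compact sketch of such a hash tree is given below. The class names `Node` and `HashTree`, the parameters `leaf_limit` and `n_buckets`, and the candidate list are hypothetical choices made for illustration; itemsets are assumed to be sorted tuples of length k.

```python
class Node:
    def __init__(self, depth):
        self.depth = depth    # the root has depth 1
        self.itemsets = []    # used while the node is a leaf
        self.buckets = None   # dict of bucket -> Node once the node is interior

class HashTree:
    def __init__(self, k, leaf_limit=2, n_buckets=3):
        self.k, self.leaf_limit, self.n_buckets = k, leaf_limit, n_buckets
        self.root = Node(1)

    def _bucket(self, item):
        return hash(item) % self.n_buckets

    def insert(self, itemset, node=None):
        if node is None:
            node = self.root
        # Descend: at depth d, hash the d-th item of the itemset.
        while node.buckets is not None:
            b = self._bucket(itemset[node.depth - 1])
            node = node.buckets.setdefault(b, Node(node.depth + 1))
        node.itemsets.append(itemset)
        # Convert an over-full leaf to an interior node while items remain to hash.
        if len(node.itemsets) > self.leaf_limit and node.depth <= self.k:
            pending, node.itemsets, node.buckets = node.itemsets, [], {}
            for s in pending:
                self.insert(s, node)

    def subset(self, record):
        """Return the stored itemsets that are contained in record."""
        items = sorted(record)
        item_set, found = set(items), set()

        def visit(node, start):
            if node.buckets is None:
                found.update(s for s in node.itemsets if item_set.issuperset(s))
                return
            # Hash every record item after the one that led to this node.
            for i in range(start, len(items)):
                child = node.buckets.get(self._bucket(items[i]))
                if child is not None:
                    visit(child, i + 1)

        visit(self.root, 0)
        return found

candidates = [("CHG", "DEP", "PLN"), ("ABS", "COR", "PLN"), ("CHG", "DEP", "FPL"),
              ("ABS", "CHG", "FPL"), ("COR", "DEP", "PLN")]
tree = HashTree(k=3)
for c in candidates:
    tree.insert(c)
matched = tree.subset({"PLN", "COR", "ABS", "CHG", "DEP"})  # record 4 of Table 1
```

The answer is collected in a set, so a candidate that is reached along more than one path through the tree is still reported only once.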
Access record | Message set |
---|---|
1 | PLN,COR,FPL,CHG,DEP |
2 | FPL,CHG,DEP |
3 | PLN,COR,ABS,FPL |
4 | PLN,COR,ABS,CHG,DEP |
5 | PLN,COR,ABS,FPL,CHG,DEP |
6 | PLN,DEP |
7 | PLN,COR |
8 | PLN,COR,ABS,CHG,DEP |
Table 1
Message | Frequency of occurrence |
---|---|
PLN | 0.40 |
COR | 0.10 |
ABS | 0.30 |
FPL | 0.20 |
CHG | 0.01 |
DEP | 0.20 |
Table 2
Three-dimensional frequent message set | Frequency of occurrence |
---|---|
PLN,CHG,DEP | 0.10 |
FPL,CHG,DEP | 0.09 |
ABS,FPL,CHG | 0.08 |
PLN,COR,CHG | 0.20 |
PLN,FPL,DEP | 0.30 |
COR,ABS,FPL | 0.20 |
ABS,FPL,DEP | 0.05 |
Table 3
PLN: flight forecast message
COR: revised flight forecast message
ABS: cancellation of repetitive and non-repetitive flight forecast messages
CHG: flight plan modification message
DEP: departure message
Claims (8)
1. A civil aviation message correlation analysis method based on access-information mining, characterized in that it comprises the following steps:
1) obtaining the access records of civil aviation messages;
2) based on the access records of step 1), analyzing the message content and putting messages that carry consistent information into the same set;
3) setting a minimum support; the frequency with which a message occurs must be greater than this support;
4) based on the message sets of step 2) and the minimum support of step 3), counting the frequency of occurrence of every message, finding the messages whose frequency is greater than or equal to the minimum support, and producing the one-dimensional frequent message sets;
5) based on the minimum support of step 3) and the one-dimensional frequent message sets of step 4), iteratively processing the generated frequent message sets until no frequent message set of higher dimension can be produced;
6) based on steps 4) and 5), obtaining the frequent message sets of all dimensions and analyzing them to obtain the correlation of the messages.
2. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the access records of step 1) are obtained from database logs, web logs, and the like.
3. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the consistent information in step 2) is identical information such as flight number and airport, and messages with consistent content are placed in the same set to ensure the validity of the analysis result.
4. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the minimum support of step 3) ranges from 0.01 to 0.99, the actual value being determined by the user.
5. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the multidimensional frequent message set of step 5) is interpreted as follows: if a message set contains k associated messages, it is a k-dimensional frequent message set, and its frequency is the number of times those messages occur together in the records.
6. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the implementation of step 5) should satisfy the following conditions:
1) the frequency of occurrence of every frequent message set obtained must be greater than the minimum support;
2) when the (k+1)-dimensional frequent message sets are produced from the k-dimensional frequent message sets, all combinations within the k-dimensional frequent message sets must be considered.
7. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the first pass of the iterative procedure simply counts the frequency of occurrence of every message set containing a single element in order to determine the large one-dimensional message itemsets, which is the work done in step 4); pass k consists of two phases: first, the large itemsets generated in pass (k-1) are used to generate the candidate message itemsets; then the database is searched to compute the support of each candidate itemset; if the support of a candidate itemset is greater than the minimum support set in step 3), that itemset is placed in the k-dimensional frequent message sets.
8. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the message correlation of step 6) is produced from the frequent message sets, the correlation coefficient being the frequency of the frequent message set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210406334XA CN103020109A (en) | 2012-10-22 | 2012-10-22 | Analytic method for relativity of civil aviation messages based on interview information digging |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103020109A (en) | 2013-04-03 |
Family
ID=47968713
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210406334XA Pending CN103020109A (en) | 2012-10-22 | 2012-10-22 | Analytic method for relativity of civil aviation messages based on interview information digging |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103020109A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102136949A (en) * | 2011-03-24 | 2011-07-27 | 国网电力科学研究院 | Method and system for analyzing alarm correlation based on network and time |
CN102185742A (en) * | 2011-06-16 | 2011-09-14 | 北京亿赞普网络技术有限公司 | Communication-network-message-based Internet advertising effect monitoring method and system |
Non-Patent Citations (1)
Title |
---|
郭涛等: "基于关联规则数据挖掘Apriori算法的研究与应用", 《计算机技术与发展》, vol. 21, no. 6, 30 June 2011 (2011-06-30) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103514267A (en) * | 2013-09-04 | 2014-01-15 | 快传(上海)广告有限公司 | Gateway correlation information obtaining method and system |
CN113806204A (en) * | 2020-06-11 | 2021-12-17 | 北京威努特技术有限公司 | Method, device, system and storage medium for evaluating message field correlation |
CN113806204B (en) * | 2020-06-11 | 2023-07-25 | 北京威努特技术有限公司 | Method, device, system and storage medium for evaluating message segment correlation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20130403 |