CN103020109A - Analytic method for relativity of civil aviation messages based on interview information digging - Google Patents


Info

Publication number
CN103020109A
Authority
CN
China
Prior art keywords
message
frequent
collection
messages
civil aviation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210406334XA
Other languages
Chinese (zh)
Inventor
宋雪雁
黄兆桐
孙济洲
李志增
于翠玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201210406334XA
Publication of CN103020109A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of civil aviation information and aims to analyze the correlation of messages so as to obtain the direct or indirect relationships between them. According to the technical scheme, the civil aviation message correlation analysis method based on access-information mining comprises the following steps: 1, obtaining the access records of civil aviation messages; 2, analyzing the message content on the basis of the access records of step 1, and putting messages that carry consistent information into the same set; 3, setting a minimum support, which the occurrence frequency of a message is required to exceed; 4, generating the one-dimensional frequent message set on the basis of the message sets of step 2 and the minimum support of step 3; 5, iteratively processing the generated one-dimensional frequent message set until no frequent message set of a higher dimension can be generated; and 6, collecting the frequent message sets of all dimensions from step 4 and step 5 and analyzing them to obtain the correlation of the messages. The method is mainly applied to the processing of civil aviation information.

Description

Civil aviation message correlation analysis method based on access-information mining
Technical field
The present invention relates to the field of civil aviation information technology, and in particular to a method for analyzing the correlation of civil aviation messages. Specifically, it relates to a civil aviation message correlation analysis method based on access-information mining.
Background art
With the growth of airline and airport traffic and the development of civil aviation information technology, civil aviation messages are used more and more frequently and are becoming increasingly important.
Civil aviation messages are of many kinds, broadly divided into air traffic control flight-movement AFTN messages, airline SITA messages, civil airport meteorological messages, and so on. The various messages are correlated to different degrees. For example, after a civil airport meteorological message is sent, air traffic control uses it to judge whether conditions are suitable for take-off and landing, and the result is reflected in the subsequent take-off and landing messages of the flight, so the correlation between these two kinds of messages is very high. Analyzing the correlation of messages to obtain the direct or indirect relationships between them is extremely important to the civil aviation information technology field.
Message correlation analysis uses access-information mining, for which the main technique is the Apriori method. Access-information mining applies data mining techniques to extract interesting, useful patterns and implicit information from the relevant resources and behavior. It draws on several fields, including data mining, computational linguistics and information science, and is an integrated technique.
Summary of the invention
The present invention aims to overcome the deficiencies of the prior art by analyzing the correlation of messages and obtaining the direct or indirect relationships between them. To this end, the technical scheme adopted is a civil aviation message correlation analysis method based on access-information mining, comprising the following steps:
1) obtaining the access records of civil aviation messages;
2) based on the access records of step 1), analyzing the message content and putting messages that carry consistent information into the same set;
3) setting a minimum support, which the occurrence frequency of a message must exceed;
4) based on the message sets of step 2) and the minimum support of step 3), counting the occurrence frequency of all messages, finding the messages whose frequency is greater than or equal to the minimum support, and producing the one-dimensional frequent message set;
5) based on the minimum support of step 3) and the one-dimensional frequent message set of step 4), iteratively processing the one-dimensional frequent message set until no frequent message set of a higher dimension can be produced;
6) based on step 4) and step 5), obtaining the frequent message sets of all dimensions and analyzing them to obtain the correlation of the messages.
The access records of step 1) are obtained from database logs, web logs, and the like.
The consistent information in step 2) is identical information such as flight number and airport; messages with consistent content are placed in the same set to ensure the validity of the analysis result.
The minimum support of step 3) ranges from 0.01 to 0.99; the actual value is determined by the user.
The multi-dimensional frequent message set of step 5) is interpreted as follows: if a message set contains k associated messages, it is a k-dimensional frequent message set, and its frequency is the number of times those messages occur together in the records.
The implementation of step 5) should satisfy the following conditions:
1) the occurrence frequency of every frequent message set obtained must be greater than the minimum support;
2) when (k+1)-dimensional frequent message sets are produced from the k-dimensional frequent message sets, all combinations within the k-dimensional frequent message sets must be considered.
The first pass of the iterative procedure simply counts the occurrence frequency of every message set containing a single element to determine the large one-dimensional message itemsets, which is what step 4) does. Pass k consists of two phases: first, the large message itemsets generated in pass (k-1) are used to generate the candidate message itemsets; then the database is scanned to compute the support of each candidate itemset. If the support of a candidate itemset is greater than the minimum support set in step 3), that message itemset is placed in the k-dimensional frequent message set.
The message correlation of step 6) is derived from the frequent message sets; the correlation coefficient is the frequency of the frequent message set.
Technical characteristics and effects of the present invention:
The present invention makes full use of existing research and implementation results in data mining technology, so that the access records of messages can be analyzed conveniently and the correlation of the messages calculated. Its application does not depend on how the messages are stored or transmitted; the user can choose the most suitable storage and transmission method according to the application requirements to obtain the best results.
The data processed are the message data of flight operations. Classifying them according to the present invention improves the efficiency and accuracy of flight message analysis and raises the level of flight operation management.
Description of drawings
Fig. 1 is a structural diagram of the system of the present invention.
Embodiment
The steps of the technical scheme are as follows:
1) Obtain the access records of civil aviation messages.
2) Based on the access records of step 1), analyze the message content and put messages that carry consistent information into the same set.
3) Set a minimum support, which the occurrence frequency of a message must exceed.
4) Based on the message sets of step 2) and the minimum support of step 3), count the occurrence frequency of all messages, find the messages whose frequency is greater than or equal to the minimum support, and produce the one-dimensional frequent message set.
5) Based on the minimum support of step 3) and the one-dimensional frequent message set of step 4), iteratively process the one-dimensional frequent message set until no frequent message set of a higher dimension can be produced.
6) Based on step 4) and step 5), obtain the frequent message sets of all dimensions and analyze them to obtain the correlation of the messages.
The access records of step 1) can be obtained from database logs, web logs, and the like.
The consistent information in step 2) comprises identical information such as flight number and airport. Messages with consistent content are placed in the same set to ensure the validity of the analysis result. The set of records is shown in Table 1.
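The grouping of step 2) can be sketched as follows. This is a minimal illustration only: the log layout (access record id, message type, flight number, airport), the sample flight numbers and airport codes, and the name `group_messages` are assumptions made for the example and are not specified in the text.

```python
from collections import defaultdict

# Hypothetical access-log entries:
# (access_record_id, message_type, flight_number, airport).
log_entries = [
    (1, "PLN", "CA1501", "ZBAA"),
    (1, "COR", "CA1501", "ZBAA"),
    (1, "FPL", "MU5102", "ZSSS"),
    (2, "FPL", "CA1501", "ZBAA"),
    (2, "DEP", "CA1501", "ZBAA"),
]

def group_messages(entries):
    """Put messages of one access record that share flight number and airport into one set."""
    groups = defaultdict(set)
    for record_id, msg_type, flight, airport in entries:
        groups[(record_id, flight, airport)].add(msg_type)
    return dict(groups)

# One access record may yield several message sets (record 1 yields two here).
message_sets = group_messages(log_entries)
```

Keying on the access record together with the consistent fields keeps unrelated flights that happen to appear in the same log window out of the same message set.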
The minimum support of step 3) ranges from 0.01 to 0.99. The actual value is determined by the user. The larger the support, the more accurate the correlation analysis, but some messages with implicit associations may be missed.
The one-dimensional frequent message set of step 4) is determined by the occurrence frequency of each message and by the support. An example of a one-dimensional frequent set is shown in Table 2.
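As a rough sketch of step 4), the following counts each message over the eight message sets of Table 1 and keeps those whose relative frequency reaches a chosen minimum support. The threshold 0.6 and the names `RECORDS` and `frequent_1_sets` are assumptions for illustration; the frequencies computed here come from Table 1 only and are not meant to reproduce the illustrative values of Table 2.

```python
from collections import Counter

# Message sets of Table 1, one set per access record.
RECORDS = [
    {"PLN", "COR", "FPL", "CHG", "DEP"},
    {"FPL", "CHG", "DEP"},
    {"PLN", "COR", "ABS", "FPL"},
    {"PLN", "COR", "ABS", "CHG", "DEP"},
    {"PLN", "COR", "ABS", "FPL", "CHG", "DEP"},
    {"PLN", "DEP"},
    {"PLN", "COR"},
    {"PLN", "COR", "ABS", "CHG", "DEP"},
]

def frequent_1_sets(records, min_sup):
    """Return {message: relative frequency} for messages with frequency >= min_sup."""
    counts = Counter(msg for record in records for msg in record)
    total = len(records)
    return {msg: n / total for msg, n in counts.items() if n / total >= min_sup}

# Assumed threshold; the text leaves the value to the user (0.01 to 0.99).
l1 = frequent_1_sets(RECORDS, min_sup=0.6)
```

With this threshold ABS and FPL, which each appear in half of the records, fall out of the one-dimensional frequent set.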
The multi-dimensional frequent message set of step 5) is interpreted as follows: if a message set contains k associated messages, it is a k-dimensional frequent message set, and its frequency is the number of times those messages occur together in the records. The 3-dimensional frequent message sets are shown in Table 3.
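The co-occurrence count that defines the frequency of a k-dimensional set can be illustrated with a short sketch over the Table 1 message sets. The function name `co_occurrence` is hypothetical.

```python
# Message sets of Table 1, one set per access record.
RECORDS = [
    {"PLN", "COR", "FPL", "CHG", "DEP"},
    {"FPL", "CHG", "DEP"},
    {"PLN", "COR", "ABS", "FPL"},
    {"PLN", "COR", "ABS", "CHG", "DEP"},
    {"PLN", "COR", "ABS", "FPL", "CHG", "DEP"},
    {"PLN", "DEP"},
    {"PLN", "COR"},
    {"PLN", "COR", "ABS", "CHG", "DEP"},
]

def co_occurrence(records, message_set):
    """Count the records in which all messages of message_set occur together."""
    target = set(message_set)
    return sum(1 for record in records if target <= record)

# Two 3-dimensional message sets counted against Table 1.
count_pln_cor_abs = co_occurrence(RECORDS, {"PLN", "COR", "ABS"})
count_fpl_chg_dep = co_occurrence(RECORDS, {"FPL", "CHG", "DEP"})
```

Dividing the count by the number of records gives a relative frequency that can be compared with a minimum support in the 0.01 to 0.99 range.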
The implementation of step 5) should satisfy the following conditions:
1) The occurrence frequency of every frequent message set obtained must be greater than the minimum support.
2) When (k+1)-dimensional frequent message sets are produced from the k-dimensional frequent message sets, all combinations within the k-dimensional frequent message sets must be considered.
The message correlation of step 6) is derived from the frequent message sets; the correlation coefficient is the frequency of the frequent message set.
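Because the correlation coefficient is simply the frequency of the frequent message set, reading off the most strongly correlated messages amounts to sorting by frequency. The sketch below does this for the 3-dimensional sets and frequencies listed in Table 3; the names `TABLE_3` and `rank_by_correlation` are introduced for the example only.

```python
# 3-dimensional frequent message sets and their frequencies, copied from Table 3.
TABLE_3 = {
    ("PLN", "CHG", "DEP"): 0.10,
    ("FPL", "CHG", "DEP"): 0.09,
    ("ABS", "FPL", "CHG"): 0.08,
    ("PLN", "COR", "CHG"): 0.20,
    ("PLN", "FPL", "DEP"): 0.30,
    ("COR", "ABS", "FPL"): 0.20,
    ("ABS", "FPL", "DEP"): 0.05,
}

def rank_by_correlation(frequent_sets):
    """Sort frequent message sets from highest to lowest frequency (correlation coefficient)."""
    return sorted(frequent_sets.items(), key=lambda item: item[1], reverse=True)

ranking = rank_by_correlation(TABLE_3)
```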
The present invention is further described below with reference to the accompanying drawing and the embodiment.
With reference to Table 1, Table 2, Table 3 and the accompanying drawing, the specific implementation process and working principle of the present invention are as follows:
1) According to the concrete application environment of the civil aviation messages, obtain the access records from storage such as database logs or web logs.
2) According to the message log, within one access record, place the messages that carry correlated information in the same set. One access record may generate one or more such message sets. The message sets are shown in Table 1.
3) Set the minimum support according to the actual situation. The smaller the minimum support, the more message correlation information is obtained, but the lower the efficiency, and vice versa.
4) Count the occurrence frequency of all messages, find the messages whose frequency is greater than or equal to the minimum support, and produce the one-dimensional frequent message set, as shown in Table 2.
5) Iteratively process the one-dimensional frequent message set until no frequent message set of a higher dimension can be produced. The flow chart is shown in the accompanying drawing. The iterative procedure adopts the idea of the Apriori algorithm.
The first pass of the Apriori algorithm simply counts the occurrence frequency of every message set containing a single element to determine the large one-dimensional message itemsets, which is what step 4) does. Pass k consists of two phases: first, the large message itemsets generated in pass (k-1) are used to generate the candidate message itemsets; then the database is scanned to compute the support of each candidate itemset. If the support of a candidate itemset is greater than the minimum support set in step 3), that message itemset is placed in the k-dimensional frequent message set.
The iterative procedure is described by the following algorithm:
(1) L_1 = {one-dimensional frequent message sets};
(2) for (k = 2; L_{k-1} ≠ Φ; k++) {
(3)     C_k = apriori_gen(L_{k-1}, min_sup);
(4)     for each record r ∈ R {
(5)         C_r = subset(C_k, r);
(6)         for each candidate c ∈ C_r
(7)             c.count++;
(8)     }
(9)     L_k = {c ∈ C_k | c.count ≥ min_sup}
(10) }
(11) return L = ∪_k L_k
Here, L_k is the k-dimensional frequent message set, C_k is the candidate frequent message set, min_sup is the minimum support, R is the set of message records generated in step 2), and C_r is the set of candidates contained in record r. Lines (2) and (3) generate the candidate frequent message set C_k from the (k-1)-dimensional frequent message set. Line (4) scans the message records. Line (5) finds the candidates in C_k that are contained in record r. Lines (6) and (7) increase the count of each candidate found in the record by 1, where c ∈ C_r is a candidate message set and c.count is its occurrence count. Line (9) adds a candidate message set to the k-dimensional frequent message set if its occurrence frequency is greater than or equal to the minimum support. Line (11) returns the frequent message sets of all dimensions, that is, the message sets whose correlation reaches the minimum support.
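The loop above can be sketched in runnable form as follows. This is an illustrative reading, not the patented implementation: min_sup is treated as a relative frequency, `apriori_gen` here takes the target dimension k instead of min_sup, candidates are matched by a direct subset test in place of the subset function, and the threshold 0.6 is an assumed value. The records are the message sets of Table 1.

```python
from itertools import combinations

# Message sets of Table 1, one set per access record.
RECORDS = [
    {"PLN", "COR", "FPL", "CHG", "DEP"},
    {"FPL", "CHG", "DEP"},
    {"PLN", "COR", "ABS", "FPL"},
    {"PLN", "COR", "ABS", "CHG", "DEP"},
    {"PLN", "COR", "ABS", "FPL", "CHG", "DEP"},
    {"PLN", "DEP"},
    {"PLN", "COR"},
    {"PLN", "COR", "ABS", "CHG", "DEP"},
]

def apriori_gen(prev_frequent, k):
    """Build k-dimensional candidates whose (k-1)-subsets are all frequent."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev_frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates

def apriori(records, min_sup):
    """Return {frequent message set: relative frequency} for all dimensions."""
    total = len(records)
    counts = {}
    for record in records:                      # line (1): one-dimensional sets
        for msg in record:
            key = frozenset([msg])
            counts[key] = counts.get(key, 0) + 1
    level = {s: n / total for s, n in counts.items() if n / total >= min_sup}
    result = dict(level)
    k = 2
    while level:                                # line (2): stop when L_{k-1} is empty
        candidates = apriori_gen(set(level), k)             # line (3)
        counts = {c: 0 for c in candidates}
        for record in records:                  # lines (4) to (8)
            for c in candidates:
                if c <= record:
                    counts[c] += 1
        level = {c: n / total for c, n in counts.items() if n / total >= min_sup}  # line (9)
        result.update(level)
        k += 1
    return result                               # line (11)

frequent = apriori(RECORDS, min_sup=0.6)
```

On the Table 1 data with this threshold the loop stops at dimension 3, because every 3-dimensional candidate has at least one 2-dimensional subset that is not frequent.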
The apriori_gen function is described as follows:
The Apriori candidate generation function apriori_gen takes as its argument L_{k-1}, the set of all large (k-1)-itemsets. It returns a superset of the set of all large k-itemsets. First, in the join step, L_{k-1} is joined with L_{k-1} to obtain C_k, a superset of the final set of candidates.
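The join condition can be written out explicitly. The expression below is the usual Apriori join on itemsets whose items are kept in a fixed order, given here as a sketch; the notation p_i for the i-th item of itemset p is an assumption introduced for this illustration.

```latex
C_k = \bigl\{\, \{p_1, \dots, p_{k-1}, q_{k-1}\} \;\bigm|\; p, q \in L_{k-1},\;
      p_1 = q_1,\ \dots,\ p_{k-2} = q_{k-2},\; p_{k-1} < q_{k-1} \,\bigr\}
```

The ordering condition p_{k-1} < q_{k-1} only prevents the same candidate from being generated twice.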
Then, in the prune step, every itemset c ∈ C_k that has some (k-1)-subset not in L_{k-1} is deleted. To see why this generation procedure preserves completeness, note that for any itemset in L_k with minimum support, every subset of size k-1 must also have minimum support. Therefore, if each itemset in L_{k-1} is extended with all possible items and all itemsets whose (k-1)-subsets are not in L_{k-1} are then deleted, what remains is a superset of the itemsets in L_k.
After the join step, C_k ⊇ L_k. By similar reasoning, the prune step, which deletes from C_k every itemset having a (k-1)-subset not in L_{k-1}, likewise does not delete any itemset contained in L_k.
(1) for all itemsets c ∈ C_k do
(2)     for all (k-1)-subsets s of c do
(3)         if (s ∉ L_{k-1}) then
(4)             delete c from C_k
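The join and prune steps can be sketched together as below. Itemsets are represented as sorted tuples so that two (k-1)-itemsets are joined only when they share their first k-2 items; this representation and the sample input are assumptions for illustration, and the sample is hypothetical data rather than a set taken from the tables.

```python
from itertools import combinations

def apriori_gen(prev_frequent):
    """Generate k-itemset candidates from a set of sorted (k-1)-tuples."""
    prev_set = set(prev_frequent)
    prev = sorted(prev_set)
    if not prev:
        return set()
    k = len(prev[0]) + 1

    # Join step: pair itemsets that agree on everything except the last item.
    joined = set()
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            if p[:-1] == q[:-1]:
                joined.add(p + (q[-1],))

    # Prune step: drop candidates having a (k-1)-subset that is not frequent.
    return {
        c for c in joined
        if all(sub in prev_set for sub in combinations(c, k - 1))
    }

# Hypothetical 3-dimensional frequent message sets (items in alphabetical order).
l3 = {
    ("ABS", "CHG", "COR"),
    ("ABS", "CHG", "DEP"),
    ("ABS", "COR", "DEP"),
    ("ABS", "COR", "FPL"),
    ("CHG", "COR", "DEP"),
}
c4 = apriori_gen(l3)
```

In this sample the join yields two candidates, and the prune step removes ("ABS", "COR", "DEP", "FPL") because its subset ("ABS", "DEP", "FPL") is not in the input.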
The subset function is described as follows:
The candidate itemsets C_k are stored in a hash tree. A node of the hash tree contains either a list of itemsets (a leaf node) or a hash table (an interior node). In an interior node, each bucket of the hash table points to another node. The root of the hash tree is defined to be at depth 1. An interior node at depth d points to nodes at depth d+1. Itemsets are stored in the leaves. When an itemset c is added, the tree is descended from the root until a leaf is reached. At an interior node at depth d, the branch to follow is chosen by applying a hash function to the d-th item of the itemset and following the pointer in the corresponding bucket. All nodes are initially created as leaf nodes. When the number of itemsets in a leaf node exceeds a specified threshold, the leaf node is converted to an interior node.
Starting from the root node, the subset function finds all candidates contained in a record r as follows. At a leaf, it finds which itemsets in the leaf are contained in r and adds references to them to the answer set. At an interior node that was reached by hashing item i, it hashes each item that comes after i in r and applies this procedure recursively to the node in the corresponding bucket. At the root node, it hashes every item in r.
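For small candidate sets the same result can be obtained without a hash tree. The sketch below is a deliberate simplification and not the hash-tree structure described above: candidates are kept in a flat Python set of sorted tuples, and every k-item combination of the record is looked up in it.

```python
from itertools import combinations

def subset(candidates, record):
    """Return the candidates (sorted k-tuples) that are contained in record."""
    if not candidates:
        return set()
    k = len(next(iter(candidates)))  # all candidates share the same dimension
    return {
        combo for combo in combinations(sorted(record), k)
        if combo in candidates
    }

# Hypothetical 2-dimensional candidates checked against record 1 of Table 1.
c2 = {("CHG", "DEP"), ("COR", "PLN"), ("ABS", "FPL")}
record_1 = {"PLN", "COR", "FPL", "CHG", "DEP"}
found = subset(c2, record_1)
```

Enumerating combinations grows quickly with record length, which is the cost the hash tree is meant to avoid by descending only into the buckets reachable from the items of the record.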
Access record    Message set
1                PLN, COR, FPL, CHG, DEP
2                FPL, CHG, DEP
3                PLN, COR, ABS, FPL
4                PLN, COR, ABS, CHG, DEP
5                PLN, COR, ABS, FPL, CHG, DEP
6                PLN, DEP
7                PLN, COR
8                PLN, COR, ABS, CHG, DEP
Table 1

Message    Frequency of occurrence
PLN        0.40
COR        0.10
ABS        0.30
FPL        0.20
CHG        0.01
DEP        0.20
Table 2

3-dimensional frequent message set    Frequency of occurrence
PLN, CHG, DEP                         0.10
FPL, CHG, DEP                         0.09
ABS, FPL, CHG                         0.08
PLN, COR, CHG                         0.20
PLN, FPL, DEP                         0.30
COR, ABS, FPL                         0.20
ABS, FPL, DEP                         0.05
Table 3
PLN: flight forecast message
COR: revised flight forecast message
ABS: message cancelling repetitive and non-repetitive flight forecasts
CHG: flight plan modification message
DEP: departure message

Claims (8)

1. A civil aviation message correlation analysis method based on access-information mining, characterized in that it comprises the following steps:
1) obtaining the access records of civil aviation messages;
2) based on the access records of step 1), analyzing the message content and putting messages that carry consistent information into the same set;
3) setting a minimum support, which the occurrence frequency of a message must exceed;
4) based on the message sets of step 2) and the minimum support of step 3), counting the occurrence frequency of all messages, finding the messages whose frequency is greater than or equal to the minimum support, and producing the one-dimensional frequent message set;
5) based on the minimum support of step 3) and the one-dimensional frequent message set of step 4), iteratively processing the one-dimensional frequent message set until no frequent message set of a higher dimension can be produced;
6) based on step 4) and step 5), obtaining the frequent message sets of all dimensions and analyzing them to obtain the correlation of the messages.
2. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the access records of step 1) are obtained from database logs, web logs, and the like.
3. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the consistent information in step 2) is identical information such as flight number and airport, and messages with consistent content are placed in the same set to ensure the validity of the analysis result.
4. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the minimum support of step 3) ranges from 0.01 to 0.99, the actual value being determined by the user.
5. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the multi-dimensional frequent message set of step 5) is interpreted as follows: if a message set contains k associated messages, it is a k-dimensional frequent message set, and its frequency is the number of times those messages occur together in the records.
6. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the implementation of step 5) should satisfy the following conditions:
1) the occurrence frequency of every frequent message set obtained must be greater than the minimum support;
2) when (k+1)-dimensional frequent message sets are produced from the k-dimensional frequent message sets, all combinations within the k-dimensional frequent message sets must be considered.
7. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the first pass of the iterative procedure simply counts the occurrence frequency of every message set containing a single element to determine the large one-dimensional message itemsets, which is what step 4) does; pass k consists of two phases: first, the large message itemsets generated in pass (k-1) are used to generate the candidate message itemsets, and then the database is scanned to compute the support of each candidate itemset; if the support of a candidate itemset is greater than the minimum support set in step 3), that message itemset is placed in the k-dimensional frequent message set.
8. The civil aviation message correlation analysis method based on access-information mining according to claim 1, characterized in that the message correlation of step 6) is derived from the frequent message sets, and the correlation coefficient is the frequency of the frequent message set.
CN201210406334XA 2012-10-22 2012-10-22 Analytic method for relativity of civil aviation messages based on interview information digging Pending CN103020109A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210406334XA CN103020109A (en) 2012-10-22 2012-10-22 Analytic method for relativity of civil aviation messages based on interview information digging

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210406334XA CN103020109A (en) 2012-10-22 2012-10-22 Analytic method for relativity of civil aviation messages based on interview information digging

Publications (1)

Publication Number Publication Date
CN103020109A true CN103020109A (en) 2013-04-03

Family

ID=47968713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210406334XA Pending CN103020109A (en) 2012-10-22 2012-10-22 Analytic method for relativity of civil aviation messages based on interview information digging

Country Status (1)

Country Link
CN (1) CN103020109A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514267A (en) * 2013-09-04 2014-01-15 快传(上海)广告有限公司 Gateway correlation information obtaining method and system
CN113806204A (en) * 2020-06-11 2021-12-17 北京威努特技术有限公司 Method, device, system and storage medium for evaluating message field correlation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102136949A (en) * 2011-03-24 2011-07-27 国网电力科学研究院 Method and system for analyzing alarm correlation based on network and time
CN102185742A (en) * 2011-06-16 2011-09-14 北京亿赞普网络技术有限公司 Communication-network-message-based Internet advertising effect monitoring method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭涛等: "基于关联规则数据挖掘Apriori算法的研究与应用", 《计算机技术与发展》, vol. 21, no. 6, 30 June 2011 (2011-06-30) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514267A (en) * 2013-09-04 2014-01-15 快传(上海)广告有限公司 Gateway correlation information obtaining method and system
CN113806204A (en) * 2020-06-11 2021-12-17 北京威努特技术有限公司 Method, device, system and storage medium for evaluating message field correlation
CN113806204B (en) * 2020-06-11 2023-07-25 北京威努特技术有限公司 Method, device, system and storage medium for evaluating message segment correlation

Similar Documents

Publication Publication Date Title
Leung et al. A data science solution for mining interesting patterns from uncertain big data
Wang et al. Review on community detection algorithms in social networks
CN103793489B (en) Method for discovering topics of communities in on-line social network
CN103914493A (en) Method and system for discovering and analyzing microblog user group structure
CN103678671A (en) Dynamic community detection method in social network
CN104699851A (en) Service tag extension method in big data environment
Chao et al. Efficient trajectory contact query processing
CN104298669A (en) Person geographic information mining model based on social network
CN102799616A (en) Outlier point detection method in large-scale social network
CN104317794A (en) Chinese feature word association pattern mining method based on dynamic project weight and system thereof
Khodaei et al. Temporal-textual retrieval: Time and keyword search in web documents
Orakzai et al. Distributed convoy pattern mining
Wang et al. Group pattern mining on moving objects’ uncertain trajectories
Hao et al. Research on parallel association rule mining of big data based on an improved K-means clustering algorithm
CN108173876B (en) Dynamic rule base construction method based on maximum frequent pattern
CN103020109A (en) Analytic method for relativity of civil aviation messages based on interview information digging
Yu et al. BIDE-based parallel mining of frequent closed sequences with MapReduce
CN103927373A (en) Method for building dynamic big data model efficiently based on incremental association rule technology
Wang et al. A new method for discovering behavior patterns among animal movements
Fu et al. ICA: an incremental clustering algorithm based on OPTICS
Hu et al. An incremental rare association rule mining approach with a life cycle tree structure considering time-sensitive data
Colosi et al. Time series data management optimized for smart city policy decision
CN104572648B (en) A kind of storage statistical system and method based on high-performance calculation
Pola et al. Similarity sets: A new concept of sets to seamlessly handle similarity in database management systems
Dong et al. An innovative model to mine asynchronous periodic pattern of moving objects

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130403