CN105447137A

CN105447137A - Algorithm for retrieving identical master-slave relationship data from big data based on a relational database

Info

Publication number: CN105447137A
Application number: CN201510810811.2A
Authority: CN
Inventors: 马亚飞; 刘天智
Original assignee: Inspur Software Co Ltd
Current assignee: Inspur Software Co Ltd
Priority date: 2015-11-23
Filing date: 2015-11-23
Publication date: 2016-03-30

Abstract

The invention provides a relational-database-based algorithm for retrieving identical master-slave relationship data from big data. It is a data comparison algorithm for massive data sets that follows the principle of "reduce the large to the small; cover the surface first, then the points": using group traversal, intermediate-table storage, and similar techniques, it progressively narrows the range of data to be compared and thereby retrieves identical records efficiently. The method quickly retrieves identical records from the massive master-slave structured data held by an enterprise and suits the various enterprise management and control scenarios that require such retrieval. It strengthens the enterprise's management and control capability, helps create a better market environment for the enterprise, and improves its competitiveness.

Description

An algorithm for retrieving identical master-slave relationship data from big data based on a relational database

Technical field

The present invention relates to relational databases, and in particular to an algorithm for retrieving identical master-slave relationship data from big data based on a relational database.

Background

In the era of big data, using data to drive development, and thereby improving business decision-making and the quality of public services, has become a trend among enterprises. The data analyzed in massive data sets includes structured, unstructured, and semi-structured data, and structured data in turn includes simple structured data and complex structured data. Simple structured data, such as character or numeric data, can be analyzed statistically with database SQL directly; for example, a GROUP BY statement can group the rows in a query and so find identical data. Alternatively, a program can loop over the data and compare it item by item to find identical data. For massive data, comparisons of such simple types can be made significantly faster by optimizing the database and the algorithm. For the analysis and comparison of master-slave relationship data, however, an efficient and convenient retrieval method is lacking.
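The GROUP BY approach for simple data mentioned above can be illustrated with a minimal sketch. The table name `customer` and its columns are hypothetical and chosen only for illustration; an in-memory SQLite database stands in for the enterprise database.

```python
import sqlite3

# Hypothetical table of simple scalar records; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO customer (name, phone) VALUES (?, ?)",
    [("Li", "111"), ("Wang", "222"), ("Li", "111"), ("Zhao", "333")],
)

# Group on the compared columns; a group with more than one row holds identical data.
duplicates = conn.execute(
    """
    SELECT name, phone, COUNT(*) AS cnt
    FROM customer
    GROUP BY name, phone
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicates)
```

This works because each record is a single row. A master-slave record spans one master row and several slave rows, so a single GROUP BY over one table cannot decide whether two such records are identical.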

Summary of the invention

The technical task of the present invention is to address the deficiencies of the prior art by providing an algorithm for retrieving identical master-slave relationship data from big data based on a relational database. For the massive master-slave structured data in enterprise data, it provides a method for quickly retrieving identical records, and thus provides data support for the enterprise's management and control analysis.

The technical solution adopted by the present invention to solve its technical problem is as follows:

An algorithm for retrieving identical master-slave relationship data from big data based on a relational database. It is an algorithm for comparing data within massive data sets. Following the principle of "reduce the large to the small; cover the surface first, then the points", it uses group traversal, intermediate-table storage, and similar techniques to progressively narrow the comparison range and retrieve identical records efficiently.

The algorithm proceeds in three stages: extract the grouping criteria from the master and slave tables, determine the grouping order, and execute the grouping. While the grouping is executed, traversal is combined with intermediate-table storage to progressively narrow the data range.

Compared with the prior art, the algorithm of the present invention has the following beneficial effects. It targets the massive master-slave structured data in enterprise data, and its method of quickly retrieving identical records suits the various enterprise management and control scenarios that require retrieving identical master-slave structured data. Retrieval of identical order data, for example, can be applied to managing product diversion (unauthorized cross-region selling). Product diversion disrupts the market order for an enterprise's products, sets sales channels against each other, throws prices into disorder, and seriously damages the manufacturer's reputation. Management and control analysis of product diversion relies on analyzing orders: identical orders are first found among the massive order data, and the identical orders are then examined to judge whether diversion has been caused by malicious order brushing, fake orders, collusion by internal staff to resell goods, or similar behavior. Ultimately this strengthens the enterprise's management and control capability, helps build a better market environment for the enterprise, and improves its competitiveness.

Brief description of the drawings

Fig. 1 is a flowchart of the steps of the algorithm.

Fig. 2 is the entity-relationship diagram of the example master-slave relationship data (order data).

Fig. 3 is a flowchart of the steps for retrieving identical orders in the example.

Detailed description of the embodiments

The algorithm of the present invention for retrieving identical master-slave relationship data from big data based on a relational database is described in detail below.

The algorithm follows the principle of "reduce the large to the small; cover the surface first, then the points". It uses group traversal, intermediate-table storage, and similar techniques to progressively narrow the comparison range and retrieve identical records efficiently.

1) The concrete steps are shown in Fig. 1.

For ease of explanation, order data, a kind of master-slave relationship data common in enterprises, is used as the example. Assume the master table is named CO_MAIN and the slave table is named CO_SUB. Their E-R diagram is shown in Fig. 2.

Objective: find identical orders in massive order data, that is, orders whose goods and goods quantities are the same.
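To make the objective concrete, the following sketch lays out one possible shape for the two tables and shows what "identical orders" means on sample data. Only the table names CO_MAIN and CO_SUB and the column CO_ID come from the text; the columns QTY_SUM, AMT_SUM, GOODS_ID, QTY, and AMT are assumptions made for illustration, since the E-R diagram itself is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Master table: one row per order (QTY_SUM and AMT_SUM are assumed names).
    CREATE TABLE CO_MAIN (
        CO_ID   TEXT PRIMARY KEY,
        QTY_SUM INTEGER NOT NULL,   -- total order quantity
        AMT_SUM INTEGER NOT NULL    -- total order amount
    );
    -- Slave table: one row per goods line (GOODS_ID, QTY, AMT are assumed names).
    CREATE TABLE CO_SUB (
        CO_ID    TEXT NOT NULL REFERENCES CO_MAIN (CO_ID),
        GOODS_ID TEXT NOT NULL,
        QTY      INTEGER NOT NULL,
        AMT      INTEGER NOT NULL,
        PRIMARY KEY (CO_ID, GOODS_ID)
    );
    INSERT INTO CO_MAIN VALUES ('A', 5, 50), ('B', 5, 50), ('C', 5, 50);
    INSERT INTO CO_SUB VALUES
        ('A', 'g1', 2, 20), ('A', 'g2', 3, 30),
        ('B', 'g1', 2, 20), ('B', 'g2', 3, 30),
        ('C', 'g1', 3, 30), ('C', 'g3', 2, 20);
    """
)

def order_lines(co_id):
    """Return an order's goods lines as a sorted list of (GOODS_ID, QTY)."""
    return conn.execute(
        "SELECT GOODS_ID, QTY FROM CO_SUB WHERE CO_ID = ? ORDER BY GOODS_ID",
        (co_id,),
    ).fetchall()

# A and B contain the same goods in the same quantities, so they are identical.
print(order_lines("A") == order_lines("B"))
# C has the same totals as A in CO_MAIN but different goods lines.
print(order_lines("A") == order_lines("C"))
```

Order C shows why the master table alone is not enough: its totals match A and B, yet its goods lines differ, so the slave rows must also be compared.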

The algorithm steps are shown in Fig. 3.

1: Determine the grouping indicators:

Master table indicators: total order quantity, total order amount.

Slave table indicators: number of goods kinds in the order, quantity of each ordered good, amount of each ordered good.

Final grouping criteria:

1) total order quantity + total order amount

2) total order quantity + total order amount + number of goods kinds in the order

3) quantity of each ordered good + amount of each ordered good

2: Determine the order in which the groupings are executed:

1) total order quantity + total order amount

3) quantity of each ordered good + amount of each ordered good

3: Perform the grouping comparison step by step, following the grouping order

A: Grouping by total order quantity + total order amount, then by total order quantity + total order amount + number of goods kinds in the order

Use a two-level nested GROUP BY to find the orders whose total order amount, total order quantity, and number of goods kinds are all identical, and store them in maysamelist.

Here CO_COUNT is the number of orders in a group, and CO_COUNT_NUM1 is the sequence number of an order within its group.
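Step A can be sketched as follows. The source does not show its SQL, so this is one possible formulation under stated assumptions: the columns QTY_SUM, AMT_SUM, GOODS_ID, QTY, and AMT are hypothetical, the inner GROUP BY counts goods kinds per order, the outer GROUP BY keeps only keys shared by two or more orders, and CO_COUNT_NUM1 is assigned in Python as the 1-based position of an order inside its group.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CO_MAIN (CO_ID TEXT PRIMARY KEY, QTY_SUM INTEGER, AMT_SUM INTEGER);
    CREATE TABLE CO_SUB (CO_ID TEXT, GOODS_ID TEXT, QTY INTEGER, AMT INTEGER);
    INSERT INTO CO_MAIN VALUES
        ('A', 5, 50), ('B', 5, 50), ('C', 5, 50), ('D', 5, 50), ('E', 9, 90);
    INSERT INTO CO_SUB VALUES
        ('A', 'g1', 2, 20), ('A', 'g2', 3, 30),
        ('B', 'g1', 2, 20), ('B', 'g2', 3, 30),
        ('C', 'g1', 3, 30), ('C', 'g3', 2, 20),
        ('D', 'g1', 5, 50),
        ('E', 'g1', 9, 90);
    """
)

CANDIDATE_SQL = """
WITH per_order AS (          -- inner GROUP BY: number of goods kinds per order
    SELECT m.CO_ID, m.QTY_SUM, m.AMT_SUM, COUNT(s.GOODS_ID) AS GOOD_COUNT
    FROM CO_MAIN m JOIN CO_SUB s ON s.CO_ID = m.CO_ID
    GROUP BY m.CO_ID, m.QTY_SUM, m.AMT_SUM
),
candidate_groups AS (        -- outer GROUP BY: keys shared by two or more orders
    SELECT QTY_SUM, AMT_SUM, GOOD_COUNT, COUNT(*) AS CO_COUNT
    FROM per_order
    GROUP BY QTY_SUM, AMT_SUM, GOOD_COUNT
    HAVING COUNT(*) > 1
)
SELECT p.CO_ID, p.GOOD_COUNT, g.CO_COUNT, p.QTY_SUM, p.AMT_SUM
FROM per_order p
JOIN candidate_groups g
  ON p.QTY_SUM = g.QTY_SUM AND p.AMT_SUM = g.AMT_SUM AND p.GOOD_COUNT = g.GOOD_COUNT
ORDER BY p.QTY_SUM, p.AMT_SUM, p.GOOD_COUNT, p.CO_ID
"""

rows = conn.execute(CANDIDATE_SQL).fetchall()

# Number the orders inside each group; a value of 1 marks the start of a group.
maysamelist = []
for _, group in groupby(rows, key=lambda r: (r[3], r[4], r[1])):
    for num, (co_id, good_count, co_count, _, _) in enumerate(group, start=1):
        maysamelist.append(
            {"CO_ID": co_id, "GOOD_COUNT": good_count,
             "CO_COUNT": co_count, "CO_COUNT_NUM1": num}
        )

for row in maysamelist:
    print(row)
```

On this sample data, orders D and E drop out at once because no other order shares their grouping key, which is the "reduce the large to the small" effect: only A, B, and C remain for the more expensive line-by-line comparison.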

B: Grouping by ordered goods quantity + ordered goods amount

Loop over maycolist (the candidate list produced in step A) and examine each subgroup submaycolist. For each subgroup, call the shared method that judges whether two orders are identical to determine whether the subgroup contains identical orders, and store the identical orders in SAME_CO_MAIN and SAME_CO_SUB. The algorithm in detail:

for (maycolist, cut out one subgroup, i.e. one group of possibly identical orders, according to CO_COUNT_NUM1; a subgroup ends when the value 1 is encountered again)

{

1: collect into submaycolist (a list) the co_id = CO_ID of each order in the subgroup, and take goodcount = GOOD_COUNT

2: pass submaycolist and goodcount to the method that judges whether a group of orders contains identical orders; internally it repeatedly calls the method that judges whether two orders are identical

3: for (each pair of orders coid1, coid2 in submaycolist) {

3.1 call the method that judges whether two orders are identical:

twocossame(coid1, coid2, goodcount)

3.2 if the result is T (true), check whether the two orders are already in SAME_CO_SUB:

1) neither coid1 nor coid2 is present: store them in SAME_CO_MAIN and SAME_CO_SUB;

2) one of them is present: store the other in SAME_CO_SUB;

3) both are present: do nothing.

}

}
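The loop above can be sketched in Python as follows. This is an interpretation, not the patent's implementation: the intermediate tables SAME_CO_MAIN and SAME_CO_SUB are modeled as an in-memory list and dictionary, the helper names `split_subgroups` and `collect_same_orders` are hypothetical, and the pairwise predicate is passed in as a parameter so that the block stays self-contained.

```python
from itertools import combinations

def split_subgroups(maycolist):
    """Cut the flat candidate list into subgroups.

    A row whose CO_COUNT_NUM1 is 1 starts a new subgroup.
    """
    subgroups = []
    for row in maycolist:
        if row["CO_COUNT_NUM1"] == 1 or not subgroups:
            subgroups.append([])
        subgroups[-1].append(row)
    return subgroups

def collect_same_orders(maycolist, twocossame):
    """Return (same_co_main, same_co_sub) for all identical orders found.

    same_co_main: one representative CO_ID per set of identical orders
                  (stand-in for the SAME_CO_MAIN table).
    same_co_sub:  maps each identical order's CO_ID to the index of its set
                  (stand-in for the SAME_CO_SUB table).
    """
    same_co_main = []
    same_co_sub = {}
    for submaycolist in split_subgroups(maycolist):
        goodcount = submaycolist[0]["GOOD_COUNT"]
        co_ids = [row["CO_ID"] for row in submaycolist]
        for coid1, coid2 in combinations(co_ids, 2):
            if not twocossame(coid1, coid2, goodcount):
                continue
            in1, in2 = coid1 in same_co_sub, coid2 in same_co_sub
            if not in1 and not in2:
                # Case 1: neither is recorded yet, so open a new set.
                same_co_main.append(coid1)
                set_index = len(same_co_main) - 1
                same_co_sub[coid1] = set_index
                same_co_sub[coid2] = set_index
            elif in1 and not in2:
                # Case 2: one is recorded, so add the other to the same set.
                same_co_sub[coid2] = same_co_sub[coid1]
            elif in2 and not in1:
                same_co_sub[coid1] = same_co_sub[coid2]
            # Case 3: both are recorded, nothing to do.
    return same_co_main, same_co_sub

# Usage with a stub predicate that compares pre-loaded goods lines.
lines = {
    "A": {("g1", 2), ("g2", 3)}, "B": {("g1", 2), ("g2", 3)},
    "C": {("g1", 3), ("g3", 2)}, "X": {("g9", 1)}, "Y": {("g9", 1)},
}
maycolist = [
    {"CO_ID": "A", "GOOD_COUNT": 2, "CO_COUNT": 3, "CO_COUNT_NUM1": 1},
    {"CO_ID": "B", "GOOD_COUNT": 2, "CO_COUNT": 3, "CO_COUNT_NUM1": 2},
    {"CO_ID": "C", "GOOD_COUNT": 2, "CO_COUNT": 3, "CO_COUNT_NUM1": 3},
    {"CO_ID": "X", "GOOD_COUNT": 1, "CO_COUNT": 2, "CO_COUNT_NUM1": 1},
    {"CO_ID": "Y", "GOOD_COUNT": 1, "CO_COUNT": 2, "CO_COUNT_NUM1": 2},
]
same_co_main, same_co_sub = collect_same_orders(
    maycolist, lambda a, b, n: lines[a] == lines[b]
)
print(same_co_main)
print(same_co_sub)
```

Pairwise comparison is quadratic in the subgroup size, which is why the earlier grouping matters: it keeps each subgroup small before this step runs.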

C: Method for judging whether two orders are identical: twocossame(coid1, coid2, goodcount)
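The body of twocossame is not given in the text, so the following is only a sketch of how such a check might be written. It assumes hypothetical slave-table columns GOODS_ID, QTY, and AMT, assumes each goods kind appears at most once per order, and relies on step A having already ensured that both orders contain exactly goodcount goods kinds. The factory `make_twocossame` is also hypothetical; it binds a database connection so that the returned function keeps the three-argument signature used in the text.

```python
import sqlite3

def make_twocossame(conn):
    """Bind a connection and return twocossame(coid1, coid2, goodcount)."""
    def twocossame(coid1, coid2, goodcount):
        # Count the goods lines of order 1 that have an exact counterpart
        # (same goods, quantity, and amount) in order 2.
        (matched,) = conn.execute(
            """
            SELECT COUNT(*)
            FROM CO_SUB a
            JOIN CO_SUB b
              ON a.GOODS_ID = b.GOODS_ID AND a.QTY = b.QTY AND a.AMT = b.AMT
            WHERE a.CO_ID = ? AND b.CO_ID = ?
            """,
            (coid1, coid2),
        ).fetchone()
        # Both orders have goodcount lines, so a full match means identical orders.
        return matched == goodcount
    return twocossame

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CO_SUB (CO_ID TEXT, GOODS_ID TEXT, QTY INTEGER, AMT INTEGER);
    INSERT INTO CO_SUB VALUES
        ('A', 'g1', 2, 20), ('A', 'g2', 3, 30),
        ('B', 'g1', 2, 20), ('B', 'g2', 3, 30),
        ('C', 'g1', 3, 30), ('C', 'g3', 2, 20);
    """
)
twocossame = make_twocossame(conn)
print(twocossame("A", "B", 2))  # True: every line of A matches a line of B
print(twocossame("A", "C", 2))  # False: no line of A matches a line of C
```

Pushing the comparison into one join keeps the work inside the database, so only a single count travels back to the program for each pair of orders.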

The retrieval of identical order data in this example can be applied to managing product diversion (unauthorized cross-region selling). Product diversion disrupts the market order for an enterprise's products, sets sales channels against each other, throws prices into disorder, and seriously damages the manufacturer's reputation. Management and control analysis of product diversion relies on analyzing orders: identical orders are first found among the massive order data, and the identical orders are then examined to judge whether diversion has been caused by malicious order brushing, fake orders, collusion by internal staff to resell goods, or similar behavior. Ultimately this strengthens the enterprise's management and control capability, helps build a better market environment for the enterprise, and improves its competitiveness.

Claims

1. An algorithm for retrieving identical master-slave relationship data from big data based on a relational database, being an algorithm for comparing data within massive data sets, characterized in that it follows the principle of "reduce the large to the small; cover the surface first, then the points" and uses group traversal, intermediate-table storage, and similar techniques to progressively narrow the comparison range and retrieve identical records efficiently.

2. The algorithm for retrieving identical master-slave relationship data from big data based on a relational database according to claim 1, characterized in that it proceeds by extracting the grouping criteria from the master and slave tables, determining the grouping order, and executing the grouping, and in that, while the grouping is executed, traversal is combined with intermediate-table storage to progressively narrow the data range.