CN105447137A - Algorithm for retrieving same master-slave relation data from big data based on relational database - Google Patents

Algorithm for retrieving same master-slave relation data from big data based on relational database

Info

Publication number
CN105447137A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510810811.2A
Other languages
Chinese (zh)
Inventor
马亚飞
刘天智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-11-23
Filing date
2015-11-23
Publication date
2016-03-30
Application filed by Inspur Software Co Ltd
Priority to CN201510810811.2A
Publication of CN105447137A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/244 Grouping and aggregation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • Mathematical Physics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an algorithm, based on a relational database, for retrieving identical master-slave relation data from big data. It is an algorithm for comparing data within massive data: it follows the principle of "turn the large into the small, area first and then point" and uses group traversal, intermediate table storage, and similar techniques to narrow the comparison range step by step, so that identical records are retrieved efficiently. This method of quickly retrieving identical records from massive master-slave structured enterprise data suits the various enterprise management and control situations that require retrieving identical master-slave structured data. It strengthens the enterprise's management and control capability, creates a better market environment for the enterprise, and improves the enterprise's competitiveness.

Description

An algorithm for retrieving identical master-slave relation data from big data based on a relational database
Technical field
The present invention relates to relational databases, and in particular to an algorithm for retrieving identical master-slave relation data from big data based on a relational database.
Background
In the big data era, using data to drive development, and thereby improving business decision-making and the quality of public services, has become a trend among enterprises. The data types involved in analyzing massive data include structured, unstructured, and semi-structured data, and structured data in turn includes simple-structure data and complex-type data. Simple structured data, such as character and numeric data, can be analyzed statistically with database SQL directly; for example, a GROUP BY statement can group the queried rows so that identical data is found. The data can also be compared in a loop in program code to find fully identical data. With massive data, the computing performance of this kind of simple-type comparison can be improved significantly by optimizing the database and the algorithm. For the comparison and analysis of master-slave relation data, however, an efficient and convenient retrieval method is lacking.
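The GROUP BY approach for simple data mentioned above can be sketched as follows. This is only an illustration: the in-memory SQLite database, the table `customer`, and its columns `name` and `phone` are hypothetical and do not come from the patent.

```python
import sqlite3

def find_duplicate_values(rows):
    """Return (name, phone, count) for every value pair that occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (name TEXT, phone TEXT)")  # hypothetical table
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    # GROUP BY collapses identical (name, phone) pairs; HAVING keeps only the repeats.
    result = conn.execute(
        "SELECT name, phone, COUNT(*) AS cnt FROM customer "
        "GROUP BY name, phone HAVING COUNT(*) > 1 ORDER BY name, phone"
    ).fetchall()
    conn.close()
    return result

duplicates = find_duplicate_values(
    [("Li", "111"), ("Wang", "222"), ("Li", "111"), ("Li", "333")]
)
print(duplicates)
```

A single GROUP BY is enough here because each record is one flat row. A master-slave record spans a master row plus a variable number of slave rows, which is why the same query does not carry over directly.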
Summary of the invention
The technical task of the present invention is to address the deficiencies of the prior art by providing an algorithm for retrieving identical master-slave relation data from big data based on a relational database. For massive master-slave structured data in enterprise data, it provides a method for quickly retrieving identical records, and thus provides data support for enterprise management and control analysis.
The technical solution adopted by the present invention to solve its technical problem is:
An algorithm for retrieving identical master-slave relation data from big data based on a relational database. It is an algorithm for comparing data within massive data. It follows the principle of "turn the large into the small, area first and then point" and uses group traversal, intermediate table storage, and similar techniques to narrow the comparison range step by step, so that identical records are retrieved efficiently.
The algorithm proceeds in three stages: extract the grouping bases from the master and slave tables, determine the grouping order, and execute the grouping. While the grouping executes, traversal and intermediate table storage are combined to narrow the data range step by step.
Compared with the prior art, the algorithm of the present invention for retrieving identical master-slave relation data from big data based on a relational database has the following beneficial effects. The method targets massive master-slave structured data in enterprise data and quickly retrieves identical records, and it suits the various enterprise management and control situations that require retrieving identical master-slave structured data. The retrieval of identical order data can be applied to the management of enterprise falsification. Falsification disrupts the market order for an enterprise's products, causes infighting in the market and price confusion, and seriously harms the manufacturer's reputation. Management and control analysis of falsification is carried out through the analysis of orders: identical orders are found among massive orders, and the identical orders found are then examined to judge whether falsification has been caused by malicious manual order brushing, false orders, internal staff colluding to resell goods, and the like. Ultimately, the enterprise's management and control capability is strengthened, a better market environment is built for the enterprise, and the enterprise's competitiveness is improved.
Brief description of the drawings
Fig. 1 is a diagram of the steps of the algorithm.
Fig. 2 is the entity-relationship diagram of the master-slave relation data example, order data.
Fig. 3 is a diagram of the algorithm steps for retrieving identical orders in the example.
Detailed description
The algorithm of the present invention for retrieving identical master-slave relation data from big data based on a relational database is described in detail below.
An algorithm for retrieving identical master-slave relation data from big data based on a relational database follows the principle of "turn the large into the small, area first and then point" and uses group traversal, intermediate table storage, and similar techniques to narrow the comparison range step by step, so that identical records are retrieved efficiently.
The algorithm proceeds in three stages: extract the grouping bases from the master and slave tables, determine the grouping order, and execute the grouping. While the grouping executes, traversal and intermediate table storage are combined to narrow the data range step by step.
1) The concrete steps are shown in Fig. 1:
For ease of explanation, order data, a kind of master-slave relation data common in enterprises, is used as the example. Suppose the master table is named CO_MAIN and the slave table is named CO_SUB. The E-R diagram is shown in Fig. 2:
Objective: find identical orders in massive order data, that is, orders whose commodities and commodity quantities are the same.
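Because Fig. 2 is not reproduced in this text, the sketch below assumes a minimal schema for the two tables. Only the table names CO_MAIN and CO_SUB and the column CO_ID appear in the source; the columns QTY_SUM, AMT_SUM, GOOD_ID, QTY, and AMT and both helper functions are hypothetical. It also shows the naive definition of "identical" that the algorithm tries to avoid evaluating for every pair of orders.

```python
import sqlite3

SCHEMA = """
CREATE TABLE CO_MAIN (
    CO_ID   TEXT PRIMARY KEY,   -- order identifier
    QTY_SUM INTEGER NOT NULL,   -- total quantity of the order (assumed column)
    AMT_SUM INTEGER NOT NULL    -- total amount of the order (assumed column)
);
CREATE TABLE CO_SUB (
    CO_ID   TEXT NOT NULL REFERENCES CO_MAIN(CO_ID),
    GOOD_ID TEXT NOT NULL,      -- commodity identifier (assumed column)
    QTY     INTEGER NOT NULL,   -- quantity of this commodity (assumed column)
    AMT     INTEGER NOT NULL,   -- amount for this commodity (assumed column)
    PRIMARY KEY (CO_ID, GOOD_ID)
);
"""

def create_order_db(orders):
    """Build an in-memory database from {co_id: [(good_id, qty, amt), ...]}."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for co_id, lines in orders.items():
        qty_sum = sum(qty for _, qty, _ in lines)
        amt_sum = sum(amt for _, _, amt in lines)
        conn.execute("INSERT INTO CO_MAIN VALUES (?, ?, ?)", (co_id, qty_sum, amt_sum))
        conn.executemany(
            "INSERT INTO CO_SUB VALUES (?, ?, ?, ?)",
            [(co_id, good_id, qty, amt) for good_id, qty, amt in lines],
        )
    conn.commit()
    return conn

def order_lines(conn, co_id):
    """Return the (good_id, qty) pairs of one order as a set."""
    return set(conn.execute("SELECT GOOD_ID, QTY FROM CO_SUB WHERE CO_ID = ?", (co_id,)))

conn = create_order_db({
    "A": [("g1", 2, 200), ("g2", 1, 50)],
    "B": [("g2", 1, 50), ("g1", 2, 200)],
    "C": [("g1", 3, 250)],
})
print(order_lines(conn, "A") == order_lines(conn, "B"))  # same commodities and quantities
print(order_lines(conn, "A") == order_lines(conn, "C"))  # same totals, different lines
```

Orders A and C share the same master-table totals yet differ in their lines, so totals alone can only nominate candidates; the slave rows still have to be compared.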
The algorithm steps are shown in Fig. 3:
1: Determine the grouping indicators:
Master table indicators: total order quantity, total order amount.
Slave table indicators: number of commodity types in the order, order commodity quantity, order commodity amount.
Final grouping bases:
1) total order quantity + total order amount
2) total order quantity + total order amount + number of commodity types in the order
3) order commodity quantity + order commodity amount
2: Determine the grouping execution order:
1) total order quantity + total order amount
2) total order quantity + total order amount + number of commodity types in the order
3) order commodity quantity + order commodity amount
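The effect of executing the groupings in this order can be sketched in plain Python. This is a simplified in-memory illustration, not the database procedure of the patent: the record layout and the functions `narrow` and `staged_groups` are hypothetical. Each stage splits the surviving groups by a finer key and discards every group that shrinks to a single order, so the expensive line-level comparison only ever runs on a small remainder.

```python
from collections import defaultdict

def narrow(groups, key):
    """Split each group by key and drop groups left with a single record."""
    result = []
    for group in groups:
        buckets = defaultdict(list)
        for record in group:
            buckets[key(record)].append(record)
        result.extend(bucket for bucket in buckets.values() if len(bucket) > 1)
    return result

def staged_groups(records, keys):
    """Apply the grouping keys in order, from the coarsest to the finest."""
    groups = [list(records)]
    for key in keys:
        groups = narrow(groups, key)
    return groups

# Each order: id plus lines of (commodity id, quantity, amount); hypothetical data.
orders = [
    {"id": "A", "lines": [("g1", 2, 200), ("g2", 1, 50)]},
    {"id": "B", "lines": [("g1", 2, 200), ("g2", 1, 50)]},
    {"id": "C", "lines": [("g1", 3, 250)]},
    {"id": "D", "lines": [("g1", 1, 50), ("g2", 2, 200)]},
    {"id": "E", "lines": [("g3", 5, 500)]},
]
keys = [
    # 1) total order quantity + total order amount
    lambda o: (sum(q for _, q, _ in o["lines"]), sum(a for _, _, a in o["lines"])),
    # 2) within those groups, number of commodity types
    lambda o: len(o["lines"]),
    # 3) per-commodity quantity + amount
    lambda o: frozenset(o["lines"]),
]
same = [[o["id"] for o in group] for group in staged_groups(orders, keys)]
print(same)
```

In this data, stage 1 removes E, stage 2 removes C, and stage 3 removes D, leaving A and B as the only identical pair.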
3: Execute the grouping comparison step by step in the grouping order
A: Grouping by total order quantity + total order amount; grouping by total order quantity + total order amount + number of commodity types in the order
Two-level nested GROUP BY grouping is used to find the orders whose total order amount, total order quantity, and number of commodity types are identical; these orders are stored in maysamelist.
Here CO_COUNT is the number of orders in a group, and CO_COUNT_NUM1 is the ordinal position of an order within its group.
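The SQL statement for step A is not reproduced in this text, so the following is one possible reconstruction rather than the patent's own query. The columns QTY_SUM, AMT_SUM, and GOOD_ID, the function names, and the choice to number the rows in Python are all assumptions. The inner GROUP BY computes the number of commodity types per order (GOOD_COUNT); the outer GROUP BY keeps only the combinations of totals and type count that are shared by more than one order.

```python
import sqlite3

CANDIDATE_SQL = """
WITH per_order AS (          -- inner GROUP BY: one row per order
    SELECT m.CO_ID, m.QTY_SUM, m.AMT_SUM, COUNT(*) AS GOOD_COUNT
    FROM CO_MAIN m JOIN CO_SUB s ON s.CO_ID = m.CO_ID
    GROUP BY m.CO_ID, m.QTY_SUM, m.AMT_SUM
),
grp AS (                     -- outer GROUP BY: one row per candidate group
    SELECT QTY_SUM, AMT_SUM, GOOD_COUNT, COUNT(*) AS CO_COUNT
    FROM per_order
    GROUP BY QTY_SUM, AMT_SUM, GOOD_COUNT
    HAVING COUNT(*) > 1
)
SELECT p.CO_ID, p.QTY_SUM, p.AMT_SUM, p.GOOD_COUNT, g.CO_COUNT
FROM per_order p
JOIN grp g ON g.QTY_SUM = p.QTY_SUM AND g.AMT_SUM = p.AMT_SUM
          AND g.GOOD_COUNT = p.GOOD_COUNT
ORDER BY p.QTY_SUM, p.AMT_SUM, p.GOOD_COUNT, p.CO_ID
"""

def build_maysamelist(conn):
    """Return candidate orders with CO_COUNT and CO_COUNT_NUM1 filled in."""
    rows, prev_key, num = [], None, 0
    for co_id, qty_sum, amt_sum, good_count, co_count in conn.execute(CANDIDATE_SQL):
        key = (qty_sum, amt_sum, good_count)
        num = num + 1 if key == prev_key else 1   # numbering restarts at 1 per group
        prev_key = key
        rows.append({"CO_ID": co_id, "GOOD_COUNT": good_count,
                     "CO_COUNT": co_count, "CO_COUNT_NUM1": num})
    return rows

def demo_connection():
    """Small hypothetical data set: A and B are candidates, C and D are not."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CO_MAIN (CO_ID TEXT PRIMARY KEY, QTY_SUM INTEGER, AMT_SUM INTEGER);
        CREATE TABLE CO_SUB (CO_ID TEXT, GOOD_ID TEXT, QTY INTEGER, AMT INTEGER);
        INSERT INTO CO_MAIN VALUES ('A', 3, 250), ('B', 3, 250), ('C', 3, 250), ('D', 5, 500);
        INSERT INTO CO_SUB VALUES
            ('A', 'g1', 2, 200), ('A', 'g2', 1, 50),
            ('B', 'g1', 2, 200), ('B', 'g2', 1, 50),
            ('C', 'g1', 3, 250),
            ('D', 'g3', 5, 500);
    """)
    return conn

maysamelist = build_maysamelist(demo_connection())
for row in maysamelist:
    print(row)
```

Order C has the same totals as A and B but only one commodity type, so the second grouping level already excludes it before any line-by-line comparison takes place.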
B: Grouping by order commodity quantity + order commodity amount
Loop over maycolist and examine each subgroup submaycolist: by calling the common method that judges whether two orders are identical, determine whether the subgroup contains identical orders, and store identical orders in SAME_CO_MAIN and SAME_CO_SUB. The specific algorithm:
for (maycolist: cut out one subgroup, i.e. one group of possibly identical orders, according to CO_COUNT_NUM1, stopping when the value 1 is reached)
{
1: build submaycolist, a list whose entries hold co_id = CO_ID and goodcount = GOOD_COUNT
2: pass submaycolist and goodcount to the method that judges whether a group of orders is identical; internally it recursively calls the method that judges whether two orders are identical
3: for (submaycolist) {
3.1 call the method that judges whether two orders are identical:
twocossame(coid1, coid2, goodcount)
3.2 if the result is T, check whether the two orders are already in SAME_CO_SUB:
1) neither coid1 nor coid2 is present: store both in SAME_CO_MAIN and SAME_CO_SUB;
2) one of them is present: store the other in SAME_CO_SUB;
3) both are present: do nothing
}
}
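The loop above can be sketched in Python as follows. Several points are assumptions rather than statements of the patent: SAME_CO_MAIN is modeled as a list with one representative order per set of identical orders, SAME_CO_SUB as a mapping from order id to its set, the comparison method is passed in as a parameter, and the pairs inside a subgroup are enumerated with `itertools.combinations`.

```python
from itertools import combinations

def split_subgroups(maycolist):
    """Cut the flat candidate list into subgroups; CO_COUNT_NUM1 == 1 starts a new one."""
    subgroups = []
    for row in maycolist:
        if row["CO_COUNT_NUM1"] == 1 or not subgroups:
            subgroups.append([])
        subgroups[-1].append(row)
    return subgroups

def collect_same_orders(maycolist, twocossame):
    """Return (same_co_main, same_co_sub) following cases 1) to 3) of step 3.2."""
    same_co_main = []   # assumed role: one representative per set of identical orders
    same_co_sub = {}    # assumed role: order id -> index of its set in same_co_main
    for submaycolist in split_subgroups(maycolist):
        goodcount = submaycolist[0]["GOOD_COUNT"]
        ids = [row["CO_ID"] for row in submaycolist]
        for coid1, coid2 in combinations(ids, 2):
            if not twocossame(coid1, coid2, goodcount):
                continue
            in1, in2 = coid1 in same_co_sub, coid2 in same_co_sub
            if not in1 and not in2:
                # case 1: neither stored yet, open a new set and store both
                same_co_main.append(coid1)
                same_co_sub[coid1] = same_co_sub[coid2] = len(same_co_main) - 1
            elif in1 != in2:
                # case 2: exactly one stored, add the other to the same set
                known, new = (coid1, coid2) if in1 else (coid2, coid1)
                same_co_sub[new] = same_co_sub[known]
            # case 3: both stored already, nothing to do
    return same_co_main, same_co_sub

# Hypothetical demo data: a stub comparison based on in-memory line sets.
LINES = {"A": {("g1", 2)}, "B": {("g1", 2)}, "C": {("g1", 2)},
         "D": {("g2", 2)}, "E": {("g3", 1)}, "F": {("g4", 1)}}

def stub_twocossame(coid1, coid2, goodcount):
    return LINES[coid1] == LINES[coid2]

maycolist = [{"CO_ID": c, "GOOD_COUNT": 1, "CO_COUNT_NUM1": n}
             for c, n in [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 1), ("F", 2)]]
same_co_main, same_co_sub = collect_same_orders(maycolist, stub_twocossame)
print(same_co_main, same_co_sub)
```

Comparing every pair inside a subgroup is quadratic in the subgroup size, which is acceptable only because the earlier grouping stages keep the subgroups small.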
C: The method that judges whether two orders are identical: twocossame(coid1, coid2, goodcount)
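The body of twocossame is not given in this text, so the sketch below is one plausible implementation and not the patent's own. It assumes that CO_SUB has the hypothetical columns GOOD_ID, QTY, and AMT, that an order has at most one line per commodity, and that both orders are already known to contain goodcount commodity types, which the earlier grouping guarantees. The extra `conn` parameter is also an assumption; the source signature has only three arguments.

```python
import sqlite3

def twocossame(conn, coid1, coid2, goodcount):
    """Return True when both orders have the same commodities, quantities, and amounts."""
    (matched,) = conn.execute(
        "SELECT COUNT(*) FROM CO_SUB a JOIN CO_SUB b "
        "ON a.GOOD_ID = b.GOOD_ID AND a.QTY = b.QTY AND a.AMT = b.AMT "
        "WHERE a.CO_ID = ? AND b.CO_ID = ?",
        (coid1, coid2),
    ).fetchone()
    # Every one of the goodcount lines must find an equal line in the other order.
    return matched == goodcount

# Hypothetical demo data: A and B are identical; C has the same totals and the
# same number of commodity types as A, but different per-commodity values.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE CO_SUB (
        CO_ID TEXT, GOOD_ID TEXT, QTY INTEGER, AMT INTEGER,
        PRIMARY KEY (CO_ID, GOOD_ID)
    );
    INSERT INTO CO_SUB VALUES
        ('A', 'g1', 2, 200), ('A', 'g2', 1, 50),
        ('B', 'g1', 2, 200), ('B', 'g2', 1, 50),
        ('C', 'g1', 1, 50),  ('C', 'g2', 2, 200);
""")
print(twocossame(demo_conn, "A", "B", 2))
print(twocossame(demo_conn, "A", "C", 2))
```

Counting matched lines against goodcount avoids fetching both orders into the program, so the comparison stays inside the database.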
The retrieval of identical order data in this example can be applied to the management of enterprise falsification. Falsification disrupts the market order for an enterprise's products, causes infighting in the market and price confusion, and seriously harms the manufacturer's reputation. Management and control analysis of falsification is carried out through the analysis of orders: identical orders are found among massive orders, and the identical orders found are then examined to judge whether falsification has been caused by malicious manual order brushing, false orders, internal staff colluding to resell goods, and the like. Ultimately, the enterprise's management and control capability is strengthened, a better market environment is built for the enterprise, and the enterprise's competitiveness is improved.

Claims (2)

1. An algorithm for retrieving identical master-slave relation data from big data based on a relational database, being an algorithm for comparing data within massive data, characterized in that it follows the principle of "turn the large into the small, area first and then point" and uses group traversal, intermediate table storage, and similar techniques to narrow the comparison range step by step, so that identical records are retrieved efficiently.
2. The algorithm for retrieving identical master-slave relation data from big data based on a relational database according to claim 1, characterized in that it extracts the grouping bases from the master and slave tables, determines the grouping order, and executes the grouping, and, while the grouping executes, combines traversal with intermediate table storage to narrow the data range step by step.
CN201510810811.2A, priority date 2015-11-23, filing date 2015-11-23: Algorithm for retrieving same master-slave relation data from big data based on relational database. Status: Pending. Publication: CN105447137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510810811.2A CN105447137A (en) 2015-11-23 2015-11-23 Algorithm for retrieving same master-slave relation data from big data based on relational database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510810811.2A CN105447137A (en) 2015-11-23 2015-11-23 Algorithm for retrieving same master-slave relation data from big data based on relational database

Publications (1)

Publication Number Publication Date
CN105447137A true CN105447137A (en) 2016-03-30

Family

ID=55557314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510810811.2A Pending CN105447137A (en) 2015-11-23 2015-11-23 Algorithm for retrieving same master-slave relation data from big data based on relational database

Country Status (1)

Country Link
CN (1) CN105447137A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106779126A (en) * 2016-12-30 2017-05-31 中国民航信息网络股份有限公司 Malice accounts for the processing method and system of an order
CN107291908A (en) * 2017-06-26 2017-10-24 浪潮软件股份有限公司 Cross-database mass data comparison method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160330