CN106503243B

CN106503243B - Electric power big data query method based on HBase secondary index

Info

Publication number: CN106503243B
Application number: CN201610980816.4A
Authority: CN
Inventors: 马艳; 苏建军; 张方正; 李红梅; 郭志红; 陈玉峰; 祝永新; 盛戈皞; 杨祎; 许乃媛; 沈宇蓝; 王畅; 刘斌; 孙占睿; 李程启; 林颖; 耿玉杰; 白德盟; 李华东; 王勇
Original assignee: Shanghai Jiaotong University; State Grid Corp of China SGCC; Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Current assignee: Shanghai Jiaotong University; State Grid Corp of China SGCC; Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Priority date: 2016-11-08
Filing date: 2016-11-08
Publication date: 2019-08-06
Anticipated expiration: 2036-11-08
Also published as: CN106503243A

Abstract

The invention discloses an electric power big data query method based on an HBase secondary index. The method comprises: step (1), establishing secondary index tables; step (2), determining whether a data table has been updated, and if so, updating the secondary index tables, otherwise leaving them unchanged; step (3), querying data using the secondary index tables. The invention supports basic update operations and, for each specific business requirement, efficiently performs join queries and selection queries between data tables, thereby supporting complex business requirements.

Description

Electric power big data query method based on HBase secondary index

Technical field

The present invention relates to an electric power big data query method based on an HBase secondary index.

Background art

The safety of power transmission and transformation equipment is the foundation of safe power grid operation. Data related to the state of this equipment are produced by processes such as inspection, testing, live-line detection, online monitoring, grid operation, environment and weather monitoring, and equipment ledgers. These data are scattered across different systems, are large in volume, and are complex in type. An effective distributed storage model for power transmission and transformation equipment big data is the basis for comprehensive and accurate evaluation of equipment state and an important support for comprehensive correlation analysis of power grid big data, and is therefore of great significance.

HBase, which runs on the Hadoop platform, is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, a large-scale storage cluster can be built on low-cost servers, meeting the storage requirements of power grid big data. However, HBase-based big data storage schemes do not fully solve the problem of efficient data retrieval. In particular, given the complex and flexible query requirements of electric power big data, a single row key cannot satisfy business query needs, so a big data retrieval method that meets these needs is urgently required.

[1] Power grid time-series big data storage method (CN 104239447 A) proposes a storage method for power grid time-series big data. It uses the open-source distributed columnar database HBase as the storage layer and, combined with the SG-CIM model of the power grid business, re-describes batches of measuring points that are positionally correlated in business logic. By designing a suitable index organization for the measuring-point data storage table and using HBase's region partitioning and load-balancing features, it stores the historical data of logically correlated measuring points physically adjacent to one another. Queries over the historical data of such a batch of measuring points therefore incur less disk seek time, which improves query efficiency and provides real-time query services for business applications.

[2] HBase secondary index method and device (CN 104112013 A) proposes building secondary index entries for user tables on HBase. The secondary index sorts the values corresponding to the user table's rowkeys, so that the user table can be searched by value. Each user table has one corresponding secondary index table, and a user table and its secondary index table are stored on the same region server, avoiding cross-region index lookups.

Patents [1] and [2] differ from the present invention. [1] proposes a secondary index organization for data that are correlated in business logic; its core idea is to store logically related data physically adjacent to one another to improve query efficiency. [2] proposes an HBase-based index generation scheme whose core idea is one index table per data table, with each data table and its index table stored on the same server to improve query efficiency. The secondary index scheme of the present invention is built on an HBase-based electric power big data storage model. Secondary index tables are first created on the relevant data columns according to the query business: one basic query business corresponds to one secondary index table, and one complex query business may correspond to several secondary index tables. During a query, the index table is consulted first to obtain the row keys of the matching data, and the data table is then queried by those row keys to obtain the data. When a relevant column of a data table is updated, the corresponding secondary index tables must be updated at the same time.

Summary of the invention

To solve the above problems, the present invention provides an electric power big data query method and system based on an HBase secondary index. The invention supports basic update operations and, for each specific business requirement, more efficiently performs join queries and selection queries between data tables, thereby supporting complex business requirements.

To achieve the above objective, the present invention adopts the following technical solution:

An electric power big data query method based on an HBase secondary index comprises the following steps:

Step (1): establish secondary index tables;

Step (2): determine whether a data table has been updated; if so, update the secondary index tables; if not, leave them unchanged;

Step (3): query data using the secondary index tables.

Establishing the secondary index tables in step (1) comprises the following steps:

Step (11): generate secondary index tables according to the operation type;

Step (12): generate secondary index entries from the data columns and insert them into the secondary index tables;

Step (11) comprises:

Step (111): for a selection query operation, the M data columns involved in the selection query are stored in M secondary index tables respectively, where M ≥ 1. The row key R of each secondary index table consists of three parts, in order: QUALIFIER, VALUE and ROWKEY, where QUALIFIER is the identifier of the data column in the data table, VALUE is the value of the data column in the data table, and ROWKEY is the row key of the data table;

Step (112): for a join query operation, the N data columns involved in the join query are stored in a single secondary index table, where N ≥ 2. The row key R of the secondary index table consists of three parts, in order: PREFIX, VALUE and QUALIFIER, where PREFIX is generated by a hash function and distinguishes the join query groups, VALUE is the value of the data column in the data table, and QUALIFIER is the identifier of the data column in the data table;

Step (113): for both step (111) and step (112), the cell value stored in the secondary index table is the ROWKEY of the corresponding data table; this cell value together with the row key R forms one entry of the secondary index table;
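The row key layouts of steps (111) to (113) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the NUL separator, the fixed-width order-preserving integer encoding of VALUE, the SHA-256-based PREFIX and all function names are hypothetical. The VALUE encoding matters because HBase orders row keys as raw bytes, so the range scans of step (3) only work if byte order matches value order.

```python
import hashlib
import struct

# Hypothetical field separator; assumes qualifiers never contain a NUL byte.
SEP = b"\x00"


def encode_value(v: int) -> bytes:
    """Fixed-width, order-preserving encoding of a signed 64-bit integer.

    Adding 2**63 maps the signed range onto 0..2**64-1, so big-endian
    unsigned bytes compare in the same order as the original numbers.
    """
    return struct.pack(">Q", v + (1 << 63))


def selection_key(qualifier: str, value: int, rowkey: str) -> bytes:
    """Row key R of a selection index entry: QUALIFIER | VALUE | ROWKEY.

    QUALIFIER has variable length, so it is terminated by SEP; VALUE is
    fixed-width, so ROWKEY can follow it directly.
    """
    return qualifier.encode() + SEP + encode_value(value) + rowkey.encode()


def join_prefix(group: str) -> bytes:
    """PREFIX of a join group: first 4 bytes of a hash of a hypothetical group id."""
    return hashlib.sha256(group.encode()).digest()[:4]


def join_key(group: str, value: int, qualifier: str) -> bytes:
    """Row key R of a join index entry: PREFIX | VALUE | QUALIFIER."""
    return join_prefix(group) + encode_value(value) + qualifier.encode()


def index_entry(row_key_r: bytes, data_rowkey: str) -> tuple[bytes, bytes]:
    """Step (113): the stored cell value is the ROWKEY of the data table."""
    return row_key_r, data_rowkey.encode()


values = [100, -3, 10, 9]
ordered = sorted(values, key=encode_value)
print(ordered)  # [-3, 9, 10, 100]
```

With plain decimal strings, "10" would sort before "9", and a scan for values below 10 would stop too early; a fixed-width encoding avoids this.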

Secondary index tables are created with HBase (the table name of each secondary index table is specified at creation), and the association between each data column and its corresponding secondary index tables is stored in a metadata table. The row key of the metadata table consists, in order, of:

the data table name, column family name, column name, operation type of the secondary index table, and timestamp.

The value corresponding to a metadata table row key is the operation type and the name of the secondary index table.

The operation types of secondary index tables are the selection query operation and the join query operation.
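A possible shape for the metadata table is sketched below in Python. An in-memory sorted list stands in for the HBase table; the `|` separator, the zero-padded timestamp, the operation-type labels SELECT and JOIN, and the class and method names are assumptions. Because the row key begins with data table, column family and column, all index tables registered for one column can be found with a single prefix scan.

```python
import bisect
from dataclasses import dataclass

SEP = "|"  # hypothetical separator; assumes names never contain "|"


@dataclass(frozen=True)
class IndexInfo:
    op_type: str      # "SELECT" or "JOIN" (hypothetical labels)
    index_table: str  # name of the secondary index table


class MetadataTable:
    """In-memory stand-in for the HBase metadata table, kept sorted by row key."""

    def __init__(self):
        self._keys = []    # sorted row keys
        self._values = {}  # row key -> IndexInfo

    @staticmethod
    def row_key(table, family, column, op_type, ts):
        # Order follows the text: table, family, column, operation type, timestamp.
        return SEP.join([table, family, column, op_type, f"{ts:020d}"])

    def register(self, table, family, column, op_type, ts, index_table):
        key = self.row_key(table, family, column, op_type, ts)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = IndexInfo(op_type, index_table)

    def lookup(self, table, family, column):
        """Prefix scan returning every index registered for one data column."""
        prefix = SEP.join([table, family, column]) + SEP
        i = bisect.bisect_left(self._keys, prefix)
        found = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            found.append(self._values[self._keys[i]])
            i += 1
        return found


meta = MetadataTable()
meta.register("T1", "cf", "a", "SELECT", 1, "idx_select_T1_a")
meta.register("T1", "cf", "a", "JOIN", 2, "idx_join_Z1")
meta.register("T1", "cf", "ab", "SELECT", 3, "idx_select_T1_ab")
print([info.index_table for info in meta.lookup("T1", "cf", "a")])
# ['idx_join_Z1', 'idx_select_T1_a']
```

The trailing separator in the lookup prefix keeps column `a` from also matching column `ab`.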

Step (12) comprises:

Step (121): for a selection query operation, scan each of the M data columns, generate secondary index entries in the entry format of step (113), and insert them into the corresponding secondary index tables.

Step (122): for a join query operation, scan each of the N data columns, generate secondary index entries in the entry format of step (113), and insert them all into the same secondary index table.
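Steps (121) and (122) can be sketched as a pass over the data columns that emits index entries and keeps them sorted, as HBase does for row keys. In this hypothetical Python sketch, row keys R are Python tuples whose ordering stands in for an order-preserving byte encoding; the nested-dict data layout, the group id "Z1" and the function names are assumptions.

```python
import hashlib


def build_selection_indexes(data, columns):
    """Step (121) sketch: one sorted index table per selected column.

    data:    {table: {rowkey: {column: value}}}
    columns: [(table, column), ...]
    Returns  {qualifier: [(R, ROWKEY), ...]} with R = (QUALIFIER, VALUE, ROWKEY).
    """
    indexes = {}
    for table, col in columns:
        qualifier = f"{table}.{col}"
        entries = [((qualifier, row[col], rk), rk)
                   for rk, row in data[table].items() if col in row]
        indexes[qualifier] = sorted(entries)  # HBase keeps rows sorted by key
    return indexes


def build_join_index(data, group, columns):
    """Step (122) sketch: every column of one join group goes into one index table.

    R = (PREFIX, VALUE, QUALIFIER); PREFIX is a hash of a hypothetical group id.
    """
    prefix = hashlib.sha256(group.encode()).hexdigest()[:8]
    entries = []
    for table, col in columns:
        qualifier = f"{table}.{col}"
        for rk, row in data[table].items():
            if col in row:
                entries.append(((prefix, row[col], qualifier), rk))
    return sorted(entries)


data = {
    "T1": {"r1": {"a": 5}, "r2": {"a": 7}},
    "T2": {"s1": {"b": 5}},
}
print(build_selection_indexes(data, [("T1", "a")])["T1.a"])
# [(('T1.a', 5, 'r1'), 'r1'), (('T1.a', 7, 'r2'), 'r2')]
for key, rowkey in build_join_index(data, "Z1", [("T1", "a"), ("T2", "b")]):
    print(key[1:], rowkey)
# (5, 'T1.a') r1
# (5, 'T2.b') s1
# (7, 'T1.a') r2
```

In the join index, entries with equal VALUE from different columns end up adjacent, which is what the scan of step (312) relies on.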

Updating the secondary index tables in step (2) comprises the following steps:

Step (21): update the data table: through the Put method interface provided by HBase on the Hadoop platform, submit the value, row key, column family and column identifier of the data column, completing the update of the data table;

Step (22): generate secondary index entries: for the data column currently being updated, query the metadata table to obtain the secondary index tables that must be updated and their operation types; select the corresponding secondary index table format according to the operation type, and use the updated data in the data table to generate entries in that format;

Step (23): update the secondary index tables: through the interface methods provided by HBase Coprocessor on the Hadoop platform, submit the value, row key, column family and column identifier of each secondary index entry generated in step (22), completing the update of the secondary index tables.

Step (22) comprises the following steps:

Step (221): if the operation type of the secondary index table is the selection query operation, use the updated data in the data table to generate an entry in the secondary index table format of step (111);

Step (222): if the operation type of the secondary index table is the join query operation, use the updated data in the data table to generate an entry in the secondary index table format of step (112).
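The update path of steps (21) to (23), including the branch of steps (221) and (222), might look like the following Python sketch. A dictionary-backed `Store` class stands in for HBase Put operations and coprocessor calls, and the `METADATA` mapping stands in for the metadata table; both, along with the tuple row keys and the hashed group prefix, are hypothetical.

```python
import hashlib
from collections import defaultdict


class Store:
    """In-memory stand-in for HBase tables: {table: {row_key: cell_value}}."""

    def __init__(self):
        self.tables = defaultdict(dict)

    def put(self, table, row_key, value):
        self.tables[table][row_key] = value


# Hypothetical metadata: (data table, column) -> [(operation type, index table, join group)]
METADATA = {
    ("T1", "a"): [("SELECT", "idx_T1_a", None), ("JOIN", "idx_Z1", "Z1")],
}


def update(store, table, rowkey, column, value):
    # Step (21): write the new cell to the data table.
    store.put(table, (rowkey, column), value)
    qualifier = f"{table}.{column}"
    # Step (22): look up the affected index tables and build entries in their format.
    for op_type, index_table, group in METADATA.get((table, column), []):
        if op_type == "SELECT":                       # step (221)
            key = (qualifier, value, rowkey)          # QUALIFIER | VALUE | ROWKEY
        else:                                         # step (222)
            prefix = hashlib.sha256(group.encode()).hexdigest()[:8]
            key = (prefix, value, qualifier)          # PREFIX | VALUE | QUALIFIER
        # Step (23): write the entry; its cell value is the data-table ROWKEY.
        store.put(index_table, key, rowkey)


store = Store()
update(store, "T1", "r1", "a", 5)
print(store.tables["idx_T1_a"])  # {('T1.a', 5, 'r1'): 'r1'}
```

Like the text, the sketch only inserts entries. If an existing value changes, the entry keyed by the old value stays in the index unless it is deleted separately, which the source does not describe.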

Querying data using the secondary index tables in step (3) comprises the following steps:

Step (31): scan the secondary index tables to obtain the row keys of the data to be queried;

Step (32): query the data table using the set of ROWKEYs of the data to be queried.

Step (31) comprises:

Step (311): query method for the secondary index tables of a selection query:

For each of the M data columns involved in the selection query business, query the metadata table according to the operation type to obtain the name of the corresponding secondary index table, then query that secondary index table as follows:

Given the row key format of step (111), the scan can seek directly to the first entry that satisfies the condition value of the selection query and continue until it reaches the first entry that does not; the qualifying entries scanned form the set of ROWKEYs that satisfy the query condition on the current data column.

If M = 1, this ROWKEY set is the set of ROWKEYs of the data to be queried;

If M > 1, the ROWKEY sets of the different columns are combined according to the logical relations among the M data columns in the query business: logical AND corresponds to set intersection and logical OR to set union. The result is the set of ROWKEYs of the data to be queried.
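The selection scan of step (311) and the combination of per-column results can be sketched with binary search over a sorted key list. This is an illustration under assumptions: it keeps all columns in one sorted list segmented by QUALIFIER (as in the Fig. 2 example), represents row keys as tuples, supports only half-open range conditions low <= v < high, and uses hypothetical function names and sample data.

```python
import bisect


def scan_column(index, qualifier, low=None, high=None):
    """Step (311) sketch: ROWKEYs whose value v in `qualifier` satisfies low <= v < high.

    `index` is a sorted list of row keys R = (QUALIFIER, VALUE, ROWKEY). The scan
    seeks to the first candidate and stops at the first entry that fails the condition.
    """
    start = (qualifier,) if low is None else (qualifier, low)
    i = bisect.bisect_left(index, start)
    rowkeys = set()
    while i < len(index):
        q, v, rk = index[i]
        if q != qualifier or (high is not None and v >= high):
            break  # first non-qualifying entry ends the scan
        rowkeys.add(rk)
        i += 1
    return rowkeys


def combine(sets, op):
    """Logical AND -> set intersection, logical OR -> set union."""
    return set.intersection(*sets) if op == "AND" else set.union(*sets)


index = sorted([
    ("T1.a", 3, "r1"), ("T1.a", 8, "r2"),
    ("T1.c", 1, "r2"), ("T1.c", 9, "r3"),
    ("T2.b", 4, "s1"),
])
s1 = scan_column(index, "T1.a", high=5)  # {'r1'}
s2 = scan_column(index, "T1.c", high=5)  # {'r2'}
s3 = scan_column(index, "T2.b", high=5)  # {'s1'}
print(sorted(combine([s1, s2, s3], "OR")))  # ['r1', 'r2', 's1']
```

The stop-at-first-miss rule is only valid for conditions that map to one contiguous key range, such as equality or range predicates; a condition such as v != x would need a full scan of the column's segment.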

Step (312): query method for the secondary index table of a join query:

For the N data columns involved in the join query business, query the metadata table according to the operation type to obtain the name of the corresponding secondary index table (all N columns share the same secondary index table), then query that table as follows:

Given the row key format of step (112), the entries of the N data columns that have the same value are arranged contiguously in the secondary index table;

If a contiguous run of index entries with the same column value contains N entries, the ROWKEYs of these N entries form one N-tuple <R1, R2, ..., RN> that satisfies the query condition;

Scanning the entire secondary index table yields the set {<R1, R2, ..., RN>} of all qualifying N-tuples, which is the set of ROWKEYs of the data to be queried.
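The contiguous-run test of step (312) can be sketched with `itertools.groupby` over a sorted join index. Row keys are tuples, the literal "Z1" stands in for the hashed PREFIX, and the function name and data are hypothetical. Because R contains no data-table ROWKEY, two rows of one column with the same value would share an index row key in this layout, so the sketch assumes values are unique within each joined column.

```python
from itertools import groupby


def join_scan(index, prefix, n):
    """Step (312) sketch: collect N-tuples of ROWKEYs for one join group.

    `index` is a sorted list of (R, ROWKEY) pairs with R = (PREFIX, VALUE, QUALIFIER).
    A run of consecutive entries sharing one VALUE yields a tuple only if it holds
    exactly N entries, i.e. every column of the group contains that value.
    """
    group = [(r, rk) for r, rk in index if r[0] == prefix]
    tuples = set()
    for _, run in groupby(group, key=lambda e: e[0][1]):  # consecutive equal VALUEs
        run = list(run)
        if len(run) == n:
            # Within a run, entries are ordered by QUALIFIER, so tuple positions are stable.
            tuples.add(tuple(rk for _, rk in run))
    return tuples


p = "Z1"  # stands in for the hashed PREFIX
index = sorted([
    ((p, 5, "T1.a"), "r1"), ((p, 5, "T2.b"), "s1"), ((p, 5, "T4.d"), "u1"),
    ((p, 7, "T1.a"), "r2"), ((p, 7, "T2.b"), "s2"),
    ((p, 9, "T4.d"), "u3"),
])
print(join_scan(index, p, 3))  # {('r1', 's1', 'u1')}
```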

Step (32) comprises:

Using the ROWKEY sets of the data to be queried obtained in steps (311) and (312), obtain the corresponding data values from the data table through the Get interface method provided by HBase.

An electric power big data query system based on an HBase secondary index comprises:

a secondary index table establishing module, configured to establish secondary index tables;

an update determination module, configured to determine whether a data table has been updated and, if so, update the secondary index tables, otherwise leave them unchanged;

a data query module, configured to query data using the secondary index tables.

Beneficial effects of the present invention:

This patent proposes a secondary index design scheme based on HBase. The scheme effectively supports join queries and selection queries, the most basic query operations of relational databases, and thus provides good support for complex query business on power grid big data. In addition, because secondary index tables are created per business, the scheme balances query performance against business flexibility.

The invention proposes a secondary index design based on the HBase database that implements the basic selection query and join query functions of relational databases, and can support the complex query requirements of power grid systems.

Selection query performance: for any table T1 and a query for records satisfying the condition <T1.a, a'>, the number of data records the invention must scan equals the number of records that satisfy the condition, which is generally smaller than the total number of records |T1| in the table and comparable to the number of records a traditional relational database scans when querying an indexed column.

Join query performance: for a join query on any two tables T1 and T2, a traditional relational database must scan |T1| × |T2| records, whereas the invention scans |T1| + |T2| records. Even taking into account the subsequent join operations between the resulting sets, the invention can substantially improve join query performance.
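The scan counts stated above can be written out as follows. The table sizes in the last line are hypothetical and only illustrate the arithmetic; like the text, the comparison counts scanned records and leaves out the cost of the subsequent set joins.

```latex
\begin{aligned}
\text{selection:}\quad & C_{\text{sel}} = \bigl\lvert \{\, t \in T_1 : t \text{ satisfies } \langle T_1.a,\, a' \rangle \,\} \bigr\rvert \le \lvert T_1 \rvert \\
\text{join, traditional:}\quad & C_{\text{nl}} = \lvert T_1 \rvert \cdot \lvert T_2 \rvert \\
\text{join, secondary index:}\quad & C_{\text{idx}} = \lvert T_1 \rvert + \lvert T_2 \rvert \\
\lvert T_1 \rvert = \lvert T_2 \rvert = 10^{6}:\quad & C_{\text{nl}} = 10^{12},\quad C_{\text{idx}} = 2 \times 10^{6},\quad C_{\text{nl}} / C_{\text{idx}} = 5 \times 10^{5}
\end{aligned}
```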

Brief description of the drawings

Fig. 1 is a flowchart of data query according to the invention;

Fig. 2 is a flowchart of the electric power big data selection query method of the invention;

Fig. 3 is a flowchart of the electric power big data join query method of the invention.

Detailed description of the embodiments

The invention is further described below with reference to the accompanying drawings and embodiments.

The solution of the invention mainly covers two aspects: an update scheme for the data tables and secondary indexes, and a query scheme for the data tables. The query scheme includes a secondary index organization for basic selection queries and a secondary index organization for basic join queries, as shown in Figs. 1-3.

5.1 Establishing secondary index tables

In the present invention, one basic query business corresponds to one secondary index table, and one complex query business may correspond to several secondary index tables. The value, row key, column family and column identifier of each secondary index entry are obtained by rearranging and integrating the original data.

A) For selection query operations, the invention stores the secondary indexes of the relevant columns of the multiple tables involved in the selection query in one table. The row key R of the index table consists of three parts, in order: QUALIFIER, VALUE, ROWKEY, where QUALIFIER is the identifier of the column in the data table, VALUE is the value of the data column in the data table, and ROWKEY is the row key of the data table.

B) For join query operations, the invention stores the secondary indexes of the relevant columns of the multiple tables involved in the join query in one table. The row key R of the index table consists of three parts, in order: PREFIX, VALUE, QUALIFIER, where PREFIX is generated by a hash function and distinguishes the join query groups, VALUE is the value of the data column in the data table, and QUALIFIER is the identifier of the column in the data table.

The column value of the index table is the row key of the corresponding data; together with the index table row key, it forms one entry of the index table.

5.2 Selecting the corresponding secondary index table according to the operation request

In the present invention, the correspondence between each business and its index tables is stored in the metadata. When the data table of a business is updated or queried, the corresponding secondary index tables are obtained from the metadata.

5.3 Data update

5.3.1 Updating the data table

The HBase Coprocessor on the Hadoop platform used by the invention provides basic support for adding data to and deleting data from data tables. Submitting the value, row key, column family and column identifier of the data through the interface provided by HBase Coprocessor updates the data table.

5.3.2 Generating secondary index entries

According to the secondary index table format, entries for the corresponding secondary index table are generated from the known data that needs to be updated.

5.3.3 Updating the index table

The index table is updated in the same way as the data table: submitting the value, row key, column family and column identifier of the index entry through the interface provided by HBase Coprocessor updates the index table.

5.4 Data query

5.4.1 Querying the secondary index table

To determine the row key values of the qualifying data, the secondary index table must be pre-scanned before the data are queried.

A) Query method for the index table of a selection query:

For a compound selection query business, the compound selection condition is first split into single query conditions. The entries satisfying each single condition are then obtained through the index table row keys, and set operations are applied to the entry sets of the single conditions, yielding all secondary index entries that satisfy the compound query condition; all qualifying data table row keys are then extracted from these entries. To obtain the set of secondary index entries that satisfy a single condition, the scan seeks directly, by the index table row key, to the first qualifying entry and scans forward until it finds the first non-qualifying entry; the scanned entries form the set of secondary index entries satisfying that condition.

As shown in Fig. 2, given data tables T1 and T2 and the compound selection query business (Y1): <T1.a, a'> || <T1.c, c'> || <T2.b, b'> (the value of data column a in T1 is less than a', or the value of data column c in T1 is less than c', or the value of data column b in T2 is less than b'), the secondary index table stores the secondary index entries of the relevant data of T1 and T2 in one table. For Y1, row keys in this secondary index table that begin with the same QUALIFIER form a contiguous segment of stored records. For the condition <T1.a, a'>, the scan seeks directly by T1.a to the first qualifying record and scans forward; on reaching the first non-qualifying record, i.e. the first record whose value is not less than a', the scan stops, and the scanned entries form a row key set S1: {R1}, the set of row keys in the data table of all records satisfying <T1.a, a'>. Similarly, scanning the other segments of the index table in order yields the set S2: {R2} satisfying <T1.c, c'> and the set S3: {R3} satisfying <T2.b, b'>. Computing S1 ∪ S2 ∪ S3 then gives the row keys in the data tables of all records satisfying Y1.

B) Query method for the index table of a join query:

A compound join query business can be divided into join query groups. When the data columns of the same join query group are inserted into the index table, the hash function generates the same PREFIX value for them, and the value corresponding to row key R is the row key of that column's row in the data table. At query time, the whole secondary index table is scanned and the qualifying multi-tuple sets are recorded; set operations on these multi-tuple sets then yield the row keys of the qualifying data. While recording the qualifying multi-tuples, a multi-tuple is added to the set only when it is formed by contiguous entries and satisfies the condition of its join query group.

As shown in Fig. 3, given data tables T1, T2, T3 and T4 and the compound join query business (Y2): T1.a = T2.b = T4.d && T1.e = T3.c (where a and e are columns of T1, b is a column of T2, c is a column of T3, and d is a column of T4), the query can be divided into two join query groups: group one (Z1): T1.a = T2.b = T4.d, and group two (Z2): T1.e = T3.c. For Y2, all columns in Z1 start with the same PREFIX and therefore form a contiguous segment of stored records in the secondary index table. Scanning this segment from the beginning, records with the same VALUE are read consecutively; counting during the scan, three consecutive records with the same VALUE (three because Z1 involves three tables) constitute one result record of the join query. The result is a set of triples S1: {<R1, R2, R4>}, where R1, R2 and R4 are the row keys, in data tables T1, T2 and T4 respectively, of the three data columns sharing that VALUE. S1 is the join query result of Z1. Scanning the records formed by Z2 in the same way yields a similar join result S2: {<R1, R3>}. Because Z1 and Z2 are combined by logical AND (&&), joining S1 and S2 on R1 gives the final query result of business Y2, S: {<R1, R2, R3, R4>}.
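The last step of this example, joining S1 and S2 on their shared component R1, can be sketched as a hash join in Python. The source does not say how this set join is computed, so the hash join, the function name and the sample row keys are assumptions.

```python
def join_on_r1(s1, s2):
    """Combine Z1 results {(R1, R2, R4)} with Z2 results {(R1, R3)} on the shared R1.

    Builds a hash map from R1 to its matching R3 values, then probes it with each
    triple, producing quadruples ordered as (R1, R2, R3, R4).
    """
    r3_by_r1 = {}
    for r1, r3 in s2:
        r3_by_r1.setdefault(r1, []).append(r3)
    return {
        (r1, r2, r3, r4)
        for r1, r2, r4 in s1
        for r3 in r3_by_r1.get(r1, [])
    }


s1 = {("r1", "s1", "u1"), ("r2", "s2", "u2")}
s2 = {("r1", "t1"), ("r9", "t9")}
print(join_on_r1(s1, s2))  # {('r1', 's1', 't1', 'u1')}
```

Building the map over the smaller set and probing with the larger one keeps the cost roughly linear in the two set sizes.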

5.4.2 Obtaining content from the data table by row key

After the row key values of the data satisfying the query condition are obtained, they can be passed to the Get interface provided through HBase Coprocessor to obtain the corresponding data values from the data table.

Specific embodiment:

Install the Hadoop Distributed File System;

Install the HBase database, version 0.92 or later;

Override the prePut and postPut methods of the region observer (RegionObserver) in HBase Coprocessor so that the corresponding secondary index tables are updated according to newly inserted data;

Implement the preGet method of the region observer in HBase Coprocessor: first access the corresponding secondary index table according to the query arguments to obtain the row keys of the data to be queried, then retrieve the required data by those row keys.

Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the invention. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solution of the invention without creative effort still fall within the scope of protection of the invention.

Claims

1. An electric power big data query method based on an HBase secondary index, characterized by comprising the following steps:

Step (1): establishing secondary index tables;

Step (2): determining whether a data table has been updated; if so, updating the secondary index tables; if not, leaving the secondary index tables unchanged;

Step (3): querying data using the secondary index tables;

Step (11): generating secondary index tables according to the operation type;

the steps of step (11) being:

Step (111): for a selection query operation, storing the M data columns involved in the selection query in M secondary index tables respectively, where M ≥ 1, the row key R of each secondary index table consisting of three parts, in order: QUALIFIER, VALUE and ROWKEY, where QUALIFIER is the identifier of the data column in the data table, VALUE is the value of the data column in the data table, and ROWKEY is the row key of the data table;

Step (112): for a join query operation, storing the N data columns involved in the join query in one secondary index table, where N ≥ 2, the row key R of the secondary index table consisting of three parts, in order: PREFIX, VALUE and QUALIFIER, where PREFIX is generated by a hash function and distinguishes the join query groups, VALUE is the value of the data column in the data table, and QUALIFIER is the identifier of the data column in the data table;

Step (113): for step (111) and step (112), the cell value stored in the secondary index table being the ROWKEY of the corresponding data table, and this cell value together with the row key R of the secondary index table forming one entry of the secondary index table;

creating the secondary index tables using HBase, and storing the association between each data column and its corresponding secondary index tables in a metadata table, the row key of the metadata table consisting, in order, of:

2. The electric power big data query method based on an HBase secondary index according to claim 1, characterized in that

the steps of step (12) are:

Step (121): for a selection query operation, scanning each of the M data columns, generating secondary index entries in the entry format of step (113), and inserting them into the corresponding secondary index tables;

Step (122): for a join query operation, scanning each of the N data columns, generating secondary index entries in the entry format of step (113), and inserting them into the same secondary index table.

3. The electric power big data query method based on an HBase secondary index according to claim 1, characterized in that updating the secondary index tables in step (2) comprises the following steps:

Step (21): updating the data table: through the Put method interface provided by HBase on the Hadoop platform, submitting the value, row key, column family and column identifier of the data column, completing the update of the data table;

Step (22): generating secondary index entries: for the data column currently being updated, querying the metadata table to obtain the secondary index tables that need to be updated and their operation types, selecting the corresponding secondary index table format according to the operation type, and using the updated data in the data table to generate entries in that format;

Step (23): updating the secondary index tables: through the interface methods provided by HBase Coprocessor on the Hadoop platform, submitting the value, row key, column family and column identifier of each secondary index entry in the format generated in step (22), completing the update of the secondary index tables.

4. The electric power big data query method based on an HBase secondary index according to claim 3, characterized in that step (22) comprises the following steps:

Step (221): if the operation type of the secondary index table is the selection query operation, using the updated data in the data table to generate an entry in the secondary index table format of step (111);

Step (222): if the operation type of the secondary index table is the join query operation, using the updated data in the data table to generate an entry in the secondary index table format of step (112).

5. The electric power big data query method based on an HBase secondary index according to claim 1, characterized in that querying data using the secondary index tables in step (3) comprises the following steps:

6. The electric power big data query method based on an HBase secondary index according to claim 5, characterized in that the steps of step (31) are:

for each of the M data columns involved in the selection query business, querying the metadata table according to the operation type to obtain the name of the corresponding secondary index table, and querying that secondary index table as follows:

given the row key format of step (111), seeking directly to the first entry that satisfies the condition value of the selection query and continuing the scan until the first entry that does not satisfy it is found, the qualifying entries scanned forming the set of ROWKEYs that satisfy the query condition on the current data column;

if M > 1, combining the ROWKEY sets of the different columns according to the logical relations among the M data columns in the query business, logical AND corresponding to set intersection and logical OR to set union, the result being the set of ROWKEYs of the data to be queried;

for the N data columns involved in the join query business, querying the metadata table according to the operation type to obtain the name of the corresponding secondary index table, and querying that secondary index table as follows:

given the row key format of step (112), the entries of the N data columns having the same value being arranged contiguously in the secondary index table;

if a contiguous run of index entries with the same column value contains N entries, the ROWKEYs of these N entries forming one N-tuple <R1, R2, ..., RN> that satisfies the query condition;

scanning the entire secondary index table to obtain the set {<R1, R2, ..., RN>} of all qualifying N-tuples, this set being the set of ROWKEYs of the data to be queried.

7. The electric power big data query method based on an HBase secondary index according to claim 6, characterized in that the steps of step (32) are:

using the ROWKEY sets of the data to be queried obtained in steps (311) and (312), obtaining the corresponding data values from the data table through the Get interface method provided by HBase.