CN106503243B - Electric power big data querying method based on HBase secondary index - Google Patents
- Publication number
- CN106503243B CN201610980816.4A
- Authority
- CN
- China
- Prior art keywords
- secondary index
- data
- index table
- column
- tables
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2272—Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an electric power big data query method based on HBase secondary indexes. The method includes: step (1): building secondary index tables; step (2): determining whether a data table has been updated and, if so, updating the secondary index table, otherwise leaving the secondary index table unchanged; step (3): querying data through the secondary index tables. The invention supports basic update operations and, for each specific business, implements join queries and selection queries across data tables more efficiently, thereby supporting complex business requirements.
Description
Technical field
The present invention relates to an electric power big data query method based on HBase secondary indexes.
Background
The safety of power transmission and transformation equipment is the foundation of safe power grid operation. Data on the state of this equipment is generated by processes such as inspection, testing, live-line detection, online monitoring, grid operation, environmental and weather monitoring, and equipment ledger management. The data is scattered across different systems, is large in volume, and is complex in type. Designing an effective distributed storage model for power transmission and transformation equipment big data is the basis for evaluating equipment state comprehensively and accurately, and is an important support for fully coupled analysis of grid big data.
The HBase database running on the Hadoop platform is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, a large-scale storage cluster can be built on low-cost servers, which can meet the storage needs of grid big data. However, big data storage schemes based on HBase do not fully solve the problem of efficient data retrieval. Electric power big data in particular faces complex and flexible query requirements that a single row key cannot satisfy, so a big data retrieval method that meets these requirements is urgently needed.
[1] Power grid time-series big data storage method, CN 104239447 A, proposes a storage method for grid time-series big data. It selects the open-source distributed columnar database HBase as the storage layer and, together with the SG-CIM model, re-describes batches of measuring points that are positionally correlated in the business logic of the grid. By designing a suitable index organization for the measuring-point data storage table and using the partitioning and load-balancing functions of HBase, it places the historical data of such a batch of measuring points at adjacent physical storage locations. This reduces disk seek time when the historical data of the batch is queried, improves query efficiency, and provides immediate query service for business applications.
[2] HBase secondary index method and device, CN 104112013 A, proposes building a secondary index on an HBase user table. The index entries of the secondary index are sorted by the values stored under the rowkeys of the user table, so that the user table can be searched by value. Each user table corresponds to one secondary index table, and the user table and its secondary index table are stored on the same region server, which avoids cross-region index lookups.
Patents [1] and [2] differ from the present invention. [1] proposes a secondary index organization for data that is correlated in the business logic; its core idea is to make logically related data physically adjacent in storage in order to improve query efficiency. [2] proposes an index generation scheme built on HBase; its core idea is that each data table corresponds to one index table, and that the data table and its index table are stored on the same server in order to improve query efficiency. The secondary index scheme proposed by the present invention is an electric power big data storage model based on HBase. Secondary index tables are first built on the relevant data columns according to the query business: a basic query business corresponds to one secondary index table, and a complex query business may correspond to several secondary index tables. At query time, the index table is queried first to obtain the row keys of the matching data, and the data table is then queried by those row keys to obtain the data. When a relevant column of the data table is updated, the corresponding secondary index table must be updated at the same time.
Summary of the invention
The purpose of the present invention is to solve the above problems by providing an electric power big data query method and system based on HBase secondary indexes. The invention supports basic update operations and, for each specific business, implements join queries and selection queries across data tables more efficiently, thereby supporting complex business requirements.
To achieve the above purpose, the present invention adopts the following technical solution:
An electric power big data query method based on HBase secondary indexes, including the following steps:
Step (1): build the secondary index tables;
Step (2): determine whether a data table has been updated; if so, update the secondary index table; if not, leave the secondary index table unchanged;
Step (3): query data through the secondary index tables.
Building the secondary index tables in step (1) includes the following steps:
Step (11): generate the secondary index tables according to the operation type;
Step (12): generate secondary index entries from the data columns and insert them into the secondary index tables.
Step (11) proceeds as follows:
Step (111): for a selection query operation, the M data columns involved in the selection query are stored in M separate secondary index tables, where M is greater than or equal to 1. The row key R of each secondary index table consists of three parts, in order: QUALIFIER, VALUE and ROWKEY, where QUALIFIER is the identifier of the data column in the data table, VALUE is the value of the data column in the data table, and ROWKEY is the row key of the data table;
Step (112): for a join query operation, the N data columns involved in the join query are stored in one secondary index table, where N is greater than or equal to 2. The row key R of the secondary index table consists of three parts, in order: PREFIX, VALUE and QUALIFIER, where PREFIX is generated by a hash function and distinguishes the join query groups, VALUE is the value of the data column in the data table, and QUALIFIER is the identifier of the data column in the data table;
Step (113): for both step (111) and step (112), the column value stored in the secondary index table is the ROWKEY of the corresponding data table row. This column value and the row key R of the secondary index table together form one entry of the secondary index table.
The secondary index table is created with HBase (specifying the table name of the secondary index table), and the association between each data column and its secondary index table is stored in a metadata table. The row key of the metadata table consists of, in order:
the table name of the data table, the column family name, the column name, the operation type of the secondary index table, and a timestamp.
The value stored under a metadata row key is: the operation type of the secondary index table and the name of the secondary index table.
The operation types of a secondary index table are: selection query operation and join query operation.
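The three row key layouts above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the separator byte, the use of MD5 for the hash function, the 8-character prefix length, and all function names are assumptions that do not appear in the source.

```python
import hashlib

SEP = "\x00"  # assumed separator between row key parts; the source does not specify one


def selection_index_key(qualifier, value, rowkey):
    """Row key of a selection-query index entry: QUALIFIER, VALUE, ROWKEY."""
    return SEP.join((qualifier, value, rowkey))


def join_group_prefix(columns):
    """PREFIX for one join group (assumption: MD5 over the sorted column names)."""
    digest = hashlib.md5(SEP.join(sorted(columns)).encode("utf-8")).hexdigest()
    return digest[:8]


def join_index_key(prefix, value, qualifier):
    """Row key of a join-query index entry: PREFIX, VALUE, QUALIFIER."""
    return SEP.join((prefix, value, qualifier))


def metadata_key(table, family, column, op_type, timestamp):
    """Row key of the metadata table, in the order listed in the text."""
    return SEP.join((table, family, column, op_type, str(timestamp)))


# Entries of one join group share a prefix, so equal values sort next to each other.
prefix = join_group_prefix(["T1.a", "T2.b"])
keys = sorted([
    join_index_key(prefix, "42", "T2.b"),
    join_index_key(prefix, "17", "T1.a"),
    join_index_key(prefix, "42", "T1.a"),
])
```

Because HBase keeps rows sorted by row key, placing VALUE directly after the shared leading part is what makes matching entries contiguous in both layouts.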
Step (12) proceeds as follows:
Step (121): for a selection query operation, scan each of the M data columns, generate secondary index entries in the entry format described in step (113), and insert each entry into its corresponding secondary index table.
Step (122): for a join query operation, scan each of the N data columns, generate secondary index entries in the entry format described in step (113), and insert all entries into the same secondary index table.
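The initial index build can be sketched with plain dictionaries standing in for HBase tables. The in-memory model, the separator, and the names `build_selection_index` and `build_join_index` are illustrative assumptions, not part of the source.

```python
SEP = "\x00"  # assumed separator between row key parts

# A data table is modeled as {rowkey: {column: value}};
# an index table as {index_rowkey: data_rowkey}.


def build_selection_index(data_table, column, label):
    """Scan one column and emit QUALIFIER|VALUE|ROWKEY -> ROWKEY entries."""
    index = {}
    for rowkey, columns in data_table.items():
        if column in columns:
            index[SEP.join((label, columns[column], rowkey))] = rowkey
    return index


def build_join_index(prefix, sources):
    """Scan N columns into one index table keyed by PREFIX|VALUE|QUALIFIER.

    sources is a list of (label, column, data_table) triples.
    """
    index = {}
    for label, column, data_table in sources:
        for rowkey, columns in data_table.items():
            if column in columns:
                index[SEP.join((prefix, columns[column], label))] = rowkey
    return index


t1 = {"r1": {"a": "10"}, "r2": {"a": "20"}}
t2 = {"k1": {"b": "20"}, "k2": {"b": "30"}}
sel_index = build_selection_index(t1, "a", "T1.a")
join_index = build_join_index("g1", [("T1.a", "a", t1), ("T2.b", "b", t2)])
```

In this toy data the value 20 appears in both T1.a and T2.b, so the two corresponding join index entries differ only in their trailing QUALIFIER and sort next to each other.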
Updating the secondary index table in step (2) includes the following steps:
Step (21): update the data table: through the Put method interface provided by HBase on the Hadoop platform, submit the value, row key, column family and column identifier of the data column to complete the update of the data table;
Step (22): generate the secondary index entry: for the data column being updated, query the metadata table to obtain the secondary index tables that need to be updated and their operation types, select the corresponding secondary index table format according to the operation type, and use the updated data from the data table to generate an entry that conforms to that format;
Step (23): update the secondary index table: through the interface methods provided by the HBase Coprocessor on the Hadoop platform, and following the format of the secondary index entry generated in step (22), submit the value, row key, column family and column identifier for the secondary index table to complete the update of the secondary index table.
Step (22) includes the following steps:
Step (221): if the operation type of the secondary index table is a selection query operation, generate an entry that conforms to the secondary index table format of step (111) from the updated data in the data table;
Step (222): if the operation type of the secondary index table is a join query operation, generate an entry that conforms to the secondary index table format of step (112) from the updated data in the data table.
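The update path of steps (21) to (23) can be sketched as a single write function. The dictionaries below are hypothetical in-memory stand-ins for the HBase data, index and metadata tables; the table names, the simplified metadata layout, and the function names are assumptions made for illustration.

```python
SEP = "\x00"  # assumed separator between row key parts

# Hypothetical in-memory stand-ins for the HBase tables.
data_tables = {"T1": {}}                                   # table -> {rowkey: {column: value}}
index_tables = {"idx_sel_T1_a": {}, "idx_join_g1": {}}     # name -> {index_rowkey: data_rowkey}
# Simplified metadata: (table, column) -> [(operation type, index table name, join prefix)]
metadata = {
    ("T1", "a"): [("select", "idx_sel_T1_a", None), ("join", "idx_join_g1", "g1")],
}


def make_index_entry(op_type, table, column, value, rowkey, prefix):
    """Step (22): choose the row key layout according to the operation type."""
    label = f"{table}.{column}"
    if op_type == "select":
        return SEP.join((label, value, rowkey)), rowkey      # layout of step (111)
    if op_type == "join":
        return SEP.join((prefix, value, label)), rowkey      # layout of step (112)
    raise ValueError(f"unknown operation type: {op_type}")


def put(table, rowkey, column, value):
    # Step (21): update the data table.
    data_tables[table].setdefault(rowkey, {})[column] = value
    # Steps (22)-(23): find the affected index tables, build entries, write them.
    for op_type, index_name, prefix in metadata.get((table, column), []):
        key, val = make_index_entry(op_type, table, column, value, rowkey, prefix)
        index_tables[index_name][key] = val


put("T1", "r1", "a", "10")
```

The source does not describe removing the index entry of a value that is overwritten, so this sketch does not do so either; a real deployment would need to decide how stale entries are handled.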
Querying data through the secondary index tables in step (3) includes the following steps:
Step (31): scan the secondary index table to obtain the row keys of the data to be queried;
Step (32): query the data table with the set of ROWKEYs of the data to be queried.
Step (31) proceeds as follows:
Step (311): query method for the secondary index tables of a selection query:
For each of the M data columns involved in the selection query business, query the metadata table by operation type to obtain the name of the corresponding secondary index table, then query that secondary index table. The specific query process is:
Given the secondary index row key format of step (111), the scan can jump directly to the first matching entry according to the condition value in the selection query and then continue until a non-matching entry is found. The scanned matching entries yield the set of ROWKEYs that satisfy the query condition on the current data column.
If M equals 1, this ROWKEY set is the set of ROWKEYs of the data to be queried;
If M is greater than 1, the ROWKEY sets of the different columns are combined by set operations according to the logical relation among the M data columns in the query business: logical AND corresponds to set intersection, and logical OR corresponds to set union. The result of the operation is the set of ROWKEYs of the data to be queried.
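The positioned scan and the set operations of step (311) can be sketched on sorted lists, which mimic the row key ordering of an HBase table. The equality condition, the separator, and the name `scan_equal` are assumptions made for this illustration.

```python
import bisect

SEP = "\x00"  # assumed separator between row key parts


def scan_equal(index_keys, qualifier, value):
    """Return the data-table ROWKEYs of index keys that start with QUALIFIER|VALUE|.

    index_keys must be sorted, as row keys are in an HBase table.
    """
    start = SEP.join((qualifier, value, ""))
    pos = bisect.bisect_left(index_keys, start)      # jump to the first candidate entry
    rowkeys = set()
    while pos < len(index_keys) and index_keys[pos].startswith(start):
        rowkeys.add(index_keys[pos].split(SEP)[2])
        pos += 1                                     # the first non-match ends the scan
    return rowkeys


idx_a = sorted(SEP.join(p) for p in [("a", "10", "r1"), ("a", "10", "r3"), ("a", "20", "r2")])
idx_c = sorted(SEP.join(p) for p in [("c", "x", "r2"), ("c", "x", "r3")])

s_a = scan_equal(idx_a, "a", "10")   # rows where column a equals 10
s_c = scan_equal(idx_c, "c", "x")    # rows where column c equals x
both = s_a & s_c                     # logical AND -> set intersection
either = s_a | s_c                   # logical OR  -> set union
```

Only the matching entries plus one terminating entry are touched, which is the basis of the scan-count claim made later in the text.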
Step (312): query method for the secondary index table of a join query:
For the N data columns involved in the join query business, query the metadata table by operation type to obtain the name of the corresponding secondary index table (the N columns correspond to the same secondary index table), then query that secondary index table. The specific query process is:
Given the secondary index row key format of step (112), the entries of N data columns that have the same value are arranged contiguously in the secondary index table;
If the number of contiguous index entries with the same data column value is N, the ROWKEYs of these N entries form an N-tuple <R1, R2, ..., RN> that satisfies the query condition;
Scanning the entire secondary index table yields the set {<R1, R2, ..., RN>} of all N-tuples that satisfy the condition, and this set {<R1, R2, ..., RN>} is the set of ROWKEYs of the data to be queried.
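The scan of step (312) can be sketched by grouping consecutive index entries that share a VALUE. The dictionary model of the index table, the separator, and the name `scan_join_group` are illustrative assumptions.

```python
from itertools import groupby

SEP = "\x00"  # assumed separator between row key parts


def scan_join_group(index, prefix, qualifiers):
    """Scan one join group and return N-tuples of data-table ROWKEYs.

    index maps PREFIX|VALUE|QUALIFIER -> ROWKEY. A tuple is emitted only when
    a run of consecutive entries with the same VALUE covers all N columns.
    """
    start = prefix + SEP
    entries = [(key.split(SEP), rowkey)
               for key, rowkey in sorted(index.items())
               if key.startswith(start)]
    result = set()
    for _, run in groupby(entries, key=lambda entry: entry[0][1]):   # group by VALUE
        by_qualifier = {parts[2]: rowkey for parts, rowkey in run}
        if all(q in by_qualifier for q in qualifiers):
            result.add(tuple(by_qualifier[q] for q in qualifiers))
    return result


index = {
    SEP.join(("g1", "5", "T1.a")): "r1",
    SEP.join(("g1", "5", "T2.b")): "k1",
    SEP.join(("g1", "7", "T1.a")): "r2",   # no partner in T2.b
    SEP.join(("g1", "9", "T2.b")): "k2",   # no partner in T1.a
}
pairs = scan_join_group(index, "g1", ["T1.a", "T2.b"])
```

Each index entry is visited once, so the work grows with the number of index entries rather than with the product of the table sizes.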
Step (32) proceeds as follows:
Using the set of ROWKEYs of the data to be queried obtained in step (311) and step (312), fetch the corresponding data values from the data table through the Get interface method provided by HBase.
An electric power big data query system based on HBase secondary indexes, comprising:
A secondary index table building module: for building the secondary index tables;
An update judging module: determines whether a data table has been updated; if so, updates the secondary index table; if not, leaves the secondary index table unchanged;
A data query module: queries data through the secondary index tables.
Beneficial effects of the present invention:
This patent proposes a secondary index design scheme based on HBase. The scheme effectively supports the most basic join query and selection query operations of a relational database, and so provides good support for complex query business on grid big data. In addition, building secondary index tables per business allows a balance between query performance and business flexibility.
The invention proposes a secondary index design scheme based on the HBase database that implements the basic selection query and join query functions of a relational database, and can support the complex query requirements of a power grid system.
Selection query performance of the invention: for any table T1 and a query for the records satisfying condition <T1.a, a'>, the number of records the invention needs to scan equals the number of records that satisfy the condition. This is less than the total number of records |T1| in the table and is comparable to the number of records scanned when querying an indexed column in a traditional relational database.
Join query performance of the invention: for a join query over any two tables T1 and T2, a traditional relational database needs to scan |T1| * |T2| records, whereas the invention needs to scan |T1| + |T2| records. Even taking the subsequent join operations between sets into account, the invention can improve join query performance to a large extent.
Brief description of the drawings
Fig. 1 is the data query flow chart of the invention;
Fig. 2 is the flow chart of the electric power big data selection query method of the invention;
Fig. 3 is the flow chart of the electric power big data join query method of the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and embodiments.
The scheme of the invention covers two main aspects: the update scheme for data tables and secondary indexes, and the query scheme for data tables. The query scheme for data tables includes the secondary index organization for basic selection queries and the secondary index organization for basic join queries, as shown in Figs. 1-3.
5.1 Building the secondary index tables
In the present invention, a basic query business corresponds to one secondary index table, and a complex query business may correspond to several secondary index tables. The value, row key, column family and column identifier of the data in a secondary index table are obtained by rearranging the original data.
a) For a selection query operation, the invention stores the secondary indexes of the corresponding columns of the multiple tables involved in the selection query in one table. The row key R of the index table consists of three parts, in order: QUALIFIER, VALUE, ROWKEY, where QUALIFIER is the identifier of the column in the data table, VALUE is the value of the data column in the data table, and ROWKEY is the row key of the data table.
b) For a join query operation, the invention stores the secondary indexes of the corresponding columns of the multiple tables involved in the join query in one table. The row key R of the index table consists of three parts, in order: PREFIX, VALUE, QUALIFIER. PREFIX is generated by a hash function and distinguishes the join query groups, VALUE is the value of the data column in the data table, and QUALIFIER is the identifier of the column in the data table.
The column value of the index table is the row key of the corresponding data; together with the row key of the index table it forms one entry of the index table.
5.2 Selecting the secondary index table according to the operation request
In the present invention, the correspondence between a business and its index tables is stored in the metadata. When the data table of a business is updated or queried, the corresponding secondary index table is obtained from the metadata.
5.3 Data update
5.3.1 Updating the data table
The HBase Coprocessor on the Hadoop platform used by the invention provides basic support for adding and deleting data in data tables. By submitting the value, row key, column family and column identifier of the data through the interface provided by the HBase Coprocessor, the data table can be updated.
5.3.2 Generating the secondary index entry
Following the secondary index table format, an entry that conforms to the corresponding secondary index table is generated from the known data to be updated.
5.3.3 Updating the index table
The index table is updated in the same way as the data table: by submitting the value, row key, column family and column identifier for the index table through the interface provided by the HBase Coprocessor, the index table can be updated.
5.4 Data query
5.4.1 Querying the secondary index table
To determine the row keys of the data that satisfy the condition, the secondary index table is pre-scanned before the data is queried.
a) Query method for the index table of a selection query:
For a compound selection query business, the compound selection condition is first split into single query conditions. The set of entries satisfying each single condition is then obtained through the row keys of the index table. Finally, set operations are applied to these entry sets to obtain all secondary index entries that satisfy the compound condition, and all matching data table row keys are extracted from those entries. When obtaining the set of secondary index entries for a single condition, the scan can jump directly to the first matching entry according to the row key of the index table and proceed downward until a non-matching entry is found; the scanned entries are then merged into the set of secondary index entries satisfying that single condition.
As shown in Fig. 2, given data tables T1 and T2, consider the compound selection query business (Y1): <T1.a, a'> || <T1.c, c'> || <T2.b, b'> (the value of data column a in table T1 is "less than" a', or the value of data column c in table T1 is "less than" c', or the value of data column b in table T2 is "less than" b'). The secondary index table stores the secondary index entries for the data in T1 and T2 in one table. For Y1, the row keys in the corresponding secondary index table that begin with the same QUALIFIER form a contiguous segment of stored records (in the secondary index table). For query condition <T1.a, a'>, the scan can jump directly to the first record for T1.a and continue until it meets the first record that does not satisfy the condition, that is, a record whose data value is greater than a', at which point the scan is complete. Merging the scanned entries gives a row key set S1: {R1}, the set of data table row keys of all records satisfying <T1.a, a'>. Likewise, scanning the other parts of the index table in order gives the set S2: {R2} satisfying <T1.c, c'> and the set S3: {R3} satisfying <T2.b, b'>. Computing S1 ∪ S2 ∪ S3 then gives the data table row keys of all records satisfying Y1.
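The Y1 example can be traced with a small sketch. It assumes the index values are stored in an order-preserving form, modeled here as tuples (QUALIFIER, VALUE, ROWKEY) with integer values; the sample data, the bound 5 used for a', c' and b', and the name `scan_less_than` are all hypothetical.

```python
import bisect

# Index entries modeled as sorted tuples (QUALIFIER, VALUE, ROWKEY); integer values
# stand in for an order-preserving byte encoding, which the source does not specify.


def scan_less_than(index, qualifier, bound):
    """Collect (table, ROWKEY) pairs for entries of `qualifier` with VALUE below `bound`."""
    pos = bisect.bisect_left(index, (qualifier,))    # first entry of this QUALIFIER segment
    rowkeys = set()
    while pos < len(index):
        q, value, rowkey = index[pos]
        if q != qualifier or value >= bound:         # first non-matching record ends the scan
            break
        rowkeys.add((qualifier.split(".")[0], rowkey))
        pos += 1
    return rowkeys


index = sorted([
    ("T1.a", 3, "r1"), ("T1.a", 8, "r2"),
    ("T1.c", 1, "r2"), ("T1.c", 9, "r3"),
    ("T2.b", 2, "k1"), ("T2.b", 6, "k2"),
])

s1 = scan_less_than(index, "T1.a", 5)   # condition <T1.a, a'> with a' = 5
s2 = scan_less_than(index, "T1.c", 5)   # condition <T1.c, c'> with c' = 5
s3 = scan_less_than(index, "T2.b", 5)   # condition <T2.b, b'> with b' = 5
y1 = s1 | s2 | s3                       # the three conditions are joined by ||
```

Row keys are tagged with their table name in this sketch because Y1 spans two tables and the union would otherwise mix row keys from T1 and T2.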
b) Query method for the index table of a join query:
For a compound join query business, the query can be split into join query groups (two in the example below). When the data columns of the same join query group are inserted into the index table, the hash function generates the same PREFIX value for them. The value stored under row key R is the row key of that column's record in the data table. At query time the secondary index table is scanned in full and the matching sets of tuples are recorded; set operations are then applied to these tuple sets to obtain the row keys of the matching data. While recording the matching tuple sets, a tuple is added to the set only if it is formed by consecutive entries that satisfy the condition of the join query group.
As shown in Fig. 3, given data tables T1, T2, T3 and T4, consider the compound join query business (Y2): T1.a=T2.b=T4.d && T1.e=T3.c (where a, b, c, d, e are columns of tables T1, T2, T3, T4, T1 respectively). The query can be split into two join query groups, group one (Z1): T1.a=T2.b=T4.d and group two (Z2): T1.e=T3.c. For Z1, all columns begin with the same PREFIX and therefore form a contiguous segment of stored records (in the secondary index table). Scanning this segment from the beginning, records with the same VALUE are encountered consecutively and are counted during the scan. Three consecutive records (because Z1 involves 3 tables) with the same VALUE form one result record of the join query. The result is a set of triples S1: {<R1, R2, R4>}, where R1, R2, R4 are the row keys in data tables T1, T2, T4 of the three data columns with the same VALUE. S1 is the join query result of Z1. Likewise, scanning the stored records formed by Z2 gives a similar join query result S2: {<R1, R3>}. Because Z1 and Z2 are related by intersection (&&), joining S1 and S2 on R1 gives the final query result of business Y2, S: {<R1, R2, R3, R4>}.
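The final combination of S1 and S2 on the shared T1 row key R1 can be sketched as a small hash join. The sample row keys and the name `join_on_first` are hypothetical, and the sketch appends the S2 components at the end of each tuple, so its component order is <R1, R2, R4, R3> rather than the <R1, R2, R3, R4> written in the text.

```python
def join_on_first(s1, s2):
    """Join two sets of tuples on their first component (the T1 ROWKEY)."""
    by_r1 = {}
    for t in s2:
        by_r1.setdefault(t[0], []).append(t[1:])    # index S2 by R1
    return {a + rest for a in s1 for rest in by_r1.get(a[0], [])}


# S1 holds <R1, R2, R4> tuples from group Z1; S2 holds <R1, R3> tuples from group Z2.
s1 = {("t1-r1", "t2-r7", "t4-r2"), ("t1-r5", "t2-r1", "t4-r9")}
s2 = {("t1-r1", "t3-r4")}
s = join_on_first(s1, s2)
```

Only tuples whose R1 appears in both S1 and S2 survive, which corresponds to the && between the two join groups.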
5.4.2 Fetching content from the data table by row key
Once the row keys of the data satisfying the query condition have been obtained, they can be used to fetch the corresponding data values from the data table through the Get interface provided by the HBase Coprocessor.
Specific embodiment:
Install the Hadoop distributed file system;
Install the HBase database, version 0.92 or later;
Override the prePut and postPut methods of the region observer in the HBase Coprocessor so that the corresponding secondary index table is updated according to the newly inserted data;
Implement the preGet method of the region observer in the HBase Coprocessor so that, according to the query parameters, the corresponding secondary index table is accessed first to obtain the row keys of the queried data, and the required data is then queried by row key.
Although specific embodiments of the present invention have been described above with reference to the drawings, they do not limit the scope of protection of the invention. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solution of the invention without creative effort still fall within the scope of protection of the invention.
Claims (7)
1. An electric power big data query method based on HBase secondary indexes, characterized by including the following steps:
Step (1): build the secondary index tables;
Step (2): determine whether a data table has been updated; if so, update the secondary index table; if not, leave the secondary index table unchanged;
Step (3): query data through the secondary index tables;
Building the secondary index tables in step (1) includes the following steps:
Step (11): generate the secondary index tables according to the operation type;
Step (12): generate secondary index entries from the data columns and insert them into the secondary index tables;
Step (11) proceeds as follows:
Step (111): for a selection query operation, the M data columns involved in the selection query are stored in M separate secondary index tables, where M is greater than or equal to 1; the row key R of each secondary index table consists of three parts, in order: QUALIFIER, VALUE and ROWKEY, where QUALIFIER is the identifier of the data column in the data table, VALUE is the value of the data column in the data table, and ROWKEY is the row key of the data table;
Step (112): for a join query operation, the N data columns involved in the join query are stored in one secondary index table, where N is greater than or equal to 2; the row key R of the secondary index table consists of three parts, in order: PREFIX, VALUE and QUALIFIER, where PREFIX is generated by a hash function and distinguishes the join query groups, VALUE is the value of the data column in the data table, and QUALIFIER is the identifier of the data column in the data table;
Step (113): for both step (111) and step (112), the column value stored in the secondary index table is the ROWKEY of the corresponding data table row; this column value and the row key R of the secondary index table together form one entry of the secondary index table;
The secondary index table is created with HBase, and the association between each data column and its secondary index table is stored in a metadata table; the row key of the metadata table consists of, in order:
the table name of the data table, the column family name, the column name, the operation type of the secondary index table, and a timestamp;
the value stored under a metadata row key is: the operation type of the secondary index table and the name of the secondary index table.
2. The electric power big data query method based on HBase secondary index according to claim 1, characterized in that step (12) comprises the following steps:
Step (121): for a selection query operation, scan each of the M data columns, generate secondary index entries in the entry format described in step (113), and insert each entry into its corresponding secondary index table;
Step (122): for a join query operation, scan each of the N data columns, generate secondary index entries in the entry format described in step (113), and insert the entries into the same secondary index table.
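The two population steps above can be sketched with in-memory dictionaries standing in for HBase tables. This is an illustration under stated assumptions: the data table is modeled as `{rowkey: {qualifier: value}}`, the separator `SEP` and the function names are hypothetical, and the join PREFIX is passed in rather than derived.

```python
SEP = "\x00"  # assumed separator between row key parts


def build_selection_indexes(data_table, columns):
    """Step (121): one index table per column, keyed QUALIFIER, VALUE, ROWKEY."""
    indexes = {}
    for qualifier in columns:
        index = {}
        for rowkey, row in data_table.items():
            if qualifier in row:
                index[SEP.join([qualifier, row[qualifier], rowkey])] = rowkey
        indexes[qualifier] = index
    return indexes


def build_join_index(tables_and_columns, prefix):
    """Step (122): all N columns go into one index table, keyed PREFIX, VALUE, QUALIFIER."""
    index = {}
    for data_table, qualifier in tables_and_columns:
        for rowkey, row in data_table.items():
            if qualifier in row:
                index[SEP.join([prefix, row[qualifier], qualifier])] = rowkey
    return index


# Hypothetical sample data: meters reference stations by station id.
meters = {"m1": {"station_id": "S01", "region": "north"}}
stations = {"s1": {"id": "S01"}}

selection_indexes = build_selection_indexes(meters, ["region"])
join_index = build_join_index([(meters, "station_id"), (stations, "id")], "ab12")
```

In both cases the stored value is the ROWKEY of the originating data row, as required by step (113).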
3. The electric power big data query method based on HBase secondary index according to claim 1, characterized in that the method of updating the secondary index table in step (2) comprises the following steps:
Step (21): update the data table: submit the value, row key, column family and column identifier of the data column through the Put method interface provided by HBase on the Hadoop platform, completing the update of the data table;
Step (22): generate the secondary index entry: for the data column currently being updated, query the metadata table to obtain the secondary index table that needs updating and its operation type, select the corresponding secondary index table format according to the operation type, and use the updated data in the data table to generate an entry conforming to that format;
Step (23): update the secondary index table: through the interface methods provided by the HBase Coprocessor on the Hadoop platform, submit the value, row key, column family and column identifier for the secondary index table in the format of the entry generated in step (22), completing the update of the secondary index table.
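The update flow of steps (21) to (23) can be sketched as a single function over in-memory dictionaries. This is not HBase client or Coprocessor code: the tables are plain dicts, the metadata lookup is a linear filter, and `prefix_for`, `put_with_index`, the separator `SEP` and the operation type labels `"select"` and `"join"` are all assumptions made for illustration.

```python
import hashlib

SEP = "\x00"  # assumed separator between row key parts


def prefix_for(index_name):
    """Hypothetical PREFIX for a join index, derived from its table name."""
    return hashlib.sha256(index_name.encode("utf-8")).hexdigest()[:8]


def put_with_index(data_table, metadata, index_tables,
                   table, family, column, rowkey, value):
    # Step (21): update the data table (stands in for an HBase Put).
    data_table.setdefault(rowkey, {})[(family, column)] = value

    # Step (22): find the index tables registered for this column.
    for (t, f, c, op_type, _ts), (_op, index_name) in metadata.items():
        if (t, f, c) != (table, family, column):
            continue
        if op_type == "select":
            index_key = SEP.join([column, value, rowkey])
        elif op_type == "join":
            index_key = SEP.join([prefix_for(index_name), value, column])
        else:
            continue
        # Step (23): write the entry (in the patent this happens in a Coprocessor).
        index_tables.setdefault(index_name, {})[index_key] = rowkey


# Hypothetical metadata: the "status" column has one selection index.
metadata = {("meter_data", "cf", "status", "select", 1): ("select", "idx_status")}
data_table, index_tables = {}, {}
put_with_index(data_table, metadata, index_tables,
               "meter_data", "cf", "status", "m1", "fault")
```

The sketch only adds entries. Removing the stale index entry when a column value is overwritten is not described in the claim and is therefore left out here.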
4. The electric power big data query method based on HBase secondary index according to claim 3, characterized in that step (22) comprises the following steps:
Step (221): if the operation type of the secondary index table is a selection query operation, use the updated data in the data table to generate an entry conforming to the secondary index table format of step (111);
Step (222): if the operation type of the secondary index table is a join query operation, use the updated data in the data table to generate an entry conforming to the secondary index table format of step (112).
5. The electric power big data query method based on HBase secondary index according to claim 1, characterized in that querying data using the secondary index table in step (3) comprises the following steps:
Step (31): scan the secondary index table to obtain the row keys of the data to be queried;
Step (32): query the data table using the set of ROWKEYs of the data to be queried.
6. The electric power big data query method based on HBase secondary index according to claim 5, characterized in that step (31) comprises the following steps:
Step (311): query method for the secondary index tables of a selection query:
For each of the M data columns involved in the selection query, query the metadata table by operation type to obtain the name of the corresponding secondary index table, then query that secondary index table as follows:
Given the secondary index table row key format in step (111), locate the first matching entry directly from the condition value in the selection query and continue scanning until a non-matching entry is found; the ROWKEYs of the matching entries scanned form the set of ROWKEYs that satisfy the query condition on the current data column;
If M equals 1, this ROWKEY set is the set of ROWKEYs of the data to be queried;
If M is greater than 1, apply the corresponding set operations to the ROWKEY sets of the different columns according to the logical relations among the M data columns in the query: logical AND corresponds to set intersection and logical OR corresponds to set union; the result of the operation is the set of ROWKEYs of the data to be queried;
Step (312): query method for the secondary index table of a join query:
For the N data columns involved in the join query, query the metadata table by operation type to obtain the name of the corresponding secondary index table, then query that secondary index table as follows:
Given the secondary index table row key format in step (112), the entries of the N data columns that share the same value are arranged contiguously in the secondary index table;
If the number of contiguous index entries with the same data column value is N, the ROWKEYs of those N entries constitute one N-tuple <R1, R2, ..., RN> that satisfies the query condition;
Scanning the entire secondary index table yields the set {<R1, R2, ..., RN>} of all N-tuples that satisfy the condition; this set {<R1, R2, ..., RN>} is the set of ROWKEYs of the data to be queried.
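The join scan of step (312) can be sketched as a grouping pass over the sorted index keys. This is a minimal illustration assuming an in-memory dict for the index table, a hypothetical separator `SEP` that never appears inside VALUE or QUALIFIER, and made-up sample keys; `join_tuples` is not an HBase API.

```python
from itertools import groupby

SEP = "\x00"  # assumed separator between PREFIX, VALUE and QUALIFIER


def join_tuples(join_index, n):
    """Return the ROWKEY N-tuples whose entries share PREFIX and VALUE."""
    results = []
    # Sorting mimics HBase's ordered row keys: equal PREFIX+VALUE are adjacent.
    ordered = sorted(join_index.items())
    for _, group in groupby(ordered, key=lambda kv: kv[0].rsplit(SEP, 1)[0]):
        rowkeys = tuple(rowkey for _, rowkey in group)
        if len(rowkeys) == n:  # all N columns carry this value
            results.append(rowkeys)
    return results


# Hypothetical index for a two-column join on a station id.
index = {
    "ab12" + SEP + "S01" + SEP + "meter.station_id": "m1",
    "ab12" + SEP + "S01" + SEP + "station.id": "s1",
    "ab12" + SEP + "S02" + SEP + "meter.station_id": "m2",
}
pairs = join_tuples(index, 2)
```

Here `pairs` is `[("m1", "s1")]`: the value S02 appears in only one of the two columns, so it produces no tuple. Within each tuple the ROWKEYs follow the sort order of QUALIFIER.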
7. The electric power big data query method based on HBase secondary index according to claim 6, characterized in that step (32) comprises:
Using the set of ROWKEYs of the data to be queried obtained in step (311) and step (312), retrieve the corresponding data values from the data table through the Get interface method provided by HBase.
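A selection query following step (311) and step (32) can be sketched end to end as: locate the first index key for the condition value, scan while keys still match, combine the per-column ROWKEY sets, then fetch the rows. The sketch below uses sorted in-memory dicts in place of HBase scans and Get calls; `scan_equal`, `get_rows`, the separator `SEP` and the sample data are assumptions.

```python
from bisect import bisect_left

SEP = "\x00"  # assumed separator between QUALIFIER, VALUE and ROWKEY


def scan_equal(index, qualifier, value):
    """Step (311): ROWKEYs of all index entries matching qualifier == value."""
    keys = sorted(index)                 # HBase keeps row keys sorted already
    prefix = qualifier + SEP + value + SEP
    start = bisect_left(keys, prefix)    # jump to the first candidate entry
    rowkeys = set()
    for key in keys[start:]:
        if not key.startswith(prefix):   # first non-matching entry ends the scan
            break
        rowkeys.add(index[key])
    return rowkeys


def get_rows(data_table, rowkeys):
    """Step (32): fetch rows by ROWKEY (stands in for HBase Get)."""
    return {rk: data_table[rk] for rk in sorted(rowkeys) if rk in data_table}


data_table = {
    "m1": {"region": "north", "status": "fault"},
    "m2": {"region": "north", "status": "ok"},
    "m3": {"region": "south", "status": "fault"},
}
# One index table per column, keyed QUALIFIER, VALUE, ROWKEY.
indexes = {
    col: {SEP.join([col, row[col], rk]): rk for rk, row in data_table.items()}
    for col in ("region", "status")
}

north = scan_equal(indexes["region"], "region", "north")
fault = scan_equal(indexes["status"], "status", "fault")
both = get_rows(data_table, north & fault)    # logical AND -> intersection
either = get_rows(data_table, north | fault)  # logical OR  -> union
```

With this sample data, `both` contains only row `m1`, while `either` contains `m1`, `m2` and `m3`.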
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610980816.4A CN106503243B (en) | 2016-11-08 | 2016-11-08 | Electric power big data querying method based on HBase secondary index |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106503243A (en) | 2017-03-15 |
CN106503243B (en) | 2019-08-06 |
Family
ID=58323974
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610980816.4A Active CN106503243B (en) | 2016-11-08 | 2016-11-08 | Electric power big data querying method based on HBase secondary index |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106503243B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108241724A (en) * | 2017-05-11 | 2018-07-03 | 新华三大数据技术有限公司 | A kind of metadata management method and device |
CN107341198B (en) * | 2017-06-16 | 2020-05-12 | 云南电网有限责任公司信息中心 | Electric power mass data storage and query method based on theme instance |
CN107506464A (en) * | 2017-08-30 | 2017-12-22 | 武汉烽火众智数字技术有限责任公司 | A kind of method that HBase secondary indexs are realized based on ES |
CN108398641B (en) * | 2017-11-30 | 2021-03-09 | 深圳市科列技术股份有限公司 | Battery data processing method and battery data server |
CN108319665B (en) * | 2018-01-18 | 2022-04-19 | 努比亚技术有限公司 | Hbase column value searching method, terminal and storage medium |
CN109063186A (en) * | 2018-08-27 | 2018-12-21 | 郑州云海信息技术有限公司 | A kind of General query method and relevant apparatus |
CN109299102B (en) * | 2018-10-23 | 2020-11-13 | 中国电子科技集团公司第二十八研究所 | HBase secondary index system and method based on Elastcissearch |
CN109800222B (en) * | 2018-12-11 | 2021-06-01 | 中国科学院信息工程研究所 | HBase secondary index self-adaptive optimization method and system |
CN110502524B (en) * | 2019-08-15 | 2022-06-10 | 济南浪潮数据技术有限公司 | Phoenix index data asynchronous updating method and device |
CN114372064B (en) * | 2022-03-22 | 2022-07-12 | 飞狐信息技术(天津)有限公司 | Data processing apparatus, method, computer readable medium and processor |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104112013A (en) * | 2014-07-17 | 2014-10-22 | 浪潮(北京)电子信息产业有限公司 | HBase secondary indexing method and device |
CN104217011A (en) * | 2014-09-19 | 2014-12-17 | 浪潮(北京)电子信息产业有限公司 | Method and device for inquiring HBase secondary index table |
Also Published As
Publication number | Publication date |
---|---|
CN106503243A (en) | 2017-03-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||