CN113031878B - HBase-based data storage optimization method and system - Google Patents

Publication number
CN113031878B
Authority
CN
China
Prior art keywords
data
operation characteristic
abnormal
value
power system
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110549557.0A
Other languages
Chinese (zh)
Other versions
CN113031878A (en)
Inventor
宋成平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ruizhi Technology Group Co ltd
Original Assignee
Ruizhi Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ruizhi Technology Group Co ltd
Priority to CN202110549557.0A
Publication of CN113031878A
Application granted
Publication of CN113031878B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604: Improving or facilitating administration, e.g. storage management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2282: Tablespace storage structures; Management thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608: Saving storage space on storage systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614: Improving the reliability of storage systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0668: Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Supply And Distribution Of Alternating Current (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides an HBase-based data storage optimization method and system. The method comprises the following steps: collecting system operation characteristic data and operation characteristic data of the electric power data acquisition device; calculating an abnormal risk assessment value of the collected data according to the system operation characteristic data and the operation characteristic data of the electric power data acquisition device; collecting electric power data when the abnormal risk assessment value of the collected data is lower than a preset threshold, and otherwise prohibiting collection of the electric power data; uploading the collected electric power data to the HBase open-source database; analyzing the data to be stored according to the characteristics of the demand data to obtain demand field data and missing field data; storing the demand field data to the middle platform in a columnar storage mode; and supplementing the data stored in the middle platform according to the missing field data. The application achieves flexible storage of service data while ensuring the safety and reliability of the data, and reduces the occupation of storage space resources.

Description

HBase-based data storage optimization method and system
Technical Field
The application relates to the technical field of data processing, in particular to a data storage optimization method and system based on HBase.
Background
Business data in the electric power data middle platform accumulates continuously as power grid business develops, and a regular data monitoring and data analysis business system has formed. In actual operation, relying on calculation indicators such as detailed equipment load data and operating efficiency, data is collected, stored, and exchanged from each business system in offline, quasi-real-time, and real-time modes according to business data requirements. Through a three-level operation control system (headquarters, provinces, and cities) with horizontal coordination and vertical integration, monitoring services such as operating performance, management efficiency, and operating efficiency run on a regular basis and provide decision support for development planning, power grid operation, and the like.
In this process, a large amount of business and non-business data is written into the relevant databases of the data middle platform, and data is extracted and stored by full access and periodic extraction so as to meet the basic requirements of business system data access and subsequent data analysis.
At present, business data in the power data middle platform is extracted and stored into the relevant database by full access and periodic extraction. Because the provincial-side data cleaning rules and data quality checks are not standardized, the accessed data contains gaps; for example, a large amount of field data in a business data table is missing. Data tables awaiting modification and supplementation accumulate in the database until they are fed back to the provincial side. They occupy a large amount of database storage space and reduce query efficiency. The data cannot be filled in real time, and table-level updates can be performed only after the provincial side has made its corrections, which hinders the rapid conversion of middle-platform service data.
As shown in fig. 1, part of the data is missing, and with the conventional database storage method the missing fields still occupy storage space, so a large amount of storage space is taken up by missing data.
In addition, the source table for business data is generally a wide table with a large number of data fields, while the corresponding business analysis requires only a small number of them. Updating or analyzing the table data therefore involves querying and modifying a large number of fields and generates a large number of small tables, which also reduces the efficiency of business data storage and analysis.
Disclosure of Invention
The present application optimizes the storage process of service data in electric power data, achieves flexible storage of the service data while ensuring the safety, reliability, and consistency of the data, and reduces the storage space occupied by missing data.
In order to achieve the above object, the present application provides an HBase-based data storage optimization method, which includes the following steps:
pre-constructing a power data acquisition risk assessment model;
judging whether to allow the collection of the electric power data according to the risk assessment model; if so, collecting the electric power data, and if not, prohibiting the collection of the electric power data;
uploading the collected power data, as data to be stored, to the HBase open-source database;
analyzing the data to be stored according to the characteristics of the demand data, acquiring demand field data and missing field data, and feeding back the missing field data;
storing the demand field data to the middle platform in a columnar storage mode;
supplementing the data stored in the middle platform by dynamic column expansion according to the missing field data;
the method for judging whether to allow the collection of the power data according to the risk assessment model comprises the following steps:
collecting system operation characteristic data and operation characteristic data of the electric power data acquisition device, and inputting the data into the risk assessment model;
calculating, by the risk assessment model, an abnormal risk assessment value of the collected data according to the system operation characteristic data and the operation characteristic data of the electric power data acquisition device;
and collecting the electric power data when the abnormal risk assessment value of the collected data is lower than a preset threshold, and otherwise prohibiting the collection of the electric power data.
In the method described above, during the uploading of the power data, data quality analysis and verification are performed according to preset verification rules to obtain the amount of null data for each corresponding field in the uploaded data.
In the method described above, the data to be stored has a plurality of attribute components, including: row keys, timestamps, column families, and column qualifiers.
In the method described above, storing the demand field data to the middle platform in a columnar storage mode includes: setting a column corresponding to each required service field, and storing the demand field data to the middle platform in a columnar storage mode according to those columns.
In the method described above, the demand field data is stored in a columnar storage mode in a service data table of the middle platform; the service data table has rows and columns, each row represents a data object, and each row includes a row key and one or more columns.
In the method described above, acquiring the system operation characteristic data includes: collecting power system operation characteristic data over a period of time, where the power system operation characteristic data includes: voltage, current, frequency offset value, oscillation, and scheduling load.
In the method described above, the operation characteristic data of the power data acquisition device includes: operating current, operating voltage, and operating frequency.
In the method described above, calculating the abnormal risk assessment value of the collected data according to the system operation characteristic data and the operation characteristic data of the power data acquisition device includes the following sub-steps:
calculating the abnormal value of the power system according to the power system operation characteristic data;
calculating the abnormal value of the power data acquisition device according to the operation characteristic data of the power data acquisition device;
and calculating the abnormal risk assessment value of the collected data according to the abnormal value of the power system and the abnormal value of the power data acquisition device.
In the method described above, the calculation formula of the abnormal risk assessment value of the collected data is as follows:
[Formula image not reproduced: F_c expressed in terms of K1, K2, H_d, and H_c]
wherein F_c represents the abnormal risk assessment value of the collected data; K1 represents the influence weight of the abnormal value of the power system on the abnormal risk assessment value of the collected data; K2 represents the influence weight of the abnormal value of the power data acquisition device on the abnormal risk assessment value of the collected data; H_d represents the abnormal value of the power system; H_c represents the abnormal value of the power data acquisition device.
The present application further provides an HBase-based data storage optimization system, which includes:
the abnormal characteristic data acquisition device is used for acquiring system operation characteristic data and electric power data acquisition device operation characteristic data;
the data processor is used for calculating an abnormal risk evaluation value of the acquired data according to the system operation characteristic data and the power data acquisition device operation characteristic data;
the electric power data acquisition device is used for acquiring the electric power data when the acquired data abnormal risk assessment value is lower than a preset threshold value, and otherwise, forbidding acquisition of the electric power data;
the data transmission module is used for uploading the collected power data, as data to be stored, to the HBase open-source database;
the acquisition module is used for analyzing the data to be stored according to the characteristics of the demand data, acquiring the demand field data and the missing field data, and feeding back the missing field data;
the data storage module is used for storing the demand field data to the middle platform in a columnar storage mode;
and the data supplement module is used for supplementing the data stored in the middle platform by dynamic column expansion according to the missing field data.
The beneficial effects achieved by the present application are as follows:
(1) The application optimizes the storage process of service data in electric power data, achieves flexible storage of the service data while ensuring the safety, reliability, and consistency of the data, and reduces the storage space occupied by missing data.
(2) By introducing the relevant components of HBase (an open-source database), the application changes the storage of business data from row storage to column storage, and stores data by setting column families and dynamically expanding columns. For the large number of missing data fields in a business data table, the column-family storage model compresses the storage space taken by missing field data and reduces storage occupation. In addition, the efficiency of data queries can be improved.
(3) The application evaluates the abnormal condition of the power system and the abnormal risk of the power data acquisition device before deciding whether power data can be collected. This ensures that data is collected only while the power system and the power data acquisition device are operating normally, and improves the reliability and accuracy of the collected data.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them.
Fig. 1 is a schematic diagram of a conventional database storage method.
Fig. 2 is a schematic diagram of the actual physical storage structure of the middle platform according to an embodiment of the present application.
Fig. 3 is a flowchart of a data storage optimization method based on HBase according to an embodiment of the present application.
Fig. 4 is a flowchart of a method for calculating an abnormal risk assessment value of collected data according to system operation characteristic data and power data collection device operation characteristic data according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a data storage optimization system based on HBase according to an embodiment of the present application.
Fig. 6 is a flowchart of a data storage optimization method based on HBase according to an embodiment of the present application.
Reference numerals: 10-an abnormal characteristic data acquisition device; 20-a data processor; 30-a power data acquisition device; 40-a data transmission module; 50-an acquisition module; 60-a data storage module; 70-a data supplementation module; 100-data storage optimization system.
Detailed Description
The technical solutions in the embodiments of the present application are clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Example one
As shown in fig. 3 and 6, the present application provides an HBase-based data storage optimization method, which includes the following steps:
Step S1, constructing a power data acquisition risk assessment model in advance.
The method for constructing the power data acquisition risk assessment model in advance comprises the following steps:
constructing a calculation module for the abnormal risk assessment value of the collected data, which calculates the abnormal value of the power system, the abnormal value of the power data acquisition device, and the abnormal risk assessment value of the collected data;
constructing a judgment module, which judges whether the abnormal risk assessment value of the collected data is lower than a preset threshold.
Step S2, judging whether to allow the collection of the electric power data according to the risk assessment model; if so, collecting the electric power data, otherwise prohibiting the collection of the electric power data.
The method for judging whether to allow the collection of the power data according to the risk assessment model comprises the following steps:
Step S210, collecting system operation characteristic data and operation characteristic data of the power data acquisition device, and inputting them into the risk assessment model.
Specifically, the operation characteristic data of the power system and of the power data acquisition device are collected.
Specifically, the power system operation characteristic data is collected over a period of time and includes: voltage, current, frequency offset value, oscillation, scheduling load, etc.
The operation characteristic data of the power data acquisition device includes: operating current, operating voltage, operating frequency, etc.
Step S220, the risk assessment model calculates an abnormal risk assessment value of the collected data according to the system operation characteristic data and the power data collection device operation characteristic data.
As shown in fig. 4, step S220 includes the following sub-steps:
Step S221, calculating the abnormal value of the power system according to the power system operation characteristic data.
Specifically, the calculation formula of the abnormal value of the power system is as follows:
[Formula image not reproduced: H_d expressed in terms of T1, T2, N, Q_j, Db_j, Ds_jt, Af_t, and Yf_t]
wherein H_d represents the abnormal value of the power system; T2 represents the total time period for collecting the power system operation characteristic data; N represents the total number of categories of power system operation characteristic data; Q_j represents the influence weight of the j-th category of power system operation characteristic data on the abnormal value of the power system; Db_j represents the standard value of the j-th category of power system operation characteristic data; Ds_jt represents the actual measured value of the j-th category of power system operation characteristic data at time t; T1 represents the duration of the frequency fluctuation; Af_t represents the frequency deviation value of the power system at time t; Yf_t represents the allowed frequency deviation value of the power system.
Step S222, calculating abnormal values of the electric power data acquisition device according to the operation characteristic data of the electric power data acquisition device.
Specifically, the calculation formula of the abnormal value of the power data acquisition device is as follows:
[Formula image not reproduced: H_c expressed in terms of M, J_i, q_i, Sd_i, Sx_i, Wb_i, and e]
wherein H_c represents the abnormal value of the power data acquisition device; J_i represents the deviation factor of the collected i-th category of operation characteristic data; M represents the total number of categories of collected operation characteristic data; q_i represents the weight of the i-th category of operation characteristic data in the abnormal value of the power data acquisition device; Sd_i represents the actual measured values that are larger than the standard value in the i-th category of operation characteristic data; Sx_i represents the actual measured values that are smaller than the standard value in the i-th category of operation characteristic data; Wb_i represents the standard value of the i-th category of operation characteristic data; e = 2.718.
Step S223, calculating an abnormal risk assessment value of the collected data according to the abnormal value of the power system and the abnormal value of the power data collection device.
Specifically, the calculation formula of the abnormal risk assessment value of the collected data is as follows:
[Formula image not reproduced: F_c expressed in terms of K1, K2, H_d, and H_c]
wherein F_c represents the abnormal risk assessment value of the collected data; K1 represents the influence weight of the abnormal value of the power system on the abnormal risk assessment value of the collected data; K2 represents the influence weight of the abnormal value of the power data acquisition device on the abnormal risk assessment value of the collected data; H_d represents the abnormal value of the power system; H_c represents the abnormal value of the power data acquisition device.
Abnormalities in collected power data have many causes, for example power system abnormalities and acquisition device abnormalities. Power system abnormalities include no voltage or current, unstable current, unbalanced current, and similar conditions; abnormalities of the data acquisition device can produce a large amount of false-alarm data. Therefore, the abnormal condition of the power system and the abnormal risk of the power data acquisition device are evaluated before deciding whether power data can be collected. This ensures that data is collected only while the power system and the power data acquisition device are operating normally, and improves the reliability and accuracy of the collected data.
Step S230, collecting the electric power data when the abnormal risk assessment value of the collected data is lower than a preset threshold, and otherwise prohibiting the collection of the electric power data.
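A minimal sketch of the gating logic in steps S223 and S230 follows. The formula image for F_c is not reproduced in this text, so the sketch assumes a simple weighted sum of H_d and H_c with weights K1 and K2; that form, the function names, and the sample numbers are illustrative assumptions, not the formula of the application.

```python
def risk_assessment_value(h_d: float, h_c: float, k1: float, k2: float) -> float:
    """Combine the two abnormal values into F_c.

    ASSUMPTION: a weighted sum K1*H_d + K2*H_c. The source formula is an
    image that is not available, so this form is only a stand-in.
    """
    return k1 * h_d + k2 * h_c


def collection_allowed(f_c: float, threshold: float) -> bool:
    """Step S230: collect only when F_c is strictly lower than the threshold."""
    return f_c < threshold


# Hypothetical inputs: abnormal values, weights, and threshold are made up.
f_c = risk_assessment_value(h_d=0.2, h_c=0.1, k1=0.6, k2=0.4)
decision = "collect" if collection_allowed(f_c, threshold=0.5) else "prohibit"
```

A value equal to the threshold is treated as "prohibit", since the text allows collection only when the value is lower than the threshold.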
Step S3, uploading the collected power data, as data to be stored, to the HBase open-source database.
Specifically, when the abnormal risk assessment value of the collected data is lower than the preset threshold, the power data is collected, and the collected power data (for example, provincial-side power data) is uploaded to the open-source database. During the upload, data quality analysis and verification are performed according to preset verification rules to obtain the amount of null data for each corresponding field in the uploaded data.
As a specific embodiment of the invention, the collected provincial-side power data is uploaded to the open-source database.
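The null-data count produced by the quality check can be sketched as below. The text does not specify the verification rules, so this example assumes that a field is null when it is absent, `None`, or an empty string; the function name and the sample records are hypothetical.

```python
from typing import Any, Dict, List


def count_null_fields(records: List[Dict[str, Any]], fields: List[str]) -> Dict[str, int]:
    """Return, for each expected field, how many records carry no value for it.

    ASSUMPTION: "null" means the key is absent, or its value is None or "".
    """
    counts = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is None or value == "":
                counts[field] += 1
    return counts


# Hypothetical uploaded records with gaps in "voltage" and "current".
uploaded = [
    {"meter_id": "m1", "voltage": 220.0, "current": None},
    {"meter_id": "m2", "voltage": ""},
]
null_counts = count_null_fields(uploaded, ["meter_id", "voltage", "current"])
```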
HBase is a distributed, column-oriented open-source database built on HDFS (a distributed file system). It is a NoSQL (non-relational) database system that provides high reliability, high performance, column storage, scalability, and real-time reads and writes. By design, storage and access control are organized by column family, and retrieval can be performed independently per column family. Because of this design, empty columns do not occupy storage space, so tables can be very sparse. HBase serves reads and writes against its data in real time, rather than by running MapReduce tasks (MapReduce is a programming model for parallel computation over large-scale data sets).
Data in HBase is organized into tables, which are further divided into column families. A column family must be defined in the schema (a collection of database objects such as tables, indexes, views, and stored procedures), and it groups together columns of a certain type (the columns themselves do not require schema definition). For example, a "message" column family may contain the columns "to", "from", "date", "subject", and "body". Each Key-Value pair is defined as a Cell in HBase, and each Key consists of a RowKey, a column family, a column, and a timestamp. In HBase, a row is a set of Key-Value mappings uniquely identified by its RowKey. Since HBase uses the Hadoop infrastructure, it can be scaled horizontally on commodity hardware.
Based on the storage model of HBase, service data is stored in tables. Each service data table has rows and columns and is a multidimensional mapping structure. Within a table, each row represents a data object. Each row consists of a row key (RowKey) and one or more columns, and the row key is the unique identifier of the row. Data can be accessed in three ways: a single row is fetched by its row key; multiple rows in a given range are accessed through a row-key interval; or the whole table is scanned.
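The three access paths (single row key, row-key interval, full scan) can be illustrated with a small in-memory model. This is not the HBase client API; the class `ToyTable` and its methods are hypothetical, and the sketch only mimics a table whose rows are kept sorted by row key, with the interval taken as start-inclusive and stop-exclusive.

```python
from bisect import bisect_left, insort
from typing import Dict, Iterator, List, Optional, Tuple

Row = Dict[str, bytes]  # "family:qualifier" -> value bytes


class ToyTable:
    """In-memory stand-in for a table whose rows are sorted by row key."""

    def __init__(self) -> None:
        self._keys: List[str] = []        # row keys, kept sorted
        self._rows: Dict[str, Row] = {}

    def put(self, row_key: str, column: str, value: bytes) -> None:
        if row_key not in self._rows:
            insort(self._keys, row_key)   # keep the key list ordered
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key: str) -> Row:
        """Single-row access by row key (empty dict if the row is absent)."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start: Optional[str] = None,
             stop: Optional[str] = None) -> Iterator[Tuple[str, Row]]:
        """Range access over [start, stop); with no bounds, a full-table scan."""
        lo = 0 if start is None else bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])


table = ToyTable()
for key in ("r3", "r1", "r2"):
    table.put(key, "load:voltage", b"220")
range_keys = [key for key, _ in table.scan("r1", "r3")]
```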
As a specific embodiment of the present invention, the data to be stored has a plurality of attribute components, including: row keys, timestamps, column families, and column qualifiers.
Among the attribute components, the row key, column family, and column qualifier together identify a cell. The data stored in the cell is called cell data and is stored as binary bytes. A column is identified jointly by a column family and a column qualifier, and every column in the table must belong to a column family. Once defined, a column family cannot easily be modified, because it affects the actual physical storage structure of HBase, but the column qualifiers within a column family and their corresponding values can be added and deleted dynamically.
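The cell addressing described here can be sketched with a dictionary keyed by (row key, column family, column qualifier), with one value per timestamp. The class `CellStore` is a hypothetical illustration, not HBase code: column families are fixed when the store is created, while qualifiers can be added freely.

```python
from typing import Dict, Iterable, Optional, Tuple

CellKey = Tuple[str, str, str]  # (row key, column family, column qualifier)


class CellStore:
    """Toy cell store: fixed column families, dynamic column qualifiers."""

    def __init__(self, families: Iterable[str]) -> None:
        self._families = frozenset(families)            # fixed at creation
        self._cells: Dict[CellKey, Dict[int, bytes]] = {}

    def put(self, row: str, family: str, qualifier: str,
            timestamp: int, value: bytes) -> None:
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers need no prior declaration; each timestamp is a version.
        self._cells.setdefault((row, family, qualifier), {})[timestamp] = value

    def get_latest(self, row: str, family: str, qualifier: str) -> Optional[bytes]:
        versions = self._cells.get((row, family, qualifier))
        if not versions:
            return None
        return versions[max(versions)]                  # newest timestamp wins


store = CellStore(["load"])
store.put("m1", "load", "voltage", 1, b"219")
store.put("m1", "load", "voltage", 2, b"220")
store.put("m1", "load", "current", 2, b"5.1")           # new qualifier, added on the fly
```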
Step S4, analyzing the data to be stored according to the characteristics of the demand data, acquiring demand field data and missing field data, and feeding back the missing field data.
Specifically, the collected power data is analyzed according to the attribute component characteristics of the demand data to obtain the demand field data and the missing field data, and the missing field data is fed back. The missing field data is obtained as follows: the attribute components of each collected data field are compared with those of the service demand field; if they are inconsistent, the field is missing field data; if they are consistent, the field is demand field data.
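The split into demand field data and missing field data can be sketched as follows. The text does not define the consistency comparison in detail, so this example assumes the simplest reading: a required field is "missing" when the record has no usable value for it. The function name and field names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def split_fields(record: Dict[str, Any],
                 required_fields: List[str]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (demand field data, names of missing fields) for one record.

    ASSUMPTION: a required field is missing when it is absent, None, or "".
    Fields that the business does not require are ignored.
    """
    demand: Dict[str, Any] = {}
    missing: List[str] = []
    for field in required_fields:
        value = record.get(field)
        if value is None or value == "":
            missing.append(field)       # to be fed back for supplementation
        else:
            demand[field] = value       # to be stored in the middle platform
    return demand, missing


demand, missing = split_fields(
    {"meter_id": "m1", "voltage": 220.0, "current": None, "note": "x"},
    ["meter_id", "voltage", "current", "phase"],
)
```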
Step S5, storing the demand field data to the middle platform in a columnar storage mode.
Specifically, a column is set for each required service field, and the demand field data is stored to the middle platform in a columnar storage mode according to those columns.
Fig. 2 is a schematic diagram of the actual physical storage structure of the middle platform. No null values appear in the storage layout of fig. 2, which follows from the nature of column storage: each column is stored contiguously, and only cells that hold values are written. With this storage method, when an uploaded service data table contains a large amount of null data or data not currently required by the business, irrelevant data does not take up storage space. When data is updated or supplemented, the table can be updated by updating the column-family data for the corresponding RowKey, which reduces the cost of updating full-table data.
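The storage saving can be illustrated by counting what each layout has to hold. The sketch below is a simplified model, not a measurement of HBase: a row-oriented table reserves one slot per field per record, while the sparse layout keeps only (row key, column, value) cells that actually have a value. All names and sample data are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Optional[Any]]


def row_oriented_slots(records: List[Record], schema: List[str]) -> int:
    """Slots reserved by a fixed-width row layout, nulls included."""
    return len(records) * len(schema)


def sparse_cells(records: List[Record], schema: List[str],
                 key_field: str) -> List[Tuple[str, str, Any]]:
    """Cells kept by a sparse layout: null fields are simply not written."""
    cells: List[Tuple[str, str, Any]] = []
    for record in records:
        row_key = str(record[key_field])
        for field in schema:
            if field == key_field:
                continue                    # the key is the row identifier
            value = record.get(field)
            if value is not None:
                cells.append((row_key, field, value))
    return cells


schema = ["meter_id", "voltage", "current", "frequency"]
records = [
    {"meter_id": "m1", "voltage": 220.0, "current": None, "frequency": None},
    {"meter_id": "m2", "voltage": None, "current": 5.1, "frequency": 50.0},
]
cells = sparse_cells(records, schema, key_field="meter_id")
```

In this toy data set the row layout reserves 8 slots, while the sparse layout writes 3 cells.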
Step S6: supplement the data stored in the middle platform by dynamic column expansion according to the missing field data.
Based on HBase, column qualifiers and their corresponding values are dynamically appended to the column family of the middle platform data according to the missing field data.
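As a hedged illustration of step S6, the Python sketch below encodes late-arriving field values as new column qualifiers for an existing row. The helper `build_supplement`, the column family `d`, and the row key are hypothetical. The byte-keyed `family:qualifier` dictionary is the shape accepted by the `Table.put` method of the third-party happybase client, which is an assumed client choice and not something the patent specifies.

```python
from typing import Dict, Tuple


def build_supplement(
    row_key: str, family: str, supplied: Dict[str, object]
) -> Tuple[bytes, Dict[bytes, bytes]]:
    """Encode supplied field values as new qualifiers in one column family."""
    cells = {
        f"{family}:{qualifier}".encode("utf-8"): str(value).encode("utf-8")
        for qualifier, value in supplied.items()
        if value is not None  # still-missing fields are simply left out
    }
    return row_key.encode("utf-8"), cells


# Values that were missing at upload time and have since been supplied.
row, cells = build_supplement("meter-001", "d", {"current": 5.2, "frequency": 50.01})

# With a reachable cluster and the happybase client, the pair could be written as:
#   import happybase
#   table = happybase.Connection("hbase-host").table("power_data")  # hypothetical names
#   table.put(row, cells)
```

No schema change is needed for the new qualifiers, because in HBase only column families are declared in advance while qualifiers can be added per row at write time.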
Example two
As shown in fig. 5, an HBase-based data storage optimization system 100 includes:
an abnormal characteristic data acquisition device 10, used for collecting system operation characteristic data and operation characteristic data of the power data acquisition device;
a data processor 20, used for calculating the abnormal risk assessment value of the collected data according to the system operation characteristic data and the operation characteristic data of the power data acquisition device;
a power data acquisition device 30, used for collecting power data when the abnormal risk assessment value of the collected data is lower than a preset threshold, and otherwise prohibiting collection of the power data;
a data transmission module 40, used for uploading the collected power data, as the data to be stored, to the HBase open-source database;
an obtaining module 50, used for analyzing the data to be stored according to the characteristics of the demand data, obtaining the demand field data and the missing field data therein, and feeding back the missing field data;
a data storage module 60, used for storing the demand field data to the middle platform in a columnar storage mode;
and a data supplementing module 70, used for supplementing the data stored in the middle platform by dynamic column expansion according to the missing field data.
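The gating performed by the data processor 20 and the power data acquisition device 30 can be sketched in Python as follows. The formulas in this text survive only as image placeholders, so the weighted sum used here for the risk value is an assumption based on the description of K1 and K2 as influence weights of Hd and Hc; the function names and sample numbers are hypothetical.

```python
def risk_value(h_d: float, h_c: float, k1: float, k2: float) -> float:
    """Abnormal risk assessment value of the collected data.

    ASSUMPTION: a weighted sum of the power system abnormal value (h_d) and
    the acquisition device abnormal value (h_c). The patent's own formula is
    not recoverable from this text.
    """
    return k1 * h_d + k2 * h_c


def may_collect(h_d: float, h_c: float, k1: float, k2: float, threshold: float) -> bool:
    """Allow collection only when the risk value is strictly below the threshold."""
    return risk_value(h_d, h_c, k1, k2) < threshold


# Hypothetical abnormal values, weights and threshold.
normal = may_collect(h_d=0.2, h_c=0.1, k1=0.6, k2=0.4, threshold=0.5)
abnormal = may_collect(h_d=0.9, h_c=0.8, k1=0.6, k2=0.4, threshold=0.5)
```

The strict comparison mirrors the wording "lower than a preset threshold, and otherwise prohibiting": a risk value equal to the threshold blocks collection.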
The beneficial effects achieved by this application are as follows:
(1) The application optimizes the storage process of power service data, achieves flexible storage of the service data while ensuring data security, reliability and consistency, and reduces the storage space occupied by missing data.
(2) By introducing components of HBase (an open-source database), the application changes row storage of the service data into column storage, and stores data by setting column families and dynamically expanding columns. For the large number of missing data fields in a service data table, the characteristics of column-family storage compress the storage space that missing field data would otherwise occupy, reducing storage usage. In addition, the efficiency of data queries can be improved.
(3) The application evaluates the abnormal condition of the power system and the abnormal risk of the power data acquisition device, and then judges whether power data may be collected. This ensures that data is collected only while the power system and the power data acquisition device are operating normally, improving the reliability and accuracy of the collected data.
The above description is only an embodiment of the present invention, and is not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (9)

1. An HBase-based data storage optimization method, characterized by comprising the following steps:
pre-constructing a power data collection risk assessment model;
judging, according to the risk assessment model, whether collection of power data is allowed; if so, collecting the power data, and if not, prohibiting collection of the power data;
uploading the collected power data, as data to be stored, to an HBase open-source database;
analyzing the data to be stored according to the characteristics of demand data, obtaining demand field data and missing field data therein, and feeding back the missing field data;
storing the demand field data to a middle platform in a columnar storage mode;
supplementing the data stored in the middle platform by dynamic column expansion according to the missing field data;
wherein the method for judging whether collection of the power data is allowed according to the risk assessment model comprises the following steps:
collecting system operation characteristic data and operation characteristic data of a power data acquisition device, and inputting the data into the risk assessment model;
calculating, by the risk assessment model, an abnormal risk assessment value of the collected data according to the system operation characteristic data and the operation characteristic data of the power data acquisition device;
collecting the power data when the abnormal risk assessment value of the collected data is lower than a preset threshold, and otherwise prohibiting collection of the power data;
wherein the method for calculating the abnormal risk assessment value of the collected data comprises the following sub-steps:
calculating an abnormal value of the power system according to the operation characteristic data of the power system;
calculating an abnormal value of the power data acquisition device according to the operation characteristic data of the power data acquisition device;
calculating the abnormal risk assessment value of the collected data according to the abnormal value of the power system and the abnormal value of the power data acquisition device;
the calculation formula of the abnormal value of the power system is as follows:
[Formula image not reproduced: expression for the power system abnormal value Hd]
wherein Hd represents the abnormal value of the power system; T2 represents the total time period over which the power system operation characteristic data is collected; n represents the total number of categories of the power system operation characteristic data; Qj represents the influence weight of the j-th category of power system operation characteristic data on the abnormal value of the power system; Dbj represents the standard value of the j-th category of power system operation characteristic data; Dsjt represents the actual measured value of the j-th category of power system operation characteristic data at time t; T1 represents the duration of the frequency fluctuation; Aft represents the frequency deviation value of the power system at time t; Yft represents the allowed frequency deviation value of the power system.
2. The HBase-based data storage optimization method according to claim 1, wherein during uploading of the power data, a data quality check is performed according to preset check rules to obtain the amount of null data in the corresponding fields of the uploaded data.
3. The HBase-based data storage optimization method according to claim 1, wherein the data to be stored has a plurality of attribute components, the attribute components including: row keys, timestamps, column families, and column qualifiers.
4. The HBase-based data storage optimization method according to claim 1 or 3, wherein the method for storing the demand field data to the middle platform in a columnar storage mode comprises: setting a column for each required service field, and storing the demand field data to the middle platform in a columnar storage mode according to these columns.
5. The HBase-based data storage optimization method according to claim 1, wherein the demand field data is stored in a columnar storage mode in a service data table of the middle platform, the service data table has rows and columns, each row represents a data object, and each row includes a row key and one or more columns.
6. The HBase-based data storage optimization method according to claim 1, wherein the method for collecting the system operation characteristic data comprises: collecting power system operation characteristic data over a period of time, the power system operation characteristic data including: voltage, current, frequency offset value, oscillation, and scheduling load.
7. The HBase-based data storage optimization method according to claim 6, wherein the operation characteristic data of the power data acquisition device comprises: operating current, operating voltage, and operating frequency.
8. The HBase-based data storage optimization method according to claim 7, wherein the calculation formula of the collected data abnormal risk assessment value is as follows:
[Formula image not reproduced: expression for the collected data abnormal risk assessment value Fc]
wherein Fc represents the abnormal risk assessment value of the collected data; K1 represents the influence weight of the abnormal value of the power system on the abnormal risk assessment value of the collected data; K2 represents the influence weight of the abnormal value of the power data acquisition device on the abnormal risk assessment value of the collected data; Hd represents the abnormal value of the power system; Hc represents the abnormal value of the power data acquisition device.
9. An HBase-based data storage optimization system, the system comprising:
an abnormal characteristic data acquisition device, used for collecting system operation characteristic data and operation characteristic data of a power data acquisition device;
a data processor, used for calculating an abnormal risk assessment value of the collected data according to the system operation characteristic data and the operation characteristic data of the power data acquisition device;
the power data acquisition device, used for collecting power data when the abnormal risk assessment value of the collected data is lower than a preset threshold, and otherwise prohibiting collection of the power data;
a data transmission module, used for uploading the collected power data, as data to be stored, to an HBase open-source database;
an obtaining module, used for analyzing the data to be stored according to the characteristics of demand data, obtaining demand field data and missing field data, and feeding back the missing field data;
a data storage module, used for storing the demand field data to a middle platform in a columnar storage mode;
a data supplementing module, used for supplementing the data stored in the middle platform by dynamic column expansion according to the missing field data;
wherein the method for calculating the abnormal risk assessment value of the collected data comprises the following sub-steps:
calculating an abnormal value of the power system according to the operation characteristic data of the power system;
calculating an abnormal value of the power data acquisition device according to the operation characteristic data of the power data acquisition device;
calculating the abnormal risk assessment value of the collected data according to the abnormal value of the power system and the abnormal value of the power data acquisition device;
the calculation formula of the abnormal value of the power system is as follows:
[Formula image not reproduced: expression for the power system abnormal value Hd]
wherein Hd represents the abnormal value of the power system; T2 represents the total time period over which the power system operation characteristic data is collected; n represents the total number of categories of the power system operation characteristic data; Qj represents the influence weight of the j-th category of power system operation characteristic data on the abnormal value of the power system; Dbj represents the standard value of the j-th category of power system operation characteristic data; Dsjt represents the actual measured value of the j-th category of power system operation characteristic data at time t; T1 represents the duration of the frequency fluctuation; Aft represents the frequency deviation value of the power system at time t; Yft represents the allowed frequency deviation value of the power system.
CN202110549557.0A 2021-05-20 2021-05-20 HBase-based data storage optimization method and system Active CN113031878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110549557.0A CN113031878B (en) 2021-05-20 2021-05-20 HBase-based data storage optimization method and system

Publications (2)

Publication Number Publication Date
CN113031878A CN113031878A (en) 2021-06-25
CN113031878B CN113031878B (en) 2021-08-06

Family

ID=76455443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110549557.0A Active CN113031878B (en) 2021-05-20 2021-05-20 HBase-based data storage optimization method and system

Country Status (1)

Country Link
CN (1) CN113031878B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077420A (en) * 2014-07-21 2014-10-01 北京京东尚科信息技术有限公司 Method and device for importing data into HBase database
CN109145040A (en) * 2018-06-28 2019-01-04 中译语通科技股份有限公司 A kind of data administering method based on double message queues
CN110555021A (en) * 2018-03-26 2019-12-10 深圳先进技术研究院 Data storage method, query method and related device
CN112819041A (en) * 2021-01-14 2021-05-18 吴娟 Data processing method and system based on electric power big data platform

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101256800A (en) * 2002-03-20 2008-09-03 松下电器产业株式会社 Information recording medium, recording apparatus, reproduction apparatus, recording method and reproduction method

Also Published As

Publication number Publication date
CN113031878A (en) 2021-06-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant