CN114328533A - Metadata unified management method, system, medium, device, and program - Google Patents

Metadata unified management method, system, medium, device, and program

Info

Publication number
CN114328533A
Authority
CN
China
Prior art keywords
data
metadata
hbase
center
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111642439.0A
Other languages
Chinese (zh)
Inventor
邹普
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202111642439.0A
Publication of CN114328533A
Legal status: Pending (current)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are a metadata unified management method, system, medium, device, and program. In the application, during a data import operation to a target Hbase, any one of the multiple data paths of the metadata unified management system first calls the metadata management center, obtains the metadata model corresponding to the target Hbase from the metadata management center, and then processes the data to be imported according to the metadata model so as to write the processed data into the target Hbase. Therefore, during data import, the metadata model is managed in a unified manner in the metadata management center, and every data source channel takes the model as the standard when importing data, which effectively solves the technical problem that data synchronized over multiple data channels cannot be aligned.

Description

Metadata unified management method, system, medium, device, and program
Technical Field
The present application relates to the field of financial technology (Fintech), and in particular, to a method, system, medium, device, and program for unified metadata management.
Background
With the development of computer technology, more and more technologies are applied in the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech). Data management technology is no exception: the security and real-time requirements of the financial industry place higher demands on it.
At present, with the advent of the cloud computing and big data era, various types of internet applications emerge endlessly. In particular, with the rapid development of social networks represented by Social Networking Services (SNS), conventional relational databases have become increasingly unable to meet the requirements of cloud computing and big data in terms of mass data processing, highly concurrent read-write speed, scalability, high availability, and the like. To meet these requirements, non-relational distributed storage systems were developed. HBase is a highly reliable, high-performance, column-oriented, and scalable distributed storage system, and large-scale structured storage clusters can be built on inexpensive hardware using HBase technology.
However, the data sources of HBase usually come through multiple channels, which easily causes confusion among the metadata of the multiple data sources.
Disclosure of Invention
The embodiments of the application provide a metadata unified management method, system, medium, device, and program, so as to solve the technical problem of metadata confusion among multiple data sources caused by the multi-channel data sources of HBase.
In a first aspect, an embodiment of the present application provides a metadata unified management method, which is applied to a metadata unified management system, where the metadata unified management system includes: hbase cluster, metadata management center and a plurality of data paths, wherein each data path is realized by different codes, and the method comprises the following steps:
calling the metadata management center in the process of data import operation to a target Hbase in any data path, wherein the target Hbase is a database for carrying out the data import operation in the Hbase cluster;
acquiring a metadata model corresponding to the target Hbase in the metadata management center;
and processing the data subjected to the data import operation according to the metadata model so as to write the processed data into the target Hbase.
In one possible design, the plurality of data paths include: a first data path, a second data path, and a third data path;
the first data path and the third data path are used for inputting real-time data, and the second data path is used for inputting data before a preset time, so that the data input through the second data path supplements the data input through the first data path or the third data path.
In one possible design, processing the data subjected to the data import operation according to the metadata model to write the processed data to the target Hbase includes:
if the data path for carrying out the data import operation is the first data path, after the core online system generates transaction data, the core online system sends the transaction data to a preset stream processing platform through an asynchronous task;
consuming the transaction data in the preset stream processing platform through a preset data source extraction tool, and writing the transaction data into the target Hbase according to the metadata model; or,
if the data path for carrying out the data import operation is the second data path, generating transaction data to an online library by a core online system, and extracting the transaction data to a preset data warehouse tool in a pulling mode through a preset data transmission tool;
cleaning and processing the transaction data through the data warehouse tool, and reading the metadata model for data assembly;
writing the assembled transaction data into the target Hbase through a preset data import tool; or,
if the data path for carrying out the data import operation is the third data path, calling a preset remote calling tool after the core online system generates transaction data;
and writing the transaction data into the target Hbase through the preset remote calling tool according to the metadata model.
In one possible design, after the processing the data subjected to the data import operation according to the metadata model, the method further includes:
and performing consistency check on the column definitions corresponding to the data subjected to the data import operation in each data path according to a preset information abstract algorithm, and writing the processed data into the target Hbase after confirming that the columns of the data written in each data path have the same sequence.
In a possible design, the method for unified management of metadata further includes:
if the metadata model in the metadata management center changes, the metadata management center sends a metadata model change notification to each data source;
wherein a metadata model in the metadata management center includes: an Hbase rowkey rule, an Hbase column family, an Hbase column structure, and an Hbase column check field; the Hbase column family is used for constraining the column families in the HBase, the Hbase column structure is used for constraining the column structure in the HBase, and the Hbase column check field is used for checking the columns in the HBase.
In one possible design, after the processing the data subjected to the data import operation according to the metadata model to write the processed data to the target Hbase, the method further includes:
reading metadata from a metadata center of the metadata management center, and generating a data code according to the metadata;
and packaging and compiling the data codes to generate corresponding function classes, wherein the function classes are used for providing service capability for the metadata unified management system.
In one possible design, after the packing and compiling the data codes to generate the corresponding functional classes, the method further includes:
acquiring a data query instruction, wherein the metadata unified management system is used for providing data support for a data query system, and the function class is used for supporting the query instruction;
and reading corresponding metadata from a metadata center of the metadata management center according to the query instruction.
In one possible design, after the reading of the corresponding metadata from the metadata center of the metadata management center according to the query instruction, the method further includes:
periodically reading new metadata from a metadata center of the metadata management center according to a preset duration, and generating a new data code according to the new metadata;
checking whether a first mapping relation and a second mapping relation are consistent, wherein the first mapping relation is a mapping relation between the data codes and metadata in the metadata center, and the second mapping relation is a mapping relation between the new data codes and the metadata in the metadata center;
and if the two mapping relations are inconsistent, sending alarm information to rebuild the codes.
In a second aspect, an embodiment of the present application further provides a metadata unified management system, including: the system comprises an Hbase cluster, a metadata management center and a plurality of data paths, wherein each data path is realized through different codes;
calling the metadata management center in the process of data import operation to a target Hbase in any data path, wherein the target Hbase is a database for carrying out the data import operation in the Hbase cluster;
acquiring a metadata model corresponding to the target Hbase in the metadata management center;
and processing the data subjected to the data import operation according to the metadata model so as to write the processed data into the target Hbase.
In one possible design, the plurality of data paths include: a first data path, a second data path, and a third data path;
the first data path and the third data path are used for inputting real-time data, and the second data path is used for inputting data before a preset time, so that the data input through the second data path supplements the data input through the first data path or the third data path.
In a possible design, if the data path for performing the data import operation is the first data path, after the core online system generates transaction data, the core online system sends the transaction data to a preset stream processing platform through an asynchronous task;
consuming the transaction data in the preset stream processing platform through a preset data source extraction tool, and writing the transaction data into the target Hbase according to the metadata model; or,
if the data path for carrying out the data import operation is the second data path, generating transaction data to an online library by a core online system, and extracting the transaction data to a preset data warehouse tool in a pulling mode through a preset data transmission tool;
cleaning and processing the transaction data through the data warehouse tool, and reading the metadata model for data assembly;
writing the assembled transaction data into the target Hbase through a preset data import tool; or,
if the data path for carrying out the data import operation is the third data path, calling a preset remote calling tool after the core online system generates transaction data;
and writing the transaction data into the target Hbase through the preset remote calling tool according to the metadata model.
In a possible design, a consistency check is performed on the column definitions corresponding to the data subjected to the data import operation in each data path according to a preset message digest algorithm, and after confirming that the columns of the data written in each data path have the same sequence, the processed data is written into the target Hbase.
In one possible design, if the metadata model in the metadata management center changes, the metadata management center sends a metadata model change notification to each data source;
wherein a metadata model in the metadata management center includes: an Hbase rowkey rule, an Hbase column family, an Hbase column structure, and an Hbase column check field; the Hbase column family is used for constraining the column families in the HBase, the Hbase column structure is used for constraining the column structure in the HBase, and the Hbase column check field is used for checking the columns in the HBase.
In one possible design, metadata is read from a metadata center of the metadata management center, and a data code is generated according to the metadata;
and packaging and compiling the data codes to generate corresponding function classes, wherein the function classes are used for providing service capability for the metadata unified management system.
In one possible design, the system further includes:
the metadata unified management system is used for providing data support for the data query system, and the function class is used for supporting the query instruction;
and reading corresponding metadata from a metadata center of the metadata management center according to the query instruction.
In one possible design, periodically reading new metadata from a metadata center of the metadata management center according to a preset time length, and generating a new data code according to the new metadata;
checking whether a first mapping relation and a second mapping relation are consistent, wherein the first mapping relation is a mapping relation between the data codes and metadata in the metadata center, and the second mapping relation is a mapping relation between the new data codes and the metadata in the metadata center;
and if the two mapping relations are inconsistent, sending alarm information to rebuild the codes.
In a third aspect, an embodiment of the present application further provides an electronic device, including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any one of the metadata unified management methods of the first aspect via execution of the executable instructions.
In a fourth aspect, the present application further provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements any one of the metadata unified management methods in the first aspect.
In a fifth aspect, the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the method for unified management of metadata in any one of the first aspect is implemented.
According to the metadata unified management method, the metadata unified management system, the metadata unified management medium, the metadata unified management device and the metadata unified management program, when any one of a plurality of data paths of the metadata unified management system performs data import operation on a target Hbase, a metadata management center is called first, a metadata model corresponding to the target Hbase in the metadata management center is obtained, and then data for performing the data import operation is processed according to the metadata model so that the processed data can be written into the target Hbase. Therefore, in the data import process, the metadata model is managed in a unified manner in the metadata management center, and when any data source channel imports data, the metadata model is used as the standard, so that the technical problem that the data cannot be aligned in the synchronization of multiple data channels is effectively solved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic application scenario diagram of a metadata unified management method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a metadata unified management method according to an embodiment of the present application;
FIG. 3 is a structural table of an Hbase storage model in the prior art;
FIG. 4 is a structural table of an Hbase storage model in an embodiment of the present application;
fig. 5 is a flowchart illustrating a metadata unified management method according to a second embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of explanation, the relevant terms are first explained as follows:
hive: hive is a data warehouse tool based on Hadoop, can carry out data sorting, special query and analysis processing on data sets in files stored on an HDFS, provides a query language-hiveQL similar to SQL language, can realize simple MR statistics through HQL sentences, and converts the HQL sentences into MR tasks to be executed.
Hadoop: a software framework that enables distributed processing of large amounts of data. Hadoop includes four modules:
Common: common utilities that support the other modules;
HDFS: the Hadoop Distributed File System, a distributed file system providing high-throughput access performance;
YARN: a framework for job scheduling and cluster resource management;
MapReduce: a parallel computing framework for big data, MR for short.
Hbase: HBase is an open-source non-relational distributed database (NoSQL) modeled after Google's BigTable and implemented in Java. It is part of the Apache Software Foundation's Hadoop project, runs on the HDFS file system, and provides BigTable-like services at scale for Hadoop, so it can store massive sparse data in a fault-tolerant way. Hbase manages data storage in two aspects: first the management of metadata, and second the management of data.
Metadata: Hbase stores the corresponding region information in a meta table; for each read or write, the client first reads the meta table to find the server where the corresponding region is located, and then performs the read or write operation directly through RPC.
Data: Hbase data is stored in HDFS in strict lexicographic order, with HFile as the minimum file unit.
Spark Streaming: spark provides a framework for real-time computation of big data; its underlayer, also Spark Core based; the basic calculation model is also a big data real-time calculation model RDD based on the memory, but aiming at the characteristics of real-time calculation, a layer of packaging is carried out on the RDD, which is called DStream (similar to DataFrame in Spark SQL); RDD is therefore the core of the overall Spark technology ecology. Spark Streaming is an extension of Spark Core Api, and can be used for processing large-scale, high-throughput, fault-tolerant real-time data streams; supports reading data from a wide variety of data sources, such as Kafka, Flume, Twitter, ZeroMQ, Kinesis, or TCP Socket, and enables data processing using complex algorithms like high order functions, such as map, reduce, join, window; the processed data may be saved to a file system, database, Dashboard, or the like.
Service Hbase metadata: the metadata involved in the application is metadata at the Hbase service level, such as the rowkey rules in Hbase, the column families, and the various column values in the value. Because multiple data sources are updated simultaneously, and in particular because the data source channels do not belong to the same department or subsystem, agreement is difficult to reach, which leads to chaotic management of the metadata.
With the development of computer technology, more and more technologies are applied in the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech). Data management technology is no exception: the security and real-time requirements of the financial industry place higher demands on it. At present, with the advent of the cloud computing and big data era, various types of internet applications emerge endlessly. In particular, with the rapid development of social networks represented by Social Networking Services (SNS), conventional relational databases have become increasingly unable to meet the requirements of cloud computing and big data in terms of mass data processing, highly concurrent read-write speed, scalability, high availability, and the like. Taking the foreign FriendFeed as an example, 2.5 million user updates were generated in a single month; for a relational database, performing a Structured Query Language (SQL) query against a table with that many records is extremely inefficient.
To meet the requirements of mass data processing, highly concurrent read-write speed, scalability, high availability, and the like, non-relational distributed storage systems were developed. HBase is a highly reliable, high-performance, column-oriented, and scalable distributed storage system, and large-scale structured storage clusters can be built on inexpensive hardware using HBase technology. HBase is based on the Hadoop HDFS file storage system, uses MapReduce to process its mass data, and uses Zookeeper as a coordination service; with a simple key-value mapping it provides a good solution for very large-scale, highly concurrent real-time mass-data response systems. The data sources of HBase come through multiple channels, such as HIVE batch import and T0 random writes. However, since HBase is KV (Key-Value) storage, confusion in metadata management across the multiple data sources is a big problem.
In the prior art, the general data writing forms of Hbase include random writing and batch writing. In either case, the data first passes through the memory of the Hbase RegionServer, and the data in memory is then flushed from the MemStore to the Hadoop Distributed File System (HDFS) for storage persistence. The general data import means therefore adopt the following schemes:
the first scheme is as follows: and (3) randomly reading and writing into the cluster through a client: the scheme is that data in Mysql or Hive is read in a Java API form, and Hbase clusters are written in a row-by-row or batch mode.
However, the first scheme is a typical operation between a relational and a non-relational database and writes through the API; the Hbase metadata (the rowkey) is generally spliced together in code, and the fields are likewise controlled by code. In this situation, and especially in the Hbase multi-data-source-channel mode, it is very difficult to ensure the consistency of the metadata written by each party, and errors easily occur in production.
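By way of a non-limiting illustration, the following Java sketch shows what such a first-scheme write typically looks like with the Hbase client API; the table name, rowkey layout, column family, and column names are assumptions introduced only for this example. It makes the drawback visible: the rowkey and the columns are spliced directly in the calling code, so every writing party must keep this metadata consistent by hand.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SchemeOneWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("txn_detail"))) {   // assumed table
            // The rowkey and the column names are spliced directly in code,
            // so every writing party must keep them consistent by hand.
            String rowkey = "20211230" + "|" + "ACC001" + "|" + "SEQ0001";
            Put put = new Put(Bytes.toBytes(rowkey));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("amount"), Bytes.toBytes("100.00"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("currency"), Bytes.toBytes("CNY"));
            table.put(put);
        }
    }
}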
The second scheme: a real-time input data stream is received and split into multiple batches (Batch), for example, each 1 s of collected data is converted into one batch; each batch is then passed to Spark's compute engine to generate a result data stream, and the result data stream is written to the target Hbase through the Hbase API.
However, the second scheme is similar to the first scheme, except that it is encapsulated by the introduced Spark Streaming API; the data model (metadata) written to Hbase is therefore also implemented in code, so the problem that the service Hbase metadata cannot be managed remains.
The third scheme: Hive data is imported into the cluster in bulk (Bulkload) form; the Hive data file is converted by an MR task into the final persistent data file in the format of the final Hbase storage file, HFile, and the data file is then moved to the corresponding Hbase storage directory.
In the third scheme, the data cleaned in Hive is imported into Hbase in bulk by means of bulk loading through the MR task, so its performance is the best and its impact on the performance of the big-data Hbase is minimized. However, the metadata management for this Hbase data import is likewise disordered and very error prone.
In summary, all three approaches above have a major drawback: because Hbase is a NoSQL, i.e., non-relational, database, there is no strong check on column alignment, while the big data query system obtains data from Hbase and serves it to peripheral systems; all of these channels depend on the Hbase metadata, yet the definition and maintenance of that metadata cannot be guaranteed in each system.
Therefore, an embodiment of the present application aims to provide a metadata unified management method, where in a process of performing a data import operation on a target Hbase, any one of a plurality of data paths of a metadata unified management system first calls a metadata management center, and obtains a metadata model corresponding to the target Hbase in the metadata management center, and then processes data subjected to the data import operation according to the metadata model, so as to write the processed data into the target Hbase. Therefore, in the data import process, the metadata model is managed in a unified manner in the metadata management center, and when any data source channel imports data, the metadata model is used as the standard, so that the technical problem that the data cannot be aligned in the synchronization of multiple data channels is effectively solved.
Fig. 1 is a schematic application scenario diagram of a metadata unified management method according to an embodiment of the present application. As shown in fig. 1, in this embodiment data import mainly takes three forms, and the three forms complement each other: mode 1 and mode 3 warehouse data in the T0 manner, but considering that the T0 data carries a risk of data loss, the T1 manner of mode 2 is needed to backfill the data and serve as a fallback. It should be noted that the T0 approach warehouses data in real time while T1 warehouses data the next day; that is, the T1 approach is more complete than T0 but has a certain lag.
For mode 1, T0 data is imported into Hbase via a quasi-real-time approach: after the core online system generates a transaction, the online system sends the transaction data to the Kafka cluster via an asynchronous task, and the Kafka data is then consumed via Spark Streaming and written to Hbase according to the model definition.
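For ease of understanding, the following Java sketch outlines the skeleton of mode 1 using the Spark Streaming Kafka direct-stream API (the spark-streaming-kafka-0-10 integration is assumed); the broker address, topic name, and group id are assumptions, and the per-record assembly is only a placeholder that would follow the metadata model pulled from the metadata management center.

import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class ModeOneStreamJob {
    public static void main(String[] args) throws InterruptedException {
        JavaStreamingContext jssc = new JavaStreamingContext(
                new SparkConf().setAppName("t0-hbase-import"), Durations.seconds(1));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "kafka:9092");            // assumed broker address
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "hbase-import");

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc, LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("core-txn"), kafkaParams)); // assumed topic

        // One micro-batch per second; each partition is written to Hbase by the handler below.
        stream.foreachRDD(rdd -> rdd.foreachPartition(ModeOneStreamJob::writePartitionToHbase));

        jssc.start();
        jssc.awaitTermination();
    }

    // Placeholder for the model-driven write: pull the latest model from the metadata
    // management center, build each rowkey per the rowkey rule, and add one column per
    // model column, as in the client example shown for the first scheme above.
    private static void writePartitionToHbase(Iterator<ConsumerRecord<String, String>> records) {
        records.forEachRemaining(record -> {
            // assemble and write record.value() according to the metadata model
        });
    }
}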
Mode 2 is referred to as T1 pushing: transactions generated by the online system are written synchronously to the core's online library, the core's data is extracted to Hive in the form of Sqoop extraction, Hive then performs Extract-Transform-Load (ETL) data cleaning and processing while reading the metadata center to acquire and assemble the Hbase service metadata, and the result is imported into Hbase in Bulkload form. This should be the most efficient of the import modes: HFile data already in the Hbase format is imported into Hbase directly by exchanging the underlying files and takes effect immediately. However, this approach has a certain lag at the data level; in this scheme the lag is greater than or equal to 1 day.
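As a hedged illustration of the Bulkload step in mode 2, the sketch below assumes the HBase 1.x-style client API, in which LoadIncrementalHFiles loads the HFiles produced by the Hive/MR ETL step by exchanging the underlying files; the HFile directory and the table name are assumptions for this example only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class ModeTwoBulkload {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName name = TableName.valueOf("txn_detail");               // assumed target table
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin();
             Table table = conn.getTable(name);
             RegionLocator locator = conn.getRegionLocator(name)) {
            // The HFiles under this directory are assumed to have been produced by the Hive/MR
            // ETL step, with columns assembled in the order defined by the metadata center.
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            loader.doBulkLoad(new Path("/data/hfiles/txn_detail"), admin, table, locator);
        }
    }
}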
Mode 3 means that the online service provided by the core is called through Remote Method Invocation (RMI) to return T0 data; in order to guarantee the availability and stability of transactions, the core only provides T0 data queries, and earlier historical data is deleted and cleaned up regularly.
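A minimal sketch of a remote invocation in the spirit of mode 3 is given below using standard Java RMI; the remote interface, registry address, binding name, and method are hypothetical, and the returned records would then be assembled per the metadata model and written to the target Hbase as in mode 1.

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.util.List;

// Hypothetical remote interface exposed by the core online system for T0 queries.
interface CoreT0QueryService extends Remote {
    List<String> queryT0Transactions(String accountId) throws RemoteException;
}

public class ModeThreeCaller {
    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.getRegistry("core-online-host", 1099); // assumed host/port
        CoreT0QueryService service = (CoreT0QueryService) registry.lookup("coreT0Query"); // assumed name
        List<String> records = service.queryT0Transactions("ACC001");
        // Each record would then be assembled per the metadata model and written to the target Hbase.
        records.forEach(System.out::println);
    }
}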
Therefore, in the application the data of the core's source layer is synchronized to the Hbase cluster of the big data real-time query system through these three ways. Because multiple subsystems are involved, the Hbase data storage definition is flexible, and the data structure is not strongly constrained, the system is prone to multi-party misalignment, disordered data synchronization, and the production of erroneous data.
With continued reference to fig. 2, in the present embodiment the above problem is solved by performing metadata management. Specifically, because the synchronization of multiple data paths can become misaligned during data synchronization, the metadata of the Hbase service layer is managed in the metadata center. The metadata management center uses a financially reliable data source such as TDSQL (Tencent cloud database) as its storage medium; no matter how many data source channels there are, the model is taken as the standard (the specific model is described below with reference to fig. 4). The model is modified in the metadata management center in advance by developers, and each subsystem is notified of whether a change has occurred. Each time a data source channel involves the relevant data source Hbase, it calls the metadata management center to acquire the metadata of the target Hbase table: the Hbase rowkey, column families, columns, and so on, which are then combined with the code of each path separately; the latest metadata must be pulled from the management center each time.
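The following Java sketch illustrates, under stated assumptions, how a data source channel might pull the latest table-level model from the metadata management center over JDBC and deserialize it with Fastjson while preserving field order; the JDBC URL, credentials, and the table and column names of the model table are assumptions for illustration only.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson.parser.Feature;

public class MetadataModelPuller {
    // Pulls the latest model row for one Hbase table from the metadata center.
    public static JSONObject pullLatest(String hbaseTable) throws Exception {
        String url = "jdbc:mysql://metadata-center:3306/meta";          // assumed TDSQL/MySQL endpoint
        String sql = "SELECT hbase_rowkey_rule, hbase_column_family, hbase_columns, hbase_columns_check "
                   + "FROM hbase_table_model WHERE hbase_table = ? ORDER BY version DESC LIMIT 1";
        try (Connection conn = DriverManager.getConnection(url, "reader", "secret");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, hbaseTable);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    throw new IllegalStateException("No model found for " + hbaseTable);
                }
                JSONObject model = new JSONObject(true);                // keep insertion order
                model.put("rowkeyRule", rs.getString("hbase_rowkey_rule"));
                model.put("columnFamily", rs.getString("hbase_column_family"));
                // hbase_columns maps Hbase column name -> Hive column name; parse in field order.
                model.put("columns", JSON.parseObject(rs.getString("hbase_columns"), Feature.OrderedField));
                model.put("columnsCheck", rs.getString("hbase_columns_check"));
                return model;
            }
        }
    }
}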
Since the column value returned after pulling is a Json character string, the KV pairs are traversed in the T0 RMI and Spark Streaming paths, while in the ETL path the field columns are read from the Json character string and then traversed, corresponding one-to-one with the fields of the Hive Bulkload table.
It must be ensured that the data in the Hbase service metadata center and the columns pushed from Hive correspond one-to-one. However, in the ETL Bulkload path to Hbase, Hive obtains a Json object by deserializing the data from the metadata center, and different Json serialization tools may deserialize the fields in inconsistent orders; for the imports of the other channels there is likewise the concern that misordered reads cause mapping errors, so that the data in Hbase ultimately becomes disordered. Therefore, a secondary consistency check based on the MD5 Message-Digest Algorithm is performed on the column definitions obtained in each channel, to prevent the read data from being misordered by the secondary deserialization.
In addition, in the columns returned after pulling, the Key is the Hbase column name and the Value is the Hive column name. After each system reads the model value, it first performs Json serialization on the value; however, because the serialization means differ, for example Java's Fastjson or Gson, or Json dump in Python, the positions of Key and Value may change after object serialization, and in Hive data processing the Bulkload table is produced by splicing commands strictly according to the field order.
Therefore, it is necessary to strictly specify the format of the imported columns read from the Json, to check whether their order is consistent, and to ensure that no data field is missing or redundant. In the embodiment of the present application, each field therefore needs to be checked, and the HBASE column check field is used for this purpose.
After each system obtains Hbase_columns and performs the Json conversion, Hbase_columns is compared with the check field over all fields, for example by comparing the Md5(Hbase_columns_check) value with the Md5(Hbase_columns.key) value, which ensures that the data read by the system is correct and ordered and that the multiple subsystems can be unified, with all parties taking this definition as the standard. At present only the columns are guaranteed in this way; other items, such as the rowkey rules, are handled in the same manner. With the definitions of the management center and unified metadata management, the metadata disorder caused by the inconsistent pace of the multi-party systems can be well resolved; meanwhile, after the version in the metadata center changes, the change can be pushed to each relevant owner in time, so that unified management of the metadata and a unified understanding are achieved.
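A minimal sketch of this secondary MD5 check is shown below; it assumes that Hbase_columns_check holds the canonical comma-joined sequence of Hbase column names published by the metadata center, and that Fastjson's OrderedField feature is used so that deserialization preserves the field order.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson.parser.Feature;

public class ColumnConsistencyCheck {
    // Verifies that the deserialized column definition still matches the check field
    // published by the metadata center (field and parameter names are illustrative).
    public static void verify(String hbaseColumnsJson, String hbaseColumnsCheck) throws Exception {
        // Preserve field order while deserializing, then rebuild the key sequence.
        JSONObject columns = JSON.parseObject(hbaseColumnsJson, Feature.OrderedField);
        String keySequence = String.join(",", columns.keySet());
        if (!md5(keySequence).equals(md5(hbaseColumnsCheck))) {
            throw new IllegalStateException("Hbase column order or content drifted after deserialization");
        }
    }

    private static String md5(String text) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}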
Following the metadata management steps above, unified management in the metadata center can be achieved for Hbase table-level metadata, and each channel source is guaranteed a single accurate definition under a unified configuration principle. However, because the big data query system QS is the actual interface output service, the data accuracy of QS directly determines the accuracy of the service. The traditional method is to hard-code the mapping relation in the code to represent the meta field information in Hbase, so that whether the fields are complete, whether they correspond, whether they are misplaced, and so on must be guaranteed manually; this leaves many uncertain factors and easily produces bugs. In the application, the code automatically generates the mapping relation according to the definition in the metadata center and checks the accuracy of the fields, so as to ensure that the code is completely consistent with the model definition of the metadata center. In the application, the Java beans in the big data query system correspond one-to-one with the table fields of Hbase, and the Data Transfer Object (DTO) beans for the service are provided after the relevant conversion according to the service scenario. Therefore, the mapping relation in the Hbase Domain is required to ensure consistency between the DTO and the metadata.
Meanwhile, during the Continuous Integration (CI) build, it is automatically checked whether the Hbase Domain in the big data query service QS is consistent with the Hbase columns in the metadata center. The big data query system pulls the Hbase Columns field and the Rowkey rule field of the metadata center through Java Database Connectivity (JDBC) and serializes them through a Json object. Then the metadata information defined for the corresponding Hbase and Hive is acquired, and the Hbase Column check field is read at the same time to check whether the data after Fastjson serialization is accurate. In this way the data read from the metadata center can be guaranteed to be completely correct, which provides the conditions for automatically generating code from the read metadata. Packaging and compiling are then carried out, and the relevant Domain mapping classes can be generated automatically for use in subsequent service development as automatically generated function classes.
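By way of example, the CI-time check described above might look like the following sketch; the Domain class is assumed to be auto-generated with one field per Hbase column, bearing the same names, and the column set is the one pulled from the metadata center. Field order is covered separately by the MD5 check described earlier, so this sketch only compares the field sets.

import java.lang.reflect.Field;
import java.util.LinkedHashSet;
import java.util.Set;

public class DomainConsistencyCheck {
    // Compares the fields of an (assumed) auto-generated Domain class against the column
    // names pulled from the metadata center; intended to run during the CI build.
    public static void check(Class<?> domainClass, Set<String> centerColumns) {
        Set<String> domainFields = new LinkedHashSet<>();
        for (Field f : domainClass.getDeclaredFields()) {
            domainFields.add(f.getName());
        }
        if (!domainFields.equals(centerColumns)) {
            throw new IllegalStateException("Hbase Domain " + domainClass.getSimpleName()
                    + " is out of sync with the metadata center: " + domainFields + " vs " + centerColumns);
        }
    }
}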
Finally, still with reference to fig. 1, a periodic reliability check can be performed. The foregoing steps basically solve the problem of unifying the metadata of multiple heterogeneous data sources, but they do not guarantee that the code in the subsequent big data query service remains consistent with the metadata center. Because every channel source dynamically obtains the corresponding Hbase table-level metadata from the metadata management center each time it imports data into Hbase, those channels have no periodic-reliability problem; the check is only needed in scenarios where the metadata is solidified into system code, for example in the big data query system. A timed task therefore needs to be configured to periodically check whether the mapping relation between the Hbase table-level metadata in the metadata center and the metadata in the big data query service is normal; if it is abnormal, a fatal alarm is sent and the code is rebuilt, so that in this scenario the metadata and the service code remain consistent.
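A hedged sketch of such a timed task is given below; the way the current mapping is obtained, the way the latest mapping is pulled from the metadata center, and the alarm/rebuild hook are assumed integrations supplied by the surrounding system.

import java.util.Objects;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class MetadataDriftWatcher {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // currentMapping: the column-to-field mapping compiled into the query service.
    // latestMappingFromCenter and fatalAlarmAndRebuild are assumed hooks.
    public void start(String currentMapping, Supplier<String> latestMappingFromCenter,
                      Runnable fatalAlarmAndRebuild, long periodMinutes) {
        scheduler.scheduleAtFixedRate(() -> {
            String latest = latestMappingFromCenter.get();
            if (!Objects.equals(currentMapping, latest)) {
                fatalAlarmAndRebuild.run();   // e.g. page the owner and trigger a code build
            }
        }, periodMinutes, periodMinutes, TimeUnit.MINUTES);
    }
}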
Fig. 2 is a flowchart illustrating a metadata unified management method according to an embodiment of the present application. As shown in fig. 2, the metadata unified management method provided in this embodiment includes:
step 101, any one of the plurality of data paths calls a metadata management center in the process of data import operation to the target Hbase.
In this embodiment, the metadata unified management method is applied to a metadata unified management system, where the metadata unified management system includes: the system comprises an Hbase cluster, a metadata management center and a plurality of data paths, wherein each data path is realized by different codes. And any one of the plurality of data paths calls the metadata management center in the process of carrying out data import operation on a target Hbase, wherein the target Hbase is a database for carrying out data import operation in the Hbase cluster.
It should be noted that the plurality of data paths may include: a first datapath, a second datapath, and a third datapath. The first data path and the third data path are used for inputting real-time data, and the second data path is used for inputting data before a preset time so that the data input through the second data path supplements the data input through the first data path or the third data path.
In one possible design, data import may take three forms, namely, performing the data import operation through three different data paths. The three forms complement each other: mode 1 (i.e., through the first data path) and mode 3 (i.e., through the third data path) warehouse data in the T0 manner, but considering that the T0 data carries a risk of data loss, the T1 manner of mode 2 (i.e., through the second data path) is needed to backfill the data and serve as a fallback. It should be noted that the T0 approach warehouses data in real time while T1 warehouses data the next day; that is, the T1 approach is more complete than T0 but has a certain lag.
In the method 1, if the data path for data import operation is the first data path, after the core online system generates transaction data, the core online system sends the transaction data to the preset stream processing platform through an asynchronous task, then the transaction data in the preset stream processing platform is consumed through the preset data source extraction tool, and the transaction data is written into the target Hbase according to the metadata model. Specifically, the method 1 is that T0 is introduced into Hbase through a quasi-real-time approach, where the quasi-real-time approach is that after a core online system generates a transaction, the online system sends transaction data to the Kafka cluster through an asynchronous task, and then consumes the Kafka data through Spark Streaming and writes the data into Hbase according to the model definition.
In mode 2, if the data path for the data import operation is the second data path, the core online system generates transaction data to the online library, the transaction data is extracted to the preset data warehouse tool in a pulling mode through the preset data transmission tool, cleaned and processed by the data warehouse tool, assembled according to the metadata model read from the center, and written into the target Hbase through the preset data import tool. Specifically, mode 2 refers to T1 pushing: transactions generated by the online system are written synchronously to the core's online library, the core's data is extracted to Hive in the form of Sqoop extraction, Hive then performs Extract-Transform-Load (ETL) data cleaning and processing while reading the metadata center to obtain and assemble the Hbase service metadata, and the result is imported into Hbase in Bulkload form.
For mode 3, if the data path for the data import operation is the third data path, the preset remote calling tool is called after the core online system generates transaction data, and the transaction data is written into the target Hbase through the preset remote calling tool according to the metadata model. Specifically, mode 3 refers to calling the online service provided by the core through Remote Method Invocation (RMI) to return T0 data; since the core only provides T0 data queries in order to guarantee the availability and stability of transactions, earlier historical data is deleted and cleaned up periodically.
Therefore, in the application the data of the core's source layer is synchronized to the Hbase cluster of the big data real-time query system through these three ways.
Step 102, acquiring a metadata model corresponding to the target Hbase in the metadata management center.
FIG. 3 is a structural table of the Hbase storage model in the prior art. As shown in FIG. 3, the Hbase storage model mainly comprises a primary key (rowkey), column families, columns, and the values of the basic units, and the column families and columns can be expanded freely. The primary key must be unique in the entire table and is arranged in lexicographic order in the underlying storage. This is precisely the flexibility of such KV storage, and it means the data source channels cannot strictly agree on fixed definitions; when a data source channel changes, various abnormal situations such as confusion and loss of Hbase data occur.
Therefore, in order to manage this part of the data uniformly, all data sources take the data in the metadata center as the standard, and every data source channel receives a notification when any change occurs. FIG. 4 is a structural table of the Hbase storage model in the embodiment of the present application. As shown in fig. 4, the Hbase storage model, i.e., the metadata model, can follow the model definition shown in fig. 4. Specifically, the metadata model in the metadata management center includes: an Hbase rowkey rule, an Hbase column family, an Hbase column structure, and an Hbase column check field, where the Hbase column family is used to constrain the column families in the HBase, the Hbase column structure is used to constrain the column structure in the HBase, and the Hbase column check field is used to check the columns in the HBase. If the metadata model in the metadata management center changes, the metadata management center sends a metadata model change notification to each data source.
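For illustration only, the metadata model described above might be represented by a plain Java object such as the following sketch; the class and field names are assumptions that mirror the parts named above and are not limiting.

// A minimal sketch of the metadata model held by the metadata management center;
// the field names are illustrative assumptions.
public class HbaseTableModel {
    private String hbaseTable;        // target Hbase table
    private String rowkeyRule;        // e.g. "date|accountId|seq"
    private String columnFamily;      // constrains the column family used in Hbase
    private String columnsJson;       // Hbase column name -> Hive column name, in model order
    private String columnsCheck;      // check field used to verify the column order/content
    private int version;              // bumped on every model change; change notices are sent to data sources

    // getters and setters omitted for brevity
}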
Step 103, processing the data subjected to the data import operation according to the metadata model so as to write the processed data into the target Hbase.
After the metadata model corresponding to the target Hbase in the metadata management center is obtained, the data subjected to the data import operation is processed according to the metadata model so that the processed data can be written into the target Hbase. Specifically, in this step the above problem is solved by performing metadata management. Because the synchronization of multiple data paths can become misaligned during data import, the metadata of the Hbase service layer is managed in the metadata center. The metadata management center uses a financially reliable data source such as TDSQL (Tencent cloud database) as its storage medium; no matter how many data source paths there are, the model is taken as the standard (the specific model refers to the model details shown in FIG. 4). The model is modified in the metadata management center in advance by developers, and each subsystem is notified of whether a change has occurred. Each time a data source path involves the relevant data source Hbase, it calls the metadata management center to acquire the metadata of the target Hbase table: the Hbase rowkey, column families, columns, and so on, which are combined with the code of each path separately; the latest metadata must be pulled from the management center each time.
In this embodiment, in the process of performing a data import operation on a target Hbase, any one of a plurality of data paths of the metadata unified management system first calls a metadata management center, obtains a metadata model corresponding to the target Hbase in the metadata management center, and then processes data subjected to the data import operation according to the metadata model, so as to write the processed data into the target Hbase. Therefore, in the data import process, the metadata model is managed in a unified manner in the metadata management center, and when any data source channel imports data, the metadata model is used as the standard, so that the technical problem that the data cannot be aligned in the synchronization of multiple data channels is effectively solved.
Fig. 5 is a flowchart illustrating a metadata unified management method according to a second embodiment of the present application. As shown in fig. 5, the metadata unified management method provided in this embodiment includes:
step 201, any data path in the plurality of data paths calls a metadata management center in the process of data import operation to the target Hbase.
In this embodiment, the metadata unified management method is applied to a metadata unified management system, where the metadata unified management system includes: the system comprises an Hbase cluster, a metadata management center and a plurality of data paths, wherein each data path is realized by different codes. And any one of the plurality of data paths calls the metadata management center in the process of carrying out data import operation on a target Hbase, wherein the target Hbase is a database for carrying out data import operation in the Hbase cluster.
It should be noted that the plurality of data paths may include: a first datapath, a second datapath, and a third datapath. The first data path and the third data path are used for inputting real-time data, and the second data path is used for inputting data before a preset time so that the data input through the second data path supplements the data input through the first data path or the third data path.
In one possible design, data import may take three forms, namely, performing the data import operation through three different data paths. The three forms complement each other: mode 1 (i.e., through the first data path) and mode 3 (i.e., through the third data path) warehouse data in the T0 manner, but considering that the T0 data carries a risk of data loss, the T1 manner of mode 2 (i.e., through the second data path) is needed to backfill the data and serve as a fallback. It should be noted that the T0 approach warehouses data in real time while T1 warehouses data the next day; that is, the T1 approach is more complete than T0 but has a certain lag.
In the method 1, if the data path for data import operation is the first data path, after the core online system generates transaction data, the core online system sends the transaction data to the preset stream processing platform through an asynchronous task, then the transaction data in the preset stream processing platform is consumed through the preset data source extraction tool, and the transaction data is written into the target Hbase according to the metadata model. Specifically, the method 1 is that T0 is introduced into Hbase through a quasi-real-time approach, where the quasi-real-time approach is that after a core online system generates a transaction, the online system sends transaction data to the Kafka cluster through an asynchronous task, and then consumes the Kafka data through Spark Streaming and writes the data into Hbase according to the model definition.
In mode 2, if the data path for the data import operation is the second data path, the core online system generates transaction data to the online library, the transaction data is extracted to the preset data warehouse tool in a pulling mode through the preset data transmission tool, cleaned and processed by the data warehouse tool, assembled according to the metadata model read from the center, and written into the target Hbase through the preset data import tool. Specifically, mode 2 refers to T1 pushing: transactions generated by the online system are written synchronously to the core's online library, the core's data is extracted to Hive in the form of Sqoop extraction, Hive then performs Extract-Transform-Load (ETL) data cleaning and processing while reading the metadata center to obtain and assemble the Hbase service metadata, and the result is imported into Hbase in Bulkload form.
For mode 3, if the data path for the data import operation is the third data path, the preset remote calling tool is called after the core online system generates transaction data, and the transaction data is written into the target Hbase through the preset remote calling tool according to the metadata model. Specifically, mode 3 refers to calling the online service provided by the core through Remote Method Invocation (RMI) to return T0 data; since the core only provides T0 data queries in order to guarantee the availability and stability of transactions, earlier historical data is deleted and cleaned up periodically.
Therefore, in the application the data of the core's source layer is synchronized to the Hbase cluster of the big data real-time query system through these three ways.
Step 202, obtaining a metadata model corresponding to a target Hbase in the metadata management center.
The Hbase storage model is shown in FIG. 3 and mainly comprises a primary key (rowkey), column families, columns, and the values of the basic units, and the column families and columns can be expanded freely. The primary key must be unique in the entire table and is arranged in lexicographic order in the underlying storage. This is precisely the flexibility of such KV storage, and it means the data source channels cannot strictly agree on fixed definitions; when a data source channel changes, various abnormal situations such as confusion and loss of Hbase data occur.
Therefore, in order to manage this part of the data uniformly, all data sources take the data in the metadata center as the standard, and every data source channel receives a notification when any change occurs. FIG. 4 is a structural table of the Hbase storage model in the embodiment of the present application. As shown in fig. 4, the metadata model in the metadata management center includes: an Hbase rowkey rule, an Hbase column family, an Hbase column structure, and an Hbase column check field, where the Hbase column family is used to constrain the column families in the HBase, the Hbase column structure is used to constrain the column structure in the HBase, and the Hbase column check field is used to check the columns in the HBase. If the metadata model in the metadata management center changes, the metadata management center sends a metadata model change notification to each data source.
Step 203, processing the data subjected to the data import operation according to the metadata model.
After the metadata model corresponding to the target Hbase in the metadata management center is obtained, the data subjected to the data import operation is processed according to the metadata model so that the processed data can be written into the target Hbase. Specifically, in this step the above problem is solved by performing metadata management. Because the synchronization of multiple data paths can become misaligned during data import, the metadata of the Hbase service layer is managed in the metadata center. The metadata management center uses a financially reliable data source such as TDSQL (Tencent cloud database) as its storage medium; no matter how many data source paths there are, the model is taken as the standard (the specific model refers to the model details shown in FIG. 4). The model is modified in the metadata management center in advance by developers, and each subsystem is notified of whether a change has occurred. Each time a data source path involves the relevant data source Hbase, it calls the metadata management center to acquire the metadata of the target Hbase table: the Hbase rowkey, column families, columns, and so on, which are combined with the code of each path separately; the latest metadata must be pulled from the management center each time.
Step 204, performing consistency check on the column definitions corresponding to the data subjected to the data import operation in each data path according to a preset message digest algorithm, and writing the processed data into the target Hbase after confirming that the columns of the data written in each data path have the same sequence.
Specifically, it must be ensured that the data in the Hbase service metadata center and the columns pushed from Hive correspond one-to-one. However, in the ETL Bulkload path to Hbase, Hive obtains a Json object by deserializing the data from the metadata center, and different Json serialization tools may deserialize the fields in inconsistent orders; for the imports of the other channels there is likewise the concern that misordered reads cause mapping errors, so that the data in Hbase ultimately becomes disordered. Therefore, a secondary consistency check based on the MD5 Message-Digest Algorithm is performed on the column definitions obtained in each channel, to prevent the read data from being misordered by the secondary deserialization.
In addition, in the column definitions returned after pulling, the Key is the column name in Hbase and the Value is the column name in Hive. After each system reads the model value, it first performs Json serialization on the value; however, because the serialization tools differ, for example Fastjson or Gson in Java, or json dump in Python, the positions of Key and Value may change after the object is serialized. Because, in the Hive data processing, the Bulkload table is produced by splicing commands strictly according to the field order, the format of the imported columns read from Json must be strictly specified, the order must be checked for consistency, and it must be ensured that no data field is missing or redundant. Therefore, in the embodiment of the present application, every field needs to be checked, and the HBase column check field is used for this purpose.
After each system takes Hbase_columns and performs the json conversion, the result is compared with the check field over all fields, for example by comparing the Md5(Hbase_columns_check) value with the Md5(Hbase_columns.key) value, so that the data read by the system is guaranteed to be correct and correctly ordered. In this way the multiple subsystems can be unified, with all systems on every side taking this definition as the standard; at present only the columns are guaranteed in this way, and the same approach can be extended to other items such as the rowkey rule. Thanks to the definitions of the management center and the unified metadata management, the metadata disorder caused by the inconsistent pace of the multi-party systems can be well resolved; meanwhile, after the version of the metadata center changes, the change can be pushed to each relevant responsible person in time, so that unified management of the metadata and a unified understanding are achieved.
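A minimal sketch of this secondary consistency check, assuming the check field is the comma-joined list of Hbase column names, might look as follows; Md5OrderCheck, hbaseColumns and hbaseColumnsCheck are hypothetical names used for illustration rather than identifiers from the actual system.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.LinkedHashMap;
import java.util.Map;

public class Md5OrderCheck {

    // Returns true if the locally deserialized column order still matches the metadata center's check field.
    static boolean columnsConsistent(Map<String, String> hbaseColumns, String hbaseColumnsCheck) throws Exception {
        String joinedKeys = String.join(",", hbaseColumns.keySet());   // assumed join scheme for the check value
        return md5Hex(joinedKeys).equals(md5Hex(hbaseColumnsCheck));
    }

    static String md5Hex(String input) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> hbaseColumns = new LinkedHashMap<>();      // must preserve insertion order
        hbaseColumns.put("amount", "trade_amount");                    // Hbase column -> Hive column
        hbaseColumns.put("status", "trade_status");
        String hbaseColumnsCheck = "amount,status";                    // as it would be defined in the metadata center
        System.out.println(columnsConsistent(hbaseColumns, hbaseColumnsCheck));
    }
}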
Step 205, reading metadata from a metadata center of the metadata management center, and generating a data code according to the metadata.
And step 206, packaging and compiling the data codes to generate corresponding function classes, wherein the function classes are used for providing service capacity for the metadata unified management system.
And step 207, periodically reading new metadata from the metadata center of the metadata management center according to a preset time length, and generating a new data code according to the new metadata.
And step 208, checking whether a first mapping relation is consistent with a second mapping relation, wherein the first mapping relation is the mapping relation between the data code and the metadata in the metadata center, and the second mapping relation is the mapping relation between the new data code and the metadata in the metadata center.
And step 209, if the two are inconsistent, sending alarm information to rebuild the code.
In steps 205 to 209, according to the above metadata management steps, unified management by the metadata center can be performed for metadata at the Hbase table level, and each channel source guarantees uniqueness and accuracy according to a unified configuration principle. However, because the big data query service QS is the actual interface output service, the accuracy of the QS data directly determines the accuracy of the service. The traditional method is to fix the mapping relation directly in the code to represent the meta-field information in Hbase, so whether the fields are complete, corresponding and correctly placed has to be guaranteed manually; this introduces many uncertain factors and easily produces bugs. In the present application, the code automatically generates the mapping relation and checks the accuracy of the fields according to the definitions of the metadata center, so as to ensure that it is completely consistent with the model definition of the metadata center. In the present application, the Java beans in the big data query system correspond one-to-one to the table fields of Hbase, and the beans in the service Data Transfer Object (DTO) are provided after relevant conversion according to the service scenario. Therefore, the mapping relation in the Hbase Domain is required to ensure the consistency between the DTO and the metadata.
Meanwhile, during the Continuous Integration (CI) build, whether the Hbase Domain in the big data query service QS is consistent with the Hbase columns of the metadata center is checked automatically. The big data query system pulls the Hbase Columns field and the Rowkey rule field of the metadata center through Java Database Connectivity (JDBC) and serializes them through JSONObject. Then, the metadata information defined for the corresponding Hbase and Hive is acquired, and the Hbase Column check field is read at the same time to verify whether the data after Fastjson serialization is accurate. In this way, the data read from the metadata center can be guaranteed to be completely correct, which provides the conditions for automatically generating code according to the read metadata. Then, packaging and compiling are carried out, after which the relevant Domain mapping classes can be generated automatically and used for subsequent service development as the automatically generated function classes.
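Under assumed table and column names (meta_hbase_model, hbase_columns, hbase_columns_check, rowkey_rule) and an assumed JDBC URL, the following sketch outlines how a query service might pull the model over JDBC, deserialize it with Fastjson in field order, and reuse the Md5OrderCheck sketch above before generating the Domain mapping class; it illustrates the approach rather than reproducing the project's actual code.

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson.parser.Feature;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataPullSketch {

    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://metadata-center:3306/meta";        // hypothetical JDBC URL
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT hbase_columns, hbase_columns_check, rowkey_rule FROM meta_hbase_model WHERE table_name = ?")) {
            ps.setString(1, "demo_table");
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    // hbase_columns is assumed to be stored as a Json object: {"hbaseCol": "hiveCol", ...};
                    // OrderedField keeps the field order so the MD5 order check below is meaningful.
                    JSONObject columnsJson = JSON.parseObject(rs.getString("hbase_columns"), Feature.OrderedField);
                    Map<String, String> hbaseColumns = new LinkedHashMap<>();
                    for (String key : columnsJson.keySet()) {
                        hbaseColumns.put(key, columnsJson.getString(key));
                    }
                    String check = rs.getString("hbase_columns_check");
                    if (!Md5OrderCheck.columnsConsistent(hbaseColumns, check)) {
                        throw new IllegalStateException("Hbase columns order check failed, abort code generation");
                    }
                    // With a verified model, the Domain mapping class can be generated from hbaseColumns here.
                }
            }
        }
    }
}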
In addition, a periodic reliability check can be performed. According to the above steps, the problem of unifying the metadata of multiple heterogeneous data sources can basically be solved, but the consistency between the code of the subsequent big data query service and the metadata center cannot be guaranteed by them alone. Because every channel source dynamically fetches the corresponding Hbase table-level metadata from the metadata management center each time it imports data into Hbase, those channels have no periodic reliability problem; a periodic check is only needed in a scenario where the metadata of the metadata center is solidified into system code, for example the big data query system. Therefore, a timing task needs to be configured to periodically check whether the mapping relation between the Hbase table-level metadata in the metadata center and the metadata in the big data query service is normal; if it is abnormal, a Fatal alarm is sent and a code build is carried out, so that the metadata and the service code remain consistent in this scenario.
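The timing task described here could be sketched with a plain ScheduledExecutorService as below (a Spring @Scheduled method or a crontab entry would serve equally well); checkDomainAgainstMetadata and sendFatalAlarmAndRebuild are placeholders standing in for the actual comparison with the metadata center and for the alarm and rebuild pipeline, and the 24-hour period is only an assumed value for the preset time length.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MetadataReliabilityCheckTask {

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Re-check at a fixed period; the "preset time length" would be configurable in practice.
        scheduler.scheduleAtFixedRate(MetadataReliabilityCheckTask::runCheck, 0, 24, TimeUnit.HOURS);
    }

    static void runCheck() {
        boolean consistent = checkDomainAgainstMetadata();   // placeholder: compare the Hbase Domain with the metadata center
        if (!consistent) {
            sendFatalAlarmAndRebuild();                      // placeholder: raise a Fatal alarm and trigger a code build
        }
    }

    static boolean checkDomainAgainstMetadata() {
        // Placeholder implementation; the real check would pull the latest table-level metadata
        // and compare it with the mapping solidified in the query service's generated code.
        return true;
    }

    static void sendFatalAlarmAndRebuild() {
        // Placeholder for the alarm and CI rebuild hook.
    }
}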
On the basis of the above embodiment, after the data codes are packed and compiled to generate corresponding function classes, the data query instruction may be obtained, where the metadata unified management system is used to provide data support for the data query system, and the function classes are used to support the query instruction. Then, corresponding metadata is read from a metadata center of the metadata management center according to the query instruction.
In this embodiment, in the data import process, the metadata model is managed uniformly in the metadata management center, and when data import is performed through any data source path, the metadata model is used as the reference; the technical problem that data cannot be aligned in the synchronization of multiple data paths is thereby effectively solved, achieving the technical effect of unifying the heterogeneous service metadata of the service Hbase across multiple paths. In addition, because the scheme is uniform in mode and can be conveniently migrated, the technical solution provided by this embodiment also has the technical effect of being quickly migrated and replicated to various service systems.
An embodiment of the present application further provides a metadata unified management system, including: an Hbase cluster, a metadata management center and a plurality of data paths, wherein each data path is realized through different codes;
calling the metadata management center in the process of data import operation to a target Hbase in any data path, wherein the target Hbase is a database for carrying out the data import operation in the Hbase cluster;
acquiring a metadata model corresponding to the target Hbase in the metadata management center;
and processing the data subjected to the data import operation according to the metadata model so as to write the processed data into the target Hbase.
In one possible design, the plurality of data lanes include: a first data path, a second data path, and a third data path;
the first data path and the third data path are used for inputting real-time data, and the second data path is used for inputting data before a preset time, so that the data input through the second data path supplements the data input through the first data path or the third data path.
In a possible design, if the data path for performing the data import operation is the first data path, after the core online system generates transaction data, the core online system sends the transaction data to a preset stream processing platform through an asynchronous task;
consuming the transaction data in the preset stream processing platform through a preset data source extraction tool, and writing the transaction data into the target Hbase according to the metadata model; or,
if the data path for carrying out the data import operation is the second data path, the core online system generates transaction data into an online library, and the transaction data is extracted into a preset data warehouse tool in a pull manner through a preset data transmission tool;
cleaning and processing the transaction data through the data warehouse tool, and reading the metadata model for data assembly;
writing the assembled transaction data into the target Hbase through a preset data import tool; or,
if the data path for carrying out the data import operation is the third data path, calling a preset remote calling tool after the core online system generates transaction data;
and writing the transaction data into the target Hbase through the preset remote calling tool according to the metadata model.
In a possible design, a consistency check is performed, according to a preset message digest algorithm, on the column definitions corresponding to the data subjected to the data import operation in each data path, and the processed data is written into the target Hbase after it is confirmed that the columns of the data written in each data path are in the same order.
In one possible design, if the metadata model in the metadata management center changes, the metadata management center sends a metadata model change notification to each data source;
wherein the metadata model in the metadata management center includes: an Hbase row key rule, an Hbase column cluster, an Hbase column structure and an Hbase column check field; the Hbase column cluster is used for constraining the column clusters in HBase, the Hbase column structure is used for constraining the column structure in HBase, and the Hbase column check field is used for checking the columns in HBase.
In one possible design, metadata is read from a metadata center of the metadata management center, and a data code is generated according to the metadata;
and packaging and compiling the data codes to generate corresponding function classes, wherein the function classes are used for providing service capacity for the metadata unified management system.
In one possible design, the system further acquires a data query instruction, wherein the metadata unified management system is used for providing data support for a data query system, and the function class is used for supporting the query instruction;
and reading corresponding metadata from a metadata center of the metadata management center according to the query instruction.
In one possible design, periodically reading new metadata from a metadata center of the metadata management center according to a preset time length, and generating a new data code according to the new metadata;
checking whether a first mapping relation and a second mapping relation are consistent, wherein the first mapping relation is a mapping relation between the data codes and the metadata in the metadata center, and the second mapping relation is a mapping relation between the new data codes and the metadata in the metadata center;
and if the two are inconsistent, sending alarm information to rebuild the code.
Fig. 6 is a schematic structural diagram of an electronic device shown in the present application according to an example embodiment. As shown in fig. 6, the present embodiment provides an electronic device 300, including:
a processor 301; and,
a memory 302 for storing instructions executable by the processor, where the memory may also be a flash (flash memory);
wherein the processor 301 is configured to perform the steps of the above-described method via execution of the executable instructions.
Alternatively, the memory 302 may be separate or integrated with the processor 301.
When the memory 302 is a device independent from the processor 301, the electronic device 300 may further include:
a bus 303 for connecting the processor 301 and the memory 302.
The present embodiment also provides a readable storage medium, in which a computer program is stored, and when at least one processor of the electronic device executes the computer program, the electronic device executes the steps of the above method.
The present embodiment also provides a program product comprising a computer program stored in a readable storage medium. The computer program may be read from a readable storage medium by at least one processor of the electronic device, and execution of the computer program by the at least one processor causes the electronic device to perform the steps of the above-described method.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. A metadata unified management method, applied to a metadata unified management system, wherein the metadata unified management system comprises: an Hbase cluster, a metadata management center and a plurality of data paths, each data path being realized by different codes, and the method comprises the following steps:
calling the metadata management center in the process of data import operation to a target Hbase in any data path, wherein the target Hbase is a database for carrying out the data import operation in the Hbase cluster;
acquiring a metadata model corresponding to the target Hbase in the metadata management center;
and processing the data subjected to the data import operation according to the metadata model so as to write the processed data into the target Hbase.
2. A unified management method for metadata according to claim 1, wherein said plurality of data paths comprise: a first data path, a second data path, and a third data path;
the first data path and the third data path are used for inputting real-time data, and the second data path is used for inputting data before a preset time, so that the data input through the second data path supplements the data input through the first data path or the third data path.
3. The method for unified management of metadata according to claim 2, wherein the step of processing the data subjected to the data import operation according to the metadata model to write the processed data into the target Hbase comprises:
if the data path for carrying out the data import operation is the first data path, after the core online system generates transaction data, the core online system sends the transaction data to a preset stream processing platform through an asynchronous task;
consuming the transaction data in the preset stream processing platform through a preset data source extraction tool, and writing the transaction data into the target Hbase according to the metadata model; or,
if the data path for carrying out the data import operation is the second data path, the core online system generates transaction data into an online library, and the transaction data is extracted into a preset data warehouse tool in a pull manner through a preset data transmission tool;
cleaning and processing the transaction data through the data warehouse tool, and reading the metadata model for data assembly;
writing the assembled transaction data into the target Hbase through a preset data import tool; or,
if the data path for carrying out the data import operation is the third data path, calling a preset remote calling tool after the core online system generates transaction data;
and writing the transaction data into the target Hbase through the preset remote calling tool according to the metadata model.
4. The unified management method for metadata according to any one of claims 1 to 3, further comprising, after the processing the data subjected to the data import operation according to the metadata model:
and performing a consistency check, according to a preset message digest algorithm, on the column definitions corresponding to the data subjected to the data import operation in each data path, and writing the processed data into the target Hbase after confirming that the columns of the data written in each data path are in the same order.
5. The unified management method for metadata, according to claim 4, further comprising:
if the metadata model in the metadata management center changes, the metadata management center sends a metadata model change notification to each data source;
wherein the metadata model in the metadata management center includes: an Hbase row key rule, an Hbase column cluster, an Hbase column structure and an Hbase column check field; the Hbase column cluster is used for constraining the column clusters in HBase, the Hbase column structure is used for constraining the column structure in HBase, and the Hbase column check field is used for checking the columns in HBase.
6. The method for unified management of metadata according to claim 5, wherein after said processing the data subjected to said data import operation according to said metadata model to write the processed data into said target Hbase, further comprising:
reading metadata from a metadata center of the metadata management center, and generating a data code according to the metadata;
and packaging and compiling the data codes to generate corresponding function classes, wherein the function classes are used for providing service capacity for the metadata unified management system.
7. The method for unified management of metadata, according to claim 6, further comprising, after said packaging and compiling said data codes to generate corresponding functional classes:
acquiring a data query instruction, wherein the metadata unified management system is used for providing data support for a data query system, and the function class is used for supporting the query instruction;
and reading corresponding metadata from a metadata center of the metadata management center according to the query instruction.
8. A unified management method for metadata according to claim 7, after the reading of corresponding metadata from the metadata center of the metadata management center according to the query instruction, further comprising:
periodically reading new metadata from a metadata center of the metadata management center according to a preset duration, and generating a new data code according to the new metadata;
checking whether a first mapping relation and a second mapping relation are consistent, wherein the first mapping relation is a mapping relation between the data codes and the metadata in the metadata center, and the second mapping relation is a mapping relation between the new data codes and the metadata in the metadata center;
and if the two are inconsistent, sending alarm information to rebuild the code.
9. A metadata unified management system, comprising: an Hbase cluster, a metadata management center and a plurality of data paths, wherein each data path is realized through different codes;
calling the metadata management center in the process of data import operation to a target Hbase in any data path, wherein the target Hbase is a database for carrying out the data import operation in the Hbase cluster;
acquiring a metadata model corresponding to the target Hbase in the metadata management center;
and processing the data subjected to the data import operation according to the metadata model so as to write the processed data into the target Hbase.
10. An electronic device, comprising:
a processor; and
a memory for storing a computer program for the processor;
wherein the processor is configured to implement the metadata unified management method of any one of claims 1 to 8 by executing the computer program.
11. A computer-readable storage medium on which a computer program is stored, the computer program, when being executed by a processor, implementing the metadata unified management method according to any one of claims 1 to 8.
12. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the metadata unified management method according to any of claims 1 to 8.
CN202111642439.0A 2021-12-29 2021-12-29 Metadata unified management method, system, medium, device, and program Pending CN114328533A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111642439.0A CN114328533A (en) 2021-12-29 2021-12-29 Metadata unified management method, system, medium, device, and program

Publications (1)

Publication Number Publication Date
CN114328533A true CN114328533A (en) 2022-04-12

Family

ID=81017942

Country Status (1)

Country Link
CN (1) CN114328533A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination