CN109739894B

CN109739894B - Method, device, equipment and storage medium for supplementing metadata description

Info

Publication number: CN109739894B
Application number: CN201910014080.9A
Authority: CN
Inventors: 司东华; 陈文浩; 周可; 兰冲; 杨东方; 罗迎宣; 沈蓉; 王惠平; 敖林; 郭腾达; 覃娆; 张玲; 张志杰; 汪亚男; 邱毅
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2019-01-04
Filing date: 2019-01-04
Publication date: 2022-12-09
Anticipated expiration: 2039-01-04
Also published as: CN109739894A

Abstract

The invention discloses a method for supplementing metadata description, which comprises the following steps: acquiring pre-stored full metadata from each source database; performing data blood relationship analysis on the full metadata to construct a blood relationship map between different data tables of the full metadata; acquiring a target field to be supplementarily described from the full metadata, and determining a target table to which the target field belongs; determining an upstream table of the target table according to the blood relationship map; and analyzing the similarity between the metadata fields in the upstream table and the target field, and performing supplementary description on the target field according to the similarity. The invention also discloses a device, equipment and a storage medium for supplementing metadata description. The invention reduces manual intervention, lowers labor cost and improves overall working efficiency in the process of supplementing metadata description.

Description

Method, device, equipment and storage medium for supplementing metadata description

Technical Field

The invention relates to the technical field of data governance, in particular to a method, a device, equipment and a storage medium for supplementing metadata description.

Background

Metadata is data used to describe data. The field description information of metadata is information describing the meaning of a field in a metadata database, and it helps data users understand the data better. On a big data platform, field description information is sometimes not created for various reasons, so that a third party using the data cannot understand it effectively.

Existing schemes for supplementing metadata field descriptions basically rely on manual work by the relevant developers, and fall into two types: 1) Developers modify the metadata description information in the source database through a script, and the metadata management system then collects, processes and displays the description information to data users. This scheme requires revising the table structure of the source database, which increases the risk of the development process, costs more and takes effect slowly. 2) A metadata management module is added to the metadata management system to support editing and modifying descriptions at the metadata level. This scheme requires no redevelopment and has no influence on the existing source system databases, but every description is still supplemented manually, so the labor cost is high and the working efficiency is low.

Disclosure of Invention

The invention mainly aims to provide a method, a device, equipment and a storage medium for supplementing metadata description, and aims to reduce manual intervention, reduce labor cost and improve the overall working efficiency in the process of supplementing metadata description.

To achieve the above object, the present invention provides a method of supplementing a metadata description, the method of supplementing a metadata description comprising the steps of:

acquiring pre-stored full metadata from each source database;

performing data blood relationship analysis on the full metadata to construct a blood relationship map between different data tables of the full metadata;

acquiring a target field to be subjected to supplementary description from the full metadata, and determining a target table to which the target field belongs;

determining an upstream table of the target table according to the blood relationship map;

and analyzing the similarity between the metadata field in the upstream table and the target field, and performing supplementary description on the target field according to the similarity.

Preferably, before the step of obtaining the pre-saved full amount of metadata from each source database, the method further includes:

pulling metadata from a preset source database;

and storing the pulled metadata into a preset metadata resource library.

Preferably, the step of performing data blood relationship analysis on the full metadata to construct a blood relationship map between different data tables of the full metadata comprises:

acquiring a data table type of the full metadata, wherein the data table type comprises a big data platform data table and a traditional relational data table;

calling a preset blood relationship hook function to obtain blood relationship between different big data platform data tables and blood relationship between the big data platform data table and the traditional relational data table;

and calling a preset graph database engine, and constructing the acquired blood relationship between the different big data platform data tables and the blood relationship between the big data platform data table and the traditional relational data table into a blood relationship graph.

Preferably, the step of analyzing the similarity between the metadata field in the upstream table and the target field comprises:

carrying out deduplication processing on metadata fields in the upstream table;

removing preset invalid fields from the metadata fields subjected to duplicate removal processing to obtain a metadata field set to be analyzed;

and respectively calculating the similarity between each field in the metadata field set and the target field according to a preset algorithm.

Preferably, the step of respectively calculating the similarity between each field in the metadata field set and the target field according to a preset algorithm includes:

respectively calculating the minimum editing operation times required for converting each field in the metadata field set into the target field according to an editing distance algorithm;

and calculating the similarity between each field in the metadata field set and the target field according to the minimum editing operation times.

Preferably, the step of performing supplementary description on the target field according to the similarity includes:

determining fields with the highest similarity with the target fields from the metadata field set;

and acquiring the field description information of the field with the highest similarity with the target field, and taking the field description information of the field with the highest similarity with the target field as the supplementary description of the target field.

Preferably, the step of performing supplementary description on the target field according to the similarity includes:

arranging the fields in the metadata field set in descending order of similarity to obtain a supplementary description selection list of the target field;

when a selection instruction triggered by a user based on the supplementary description selection list is received, determining a field selected by the user;

and acquiring the field description information of the field selected by the user, and taking the field description information of the field selected by the user as the supplementary description of the target field.

In addition, to achieve the above object, the present invention also provides an apparatus for supplementing a metadata description, including:

the first acquisition module is used for acquiring pre-stored full metadata from each source database;

the construction module is used for carrying out data blood relationship analysis on the full amount metadata so as to construct blood relationship maps among different data tables of the full amount metadata;

the second acquisition module is used for acquiring a target field to be subjected to supplementary description from the full metadata and determining a target table to which the target field belongs;

the determining module is used for determining an upstream table of the target table according to the blood relationship map;

and the supplementary description module is used for analyzing the similarity between the metadata field in the upstream table and the target field and carrying out supplementary description on the target field according to the similarity.

In addition, to achieve the above object, the present invention also provides an apparatus for supplementing metadata description, including: a memory, a processor and a program of supplementary metadata description stored on the memory and executable on the processor, which program of supplementary metadata description when executed by the processor implements the steps of the method of supplementary metadata description as described above.

Furthermore, to achieve the above object, the present invention also provides a storage medium having stored thereon a program of supplementary metadata description, which when executed by a processor, implements the steps of the method of supplementary metadata description as described above.

The method acquires pre-stored full metadata from each source database; performs data blood relationship analysis on the full metadata to construct a blood relationship map between different data tables of the full metadata; acquires a target field to be supplementarily described from the full metadata, and determines a target table to which the target field belongs; determines an upstream table of the target table according to the blood relationship map; and analyzes the similarity between the metadata fields in the upstream table and the target field, and performs supplementary description on the target field according to the similarity. Compared with the prior art, the invention automatically supplements the metadata description of the target field according to the blood relationship among different data tables, so that manual intervention is reduced, cost is reduced, and overall working efficiency is improved.

Drawings

FIG. 1 is a schematic diagram of an apparatus architecture of a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a first embodiment of a method for supplementing metadata descriptions according to the present invention;

FIG. 3 is a diagram illustrating a metadata management system pulling metadata from a source database according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a detailed step of step S20 in FIG. 2;

FIG. 5 is a diagram illustrating a relationship between a target table and an upstream table in an embodiment of the present invention;

FIG. 6 is a schematic diagram illustrating an analysis result of similarity analysis performed on the target field Cust_name in the target table table1 in FIG. 5.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.

The device of the embodiment of the present invention that supplements the metadata description may be a PC or a server device.

As shown in fig. 1, the apparatus may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the aforementioned processor 1001.

Those skilled in the art will appreciate that the configuration of the apparatus shown in fig. 1 is not intended to be limiting of the apparatus and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

As shown in fig. 1, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a program for supplementing metadata description.

In the apparatus shown in fig. 1, the network interface 1004 is mainly used for connecting a data user and communicating data with the data user; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call a program of the supplementary metadata description stored in the memory 1005 and perform the operations in the embodiments of the method of the supplementary metadata description described below.

Based on the above hardware structure, various embodiments of the method for supplementing metadata description of the present invention are proposed.

Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of a method for supplementary metadata description according to the present invention, the method including:

step S10, acquiring pre-stored full metadata from each source database;

The method for supplementing metadata description of this embodiment is applied to a metadata management system on a big data platform. The metadata management system runs on the above device for supplementing metadata description, and is mainly used for collecting metadata from source databases and providing the collected metadata to metadata applications for use by data users.

In the metadata management system, a total amount of metadata from each source database is stored in advance. Before the step S10, the method may include: pulling metadata from a preset source database; and storing the pulled metadata into a preset metadata resource library so as to obtain the pre-stored full metadata from each source database in the metadata resource library subsequently.

Referring to fig. 3, fig. 3 is a schematic diagram of the metadata management system pulling metadata from the source database according to an embodiment of the present invention. The metadata management system comprises a metadata collection module and a metadata resource library, and the source databases comprise big data platform databases (such as Hive, HBase and the like) and distributed traditional relational databases (such as MySQL, Oracle and the like). The metadata collection module can periodically pull metadata from the big data platform databases and the traditional relational databases through ETL (Extract-Transform-Load, the process of extracting data from a source end, transforming it and loading it to a destination end), and then store the pulled metadata in the metadata resource library.

It should be noted that the metadata pulled from the source database includes, but is not limited to, metadata describing database cluster information, database information, data table information, and field information of the source system (referring to the system corresponding to each source database). The database cluster information of the source system can include cluster codes, cluster types, data source types, located logic network areas, located physical network areas, IP addresses, network ports and the like; the database information of the source system can comprise database ID, database name, data source type, cluster code, library description information, development responsible person and the like; the data table information of the source system may include table ID, table name, library ID, library name, table description, development leader, modification time, etc.; the field information of the source system may include a field ID, a field name, a table name, a sequence number, a data type, a length, whether it may be empty, whether it is indexed, whether it is partitioned, a field description, and the like.
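As a minimal sketch of how the pulled field information might be held in a metadata resource library, the following Python example models a small subset of the field attributes listed above. The names `FieldMeta` and `MetadataRepository`, and the choice of an in-memory dictionary keyed by table name and field name, are assumptions made for illustration only and are not part of the described system.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class FieldMeta:
    """A subset of the field information pulled from a source database."""
    field_id: int
    field_name: str
    table_name: str
    data_type: str
    description: Optional[str] = None  # None means no description was created


class MetadataRepository:
    """Hypothetical in-memory stand-in for the metadata resource library."""

    def __init__(self) -> None:
        self._fields: Dict[Tuple[str, str], FieldMeta] = {}

    def store(self, pulled: Iterable[FieldMeta]) -> None:
        # A later pull of the same (table, field) pair overwrites the earlier one.
        for meta in pulled:
            self._fields[(meta.table_name, meta.field_name)] = meta

    def full_metadata(self) -> List[FieldMeta]:
        # Corresponds to acquiring the pre-stored full metadata in step S10.
        return list(self._fields.values())
```

Keying on the table name and field name keeps repeated periodic pulls idempotent in this sketch; a real resource library would more likely be a database table.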

In this embodiment, first, the pre-saved full amount of metadata derived from each source database is obtained, and specifically, the pre-saved full amount of metadata derived from each source database may be obtained from the metadata repository.

Step S20, carrying out data blood relationship analysis on the full-amount metadata to construct blood relationship maps among different data tables of the full-amount metadata;

The full metadata comprises a plurality of different data tables. In this step, data blood relationship analysis is performed on the full metadata, that is, the data relationships between the data tables are analyzed, including the source of the data in each table (upstream blood relationship) and the flow direction of the data (downstream blood relationship), and finally a blood relationship map between the different data tables of the full metadata is constructed.

Specifically, referring to fig. 4, fig. 4 is a schematic diagram of a refinement step of step S20 in fig. 2. The step S20 may further include:

step S21, obtaining the data table type of the full amount of metadata, wherein the data table type comprises a big data platform data table and a traditional relational data table;

step S22, calling a preset blood relationship hook function to obtain blood relationships between different big data platform data tables and blood relationships between the big data platform data tables and the traditional relational data tables;

and S23, calling a preset graph database engine, and constructing the acquired blood relationship between the different big data platform data tables and the blood relationship between the big data platform data table and the traditional relational data table into a blood relationship graph.

In a specific implementation, the data table types of the full amount of metadata are first obtained, and the data table types may include a big data platform data table derived from a big data platform database and a traditional relational data table derived from a traditional relational database.

Then, a preset blood relationship hook function (linkage Hook) is called to obtain the blood relationships between different big data platform data tables and the blood relationships between the big data platform data tables and the traditional relational data tables. At present, data flow processing is mainly implemented in three ways, namely Hive (a data warehouse tool based on Hadoop), Spark (a general-purpose engine for fast big data processing) and Sqoop (a tool for transferring data between Hadoop and relational database servers), so blood relationship capture can mainly be implemented by analyzing the execution processes of Hive SQL, Spark SQL and Sqoop:

a) Hive linkage Hook. The blood relationship hook function of Hive asynchronously captures the SQL statements executed by Hive and calls an independently implemented Hive execution behavior analysis API (Application Programming Interface) to obtain the SQL input data information, the SQL output data information and the associated task information;

b) Spark-SQL linkage Hook. The blood relationship hook function of Spark asynchronously captures the SQL statements executed by Spark-SQL and calls an independently implemented Spark SQL execution behavior analysis API to obtain the SQL input data information, the output data information and the associated task information;

c) Sqoop linkage Hook. The blood relationship hook function of Sqoop asynchronously captures the execution commands of Sqoop and parses their parameters to obtain the input data information, the output data information and the associated task information of each execution command.
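The parameter parsing performed by the Sqoop hook in item c) can be sketched as follows. This is only an illustration under stated assumptions: the function name `parse_sqoop_import`, the example command and the shape of the returned record are hypothetical, only `sqoop import` commands are handled, and the relational table given by `--table` is assumed to be the source while the Hive table given by `--hive-table` is assumed to be the target.

```python
import shlex
from typing import Any, Dict


def parse_sqoop_import(command: str) -> Dict[str, Any]:
    """Extract one blood relationship record from a captured 'sqoop import' command."""
    tokens = shlex.split(command)
    if tokens[:2] != ["sqoop", "import"]:
        raise ValueError("only 'sqoop import' commands are handled in this sketch")

    options: Dict[str, Any] = {}
    i = 2
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("--"):
            # An option carries a value unless the next token is another option.
            if i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
                options[token] = tokens[i + 1]
                i += 2
                continue
            options[token] = True
        i += 1

    return {
        "srcTable": options.get("--table"),       # relational source table
        "dstTable": options.get("--hive-table"),  # big data platform target table
        "extra_info": {"tool": "sqoop", "connect": options.get("--connect")},
    }
```

For a command such as `sqoop import --connect jdbc:mysql://db.example.com/crm --table customer --hive-import --hive-table ods.customer`, the sketch yields a record whose source table is `customer` and whose target table is `ods.customer`.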

The linkage Hooks corresponding to Hive and Spark-SQL are used for capturing the blood relationships between different big data platform data tables, which can be obtained through either or both of these two hook functions, and the Sqoop linkage Hook is used for capturing the blood relationships between the big data platform data tables and the traditional relational data tables. In this embodiment, after the blood relationships are obtained through the hook functions, a timing scheduling system of the big data platform may periodically trigger a data calculation component (MapReduce) of the big data platform to process the blood relationship data obtained each day, and the processed data is finally stored in a Hive table, where each piece of processed blood relationship information includes a source table, a target table and association information, and the record format may be: (srcTable [source table], dstTable [target table], extra_info [association information]).

And then, calling a preset graph database engine by the metadata management system, and constructing the acquired blood relationship between different big data platform data tables and the blood relationship between the big data platform data table and the traditional relational data table into a blood relationship graph. The construction process comprises the following steps: a target table of the blood relationship information can be used as a source table of other blood relationship information, so that a complete blood relationship chain is formed, different blood relationship chains form a complete blood relationship chart, namely a blood relationship map.
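The chaining of blood relationship records into a map can be illustrated with a short sketch. It assumes each record is a dictionary with the keys `srcTable` and `dstTable`, and it uses plain Python dictionaries in place of the preset graph database engine, so it shows only the chaining idea and not the engine itself. The function name `build_blood_relationship_map` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple


def build_blood_relationship_map(
    records: Iterable[Mapping[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (upstream, downstream) adjacency maps built from lineage records."""
    upstream: Dict[str, Set[str]] = defaultdict(set)
    downstream: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        src, dst = record["srcTable"], record["dstTable"]
        # The target table of one record may be the source table of another,
        # which is what links individual records into a chain.
        upstream[dst].add(src)
        downstream[src].add(dst)
    return dict(upstream), dict(downstream)
```

With records `table4 -> table2` and `table2 -> table1`, the upstream map links `table1` to `table2` and `table2` to `table4`, forming one chain of the map.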

Step S30, obtaining a target field to be supplemented and described from the full metadata, and determining a target table to which the target field belongs;

In this step, a target field to be supplementarily described is obtained from the full metadata. Specifically, the metadata management system may, periodically or upon receiving a capture instruction from a user, capture fields that have no description or whose description is garbled from the pre-stored full metadata as target fields to be supplementarily described, and then determine the data table to which each target field belongs as the target table.
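A minimal sketch of this capture step is given below. The source does not say how a garbled description is recognized, so the test used here (a description that contains the Unicode replacement character or consists only of question marks) is an assumed heuristic, and the function names and the tuple layout `(table, field, description)` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def needs_description(description: Optional[str]) -> bool:
    """True if the description is missing, blank, or looks garbled (assumed heuristic)."""
    if description is None or not description.strip():
        return True
    text = description.strip()
    # Assumption: U+FFFD or a run of '?' indicates an encoding failure.
    return "\ufffd" in text or set(text) <= {"?"}


def find_target_fields(
    fields: Iterable[Tuple[str, str, Optional[str]]],
) -> List[Tuple[str, str]]:
    """Return (target table, target field) pairs that need a supplementary description."""
    return [(table, name) for table, name, desc in fields if needs_description(desc)]
```

Each returned pair names the target field together with the table it belongs to, which is the target table used in the following step.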

Step S40, determining an upstream table of the target table according to the blood relationship map;

In this step, the upstream tables of the target table, i.e. all data tables that serve as data sources of the target table, are determined according to the constructed blood relationship map. The field metadata of a data table is closely related to that of its upstream tables, and the meaning of data generally does not change as the data flows between tables. Therefore, for field metadata that lacks a description, a supplementary description can be recommended by using the upstream blood relationship of the table.
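Determining the upstream tables can be sketched as a breadth-first traversal of the blood relationship map. The sketch assumes the map is available as a dictionary from each table to the set of its direct source tables. Whether only direct sources or all transitive sources count as upstream tables is not fixed by the description, so the optional `max_depth` parameter is an assumption, as is the function name `upstream_tables`.

```python
from collections import deque
from typing import Mapping, Optional, Set


def upstream_tables(
    upstream: Mapping[str, Set[str]],
    target: str,
    max_depth: Optional[int] = None,
) -> Set[str]:
    """Collect the source tables of `target`, following chains up to `max_depth` levels."""
    found: Set[str] = set()
    queue = deque([(target, 0)])
    while queue:
        table, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for parent in upstream.get(table, ()):
            # Skip tables already visited so that cyclic lineage cannot loop forever.
            if parent != target and parent not in found:
                found.add(parent)
                queue.append((parent, depth + 1))
    return found
```

Calling the function with `max_depth=1` restricts the result to direct source tables, while the default follows every chain back to its origin.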

And S50, analyzing the similarity between the metadata field in the upstream table and the target field, and performing supplementary description on the target field according to the similarity.

The upstream table generally includes several metadata fields, and in this step, the similarity between each metadata field in the upstream table and the target field is analyzed, and then the target field is subjected to supplementary description according to the similarity. For example, the field information of the field with the highest similarity to the target field in the upstream table may be obtained, and the field information is used as the supplementary description of the target field, or the field information of a plurality of fields with the similarity to the target field being greater than or equal to a preset threshold in the upstream table is obtained and recommended to the user, and the user selects one field information from the plurality of field information as the supplementary description of the target field, so that different supplementary description modes can be flexibly selected in specific implementation. The method for performing supplementary description on the target field through the field similarity has the advantages that the accuracy of supplementary description is high, developers do not need to participate in the description in the whole process, only final confirmation is needed, and the overall working efficiency is greatly improved.

This embodiment acquires pre-stored full metadata from each source database; performs data blood relationship analysis on the full metadata to construct a blood relationship map between different data tables of the full metadata; acquires a target field to be supplementarily described from the full metadata, and determines a target table to which the target field belongs; determines an upstream table of the target table according to the blood relationship map; and analyzes the similarity between the metadata fields in the upstream table and the target field, and performs supplementary description on the target field according to the similarity. Compared with the prior art, this embodiment automatically supplements the metadata description of the target field according to the blood relationship among different data tables, so that manual intervention is reduced, cost is reduced, and overall working efficiency is improved.

Further, based on the first embodiment of the method of supplementing a metadata description of the present invention, a second embodiment of the method of supplementing a metadata description of the present invention is proposed.

In this embodiment, the step of analyzing the similarity between the metadata field in the upstream table and the target field may include: carrying out deduplication processing on metadata fields in the upstream table; removing preset invalid fields from the metadata fields subjected to duplicate removal processing to obtain a metadata field set to be analyzed; and respectively calculating the similarity between each field in the metadata field set and the target field according to a preset algorithm.

In this embodiment, considering that the metadata fields in the upstream tables may be repeated, for example, one field may exist in multiple data tables at the same time, the metadata fields in the upstream tables may first be deduplicated in order to improve calculation efficiency. Specifically, referring to fig. 5, fig. 5 is a schematic diagram of a relationship between a target table and upstream tables in an embodiment of the present invention. Here, table1 is the target table, which contains the target field Cust_name, and table2 to table6 are upstream tables of table1. Both table4 and table6 include an ID field, so only one ID field is retained after deduplication.

Then, preset invalid fields, such as id, field, mark and other fields which are commonly used but have no actual business meaning, are removed from the metadata fields after the deduplication processing, and finally a metadata field set to be analyzed is obtained. And then, respectively calculating the similarity between each field in the metadata field set and the target field according to a preset algorithm.
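The deduplication and invalid-field removal described above can be sketched as a single pass over the upstream field names. The invalid-field list reuses the examples id, field and mark from the text. Comparing names case-insensitively is an assumption, and the names `INVALID_FIELDS` and `fields_to_analyze` are hypothetical.

```python
from typing import Iterable, List, Set

# Preset invalid fields taken from the examples in the text.
INVALID_FIELDS: Set[str] = {"id", "field", "mark"}


def fields_to_analyze(
    upstream_fields: Iterable[str],
    invalid: Set[str] = INVALID_FIELDS,
) -> List[str]:
    """Deduplicate upstream field names and drop preset invalid fields, keeping order."""
    seen: Set[str] = set()
    result: List[str] = []
    for name in upstream_fields:
        key = name.lower()  # assumption: names are compared case-insensitively
        if key in seen or key in invalid:
            continue
        seen.add(key)
        result.append(name)
    return result
```

The returned list corresponds to the metadata field set to be analyzed.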

In an embodiment, the step of separately calculating the similarity between each field in the metadata field set and the target field according to a preset algorithm may include: respectively calculating the minimum editing operation times required for converting each field in the metadata field set into the target field according to an editing distance algorithm; and calculating the similarity between each field in the metadata field set and the target field according to the minimum editing operation times.

Specifically, the similarity analysis algorithm adopts the Levenshtein distance (edit distance) algorithm. The edit distance between two character strings is the minimum number of editing operations required to convert one string into the other, where the editing operations include replacing one character with another, inserting one character, and deleting one character. The smaller the edit distance, the greater the similarity between the two strings. In this embodiment, the minimum number of editing operations required to convert each field in the metadata field set into the target field may be calculated according to the edit distance algorithm, and then the similarity between each field in the metadata field set and the target field may be calculated according to the minimum number of editing operations.

Certainly, in specific implementation, other algorithms may also be used to calculate the similarity between each field in the metadata field set and the target field, which is not limited in this embodiment.

Referring to fig. 6, fig. 6 is a schematic diagram illustrating an analysis result of similarity analysis performed on the target field Cust_name in the target table table1 of fig. 5. Through similarity analysis, the similarities between the fields in the metadata field set, including Cust_name, custname, c_name, prod_code and telephone, and the target field Cust_name are 1, 0.89, 0.67, 0.56, 0.22 and 0.11 in descending order.
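The edit distance calculation and its conversion into a similarity can be sketched as follows. The description does not state how the minimum number of editing operations is turned into a similarity value. The sketch assumes the normalization 1 - d / max(len(a), len(b)) together with case-insensitive comparison, which happens to reproduce values such as 0.89 for custname and 0.67 for c_name against Cust_name. Both choices, and the function names, are assumptions rather than the stated method.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character replacements, insertions and deletions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # replace, or keep when equal
            ))
        previous = current
    return previous[-1]


def similarity(field: str, target: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer name."""
    a, b = field.lower(), target.lower()  # assumption: case is ignored
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Under these assumptions, `similarity("custname", "Cust_name")` is 8/9, which rounds to 0.89, because only the underscore has to be inserted.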

And after the similarity is obtained through calculation, performing supplementary description on the target field according to the similarity.

In an embodiment, the step of performing supplementary description on the target field according to the similarity includes: determining fields with the highest similarity with the target fields from the metadata field set; and acquiring the field description information of the field with the highest similarity with the target field, and taking the field description information of the field with the highest similarity with the target field as the supplementary description of the target field.

For example, in fig. 6, the field Cust_name, which has the highest similarity to the target field Cust_name, may be taken, its field description information may be obtained from the corresponding upstream table, and that field description information may be used as the supplementary description of the target field Cust_name. In this way, fast matching of the supplementary description information is achieved.

In another embodiment, the step of performing a supplementary description on the target field according to the similarity includes: according to the sequence of the similarity from high to low, fields in the metadata field set are arranged to obtain a supplementary description selection list of the target field; when a selection instruction triggered by a user based on the supplementary description selection list is received, determining a field selected by the user; and acquiring field description information of the field selected by the user, and taking the field description information of the field selected by the user as the supplementary description of the target field.

For example, the fields in the metadata field set in fig. 6 are arranged in descending order of similarity to obtain the supplementary description selection list, which may then be presented to the front-end user. The user selects one field from the list, the field description information of the selected field is then obtained from the corresponding upstream table, and this field description information is used as the supplementary description of the target field. In this way, the user can flexibly select the supplementary description information, which improves the flexibility of supplementing metadata descriptions.
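The two supplementary description modes described above, automatic use of the most similar field and selection by the user from a ranked list, can be sketched as follows. The sketch takes already computed similarity scores as input. The function names, the optional `threshold` parameter and the example scores and descriptions are hypothetical.

```python
from typing import Dict, List, Mapping


def best_match(scores: Mapping[str, float]) -> str:
    """Field with the highest similarity to the target field."""
    return max(scores, key=lambda name: scores[name])


def selection_list(scores: Mapping[str, float], threshold: float = 0.0) -> List[str]:
    """Fields in descending order of similarity, optionally cut off at a threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]


def supplement(descriptions: Mapping[str, str], chosen_field: str) -> str:
    """Description of the chosen upstream field, used as the supplementary description."""
    return descriptions[chosen_field]


# Illustrative data only.
example_scores: Dict[str, float] = {
    "c_name": 0.67, "Cust_name": 1.0, "telephone": 0.11, "custname": 0.89,
}
example_descriptions: Dict[str, str] = {
    "Cust_name": "customer name", "custname": "name of customer",
    "c_name": "client name", "telephone": "phone number",
}
```

In the automatic mode, `supplement(example_descriptions, best_match(example_scores))` supplies the description. In the interactive mode, `selection_list(example_scores, threshold=0.5)` is shown to the user and the chosen field name is passed to `supplement`.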

The invention also provides a device for supplementing the metadata description. The device for supplementing metadata description in the embodiment of the invention comprises:

the building module is used for performing data blood relationship analysis on the full metadata so as to construct a blood relationship map between different data tables of the full metadata;

a determining module for determining an upstream table of the target table according to the blood relationship map;

Further, the apparatus for supplementing metadata description further comprises:

the pulling module is used for pulling the metadata from a preset source database;

and the storage module is used for storing the pulled metadata into a preset metadata resource library.
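The pulling and storage modules might look roughly like the sketch below. It assumes, purely for illustration, that both the source database and the metadata resource library are SQLite databases; the table name `field_metadata`, its columns, and the function names are hypothetical and not taken from the patent.

```python
import sqlite3


def pull_metadata(source: sqlite3.Connection) -> list:
    """Pull (table name, field name, field type) rows from a source database."""
    rows = []
    tables = source.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, col_type, *_rest in source.execute(
            f"PRAGMA table_info({table})"
        ):
            rows.append((table, name, col_type))
    return rows


def store_metadata(repo: sqlite3.Connection, rows: list) -> None:
    """Store pulled metadata; descriptions start empty and are filled in later."""
    repo.execute(
        "CREATE TABLE IF NOT EXISTS field_metadata ("
        "table_name TEXT, field_name TEXT, field_type TEXT, description TEXT, "
        "PRIMARY KEY (table_name, field_name))"
    )
    # INSERT OR IGNORE keeps any description already recorded for a field.
    repo.executemany(
        "INSERT OR IGNORE INTO field_metadata "
        "(table_name, field_name, field_type, description) VALUES (?, ?, ?, NULL)",
        rows,
    )
    repo.commit()


# Hypothetical source database with one table.
source_db = sqlite3.connect(":memory:")
source_db.execute("CREATE TABLE customer (Cust_id INTEGER, Cust_name TEXT)")
repository = sqlite3.connect(":memory:")

pulled = pull_metadata(source_db)
store_metadata(repository, pulled)
stored_count = repository.execute("SELECT COUNT(*) FROM field_metadata").fetchone()[0]
print(pulled, stored_count)
```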

Further, the building module comprises:

the acquisition unit is used for acquiring the data table types of the full metadata, wherein the data table types comprise a big data platform data table and a traditional relational data table;

the first calling unit is used for calling a preset blood relationship hook function to obtain the blood relationships between different big data platform data tables and the blood relationships between the big data platform data tables and the traditional relational data tables;

and the second calling unit is used for calling a preset graph database engine and constructing the acquired blood relationship between the different big data platform data tables and the blood relationship between the big data platform data table and the traditional relational data table into a blood relationship graph.
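The interplay of the two calling units can be sketched as below. This is a minimal sketch, not the patented implementation: an in-memory adjacency structure stands in for the graph database engine, `lineage_hook` is a hypothetical callback that reports which source tables a job read when it wrote a target table, and all table names are invented. Whether the upstream table means only direct parents or every ancestor is not assumed here, so both lookups are shown.

```python
from collections import defaultdict


class LineageGraph:
    """In-memory stand-in for a blood relationship map kept in a graph database."""

    def __init__(self):
        # Maps each table to the set of tables it was derived from.
        self._upstream = defaultdict(set)

    def add_edge(self, source_table: str, target_table: str) -> None:
        self._upstream[target_table].add(source_table)

    def direct_upstream(self, table: str) -> set:
        return set(self._upstream.get(table, ()))

    def all_upstream(self, table: str) -> set:
        """Every ancestor of the table, found by an iterative depth-first walk."""
        seen, stack = set(), [table]
        while stack:
            current = stack.pop()
            for parent in self._upstream.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


def lineage_hook(graph: LineageGraph, sources: list, target: str) -> None:
    """Hypothetical hook: record that `target` was written from `sources`."""
    for source in sources:
        graph.add_edge(source, target)


graph = LineageGraph()
# A relational table feeds a big data platform table ...
lineage_hook(graph, ["mysql.crm.customer"], "ods.customer")
# ... and big data platform tables feed another big data platform table.
lineage_hook(graph, ["ods.customer", "ods.account"], "dm.customer_summary")

print(graph.direct_upstream("dm.customer_summary"))
print(graph.all_upstream("dm.customer_summary"))
```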

Further, the supplementary description module is further configured to:

carrying out deduplication processing on metadata fields in the upstream table;
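A minimal sketch of the deduplication step is given below, assuming that fields are deduplicated by exact name across the upstream tables and that, when duplicates disagree, the first non-empty description is kept. Both rules, the function name, and the sample data are assumptions made for illustration.

```python
def dedupe_fields(upstream_tables: dict) -> dict:
    """Merge the fields of all upstream tables into one metadata field set.

    upstream_tables maps a table name to a list of (field name, description)
    pairs. The result maps each distinct field name to one description.
    """
    field_set = {}
    for fields in upstream_tables.values():
        for name, description in fields:
            # Keep the first occurrence, but fill in a missing description
            # if a later duplicate provides one.
            if name not in field_set or (not field_set[name] and description):
                field_set[name] = description
    return field_set


# Hypothetical upstream tables that share the field Cust_id.
upstream_tables = {
    "ods.customer": [("Cust_name", "Customer name"), ("Cust_id", "")],
    "ods.account": [("Cust_id", "Customer identifier"), ("Acct_no", "Account number")],
}
field_set = dedupe_fields(upstream_tables)
print(field_set)
```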

Further, the supplementary description module is further configured to:

according to the sequence of the similarity from high to low, fields in the metadata field set are arranged to obtain a supplementary description selection list of the target field;

For the operations executed by the above program modules, reference may be made to the embodiments of the method for supplementing metadata description of the present invention, which are not described herein again.

The invention also provides a storage medium.

The storage medium of the present invention has stored thereon a program for supplementing metadata description which, when executed by a processor, implements the steps of the method for supplementing metadata description as described above.

For the method implemented when the program for supplementing metadata description is executed on the processor, reference may be made to the embodiments of the method for supplementing metadata description of the present invention, which are not described herein again.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system comprising the element.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

From the description of the foregoing embodiments, it is clear to those skilled in the art that the methods of the foregoing embodiments may be implemented by software plus a necessary general-purpose hardware platform, and may of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the portion that contributes over the prior art, may be embodied in the form of a software product, which is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any equivalent structure or equivalent process modification made by using the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims

1. A method of supplementing a metadata description, the method comprising the steps of:

acquiring pre-stored full metadata from each source database;

analyzing the similarity between the metadata field in the upstream table and the target field, and performing supplementary description on the target field according to the similarity;

the step of performing data blood relationship analysis on the full metadata to construct a blood relationship map between different data tables of the full metadata comprises:

acquiring data table types of the full metadata, wherein the data table types comprise a big data platform data table and a traditional relational data table;

calling a preset blood relationship hook function to obtain blood relationship between different big data platform data tables and blood relationship between the big data platform data tables and the traditional relational data tables;

2. A method of supplementing a metadata description as recited in claim 1, wherein the step of acquiring pre-stored full metadata from each source database is preceded by the steps of:

pulling metadata from a preset source database;

and storing the pulled metadata into a preset metadata resource library.

3. A method of supplementing a metadata description as claimed in any one of claims 1 to 2 wherein said step of analysing the similarity of metadata fields in said upstream table to said target field comprises:

carrying out deduplication processing on metadata fields in the upstream table;

4. A method of supplementing metadata descriptions according to claim 3 wherein said step of separately calculating the similarity of each field in said set of metadata fields to said target field according to a predetermined algorithm comprises:

5. A method of supplementing a metadata description as recited in claim 3, wherein the step of supplementing the target field with a description based on the similarity includes:

6. A method of supplementing a metadata description as recited in claim 3, wherein the step of supplementing the target field with a description based on the similarity includes:

7. An apparatus for supplementing a metadata description, the apparatus comprising:

the supplementary description module is used for analyzing the similarity between the metadata field in the upstream table and the target field and carrying out supplementary description on the target field according to the similarity;

the building module comprises:

8. An apparatus for supplementing a metadata description, the apparatus comprising: a memory, a processor, and a program of supplemental metadata description stored on the memory and executable on the processor, the program of supplemental metadata description implementing the steps of the method of supplemental metadata description as claimed in any of claims 1 to 6 when executed by the processor.

9. A storage medium, characterized in that the storage medium has stored thereon a program of supplementary metadata description, which when executed by a processor implements the steps of the method of supplementary metadata description according to any one of claims 1 to 6.