CN109739894B - Method, device, equipment and storage medium for supplementing metadata description - Google Patents

Publication number
CN109739894B
Authority
CN
China
Prior art keywords
metadata
field
description
target
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910014080.9A
Other languages
Chinese (zh)
Other versions
CN109739894A (en
Inventor
司东华
陈文浩
周可
兰冲
杨东方
罗迎宣
沈蓉
王惠平
敖林
郭腾达
覃娆
张玲
张志杰
汪亚男
邱毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN201910014080.9A
Publication of CN109739894A
Application granted
Publication of CN109739894B

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for supplementing metadata description, which comprises the following steps: acquiring pre-stored full metadata from each source database; performing data blood relationship (lineage) analysis on the full metadata to construct a blood relationship map between the different data tables of the full metadata; acquiring a target field to be supplementally described from the full metadata, and determining a target table to which the target field belongs; determining an upstream table of the target table according to the blood relationship map; and analyzing the similarity between the metadata fields in the upstream table and the target field, and supplementing the description of the target field according to the similarity. The invention also discloses a device, equipment and a storage medium for supplementing the metadata description. The invention reduces manual intervention, lowers labor cost and improves overall working efficiency in the process of supplementing metadata descriptions.

Description

Method, device, equipment and storage medium for supplementing metadata description
Technical Field
The invention relates to the technical field of data governance, in particular to a method, a device, equipment and a storage medium for supplementing metadata description.
Background
Metadata is data that describes data. The field description information of metadata is information that describes the meaning of a field in a metadata database, and it helps data users understand the data. On a big data platform, field description information is sometimes not created, for various reasons, so that third parties cannot properly understand the data they use.
Existing schemes for supplementing metadata field descriptions rely on manual work by the relevant developers, and take two main forms: 1) Developers modify the metadata description information in the source database through a script, and the metadata management system then collects, processes and displays that information to data users. This scheme requires revising the table structure of the source database, which increases the risk of the development process, costs more and takes effect slowly. 2) A metadata management module is added to the metadata management system to support editing and modifying descriptions at the metadata level. This scheme requires no redevelopment and does not affect the existing source system databases, but every description is still supplemented manually, so labor cost is high and working efficiency is low.
Disclosure of Invention
The invention mainly aims to provide a method, a device, equipment and a storage medium for supplementing metadata description, and aims to reduce manual intervention, reduce labor cost and improve the overall working efficiency in the process of supplementing metadata description.
To achieve the above object, the present invention provides a method of supplementing a metadata description, the method of supplementing a metadata description comprising the steps of:
acquiring pre-stored full metadata from each source database;
performing data blood relationship analysis on the full metadata to construct a blood relationship map between different data tables of the full metadata;
acquiring a target field to be subjected to supplementary description from the full metadata, and determining a target table to which the target field belongs;
determining an upstream table of the target table according to the blood relationship map;
and analyzing the similarity between the metadata field in the upstream table and the target field, and performing supplementary description on the target field according to the similarity.
Preferably, before the step of obtaining the pre-saved full amount of metadata from each source database, the method further includes:
pulling metadata from a preset source database;
and storing the pulled metadata into a preset metadata resource library.
Preferably, the step of performing data blood relationship analysis on the full metadata to construct a blood relationship map between different data tables of the full metadata comprises:
acquiring a data table type of the full metadata, wherein the data table type comprises a big data platform data table and a traditional relational data table;
calling a preset blood relationship hook function to obtain blood relationship between different big data platform data tables and blood relationship between the big data platform data table and the traditional relational data table;
and calling a preset graph database engine, and constructing the acquired blood relationship between the different big data platform data tables and the blood relationship between the big data platform data table and the traditional relational data table into a blood relationship graph.
Preferably, the step of analyzing the similarity between the metadata field in the upstream table and the target field comprises:
carrying out deduplication processing on metadata fields in the upstream table;
removing preset invalid fields from the metadata fields subjected to duplicate removal processing to obtain a metadata field set to be analyzed;
and respectively calculating the similarity between each field in the metadata field set and the target field according to a preset algorithm.
Preferably, the step of respectively calculating the similarity between each field in the metadata field set and the target field according to a preset algorithm includes:
respectively calculating, according to an edit distance algorithm, the minimum number of editing operations required to convert each field in the metadata field set into the target field;
and calculating the similarity between each field in the metadata field set and the target field according to the minimum number of editing operations.
Preferably, the step of performing supplementary description on the target field according to the similarity includes:
determining fields with the highest similarity with the target fields from the metadata field set;
and acquiring the field description information of the field with the highest similarity with the target field, and taking the field description information of the field with the highest similarity with the target field as the supplementary description of the target field.
Preferably, the step of performing supplementary description on the target field according to the similarity includes:
arranging the fields in the metadata field set according to the sequence of the similarity from high to low to obtain a supplementary description selection list of the target field;
when a selection instruction triggered by a user based on the supplementary description selection list is received, determining a field selected by the user;
and acquiring the field description information of the field selected by the user, and taking the field description information of the field selected by the user as the supplementary description of the target field.
In addition, to achieve the above object, the present invention also provides an apparatus for supplementing a metadata description, including:
the first acquisition module is used for acquiring pre-stored full metadata from each source database;
the construction module is used for carrying out data blood relationship analysis on the full amount metadata so as to construct blood relationship maps among different data tables of the full amount metadata;
the second acquisition module is used for acquiring a target field to be subjected to supplementary description from the full metadata and determining a target table to which the target field belongs;
the determining module is used for determining an upstream table of the target table according to the blood relationship map;
and the supplementary description module is used for analyzing the similarity between the metadata field in the upstream table and the target field and carrying out supplementary description on the target field according to the similarity.
In addition, to achieve the above object, the present invention also provides an apparatus for supplementing metadata description, including: a memory, a processor and a program of supplementary metadata description stored on the memory and executable on the processor, which program of supplementary metadata description when executed by the processor implements the steps of the method of supplementary metadata description as described above.
Furthermore, to achieve the above object, the present invention also provides a storage medium having stored thereon a program of supplementary metadata description, which when executed by a processor, implements the steps of the method of supplementary metadata description as described above.
The invention acquires pre-stored full metadata from each source database; performs data blood relationship analysis on the full metadata to construct a blood relationship map between different data tables of the full metadata; acquires a target field to be supplementally described from the full metadata, and determines a target table to which the target field belongs; determines an upstream table of the target table according to the blood relationship map; and analyzes the similarity between the metadata fields in the upstream table and the target field, and supplements the description of the target field according to the similarity. Compared with the prior art, the invention automatically supplements the metadata description of the target field according to the blood relationships among different data tables, which reduces manual intervention, lowers cost and improves overall working efficiency.
Drawings
FIG. 1 is a schematic diagram of an apparatus architecture of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of a method for supplementing metadata descriptions according to the present invention;
FIG. 3 is a diagram illustrating a metadata management system pulling metadata from a source database according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a detailed step of step S20 in FIG. 2;
FIG. 5 is a diagram illustrating a relationship between a target table and an upstream table in an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating the result of the similarity analysis performed on the target field Cust_name in the target table table1 of FIG. 5.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
The device of the embodiment of the present invention that supplements the metadata description may be a PC or a server device.
As shown in fig. 1, the apparatus may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 enables communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the aforementioned processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not limit the apparatus, which may include more or fewer components than shown, combine some components, or arrange the components differently.
As shown in fig. 1, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a program for supplementing metadata description.
In the apparatus shown in fig. 1, the network interface 1004 is mainly used for connecting a data user and communicating data with the data user; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call a program of the supplementary metadata description stored in the memory 1005 and perform the operations in the embodiments of the method of the supplementary metadata description described below.
Based on the above hardware structure, various embodiments of the method for supplementing metadata description of the present invention are proposed.
Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of a method for supplementary metadata description according to the present invention, the method including:
step S10, acquiring pre-stored full metadata from each source database;
The method for supplementing metadata description of this embodiment is applied to a metadata management system on a big data platform. The metadata management system runs on the device for supplementing metadata description described above, and is mainly used for collecting metadata from source databases and providing the collected metadata to metadata applications for data users.
In the metadata management system, the full metadata from each source database is stored in advance. Before the step S10, the method may include: pulling metadata from a preset source database; and storing the pulled metadata into a preset metadata resource library, so that the pre-stored full metadata from each source database can subsequently be obtained from the metadata resource library.
Referring to fig. 3, fig. 3 is a schematic diagram of the metadata management system pulling metadata from the source databases according to an embodiment of the present invention. The metadata management system comprises a metadata collection module and a metadata resource library, and the source databases include big data platform databases (such as Hive, HBase and the like) and distributed traditional relational databases (such as MySQL, Oracle and the like). The metadata collection module can periodically pull metadata from the big data platform databases and the traditional relational databases through ETL (Extract-Transform-Load, i.e. the process of extracting, transforming and loading data from a source end to a destination end), and then store the pulled metadata in the metadata resource library.
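The pull-and-store step can be illustrated with a minimal sketch. It is an assumption-laden stand-in, not the patented implementation: an in-memory SQLite database plays the role of a source database, a second one plays the role of the metadata resource library, and the function names `pull_field_metadata` and `store_metadata` and the `field_metadata` table layout are hypothetical.

```python
import sqlite3

def pull_field_metadata(source):
    """Extract (table, field, sequence number, type, nullable) rows from a source database."""
    rows = []
    tables = source.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # The table name comes from sqlite_master, not from user input.
        for cid, name, dtype, notnull, _default, _pk in source.execute(
            f"PRAGMA table_info({table})"
        ):
            rows.append((table, name, cid + 1, dtype, int(not notnull)))
    return rows

def store_metadata(repository, rows):
    """Load the pulled rows into the (hypothetical) metadata resource library."""
    repository.execute(
        "CREATE TABLE IF NOT EXISTS field_metadata ("
        "table_name TEXT, field_name TEXT, sequence_number INTEGER, "
        "data_type TEXT, nullable INTEGER, description TEXT)"
    )
    # The description column starts empty; it is what later steps supplement.
    repository.executemany(
        "INSERT INTO field_metadata VALUES (?, ?, ?, ?, ?, NULL)", rows
    )
    repository.commit()

# Stand-in source database with one table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER NOT NULL, cust_name TEXT)")

repository = sqlite3.connect(":memory:")
store_metadata(repository, pull_field_metadata(source))
stored = repository.execute(
    "SELECT table_name, field_name, nullable FROM field_metadata "
    "ORDER BY sequence_number"
).fetchall()
```

In a real deployment the extraction queries would target the catalogs of Hive, HBase, MySQL or Oracle, and the pull would be triggered periodically by a scheduler.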
It should be noted that the metadata pulled from the source database includes, but is not limited to, metadata describing database cluster information, database information, data table information, and field information of the source system (referring to the system corresponding to each source database). The database cluster information of the source system can include cluster codes, cluster types, data source types, located logic network areas, located physical network areas, IP addresses, network ports and the like; the database information of the source system can comprise database ID, database name, data source type, cluster code, library description information, development responsible person and the like; the data table information of the source system may include table ID, table name, library ID, library name, table description, development leader, modification time, etc.; the field information of the source system may include a field ID, a field name, a table name, a sequence number, a data type, a length, whether it may be empty, whether it is indexed, whether it is partitioned, a field description, and the like.
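As an illustration of the field information listed above, the following sketch models one field-level metadata record. The class name and attribute names are hypothetical; only the set of attributes follows the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FieldMetadata:
    """One field-level metadata record; attribute names are illustrative."""
    field_id: int
    field_name: str
    table_name: str
    sequence_number: int
    data_type: str
    length: Optional[int]
    nullable: bool
    indexed: bool
    partitioned: bool
    description: Optional[str] = None  # the value this method aims to supplement

# A field whose description was never created.
example = FieldMetadata(
    field_id=1,
    field_name="Cust_name",
    table_name="table1",
    sequence_number=1,
    data_type="varchar",
    length=64,
    nullable=True,
    indexed=False,
    partitioned=False,
)
```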
In this embodiment, first, the pre-saved full amount of metadata derived from each source database is obtained, and specifically, the pre-saved full amount of metadata derived from each source database may be obtained from the metadata repository.
Step S20, carrying out data blood relationship analysis on the full-amount metadata to construct blood relationship maps among different data tables of the full-amount metadata;
The full metadata comprises a plurality of different data tables. In this step, data blood relationship analysis is performed on the full metadata, that is, the data relationships between the data tables are analyzed, including the source of the data in each table (upstream blood relationship) and the direction in which the data flows (downstream blood relationship), and finally a blood relationship map between the different data tables of the full metadata is constructed.
Specifically, referring to fig. 4, fig. 4 is a schematic diagram of a refinement step of step S20 in fig. 2. The step S20 may further include:
step S21, obtaining the data table type of the full amount of metadata, wherein the data table type comprises a big data platform data table and a traditional relational data table;
step S22, calling a preset blood relationship hook function to obtain the blood relationships between different big data platform data tables and the blood relationships between the big data platform data tables and the traditional relational data tables;
and S23, calling a preset graph database engine, and constructing the acquired blood relationship between the different big data platform data tables and the blood relationship between the big data platform data table and the traditional relational data table into a blood relationship graph.
In a specific implementation, the data table types of the full amount of metadata are first obtained, and the data table types may include a big data platform data table derived from a big data platform database and a traditional relational data table derived from a traditional relational database.
Then, a preset blood relationship hook function (linkage Hook) is called to obtain the blood relationships between different big data platform data tables and the blood relationships between the big data platform data tables and the traditional relational data tables. At present, data flow processing is mainly implemented in three ways, namely Hive (a data warehouse tool based on Hadoop), Spark (a general-purpose engine for fast big data processing) and Sqoop (a tool for transferring data between Hadoop and relational database servers), so blood relationships can mainly be captured by analyzing the execution of Hive SQL, Spark SQL and Sqoop:
a) Hive linkage Hook. The Hive blood relationship hook function asynchronously captures the SQL statements executed by Hive and calls a self-developed Hive execution behavior analysis API (Application Programming Interface) to obtain the SQL input data information, the SQL output data information and the associated task information;
b) Spark-SQL linkage Hook. The Spark blood relationship hook function asynchronously captures the SQL statements executed by Spark-SQL and calls a self-developed Spark SQL execution behavior analysis API to obtain the SQL input data information, output data information and associated task information;
c) Sqoop linkage Hook. The Sqoop blood relationship hook function asynchronously captures the commands executed by Sqoop and parses their parameters to obtain the input data information, output data information and associated task information of each command.
The linkage Hooks for Hive and Spark-SQL capture the blood relationships between different big data platform data tables, which can be obtained through either or both of these hook functions; the Sqoop linkage Hook captures the blood relationships between big data platform data tables and traditional relational data tables. In this embodiment, after the blood relationships are obtained through the hook functions, the timing scheduling system of the big data platform may periodically trigger the data calculation component (MapReduce) of the big data platform to process the blood relationship data collected each day, and the processed data is finally stored in a Hive table. Each processed blood relationship record contains a source table, a target table and associated information, and the record format may be: (srcTable [source table], dstTable [target table], extra_info [associated information]).
Then the metadata management system calls a preset graph database engine and constructs the acquired blood relationships between different big data platform data tables, and between the big data platform data tables and the traditional relational data tables, into a blood relationship map. The construction process is as follows: the target table of one blood relationship record can serve as the source table of another record, which forms a complete blood relationship chain, and the different chains together form the complete blood relationship map.
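As a rough illustration of the Sqoop case, the sketch below parses the arguments of a captured `sqoop import` command into a record of the form (srcTable, dstTable, extra_info). The function name `parse_sqoop_import`, the connection string and the table names are hypothetical, only the import direction is handled, and the asynchronous capture itself is not shown.

```python
import shlex

def parse_sqoop_import(command):
    """Turn a captured 'sqoop import' command line into one blood relationship record."""
    tokens = shlex.split(command)
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("--"):
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
            if has_value:
                options[token[2:]] = tokens[i + 1]
                i += 2
                continue
            options[token[2:]] = True  # flag without a value, e.g. --hive-import
        i += 1
    return {
        "srcTable": options.get("table"),        # relational source table
        "dstTable": options.get("hive-table"),   # big data platform target table
        "extra_info": {"tool": "sqoop", "connect": options.get("connect")},
    }

record = parse_sqoop_import(
    "sqoop import --connect jdbc:mysql://db.example.com/crm "
    "--table customer --hive-import --hive-table ods.customer"
)
```

The Hive and Spark-SQL hooks would produce records of the same shape, but would need a real SQL parser to identify the input and output tables of each statement.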
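The chaining of records into a map can be sketched as follows. The embodiment uses a graph database engine; this sketch substitutes plain in-memory dictionaries purely for illustration, and the function name `build_lineage_graph` and the sample records are assumptions.

```python
from collections import defaultdict

def build_lineage_graph(records):
    """Index records by target table (upstream view) and by source table (downstream view)."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for record in records:
        src, dst = record["srcTable"], record["dstTable"]
        upstream[dst].add(src)
        downstream[src].add(dst)
    return upstream, downstream

# table3 -> table2 -> table1: the target of one record is the source of another.
records = [
    {"srcTable": "table2", "dstTable": "table1", "extra_info": {}},
    {"srcTable": "table3", "dstTable": "table2", "extra_info": {}},
]
upstream, downstream = build_lineage_graph(records)
```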
Step S30, obtaining a target field to be supplemented and described from the full metadata, and determining a target table to which the target field belongs;
In this step, a target field to be supplementally described is obtained from the full metadata. Specifically, the metadata management system may, periodically or upon receiving a capture instruction from a user, capture fields that have no description or whose description is garbled from the pre-stored full metadata as target fields to be supplementally described, and then determine the data table to which each target field belongs as its target table.
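A minimal sketch of this capture step is shown below. The source does not say how garbled descriptions are recognized, so the `is_garbled` heuristic (a Unicode replacement character or a run of question marks) is purely an assumption, as are the function names and the sample data.

```python
def is_garbled(text):
    """Hypothetical heuristic: typical traces of a failed character decoding."""
    return "\ufffd" in text or "??" in text

def find_target_fields(fields):
    """Return (table, field) pairs whose description is missing, blank or garbled."""
    targets = []
    for field in fields:
        description = field.get("description")
        if description is None or not description.strip() or is_garbled(description):
            targets.append((field["table_name"], field["field_name"]))
    return targets

fields = [
    {"table_name": "table1", "field_name": "Cust_name", "description": None},
    {"table_name": "table1", "field_name": "prod_code", "description": "product code"},
    {"table_name": "table1", "field_name": "telephone", "description": "\ufffd\ufffd\ufffd"},
]
targets = find_target_fields(fields)
```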
Step S40, determining an upstream table of the target table according to the blood relationship map;
In this step, the upstream tables of the target table, i.e. all data tables that are data sources of the target table, are determined according to the constructed blood relationship map. Field metadata is closely related to the upstream blood relationship of its data table, and data flowing between tables generally does not change its meaning. Therefore, for field metadata that lacks a description, a description can be recommended by using the upstream blood relationship.
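Finding the upstream tables amounts to a graph traversal. The sketch below assumes the map is available as a dictionary from each table to its direct source tables and collects all direct and indirect sources with a breadth-first search. The edges are hypothetical; the text only states that table2 to table6 are upstream of table1 in fig. 5.

```python
from collections import deque

def upstream_tables(upstream, target):
    """Collect every table reachable from the target by following source-table edges."""
    found = set()
    queue = deque([target])
    while queue:
        table = queue.popleft()
        for parent in upstream.get(table, ()):
            # Skip tables already seen so that cyclic data flows terminate.
            if parent != target and parent not in found:
                found.add(parent)
                queue.append(parent)
    return found

# Hypothetical edges: table -> set of its direct source tables.
upstream = {
    "table1": {"table2", "table3"},
    "table2": {"table4"},
    "table3": {"table5", "table6"},
}
result = upstream_tables(upstream, "table1")
```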
And S50, analyzing the similarity between the metadata field in the upstream table and the target field, and performing supplementary description on the target field according to the similarity.
The upstream table generally includes several metadata fields. In this step, the similarity between each metadata field in the upstream table and the target field is analyzed, and the target field is then supplementally described according to the similarity. For example, the field description information of the field in the upstream table with the highest similarity to the target field may be obtained and used as the supplementary description of the target field. Alternatively, the field description information of the fields in the upstream table whose similarity to the target field is greater than or equal to a preset threshold may be obtained and recommended to the user, who selects one of them as the supplementary description of the target field. Different supplementary description modes can thus be chosen flexibly in a specific implementation. Supplementing the description of the target field through field similarity gives high accuracy, and developers do not need to write the description themselves but only give final confirmation, which greatly improves overall working efficiency.
This embodiment acquires the pre-stored full metadata from each source database; performs data blood relationship analysis on the full metadata to construct a blood relationship map between different data tables of the full metadata; acquires a target field to be supplementally described from the full metadata, and determines a target table to which the target field belongs; determines an upstream table of the target table according to the blood relationship map; and analyzes the similarity between the metadata fields in the upstream table and the target field, and supplements the description of the target field according to the similarity. Compared with the prior art, this embodiment automatically supplements the metadata description of the target field according to the blood relationships among different data tables, which reduces manual intervention, lowers cost and improves overall working efficiency.
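The threshold-based recommendation mode can be sketched as a simple filter. The function name `recommend_fields`, the threshold value 0.5 and the scores are illustrative assumptions; the source does not specify a threshold.

```python
def recommend_fields(scores, threshold):
    """Keep fields whose similarity reaches the threshold, most similar first."""
    kept = [(name, score) for name, score in scores.items() if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]

# Illustrative similarity scores of upstream fields against one target field.
scores = {"custname": 0.89, "c_name": 0.67, "prod_code": 0.22}
recommended = recommend_fields(scores, threshold=0.5)
```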
Further, based on the first embodiment of the method of supplementing a metadata description of the present invention, a second embodiment of the method of supplementing a metadata description of the present invention is proposed.
In this embodiment, the step of analyzing the similarity between the metadata field in the upstream table and the target field may include: carrying out deduplication processing on metadata fields in the upstream table; removing preset invalid fields from the metadata fields subjected to duplicate removal processing to obtain a metadata field set to be analyzed; and respectively calculating the similarity between each field in the metadata field set and the target field according to a preset algorithm.
In this embodiment, the metadata fields in the upstream tables may be repeated, for example one field may exist in multiple data tables at the same time, so the metadata fields in the upstream tables may first be deduplicated to improve calculation efficiency. Specifically, referring to fig. 5, fig. 5 is a schematic diagram of the relationship between a target table and its upstream tables in an embodiment of the present invention. Here table1 is the target table, which contains the target field Cust_name; table2 to table6 are upstream tables of table1; table4 and table6 both include an ID field, so only one ID field is retained after deduplication.
Then, preset invalid fields, such as id, field, mark and other fields which are commonly used but have no actual business meaning, are removed from the metadata fields after the deduplication processing, and finally a metadata field set to be analyzed is obtained. And then, respectively calculating the similarity between each field in the metadata field set and the target field according to a preset algorithm.
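The two preprocessing steps above can be sketched together. The invalid field names come from the text, while comparing names case-insensitively (so that "ID" and "id" are treated alike) and the sample table contents are assumptions.

```python
# Preset invalid fields named in the text; matched case-insensitively (an assumption).
INVALID_FIELDS = {"id", "field", "mark"}

def candidate_fields(upstream_fields):
    """Deduplicate (table, field) pairs by field name, then drop the invalid fields."""
    seen = set()
    candidates = []
    for table, name in upstream_fields:
        key = name.lower()
        if key in seen:
            continue  # the same field already appeared in another upstream table
        seen.add(key)
        if key not in INVALID_FIELDS:
            candidates.append((table, name))
    return candidates

upstream_fields = [
    ("table4", "ID"),
    ("table6", "ID"),
    ("table2", "custname"),
    ("table3", "custname"),
    ("table5", "prod_code"),
]
candidates = candidate_fields(upstream_fields)
```

Keeping the table name alongside each surviving field makes it possible to look up the field description in the corresponding upstream table later.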
In an embodiment, the step of separately calculating the similarity between each field in the metadata field set and the target field according to a preset algorithm may include: respectively calculating the minimum editing operation times required for converting each field in the metadata field set into the target field according to an editing distance algorithm; and calculating the similarity between each field in the metadata field set and the target field according to the minimum editing operation times.
Specifically, the similarity analysis may adopt the Levenshtein distance (edit distance) algorithm. The edit distance between two character strings is the minimum number of editing operations required to convert one string into the other, where the permitted editing operations are replacing one character with another, inserting one character, and deleting one character. The smaller the edit distance, the greater the similarity between the two strings. In this embodiment, the minimum number of editing operations required to convert each field in the metadata field set into the target field may be calculated according to the edit distance algorithm, and the similarity between each field in the metadata field set and the target field may then be calculated from that minimum number of editing operations.
Certainly, in specific implementation, other algorithms may also be used to calculate the similarity between each field in the metadata field set and the target field, which is not limited in this embodiment.
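The edit-distance approach can be sketched as follows. The source does not state how the number of editing operations is converted into a similarity, so the normalization used here, 1 - distance / (length of the longer name), with case-insensitive comparison, is an assumption; it does happen to agree with several of the values quoted for fig. 6.

```python
def levenshtein(a, b):
    """Minimum number of single-character substitutions, insertions and deletions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

def similarity(field, target):
    """Assumed normalization: 1 - distance / longer length, ignoring case."""
    field, target = field.lower(), target.lower()
    longest = max(len(field), len(target))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(field, target) / longest

for name in ["Cust_name", "custname", "c_name", "prod_code"]:
    print(name, round(similarity(name, "Cust_name"), 2))
```

Other normalizations, or other string-similarity measures altogether, would fit the same interface.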
Referring to fig. 6, fig. 6 is a schematic diagram illustrating the result of the similarity analysis performed on the target field Cust_name in the target table table1 of fig. 5. Through similarity analysis, the similarities between the fields in the metadata field set (Cust_name, custname, c_name, prod_code and telephone) and the target field Cust_name are 1, 0.89, 0.67, 0.56, 0.22 and 0.11, respectively.
And after the similarity is obtained through calculation, performing supplementary description on the target field according to the similarity.
In an embodiment, the step of performing supplementary description on the target field according to the similarity includes: determining fields with the highest similarity with the target fields from the metadata field set; and acquiring the field description information of the field with the highest similarity with the target field, and taking the field description information of the field with the highest similarity with the target field as the supplementary description of the target field.
For example, in fig. 6, the field Cust_name, which has the highest similarity to the target field Cust_name, may be selected; its field description information is obtained from the corresponding upstream table and used as the supplementary description of the target field Cust_name. This approach matches supplementary description information quickly.
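A minimal sketch of this automatic mode is given below. The similarity scores are illustrative, and the description texts and the function name `supplement_with_best_match` are hypothetical.

```python
def supplement_with_best_match(scores, descriptions):
    """Pick the most similar upstream field and return its name and description."""
    best_field = max(scores, key=scores.get)
    return best_field, descriptions.get(best_field)

# Illustrative scores against the target field Cust_name.
scores = {"Cust_name": 1.0, "custname": 0.89, "c_name": 0.67, "prod_code": 0.22}
# Hypothetical field descriptions stored in the upstream tables.
descriptions = {
    "Cust_name": "customer name",
    "custname": "name of the customer",
    "c_name": "contact name",
    "prod_code": "product code",
}
best_field, supplement = supplement_with_best_match(scores, descriptions)
```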
In another embodiment, the step of performing supplementary description on the target field according to the similarity includes: arranging the fields in the metadata field set in descending order of similarity to obtain a supplementary description selection list of the target field; when a selection instruction triggered by a user based on the supplementary description selection list is received, determining the field selected by the user; and acquiring the field description information of the field selected by the user and taking it as the supplementary description of the target field.
For example, the fields in the metadata field set in fig. 6 are arranged in descending order of similarity, and the resulting supplementary description selection list may be presented to the front-end user. The user selects one field from the list; the field description information of the selected field is then obtained from the corresponding upstream table and used as the supplementary description of the target field. In this way, the user can flexibly select the supplementary description information, which improves the flexibility of supplementing metadata descriptions.
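The selection-list embodiment can be sketched in the same style. Here the user's selection instruction is modeled as a list index, and the description strings are hypothetical; how the list is actually presented to the front end and how the selection instruction is delivered are not specified in the text.

```python
from typing import Dict, List


def build_selection_list(similarities: Dict[str, float]) -> List[str]:
    """Arrange the candidate fields in descending order of similarity."""
    return sorted(similarities, key=similarities.get, reverse=True)


def apply_user_choice(
    selection_list: List[str],
    chosen_index: int,
    descriptions: Dict[str, str],
) -> str:
    """Return the description of the field the user picked from the list."""
    chosen_field = selection_list[chosen_index]
    return descriptions[chosen_field]


if __name__ == "__main__":
    scores = {"prod_code": 0.22, "cust_name": 1.0, "telephone": 0.11,
              "custname": 0.89, "c_name": 0.67}
    # Hypothetical description information from the upstream tables.
    upstream_descriptions = {"cust_name": "Customer name",
                             "custname": "Name of the customer"}
    candidates = build_selection_list(scores)
    print(candidates)
    # Suppose the user selects the second entry in the list.
    print(apply_user_choice(candidates, 1, upstream_descriptions))
```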
The invention also provides a device for supplementing the metadata description. The device for supplementing metadata description in the embodiment of the invention comprises:
the first acquisition module is used for acquiring pre-stored full metadata from each source database;
the construction module is used for carrying out data blood relationship analysis on the full-amount metadata so as to construct blood relationship maps among different data tables of the full-amount metadata;
the second acquisition module is used for acquiring a target field to be subjected to supplementary description from the full metadata and determining a target table to which the target field belongs;
a determining module for determining an upstream table of the target table according to the blood relationship map;
and the supplementary description module is used for analyzing the similarity between the metadata field in the upstream table and the target field and carrying out supplementary description on the target field according to the similarity.
Further, the apparatus for supplementing metadata description further comprises:
the pulling module is used for pulling the metadata from a preset source database;
and the storage module is used for storing the pulled metadata into a preset metadata resource library.
Further, the building module comprises:
the acquisition unit is used for acquiring the data table types of the full metadata, wherein the data table types comprise a big data platform data table and a traditional relational data table;
the first calling unit is used for calling a preset blood relationship hook function to obtain the blood relationships among different big data platform data tables and the blood relationships between the big data platform data tables and the traditional relational data tables;
and the second calling unit is used for calling a preset graph database engine and constructing the acquired blood relationship between the different big data platform data tables and the blood relationship between the big data platform data table and the traditional relational data table into a blood relationship graph.
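For illustration only, the role of the blood relationship map in locating upstream tables can be sketched with a plain in-memory structure. This is not the preset hook function or the graph database engine named above: the sketch swaps in a Python dictionary, takes the blood relationships as ready-made (upstream, downstream) pairs, and returns only direct upstream tables, all of which are assumptions. The table names other than table1 are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_lineage_map(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index (upstream_table, downstream_table) pairs by downstream table."""
    upstream_of: Dict[str, Set[str]] = defaultdict(set)
    for upstream, downstream in edges:
        upstream_of[downstream].add(upstream)
    return upstream_of


def upstream_tables(upstream_of: Dict[str, Set[str]], target_table: str) -> List[str]:
    """Return the direct upstream tables of the target table, sorted by name."""
    return sorted(upstream_of.get(target_table, ()))


if __name__ == "__main__":
    # Hypothetical blood relationships: two source tables feed table1.
    lineage = build_lineage_map([
        ("src_customer", "table1"),
        ("src_product", "table1"),
        ("table1", "report_daily"),
    ])
    print(upstream_tables(lineage, "table1"))
```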
Further, the supplementary description module is further configured to:
carrying out deduplication processing on metadata fields in the upstream table;
removing preset invalid fields from the metadata fields subjected to duplicate removal processing to obtain a metadata field set to be analyzed;
and respectively calculating the similarity between each field in the metadata field set and the target field according to a preset algorithm.
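The three operations above (deduplication, removal of preset invalid fields, and per-field similarity calculation) can be sketched as follows. The contents of the invalid-field set are hypothetical, and the similarity function is passed in as a parameter so that the sketch does not commit to any particular preset algorithm.

```python
from typing import Callable, Dict, Iterable, List, Set


def prepare_field_set(upstream_fields: Iterable[str], invalid_fields: Set[str]) -> List[str]:
    """Deduplicate the upstream fields, then drop the preset invalid ones.

    The first-seen order of the remaining fields is preserved.
    """
    seen: Set[str] = set()
    result: List[str] = []
    for name in upstream_fields:
        if name in seen or name in invalid_fields:
            continue
        seen.add(name)
        result.append(name)
    return result


def score_fields(
    fields: Iterable[str],
    target: str,
    similarity: Callable[[str, str], float],
) -> Dict[str, float]:
    """Calculate the similarity between each candidate field and the target field."""
    return {name: similarity(name, target) for name in fields}


if __name__ == "__main__":
    # Hypothetical upstream fields; "id" stands in for a preset invalid field.
    fields = prepare_field_set(
        ["cust_name", "id", "custname", "cust_name", "telephone"],
        invalid_fields={"id"},
    )
    print(fields)
    # Placeholder similarity: exact match only (a real system would use a
    # preset algorithm such as one based on edit distance).
    print(score_fields(fields, "cust_name", lambda f, t: 1.0 if f == t else 0.0))
```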
Further, the supplementary description module is further configured to:
respectively calculating the minimum editing operation times required for converting each field in the metadata field set into the target field according to an editing distance algorithm;
and calculating the similarity between each field in the metadata field set and the target field according to the minimum editing operation times.
Further, the supplementary description module is further configured to:
determining, from the metadata field set, the field with the highest similarity to the target field;
and acquiring the field description information of the field with the highest similarity with the target field, and taking the field description information of the field with the highest similarity with the target field as the supplementary description of the target field.
Further, the supplementary description module is further configured to:
according to the sequence of the similarity from high to low, fields in the metadata field set are arranged to obtain a supplementary description selection list of the target field;
when a selection instruction triggered by a user based on the supplementary description selection list is received, determining a field selected by the user;
and acquiring the field description information of the field selected by the user, and taking the field description information of the field selected by the user as the supplementary description of the target field.
For the operations executed by the above program modules, reference may be made to the embodiments of the method for supplementing metadata description of the present invention, which are not described herein again.
The invention also provides a storage medium.
The storage medium of the present invention stores a program for supplementing metadata description which, when executed by a processor, implements the steps of the method for supplementing metadata description as described above.
For the method implemented when the program for supplementing metadata description is executed by the processor, reference may be made to the embodiments of the method for supplementing metadata description of the present invention, which are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solution of the present invention or the portions contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A method of supplementing a metadata description, the method comprising the steps of:
acquiring pre-stored full metadata from each source database;
performing data blood relationship analysis on the full metadata to construct a blood relationship map between different data tables of the full metadata;
acquiring a target field to be subjected to supplementary description from the full metadata, and determining a target table to which the target field belongs;
determining an upstream table of the target table according to the blood relationship map;
analyzing the similarity between the metadata field in the upstream table and the target field, and performing supplementary description on the target field according to the similarity;
the step of performing data blood relationship analysis on the full metadata to construct a blood relationship map between different data tables of the full metadata comprises:
acquiring data table types of the full amount of metadata, wherein the data table types comprise a big data platform data table and a traditional relational data table;
calling a preset blood relationship hook function to obtain blood relationship between different big data platform data tables and blood relationship between the big data platform data tables and the traditional relational data tables;
and calling a preset graph database engine, and constructing the acquired blood relationship between the different big data platform data tables and the blood relationship between the big data platform data table and the traditional relational data table into a blood relationship graph.
2. A method for supplementing metadata descriptions as recited in claim 1 wherein the step of obtaining a pre-saved full amount of metadata from each source database is preceded by the steps of:
pulling metadata from a preset source database;
and storing the pulled metadata into a preset metadata resource library.
3. A method of supplementing a metadata description as claimed in any one of claims 1 to 2 wherein said step of analysing the similarity of metadata fields in said upstream table to said target field comprises:
carrying out deduplication processing on metadata fields in the upstream table;
removing preset invalid fields from the metadata fields subjected to duplicate removal processing to obtain a metadata field set to be analyzed;
and respectively calculating the similarity between each field in the metadata field set and the target field according to a preset algorithm.
4. A method of supplementing metadata descriptions according to claim 3 wherein said step of separately calculating the similarity of each field in said set of metadata fields to said target field according to a predetermined algorithm comprises:
respectively calculating the minimum editing operation times required for converting each field in the metadata field set into the target field according to an editing distance algorithm;
and calculating the similarity between each field in the metadata field set and the target field according to the minimum editing operation times.
5. A method of supplementing a metadata description as recited in claim 3, wherein the step of supplementing the target field with a description based on the similarity includes:
determining, from the metadata field set, the field with the highest similarity to the target field;
and acquiring the field description information of the field with the highest similarity with the target field, and taking the field description information of the field with the highest similarity with the target field as the supplementary description of the target field.
6. A method of supplementing a metadata description as recited in claim 3, wherein the step of supplementing the target field with a description based on the similarity includes:
arranging the fields in the metadata field set according to the sequence of the similarity from high to low to obtain a supplementary description selection list of the target field;
when a selection instruction triggered by a user based on the supplementary description selection list is received, determining a field selected by the user;
and acquiring the field description information of the field selected by the user, and taking the field description information of the field selected by the user as the supplementary description of the target field.
7. An apparatus for supplementing a metadata description, the apparatus comprising:
the first acquisition module is used for acquiring pre-stored full metadata from each source database;
the construction module is used for carrying out data blood relationship analysis on the full amount metadata so as to construct blood relationship maps among different data tables of the full amount metadata;
the second acquisition module is used for acquiring a target field to be subjected to supplementary description from the full metadata and determining a target table to which the target field belongs;
a determining module for determining an upstream table of the target table according to the blood relationship map;
the supplementary description module is used for analyzing the similarity between the metadata field in the upstream table and the target field and carrying out supplementary description on the target field according to the similarity;
the building module comprises:
the acquisition unit is used for acquiring the data table types of the full metadata, wherein the data table types comprise a big data platform data table and a traditional relational data table;
the first calling unit is used for calling a preset blood relationship hook function to obtain the blood relationships among different big data platform data tables and the blood relationships between the big data platform data tables and the traditional relational data tables;
and the second calling unit is used for calling a preset graph database engine and constructing the acquired blood relationship between the different big data platform data tables and the blood relationship between the big data platform data table and the traditional relational data table into a blood relationship graph.
8. An apparatus for supplementing a metadata description, the apparatus comprising: a memory, a processor, and a program of supplemental metadata description stored on the memory and executable on the processor, the program of supplemental metadata description implementing the steps of the method of supplemental metadata description as claimed in any of claims 1 to 6 when executed by the processor.
9. A storage medium, characterized in that the storage medium has stored thereon a program of supplementary metadata description, which when executed by a processor implements the steps of the method of supplementary metadata description according to any one of claims 1 to 6.
CN201910014080.9A 2019-01-04 2019-01-04 Method, device, equipment and storage medium for supplementing metadata description Active CN109739894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910014080.9A CN109739894B (en) 2019-01-04 2019-01-04 Method, device, equipment and storage medium for supplementing metadata description

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910014080.9A CN109739894B (en) 2019-01-04 2019-01-04 Method, device, equipment and storage medium for supplementing metadata description

Publications (2)

Publication Number Publication Date
CN109739894A CN109739894A (en) 2019-05-10
CN109739894B true CN109739894B (en) 2022-12-09

Family

ID=66363782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910014080.9A Active CN109739894B (en) 2019-01-04 2019-01-04 Method, device, equipment and storage medium for supplementing metadata description

Country Status (1)

Country Link
CN (1) CN109739894B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347882A (en) * 2019-06-27 2019-10-18 北京明略软件***有限公司 Consanguinity analysis method and device, storage medium and the electronic device of data
CN110442604B (en) * 2019-07-11 2022-03-11 新华三大数据技术有限公司 Data flow direction query method, data flow direction extraction method, data flow direction processing method and related devices
CN110399423A (en) * 2019-07-24 2019-11-01 北京明略软件***有限公司 Processing method and processing device, storage medium and the electronic device of metadata genetic connection
CN110647564B (en) * 2019-08-14 2023-11-24 中国平安财产保险股份有限公司 Hive table building method, electronic device and computer readable storage medium
CN112395287A (en) * 2019-08-19 2021-02-23 北京国双科技有限公司 Table classification method, table creation method, device, equipment and medium
CN110827934B (en) * 2019-08-19 2023-03-28 医渡云(北京)技术有限公司 CRF (conditional random access memory) monitoring method and device
CN111046242B (en) * 2019-11-27 2023-09-26 支付宝(杭州)信息技术有限公司 Data processing method, device, equipment and medium
CN112988698A (en) * 2019-12-02 2021-06-18 阿里巴巴集团控股有限公司 Data processing method and device
CN110968592B (en) * 2019-12-06 2023-11-21 深圳前海环融联易信息科技服务有限公司 Metadata acquisition method, metadata acquisition device, computer equipment and computer readable storage medium
CN111026736B (en) * 2019-12-13 2024-03-12 中盈优创资讯科技有限公司 Data blood margin management method and device and data blood margin analysis method and device
CN111125229B (en) * 2019-12-24 2024-06-28 杭州数梦工场科技有限公司 Data blood edge generation method and device and electronic equipment
CN111694858A (en) * 2020-04-28 2020-09-22 平安科技(深圳)有限公司 Data blood margin analysis method, device, equipment and computer readable storage medium
CN111782629B (en) * 2020-06-23 2024-05-17 京东科技控股股份有限公司 Feature processing script generation method and device
CN111488363B (en) * 2020-06-28 2020-10-02 平安国际智慧城市科技股份有限公司 Data processing method, device, electronic equipment and medium
CN112100201B (en) * 2020-09-30 2024-02-06 东莞盟大集团有限公司 Data monitoring method, device, equipment and storage medium based on big data technology
CN112328599A (en) * 2020-11-12 2021-02-05 杭州数梦工场科技有限公司 Metadata-based field blood relationship analysis method and device
CN114691786A (en) * 2020-12-30 2022-07-01 中兴通讯股份有限公司 Method and device for determining data blood relationship, storage medium and electronic device
CN112711591B (en) * 2020-12-31 2021-10-08 天云融创数据科技(北京)有限公司 Data blood margin determination method and device based on field level of knowledge graph
CN114840563B (en) * 2021-02-01 2024-05-03 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for generating field description information
CN112817984B (en) * 2021-02-22 2023-10-20 杭州数梦工场科技有限公司 Data processing method and device, and data source acquisition method and device
CN113111636B (en) * 2021-05-17 2024-04-12 京东科技控股股份有限公司 Data uniqueness standard identification method and device
CN113626438B (en) * 2021-08-12 2024-04-30 深圳平安智汇企业信息管理有限公司 Data table management method, device, computer equipment and storage medium
CN116484084B (en) * 2023-06-21 2023-11-17 广州信安数据有限公司 Metadata blood-margin analysis method, medium and system based on application information mining

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104169912A (en) * 2012-03-27 2014-11-26 株式会社东芝 Information processing terminal and method, and information management apparatus and method
CN104424269A (en) * 2013-08-30 2015-03-18 中国电信股份有限公司 Data linage analysis method and device
CN105787051A (en) * 2016-02-26 2016-07-20 广州品唯软件有限公司 Analysis method and device based on metadata model
CN106709024A (en) * 2016-12-28 2017-05-24 深圳市华傲数据技术有限公司 Data table source-tracing method and device based on consanguinity analysis
CN108197182A (en) * 2017-12-25 2018-06-22 百味云科技股份有限公司 A kind of data atlas analysis system and method

Also Published As

Publication number Publication date
CN109739894A (en) 2019-05-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant