CN117370356A - Method and related device for mapping metadata by data standard - Google Patents


Publication number
CN117370356A
Authority
CN
China
Prior art keywords
metadata
standard
mapped
metadata information
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311372494.1A
Other languages
Chinese (zh)
Inventor
王夏霖
祁祥
陆俊健
张瑞珏
翟玉月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Boc Financial Technology Suzhou Co ltd
Original Assignee
Boc Financial Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Boc Financial Technology Suzhou Co ltd filed Critical Boc Financial Technology Suzhou Co ltd
Priority to CN202311372494.1A priority Critical patent/CN117370356A/en
Publication of CN117370356A publication Critical patent/CN117370356A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G06F16/287 Visualization; Browsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method and a related device for mapping metadata to a data standard, which can be applied to the field of big data or the field of finance. When standard mapping is required for a target field, a data standard to be mapped is generated for the target field, and metadata information of each field stored in a campus service database is pulled from a metadata management system. Metadata information related to the data standard to be mapped is screened out as candidate metadata information. The matching degree between the data standard to be mapped and the candidate metadata information is calculated from the standard Chinese name and standard English name contained in the data standard to be mapped and the metadata Chinese name and metadata English name contained in the candidate metadata information. A mapping relation between the data standard to be mapped and target metadata information is then obtained according to the matching degree, so that compliance verification can be performed on the target metadata information. The method and the device perform the matching and mapping automatically, which saves labor cost and improves the efficiency and accuracy of matching and mapping.

Description

Method and related device for mapping metadata by data standard
Technical Field
The present disclosure relates to the field of data mapping technologies, and in particular, to a method and an apparatus for mapping metadata with data standards.
Background
At present, the data under each field generated by each department of a school is stored in a campus service database, and the metadata information of each field in the campus service database is stored in a metadata management system. Because the data application systems used by the departments are not uniform, the data servers, data storage modes and data output forms used by these systems differ, so the data of the departments is not uniform.
To unify the data of the departments, compliance verification must be performed on the metadata information stored in the metadata management system. In the existing process, once a data standard has been established, the metadata information that needs to be associated and mapped is pulled manually from the metadata management system; the data standard is then manually matched and mapped to the metadata information to obtain the mapping relation between the data standard and the metadata information of each field; finally, compliance verification is performed on each piece of metadata that has a mapping relation with the data standard.
However, because of factors such as the operator's expertise and working experience, the mapping results produced by different operators may deviate from one another, so manual mapping is time-consuming and labor-intensive and has a large mapping error.
Disclosure of Invention
In view of this, the present application provides a method and a related device for mapping metadata to a data standard, which address the time and effort consumed and the large mapping error caused by manually mapping data standards to metadata in the prior art. The technical scheme is as follows:
A method of mapping metadata for a data standard, comprising:
when standard mapping is required to be carried out on the target field, generating a data standard to be mapped for the target field, wherein the data standard to be mapped comprises a standard Chinese name and a standard English name of the target field;
metadata information of each field stored in the campus service database is pulled from the metadata management system;
screening metadata information related to a data standard to be mapped from the metadata information of each field, wherein the screened metadata information is used as candidate metadata information, and the candidate metadata information comprises a metadata Chinese name and a metadata English name of a target field;
calculating the matching degree of the data standard to be mapped and the candidate metadata information according to the standard Chinese name and the standard English name contained in the data standard to be mapped and the metadata Chinese name and the metadata English name contained in the candidate metadata information;
and obtaining a mapping relation between the data standard to be mapped and the target metadata information according to the matching degree of the data standard to be mapped and the candidate metadata information so as to carry out compliance verification on the target metadata information, wherein the matching degree between the data standard to be mapped and the target metadata information is larger than a preset matching degree threshold.
Optionally, calculating the matching degree of the data standard to be mapped and the candidate metadata information according to the standard Chinese name and the standard English name contained in the data standard to be mapped and the metadata Chinese name and the metadata English name contained in the candidate metadata information includes:
splicing a standard Chinese name and a standard English name contained in a data standard to be mapped into a first character string;
splicing the Chinese names of the metadata and the English names of the metadata contained in the candidate metadata information into a second character string;
inputting the first character string and the second character string into a short text semantic matching model which is built in advance and is based on a long-short-time memory neural network, and obtaining the matching degree output by the short text semantic matching model as the matching degree of the data standard to be mapped and the candidate metadata information.
Optionally, pulling metadata information of each field stored in the campus service database from the metadata management system includes:
periodically generating a first metadata acquisition request, and sending the first metadata acquisition request to the metadata management system through a remote call interface so as to pull metadata information of each field stored in the campus service database from the metadata management system;
and/or,
and responding to the data pulling operation instruction of the user, generating a second metadata acquisition request, and sending the second metadata acquisition request to the metadata management system through a remote call interface so as to pull metadata information of each field stored in the campus service database from the metadata management system.
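The two trigger paths above (a periodically generated request and a request generated on a user's instruction) can be sketched as follows. This is a minimal sketch under stated assumptions: `fetch_metadata` is a hypothetical stand-in for the remote call interface of the metadata management system, and the class name, request labels and record layout are illustrative only, since the source does not specify any of them.

```python
import threading
from typing import Callable, Dict, List, Optional

Metadata = Dict[str, str]


class MetadataPuller:
    """Pulls field metadata on a timer and/or on user request."""

    def __init__(self, fetch_metadata: Callable[[str], List[Metadata]],
                 interval_s: float) -> None:
        # fetch_metadata is a hypothetical stand-in for the remote call interface.
        self._fetch = fetch_metadata
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None
        self.latest: List[Metadata] = []

    def _pull(self, request_kind: str) -> List[Metadata]:
        # Both paths send their request through the same interface.
        with self._lock:
            self.latest = self._fetch(request_kind)
            return self.latest

    def start_periodic(self) -> None:
        """First request type: generated periodically."""
        def tick() -> None:
            self._pull("periodic")
            self.start_periodic()  # re-arm the timer for the next period

        self._timer = threading.Timer(self._interval_s, tick)
        self._timer.daemon = True
        self._timer.start()

    def stop(self) -> None:
        if self._timer is not None:
            self._timer.cancel()

    def pull_on_demand(self) -> List[Metadata]:
        """Second request type: generated in response to a user instruction."""
        return self._pull("on_demand")


def fake_fetch(request_kind: str) -> List[Metadata]:
    # Illustrative data only; a real system would call the remote interface here.
    return [{"cname": "学生姓名", "ename": "student_name", "request": request_kind}]


puller = MetadataPuller(fake_fetch, interval_s=3600.0)
records = puller.pull_on_demand()
```

Keeping both paths behind one `_pull` helper means the periodic and on-demand requests refresh the same cached result, which is one simple way to realise the "and/or" combination.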
Optionally, the method further comprises:
displaying the metadata information of each pulled field on a first preset page in the form of a pull-down list;
wherein screening metadata information related to the data standard to be mapped from the metadata information of each field comprises:
and responding to a metadata checking operation instruction of a user on a first preset page, and screening metadata information related to the data standard to be mapped from the metadata information of each field.
Optionally, after generating the data standard to be mapped for the target field, before filtering metadata information related to the data standard to be mapped from metadata information of each field, the method further includes:
approving the data standard to be mapped, and storing the data standard to be mapped into a preset data standard set after the approval is passed, wherein the data standards stored in the data standard set are of the same type.
Optionally, the mapping relationship includes a plurality of sub-mapping relationships, where the sub-mapping relationship refers to a mapping relationship between the data standard to be mapped and one target metadata information;
the method for mapping metadata by the data standard further comprises the following steps:
storing the mapping relation into a pre-established mapping relation table, and displaying the mapping relation table on a second preset page, wherein the mapping relation table comprises a standard Chinese name field, a standard English name field, a metadata Chinese name field, a metadata English name field and a matching degree field;
and responding to a deleting operation instruction of the user on a second preset page, and deleting the sub-mapping relation pointed by the deleting operation instruction from the mapping relation table.
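The mapping relation table and the deletion of a sub-mapping relation can be illustrated with a small in-memory structure. This is only a sketch: the class names, the use of a row index to identify the sub-mapping a delete instruction points at, and the sample values are all assumptions, not details given in the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SubMapping:
    """One row: the data standard mapped to one target metadata record."""
    standard_cname: str
    standard_ename: str
    metadata_cname: str
    metadata_ename: str
    matching_degree: float


class MappingTable:
    """Holds the five fields named in the text for every sub-mapping."""

    def __init__(self) -> None:
        self._rows: List[SubMapping] = []

    def add(self, row: SubMapping) -> None:
        self._rows.append(row)

    def delete(self, row_index: int) -> SubMapping:
        # Remove the sub-mapping that the user's delete instruction points at.
        return self._rows.pop(row_index)

    def rows(self) -> List[SubMapping]:
        # Return a copy so callers cannot mutate the table by accident.
        return list(self._rows)


# Hypothetical sample rows for one data standard.
table = MappingTable()
table.add(SubMapping("学生姓名", "student_name", "学生姓名", "stu_name", 0.9))
table.add(SubMapping("学生姓名", "student_name", "姓名", "name", 0.7))
removed = table.delete(1)
```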
Optionally, the data standard to be mapped further includes one or more pieces of information related to the target field: standard number, field type, and field length.
An apparatus for mapping metadata to data standards, comprising:
the standard generation module is used for generating a data standard to be mapped for the target field when standard mapping is required to be carried out on the target field, wherein the data standard to be mapped comprises a standard Chinese name and a standard English name of the target field;
the metadata pulling module is used for pulling metadata information of each field stored in the campus service database from the metadata management system;
the metadata screening module is used for screening metadata information related to the data standard to be mapped from the metadata information of each field, and the screened metadata information is used as candidate metadata information, wherein the candidate metadata information comprises a metadata Chinese name and a metadata English name of a target field;
the matching module is used for calculating the matching degree of the data standard to be mapped and the candidate metadata information according to the standard Chinese name and the standard English name contained in the data standard to be mapped and the metadata Chinese name and the metadata English name contained in the candidate metadata information;
and the mapping module is used for obtaining a mapping relation between the data standard to be mapped and the target metadata information according to the matching degree of the data standard to be mapped and the candidate metadata information so as to carry out compliance verification on the target metadata information, wherein the matching degree between the data standard to be mapped and the target metadata information is larger than a preset matching degree threshold value.
An electronic device includes a memory and a processor;
a memory for storing a program;
a processor for executing a program to perform the steps of the method of mapping metadata for data standards as described in any one of the above.
A readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of a method of mapping metadata for a data standard as in any of the above.
According to the above technical scheme, when standard mapping is required for a target field, the method generates a data standard to be mapped for the target field and pulls the metadata information of each field stored in the campus service database from the metadata management system. Metadata information related to the data standard to be mapped is screened from the metadata information of each field and used as candidate metadata information. The matching degree between the data standard to be mapped and the candidate metadata information is calculated according to the standard Chinese name and the standard English name contained in the data standard to be mapped and the metadata Chinese name and the metadata English name contained in the candidate metadata information, and the mapping relation between the data standard to be mapped and the target metadata information is obtained according to this matching degree, so that compliance verification can be performed on the target metadata information. The metadata information can therefore be screened automatically, and the screened candidate metadata information can be matched and mapped to the data standard to be mapped automatically. The whole mapping and matching process requires no manual participation, which saves labor cost and improves the efficiency and accuracy of matching and mapping.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flowchart of a method for mapping metadata to a data standard according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the training of the LSTM-based short text semantic matching model;
FIG. 3 is a schematic diagram of the structure of the LSTM-based short text semantic matching model;
FIG. 4 is a schematic structural diagram of an apparatus for mapping metadata to a data standard according to an embodiment of the present application;
FIG. 5 is a block diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The application provides a method and a related device for mapping metadata of data standards, and the method and the related device for mapping metadata of data standards provided by the application are described in detail through the following embodiments.
It should be noted that, the method and the related device for mapping metadata according to the data standard provided by the present application may be applied to the big data field or the financial field. The foregoing is merely an example, and is not intended to limit the application fields of the method for mapping metadata and related apparatuses for data standards provided in the present application.
It should be further noted that, metadata information related to the present application (including, but not limited to, metadata information analyzed by a user, stored metadata information, displayed metadata information, etc.) is information and data authorized by the user or fully authorized by each party, and the collection, use and processing of related data is required to comply with related laws and regulations and standards of related countries and regions.
Referring to FIG. 1, which shows a flowchart of a method for mapping metadata to a data standard according to an embodiment of the present application, the method may include:
Step S101, when standard mapping is needed for the target field, a data standard to be mapped is generated for the target field.
Because the data application systems used by the departments of a school are not uniform, the data servers, data storage modes and data output forms used by these systems differ, so the metadata information of the fields stored in the campus service database (which may be a relational database or a non-relational database, and is not particularly limited in the present application) takes different forms. For example, two data tables, a student score table and a student card balance table, are stored in the campus service database; the student score table is shown in Table 1 below, and the student card balance table is shown in Table 2 below.
Table 1 Student score table
Id | Student number | Student name | Course | Score
1 | 203150321 | Zhang San | Advanced Mathematics | 85
2 | 203150322 | Li Si | Advanced Mathematics | 83
3 | 203150323 | Wang Wu | Business English | 77
4 | 203150324 | Zhao Liu | Advanced Mathematics | 95
Table 2 Student card balance table
Id | Student number | Name | Balance | Last consumption time
1 | 203150321 | Zhang San | 500.00 | 2023-3-15
2 | 203150322 | Li Si | 450.00 | 2023-3-14
3 | 203150323 | Wang Wu | 600.00 | 2023-3-13
4 | 203150324 | Zhao Liu | 800.50 | 2023-3-15
As shown in Tables 1 and 2, each data table contains a plurality of fields, and the same field may appear in more than one table. However, the inventors have found that even for the same field, the metadata information is represented differently in different data tables. For example, the metadata information of the field "student name" in Tables 1 and 2 is shown below.
The metadata information of the field "student name" in Table 1 is:
The metadata information of the field "student name" in Table 2 is:
Because the metadata information of the fields in each data table takes different forms, a data standard may first be generated for each field, and the data standard may then be matched and mapped to the metadata information of each field to facilitate subsequent compliance verification.
For convenience of description, the present embodiment is described taking a target field as an example, and when standard mapping needs to be performed on the target field, a data standard to be mapped for the target field may be generated.
In this embodiment of the present application, the data standard to be mapped includes a plurality of attributes related to the target field, where the plurality of attributes includes at least a standard Chinese name and a standard English name, that is, the data standard to be mapped includes at least the standard Chinese name of the target field and the standard English name of the target field.
Optionally, the data standard to be mapped may further include at least one of the following information related to the target field: standard number, field type, and field length.
Of course, the data standard to be mapped may also include other information, for example, identification information indicating whether the target field is unique.
For example, when the target field is a student name, the data standard to be mapped generated for the student name is:
In one possible implementation, the present embodiment may pre-configure the plurality of attributes included in each data standard; for example, the configured attributes are shown in Table 3 below.
Table 3 Attributes contained in data standards
Sequence number | Attribute code | Attribute name | Field type | Field length | Unique
1 | Code | Standard number | Character | 200 | Yes
2 | Cname | Standard Chinese name | Character | 200 | No
3 | Ename | Standard English name | Character | 200 | No
4
Then, the user may configure the target field based on Table 3, and the embodiment of the present application may generate the data standard to be mapped in response to the user's configuration instruction.
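As a rough illustration of generating a data standard from a user configuration, the sketch below encodes the three attribute rows that are legible in Table 3 and validates a configuration against them. The function name, the uniqueness check via a `taken` lookup, and the sample values (including the standard number `STD-0001`) are hypothetical; the source does not give the generated standard for "student name".

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class AttributeSpec:
    code: str
    name: str
    field_type: type
    max_length: int
    unique: bool


# Rows 1 to 3 of Table 3 (character type, length 200).
ATTRIBUTE_SPECS: List[AttributeSpec] = [
    AttributeSpec("Code", "Standard number", str, 200, True),
    AttributeSpec("Cname", "Standard Chinese name", str, 200, False),
    AttributeSpec("Ename", "Standard English name", str, 200, False),
]


def generate_data_standard(config: Dict[str, str],
                           taken: Dict[str, Set[str]]) -> Dict[str, str]:
    """Validate a user configuration and return the data standard to be mapped."""
    standard: Dict[str, str] = {}
    for spec in ATTRIBUTE_SPECS:
        if spec.code not in config:
            raise ValueError(f"missing attribute: {spec.code}")
        value = config[spec.code]
        if not isinstance(value, spec.field_type):
            raise TypeError(f"{spec.code} must be {spec.field_type.__name__}")
        if len(value) > spec.max_length:
            raise ValueError(f"{spec.code} exceeds {spec.max_length} characters")
        # Attributes marked unique must not repeat a value already in use.
        if spec.unique and value in taken.get(spec.code, set()):
            raise ValueError(f"{spec.code} value already used: {value}")
        standard[spec.code] = value
    return standard


# Hypothetical configuration for the target field "student name".
standard = generate_data_standard(
    {"Code": "STD-0001", "Cname": "学生姓名", "Ename": "student_name"},
    taken={"Code": {"STD-0000"}},
)
```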
Step S102, metadata information of each field stored in the campus service database is pulled from the metadata management system.
As described in the background art, metadata information of each field stored in the campus service database is stored in the metadata management system, and in order to perform matching mapping on the metadata information and the data standard to be mapped, the metadata information can be pulled from the metadata management system through this step.
It should be noted that, the present embodiment does not limit the execution sequence of the present step and the previous step, that is, the present step may be executed before the previous step or may be executed after the previous step.
It should be further noted that, as in the example in the previous step, the metadata information of each field in each data table includes at least the metadata Chinese name and the metadata English name of the field. Optionally, the metadata information further includes: metadata number, data format, data length, data source, etc.
Step S103, screening metadata information related to the data standard to be mapped from the metadata information of each field, wherein the screened metadata information is used as candidate metadata information.
As described in step S101, the student score table and the student card balance table stored in the campus service database each include a plurality of fields, and each field in each data table has corresponding metadata information. If the metadata information of every field in every data table were matched and mapped against the data standard to be mapped, the amount of calculation would be huge. To obtain the matching and mapping result faster, this step screens the metadata information related to the data standard to be mapped, that is, the metadata information related to the target field, from the metadata information of each field.
For convenience of subsequent description, the screened metadata information is referred to as candidate metadata information. Corresponding to the "metadata information" described above, the candidate metadata information includes at least the metadata Chinese name and the metadata English name of the target field.
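The source does not state an automatic screening criterion (one embodiment screens in response to a user's checking operation on a page). Purely as an illustration of a cheap pre-filter that narrows the records before matching degrees are computed, the sketch below keeps a record when its English name shares a word token with the standard's English name, or when one Chinese name contains the other. The criterion, function names and sample records are all assumptions.

```python
import re
from typing import Dict, List, Set

Record = Dict[str, str]


def _tokens(ename: str) -> Set[str]:
    # Split an English name such as "student_name" into lowercase word tokens.
    return {t for t in re.split(r"[_\W]+", ename.lower()) if t}


def is_related(standard: Record, metadata: Record) -> bool:
    """Assumed relatedness test: shared English token or Chinese-name containment."""
    shares_token = bool(_tokens(standard["ename"]) & _tokens(metadata["ename"]))
    std_cn, meta_cn = standard["cname"], metadata["cname"]
    cn_contained = bool(std_cn and meta_cn) and (std_cn in meta_cn or meta_cn in std_cn)
    return shares_token or cn_contained


def screen_candidates(standard: Record, all_metadata: List[Record]) -> List[Record]:
    return [m for m in all_metadata if is_related(standard, m)]


# Hypothetical data standard and pulled metadata records.
standard = {"cname": "学生姓名", "ename": "student_name"}
pulled = [
    {"cname": "学生姓名", "ename": "stu_name"},
    {"cname": "姓名", "ename": "name"},
    {"cname": "余额", "ename": "balance"},
    {"cname": "成绩", "ename": "score"},
]
candidates = screen_candidates(standard, pulled)
```

A deliberately loose filter like this only reduces the number of records passed to the matching step; the matching degree threshold remains what decides the final mapping.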
Step S104, calculating the matching degree of the data standard to be mapped and the candidate metadata information according to the standard Chinese name and the standard English name contained in the data standard to be mapped and the metadata Chinese name and the metadata English name contained in the candidate metadata information.
It should be noted that when a plurality of pieces of candidate metadata information are screened out in the previous step, the matching degree between the data standard to be mapped and each piece of candidate metadata information needs to be calculated according to the standard Chinese name and the standard English name included in the data standard to be mapped and the metadata Chinese name and the metadata English name included in that piece of candidate metadata information.
Step S105, according to the matching degree of the data standard to be mapped and the candidate metadata information, the mapping relation between the data standard to be mapped and the target metadata information is obtained, so as to perform compliance verification on the target metadata information.
Here, the matching degree between the data standard to be mapped and the target metadata information is greater than a preset matching degree threshold, that is, the target metadata information refers to candidate metadata information whose matching degree with the data standard to be mapped is greater than the preset matching degree threshold.
For example, suppose the candidate metadata information includes metadata information 1 to 10, whose matching degrees with the data standard to be mapped are 0.8, 0.5, 0.7, 0.8, 0.9, 0.3, 0.1, 0.7, 0.6 and 0.9, respectively. If the preset matching degree threshold is 0.5, the target metadata information is metadata information 1, 3, 4, 5, 8, 9 and 10 (metadata information 2 is excluded because its matching degree of 0.5 is not greater than the threshold). The mapping relation obtained in this step therefore includes seven sub-mapping relations, one between the data standard to be mapped and each of metadata information 1, 3, 4, 5, 8, 9 and 10. Compliance verification may subsequently be performed on metadata information 1, 3, 4, 5, 8, 9 and 10 based on the mapping relation obtained in this step.
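The threshold example above can be reproduced in a few lines. The function name is illustrative; the matching degrees and the threshold are the ones given in the text, and the comparison is strict because the target metadata must have a matching degree greater than the threshold.

```python
from typing import Dict, List


def select_targets(degrees: Dict[int, float], threshold: float) -> List[int]:
    """Return the ids of candidates whose matching degree exceeds the threshold."""
    return [meta_id for meta_id, degree in degrees.items() if degree > threshold]


# Matching degrees of metadata information 1 to 10 from the example.
degrees = {1: 0.8, 2: 0.5, 3: 0.7, 4: 0.8, 5: 0.9,
           6: 0.3, 7: 0.1, 8: 0.7, 9: 0.6, 10: 0.9}

targets = select_targets(degrees, threshold=0.5)
# targets == [1, 3, 4, 5, 8, 9, 10]; id 2 is dropped because 0.5 is not > 0.5
```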
According to the method for mapping metadata to a data standard provided by the present application, when standard mapping is required for a target field, a data standard to be mapped is generated for the target field, and the metadata information of each field stored in the campus service database is pulled from the metadata management system. Metadata information related to the data standard to be mapped is screened from the metadata information of each field and used as candidate metadata information. The matching degree between the data standard to be mapped and the candidate metadata information is calculated according to the standard Chinese name and the standard English name contained in the data standard to be mapped and the metadata Chinese name and the metadata English name contained in the candidate metadata information, and the mapping relation between the data standard to be mapped and the target metadata information is obtained according to this matching degree, so that compliance verification can be performed on the target metadata information. The metadata information can therefore be screened automatically, and the screened candidate metadata information can be matched and mapped to the data standard to be mapped automatically. The whole mapping and matching process requires no manual participation, which saves labor cost and improves the efficiency and accuracy of matching and mapping.
Some embodiments of the present application describe the process, in step S104, of calculating the matching degree between the data standard to be mapped and the candidate metadata information according to the standard Chinese name and the standard English name included in the data standard to be mapped and the metadata Chinese name and the metadata English name included in the candidate metadata information.
The present embodiment can be implemented in a variety of ways, including but not limited to the following.
In the first manner, the matching degree between the data standard to be mapped and the candidate metadata information is calculated through a preset matching degree algorithm, and the matching degree algorithm may include a maximum-minimum method, an arithmetic average-minimum method, a geometric average-minimum method, a correlation coefficient method, an exponential method and the like, which are not particularly limited herein.
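Three of the listed algorithms have well-known closed forms as similarity coefficients over non-negative feature vectors x and y: max-min is Σmin(x, y) / Σmax(x, y), arithmetic average-minimum is Σmin(x, y) / (½ Σ(x + y)), and geometric average-minimum is Σmin(x, y) / Σ√(x·y). The source does not say how names are turned into vectors; the sketch below assumes character-count vectors, which is only one possible choice.

```python
import math
from collections import Counter
from typing import List, Tuple


def _vectors(a: str, b: str) -> Tuple[List[int], List[int]]:
    # Assumption: represent each string by its character counts over a shared alphabet.
    ca, cb = Counter(a), Counter(b)
    keys = sorted(set(ca) | set(cb))
    return [ca[k] for k in keys], [cb[k] for k in keys]


def max_min(a: str, b: str) -> float:
    x, y = _vectors(a, b)
    denom = sum(max(p, q) for p, q in zip(x, y))
    return sum(min(p, q) for p, q in zip(x, y)) / denom if denom else 0.0


def arithmetic_mean_min(a: str, b: str) -> float:
    x, y = _vectors(a, b)
    denom = 0.5 * sum(p + q for p, q in zip(x, y))
    return sum(min(p, q) for p, q in zip(x, y)) / denom if denom else 0.0


def geometric_mean_min(a: str, b: str) -> float:
    x, y = _vectors(a, b)
    denom = sum(math.sqrt(p * q) for p, q in zip(x, y))
    return sum(min(p, q) for p, q in zip(x, y)) / denom if denom else 0.0


same = max_min("student_name", "student_name")
disjoint = max_min("abc", "xyz")
partial = max_min("name", "names")
```

With character counts, `"name"` and `"names"` share four of five counted characters, so the max-min coefficient is 4/5. Note that the geometric variant ignores characters present in only one string, so it can return 1.0 for non-identical inputs.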
The second way is implemented by a model.
In this embodiment, a short text semantic matching model based on a long-short-term memory neural network (LSTM) is provided, which can learn semantic features and time sequence features in texts by using LSTM units, fully measure similarity between different texts by combining context information, and finally obtain a matching degree calculation result of two input texts.
The LSTM-based short text semantic matching model is trained with training samples and sample labels. Each training sample consists of a first training character string, spliced from the standard Chinese name and the standard English name contained in a training data standard, and a second training character string, spliced from the metadata Chinese name and the metadata English name contained in the corresponding candidate metadata information; the sample label is the annotated matching degree.
Optionally, the training environment is shown in Table 4.
Table 4 Model training environment
The training environments described above are merely examples and are not limiting on the embodiments of the present application.
When training the model, the number of training epochs was set to 200 and the initial learning rate of the model was set to 0.0001. To prevent overfitting during training, the learning rate was reduced by a factor of 10 every 20 epochs as training progressed. The specific model training parameters are shown in Table 5 below.
Table 5 model training parameters
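The learning-rate schedule described above can be illustrated with a short sketch. It assumes that "attenuated 10 times every 20 periods" means a division by 10 at every 20th epoch (a step decay); the constant and function names are hypothetical.

```python
INITIAL_LR = 1e-4    # initial learning rate stated in the text
DECAY_EVERY = 20     # epochs between two decays
DECAY_FACTOR = 0.1   # assumption: "attenuated 10 times" means multiply by 0.1
TOTAL_EPOCHS = 200   # number of training epochs stated in the text


def learning_rate(epoch: int) -> float:
    """Return the step-decayed learning rate for a zero-based epoch index."""
    return INITIAL_LR * DECAY_FACTOR ** (epoch // DECAY_EVERY)


# Learning rate at the start of each 20-epoch stage.
schedule = [learning_rate(e) for e in range(0, TOTAL_EPOCHS, DECAY_EVERY)]
```

Under this reading the rate stays at 0.0001 for epochs 0 to 19, drops to 0.00001 at epoch 20, and so on.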
Referring to FIG. 2, which is a schematic diagram of the training of the LSTM-based short text semantic matching model provided in this embodiment: as shown in FIG. 2, the accuracy of the model reaches its highest value when the number of iterations is about 125. The model parameters at this point are saved as the optimal model parameters of this embodiment, and the model at this point is the trained model of this embodiment.
After training the model, the embodiment of the application can splice the standard Chinese name and the standard English name contained in the data standard to be mapped into a first character string, splice the metadata Chinese name and the metadata English name contained in the candidate metadata information into a second character string, and then input the first character string and the second character string into a pre-established short text semantic matching model based on LSTM to obtain the matching degree output by the short text semantic matching model, wherein the matching degree is used as the matching degree of the data standard to be mapped and the candidate metadata information.
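The splicing and scoring steps just described can be sketched as follows. The dictionary keys, the space separator and the example names are hypothetical, and the default scorer is only a stand-in based on `difflib.SequenceMatcher`; it is not the LSTM model of this embodiment, which would be supplied through the `scorer` parameter.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict


def build_text(chinese_name: str, english_name: str, sep: str = " ") -> str:
    """Splice a Chinese name and an English name into one character string."""
    return f"{chinese_name}{sep}{english_name}"


def stand_in_scorer(first: str, second: str) -> float:
    """Stand-in for the trained model: a character-level ratio in [0, 1]."""
    return SequenceMatcher(None, first, second).ratio()


def matching_degree(
    standard: Dict[str, str],
    metadata: Dict[str, str],
    scorer: Callable[[str, str], float] = stand_in_scorer,
) -> float:
    """Build the first and second character strings and score them."""
    first = build_text(standard["cn_name"], standard["en_name"])
    second = build_text(metadata["cn_name"], metadata["en_name"])
    return scorer(first, second)


# Hypothetical example values, not taken from the source tables.
standard = {"cn_name": "学生姓名", "en_name": "student_name"}
candidate = {"cn_name": "姓名", "en_name": "name"}
degree = matching_degree(standard, candidate)
```

Keeping the scorer as a parameter lets the same splicing code be reused whichever matching manner is chosen.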
Optionally, the matching degree may be a value between 0 and 1, where the closer the matching degree is to 1, the better the data standard to be mapped matches the candidate metadata information.
Optionally, as shown in FIG. 3, the LSTM-based short text semantic matching model includes a character vector acquisition module, a word semantic extraction module, an attention module, a pooling module, and a prediction module.
The input layer is the first character string and the second character string, and the first character string and the second character string can be input to the character vector acquisition module to obtain vector representations of characters contained in the first character string and the second character string respectively. Optionally, the character vector acquisition module constructs character-level vectors using mainly Word2Vec, providing a vector representation for each pair of first and second strings.
Optionally, the word semantic extraction module consists mainly of three LSTM layers, wherein the first LSTM layer has 120 LSTM neurons, the second LSTM layer has 64 LSTM neurons, and the third LSTM layer has 32 LSTM neurons. The vector representations of the characters contained in the first character string and the second character string can be input into the word semantic extraction module, which extracts the semantic features of each word contained in the first character string and the second character string respectively.
Then, the attention module judges the importance of the extracted semantic features, highlighting important information in the first character string and the second character string and reducing the sensitivity of the model to meaningless information, which improves the matching efficiency of the whole model.
Finally, based on the weights of the semantic features of each word obtained by the attention module and the semantic features of each word themselves, a similarity calculation function is applied to obtain the matching degree output by the model.
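To make the last two steps concrete, the following sketch applies attention weights to per-word feature vectors, pools them into one vector per string, and compares the two pooled vectors. The source does not name the similarity calculation function; cosine similarity rescaled to the range 0 to 1 is assumed here, and the `query` vector stands in for parameters that a real model would learn.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax that turns scores into weights summing to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: List[Vector], query: Vector) -> Vector:
    """Weight each word feature by its dot-product score with `query` and sum."""
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def matching_degree(feats_a: List[Vector], feats_b: List[Vector], query: Vector) -> float:
    """Pool both feature sequences and map the cosine from [-1, 1] to [0, 1]."""
    pooled_a = attention_pool(feats_a, query)
    pooled_b = attention_pool(feats_b, query)
    return (cosine(pooled_a, pooled_b) + 1.0) / 2.0
```

With a zero query vector all words receive equal weight, so the pooling reduces to a plain average of the word features.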
For example, taking the metadata information of "student name" in the student score table provided above as an example, when the first character string and the second character string {TEXT_std, TEXT_meta1} = {"student name", "student name of student"} are input into the model to calculate the matching degree, the model output is 0.757; taking the metadata information of "student name" in the student card balance table provided above as an example, when the first character string and the second character string {TEXT_std, TEXT_meta2} = {"student name", "name"} are input into the model to calculate the matching degree, the model output is 0.511. Here, TEXT_std represents the first character string, TEXT_meta1 represents the second character string corresponding to the metadata information of "student name" in the student score table, and TEXT_meta2 represents the second character string corresponding to the metadata information of "student name" in the student card balance table.
In the embodiment of the application, the matching degree is calculated through the model, so that the accuracy of calculating the matching degree is improved, and the accuracy of the mapping relation determined later is further improved.
In some embodiments, the following description will be given of the process of "step S102, pulling metadata information of each field stored in the campus service database from the metadata management system" and "step S103, screening metadata information related to the data standard to be mapped from the metadata information of each field".
Optionally, in step S102, metadata information of each field stored in the campus service database may be automatically pulled from the metadata management system periodically.
Specifically, the embodiment may periodically generate a first metadata acquisition request, and send the first metadata acquisition request to the metadata management system through the remote call interface, so as to pull metadata information of each field stored in the campus service database from the metadata management system.
Preferably, when periodically pulling, only the metadata information updated in the period can be pulled, and all metadata information does not need to be pulled every time, so that the time for pulling the metadata information is saved, and the efficiency is improved.
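A minimal sketch of such an incremental pull is given below. It assumes each metadata record carries an identifier and a last-update timestamp; the field names `metadata_id` and `updated_at` and both helper functions are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def select_updated(records: List[dict], last_pull: datetime) -> List[dict]:
    """Keep only the records updated after the previous pull."""
    return [r for r in records if r["updated_at"] > last_pull]


def merge_into_cache(cache: Dict[str, dict], updated: List[dict]) -> Dict[str, dict]:
    """Insert new records into the local cache and overwrite changed ones."""
    for record in updated:
        cache[record["metadata_id"]] = record
    return cache


# Hypothetical records as they might be returned by the metadata management system.
records = [
    {"metadata_id": "M1", "updated_at": datetime(2023, 10, 1)},
    {"metadata_id": "M2", "updated_at": datetime(2023, 10, 20)},
]
cache: Dict[str, dict] = {}
merge_into_cache(cache, select_updated(records, datetime(2023, 10, 10)))
```

In practice the filtering would more likely be done by the metadata management system itself, with the last pull time passed in the acquisition request.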
Of course, in step S102, metadata information of each field stored in the campus service database may also be manually pulled from the metadata management system by the user. That is, in response to a data pulling operation instruction of a user, a second metadata acquisition request is generated, and the second metadata acquisition request is sent to the metadata management system through a remote call interface, so that metadata information of each field stored in the campus service database is pulled from the metadata management system.
Optionally, the remote call interface may be an interface that uses the OpenFeign component, that is, the embodiment of the present application may use the OpenFeign component as middleware to send the first metadata acquisition request and/or the second metadata acquisition request to the metadata management system.
Here, OpenFeign is a member of the Spring Cloud family. It provides a concise and efficient RPC (Remote Procedure Call) style of invocation for REST APIs over HTTP (HyperText Transfer Protocol), so that a remote HTTP request can be made transparently, in the same way as calling a local method.
In an alternative embodiment, after the metadata information of each field is pulled from the metadata management system, the metadata information of each field that is pulled may be cached locally and displayed on the first preset page in the form of a drop-down list, so that a user may operate on the metadata information of each field that is pulled on the first preset page.
Optionally, the operation of the user may be a checking operation, and in this embodiment, metadata information related to the data standard to be mapped may be screened from metadata information of each field in response to a metadata checking operation instruction of the user on the first preset page.
Of course, the metadata information of each pulled field may also be displayed on the first preset page in other forms, for example, metadata information of all fields is directly displayed, which is not limited in this application.
For example, the user may check the candidate metadata information to be mapped in the drop-down list of the first preset page, and then click the "smart map" button to start the smart mapping work in the background of the system.
In another alternative embodiment, the process of "step S103, filtering metadata information related to the data standard to be mapped from the metadata information of each field" may further include: the method comprises the steps of obtaining standard numbers of target fields contained in data standards to be mapped and metadata numbers of fields contained in metadata information of fields, and screening metadata information related to the data standards to be mapped from the metadata information of the fields according to the standard numbers of the target fields and the metadata numbers of the fields.
That is, according to the embodiment of the application, the candidate metadata information can be manually screened by the user, the candidate metadata information can also be automatically screened, the screened candidate metadata information is more in line with the user demand by the manual mode, the manpower resources are saved by the automatic mode, and the user experience is better.
In yet another possible implementation, considering that the generated data standard to be mapped may contain errors, and in order to prevent an erroneous data standard to be mapped from misleading the subsequent mapping process and needlessly consuming mapping time, the data standard to be mapped may be approved after it is generated for the target field and before metadata information related to it is screened from the metadata information of each field. After the approval passes, the data standard to be mapped is stored into a preset data standard set, where the data standards stored in the data standard set are of the same type.
That is, after the standard approval of the data to be mapped passes, the subsequent screening and mapping steps are performed.
For example, the present embodiment may optionally be applied to a data standard mapping system, where the data standard mapping system includes a data standard formulation issuing module, a data standard modification approval module, a data standard mapping management module, and the like. Firstly, the data standard formulation issuing module generates a data standard to be mapped for the target field; the user can then click an issue button to issue the data standard to be mapped to the data standard modification approval module, which approves it. If the approval passes, the data standard mapping management module can carry out the subsequent screening and mapping processes; if the approval does not pass, the data standard to be mapped needs to be modified and issued again, and the screening of metadata information and the mapping process can be carried out only after the approval passes.
The data standard set is a set of data standards, and may be used for unified management of data standards and data standard attributes, in this embodiment, the same type of data standard is generally classified into one data standard set, and the type of data standard may be manually defined or may be obtained by analyzing based on each attribute of the data standard.
Optionally, the approval process of this embodiment may also take the data standard set as a unit. That is, after a plurality of data standards of the same type are generated, they are put into one data standard set, and the issue button is then clicked to issue the data standard set to the data standard modification approval module for approval. If the data standard set has a problem or does not meet the requirements, it is returned to the data standard formulation issuing module for modification; if the content of the data standard set meets the requirements, the approval passes, and the data standard set is then transferred to the data standard mapping management module for the subsequent screening and mapping processes.
The embodiment can ensure that the data standard to be mapped used in the subsequent mapping is an accurate data standard by examining and approving the data standard to be mapped, thereby avoiding the problem of error in the mapping process caused by error in the data standard to be mapped and improving the mapping efficiency.
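The issue, approval and rework cycle described above behaves like a small state machine. The sketch below is one way to model it; the state names, action names and helper functions are hypothetical and are not taken from the source.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"              # generated or modified, not yet issued
    PENDING = "pending approval"
    APPROVED = "approved"
    REJECTED = "rejected"


# Allowed (state, action) pairs and the state each one leads to.
TRANSITIONS = {
    (Status.DRAFT, "issue"): Status.PENDING,
    (Status.PENDING, "approve"): Status.APPROVED,
    (Status.PENDING, "reject"): Status.REJECTED,
    (Status.REJECTED, "modify"): Status.DRAFT,
}


def transition(status: Status, action: str) -> Status:
    """Apply an action to a data standard (or data standard set) status."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"action {action!r} is not allowed in state {status.value!r}")


def can_map(status: Status) -> bool:
    """Screening and mapping may start only after the approval has passed."""
    return status is Status.APPROVED
```

A rejected standard has to go back through modification and issuing before it can be approved, which mirrors the rework loop in the description.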
As described in step S105, the mapping relationship obtained in the embodiment of the present application includes a plurality of sub-mapping relationships, where the sub-mapping relationship refers to a mapping relationship between the data standard to be mapped and one target metadata information.
Based on this, optionally, after obtaining the mapping relationship, the embodiment may store the mapping relationship in a pre-established mapping relationship table, and display the mapping relationship table on a second preset page, where the mapping relationship table includes a standard chinese name field, a standard english name field, a metadata chinese name field, a metadata english name field, and a matching degree field; if the user performs the deletion operation on the mapping relation form on the second preset page, the sub-mapping relation pointed by the deletion operation instruction can be deleted from the mapping relation form in response to the deletion operation instruction of the user on the second preset page.
That is, the embodiment allows the user to perform secondary screening on the obtained mapping relationship, so as to manually delete the matching result of the data standard to be mapped and part of the target metadata information, and after the secondary screening is completed, the mapping relationship is formally established between the data standard to be mapped and the rest of the target metadata information in the mapping relationship form.
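The mapping relation form and the secondary screening can be sketched as follows. The matching degrees 0.757 and 0.511 and the translated names are taken from the example above; the threshold of 0.5, the dictionary keys and the English field names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_mapping_table(
    standard: Dict[str, str],
    scored_candidates: List[Tuple[Dict[str, str], float]],
    threshold: float,
) -> List[dict]:
    """Keep candidates whose matching degree exceeds the threshold, best first."""
    rows = [
        {
            "standard_cn_name": standard["cn_name"],
            "standard_en_name": standard["en_name"],
            "metadata_cn_name": meta["cn_name"],
            "metadata_en_name": meta["en_name"],
            "matching_degree": degree,
        }
        for meta, degree in scored_candidates
        if degree > threshold
    ]
    rows.sort(key=lambda row: row["matching_degree"], reverse=True)
    return rows


def delete_sub_mapping(table: List[dict], index: int) -> List[dict]:
    """Return a copy of the table without the sub-mapping at `index`."""
    return [row for i, row in enumerate(table) if i != index]


standard = {"cn_name": "student name", "en_name": "student_name"}  # en_name is hypothetical
scored = [
    ({"cn_name": "name", "en_name": "name"}, 0.511),
    ({"cn_name": "student name of student", "en_name": "stu_name"}, 0.757),
]
table = build_mapping_table(standard, scored, threshold=0.5)
final_table = delete_sub_mapping(table, 1)  # the user deletes the weaker match
```

Each row corresponds to one sub-mapping relation, so deleting a row removes exactly one target metadata item from the final mapping.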
In summary, after the data standard to be mapped is generated, metadata information of each field is obtained in batches through a remote interface, matching degree of the data standard to be mapped and candidate metadata information is calculated by using a model, a mapping relation table (which can be understood as a recommendation list) is summarized based on the calculated matching degree, and the mapping relation table can be further screened and checked manually, and then the data standard to be mapped is formally mapped to target metadata information.
In the standard metadata mapping process, the method combines the modes of automatic mapping and manual screening of the matching results, and improves the efficiency of the mapping process and the accuracy of the mapping results.
The embodiment of the application further provides an apparatus for mapping metadata by the data standard. The apparatus described below and the method for mapping metadata by the data standard described above may be referred to in correspondence with each other.
Referring to fig. 4, a schematic structural diagram of an apparatus for mapping metadata according to an embodiment of the present application is shown, and as shown in fig. 4, the apparatus for mapping metadata according to the data standard may include: a standard generation module 401, a metadata pull module 402, a metadata screening module 403, a matching module 404, and a mapping module 405.
The standard generating module 401 is configured to generate a data standard to be mapped for the target field when standard mapping is required for the target field, where the data standard to be mapped includes a standard chinese name and a standard english name of the target field.
The metadata pulling module 402 is configured to pull metadata information of each field stored in the campus service database from the metadata management system.
The metadata filtering module 403 is configured to filter metadata information related to the data standard to be mapped from metadata information of each field, where the filtered metadata information is used as candidate metadata information, and the candidate metadata information includes a metadata chinese name and a metadata english name of the target field.
And the matching module 404 is configured to calculate a matching degree between the data standard to be mapped and the candidate metadata information according to the standard chinese name and the standard english name included in the data standard to be mapped, and the metadata chinese name and the metadata english name included in the candidate metadata information.
And the mapping module 405 is configured to obtain a mapping relationship between the data standard to be mapped and the target metadata information according to the matching degree of the data standard to be mapped and the candidate metadata information, so as to perform compliance verification on the target metadata information, where the matching degree between the data standard to be mapped and the target metadata information is greater than a preset matching degree threshold.
In one possible implementation manner, the matching module may include: the system comprises a first splicing module, a second splicing module and a model calculation module.
And the first splicing module is used for splicing the standard Chinese names and the standard English names contained in the data standard to be mapped into a first character string.
And the second splicing module is used for splicing the metadata Chinese names and the metadata English names contained in the candidate metadata information into a second character string.
The model calculation module is used for inputting the first character string and the second character string into a short text semantic matching model which is built in advance and is based on the long-short-time memory neural network, and obtaining the matching degree output by the short text semantic matching model, wherein the matching degree is used as the matching degree of the data standard to be mapped and the candidate metadata information.
In one possible implementation manner, the metadata pulling module may specifically be used for:
periodically generating a first metadata acquisition request, and sending the first metadata acquisition request to a metadata management system through a remote call interface so as to pull metadata information of each field stored in a campus service database from the metadata management system;
and/or,
and responding to the data pulling operation instruction of the user, generating a second metadata acquisition request, and sending the second metadata acquisition request to the metadata management system through a remote call interface so as to pull metadata information of each field stored in the campus service database from the metadata management system.
In one possible implementation, the apparatus for mapping metadata by the data standard provided in the embodiment of the present application may further include a metadata display module.
And the metadata display module is used for displaying the metadata information of each pulled field on a first preset page in the form of a pull-down list.
Based on this, the metadata filtering module may specifically be configured to respond to a metadata checking operation instruction of the user on the first preset page, and filter metadata information related to the data standard to be mapped from metadata information of each field.
In one possible implementation, the apparatus for mapping metadata by the data standard provided in the embodiment of the present application may further include an approval module.
And the approval module is used for approving the data standard to be mapped before screening the metadata information related to the data standard to be mapped from the metadata information of each field after generating the data standard to be mapped for the target field, and storing the data standard to be mapped into a preset data standard set after approval, wherein the data standards stored in the data standard set are of the same type.
In one possible implementation manner, the mapping relationship includes a plurality of sub-mapping relationships, where the sub-mapping relationships refer to mapping relationships between the data standard to be mapped and one target metadata information.
Based on this, the apparatus for mapping metadata by the data standard provided in the embodiment of the present application may further include a form storage display module and a sub-mapping relation deleting module.
The form storage display module is used for storing the mapping relation into a pre-established mapping relation form and displaying the mapping relation form on a second preset page, wherein the mapping relation form comprises a standard Chinese name field, a standard English name field, a metadata Chinese name field, a metadata English name field and a matching degree field.
And the sub-mapping relation deleting module is used for responding to a deleting operation instruction of the user on a second preset page and deleting the sub-mapping relation pointed by the deleting operation instruction from the mapping relation table.
In a possible implementation manner, the data standard to be mapped further includes one or more pieces of information related to the target field: standard number, field type, and field length.
The embodiment of the application also provides an electronic device. Optionally, FIG. 5 shows a block diagram of a hardware structure of the electronic device. Referring to FIG. 5, the hardware structure of the electronic device may include: at least one processor 501, at least one communication interface 502, at least one memory 503, and at least one communication bus 504;
In the embodiment of the present application, the number of the processor 501, the communication interface 502, the memory 503, and the communication bus 504 is at least one, and the processor 501, the communication interface 502, and the memory 503 complete communication with each other through the communication bus 504;
the processor 501 may be a central processing unit CPU, or a specific integrated circuit ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement embodiments of the present invention, etc.;
the memory 503 may include a high-speed RAM memory, and may further include a non-volatile memory (non-volatile memory), etc., such as at least one magnetic disk memory;
wherein the memory 503 stores a program, the processor 501 may call the program stored in the memory 503, the program being for:
when standard mapping is required to be carried out on the target field, generating a data standard to be mapped for the target field, wherein the data standard to be mapped comprises a standard Chinese name and a standard English name of the target field;
metadata information of each field stored in the campus service database is pulled from the metadata management system;
screening metadata information related to a data standard to be mapped from the metadata information of each field, wherein the screened metadata information is used as candidate metadata information, and the candidate metadata information comprises a metadata Chinese name and a metadata English name of a target field;
Calculating the matching degree of the data standard to be mapped and the candidate metadata information according to the standard Chinese name and the standard English name contained in the data standard to be mapped and the metadata Chinese name and the metadata English name contained in the candidate metadata information;
and obtaining a mapping relation between the data standard to be mapped and the target metadata information according to the matching degree of the data standard to be mapped and the candidate metadata information so as to carry out compliance verification on the target metadata information, wherein the matching degree between the data standard to be mapped and the target metadata information is larger than a preset matching degree threshold.
Optionally, for the refined functions and extended functions of the program, refer to the description above.
The embodiment of the application also provides a readable storage medium, on which a computer program is stored, which when executed by a processor, implements a method for mapping metadata according to the data standard.
Optionally, for the refined functions and extended functions of the program, refer to the description above.
Finally, it should also be noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In the present specification, the embodiments are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of mapping metadata to data standards, comprising:
when standard mapping is required to be carried out on a target field, generating a data standard to be mapped for the target field, wherein the data standard to be mapped comprises a standard Chinese name and a standard English name of the target field;
metadata information of each field stored in the campus service database is pulled from the metadata management system;
Screening metadata information related to the data standard to be mapped from the metadata information of each field, wherein the screened metadata information is used as candidate metadata information, and the candidate metadata information comprises a metadata Chinese name and a metadata English name of the target field;
calculating the matching degree of the data standard to be mapped and the candidate metadata information according to the standard Chinese name and the standard English name contained in the data standard to be mapped and the metadata Chinese name and the metadata English name contained in the candidate metadata information;
and obtaining a mapping relation between the data standard to be mapped and the target metadata information according to the matching degree of the data standard to be mapped and the candidate metadata information so as to carry out compliance verification on the target metadata information, wherein the matching degree between the data standard to be mapped and the target metadata information is larger than a preset matching degree threshold.
2. The method of mapping metadata according to claim 1, wherein calculating the matching degree between the data standard to be mapped and the candidate metadata information according to the standard chinese name and the standard english name included in the data standard to be mapped and the metadata chinese name and the metadata english name included in the candidate metadata information includes:
Splicing the standard Chinese names and the standard English names contained in the data standard to be mapped into a first character string;
splicing the metadata Chinese names and the metadata English names contained in the candidate metadata information into a second character string;
and inputting the first character string and the second character string into a short text semantic matching model which is established in advance and is based on a long-short-time memory neural network, and obtaining the matching degree output by the short text semantic matching model as the matching degree of the data standard to be mapped and the candidate metadata information.
3. The method for mapping metadata according to claim 1, wherein the pulling metadata information of each field stored in the campus service database from the metadata management system comprises:
periodically generating a first metadata acquisition request, and sending the first metadata acquisition request to the metadata management system through a remote call interface so as to pull metadata information of each field stored in the campus service database from the metadata management system;
and/or,
and responding to a data pulling operation instruction of a user, generating a second metadata acquisition request, and sending the second metadata acquisition request to the metadata management system through the remote call interface so as to pull metadata information of each field stored in the campus service database from the metadata management system.
4. A method of mapping metadata to data standards as in claim 3, further comprising:
displaying the metadata information of each pulled field on a first preset page in the form of a pull-down list;
the filtering the metadata information related to the data standard to be mapped from the metadata information of each field includes:
and responding to a metadata checking operation instruction of a user on the first preset page, and screening metadata information related to the data standard to be mapped from the metadata information of each field.
5. The method of mapping metadata according to claim 1, further comprising, after said generating the data standard to be mapped for the target field, before said filtering metadata information related to the data standard to be mapped from the metadata information of the respective fields:
and approving the data standard to be mapped, and storing the data standard to be mapped into a preset data standard set after the approval is passed, wherein the data standards stored in the data standard set are of the same type.
6. The method for mapping metadata by data standard according to claim 1, wherein the mapping relation comprises a plurality of sub-mapping relations, each sub-mapping relation being a mapping relation between the data standard to be mapped and one piece of the target metadata information;
the method for mapping metadata by data standard further comprises:
storing the mapping relation into a pre-established mapping relation form, and displaying the mapping relation form on a second preset page, wherein the mapping relation form comprises a standard Chinese name field, a standard English name field, a metadata Chinese name field, a metadata English name field and a matching degree field;
and, in response to a deleting operation instruction of a user on the second preset page, deleting the sub-mapping relation pointed to by the deleting operation instruction from the mapping relation form.
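The mapping relation form of claim 6 can be sketched as a small in-memory table. The five fields follow the claim; the class names and the choice to identify a sub-mapping relation by its metadata English name when deleting are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class SubMapping:
    standard_cn_name: str    # standard Chinese name field
    standard_en_name: str    # standard English name field
    metadata_cn_name: str    # metadata Chinese name field
    metadata_en_name: str    # metadata English name field
    matching_degree: float   # matching degree field


class MappingForm:
    """In-memory stand-in for the pre-established mapping relation form."""

    def __init__(self):
        self.rows = []

    def store(self, sub_mappings):
        # Store every sub-mapping relation of one mapping relation.
        self.rows.extend(sub_mappings)

    def delete(self, metadata_en_name):
        # Assumption: the delete instruction points at a row by metadata English name.
        before = len(self.rows)
        self.rows = [r for r in self.rows if r.metadata_en_name != metadata_en_name]
        return before - len(self.rows)  # number of rows removed
```

A user-facing page would render `rows` as a table and call `delete` when the user removes an entry; the rendering itself is outside this sketch.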
7. The method for mapping metadata by data standard according to any one of claims 1 to 6, wherein the data standard to be mapped further comprises one or more of the following pieces of information related to the target field: a standard number, a field type, and a field length.
8. An apparatus for mapping metadata to data standards, comprising:
a standard generation module, configured to generate a data standard to be mapped for a target field when standard mapping needs to be performed on the target field, wherein the data standard to be mapped comprises a standard Chinese name and a standard English name of the target field;
a metadata pulling module, configured to pull metadata information of each field stored in a campus service database from a metadata management system;
a metadata screening module, configured to screen metadata information related to the data standard to be mapped from the metadata information of each field, and to take the screened metadata information as candidate metadata information, wherein the candidate metadata information comprises a metadata Chinese name and a metadata English name of the target field;
a matching module, configured to calculate the matching degree between the data standard to be mapped and the candidate metadata information according to the standard Chinese name and the standard English name contained in the data standard to be mapped and the metadata Chinese name and the metadata English name contained in the candidate metadata information;
and a mapping module, configured to obtain, according to the matching degree between the data standard to be mapped and the candidate metadata information, a mapping relation between the data standard to be mapped and target metadata information, so as to perform compliance verification on the target metadata information, wherein the matching degree between the data standard to be mapped and the target metadata information is greater than a preset matching degree threshold.
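The matching module and mapping module of claim 8 can be sketched as follows. The claims do not specify how the matching degree is calculated, so this example assumes a character-level similarity from Python's standard `difflib.SequenceMatcher`, averaged over the Chinese and English names. The dictionary keys, function names, and the default threshold of 0.8 are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    # Character-level similarity in [0, 1]; case-insensitive for English names.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matching_degree(standard, metadata):
    # Assumption: equal weight for the Chinese-name and English-name similarity.
    cn = name_similarity(standard["cn_name"], metadata["cn_name"])
    en = name_similarity(standard["en_name"], metadata["en_name"])
    return (cn + en) / 2


def map_standard(standard, candidates, threshold=0.8):
    # Keep only candidates whose matching degree is strictly greater than
    # the preset threshold; these become the target metadata information.
    mapping = []
    for candidate in candidates:
        degree = matching_degree(standard, candidate)
        if degree > threshold:
            mapping.append((candidate, degree))
    return mapping
```

Each `(candidate, degree)` pair corresponds to one sub-mapping relation between the data standard and a piece of target metadata information. A production system might weight the two names differently or use a different similarity measure.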
9. An electronic device comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the method for mapping metadata according to any one of claims 1 to 7.
10. A readable storage medium having stored thereon a computer program, which, when executed by a processor, implements the steps of the method of mapping metadata for data standards according to any one of claims 1 to 7.
CN202311372494.1A 2023-10-23 2023-10-23 Method and related device for mapping metadata by data standard Pending CN117370356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311372494.1A CN117370356A (en) 2023-10-23 2023-10-23 Method and related device for mapping metadata by data standard

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311372494.1A CN117370356A (en) 2023-10-23 2023-10-23 Method and related device for mapping metadata by data standard

Publications (1)

Publication Number Publication Date
CN117370356A (en) 2024-01-09

Family

ID=89392472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311372494.1A Pending CN117370356A (en) 2023-10-23 2023-10-23 Method and related device for mapping metadata by data standard

Country Status (1)

Country Link
CN (1) CN117370356A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117992443A (en) * 2024-04-07 2024-05-07 云启智慧科技有限公司 Data management system based on knowledge management and identification main data

Similar Documents

Publication Publication Date Title
US11599714B2 (en) Methods and systems for modeling complex taxonomies with natural language understanding
US20210081611A1 (en) Methods and systems for language-agnostic machine learning in natural language processing using feature extraction
CN105573966B (en) Adaptive modification of content presented in a spreadsheet
US10402163B2 (en) Intelligent data extraction
US11972201B2 (en) Facilitating auto-completion of electronic forms with hierarchical entity data models
US9268766B2 (en) Phrase-based data classification system
CN106651057B (en) Mobile terminal user age prediction method based on installation package sequence list
CN111159220B (en) Method and apparatus for outputting structured query statement
CN110555451A (en) information identification method and device
CN117370356A (en) Method and related device for mapping metadata by data standard
CN107291774B (en) Error sample identification method and device
US20220172712A1 (en) Machine learning to propose actions in response to natural language questions
CN109409419B (en) Method and apparatus for processing data
JP2023523191A (en) ACCOUNT IDENTIFICATION METHODS, DEVICES, ELECTRONIC DEVICES AND COMPUTER-READABLE MEDIA
CN110780970B (en) Data screening method, device, equipment and computer readable storage medium
US10705810B2 (en) Automatic code generation
CN113610215B (en) Task processing network generation method, task processing device and electronic equipment
US20230351172A1 (en) Supervised machine learning method for matching unsupervised data
CN113254612A (en) Knowledge question-answering processing method, device, equipment and storage medium
CN109885504B (en) Recommendation system test method, device, medium and electronic equipment
CN112131379A (en) Method, device, electronic equipment and storage medium for identifying problem category
CN110019547B (en) Method, device, equipment and medium for acquiring association relation between clients
CN117131208B (en) Industrial science and technology text data pushing method, device, equipment and medium
CN113886779A (en) Method for identifying person identity, storage medium and computer program product
CN115204883A (en) Data processing method and device, computer readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination