CN114185869A

CN114185869A - Data model auditing method based on data standard

Info

Publication number: CN114185869A
Application number: CN202111463766.XA
Authority: CN
Inventors: 王峰
Original assignee: Sichuan XW Bank Co Ltd
Current assignee: Sichuan XW Bank Co Ltd
Priority date: 2021-12-03
Filing date: 2021-12-03
Publication date: 2022-03-15

Abstract

The invention discloses a data model auditing method based on a data standard, which belongs to the technical field of data model auditing. It addresses a weakness of existing schemes: because no data standard is introduced during data model design, data quality problems arise easily and rework costs are high. The method introduces the data standard at the data model design stage, replacing manual evaluation of the data model after the design is finished. Data quality is therefore monitored in advance, data quality problems are avoided as far as possible, and the cost of modifying the data model is reduced.

Description

Data model auditing method based on data standard

Technical Field

The invention belongs to the technical field of data model auditing, and particularly relates to a data model auditing method based on a data standard.

Background

Data models are now used in more and more fields, so deciding whether a data model design is acceptable has become an unavoidable question. In existing schemes, a data model is judged mainly on the basis of a modeling expert's understanding of the business and the data. Data standards are not involved in the design process, which can lead to data quality problems and rework costs later on.

Disclosure of Invention

The invention discloses a data model auditing method based on a data standard. It aims to solve the problems of existing schemes, in which data quality problems arise easily and rework costs are high because no data standard is introduced during data model design.

The technical scheme of the invention is as follows:

The invention relates to a data model auditing method based on a data standard, which comprises the following steps:

S1: Collect data model design information: acquire the entity attributes of the data model at the design stage, where each entity attribute comprises an entity attribute name and an entity attribute business meaning, and store each entity attribute name with its business meaning as entity attribute text data in a key-value pair structure;

S2: Calculate the similarity coefficient: acquire the standard information items of the existing data standard, where each standard information item comprises a standard information item name and a standard information item business meaning, and store each standard information item name with its business meaning as standard information item text data in a key-value pair structure; perform text word segmentation on the entity attribute text data and the standard information item text data, and calculate the similarity coefficient between each entity attribute text data and the standard information item text data from the word segmentation result;

S3: Compare and sort the data: using the similarity coefficients obtained in step S2, exclude the combinations of entity attribute text data and standard information item text data whose similarity coefficient does not meet the user's requirements, and sort the remaining combinations in descending order of similarity coefficient;

S4: Audit the model: if the similarity coefficient shows that the entity attribute text data is identical to the standard information item text data, directly check whether the corresponding entity attribute is consistent with the standard information item; if it is, the entity attribute passes the audit, otherwise it fails. If the similarity coefficient shows that the entity attribute text data differs from the standard information item text data, manually determine, with reference to the similarity coefficients, the standard information item text data most similar to the entity attribute text data, and then check whether the corresponding entity attribute is consistent with that standard information item; if it is, the entity attribute passes the audit, otherwise it fails;

S5: Feed back the model audit result: report the result of step S4; each entity attribute that failed step S4 is returned, and the data model is modified according to the returned entity attributes.
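
The branching in steps S4 and S5 can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `audit_attribute`, `audit_model`, `is_consistent` and `choose_manually` are hypothetical, the manual selection is modeled as a callback, and an attribute for which no standard information item is chosen is simply treated as not passing.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One candidate from S3: (standard information item name, similarity coefficient).
Candidate = Tuple[str, float]


def audit_attribute(
    attribute: str,
    candidates: List[Candidate],
    is_consistent: Callable[[str, str], bool],
    choose_manually: Callable[[str, List[Candidate]], Optional[str]],
) -> bool:
    """S4: return True if the entity attribute passes the audit."""
    exact = [name for name, score in candidates if score == 1.0]
    if exact:
        # Text data is identical: check consistency directly.
        chosen: Optional[str] = exact[0]
    else:
        # Text data differs: a person picks the most similar standard item.
        chosen = choose_manually(attribute, candidates)
    if chosen is None:
        # Assumption of this sketch: no chosen item means the audit is not passed.
        return False
    return is_consistent(attribute, chosen)


def audit_model(
    attributes: Dict[str, List[Candidate]],
    is_consistent: Callable[[str, str], bool],
    choose_manually: Callable[[str, List[Candidate]], Optional[str]],
) -> List[str]:
    """S5: return the entity attributes that failed and go back to the designer."""
    return [
        name
        for name, candidates in attributes.items()
        if not audit_attribute(name, candidates, is_consistent, choose_manually)
    ]
```

The list returned by `audit_model` corresponds to the entity attributes that step S5 returns for modification.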

The working principle of the technical scheme is as follows:

The method collects the entity attributes of a data model at the design stage, obtains the standard information items of the data standard, and calculates the similarity coefficient between the entity attribute text data and the standard information item text data. Depending on the value of the similarity coefficient, different operations are performed on each entity attribute to decide whether it passes the audit. Entity attributes that do not pass are returned to the data model designer, who modifies the data model.
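
The first part of this flow, turning names and business meanings into comparable text data, might look like the sketch below. The function names `to_text_data` and `segment` are hypothetical, and the regular-expression tokenizer is a stand-in chosen for illustration: the source does not name a word segmentation tool, and Chinese text would need a dedicated segmenter. Combining the key and the value into one word set is also an assumption.

```python
import re
from typing import Dict, Iterable, Set, Tuple


def to_text_data(items: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Store each name with its business meaning as a key-value pair."""
    return {name: meaning for name, meaning in items}


def segment(name: str, meaning: str) -> Set[str]:
    """Split a name and its business meaning into a set of lowercase words.

    Assumption: runs of ASCII letters and digits count as words.
    """
    return set(re.findall(r"[a-z0-9]+", f"{name} {meaning}".lower()))


entity_text = to_text_data([("cust_name", "Name of the customer")])
words = {name: segment(name, meaning) for name, meaning in entity_text.items()}
```

The same two helpers would be applied to the standard information items, so that both sides are compared as word sets.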

Compared with the prior art, the technical scheme has the advantages that the data standard is introduced in the data model design stage to replace manual evaluation of the data model after the data model is designed, so that the data quality is monitored in advance, the data quality problem is avoided as much as possible, and meanwhile, the cost for modifying the data model is reduced.

Further, the entity attribute further includes an entity attribute data type, an entity attribute data length, and an entity attribute data precision.

By setting the entity attributes, a basis is provided for auditing the entity attributes, and the accuracy of model auditing is improved.

Further, the standard information item also comprises a standard information item technical attribute and a marking information item management attribute.

By setting the standard information items, a basis is provided for auditing the entity attributes, and the accuracy of model auditing is further improved.

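Further, the similarity coefficient is calculated by the formula $J(A, B) = |A \cap B| / |A \cup B|$, where A and B are understood here as the word sets obtained by segmenting the entity attribute text data and the standard information item text data. If the similarity coefficient is 1, A and B are identical; if the similarity coefficient is less than 1, A and B are not identical.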
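
Reading the formula as the Jaccard coefficient, the size of the intersection of two word sets divided by the size of their union, a minimal Python sketch is given below. The function name `jaccard` and the handling of two empty sets are assumptions of this sketch, not details stated in the source.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Return |a & b| / |a | b| for two word sets."""
    union = a | b
    if not union:
        # Assumption: two empty word sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


same = jaccard({"customer", "name"}, {"name", "customer"})             # 1.0
partial = jaccard({"customer", "name"}, {"customer", "full", "name"})  # 2 of 3 words shared
```

Because the coefficient always lies between 0 and 1, a single cut-off value is enough to express the user's requirement in step S3.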

The similarity function makes the degree of similarity between the entity attribute text and the standard information item text directly visible, and makes it easy to set a judgment criterion.

One or more technical schemes provided by the invention at least have the following technical effects or advantages:

1. The data standard is introduced at the data model design stage, replacing manual evaluation of the data model after the design is finished, so that data quality is monitored in advance, data quality problems are avoided as far as possible, and the cost of modifying the data model is reduced.

2. By setting the entity attributes, a basis is provided for auditing the entity attributes, and the accuracy of model auditing is improved.

3. By setting the standard information items, a basis is provided for auditing the entity attributes, and the accuracy of model auditing is further improved.

4. The similarity function makes the degree of similarity between the entity attribute text and the standard information item text directly visible, and makes it easy to set a judgment criterion.

Drawings

FIG. 1 is a flowchart of a data model auditing method based on data standards according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of embodiments of the present application, generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

Referring to fig. 1, a method for auditing a data model based on data standards according to this embodiment includes the following steps:

Specifically, the similarity coefficient between the entity attribute text and the standard information item text is calculated after word segmentation has been performed on both texts. If no standard information item can be determined manually, the corresponding entity attribute is stored in a data standard pending library; if it subsequently passes verification, it forms a new standard information item.
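
The pending library can be pictured as a holding area between the data model and the data standard. The class below is a hypothetical sketch: the name `PendingLibrary`, its methods, and the use of plain dictionaries are illustrative assumptions, and the source does not describe how verification itself is carried out.

```python
from typing import Dict


class PendingLibrary:
    """Holds entity attributes for which no standard information item was found."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}

    def submit(self, name: str, business_meaning: str) -> None:
        """Store an unmatched entity attribute until it is verified."""
        self._pending[name] = business_meaning

    def approve(self, name: str, standard: Dict[str, str]) -> None:
        """After verification passes, promote the entry to a new standard item."""
        standard[name] = self._pending.pop(name)

    def pending_names(self) -> list:
        return sorted(self._pending)


standard_items = {"customer_name": "Legal name of the customer"}
library = PendingLibrary()
library.submit("risk_tier", "Internal risk tier of the customer")
library.approve("risk_tier", standard_items)
```

After `approve`, the entry no longer appears in the pending library and is available as a standard information item for later audits.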

The working principle of the above embodiment is as follows:

The entity attributes further comprise entity attribute data type, entity attribute data length and entity attribute data precision.

The standard information item also comprises a standard information item technical attribute and a marking information item management attribute.

Specifically, the standard information item technical attributes include data type, data length and data precision.
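
With these three technical attributes on both sides, the consistency check of the audit step can be illustrated as a field-by-field comparison. The sketch below assumes that "consistent" means exact equality of data type, length and precision; the source does not define the comparison, and the names `TechnicalAttributes` and `mismatched_fields` are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class TechnicalAttributes:
    data_type: str
    data_length: int
    data_precision: int


def mismatched_fields(
    entity: TechnicalAttributes, standard: TechnicalAttributes
) -> List[str]:
    """Return the names of the technical attributes that differ."""
    return [
        f.name
        for f in fields(TechnicalAttributes)
        if getattr(entity, f.name) != getattr(standard, f.name)
    ]


entity_attr = TechnicalAttributes("DECIMAL", 18, 2)
standard_item = TechnicalAttributes("DECIMAL", 18, 4)
differences = mismatched_fields(entity_attr, standard_item)  # ["data_precision"]
```

An empty list would mean the entity attribute is consistent with the standard information item; a non-empty list tells the designer which attributes to correct.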

The similarity coefficient is calculated by the formula $J(A, B) = |A \cap B| / |A \cup B|$; if the similarity coefficient is 1, then A and B are completely the same, and if the similarity coefficient is less than 1, then A and B are not completely the same.

Specifically, any combination of entity attribute text and standard information item text whose similarity coefficient is less than 0.3 is excluded.
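
Applied to a table of computed coefficients, this exclusion and the descending sort of the comparison step could be sketched as follows. The function name `rank_candidates` and the dictionary layout are assumptions for illustration; the 0.3 cut-off is the value given in this embodiment.

```python
from typing import Dict, List, Tuple

# (entity attribute name, standard information item name)
Pair = Tuple[str, str]


def rank_candidates(
    scores: Dict[Pair, float], threshold: float = 0.3
) -> List[Tuple[Pair, float]]:
    """Drop pairs scoring below the threshold and sort the rest in descending order."""
    kept = [(pair, score) for pair, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


scores = {
    ("cust_name", "customer_name"): 0.5,
    ("cust_name", "account_name"): 0.25,
    ("cust_name", "customer_full_name"): 0.75,
}
ranked = rank_candidates(scores)
```

Here the pair scoring 0.25 is removed, and the remaining two pairs are listed with the 0.75 match first.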

The embodiments above describe only specific implementations of the present application. Although the description is specific and detailed, it is not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art can make several changes and modifications without departing from the technical idea of the present application, and all of these fall within the protection scope of the present application.

Claims

1. A method for auditing a data model based on data standards is characterized by comprising the following steps:

2. The method of claim 1, wherein the entity attributes further include entity attribute data type, entity attribute data length, and entity attribute data precision.

3. The method of claim 1, wherein the standard information items further comprise standard information item technical attributes and annotation information item management attributes.

4. The method of claim 1, wherein the similarity coefficient is calculated as $J(A, B) = |A \cap B| / |A \cup B|$, if the similarity coefficient is 1, then A and B are completely the same, and if the similarity coefficient is less than 1, then A and B are not completely the same.