CN110008193B - Data standardization method and device - Google Patents

Info

Publication number
CN110008193B
Authority
CN
China
Prior art keywords
metadata
industry standard
data
database
standard library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910304451.7A
Other languages
Chinese (zh)
Other versions
CN110008193A (en)
Inventor
刘俊良
廖华琛
王怡君
王双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sefon Software Co Ltd
Original Assignee
Chengdu Sefon Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sefon Software Co Ltd
Priority to CN201910304451.7A
Publication of CN110008193A
Application granted
Publication of CN110008193B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/176 Support for shared access to files; File sharing support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G06F16/1794 Details of file format conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data standardization method and device. The metadata of a business database is compared in turn with the metadata of each of a plurality of industry standard libraries, and metadata in an industry standard library that is the same as metadata in the business database is identified as similar metadata. For difference metadata, that is, metadata in the business database that differs from the metadata in the industry standard library, the similarity between the data corresponding to the difference metadata and the sample data prestored in the industry standard library is calculated. The metadata corresponding to sample data whose data similarity is greater than a preset threshold is also identified as similar metadata in the industry standard library. Finally, the quantity of metadata identified as similar metadata in each industry standard library is counted, and the industry standard library with the largest quantity is determined to be the industry standard library closest to the business database.

Description

Data standardization method and device
Technical Field
The present application relates to the field of data processing, and in particular, to a data normalization method and apparatus.
Background
With the popularization and development of information technology, governments and enterprises are becoming increasingly informatized, and the volume of business data keeps growing. Faced with large amounts of business data, building accurate and normative data models efficiently and quickly has become a trend. However, given the large number of industry standards, establishing the relationship between actual business data and the existing standards by manual identification takes a great deal of time and effort.
Disclosure of Invention
In order to overcome at least one of the deficiencies in the prior art, an object of the present application is to provide a data standardization method applied to a data processing device, where the data processing device has a plurality of industry standard libraries prestored therein, and the industry standard libraries have sample data prestored therein; the method comprises the following steps:
acquiring a service database;
for each industry standard library, comparing the metadata of the industry standard library with the metadata of the business database;
identifying metadata in the industry standard library which is the same as the metadata in the business database as similar metadata;
for difference metadata in the business database that differs from the metadata in the industry standard library, calculating the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library, and identifying the metadata corresponding to the sample data whose data similarity exceeds a preset threshold as similar metadata in the industry standard library;
and counting the quantity of the metadata which are identified as the similar metadata in each industry standard library, and determining the industry standard library with the maximum quantity as the industry standard library which is closest to the business database.
Optionally, the step of calculating similarity between data corresponding to the difference metadata and sample data in the industry standard library includes:
and calculating the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library through an artificial neural network.
Optionally, the method further comprises:
creating a standard information database according to the similar metadata in the closest industry standard library;
and acquiring data corresponding to similar metadata in the closest industry standard library from the service database, and storing the data into the standard information database.
Optionally, the data processing device further includes an industry shared information base, and the method further includes:
comparing the metadata of the industry shared information base with the metadata of the standard information base to determine the same shared metadata in the standard information base as in the industry shared information base;
and creating a shared data table according to the data corresponding to the shared metadata.
Optionally, the method further comprises:
and providing a corresponding interface for each shared data table, so that other equipment acquires the data in the shared data table through the interface.
Optionally, the metadata includes a field name, and the step of identifying metadata in the industry standard library that is the same as the business database as similar metadata includes:
and identifying the field names in the industry standard library, which are the same as the field names in the service database, as similar metadata.
Optionally, the metadata further includes a table name, a field type, and a field length.
Another objective of the embodiments of the present application is to provide a data standardization apparatus, which is applied to a data processing device, where the data processing device has a plurality of industry standard libraries prestored therein, and the industry standard libraries have sample data prestored therein, and the data standardization apparatus includes an obtaining module, a comparing module, an identifying module, a similarity calculation module, and a statistics module;
the acquisition module is used for acquiring a service database;
the comparison module is used for comparing the metadata of the industry standard library with the metadata of the business database aiming at each industry standard library;
the identification module is used for identifying the metadata in the industry standard library, which is the same as the metadata in the business database, as similar metadata;
the similarity calculation module is used for calculating, for difference metadata in the business database that differs from the metadata in the industry standard library, the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library, and identifying the metadata corresponding to the sample data whose data similarity exceeds a preset threshold as similar metadata in the industry standard library;
the statistical module is used for counting the quantity of the metadata marked as the similar metadata in each industry standard library, and determining the industry standard library with the largest quantity as the industry standard library closest to the business database.
Optionally, the similarity calculation module calculates the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library by:
and calculating the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library through an artificial neural network.
Optionally, the data normalization apparatus further includes a creation module and a writing module;
the creating module is used for creating a standard information database according to the similar metadata in the closest industry standard library;
and the writing module is used for acquiring data corresponding to similar metadata in the closest industry standard library from the service database and storing the data into the standard information database.
Compared with the prior art, the method has the following beneficial effects:
the embodiments of the application provide a data standardization method and device. The metadata of a business database is compared in turn with the metadata of each of a plurality of industry standard libraries, and metadata in an industry standard library that is the same as metadata in the business database is identified as similar metadata. For difference metadata, that is, metadata in the business database that differs from the metadata in the industry standard library, the similarity between the data corresponding to the difference metadata and the sample data prestored in the industry standard library is calculated. The metadata corresponding to sample data whose data similarity is greater than a preset threshold is also identified as similar metadata in the industry standard library. Finally, the quantity of metadata identified as similar metadata in each industry standard library is counted, and the industry standard library with the largest quantity is determined to be the industry standard library closest to the business database.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a schematic block diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 2 is a flow chart illustrating steps of a data normalization method according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a comparison between a service data table and an industry standard data table provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a data normalization apparatus according to an embodiment of the present application;
FIG. 5 is a second schematic structural diagram of a data normalization apparatus according to an embodiment of the present application.
Reference numerals: 100-data processing device; 130-processor; 120-memory; 110-data normalization apparatus; 500-service data table; 600-industry standard data table; 1101-acquisition module; 1102-comparison module; 1103-identification module; 1104-similarity calculation module; 1105-statistics module; 1106-creation module; 1107-writing module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Referring to fig. 1, fig. 1 is a block diagram of a data processing apparatus 100 according to an embodiment of the present disclosure, where the data processing apparatus 100 includes a data normalization device 110, a memory 120, and a processor 130.
The memory 120 and the processor 130 are electrically connected to each other, directly or indirectly, to enable data transmission or interaction. For example, these components may be electrically connected via one or more communication buses or signal lines. The data normalization apparatus 110 includes at least one software function module, which can be stored in the memory 120 in the form of software or firmware, or solidified in the Operating System (OS) of the data processing device 100. The processor 130 is used for executing executable modules stored in the memory 120, such as the software function modules and computer programs included in the data normalization apparatus 110.
The data processing device 100 may be, but is not limited to, a smart phone, a Personal Computer (PC), a tablet PC, a Personal Digital Assistant (PDA), a Mobile Internet Device (MID), and the like.
The memory 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 120 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction.
The processor 130 may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor can implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Referring to fig. 2, fig. 2 is a flowchart illustrating steps of a data normalization method applied to the data processing apparatus 100 shown in fig. 1, wherein the data processing apparatus 100 is pre-stored with a plurality of industry standard libraries, and the industry standard libraries are pre-stored with sample data; the individual steps of the data normalization method are described in detail below.
Step S100, acquiring a service database.
Optionally, an industry standard library is a database that records the typical data of an industry. For example, in one possible example, the industry standard library of the education industry includes data such as student name, student class, student gender, and student score. The industry standard library of the financial industry includes data such as principal, interest rate, depositor name, sex, and age. The data processing device 100 connects to the service database and obtains its metadata, which includes a database name, a table name, a field name, and a field type.
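As an illustration of step S100, the following is a minimal sketch of reading table names, field names, and field types from a database. It assumes the service database is reachable as a SQLite database, which the application does not specify; the function name `read_metadata` and the table `person` are hypothetical.

```python
import sqlite3


def read_metadata(conn: sqlite3.Connection) -> dict:
    """Return {table_name: [{"name": field_name, "type": field_type}, ...]}."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    metadata = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = [{"name": c[1], "type": c[2]} for c in columns]
    return metadata


# Hypothetical service database with the fields used in the FIG. 3 example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (age INTEGER, firstname TEXT, lastname TEXT)")
business_metadata = read_metadata(conn)
print(business_metadata)
```

Other database systems expose the same information through their own catalogs, so only the two queries would need to change.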
Step S200, aiming at each industry standard library, comparing the metadata of the industry standard library with the metadata of the business database.
Step S300, identifying the metadata in the industry standard library, which is the same as the metadata in the business database, as similar metadata.
Optionally, for each industry standard library, the data processing device 100 uses the industry standard library as a target industry standard library, and compares the metadata in the business database with the metadata in the target industry standard library to find out the same metadata. The data processing device 100 marks the same metadata as similar metadata. For example, referring to FIG. 3, in one possible example, the metadata includes a field name. The service data table 500 includes the field names "age", "firstname", and "lastname". The industry standard data table 600 includes the field names "age", "number", and "name". The data processing device 100 compares the service data table 500 with the industry standard data table 600; the "age" field names are the same, so the "age" field is marked as similar metadata.
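The exact-match comparison of steps S200 and S300 can be sketched as follows, using the field names from the FIG. 3 example. The function name `split_by_name` is an assumption made for illustration; the application does not name such a routine.

```python
def split_by_name(business_fields, standard_fields):
    """Split business field names into those that also appear in the
    industry standard table (similar metadata) and those that do not
    (difference metadata)."""
    standard = set(standard_fields)
    same = [f for f in business_fields if f in standard]
    different = [f for f in business_fields if f not in standard]
    return same, different


business_fields = ["age", "firstname", "lastname"]   # service data table 500
standard_fields = ["age", "number", "name"]          # industry standard data table 600

similar, difference = split_by_name(business_fields, standard_fields)
print(similar)     # ['age']
print(difference)  # ['firstname', 'lastname']
```

The difference metadata returned here is what step S400 then examines by data similarity.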
Optionally, to further ensure that the data corresponding to the same metadata in the business database and the industry standard library are also similar, the data processing device 100 performs similarity calculation on the data corresponding to the same metadata in the business database and the industry standard library, respectively, and identifies the metadata whose similarity is greater than a preset threshold as similar metadata. Referring to FIG. 3, the data processing device 100 performs similarity calculation between the data corresponding to the "age" field in the business database and the data corresponding to the "age" field in the industry standard library.
Comparing whether the metadata are the same quickly screens out the metadata shared by the business database and the industry standard library. However, because different developers name fields differently, the field names used for the same data may vary. For example, different developers may name the field holding a student's exam result "score" or "achievement". A simple metadata comparison cannot determine whether two such fields are similar.
Step S400, for difference metadata in the business database that differs from the metadata in the industry standard library, calculating the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library, and identifying the metadata corresponding to the sample data whose data similarity exceeds a preset threshold as similar metadata in the industry standard library.
Optionally, there may be fields in the business database whose field names differ from those in the industry standard library but whose actual data are similar. The data processing device 100 performs similarity calculation on the data corresponding to the difference metadata in the business database and all sample data in the industry standard library, and identifies the metadata corresponding to the sample data whose data similarity exceeds a preset threshold as similar metadata in the industry standard library.
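The application does not fix a particular similarity measure at this point. As a stand-in only, the sketch below scores two columns by the cosine similarity of their character-class counts (digits, letters, other). This measure, the function names, and the sample values are all assumptions chosen to keep the example small; they are not the method of the application.

```python
import math


def char_profile(values):
    """Count digit, letter, and other characters across a column of values."""
    digits = letters = other = 0
    for value in values:
        for ch in str(value):
            if ch.isdigit():
                digits += 1
            elif ch.isalpha():
                letters += 1
            else:
                other += 1
    return (digits, letters, other)


def data_similarity(column_a, column_b):
    """Cosine similarity of the two columns' character-class counts (0.0 to 1.0)."""
    a, b = char_profile(column_a), char_profile(column_b)
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty column is treated as dissimilar to everything
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


lastname_data = ["Zhang", "Li"]   # hypothetical data of a difference field
sample_name = ["Wang", "Liu"]     # hypothetical sample data of a "name" field
sample_age = [23, 41]             # hypothetical sample data of an "age" field

print(data_similarity(lastname_data, sample_name))
print(data_similarity(lastname_data, sample_age))
```

A measure this coarse cannot tell two text columns apart, so a practical system would need richer features or a learned model.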
In one embodiment provided by the present application, the data processing apparatus 100 inputs data corresponding to the difference metadata and all sample data in the industry standard library into an artificial neural network, and calculates a similarity between the data corresponding to each difference metadata and the sample data corresponding to each metadata in the industry standard library. The data processing apparatus 100 identifies metadata corresponding to sample data having a similarity greater than a preset threshold as similar metadata.
In another embodiment provided by the present application, the data processing device 100 sequentially selects target difference metadata from the difference metadata, performs similarity calculation on the data corresponding to the target difference metadata and the sample data corresponding to each metadata in the industry standard library, and identifies the metadata corresponding to sample data with a similarity greater than a preset threshold as similar metadata. Referring again to FIG. 3, the difference metadata in the service data table 500 are "lastname" and "firstname". The data processing device 100 performs similarity calculation between the data corresponding to the "lastname" field and the data corresponding to the "age" field, the "number" field, and the "name" field in the industry standard data table 600, respectively. The data processing device 100 then performs the same similarity calculation for the data corresponding to the "firstname" field. Suppose that the similarities between the "lastname" field and the "age", "number", and "name" fields are 0.2, 0.1, and 0.7 respectively, and that the preset similarity threshold is 0.6. Only the similarity with the "name" field exceeds the threshold, so the data processing device 100 identifies the "name" field in the industry standard data table 600 as the similar field corresponding to the "lastname" field.
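The thresholding in this worked example can be sketched as follows. The similarity values are taken from the text as given rather than computed, and the function name `fields_above_threshold` is hypothetical.

```python
def fields_above_threshold(scores, threshold):
    """For each difference field, list the standard fields whose similarity
    score is greater than the threshold."""
    return {
        field: [std for std, score in per_standard.items() if score > threshold]
        for field, per_standard in scores.items()
    }


# Similarities of the "lastname" data against each field of the
# industry standard data table 600, as stated in the example.
scores = {"lastname": {"age": 0.2, "number": 0.1, "name": 0.7}}

similar = fields_above_threshold(scores, threshold=0.6)
print(similar)  # {'lastname': ['name']}
```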
Step S500, counting the quantity of the metadata marked as the similar metadata in each industry standard library, and determining the industry standard library with the largest quantity as the industry standard library closest to the business database.
Optionally, since the data processing device 100 has a plurality of industry standard libraries prestored therein, it counts the number of metadata marked as similar metadata in each industry standard library, and determines the industry standard library with the largest number of similar metadata as the industry standard library closest to the business database.
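Step S500 amounts to a count followed by a maximum. A minimal sketch, in which the library names and their similar-metadata sets are hypothetical:

```python
def closest_library(similar_by_library):
    """Return the library with the most similar metadata, plus all counts."""
    counts = {name: len(fields) for name, fields in similar_by_library.items()}
    best = max(counts, key=counts.get)
    return best, counts


# Hypothetical result of steps S200 to S400 for two industry standard libraries.
similar_by_library = {
    "education": {"age", "name"},
    "finance": {"age"},
}

best, counts = closest_library(similar_by_library)
print(best, counts)
```

The application does not say how ties are resolved; in this sketch `max` returns the first library reaching the highest count.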
Optionally, the data processing device 100 creates a standard information database from similar metadata in the closest industry standard library. The data processing apparatus 100 acquires data corresponding to similar metadata in the closest industry standard library from the business database and stores the data in the standard information database.
Referring to FIG. 3 again, the data processing device 100 extracts the "name" field and the "age" field in the industry standard library, creates a standard information database according to these two fields, and stores the data corresponding to the "age" field and the "lastname" field in the service data table 500 into the standard information database. It should be noted that, when the data processing device 100 stores the data in the service data table 500 into the standard information database, it performs corresponding processing if the data type or the data length is different.
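Creating the standard table and copying the matched data can be sketched with SQLite as follows. Using `CAST` as the "corresponding processing" for differing data types is an assumption, as are the function name `build_standard_table`, the table names, and the sample row.

```python
import sqlite3


def build_standard_table(conn, source_table, target_table, field_map, target_types):
    """Create target_table and fill it from source_table.

    field_map maps a business field name to its standard field name;
    target_types maps a standard field name to its SQL type.
    """
    pairs = list(field_map.items())
    column_defs = ", ".join(f'"{std}" {target_types[std]}' for _, std in pairs)
    conn.execute(f'CREATE TABLE "{target_table}" ({column_defs})')
    target_cols = ", ".join(f'"{std}"' for _, std in pairs)
    # Convert each business column to the type required by the standard.
    select_list = ", ".join(
        f'CAST("{src}" AS {target_types[std]})' for src, std in pairs
    )
    conn.execute(
        f'INSERT INTO "{target_table}" ({target_cols}) '
        f'SELECT {select_list} FROM "{source_table}"'
    )


conn = sqlite3.connect(":memory:")
# Hypothetical business table in which age happens to be stored as text.
conn.execute("CREATE TABLE person (age TEXT, firstname TEXT, lastname TEXT)")
conn.execute("INSERT INTO person VALUES ('23', 'San', 'Zhang')")

build_standard_table(
    conn,
    source_table="person",
    target_table="standard_person",
    field_map={"age": "age", "lastname": "name"},
    target_types={"age": "INTEGER", "name": "TEXT"},
)
rows = conn.execute("SELECT age, name FROM standard_person").fetchall()
print(rows)
```

Table and field names are interpolated into the SQL text here, so this sketch is only suitable for trusted, internally generated names.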
Optionally, the data processing device 100 further includes an industry shared information base. The metadata of the industry shared information base is compared with the metadata of the standard information database to determine the shared metadata, that is, the metadata in the standard information database that is the same as metadata in the industry shared information base. The data processing device 100 creates a shared data table from the data corresponding to the shared metadata.
Optionally, for each shared data table, a corresponding interface is provided, so that other devices can access the data in the shared data table through the interface.
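The sharing steps can be sketched as below. The application does not specify the form of the interface, so it is modeled here as a plain callable; all function names, field names, and rows are hypothetical.

```python
def shared_metadata(standard_fields, shared_base_fields):
    """Fields of the standard information database that also appear in the
    industry shared information base."""
    shared = set(shared_base_fields)
    return [f for f in standard_fields if f in shared]


def make_shared_table(rows, shared_fields):
    """Keep only the shared fields of each standard-database row."""
    return [{f: row[f] for f in shared_fields} for row in rows]


def make_interface(table):
    """Return a read-only accessor that other devices could call."""
    def get_rows():
        return [dict(row) for row in table]  # copies, so callers cannot mutate
    return get_rows


standard_rows = [{"age": 23, "name": "Zhang"}]
fields = shared_metadata(["age", "name"], ["name", "department"])
get_shared = make_interface(make_shared_table(standard_rows, fields))
print(get_shared())
```

In a deployed system the accessor would more likely sit behind a network endpoint, but the selection logic would be the same.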
The embodiment of the present application further provides a data normalization apparatus 110, which is applied to the data processing device 100, wherein the data processing device 100 has a plurality of industry standard libraries prestored therein, and the industry standard libraries have sample data prestored therein. Referring to FIG. 4, the data normalization apparatus 110 includes an obtaining module 1101, a comparing module 1102, an identifying module 1103, a similarity calculation module 1104, and a statistics module 1105.
The obtaining module 1101 is configured to obtain a service database.
In the present embodiment, the obtaining module 1101 is configured to execute step S100 in fig. 2, and reference may be made to the detailed description of step S100 for a detailed description of the obtaining module 1101.
The comparing module 1102 is configured to compare, for each industry standard library, metadata of the industry standard library with metadata of the business database.
In this embodiment, the comparing module 1102 is configured to perform step S200 in fig. 2, and the detailed description about the comparing module 1102 may refer to the detailed description about step S200.
The identifying module 1103 is configured to identify metadata in the industry standard library that is the same as the service database as similar metadata.
In this embodiment, the identification module 1103 is configured to perform step S300 in fig. 2, and the detailed description about the identification module 1103 may refer to the detailed description of step S300.
The similarity calculation module 1104 is configured to calculate, for difference metadata in the business database that differs from the metadata in the industry standard library, the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library, and to identify the metadata corresponding to the sample data whose data similarity exceeds a preset threshold as similar metadata in the industry standard library.
In the present embodiment, the similarity calculation module 1104 is configured to execute step S400 in fig. 2, and reference may be made to the detailed description of step S400 for a detailed description of the similarity calculation module 1104.
The statistical module 1105 is configured to count the number of metadata identified as the similar metadata in each of the industry standard libraries, and determine the industry standard library with the largest number as the industry standard library closest to the service database.
In this embodiment, the statistics module 1105 is configured to execute step S500 in fig. 2, and reference may be made to the detailed description of step S500 for a detailed description of the statistics module 1105.
Optionally, the similarity calculation module 1104 calculates the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library by:
and calculating the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library through an artificial neural network.
Referring to fig. 5 again, the data normalization apparatus 110 further includes a creation module 1106 and a writing module 1107.
The creation module 1106 is configured to create a standards information database based on similar metadata in the closest industry standards library.
The writing module 1107 is configured to obtain data corresponding to the similar metadata in the closest industry standard library from the service database, and store the data in the standard information database.
To sum up, the embodiments of the present application provide a data normalization method and apparatus. The metadata of a business database is compared in turn with the metadata of each of a plurality of industry standard libraries, and metadata in an industry standard library that is the same as metadata in the business database is identified as similar metadata. For difference metadata, that is, metadata in the business database that differs from the metadata in the industry standard library, the similarity between the data corresponding to the difference metadata and the sample data prestored in the industry standard library is calculated. The metadata corresponding to sample data whose data similarity is greater than a preset threshold is also identified as similar metadata in the industry standard library. Finally, the quantity of metadata identified as similar metadata in each industry standard library is counted, and the industry standard library with the largest quantity is determined to be the industry standard library closest to the business database.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It is noted that, herein, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above description covers only various embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A data standardization method, characterized in that it is applied to data processing equipment, wherein a plurality of industry standard libraries are prestored in the data processing equipment, and sample data are prestored in the industry standard libraries; the method comprises the following steps:
acquiring a business database;
for each industry standard library, comparing the metadata of the industry standard library with the metadata of the business database, wherein the metadata includes a field name;
identifying metadata in the industry standard library that is the same as metadata in the business database as similar metadata;
for difference metadata in the business database that differs from the metadata of the industry standard library, calculating the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library, and identifying, in the industry standard library, the metadata corresponding to the sample data whose similarity exceeds a preset threshold as similar metadata;
counting the number of metadata items identified as similar metadata in each industry standard library, and determining the industry standard library with the largest number as the industry standard library closest to the business database.
2. The data standardization method of claim 1, wherein the step of calculating the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library comprises:
calculating the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library through an artificial neural network.
3. The data standardization method of claim 1, further comprising:
creating a standard information database according to the similar metadata in the closest industry standard library;
acquiring, from the business database, the data corresponding to the similar metadata in the closest industry standard library, and storing the data in the standard information database.
4. The data standardization method of claim 3, wherein the data processing equipment further comprises an industry shared information base, the method further comprising:
comparing the metadata of the industry shared information base with the metadata of the standard information database to determine shared metadata in the standard information database that is the same as metadata in the industry shared information base;
creating a shared data table according to the data corresponding to the shared metadata.
5. The data standardization method of claim 4, further comprising:
providing a corresponding interface for each shared data table, so that other equipment can acquire the data in the shared data table through the interface.
6. The data standardization method of claim 1, wherein the step of identifying metadata in the industry standard library that is the same as metadata in the business database as similar metadata comprises:
identifying the field names in the industry standard library that are the same as the field names in the business database as similar metadata.
7. The data standardization method of claim 1, wherein the metadata further includes a table name, a field type, and a field length.
8. A data standardization device, characterized in that it is applied to data processing equipment, wherein a plurality of industry standard libraries are prestored in the data processing equipment, and sample data are prestored in the industry standard libraries; the data standardization device comprises an acquisition module, a comparison module, an identification module, a similarity calculation module, and a statistics module;
the acquisition module is used for acquiring a business database;
the comparison module is used for comparing, for each industry standard library, the metadata of the industry standard library with the metadata of the business database, wherein the metadata includes a field name;
the identification module is used for identifying metadata in the industry standard library that is the same as metadata in the business database as similar metadata;
the similarity calculation module is used for calculating, for difference metadata in the business database that differs from the metadata of the industry standard library, the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library, and identifying, in the industry standard library, the metadata corresponding to the sample data whose similarity exceeds a preset threshold as similar metadata;
the statistics module is used for counting the number of metadata items identified as similar metadata in each industry standard library, and determining the industry standard library with the largest number as the industry standard library closest to the business database.
9. The data standardization device of claim 8, wherein the comparison module compares the metadata of the industry standard library with the metadata of the business database by:
calculating the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library through an artificial neural network.
10. The data standardization device of claim 8, wherein the data standardization device further comprises a creation module and a writing module;
the creation module is used for creating a standard information database according to the similar metadata in the closest industry standard library;
the writing module is used for acquiring, from the business database, the data corresponding to the similar metadata in the closest industry standard library, and storing the data in the standard information database.
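
The selection procedure recited in claims 1 and 8 can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary layout (field name mapped to a list of values), the function names, the example data, and the 0.8 threshold are all assumptions made for illustration. In particular, `data_similarity` is a simple stand-in that compares the character "shapes" of values with a Jaccard ratio; claim 2 recites an artificial neural network for this step, which is not reproduced here.

```python
def shape(value):
    """Reduce a value to a coarse pattern: digits -> '9', letters -> 'a'."""
    return "".join(
        "9" if c.isdigit() else "a" if c.isalpha() else c for c in str(value)
    )


def data_similarity(values, samples):
    """Hypothetical stand-in for the similarity step (Jaccard ratio of shapes)."""
    a = {shape(v) for v in values}
    b = {shape(s) for s in samples}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_metadata(business, library, threshold=0.8):
    """Return the library field names identified as similar metadata.

    business: field name -> data values from the business database
    library:  field name -> sample data from one industry standard library
    """
    # Step 1: field names present in both are similar metadata outright.
    similar = set(business) & set(library)
    # Step 2: for the remaining (difference) metadata, compare the data itself.
    diff_business = set(business) - similar
    diff_library = set(library) - similar
    for lib_field in diff_library:
        if any(
            data_similarity(business[biz_field], library[lib_field]) > threshold
            for biz_field in diff_business
        ):
            similar.add(lib_field)
    return similar


def closest_library(business, libraries, threshold=0.8):
    """Count similar metadata per library and pick the library with the most."""
    counts = {
        name: len(similar_metadata(business, lib, threshold))
        for name, lib in libraries.items()
    }
    return max(counts, key=counts.get), counts


# Illustrative data only; the library and field names are invented.
business = {
    "name": ["Li Lei", "Han Mei"],
    "tel": ["13800001111", "13900002222"],
    "memo": ["n/a"],
}
libraries = {
    "healthcare": {
        "name": ["Zhang San"],
        "phone": ["13700003333"],
        "diagnosis_code": ["A01.1"],
    },
    "finance": {
        "account_no": ["6222-0000-1111"],
        "name": ["Wang Wu"],
        "balance": ["100.50"],
    },
}

best, counts = closest_library(business, libraries)
print(best, counts)
```

In this example the "healthcare" library matches on the identical field name `name` and, through the data comparison, on `phone` (whose sample data has the same shape as the `tel` column), so it is selected over "finance", which matches only on `name`.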
CN201910304451.7A 2019-04-16 2019-04-16 Data standardization method and device Active CN110008193B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910304451.7A CN110008193B (en) 2019-04-16 2019-04-16 Data standardization method and device

Publications (2)

Publication Number Publication Date
CN110008193A CN110008193A (en) 2019-07-12
CN110008193B true CN110008193B (en) 2021-06-18

Family

ID=67172159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910304451.7A Active CN110008193B (en) 2019-04-16 2019-04-16 Data standardization method and device

Country Status (1)

Country Link
CN (1) CN110008193B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765118B (en) * 2019-10-21 2022-05-17 北京明略软件***有限公司 Data revision method, revision device and readable storage medium
CN111078639B (en) * 2019-12-03 2022-03-22 望海康信(北京)科技股份公司 Data standardization method and device and electronic equipment
CN113495902A (en) * 2020-03-19 2021-10-12 华为技术有限公司 Data processing method and data standard management system
CN112084245B (en) * 2020-09-03 2024-03-12 深圳力维智联技术有限公司 Data management method, device, equipment and storage medium based on micro-service architecture
CN113282650A (en) * 2020-11-24 2021-08-20 苏州律点信息科技有限公司 Service data processing method and device based on big data
CN112699160B (en) * 2021-03-23 2022-04-26 中国信息通信研究院 Metadata template upgrading method and device and readable storage medium
CN113111636B (en) * 2021-05-17 2024-04-12 京东科技控股股份有限公司 Data uniqueness standard identification method and device
CN115185923B (en) * 2022-07-07 2023-03-07 中国气象局气象探测中心 Method and system for managing meteorological observation metadata and intelligent terminal

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2793906A1 (en) * 1999-05-19 2000-11-24 Bull Sa SYSTEM AND METHOD FOR MANAGING ATTRIBUTES IN AN OBJECT-ORIENTED ENVIRONMENT
CN106845058A (en) * 2015-12-04 2017-06-13 北大医疗信息技术有限公司 The standardized method of disease data and modular station
CN107844560A (en) * 2017-10-30 2018-03-27 北京锐安科技有限公司 A kind of method, apparatus of data access, computer equipment and readable storage medium storing program for executing
CN109408561A (en) * 2018-10-17 2019-03-01 杭州骑轻尘信息技术有限公司 Business Name matching process and device

Also Published As

Publication number Publication date
CN110008193A (en) 2019-07-12

Similar Documents

Publication Publication Date Title
CN110008193B (en) Data standardization method and device
CN109360089B (en) Loan risk prediction method and device
CN112613917A (en) Information pushing method, device and equipment based on user portrait and storage medium
CN112667825B (en) Intelligent recommendation method, device, equipment and storage medium based on knowledge graph
US20140278339A1 (en) Computer System and Method That Determines Sample Size and Power Required For Complex Predictive and Causal Data Analysis
US20230045330A1 (en) Multi-term query subsumption for document classification
CN111209538A (en) Table data quality probing method and device
CN109656928B (en) Method and device for obtaining relationships between tables
Pfaffel et al. A missing data approach to correct for direct and indirect range restrictions with a dichotomous criterion: A simulation study
CN111489105A (en) Enterprise risk identification method, device and equipment
CN111210321B (en) Risk early warning method and system based on contract management
CN113704599A (en) Marketing conversion user prediction method and device and computer equipment
CN113554175A (en) Knowledge graph construction method and device, readable storage medium and terminal equipment
CN108804561B (en) Data synchronization method and device
CN112329810A (en) Image recognition model training method and device based on saliency detection
CN112052310A (en) Information acquisition method, device, equipment and storage medium based on big data
CN111222923A (en) Method and device for judging potential customer, electronic equipment and storage medium
CN109783877B (en) Time sequence model establishment method, device, computer equipment and storage medium
CN111914868A (en) Model training method, abnormal data detection method and device and electronic equipment
CN112560433A (en) Information processing method and device
CN117349358B (en) Data matching and merging method and system based on distributed graph processing framework
CN116501375B (en) Data dictionary version management method, device, computer equipment and storage medium
CN113254787B (en) Event analysis method, device, computer equipment and storage medium
US20230297880A1 (en) Cognitive advisory agent
CN110765118B (en) Data revision method, revision device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant