CN116244386A - Identification method of entity association relation applied to multi-source heterogeneous data storage system - Google Patents

Info

Publication number
CN116244386A
Authority
CN
China
Prior art keywords
association
entity
data
model
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310143615.9A
Other languages
Chinese (zh)
Other versions
CN116244386B (en)
Inventor
姚宏宇
朱朝强
王刚
申忠玲
于艳波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING YOYO TIANYU SYSTEM TECHNOLOGY CO LTD
Original Assignee
BEIJING YOYO TIANYU SYSTEM TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING YOYO TIANYU SYSTEM TECHNOLOGY CO LTD
Priority to CN202310143615.9A
Publication of CN116244386A
Application granted
Publication of CN116244386B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide a method, apparatus, device, and computer-readable storage medium for identifying entity association relations applied to a multi-source heterogeneous data storage system. The method comprises: obtaining entity association models of all data tables; determining the association relations of the tables/fields in the entity association models according to set weight levels; and correcting the association relations to generate an entity relation diagram, thereby completing intelligent identification of the association relations of the multi-source heterogeneous data storage system. In this way, complex association relations between data tables can be extracted automatically from the original data tables, data relations can be understood more efficiently, the association analysis and data management demands of flexible and changing business requirements can be met, data preparation can be carried out quickly, and a standard data system can be organized on the basis of the identified data association relations.

Description

Identification method of entity association relation applied to multi-source heterogeneous data storage system
Technical Field
Embodiments of the present application relate to the field of data analysis, and in particular to a method, an apparatus, a device, and a computer-readable storage medium for identifying entity association relations applied to a multi-source heterogeneous data storage system.
Background
In conventional practice, association relations between data tables are established through the primary and foreign keys of the database. In addition, when some standardized business systems are built, standard data modeling tools are used to define the association relations between business tables and to produce a database design document.
However, these means usually define and maintain association relations only within a single business system and do not work across systems. For management reasons, it is also impossible to require every application system to have an explicit design document for its business database. The deeper reason is that data that is the same from a business perspective is used in different ways and with different content in different business systems, so the naming standards, field types and field contents of what is, in business terms, the same data table or field also differ.
The industry generally adopts a master data management system for data maintenance, but this technique is only suitable when the business is deeply understood and the master data specifications are relatively complete, and once it is in place each business system must be modified and adapted to it.
Disclosure of Invention
According to the embodiment of the application, an identification scheme applied to entity association relations of a multi-source heterogeneous data storage system is provided.
In a first aspect of the present application, a method for identifying entity association relationships applied to a multi-source heterogeneous data storage system is provided. The method comprises the following steps:
acquiring entity association models of all data tables;
determining the association relation of the table/field in the entity association model according to the set weight level;
and correcting the association relation to generate an entity relation diagram, and completing intelligent identification of the association relation of the multi-source heterogeneous data storage system.
Further, the obtaining the entity association model of each data table includes:
based on the characteristics and the storage mode of the data, the entity association model is obtained according to the data source types of different association relation models.
Further, the obtaining the entity association model according to the data source types of different association relation models based on the characteristics and the storage mode of the data includes:
if the data source type is a relational database, acquiring an ER model relation through a metadata interface of the database to form an entity association model;
if the data source type is a database design document, identifying the design document, extracting association relations in the document, and forming an entity association model;
if the data source type is business SQL audit, analyzing the SQL statement, extracting the field association relation in the WHERE clause to form an entity association model;
if the data source type is manual input, directly acquiring the association relation between the table and the field to form an entity association model;
if the data source type is table metadata, extracting the association relation of tables/fields in the table metadata through annotations, field names, field annotations and/or field types to form an entity association model;
if the data source type is data content, performing text analysis on the data content and extracting the field association relation to form an entity association model.
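As an illustration of the relational-database branch above, the following minimal sketch reads foreign-key relations through a database's metadata interface. It assumes SQLite and Python's standard `sqlite3` module purely for demonstration; the patent does not name a specific database, and the function name `foreign_key_associations` and the sample tables are hypothetical.

```python
import sqlite3


def foreign_key_associations(conn):
    """Return (child_table, child_column, parent_table, parent_column) tuples.

    Hypothetical helper: reads declared foreign keys from SQLite metadata.
    """
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    associations = []
    for table in tables:
        # Each PRAGMA row is (id, seq, table, from, to, on_update, on_delete, match).
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            associations.append((table, row[3], row[2], row[4]))
    return associations


# Illustrative schema (an assumption, not taken from the patent).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id)
    );
""")
er_associations = foreign_key_associations(conn)
print(er_associations)
```

Other database engines expose equivalent metadata through their own catalogs, so the query part of this sketch would differ per data source.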
Further, the correcting the association relation includes:
correcting the association relation by the following formula:
Figure BDA0004088412720000021
wherein α_i represents the weight of a machine learning rule;
Figure BDA0004088412720000022
indicates whether the rule is satisfied, taking the value 1 when the rule is satisfied and 0 when it is not;
C represents the weight of manual input rules and ER model analysis, and its value is 100%;
R_i represents the calculation result of each rule and algorithm;
I_person indicates whether a defined manual rule or the ER model analysis is satisfied, taking the value 1 when satisfied and 0 when not.
In a second aspect of the present application, an identification apparatus for entity association applied to a multi-source heterogeneous data storage system is provided. The device comprises:
the acquisition module is used for respectively acquiring entity association models of the data tables;
the determining module is used for determining the association relation of the table/field in the entity association model according to the set weight level;
and the identification module is used for correcting the association relation to generate an entity relation diagram and completing intelligent identification of the association relation of the multi-source heterogeneous data storage system.
In a third aspect of the present application, an electronic device is provided. The electronic device includes: a memory and a processor, the memory having stored thereon a computer program, the processor implementing the method as described above when executing the program.
In a fourth aspect of the present application, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method as according to the first aspect of the present application.
The method for identifying entity association relations applied to the multi-source heterogeneous data storage system comprises the steps of obtaining entity association models of all data tables; determining the association relation of the table/field in the entity association model according to the set weight level; and correcting the association relation to generate an entity relation diagram, so as to finish intelligent identification of the association relation of the multi-source heterogeneous data storage system, realize automatic table association analysis of the accessed data table, reduce the manual participation degree, improve the working efficiency and reduce the construction difficulty of a service system.
It should be understood that the description in this summary is not intended to limit key or critical features of embodiments of the present application, nor is it intended to be used to limit the scope of the present application. Other features of the present application will become apparent from the description that follows.
Drawings
The above and other features, advantages and aspects of embodiments of the present application will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, wherein like or similar reference numerals denote like or similar elements, in which:
FIG. 1 illustrates a flow chart of a method of identifying entity associations applied to a multi-source heterogeneous data storage system in accordance with an embodiment of the present application;
FIG. 2 illustrates a correlation analysis schematic in accordance with an embodiment of the present application;
FIG. 3 illustrates an entity association generation diagram according to an embodiment of the present application;
FIG. 4 illustrates an associated weight hierarchy diagram across business systems according to an embodiment of the present application;
FIG. 5 illustrates a block diagram of an identification device applied to entity associations of a multi-source heterogeneous data storage system, according to an embodiment of the present application;
FIG. 6 illustrates a schematic diagram of a structure of a terminal device or server suitable for implementing an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments in this disclosure without inventive faculty, are intended to be within the scope of this disclosure.
In addition, the term "and/or" herein merely describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Fig. 1 illustrates a flowchart of a method of identifying entity associations applied to a multi-source heterogeneous data storage system according to an embodiment of the present disclosure. The method comprises the following steps:
S110, obtaining entity association models of all data tables.
In some embodiments, the entity association model can be obtained by performing targeted association analysis from multiple dimensions and directions through the entity association analysis engine based on the characteristics and storage mode of the data, namely according to the data source types of different association relationship models.
Specifically, as shown in fig. 2, if the data source type is a relational database, acquiring ER model relationships through a metadata interface of the database to form an entity association model;
if the data source type is a database design document, identifying the design document, extracting association relations in the document, and forming an entity association model;
if the data source type is business SQL audit, analyzing the SQL statement, extracting a field association relationship in a where clause to form an entity association model;
if the data source type is manual input, directly acquiring the association relationship between the table and the field to form an entity association model;
if the data source type is table metadata, extracting the association relation of tables/fields in the table metadata through notes, field names, field notes and/or field types to form an entity association model;
if the data source type is the data content, text analysis is carried out on the data content, and the field association relation is extracted to form an entity association model.
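The business SQL branch above can be sketched as follows. This is a deliberately simplified, regex-based illustration that only handles comma-separated FROM lists and `alias.column = alias.column` equalities; a production system would presumably use a real SQL parser. The function name and the sample statement are assumptions, not part of the patent.

```python
import re

# Matches equality predicates of the form  alias.column = alias.column
EQUALITY = re.compile(
    r"([A-Za-z_]\w*)\.([A-Za-z_]\w*)\s*=\s*([A-Za-z_]\w*)\.([A-Za-z_]\w*)")


def where_clause_associations(sql):
    """Extract cross-table field equalities from the WHERE clause (simplified)."""
    match = re.search(r"\bFROM\b(.*?)\bWHERE\b(.*)", sql,
                      re.IGNORECASE | re.DOTALL)
    if match is None:
        return []
    from_part, where_part = match.groups()

    # Map every alias (and every table name) to its table name.
    aliases = {}
    for item in from_part.split(","):
        tokens = [t for t in item.split() if t.upper() != "AS"]
        if tokens:
            aliases[tokens[0]] = tokens[0]
            aliases[tokens[-1]] = tokens[0]

    found = []
    for left_alias, left_col, right_alias, right_col in EQUALITY.findall(where_part):
        left, right = aliases.get(left_alias), aliases.get(right_alias)
        if left and right and left != right:
            found.append(((left, left_col), (right, right_col)))
    return found


# Illustrative audited statement (hypothetical).
sql = ("SELECT o.id, c.name FROM orders o, customer AS c "
       "WHERE o.customer_id = c.id AND o.total > 100")
sql_associations = where_clause_associations(sql)
print(sql_associations)
```

Explicit `JOIN ... ON` syntax, subqueries and quoted identifiers are out of scope for this sketch.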
S120, according to the set weight level, determining the association relation of the table/field in the entity association model.
In some embodiments, referring to fig. 3, according to different characteristics and storage modes of the data, different analysis methods are adopted to analyze the entity association model obtained in step S110, so as to obtain an entity association relationship:
ER model relation extraction can analyze data with a relational row-column structure;
business SQL analysis, table metadata extraction and data content extraction analysis can analyze logical database tables, fields and/or data content on a global logical database, so as to analyze association relations in relational row-column data, semi-structured data such as NoSQL, distributed file storage system data and/or other API interface data;
document analysis can be used to identify the database design document and obtain the characteristics and associations of the data entities;
manual entry is a manual operation channel provided for cases where the entity association relation cannot be obtained through the above means, so that business personnel can manually set or adjust the entity association relation according to the actual business relationship.
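Two of the analysis methods listed above, table metadata extraction and data content extraction, can be illustrated with a short sketch. The patent does not specify the matching techniques, so the choices here are assumptions: field names are compared with `difflib.SequenceMatcher` after normalization, declared types must match, and data content is compared by the share of one column's distinct values found in the other. All function names and sample data are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case a field name and drop underscores before comparison."""
    return name.lower().replace("_", "")


def metadata_candidates(table_a, fields_a, table_b, fields_b, threshold=0.8):
    """Pair fields whose declared types match and whose names are similar.

    fields_a / fields_b map field name -> declared type.
    """
    candidates = []
    for name_a, type_a in fields_a.items():
        for name_b, type_b in fields_b.items():
            if type_a.lower() != type_b.lower():
                continue
            score = SequenceMatcher(
                None, normalize(name_a), normalize(name_b)).ratio()
            if score >= threshold:
                candidates.append(
                    ((table_a, name_a), (table_b, name_b), round(score, 2)))
    return candidates


def value_containment(values_a, values_b):
    """Fraction of distinct values in column A that also occur in column B."""
    set_a, set_b = set(values_a), set(values_b)
    if not set_a:
        return 0.0
    return len(set_a & set_b) / len(set_a)


# Illustrative metadata and column samples (assumptions).
orders_fields = {"customer_id": "INTEGER", "total": "REAL"}
customer_fields = {"CustomerID": "integer", "name": "TEXT"}
candidates = metadata_candidates(
    "orders", orders_fields, "customer", customer_fields)
overlap = value_containment([1, 2, 2, 3], [1, 2, 3, 4])
print(candidates, overlap)
```

A high containment score suggests, but does not prove, a foreign-key-like relation, which is why such results sit low in the weight hierarchy.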
In some embodiments, when the association relations of data entities are calculated automatically, the accuracy of the association relations obtained by different technical means differs from scenario to scenario, so a weight level needs to be assigned to each technical means.
Within a single business system, relatively few data entities are usually involved, and the weight levels from high to low are generally: manual entry > ER model relation extraction > business SQL analysis > table metadata extraction analysis > data content extraction analysis > database document analysis.
Further, if a manual entry exists, the manual entry prevails; if there is no manual entry but an ER model relation exists, the ER model relation prevails; otherwise, the analysis follows the weight levels.
In the cross-business-system case, referring to fig. 4, the association relations within each single business system are obtained first, and association analysis is then performed on the data entities (such as data tables) of all business systems (for example, system A and system B), so as to analyze the table associations between A and B.
For example, after an association is established between a table T1 in A and a table T2 in B, the tables associated with T1 in A and with T2 in B are analyzed automatically, so that the association relations between table T1 and multiple tables across business system B are obtained automatically.
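The T1/T2 example can be read as a reachability question on a graph of table associations. The sketch below is one possible interpretation, not the patent's stated algorithm: it treats associations as undirected edges and uses breadth-first search to list every table reachable from a starting table, together with the number of hops. The table labels and the function name are hypothetical.

```python
from collections import deque


def propagate_associations(start, edges):
    """Return {table: hop_count} for every table reachable from `start`."""
    graph = {}
    for left, right in edges:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in sorted(graph.get(current, ())):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    del hops[start]  # the starting table is not associated with itself
    return hops


# One cross-system edge (A.T1 to B.T2) plus associations inside system B.
edges = [
    ("A.T1", "B.T2"),
    ("B.T2", "B.T3"),
    ("B.T3", "B.T4"),
    ("B.T5", "B.T6"),  # disconnected from A.T1
]
reachable = propagate_associations("A.T1", edges)
print(reachable)
```

The hop count could serve as a rough signal that indirect associations deserve lower confidence than direct ones, though the source does not say so.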
Further, in the cross-business-system case there is usually no ER model on top of heterogeneous storage systems, and database design documents that span business systems are rare. Therefore, the association weight hierarchy across business systems is typically, from high to low: manual entry > business SQL analysis > table metadata extraction analysis > data content extraction analysis.
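The two weight hierarchies described above (within a single business system and across business systems) can be expressed as ordered lists, with the highest-ranked means that produced a result taken as the reference. This is a minimal sketch; the identifier names and sample findings are assumptions introduced for illustration.

```python
# Weight levels from high to low, as described for each scenario.
SINGLE_SYSTEM_LEVELS = [
    "manual_entry", "er_model", "business_sql",
    "table_metadata", "data_content", "design_document",
]
CROSS_SYSTEM_LEVELS = [
    "manual_entry", "business_sql", "table_metadata", "data_content",
]


def pick_reference(findings, levels):
    """Return (means, association) from the highest-ranked means with a result.

    findings maps a means name to the association it found, or None.
    """
    for means in levels:
        association = findings.get(means)
        if association is not None:
            return means, association
    return None


# Hypothetical results from two technical means for the same field.
findings = {
    "er_model": ("orders.customer_id", "customer.id"),
    "data_content": ("orders.customer_id", "customer.code"),
}
single = pick_reference(findings, SINGLE_SYSTEM_LEVELS)
cross = pick_reference(findings, CROSS_SYSTEM_LEVELS)
print(single, cross)
```

Because the cross-system list omits the ER model, the same findings resolve to different references in the two scenarios.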
S130, correcting the association relation to generate an entity relation diagram, and completing intelligent identification of the association relation of the multi-source heterogeneous data storage system.
In some embodiments, taking field association within a single business system as an example, the association confidence can be calculated by the following formula:
Figure BDA0004088412720000061
wherein α_i represents the weight of a machine learning rule, and i represents the index of the technical means;
Figure BDA0004088412720000062
indicates whether the rule is satisfied, taking the value 1 when the rule is satisfied and 0 when it is not;
C represents the weight of manual input rules and ER model analysis, and its value is 100%;
R_i represents the calculation result of each rule and algorithm;
I_person indicates whether a defined manual rule or the ER model analysis is satisfied, taking the value 1 when satisfied and 0 when not.
The following is illustrative:
The association confidence of field 1 in table T1 and field 2 in table T2 is denoted a.
Whether a manual entry exists is denoted Iperson1, and the manual entry weight is R1; Iperson1 takes only the values 0 and 1, and R1 is 100%.
ER model relation extraction likewise has two variables: whether an ER model relation exists, Iperson2, which takes only the values 0 and 1, and the ER model weight R2, which is 100%.
By analogy, business SQL analysis has an existence variable a3 and a weight R3, table metadata extraction analysis has an existence variable a4 and a weight R4, data content extraction analysis has an existence variable a5 and a weight R5, and database document analysis has an existence variable a6 and a weight R6; each of a3 to a6 takes the value 0 or 1.
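The formula itself survives here only as an image reference, so its exact form cannot be recovered from the text. The sketch below shows one plausible reading of the variable definitions and should be treated as an assumption rather than the patent's formula: when a manual rule or ER model relation applies (I_person = 1), the confidence is C = 100%; otherwise it is the sum over the automated means of weight × indicator × result, capped at 1. The function name and the sample numbers are hypothetical.

```python
def association_confidence(i_person, rules, c=1.0):
    """Assumed combination rule (not confirmed by the source).

    i_person: True if a manual rule or ER model relation applies.
    rules: list of (alpha_i, satisfied_i, r_i) for the automated means.
    c: weight of manual entry / ER model analysis (100% in the text).
    """
    if i_person:
        return c
    score = sum(alpha * (1 if satisfied else 0) * result
                for alpha, satisfied, result in rules)
    return min(score, 1.0)  # keep the confidence within [0, 1]


# Hypothetical automated results: (weight, rule satisfied?, result).
automated_rules = [
    (0.5, True, 0.9),   # business SQL analysis
    (0.3, False, 0.8),  # table metadata extraction analysis
    (0.2, True, 0.6),   # data content extraction analysis
]
confidence_without_manual = association_confidence(False, automated_rules)
confidence_with_manual = association_confidence(True, automated_rules)
print(confidence_without_manual, confidence_with_manual)
```

Under this reading, a manual entry or ER model relation always overrides the automated means, which matches the precedence described for the weight levels.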
In practical applications, the calculation results of the various calculation engines corroborate and complement one another. For example, the ER model can verify the result of the data content extraction analysis, and business SQL analysis can supply field relations that are not defined in the ER model. Manual entry and the ER model correct the results of the automated calculation means, and after the results are analyzed iteratively by a machine learning method, such as a naive Bayes algorithm, the weight variables Ri are adjusted automatically. This completes the correction of the association relation.
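The text does not describe how the weights are actually updated, so the following sketch is only a simplified stand-in for the machine learning step: it re-estimates each automated means' weight as the Laplace-smoothed fraction of its past results that were confirmed by manual entry or the ER model. This is a frequency estimate in the spirit of naive Bayes probability estimation, not an implementation of the patent's method; the function name and the history data are hypothetical.

```python
def updated_weights(history, smoothing=1.0):
    """Estimate a reliability weight per means from confirmation history.

    history maps a means name to a list of booleans, True when that means'
    result was confirmed by manual entry or the ER model.
    """
    weights = {}
    for means, outcomes in history.items():
        confirmed = sum(outcomes)
        # Laplace smoothing avoids weights of exactly 0 or 1 on small samples.
        weights[means] = (confirmed + smoothing) / (len(outcomes) + 2 * smoothing)
    return weights


# Hypothetical confirmation history after several analysis rounds.
history = {
    "business_sql": [True, True, True, False],
    "data_content": [True, False, False, False],
}
weights = updated_weights(history)
print(weights)
```

Re-running this estimate after each round of corrections gives the iterative adjustment the paragraph describes, under the stated simplification.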
Further, based on the corrected association relationship, an entity relationship diagram for describing the table/field association relationship is generated, and intelligent identification of the table and/or field association relationship in the multi-source heterogeneous data storage system is completed.
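As a small illustration of turning corrected associations into an entity relation diagram, the sketch below emits Graphviz DOT text, which can then be rendered by any DOT-compatible tool. The output format is an assumption; the patent does not say how the diagram is represented. The function name and sample association are hypothetical.

```python
def to_dot(associations):
    """Render associations as an undirected Graphviz DOT graph.

    associations: list of ((table, field), (table, field), confidence).
    """
    lines = ["graph entity_relations {"]
    for (left_table, left_field), (right_table, right_field), confidence in associations:
        lines.append(
            f'  "{left_table}" -- "{right_table}" '
            f'[label="{left_field} = {right_field} ({confidence:.2f})"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical corrected association with its confidence.
associations = [(("orders", "customer_id"), ("customer", "id"), 0.95)]
dot_text = to_dot(associations)
print(dot_text)
```

Each edge label carries the joined fields and the confidence, so low-confidence links remain visible for manual review.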
According to the embodiment of the disclosure, the following technical effects are achieved:
With this method, complex association relations among data tables can be extracted automatically and intelligently from existing data tables. Compared with the traditional manual approach based on experience and documents, data relations can be understood far more efficiently, the global data management and special-topic data management demands of flexible and changing business requirements can be met, data preparation can be carried out quickly, and a standard data system can be organized on the basis of the identified data association relations.
At the same time, problems of traditional data preparation work, such as the difficulty of coordinating multiple parties and the heavy manual workload, are addressed. Through a combination of automation, intelligent analysis and manual assistance, manual involvement is reduced, working efficiency is improved, the difficulty of building business systems is lowered, and the value of the data is increased.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required in the present application.
The foregoing is a description of embodiments of the method, and the following further describes embodiments of the device.
FIG. 5 illustrates a block diagram of an identification apparatus 500 for entity associations for a multi-source heterogeneous data storage system according to an embodiment of the present application, as shown in FIG. 5, the apparatus 500 comprising:
an obtaining module 510, configured to obtain entity association models of the data tables respectively;
a determining module 520, configured to determine association relationships between tables/fields in the entity association model according to the set weight level;
and the identification module 530 is used for correcting the association relationship to generate an entity relationship graph, and completing intelligent identification of the association relationship of the multi-source heterogeneous data storage system.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the described modules may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
Fig. 6 shows a schematic diagram of a structure of a terminal device or server suitable for implementing an embodiment of the present application.
As shown in fig. 6, the terminal device or the server 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. Removable media 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on drive 610 so that a computer program read therefrom is installed as needed into storage section 608.
In particular, the above method flow steps may be implemented as a computer software program according to embodiments of the present application. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a machine-readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software, or may be implemented by hardware. The described units or modules may also be provided in a processor. Wherein the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present application also provides a computer-readable storage medium that may be included in the electronic device described in the above embodiments; or may be present alone without being incorporated into the electronic device. The computer-readable storage medium stores one or more programs that when executed by one or more processors perform the methods described herein.
The foregoing describes only the preferred embodiments of the present application and the technical principles employed. Persons skilled in the art will appreciate that the scope of the application is not limited to technical solutions formed by the specific combinations of features described above; it also covers other technical solutions formed by any combination of those features or their equivalents without departing from the spirit of the application. One example is a technical solution in which the above features are interchanged with technical features of similar function disclosed in (but not limited to) this application.

Claims (10)

1. A method for identifying entity association relations, applied to a multi-source heterogeneous data storage system, characterized by comprising the following steps:
acquiring an entity association model for each data table;
determining the association relations of the tables/fields in the entity association models according to set weight levels;
and correcting the association relations to generate an entity relationship diagram, thereby completing intelligent identification of the association relations of the multi-source heterogeneous data storage system.
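As an illustration of the weight-level step in claim 1, the following Python sketch keeps, for each pair of fields, the candidate relation reported by the highest-ranked source. The `WEIGHT_LEVEL` values, the `Association` type, and the `resolve_by_weight` function are hypothetical names introduced for this example; the claims do not state concrete levels, so the ordering shown is only an assumption.

```python
from dataclasses import dataclass

# Hypothetical weight levels (higher means more trusted).
# The patent text does not give concrete values; these are illustrative only.
WEIGHT_LEVEL = {
    "manual_input": 6,
    "er_model": 5,
    "design_document": 4,
    "sql_audit": 3,
    "table_metadata": 2,
    "data_content": 1,
}


@dataclass(frozen=True)
class Association:
    left: str    # "table.field" on one side of the relation
    right: str   # "table.field" on the other side
    source: str  # key into WEIGHT_LEVEL


def resolve_by_weight(candidates):
    """For each unordered field pair, keep the candidate from the highest-weight source."""
    best = {}
    for cand in candidates:
        key = frozenset((cand.left, cand.right))  # direction-insensitive key
        current = best.get(key)
        if current is None or WEIGHT_LEVEL[cand.source] > WEIGHT_LEVEL[current.source]:
            best[key] = cand
    return list(best.values())
```

Using an unordered pair as the key means that `orders.user_id = users.id` and `users.id = orders.user_id` are treated as the same relation, whichever source reported them.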
2. The method of claim 1, wherein the obtaining the entity-association model for each data table comprises:
based on the characteristics and the storage mode of the data, the entity association model is obtained according to the data source types of different association relation models.
3. The method according to claim 2, wherein the obtaining the entity association model according to the data source types of the different association models based on the characteristics and the storage manner of the data includes:
if the data source type is a relational database, acquiring an ER model relation through a metadata interface of the database to form an entity association model;
if the data source type is a database design document, identifying the design document, extracting association relations in the document, and forming an entity association model;
if the data source type is business SQL audit, analyzing the SQL statement and extracting the field association relation in the WHERE clause to form an entity association model;
if the data source type is manual input, directly acquiring the association relationship between the table and the field to form an entity association model;
if the data source type is table metadata, extracting the association relation of tables/fields in the table metadata through annotations, field names, field annotations and/or field types to form an entity association model;
if the data source type is the data content, text analysis is carried out on the data content, and the field association relation is extracted to form an entity association model.
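The SQL audit branch listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it uses regular expressions rather than a full SQL parser, handles only a simple comma-separated `FROM` list followed by `WHERE`, and the function name `extract_where_associations` is hypothetical.

```python
import re

# Equality predicate between two qualified columns, e.g. "o.user_id = u.id".
JOIN_PREDICATE = re.compile(r"\b(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)\b")
# One entry of a comma-separated FROM list: "table", "table alias" or "table AS alias".
FROM_ITEM = re.compile(r"(\w+)(?:\s+(?:AS\s+)?(\w+))?", re.IGNORECASE)


def extract_where_associations(sql):
    """Return (table.field, table.field) pairs found in the WHERE clause of a simple SELECT."""
    match = re.search(r"\bFROM\b(.*?)\bWHERE\b(.*)", sql, re.IGNORECASE | re.DOTALL)
    if not match:
        return []
    from_part, where_part = match.groups()

    # Map each alias (or bare table name) back to its table name.
    aliases = {}
    for item in from_part.split(","):
        m = FROM_ITEM.match(item.strip())
        if m:
            table = m.group(1)
            aliases[m.group(2) or table] = table

    pairs = []
    for lt, lf, rt, rf in JOIN_PREDICATE.findall(where_part):
        pairs.append((f"{aliases.get(lt, lt)}.{lf}", f"{aliases.get(rt, rt)}.{rf}"))
    return pairs
```

Predicates that compare a column with a quoted literal are skipped because the right-hand side is not a qualified column. Explicit `JOIN ... ON` syntax, subqueries, and decimal literals such as `1.5` are not handled by this sketch and would need a real SQL parser.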
4. The method of claim 3, wherein modifying the association relationship comprises:
correcting the association relation by the following formula:
[Formula image: Figure FDA0004088412710000021]
wherein α_i represents the weight of the i-th machine learning rule;
[Symbol image: Figure FDA0004088412710000022]
indicates whether the i-th rule is satisfied, taking the value 1 when satisfied and 0 when not satisfied;
C represents the weight of manually entered rules and ER model analysis, with a value of 100%;
R_i represents the calculation result of each rule and algorithm;
I_person indicates whether a defined manual rule or the ER model analysis is satisfied, taking the value 1 when satisfied and 0 when not satisfied.
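The correction formula survives in this text only as an image reference, so its exact form cannot be reproduced here. The Python sketch below shows one possible way the quantities defined above could be combined. It is an assumption made for illustration, not the claimed formula: a satisfied manual rule or ER model analysis returns the weight C directly, and otherwise the weighted results of the satisfied machine learning rules are summed and capped at 1. The function name `corrected_score` is hypothetical.

```python
def corrected_score(rule_results, manual_or_er_satisfied, c=1.0):
    """Combine rule outputs into one confidence value (ASSUMED form, not the patent's formula).

    rule_results: iterable of (alpha_i, satisfied_i, r_i) tuples, one per machine learning rule.
    manual_or_er_satisfied: I_person, True when a manual rule or ER model analysis holds.
    c: weight C of manual rules and ER model analysis (100% according to the claim).
    """
    if manual_or_er_satisfied:
        # Assumption: manual input or ER model analysis overrides the learned rules.
        return c
    # Assumption: weighted sum over satisfied rules, capped at 1.0.
    score = sum(alpha * (1 if satisfied else 0) * r for alpha, satisfied, r in rule_results)
    return min(score, 1.0)
```

Under this assumed form, an unsatisfied rule contributes nothing regardless of its weight, which matches the 0/1 indicator described in the claim.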
5. An identification device applied to entity association relation of a multi-source heterogeneous data storage system, which is characterized by comprising:
the acquisition module is used for respectively acquiring entity association models of the data tables;
the determining module is used for determining the association relation of the table/field in the entity association model according to the set weight level;
and the identification module is used for correcting the association relation to generate an entity relation diagram and completing intelligent identification of the association relation of the multi-source heterogeneous data storage system.
6. The apparatus of claim 5, wherein the obtaining the entity-association model for each data table comprises:
based on the characteristics and the storage mode of the data, the entity association model is obtained according to the data source types of different association relation models.
7. The apparatus of claim 6, wherein the obtaining the entity-association model according to the data source types of the different association models based on the characteristics and the storage manner of the data comprises:
if the data source type is a relational database, acquiring an ER model relation through a metadata interface of the database to form an entity association model;
if the data source type is a database design document, identifying the design document, extracting association relations in the document, and forming an entity association model;
if the data source type is business SQL audit, analyzing the SQL statement, extracting the field association relation in the WHERE clause to form an entity association model;
if the data source type is manual input, directly acquiring the association relation between the table and the field to form an entity association model;
if the data source type is table metadata, extracting the association relation of tables/fields in the table metadata through annotations, field names, field annotations and/or field types to form an entity association model;
If the data source type is the data content, text analysis is carried out on the data content, and the field association relation is extracted to form an entity association model.
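The relational database branch listed above reads ER relations through the database's metadata interface. The patent does not name a specific database, so the following Python sketch uses SQLite as a stand-in and reads declared foreign keys through its `pragma_foreign_key_list` table-valued function. The function name `foreign_key_associations` is hypothetical.

```python
import sqlite3


def foreign_key_associations(conn):
    """Read declared foreign keys from SQLite metadata and return (child, parent) field pairs."""
    # Materialize the table list first so the inner queries do not disturb the iteration.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )]
    pairs = []
    for table in tables:
        # Column names "table", "from" and "to" are SQL keywords, so they are quoted.
        rows = conn.execute(
            'SELECT "table", "from", "to" FROM pragma_foreign_key_list(?)', (table,)
        )
        for ref_table, from_col, to_col in rows:
            pairs.append((f"{table}.{from_col}", f"{ref_table}.{to_col}"))
    return pairs
```

Other database systems expose similar information through their own catalogs (for example, `information_schema` views), so only the metadata query would change.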
8. The apparatus of claim 7, wherein the modifying the association relationship comprises:
correcting the association relation by the following formula:
[Formula image: Figure FDA0004088412710000031]
wherein α_i represents the weight of the i-th machine learning rule;
[Symbol image: Figure FDA0004088412710000032]
indicates whether the i-th rule is satisfied, taking the value 1 when satisfied and 0 when not satisfied;
C represents the weight of manually entered rules and ER model analysis, with a value of 100%;
R_i represents the calculation result of each rule and algorithm;
I_person indicates whether a defined manual rule or the ER model analysis is satisfied, taking the value 1 when satisfied and 0 when not satisfied.
9. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, characterized in that the processor, when executing the computer program, implements the method according to any of claims 1-4.
10. A computer readable storage device, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any of claims 1-4.
CN202310143615.9A 2023-02-10 2023-02-10 Identification method of entity association relation applied to multi-source heterogeneous data storage system Active CN116244386B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310143615.9A CN116244386B (en) 2023-02-10 2023-02-10 Identification method of entity association relation applied to multi-source heterogeneous data storage system

Publications (2)

Publication Number Publication Date
CN116244386A true CN116244386A (en) 2023-06-09
CN116244386B CN116244386B (en) 2023-12-12

Family

ID=86630858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310143615.9A Active CN116244386B (en) 2023-02-10 2023-02-10 Identification method of entity association relation applied to multi-source heterogeneous data storage system

Country Status (1)

Country Link
CN (1) CN116244386B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150371159A1 (en) * 2014-06-18 2015-12-24 International Business Machines Corporation Generating Business Rule Model
CN107391537A (en) * 2017-04-25 2017-11-24 阿里巴巴集团控股有限公司 Generation method, device and the equipment of data relationship model
CN113326345A (en) * 2020-02-28 2021-08-31 拓尔思天行网安信息技术有限责任公司 Knowledge graph analysis and application method, platform and equipment based on dynamic ontology
CN114201616A (en) * 2021-12-28 2022-03-18 山东合天智汇信息技术有限公司 Knowledge graph construction method and system based on multi-source database
CN114443854A (en) * 2021-12-30 2022-05-06 深圳晶泰科技有限公司 Processing method and device of multi-source heterogeneous data, computer equipment and storage medium
CN114722159A (en) * 2022-06-01 2022-07-08 中科航迈数控软件(深圳)有限公司 Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources
CN114756532A (en) * 2022-03-15 2022-07-15 上海创图网络科技股份有限公司 Multi-source heterogeneous data acquisition method and device based on cultural Tianmao and electronic equipment
WO2022257436A1 (en) * 2021-06-08 2022-12-15 网络通信与安全紫金山实验室 Data warehouse construction method and system based on wireless communication network, and device and medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649454A (en) * 2024-01-29 2024-03-05 北京友友天宇系统技术有限公司 Binocular camera external parameter automatic correction method and device, electronic equipment and storage medium
CN117649454B (en) * 2024-01-29 2024-05-31 北京友友天宇系统技术有限公司 Binocular camera external parameter automatic correction method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN116244386B (en) 2023-12-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant