CN112632015A - Data format conversion method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112632015A
Authority
CN
China
Prior art keywords
data
source data
entity
mapping
attribute
Prior art date
Legal status
Pending
Application number
CN202011509604.0A
Other languages
Chinese (zh)
Inventor
介飞
黄艳香
吴信东
Current Assignee
Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee
Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority to CN202011509604.0A
Publication of CN112632015A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G06F16/1794 Details of file format conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data format conversion method and device, a storage medium and electronic equipment, and belongs to the field of artificial intelligence. The method includes: acquiring source data and analyzing its data type; converting the source data into an intermediate format according to the data type, where the intermediate format is an entity attribute graph format described in Json; and importing the source data in the intermediate format into a graph database. The invention addresses the inability of related-art tools to convert heterogeneous data into graph-structured data, improves data portability and generality, allows the same data content held in different formats to be converted into common data that supports applications, and improves data storage efficiency.

Description

Data format conversion method and device, storage medium and electronic equipment
Technical Field
The invention relates to the field of artificial intelligence, in particular to a data format conversion method and device, a storage medium and electronic equipment.
Background
In the related art, with the rise of knowledge services built on graph-structured data, such as knowledge graphs, more and more applications are built on entity-relationship graphs. Traditional relational databases support such applications poorly: relationship reasoning between entities, for example, requires join operations, and the underlying data model of a relational database makes these operations inefficient. A data storage system built around graph-structured data is therefore a better solution.
In the related art, multi-source heterogeneous data is usually stored in different data sources and represented in different data formats, so the data descriptions (metadata such as column or attribute names and data types) and the actual attribute values of different data differ to some extent, and a unified conversion is required when data is exchanged. One related-art solution is DataX, proposed by Alibaba, which uses a star architecture: all heterogeneous data is converted into an abstract, unified data-description class and then from that class into a format conforming to the target data source. However, this common type is designed for row-and-column data such as traditional relational data and cannot be converted into a graph data structure.
In view of the above problems in the related art, no effective solution has been found at present.
Disclosure of Invention
The embodiment of the invention provides a data format conversion method and device, a storage medium and electronic equipment.
According to one aspect of the embodiments of the present application, a data format conversion method is provided, including: acquiring source data and analyzing its data type; converting the source data into an intermediate format according to the data type, where the intermediate format is an entity attribute graph format described in Json; and importing the source data in the intermediate format into a graph database.
Further, converting the source data into the intermediate format according to the data type includes: if the source data is of a first data type, from a relational database, mapping each relational table of the source data to one class of entities in the intermediate format and setting the table identifier as the identifier of that entity class; for each relational table, mapping each column to an attribute of the corresponding entities, mapping each row to one entity, and mapping the value in each column of that row to the value of the corresponding entity attribute; and mapping foreign key relationships between relational tables to edges between entities of different classes in the intermediate format.
Further, converting the source data into the intermediate format according to the data type includes: if the source data is of a second data type, from an HBase database, mapping each table of the source data to one class of entities in the intermediate format and setting the table identifier as the identifier of that entity class; mapping each cell of the source data to an attribute whose identifier is composed of the following fields: column family, qualifier and version number; and mapping each row of the source data to an entity, using the row key as the unique identifier of that entity.
Further, converting the source data into the intermediate format according to the data type includes: if the source data is of a third data type, from a comma-separated values (CSV) file, mapping each file of the source data to one entity class in the intermediate format and setting the file identifier as the identifier of that entity class; mapping each row of the source data to an entity; and mapping each field delimited by the separator to an attribute of that entity.
Further, converting the source data into the intermediate format according to the data type includes: if the source data is of a fourth data type, in Excel format, mapping each table (worksheet) of the source data to one entity class in the intermediate format and setting the table identifier as the identifier of that entity class; mapping each row of the source data to one entity; and mapping each cell of the source data to an attribute of that entity, using the identifier of the cell's column as the attribute identifier.
Further, converting the source data into the intermediate format according to the data type includes: if the source data is of the Json data type, mapping each object of the source data to an entity in the intermediate format; and mapping each key-value pair of an object to an attribute name and value of the corresponding entity and, when the value of a key is itself an object, establishing an edge between the entities corresponding to the two objects.
Further, converting the source data into the intermediate format according to the data type includes: if the source data is of the Extensible Markup Language (XML) data type, mapping each tag of the source data to an entity in the intermediate format, with tags of the same name mapped to entities of the same class; and mapping the attribute identifiers and attribute values of each tag to the attribute identifiers and attribute values of the corresponding entity and, when two tags are nested, constructing an edge between the entities corresponding to the two tags.
Further, after the source data in the intermediate format is imported into the graph database, the method further includes: receiving an export instruction for target data, where the export instruction indicates that data content of a target data type is to be exported from the graph database; and inversely converting the data content from the intermediate format into the target data type.
According to another aspect of the embodiments of the present application, a data format conversion apparatus is further provided, including: an acquisition module, configured to acquire source data and analyze its data type; a conversion module, configured to convert the source data into an intermediate format according to the data type, where the intermediate format is an entity attribute graph format described in Json; and an import module, configured to import the source data in the intermediate format into a graph database.
Further, the conversion module includes: a first mapping unit, configured to, if the source data is of a first data type from a relational database, map each relational table of the source data to one class of entities in the intermediate format, with the table identifier set as the identifier of that entity class; a second mapping unit, configured to, for each relational table, map each column to an attribute of the corresponding entities, map each row to one entity, and map the value in each column of that row to the value of the corresponding entity attribute; and a third mapping unit, configured to map foreign key relationships between relational tables to edges between entities of different classes in the intermediate format.
Further, the conversion module includes: a fourth mapping unit, configured to, if the source data is of a second data type from an HBase database, map each table of the source data to one class of entities in the intermediate format, with the table identifier set as the identifier of that entity class; a fifth mapping unit, configured to map each cell of the source data to an attribute whose identifier is composed of the following fields: column family, qualifier and version number; and a sixth mapping unit, configured to map each row of the source data to an entity, using the row key as the unique identifier of that entity.
Further, the conversion module includes: a seventh mapping unit, configured to, if the source data is of a third data type from a comma-separated values (CSV) file, map each file of the source data to one entity class in the intermediate format, with the file identifier set as the identifier of that entity class; an eighth mapping unit, configured to map each row of the source data to one entity; and a ninth mapping unit, configured to map each field delimited by the separator to an attribute of that entity.
Further, the conversion module includes: a tenth mapping unit, configured to, if the source data is of a fourth data type in Excel format, map each table (worksheet) of the source data to one entity class in the intermediate format, with the table identifier set as the identifier of that entity class; an eleventh mapping unit, configured to map each row of the source data to one entity; and a twelfth mapping unit, configured to map each cell of the source data to an attribute of that entity, using the identifier of the cell's column as the attribute identifier.
Further, the conversion module includes: a thirteenth mapping unit, configured to map each object of the source data to an entity in the intermediate format if the source data is of the Json data type; and a fourteenth mapping unit, configured to map each key-value pair of an object to an attribute name and value of the corresponding entity and, when the value of a key is itself an object, establish an edge between the entities corresponding to the two objects.
Further, the conversion module includes: a fifteenth mapping unit, configured to, if the source data is of the XML data type, map each tag of the source data to an entity in the intermediate format, with tags of the same name mapped to entities of the same class; and a sixteenth mapping unit, configured to map the attribute identifiers and attribute values of each tag to the attribute identifiers and attribute values of the corresponding entity and, when two tags are nested, construct an edge between the entities corresponding to the two tags.
Further, the apparatus further includes: a receiving module, configured to receive an export instruction for target data after the import module imports the source data in the intermediate format into the graph database, where the export instruction indicates that data content of a target data type is to be exported from the graph database; and an inverse conversion module, configured to inversely convert the data content from the intermediate format into the target data type.
According to another aspect of the embodiments of the present application, a storage medium is further provided, including a stored program, where the program, when run, executes the steps above.
According to another aspect of the embodiments of the present application, an electronic device is further provided, including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with one another through the communication bus; the memory is configured to store a computer program; and the processor is configured to execute the steps of the method by running the program stored in the memory.
Embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the steps of the above method.
In the method and device of the present application, source data is acquired and its data type analyzed; the source data is then converted, according to that data type, into the entity attribute graph format described in Json, and the data in this intermediate format is imported into a graph database. This provides a scheme for converting multi-source heterogeneous data into an entity attribute graph and storing it, which addresses the limitation of the related art, improves data portability and generality, allows the same data content held in different formats to be converted into common data that supports applications, and improves data storage efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a server according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data format conversion method according to an embodiment of the present invention;
FIG. 3 is a flow chart of one implementation of an embodiment of the present invention;
FIG. 4 is a block diagram of a data format conversion apparatus according to an embodiment of the present invention;
FIG. 5 is a block diagram of an electronic device implementing an embodiment of the invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
The method provided by Embodiment 1 of the present application may be executed in a server, a computer or a similar computing device. Taking execution on a server as an example, FIG. 1 is a block diagram of the hardware structure of a server according to an embodiment of the present invention. As shown in FIG. 1, the server 10 may include one or more processors 102 (only one is shown in FIG. 1; the processors 102 may include, but are not limited to, processing devices such as a microprocessor (MCU) or a programmable logic device such as an FPGA) and a memory 104 for storing data, and may optionally further include a transmission device 106 for communication and an input/output device 108. Those skilled in the art will understand that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the server. For example, the server 10 may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 may be used to store a server program, for example, a software program and a module of application software, such as a server program corresponding to a data format conversion method in an embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the server program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, which may be connected to server 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by the communication provider of the server 10. In one example, the transmission device 106 includes a network interface controller (NIC), which can connect to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module used to communicate with the Internet wirelessly.
In this embodiment, a data format conversion method is provided. FIG. 2 is a flowchart of a data format conversion method according to an embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:
Step S202, acquiring source data and analyzing the data type of the source data;
In this embodiment, when source data is imported, a reading module may be built for each type of data source to connect to and read the original data source. The data type of the source data may be determined from its storage format, the type of database that stores it and similar information, or from a data type entered by the user.
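As a rough illustration of how the data type might be determined, the Python sketch below prefers a user-supplied type and otherwise inspects a connection URL scheme or a file extension. The function name `detect_data_type`, the type labels and the URL convention are assumptions made for illustration; the patent does not specify them.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file extension to data type label.
EXTENSION_TYPES = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".xls": "excel",
    ".json": "json",
    ".xml": "xml",
}
RELATIONAL_SCHEMES = {"mysql", "oracle", "mssql", "hive"}


def detect_data_type(source: str, user_type: Optional[str] = None) -> str:
    """Return the data type of a source, preferring an explicit user choice."""
    if user_type:
        return user_type.lower()
    # Assumption: database sources are given as URLs such as "mysql://host/db",
    # whose scheme names the database type.
    if "://" in source:
        scheme = source.split("://", 1)[0].lower()
        if scheme in RELATIONAL_SCHEMES:
            return "relational"
        if scheme == "hbase":
            return "hbase"
    # Otherwise fall back to the file's storage format (its extension).
    suffix = Path(source).suffix.lower()
    if suffix in EXTENSION_TYPES:
        return EXTENSION_TYPES[suffix]
    raise ValueError(f"cannot determine data type of {source!r}")
```

The returned label would then select which reading module and which mapping rules are applied.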
Step S204, converting the source data into an intermediate format according to the data type, wherein the intermediate format is an entity attribute graph format described by Json;
The entity attribute graph in this embodiment contains three elements: nodes (Node), attributes (Attribute) and edges (Edge). A node describes a stored entity, that is, a storage object such as "Zhang San" or "China". An edge describes a relationship between two nodes, such as a logical relationship (parent-child, membership, etc.) or a physical relationship. Attributes include node attributes and edge attributes.
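One way to hold the three elements in memory before serializing them to Json is sketched below. The class and field names (`Node`, `Edge`, `EntityAttributeGraph`, `attributes`, `label`) are hypothetical; the patent's actual Json layout appears only in a figure that is not reproduced here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    id: str
    type: str  # entity class, e.g. the source table name
    attributes: Dict[str, Any] = field(default_factory=dict)  # node attributes


@dataclass
class Edge:
    source: str  # id of the start node
    target: str  # id of the end node
    label: str   # relationship name, e.g. "parent_of"
    attributes: Dict[str, Any] = field(default_factory=dict)  # edge attributes


@dataclass
class EntityAttributeGraph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into the nested dataclasses and lists.
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


# Usage: "Zhang San" and "China" as nodes joined by one edge.
graph = EntityAttributeGraph(
    nodes=[Node("p1", "Person", {"name": "Zhang San"}),
           Node("c1", "Country", {"name": "China"})],
    edges=[Edge("p1", "c1", "citizen_of")],
)
serialized = graph.to_json()
```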
Optionally, before or after the source data is converted into the intermediate format according to the data type, tag information of the source data, such as its data format, read data and data source, may also be set as attributes of the corresponding entities in the intermediate format, so that the origin of the converted data can be traced later.
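A minimal sketch of the optional provenance tagging, assuming entities are plain dictionaries with an `attributes` map. The reserved attribute names `_source_format` and `_source` are hypothetical; the patent only says that tag information such as the data format and data source may be stored as entity attributes.

```python
from typing import Any, Dict, List


def tag_provenance(nodes: List[Dict[str, Any]], data_format: str,
                   data_source: str) -> List[Dict[str, Any]]:
    """Record where each entity came from as extra entity attributes."""
    for node in nodes:
        attributes = node.setdefault("attributes", {})
        # Hypothetical reserved names; the "_" prefix keeps them apart
        # from attribute names that come from the source data itself.
        attributes["_source_format"] = data_format
        attributes["_source"] = data_source
    return nodes


tagged = tag_provenance(
    [{"id": "people:1", "type": "people", "attributes": {"name": "Zhang San"}},
     {"id": "people:2", "type": "people"}],
    data_format="csv", data_source="people.csv")
```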
Step S206, importing source data in an intermediate format into a graph database;
Optionally, after the source data in the intermediate format is imported into the graph database, it can be used to support upper-layer applications, or it can be converted into another required data format.
Through the above steps, source data is acquired and its data type analyzed, the source data is converted according to that type into the entity attribute graph format described in Json, and the data in this intermediate format is imported into a graph database. This realizes the conversion of multi-source heterogeneous data into an entity attribute graph, addresses the limitation of the related art, improves data portability and generality, allows the same data content held in different formats to be converted into common data that supports applications, and improves data storage efficiency.
The data types in this embodiment include, but are not limited to, structured and semi-structured data. Structured data includes data from relational databases (e.g., MySQL, Oracle), non-relational databases (e.g., HBase, Redis), CSV files, Excel files and so on; semi-structured data includes Json, XML and similar types. The following examples show how source data of each type is converted:
In one example of this embodiment, the data type is the storage type of a relational database, such as MySQL, Oracle, MS SQL Server or Hive, and converting the source data into the intermediate format according to the data type includes: if the source data is of a first data type from a relational database, mapping each relational table of the source data to one class of entities in the intermediate format and setting the table identifier as the identifier of that entity class; for each relational table, mapping each column to an attribute of the corresponding entities, mapping each row to one entity, and mapping the value in each column of that row to the value of the corresponding entity attribute; and mapping foreign key relationships between relational tables to edges between entities of different classes in the intermediate format.
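The relational mapping can be tried out with Python's built-in `sqlite3` module standing in for MySQL or Oracle. In this sketch each table becomes an entity class, each row an entity whose id is the table name plus SQLite's `rowid`, and each declared foreign key becomes a set of edges found by joining the two tables. The dictionary layout and the id scheme are assumptions, and only foreign keys that name their referenced column are handled.

```python
import sqlite3
from typing import Any, Dict, List


def relational_to_graph(conn: sqlite3.Connection) -> Dict[str, List[Dict[str, Any]]]:
    """Map tables to entity classes, rows to entities and foreign keys to edges."""
    nodes: List[Dict[str, Any]] = []
    edges: List[Dict[str, Any]] = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # Each row becomes one entity; each column becomes one attribute.
        cursor = conn.execute(f'SELECT rowid, * FROM "{table}"')
        columns = [desc[0] for desc in cursor.description][1:]
        for rowid, *values in cursor:
            nodes.append({"id": f"{table}:{rowid}", "type": table,
                          "attributes": dict(zip(columns, values))})
        # Each declared foreign key becomes edges between matching rows.
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall():
            ref_table, src_col, dst_col = fk[2], fk[3], fk[4]
            if dst_col is None:
                continue  # implicit primary-key references are not handled here
            pairs = conn.execute(
                f'SELECT c.rowid, p.rowid FROM "{table}" AS c '
                f'JOIN "{ref_table}" AS p ON c."{src_col}" = p."{dst_col}"')
            for child, parent in pairs:
                edges.append({"source": f"{table}:{child}",
                              "target": f"{ref_table}:{parent}",
                              "label": src_col})
    return {"nodes": nodes, "edges": edges}


# Usage: a department table referenced by an employee table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT,
                      dept_id INTEGER REFERENCES dept(id));
    INSERT INTO dept VALUES (1, 'R&D');
    INSERT INTO emp VALUES (10, 'Zhang San', 1);
""")
graph = relational_to_graph(conn)
```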
In one example of this embodiment, the data type is the storage type of the non-relational database HBase, and converting the source data into the intermediate format according to the data type includes: if the source data is of a second data type from an HBase database, mapping each table of the source data to one class of entities in the intermediate format and setting the table identifier as the identifier of that entity class; mapping each cell of the source data to an attribute whose identifier is composed of the following fields: column family, qualifier and version number; and mapping each row of the source data to an entity, using the row key as the unique identifier of that entity.
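The HBase mapping is sketched below without a real HBase client: rows are assumed to arrive as a dictionary from row key to a list of `(column_family, qualifier, version, value)` tuples. The attribute naming `family:qualifier:version` and the entity layout are illustrative assumptions.

```python
from typing import Any, Dict, List, Tuple

# One HBase cell: (column family, qualifier, version/timestamp, value).
Cell = Tuple[str, str, int, Any]


def hbase_to_graph(table: str, rows: Dict[str, List[Cell]]) -> List[Dict[str, Any]]:
    """Map one HBase table to an entity class and each row to an entity."""
    nodes = []
    for row_key, cells in rows.items():
        attributes: Dict[str, Any] = {}
        for family, qualifier, version, value in cells:
            # Column family, qualifier and version jointly name the attribute.
            attributes[f"{family}:{qualifier}:{version}"] = value
        # The row key is the entity's unique identifier within its class.
        nodes.append({"id": f"{table}:{row_key}", "type": table,
                      "attributes": attributes})
    return nodes


# Usage: one row of a hypothetical "user" table with two cells.
users = hbase_to_graph("user", {
    "row-001": [("info", "name", 1, "Zhang San"), ("info", "city", 1, "Shanghai")],
})
```

Edges are not produced here because HBase does not store relationships between entities; they would have to be supplied separately.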
In one example of this embodiment, the data type is the storage type of the comma-separated values (CSV) file format, and converting the source data into the intermediate format according to the data type includes: if the source data is of a third data type from a CSV file, mapping each file of the source data to one entity class in the intermediate format and setting the file identifier as the identifier of that entity class; mapping each row of the source data to an entity; and mapping each field delimited by the separator to an attribute of that entity.
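A sketch of the CSV mapping using Python's `csv` module. It assumes the first line of the file is a header supplying the attribute names (the patent does not say how fields are named) and that the caller derives the entity class from the file name; entity ids built from the class name plus the row number are also an assumption.

```python
import csv
import io
from typing import Any, Dict, List, TextIO


def csv_to_graph(stream: TextIO, entity_class: str,
                 delimiter: str = ",") -> List[Dict[str, Any]]:
    """Map one CSV file to an entity class and each data row to an entity."""
    # DictReader treats the first line as the header of attribute names.
    reader = csv.DictReader(stream, delimiter=delimiter)
    return [
        {"id": f"{entity_class}:{row_no}", "type": entity_class,
         "attributes": dict(row)}
        for row_no, row in enumerate(reader, start=1)
    ]


# Usage: in a real import the class would come from the file name "people.csv".
sample = io.StringIO("name,city\nZhang San,Shanghai\nLi Si,Beijing\n")
people = csv_to_graph(sample, entity_class="people")
```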
In one example of this embodiment, the data type is the Excel format (Microsoft or WPS), and converting the source data into the intermediate format according to the data type includes: if the source data is of a fourth data type in Excel format, mapping each table (worksheet) of the source data to one entity class in the intermediate format and setting the table identifier as the identifier of that entity class; mapping each row of the source data to one entity; and mapping each cell of the source data to an attribute of that entity, using the identifier of the cell's column as the attribute identifier.
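The Excel mapping is sketched below over plain row tuples, so it runs without an Excel library; with openpyxl, such tuples could come from a worksheet's `iter_rows(values_only=True)`. The patent does not say whether the column identifier is the header text or the column letter, so the sketch supports both. The names `sheet_to_graph` and `column_letter`, and the choice to skip empty cells, are assumptions.

```python
from typing import Any, Dict, Iterable, List, Sequence


def column_letter(index: int) -> str:
    """Convert a 1-based column index to an Excel column name (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def sheet_to_graph(sheet_name: str, rows: Iterable[Sequence[Any]],
                   header: bool = True) -> List[Dict[str, Any]]:
    """Map one worksheet to an entity class and each row to an entity."""
    row_iter = iter(rows)
    # Attribute names come from the header row, or from column letters if none.
    names = [str(v) for v in next(row_iter, ())] if header else []
    nodes = []
    for row_no, row in enumerate(row_iter, start=1):
        attributes: Dict[str, Any] = {}
        for col, value in enumerate(row, start=1):
            if value is None:
                continue  # skip empty cells
            key = names[col - 1] if col <= len(names) else column_letter(col)
            attributes[key] = value
        nodes.append({"id": f"{sheet_name}:{row_no}", "type": sheet_name,
                      "attributes": attributes})
    return nodes


staff = sheet_to_graph("Staff", [("name", "age"), ("Zhang San", 30), ("Li Si", None)])
```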
In one example of this embodiment, the data type is the semi-structured Json data type, and converting the source data into the intermediate format according to the data type includes: if the source data is of the Json data type, mapping each object of the source data to an entity in the intermediate format; and mapping each key-value pair of an object to an attribute name and value of the corresponding entity and, when the value of a key is itself an object, establishing an edge between the entities corresponding to the two objects.
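A recursive sketch of the Json mapping using the standard `json` module. Each object becomes an entity, scalar key-value pairs become attributes, and a nested object becomes a separate entity linked by an edge labeled with its key. The generated ids (`obj:1`, `obj:2`, ...) are hypothetical, and arrays are stored as plain attribute values rather than expanded.

```python
import json
from itertools import count
from typing import Any, Dict, List


def json_to_graph(text: str) -> Dict[str, List[Dict[str, Any]]]:
    """Map Json objects to entities and object-valued keys to edges."""
    nodes: List[Dict[str, Any]] = []
    edges: List[Dict[str, Any]] = []
    ids = count(1)

    def visit(obj: Dict[str, Any]) -> str:
        node_id = f"obj:{next(ids)}"
        node: Dict[str, Any] = {"id": node_id, "attributes": {}}
        nodes.append(node)
        for key, value in obj.items():
            if isinstance(value, dict):
                # A nested object becomes its own entity, linked by an edge.
                child_id = visit(value)
                edges.append({"source": node_id, "target": child_id, "label": key})
            else:
                # Scalars (and, in this sketch, arrays) become attribute values.
                node["attributes"][key] = value
        return node_id

    # Assumption: the document is one object or an array of objects.
    root = json.loads(text)
    for obj in root if isinstance(root, list) else [root]:
        visit(obj)
    return {"nodes": nodes, "edges": edges}


person = json_to_graph('{"name": "Zhang San", "address": {"country": "China"}}')
```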
In one example of this embodiment, the data type is the semi-structured Extensible Markup Language (XML) data type, and converting the source data into the intermediate format according to the data type includes: if the source data is of the XML data type, mapping each tag of the source data to an entity in the intermediate format, with tags of the same name mapped to entities of the same class; and mapping the attribute identifiers and attribute values of each tag to the attribute identifiers and attribute values of the corresponding entity and, when two tags are nested, constructing an edge between the entities corresponding to the two tags.
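The XML mapping can be sketched with `xml.etree.ElementTree`. Each element becomes an entity whose class is its tag name, its XML attributes become entity attributes, and each parent-child nesting becomes an edge. The edge label `contains` and storing non-empty element text under a `text` attribute are assumptions not found in the patent.

```python
import xml.etree.ElementTree as ET
from itertools import count
from typing import Any, Dict, List, Optional


def xml_to_graph(text: str) -> Dict[str, List[Dict[str, Any]]]:
    """Map XML elements to entities and parent-child nesting to edges."""
    nodes: List[Dict[str, Any]] = []
    edges: List[Dict[str, Any]] = []
    ids = count(1)

    def visit(elem: ET.Element, parent_id: Optional[str]) -> None:
        node_id = f"{elem.tag}:{next(ids)}"
        # Elements with the same tag name share an entity class.
        attributes: Dict[str, Any] = dict(elem.attrib)
        if elem.text and elem.text.strip():
            attributes["text"] = elem.text.strip()  # assumption: keep element text
        nodes.append({"id": node_id, "type": elem.tag, "attributes": attributes})
        if parent_id is not None:
            # Nesting becomes an edge from the enclosing element.
            edges.append({"source": parent_id, "target": node_id, "label": "contains"})
        for child in elem:
            visit(child, node_id)

    visit(ET.fromstring(text), None)
    return {"nodes": nodes, "edges": edges}


doc = xml_to_graph('<people><person id="1"><name>Zhang San</name></person>'
                   '<person id="2"/></people>')
```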
In one example, the unified Json intermediate format obtained after reading and converting the data formats above is as follows:
[Figure: listing of the unified Json intermediate format (image not reproduced)]
"xxxx" represents a specific value of the corresponding item.
During format conversion, four basic data types are supported: integer, floating point, character (string) and Boolean; all other data types are converted into one of these four.
In one implementation of this embodiment, after the source data in the intermediate format is imported into the graph database, the method further includes: receiving an export instruction for target data, where the export instruction indicates that data content of a target data type is to be exported from the graph database; and inversely converting the data content from the intermediate format into the target data type.
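One possible normalization into the four basic types is sketched below. The patent does not specify which source types map to which basic type; the choices here (decimals to float, dates and bytes to strings, nulls to empty strings) are assumptions.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Any, Union

# The four basic types supported by the intermediate format.
Basic = Union[int, float, str, bool]


def to_basic(value: Any) -> Basic:
    """Normalize a source value into integer, float, string or Boolean."""
    if value is None:
        return ""  # assumption: nulls become empty strings
    if isinstance(value, bool):  # check before int, because bool subclasses int
        return value
    if isinstance(value, int):
        return value
    if isinstance(value, (float, Decimal)):
        return float(value)
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return str(value)  # anything else falls back to its string form
```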
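As an example of inverse conversion, the sketch below exports all entities of one class back into CSV text, using the union of their attribute names as the header. The entity layout and the function name `graph_to_csv` are hypothetical; attributes missing from an entity are written as empty fields.

```python
import csv
import io
from typing import Any, Dict, List


def graph_to_csv(nodes: List[Dict[str, Any]], entity_class: str) -> str:
    """Inverse conversion: write all entities of one class as CSV text."""
    selected = [n for n in nodes if n.get("type") == entity_class]
    # Header = union of attribute names, in first-seen order.
    columns = list(dict.fromkeys(
        name for node in selected for name in node["attributes"]))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for node in selected:
        writer.writerow(node["attributes"])  # missing attributes become ""
    return out.getvalue()


entities = [
    {"id": "people:1", "type": "people", "attributes": {"name": "Zhang San", "city": "Shanghai"}},
    {"id": "people:2", "type": "people", "attributes": {"name": "Li Si"}},
    {"id": "country:1", "type": "country", "attributes": {"name": "China"}},
]
exported = graph_to_csv(entities, "people")
```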
Fig. 3 is a schematic diagram of an implementation of an embodiment of the present invention, which includes three steps: data import, namely constructing a corresponding reading module for each type of data source, completing tasks of connection, reading and format conversion of an original data source, and outputting entity attribute graph format data described by taking Json as an intermediate format; data access, namely converting various format data into intermediate format data described by Json, storing the intermediate format data into a graph database, and generating corresponding nodes, edges and attributes; and data export, namely converting entity attribute graph format data stored in a graph database into an intermediate format described by Json, and converting the intermediate format into a target data format, such as a relational table. The following explanation is made:
Step 1: Data import. The input is the connection parameters of a data source, or a data file; the output is entity attribute graph data in the Json intermediate format. Depending on the data type, the original data format is converted into the entity attribute graph structure as follows:
For relational databases such as MySQL, Oracle, MS SQL Server and Hive, the conversion is:
each relational table corresponds to one entity type, and the table name is used as the name of that entity type;
each column of the table becomes an attribute of that entity type;
each row of the table represents a specific entity, and the values in its columns are the values of that entity's attributes;
foreign key relationships between tables become edges between entities of different types.
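A minimal Python sketch of these four rules follows. It uses the standard-library `sqlite3` module as a stand-in for MySQL, Oracle and the other databases named above, and assumes every table has a single-column primary key and that each foreign key references the target table's primary key. The output dictionary keys (`"entity label"`, `"properties"`, `"start entity"` and so on) are illustrative and only loosely follow the Json labels mentioned in step 3.

```python
import sqlite3


def relational_to_graph(conn):
    """Table -> entity type, row -> entity, column -> attribute,
    foreign key -> edge between entity types."""
    entities, edges = [], []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        columns = [c[1] for c in info]
        pk = next(c[1] for c in info if c[5] == 1)  # assumes a single-column primary key
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = [(fk[2], fk[3]) for fk in
               conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()]
        for row in conn.execute(f'SELECT * FROM "{table}"'):
            props = dict(zip(columns, row))
            entities.append({"entity label": table, "id": props[pk], "properties": props})
            for ref_table, from_col in fks:
                if props[from_col] is not None:
                    # Assumes the foreign key points at the referenced table's primary key.
                    edges.append({"relationship label": f"{table}.{from_col}",
                                  "start entity": (table, props[pk]),
                                  "end entity": (ref_table, props[from_col])})
    return {"entities": entities, "relationships": edges}


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT,
                      dept_id INTEGER REFERENCES dept(id));
    INSERT INTO dept VALUES (1, 'R&D');
    INSERT INTO emp VALUES (10, 'Alice', 1);
""")
graph = relational_to_graph(conn)
```

Here the `emp` row becomes an `emp` entity and its `dept_id` foreign key becomes one edge to the `dept` entity with id 1.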
For the non-relational database HBase, the conversion is:
each table corresponds to one entity type, and the table name is used as the name of that entity type;
each cell corresponds to one attribute, whose name is jointly determined by the column family, the qualifier and the version number;
each row corresponds to a specific entity, and the row key serves as that entity's unique identifier;
HBase itself does not store relationships between entities; these can be specified by the user.
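A minimal sketch of the HBase rules, assuming the cells have already been fetched and flattened into `(row_key, family, qualifier, version, value)` tuples (a hypothetical input shape; no HBase client is used here). Joining the three name parts with `:` is likewise an assumed naming convention.

```python
from collections import defaultdict


def hbase_to_entities(table_name, cells):
    """Group flattened HBase cells into entities of one entity type.

    cells: iterable of (row_key, family, qualifier, version, value) tuples.
    Each row key yields one entity; each cell yields one attribute whose
    name combines column family, qualifier and version number.
    """
    rows = defaultdict(dict)  # row_key -> {attribute name: value}
    for row_key, family, qualifier, version, value in cells:
        rows[row_key][f"{family}:{qualifier}:{version}"] = value
    return [{"entity label": table_name, "id": row_key, "properties": props}
            for row_key, props in rows.items()]


entities = hbase_to_entities("user", [
    ("u1", "info", "name", 1, "Alice"),
    ("u1", "info", "age", 1, 30),
    ("u2", "info", "name", 1, "Bob"),
])
```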
For CSV data, the conversion is:
each file becomes an entity type, and the file name is used as the name of that entity type;
each row becomes a specific entity;
each delimiter-separated field becomes an attribute.
CSV itself does not store relationships between entities; these can be specified by the user.
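The CSV rules can be sketched with Python's standard `csv` module. The sketch assumes the first row is a header that supplies attribute names, which the text does not state; values are left as strings, and typing them into the four basic types would be a separate step.

```python
import csv
import io
from pathlib import Path


def csv_to_entities(filename, text, delimiter=","):
    """One CSV file -> one entity type named after the file;
    each data row -> an entity; each delimited field -> an attribute."""
    entity_type = Path(filename).stem  # "orders.csv" -> "orders"
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{"entity label": entity_type, "properties": dict(row)} for row in reader]


entities = csv_to_entities("orders.csv", "id,amount\n1,9.5\n2,12\n")
```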
For Json data, the conversion is:
each object corresponds to one entity;
each key-value pair of the object gives an attribute name and value of that entity;
if the value of a key is itself an object, an edge is established between the entities corresponding to the two objects.
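A recursive sketch of the Json rules follows. It assumes the key that holds a nested object is used as the edge label and that arrays and scalars are kept as plain attribute values; the text specifies neither. Entity ids are sequential integers invented for the example.

```python
import json
from itertools import count


def json_to_graph(doc):
    """Each JSON object -> entity; scalar/array members -> attributes;
    a member whose value is an object -> edge to that object's entity."""
    entities, edges = [], []
    next_id = count(1)  # sequential ids invented for the example

    def visit(obj):
        eid = next(next_id)
        props = {}
        entities.append({"id": eid, "properties": props})
        for key, value in obj.items():
            if isinstance(value, dict):
                child = visit(value)
                edges.append({"relationship label": key,
                              "start entity": eid, "end entity": child})
            else:
                props[key] = value
        return eid

    visit(doc)
    return {"entities": entities, "relationships": edges}


graph = json_to_graph(json.loads('{"name": "Alice", "address": {"city": "Shanghai"}}'))
```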
For XML data, the conversion is:
each tag corresponds to one entity, and tags with the same name belong to the same entity type;
the attribute names and values of a tag become the attribute names and values of the corresponding entity;
if two tags are nested, an edge is constructed between the two corresponding entities.
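The XML rules can be sketched with Python's standard `xml.etree.ElementTree`. The sketch assumes an element's tag name is its entity type, its XML attributes are the entity's attributes, and every parent-child nesting yields one edge; element text content is ignored, a simplification the text does not address. Entity ids are sequential integers invented for the example.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_graph(xml_text):
    """Element -> entity typed by its tag name; XML attributes -> entity
    attributes; parent/child nesting -> edge. Text content is ignored."""
    entities, edges = [], []
    next_id = count(1)  # sequential ids invented for the example

    def visit(elem, parent_id=None):
        eid = next(next_id)
        entities.append({"id": eid, "entity label": elem.tag,
                         "properties": dict(elem.attrib)})
        if parent_id is not None:
            edges.append({"start entity": parent_id, "end entity": eid})
        for child in elem:
            visit(child, eid)

    visit(ET.fromstring(xml_text))
    return {"entities": entities, "relationships": edges}


graph = xml_to_graph('<library><book id="b1" title="A"/><book id="b2" title="B"/></library>')
```

Both `book` elements end up as entities of the same type, `book`, each linked by an edge to the `library` entity.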
For Excel data, the conversion is:
each table becomes an entity type, and the table name is used as the name of that entity type;
each row becomes a specific entity;
each cell becomes an attribute, and the name of the cell's column is used as the attribute name;
Excel itself does not store relationships between entities; these can be specified by the user.
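A sketch of the Excel rules, taking a sheet's rows as plain tuples with the header row first. That is the shape openpyxl's `Worksheet.iter_rows(values_only=True)` yields, but reading the workbook is left out here so the sketch has no third-party dependency. Skipping empty cells is an assumed design choice.

```python
def sheet_to_entities(sheet_name, rows):
    """One table -> one entity type; each data row -> an entity;
    each cell -> an attribute named after its column.

    rows: sequence of tuples, the first being the header row of column names.
    """
    header, *data = rows
    return [{"entity label": sheet_name,
             "properties": {col: cell for col, cell in zip(header, row)
                            if cell is not None}}  # empty cells produce no attribute
            for row in data]


entities = sheet_to_entities("Staff", [("name", "age"), ("Alice", 30), ("Bob", None)])
```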
In addition, if data in another format needs to be brought into the current framework, a corresponding module can be built for that format and its conversion rules defined, completing the conversion from the original data into the Json-described entity attribute graph intermediate format.
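One way to make the framework extensible like this is a registry that maps a data-type name to its conversion function, sketched below. `CONVERTERS`, `register`, `convert` and the tab-separated `"tsv"` module are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

CONVERTERS: Dict[str, Callable[..., List[dict]]] = {}


def register(data_type: str):
    """Decorator that records a conversion function for one data type."""
    def wrap(fn):
        CONVERTERS[data_type] = fn
        return fn
    return wrap


@register("tsv")  # example plug-in for a newly supported format
def tsv_to_entities(entity_type: str, text: str) -> List[dict]:
    header, *lines = text.strip("\n").split("\n")
    names = header.split("\t")
    return [{"entity label": entity_type,
             "properties": dict(zip(names, line.split("\t")))} for line in lines]


def convert(data_type: str, *args) -> List[dict]:
    """Dispatch to the module registered for data_type."""
    if data_type not in CONVERTERS:
        raise ValueError(f"no conversion module registered for {data_type!r}")
    return CONVERTERS[data_type](*args)


result = convert("tsv", "orders", "id\tamount\n1\t9.5\n")
```

Adding a new format then means writing one decorated function; the dispatch code does not change.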
Step 2: The user specifies a label for the data being read. It corresponds to the label item of the Json intermediate format and records information such as the format, read time and source of the data;
Step 3: The conversion result can be edited, for example by merging two pieces of attribute information of the same entity, merging two values of the same attribute, or completing entity descriptions. Specifically, for each entity type the user can modify the entity label (referred to as "entry label" in the Json intermediate format above), attribute names (referred to as "property 1"), relationship labels between entities ("relationship label"), the entities at each end of a relationship ("start entry", "end entry"), their attributes ("start property", "end property"), and so on.
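Merging two descriptions of the same entity could look like the sketch below. The policy (copy attributes present on only one side, and collect conflicting values of the same attribute into a list) is an assumption; the text only says that attribute information and attribute values may be combined.

```python
def merge_properties(a, b):
    """Merge attribute dicts a and b describing the same entity.

    Attributes present on one side only are copied; when both sides give
    different values for the same attribute, the values are collected in a list.
    """
    merged = dict(a)
    for key, value in b.items():
        if key not in merged:
            merged[key] = value
        elif merged[key] != value:
            values = merged[key] if isinstance(merged[key], list) else [merged[key]]
            if value not in values:
                values = values + [value]
            merged[key] = values
    return merged


merged = merge_properties({"name": "Alice", "phone": "123"},
                          {"name": "Alice", "phone": "456", "age": 30})
```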
Step 4: The Json intermediate-format data is imported into the graph database, and an extra timestamp attribute is added to each node recording the exact time it was imported, i.e. the key-value pair "timestamp": xxxx is added alongside "property1": xxxx, "property 2": xxxx, …;
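A sketch of the timestamp step, applied to entity dictionaries before they are written; the graph-database write itself is out of scope here. It assumes one UTC timestamp per import batch in ISO-8601 form, and the optional `now` parameter exists only so the result is reproducible.

```python
from datetime import datetime, timezone


def stamp_nodes(entities, now=None):
    """Return copies of the entities with a "timestamp" attribute added,
    recording the import time (one UTC timestamp for the whole batch)."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return [{**ent, "properties": {**ent["properties"], "timestamp": ts}}
            for ent in entities]


nodes = stamp_nodes([{"entity label": "emp", "properties": {"property1": "Alice"}}],
                    now=datetime(2021, 4, 9, tzinfo=timezone.utc))
```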
Step 5: Data export. Data stored in the graph database is described in the Json intermediate format, which is then converted into the corresponding target data format.
The conversion rules are the reverse of those in step 1. For example, for relational databases such as MySQL, Oracle, MS SQL Server and Hive, the reverse conversion is:
each entity type corresponds to a relational table, and the label of the entity type (entity label) is used as the table name;
the attributes of the entity type become the columns of the table;
each specific entity becomes a row of the table, and its attribute values become the values of the corresponding columns;
edges between entities of different types become foreign key relationships between the tables.
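The reverse relational mapping can be sketched as grouping entities by label into tables and turning each edge into a foreign-key column on the start entity's row. Naming that column `<end label>_id` and using the graph entity id as the `id` column are assumptions made for illustration; the sketch returns column lists and row dictionaries rather than issuing SQL.

```python
from collections import defaultdict


def graph_to_tables(entities, edges):
    """Entity type -> table, entity -> row, attribute -> column,
    edge -> foreign-key column on the start entity's row."""
    rows = {}                   # (entity label, id) -> row dict
    tables = defaultdict(list)  # entity label -> list of row dicts
    for ent in entities:
        row = {"id": ent["id"], **ent["properties"]}
        rows[(ent["entity label"], ent["id"])] = row
        tables[ent["entity label"]].append(row)
    for edge in edges:
        end_label, end_id = edge["end entity"]
        rows[edge["start entity"]][f"{end_label}_id"] = end_id
    # Column order: first appearance across the rows of each table.
    columns = {label: list(dict.fromkeys(col for row in group for col in row))
               for label, group in tables.items()}
    return columns, dict(tables)


columns, tables = graph_to_tables(
    [{"entity label": "dept", "id": 1, "properties": {"name": "R&D"}},
     {"entity label": "emp", "id": 10, "properties": {"name": "Alice"}}],
    [{"start entity": ("emp", 10), "end entity": ("dept", 1)}],
)
```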
The scheme of this embodiment provides a unified data exchange framework for structured and semi-structured data with the entity attribute graph at its core. The framework converts the various data formats uniformly into entity attribute graph form, stores that data in a graph database, and can convert it into data of any other format. Data exchange is thus organized around a graph structure, both among the various data sources and with the entity attribute graph format; this supports upper-layer applications based on graph-structured data and improves the generality of the data.
From the above description of the embodiments, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disk) and includes instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the method of each embodiment of the present invention.
Example 2
This embodiment further provides a data format conversion apparatus for implementing the foregoing embodiment and its preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a block diagram of a data format conversion apparatus according to an embodiment of the present invention. As shown in Fig. 4, the apparatus includes an acquisition module 40, a conversion module 42 and an import module 44, where:
the acquisition module 40 is configured to acquire source data and analyze the data type of the source data;
the conversion module 42 is configured to convert the source data into an intermediate format according to the data type, where the intermediate format is an entity attribute graph format described by Json;
the import module 44 is configured to import the source data in the intermediate format into the graph database.
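The acquisition module's data-type analysis might, for instance, classify a source by file suffix, or by a driver name when connection parameters are given. The suffix table, the `"driver"` key and the function name below are assumptions for illustration only.

```python
from pathlib import Path

# Hypothetical suffix table covering the file-based types named in the text.
SUFFIX_TYPES = {".csv": "csv", ".json": "json", ".xml": "xml",
                ".xls": "excel", ".xlsx": "excel"}


def detect_data_type(source):
    """Return a data-type name for a file path or a connection-parameter dict."""
    if isinstance(source, dict):
        # Connection parameters, e.g. {"driver": "mysql", "host": ...}.
        return "hbase" if source["driver"].lower() == "hbase" else "relational"
    suffix = Path(source).suffix.lower()
    if suffix not in SUFFIX_TYPES:
        raise ValueError(f"unsupported data source: {source}")
    return SUFFIX_TYPES[suffix]
```

The returned name would then select the matching conversion rules in the conversion module.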
Optionally, the conversion module includes: a first mapping unit, configured to, if the source data is of a first data type from a relational database, map each relational table of the source data to an entity type in the intermediate format, with the table identifier set as the identifier of the corresponding entity type; a second mapping unit, configured to, for each relational table, map each column to an attribute of the corresponding entity, each row to an entity, and the column values of each row to the attribute values of that entity; and a third mapping unit, configured to map foreign key relationships between the relational tables to edges between different entity types in the intermediate format.
Optionally, the conversion module includes: a fourth mapping unit, configured to, if the source data is of a second data type from the HBase database, map each table of the source data to an entity type in the intermediate format, with the table identifier set as the identifier of the corresponding entity type; a fifth mapping unit, configured to map each cell of the source data to an attribute, where the attribute identifier includes the following fields: column family, qualifier and version number; and a sixth mapping unit, configured to map each row of the source data to an entity, with the row key set as that entity's unique identifier.
Optionally, the conversion module includes: a seventh mapping unit, configured to, if the source data is of a third data type from the comma-separated values (CSV) file format, map each file of the source data to an entity type in the intermediate format, with the file identifier set as the identifier of the corresponding entity type; an eighth mapping unit, configured to map each row of the source data to an entity; and a ninth mapping unit, configured to map each delimiter-separated field of the source data to an attribute of the entity.
Optionally, the conversion module includes: a tenth mapping unit, configured to, if the source data is of a fourth data type in Excel format, map each table of the source data to an entity type in the intermediate format, with the table identifier set as the identifier of the corresponding entity type; an eleventh mapping unit, configured to map each row of the source data to an entity; and a twelfth mapping unit, configured to map each cell of the source data to an attribute of the entity, with the identifier of the cell's column mapped to the attribute identifier.
Optionally, the conversion module includes: a thirteenth mapping unit, configured to map each object of the source data to an entity in the intermediate format if the source data is of the Json data type; and a fourteenth mapping unit, configured to map each key-value pair of an object in the source data to a set of attribute names and values of the corresponding entity, and to establish an edge between the entities corresponding to two objects when the value of a key of an object is itself an object.
Optionally, the conversion module includes: a fifteenth mapping unit, configured to map each tag of the source data to an entity in the intermediate format if the source data is of the extensible markup language (XML) data type, and to map multiple tags with the same tag name to entities of the same type; and a sixteenth mapping unit, configured to map the attribute identifiers and attribute values of each tag of the source data to the attribute identifiers and attribute values of the corresponding entity, and to construct an edge between the entities corresponding to two tags when the two tags are nested.
Optionally, the apparatus further includes: a receiving module, configured to receive an export instruction for target data after the importing module imports the source data in the intermediate format into the graph database, where the export instruction indicates the data content of a target data type to be exported from the graph database; and an inverse conversion module, configured to inversely convert that data content from the intermediate format into the target data type.
It should be noted that the above modules may be implemented in software or hardware. In the latter case, this may be done in, but is not limited to, the following ways: all the modules are located in the same processor; or the modules are distributed across different processors in any combination.
Example 3
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1: acquire source data and analyze the data type of the source data;
S2: convert the source data into an intermediate format according to the data type, where the intermediate format is an entity attribute graph format described by Json;
S3: import the source data in the intermediate format into the graph database.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, both of which are connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1: acquire source data and analyze the data type of the source data;
S2: convert the source data into an intermediate format according to the data type, where the intermediate format is an entity attribute graph format described by Json;
S3: import the source data in the intermediate format into the graph database.
Optionally, for specific examples in this embodiment, refer to the examples described in the above embodiments and optional implementations; they are not repeated here.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 5, the device includes a processor 51, a communication interface 52, a memory 53 and a communication bus 54, where the processor 51, the communication interface 52 and the memory 53 communicate with each other through the communication bus 54; the memory 53 is configured to store a computer program; and the processor 51 is configured to execute the program stored in the memory 53.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of it, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the method in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk or an optical disk.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (11)

1. A method for converting a data format, comprising:
acquiring source data and analyzing the data type of the source data;
converting the source data into an intermediate format according to the data type, wherein the intermediate format is an entity attribute graph format described by Json;
and importing the source data in the intermediate format into the graph database.
2. The method of claim 1, wherein converting the source data into an intermediate format according to the data type comprises:
if the source data is of a first data type from a relational database, mapping each relational table of the source data to an entity type in an intermediate format, and setting the identifier of the relational table as the corresponding entity type identifier;
for each relational table, mapping each column of the relational table to an attribute of the corresponding entity, mapping each row of the relational table to an entity, and mapping the column values of each row to the attribute values of that entity;
and mapping the foreign key relationships between the relational tables to edges between different entity types in the intermediate format.
3. The method of claim 1, wherein converting the source data into an intermediate format according to the data type comprises:
if the source data is of a second data type from the HBase database, mapping each table of the source data to an entity type in an intermediate format, and setting the table identifier as the corresponding entity type identifier;
mapping each cell of the source data to an attribute, wherein the attribute identifier of the attribute comprises the following fields: column family, qualifier, version number;
and mapping each row of the source data to an entity, and setting the unique identifier of the corresponding entity through the row key.
4. The method of claim 1, wherein converting the source data into an intermediate format according to the data type comprises:
if the source data is of a third data type from the comma-separated values (CSV) file format, mapping each file of the source data to an entity type in an intermediate format, and setting the file identifier as the corresponding entity type identifier;
mapping each row of the source data to an entity;
and mapping each delimiter-separated field of the source data to an attribute of the entity.
5. The method of claim 1, wherein converting the source data into an intermediate format according to the data type comprises:
if the source data is of a fourth data type in Excel format, mapping each table of the source data to an entity type in an intermediate format, and setting the table identifier as the corresponding entity type identifier;
mapping each row of the source data to an entity;
and mapping each cell of the source data to an attribute of the entity, and mapping the identifier of the cell's column to the attribute identifier.
6. The method of claim 1, wherein converting the source data into an intermediate format according to the data type comprises:
if the source data is of the Json data type, mapping each object of the source data to an entity of an intermediate format;
and mapping each key-value pair of an object in the source data to a set of attribute names and values of the corresponding entity, and establishing an edge between the entities corresponding to two objects when the value of a key of an object is itself an object.
7. The method of claim 1, wherein converting the source data into an intermediate format according to the data type comprises:
if the source data is of the extensible markup language (XML) data type, mapping each tag of the source data to an entity of an intermediate format, and mapping multiple tags with the same tag name to entities of the same type;
and mapping the attribute identifiers and attribute values of each tag of the source data to the attribute identifiers and attribute values of the corresponding entity, and constructing an edge between the entities corresponding to two tags when the two tags are nested.
8. The method of claim 1, wherein after importing the source data in the intermediate format into a graph database, the method further comprises:
receiving export instructions for target data, wherein the export instructions are used for indicating data content of a target data type to be exported from the graph database;
inversely converting the data content of the source data from the intermediate format to the target data type.
9. An apparatus for converting a data format, comprising:
the acquisition module is used for acquiring source data and analyzing the data type of the source data;
the conversion module is used for converting the source data into an intermediate format according to the data type, wherein the intermediate format is an entity attribute graph format described by Json;
and the importing module is used for importing the source data in the intermediate format into the graph database.
10. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program is operative to perform the method steps of any of the preceding claims 1 to 8.
11. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus; wherein:
a memory for storing a computer program;
a processor for performing the method steps of any of claims 1 to 8 by executing a program stored on a memory.
CN202011509604.0A 2020-12-18 2020-12-18 Data format conversion method and device, storage medium and electronic equipment Pending CN112632015A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011509604.0A CN112632015A (en) 2020-12-18 2020-12-18 Data format conversion method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112632015A (en) 2021-04-09

Family

ID=75317465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011509604.0A Pending CN112632015A (en) 2020-12-18 2020-12-18 Data format conversion method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112632015A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103631907A (en) * 2013-11-26 2014-03-12 中国科学院信息工程研究所 Method and system for migrating relational data to HBbase
CN110704635A (en) * 2019-09-16 2020-01-17 金色熊猫有限公司 Conversion method and device for ternary group data in knowledge graph

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861486A (en) * 2021-04-25 2021-05-28 成都淞幸科技有限责任公司 Data integration method, device, equipment and storage medium of semi-structured file
CN113553458A (en) * 2021-08-10 2021-10-26 北京明略软件***有限公司 Data export method and device in graph database
CN115563187A (en) * 2022-10-17 2023-01-03 中航信移动科技有限公司 Data conversion method, storage medium and electronic equipment
CN115563187B (en) * 2022-10-17 2023-08-04 中航信移动科技有限公司 Data conversion method, storage medium and electronic equipment
CN115481298A (en) * 2022-11-14 2022-12-16 阿里巴巴(中国)有限公司 Graph data processing method and electronic equipment
CN115481298B (en) * 2022-11-14 2023-03-14 阿里巴巴(中国)有限公司 Graph data processing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination