CN114996297B - Data processing method, device, equipment and medium


Info

Publication number
CN114996297B
CN114996297B CN202210390235.0A CN202210390235A CN114996297B CN 114996297 B CN114996297 B CN 114996297B CN 202210390235 A CN202210390235 A CN 202210390235A CN 114996297 B CN114996297 B CN 114996297B
Authority
CN
China
Prior art keywords
data
original
relational data
edge
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210390235.0A
Other languages
Chinese (zh)
Other versions
CN114996297A (en)
Inventor
吴丽清
陈少静
陈舒杭
刘一辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCB Finetech Co Ltd
Priority to CN202210390235.0A
Publication of CN114996297A
Application granted
Publication of CN114996297B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method, a device, equipment, a medium and a product. The method comprises the following steps: determining a first object and a second object related to the first original relational data in response to a first input of the first original relational data by the first user; determining a first node corresponding to the first object in the graph database and a second node corresponding to the second object in the graph database; displaying a first attribute hierarchy corresponding to a first edge between a first node and a second node in a graph database; determining first index information stored in a target attribute hierarchy in response to a second input of the first user to the target attribute hierarchy in the first attribute hierarchy; acquiring second original relational data of a target attribute hierarchy between a first object and a second object from a relational database according to the first index information; and displaying the second original relational data. Thus, the efficiency of acquiring the required data from the mass data can be improved, and the computing resources can be saved.

Description

Data processing method, device, equipment and medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method, apparatus, device, and medium.
Background
With the development of the big data era, discovering relationships among things from large-scale, loosely organized data has become one of the mainstream tasks of data analysis in many fields. How to identify and retrieve, from massive data, the portion of data that participates in the analysis is therefore important.
In the prior art, structured data can be stored in a graph database. Each node in the graph database corresponds to an object, multiple edges may exist between two nodes, and each edge stores one piece of data. To find part of the data between two objects, the data stored on every edge between the two corresponding nodes must be traversed to determine the required data.
However, when the volume of data between two objects is large, the number of edges between the two corresponding nodes is very large, so finding the required data by traversing the data stored on each edge is very inefficient and occupies a large amount of computing resources.
Disclosure of Invention
The embodiment of the application provides a data processing method, a device, equipment and a medium, which can at least solve the problems that the efficiency of searching required data by traversing data stored in each edge in the prior art is very low and a large amount of computation resources are occupied.
In a first aspect, an embodiment of the present application provides a data processing method, including:
determining a first object and a second object related to the first original relational data in response to a first input of the first original relational data by the first user;
determining a first node corresponding to a first object in a graph database and a second node corresponding to a second object in the graph database, wherein the nodes in the graph database correspond to the objects, no more than one edge is connected between every two nodes, the edges are used for indicating the existence of a relationship between the objects corresponding to the two nodes connected with the edges, different attribute layers are arranged on the edges, and the different attribute layers are used for storing index information of original relationship data of different types;
displaying a first attribute hierarchy corresponding to a first edge between a first node and a second node in a graph database;
determining first index information stored in a target attribute hierarchy in response to a second input of the first user to the target attribute hierarchy in the first attribute hierarchy;
acquiring second original relational data of a target attribute hierarchy between a first object and a second object from a relational database according to the first index information;
and displaying the second original relational data.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including:
a first determining module, configured to determine a first object and a second object related to the first original relational data in response to a first input of the first original relational data by a first user;
a second determining module, configured to determine a first node corresponding to the first object in the graph database and a second node corresponding to the second object in the graph database, wherein the nodes in the graph database correspond to the objects, no more than one edge is connected between every two nodes, the edge is used for indicating the existence of a relationship between the objects corresponding to the two nodes connected by the edge, different attribute layers are arranged on the edge, and the different attribute layers are used for storing index information of original relational data of different types;
a first display module, configured to display a first attribute hierarchy corresponding to a first edge between the first node and the second node in the graph database;
a third determining module, configured to determine first index information stored in a target attribute hierarchy in response to a second input of the first user to the target attribute hierarchy in the first attribute hierarchy;
a first acquisition module, configured to acquire second original relational data of the target attribute hierarchy between the first object and the second object from the relational database according to the first index information;
and a second display module, configured to display the second original relational data.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a data processing method as shown in any of the embodiments of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer storage medium having stored thereon computer program instructions which, when executed by a processor, implement a data processing method as shown in any of the embodiments of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, instructions in which, when executed by a processor of an electronic device, cause the electronic device to perform the data processing method shown in any of the embodiments of the first aspect.
The data processing method, device, equipment and medium of the embodiments of the present application can, in response to a first input of a first user on first original relational data, determine a first object and a second object related to the first original relational data, and determine a first node corresponding to the first object in the graph database and a second node corresponding to the second object in the graph database. A first attribute hierarchy corresponding to a first edge between the first node and the second node in the graph database is then displayed. In response to a second input of the first user on a target attribute hierarchy in the first attribute hierarchy, first index information stored in the target attribute hierarchy is determined, second original relational data of the target attribute hierarchy between the first object and the second object is acquired from the relational database according to the first index information, and the second original relational data is displayed.
Therefore, the original relational data does not need to be stored in the graph database. The original relational data is stored in the relational database, and the graph database stores only the index information of the original relational data in the relational database. As a result, the graph database can be set up so that no more than one edge is connected between every two nodes, where the edge indicates that a relationship exists between the objects corresponding to the two connected nodes. Different attribute layers are arranged on the edge, and the different attribute layers store the index information of the different types of original relational data in the relational database. When part of the data between two nodes needs to be acquired, the corresponding attribute layer can be selected directly, without traversing the data stored on each edge between the two nodes, which improves the efficiency of acquiring the required data from massive data and saves computing resources.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present application, the drawings that are needed to be used in the embodiments of the present application will be briefly described, and it is possible for a person skilled in the art to obtain other drawings according to these drawings without inventive effort.
FIG. 1 is a basic architecture diagram of a native graph database provided in one embodiment of the present application;
FIG. 2 is a flow chart of a data processing method according to one embodiment of the present application;
FIG. 3 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings and the detailed embodiments. It should be understood that the particular embodiments described herein are meant to be illustrative of the application only and not limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the application by showing examples of the application.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
In addition, it should be noted that, in the technical scheme of the application, the acquisition, storage, use, processing and the like of the data all conform to the relevant regulations of national laws and regulations.
Here, prior-art methods for acquiring a required part of the data from massive data, through a traditional relational database and through a knowledge graph, are briefly described.
First, a relational database uses the relational model to organize data. The data is usually stored in two-dimensional tables, and tables are associated with one another through defined primary keys. The general approach to an association query in a relational database is as follows: determine the tables to be associated, determine the fields to be queried, and then determine the association condition and the association mode. Table joins are divided into inner joins, outer joins, cross joins, self joins and the like.
Inner joins are divided into equi-joins and non-equi-joins. An equi-join compares the values of the join columns of two tables using the equal sign "=", which is equivalent to taking the Cartesian product of the two tables and keeping the records whose join column values are equal. A non-equi-join compares the values of the join columns of two tables using ">" or "<", which is equivalent to taking the Cartesian product of the two tables and keeping the records in which the join column value of one table is greater than or less than that of the other table.
Outer joins are divided into left outer joins, right outer joins and full outer joins. The result of a left outer join contains the queried columns of both the left table and the right table; all rows of the left table are displayed, while data from the right table is returned only where it matches the left table and is otherwise displayed as null. The result of a right outer join contains the queried columns of both tables; all rows of the right table are displayed, while data from the left table is returned only where it matches the right table and is otherwise displayed as null. The result of a full outer join contains all rows of both the left table and the right table, and fields that have no corresponding value are displayed as null.
A cross join, also called a Cartesian product, is a query that combines each row in the left table with all rows in the right table.
A self join is a join query between the current table and itself. The key point is to virtualize a second copy of the table, that is, to define an alias for the table.
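The join types described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The tables `customer` and `account`, their columns, and the sample rows are hypothetical and are not taken from the application; right and full outer joins are omitted because older SQLite versions do not support them.

```python
import sqlite3

# Hypothetical tables and rows, used only to illustrate the join types.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, referrer_id INTEGER);
    CREATE TABLE account  (id INTEGER PRIMARY KEY, customer_id INTEGER, balance INTEGER);
    INSERT INTO customer VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cy', 1);
    INSERT INTO account  VALUES (10, 1, 500), (11, 2, 200);
""")

def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Inner join (equi-join): keep only rows whose join columns are equal.
inner_join = rows(
    "SELECT c.name, a.balance FROM customer c "
    "JOIN account a ON a.customer_id = c.id ORDER BY c.id")

# Inner join (non-equi-join): compare join columns with '>'.
non_equi_join = rows(
    "SELECT a.id, b.id FROM account a JOIN account b ON a.balance > b.balance")

# Left outer join: every customer appears; a missing account shows as NULL.
left_join = rows(
    "SELECT c.name, a.balance FROM customer c "
    "LEFT OUTER JOIN account a ON a.customer_id = c.id ORDER BY c.id")

# Cross join: Cartesian product of the two tables.
cross_join = rows("SELECT c.name, a.id FROM customer c CROSS JOIN account a")

# Self join: the same table is given two aliases, c and r.
self_join = rows(
    "SELECT c.name, r.name FROM customer c "
    "JOIN customer r ON c.referrer_id = r.id ORDER BY c.id")
```

In this sketch the left outer join returns one more row than the inner join, because the customer without an account is kept with a null balance.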
With massive data, performing table association queries directly in a relational database is inefficient, so relational databases also provide an indexing mechanism to speed up retrieval. An index is a data structure in a relational database that is used to find records quickly. Index types include the primary key index, the foreign key index, the single-field index, the multi-field index, and the like.
The primary key index is an index built on the primary key. It is a unique index, and a record can be located quickly through the primary key.
The foreign key index is an index built on a foreign key, which must be associated with a field of another table. It greatly improves the speed of table association.
The single-field index is an index on one field of a table. It can be created as an ordinary index, a unique index or a full-text index, and its structure can be a multi-way search tree (B-TREE) or a hash.
The multi-field index is an index on several fields of a table. The leftmost prefix matching principle applies when a multi-field index is used. For example, given an index on (A, B, C, D), a query that uses A alone as the query condition needs no additional index; queries on (A, B) and (A, B, C) also need no additional index; however, queries on B, (B, C) or (B, C, D) require additional indexes.
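The leftmost prefix matching rule in the example above can be expressed as a small check. This is a simplified sketch under the assumption that every queried column is an equality condition; the function name `uses_composite_index` is hypothetical, and real query optimizers may still make partial use of an index prefix in cases this check rejects.

```python
def uses_composite_index(index_columns, query_columns):
    """Return True if the queried columns form a leftmost prefix of the index.

    The order in which the query lists its columns does not matter; only
    the set of columns is compared with the first len(set) index columns.
    """
    queried = set(query_columns)
    if not queried:
        return False
    prefix = index_columns[:len(queried)]
    return set(prefix) == queried


index = ("A", "B", "C", "D")

covered = [cols for cols in (["A"], ["A", "B"], ["A", "B", "C"])
           if uses_composite_index(index, cols)]
not_covered = [cols for cols in (["B"], ["B", "C"], ["B", "C", "D"])
               if not uses_composite_index(index, cols)]
```

Under this model, `covered` contains the three query shapes that need no additional index, and `not_covered` contains the three that do.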
However, a relational database hides associations in the foreign key structure and has no explicit representation of them, which makes association queries and computation complex. In particular, multi-hop queries require a large number of table join computations, and the computational complexity grows exponentially with the number of hops. For example, when user information and a user transaction table are given and the shortest path of account transaction records is to be found, a traditional relational database cannot return a result once the multi-hop query exceeds 5 hops.
Second, data can be retrieved and acquired based on a knowledge graph. A knowledge graph is a graph composed of nodes and edges that describes the association relationships between entities: the nodes represent entities, and the edges between nodes represent the associations between them. There are two ways of performing relationship retrieval based on a knowledge graph: first, knowledge graph retrieval based on a relational database; second, knowledge graph retrieval based on a native graph database.
In knowledge graph retrieval based on a relational database, the relationship information among things is stored in two-dimensional tables, and the fields representing the relationship information are stored as data columns or arrays. Typical storage and retrieval modes include the following:
1) Retrieval based on a triple table
This method uses a relational database to establish a table containing three columns (subject, predicate, object), where the predicate represents the relationship. All entities and relationships are stored in this triple table, and association queries are performed through SQL statements. Under multiple association constraints, a query may contain many self-joins, so efficiency is quite low.
2) Retrieval based on attribute tables
This method is centered on the type of the entity: the relationships belonging to entities of the same type are stored as attributes in one table, and retrieval is essentially the same as in a traditional relational database. However, many null values are produced.
3) Retrieval based on vertically partitioned tables
The basic idea of this method is to group the triples by relationship attribute, build a two-column table (subject, object) for each relationship attribute, and perform query computation through subject-subject joins.
4) Retrieval based on a full index structure
This method also builds a triple table containing (subject, predicate, object), but adds various optimizations. First, a mapping table is established: all field values are mapped to unique identifiers, and the triple table stores only the identifiers rather than the actual values. Then six indexes are established, SPO, SOP, PSO, POS, OPS and OSP, to cover multi-dimensional graph query requirements (where S stands for subject, P stands for predicate and represents the relationship, and O stands for object).
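The full index structure in item 4) can be sketched in memory as follows. This is an illustrative sketch only: the class `TripleStore`, its methods, and the sample triples are hypothetical, and nested dictionaries stand in for the on-disk indexes that a real system would use.

```python
from itertools import permutations


class TripleStore:
    """Triple table with a value-to-identifier mapping and six indexes."""

    def __init__(self):
        self.ids = {}      # mapping table: value -> unique identifier
        self.values = []   # reverse mapping: identifier -> value
        # One index per ordering of S, P, O: SPO, SOP, PSO, POS, OSP, OPS.
        self.indexes = {"".join(p): {} for p in permutations("SPO")}

    def _encode(self, value):
        """Map a value to its identifier, allocating a new one if needed."""
        if value not in self.ids:
            self.ids[value] = len(self.values)
            self.values.append(value)
        return self.ids[value]

    def add(self, subject, predicate, obj):
        """Store only identifiers, and register the triple in all six indexes."""
        triple = {"S": self._encode(subject),
                  "P": self._encode(predicate),
                  "O": self._encode(obj)}
        for order, index in self.indexes.items():
            first, second, third = (triple[k] for k in order)
            index.setdefault(first, {}).setdefault(second, set()).add(third)

    def lookup(self, order, first, second):
        """Given the first two components of an ordering, return the third."""
        a, b = self.ids.get(first), self.ids.get(second)
        hits = self.indexes[order].get(a, {}).get(b, set())
        return sorted(self.values[i] for i in hits)


store = TripleStore()
store.add("acct1", "transfers_to", "acct2")
store.add("acct1", "transfers_to", "acct3")
store.add("acct3", "transfers_to", "acct2")

payees_of_acct1 = store.lookup("SPO", "acct1", "transfers_to")
payers_to_acct2 = store.lookup("POS", "transfers_to", "acct2")
```

Whichever two components of a triple are known, one of the six orderings has them as its leading keys, so the third component can be found without scanning the triple table.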
Knowledge graph retrieval based on a native graph database uses the structural features of the graph for storage and query. In the relationship extraction process of the knowledge graph, relationships can be explicitly described and defined based on a determined business meaning; one edge represents one business relationship, and the relationship is stored as an edge in the graph model. The basic idea is to represent the graph as adjacency lists, that is, to express adjacency relationships as an adjacency table, and then to build indexes on that table to optimize queries on the graph.
As shown in FIG. 1, the basic architecture of the physical storage of a native graph database may include: node store file 101, relationship edge store file 102, label store file 103, attribute store file 104, relationship edge type store file 105, attribute index file 106, and dynamic store file 107.
In the node store file and the relationship edge store file, the storage position of each node and each relationship edge is fixed, so an access address can be computed from the identifier. A node record contains the identifier of its first relationship edge, the identifier of its first attribute, and the identifier of its first label. These identifiers act like pointers (or indexes), so the relationship edges, attributes, labels and the like related to the node can be found quickly. The identifiers in a relationship edge record are stored in a similar way, so the head node, tail node, relationship edge type and the like related to the edge can be retrieved quickly. With this design, the first relationship edge of a node can be found quickly from the node, an adjacent node can then be found from that relationship edge, and the second, third, up to the Nth relationship edge can be found in turn, so that a full traversal search is achieved.
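The pointer-style traversal described above can be sketched as follows. The record layout is an assumption made for illustration: Python lists stand in for the fixed-position store files, a list index plays the role of the identifier, and the names `NodeRecord`, `RelRecord` and `RecordStore` are hypothetical. Attributes and labels are omitted.

```python
from dataclasses import dataclass
from typing import List

NO_ID = -1  # marks the end of a chain of relationship edges


@dataclass
class NodeRecord:
    first_rel: int = NO_ID    # identifier of the node's first relationship edge


@dataclass
class RelRecord:
    start: int                # identifier of the head node
    end: int                  # identifier of the tail node
    rel_type: str
    start_next: int = NO_ID   # next edge in the head node's chain
    end_next: int = NO_ID     # next edge in the tail node's chain


class RecordStore:
    def __init__(self):
        self.nodes: List[NodeRecord] = []
        self.rels: List[RelRecord] = []

    def add_node(self) -> int:
        self.nodes.append(NodeRecord())
        return len(self.nodes) - 1

    def add_rel(self, start: int, end: int, rel_type: str) -> int:
        rel_id = len(self.rels)
        # Push the new edge onto the front of both endpoint chains.
        self.rels.append(RelRecord(start, end, rel_type,
                                   self.nodes[start].first_rel,
                                   self.nodes[end].first_rel))
        self.nodes[start].first_rel = rel_id
        self.nodes[end].first_rel = rel_id
        return rel_id

    def neighbors(self, node_id: int):
        """Follow the chain: first edge, then second, up to the Nth edge."""
        rel_id = self.nodes[node_id].first_rel
        while rel_id != NO_ID:
            rel = self.rels[rel_id]
            if rel.start == node_id:
                yield rel.end, rel.rel_type
                rel_id = rel.start_next
            else:
                yield rel.start, rel.rel_type
                rel_id = rel.end_next


store = RecordStore()
a, b, c = store.add_node(), store.add_node(), store.add_node()
store.add_rel(a, b, "transfer")
store.add_rel(a, c, "transfer")
store.add_rel(b, c, "guarantee")

neighbors_of_a = sorted(store.neighbors(a))
neighbors_of_b = sorted(store.neighbors(b))
```

Each step of the traversal is a direct positional lookup rather than a search, which is the property the paragraph above attributes to fixed storage positions.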
A graph database is efficient at modeling and searching association relationships and can support multi-hop association queries. However, existing knowledge graphs are mostly applied to entity and relationship extraction and graph construction from unstructured data. For structured data, existing graph databases generally construct relationship edges based on specific attribute fields; when the attribute fields hold massive numbers of data records, a large number of relationship edges are generated, and during a multi-hop association query the search volume grows geometrically and consumes a large amount of computing power.
That is, when a data center performs data mining analysis, the main business perspectives include comprehensive analysis of the same event from different business dimensions and business connectivity analysis, both of which require multi-table association analysis. The main constraint is then how to find, in mass storage, the small data set related to the event that should participate in the analysis. Searching massive data for the data that participates in the computation incurs a large overhead, resulting in low computing efficiency and serious performance loss.
To solve the above problem, the embodiment of the present application proposes a graph-data separation principle for highly structured original relational data. The essence of graph-data separation is to use the graph as an index of the business data that has relational connectivity features. On this basis, a way of constructing and retrieving a business connectivity topology can be established, which changes how business relationships are discovered from structured data: the index information of the original relational data involved in a connectivity relationship is found quickly, the abnormal small data set is located quickly from the relationship topology, and further data analysis is then performed, which improves the efficiency of complex association queries over massive structured data. Meanwhile, once the business relationships are abstracted through the graph, a converged multidimensional expression of the business is obtained, which helps business personnel discover potential opportunities for multi-table linked analysis.
Based on this, a graph-based data retrieval mechanism can be established for the original relational data. The concept of an edge is also redefined: by convention, an edge in the graph database is unique, that is, at most one edge exists between two nodes. According to the graph-data separation principle, one edge is established in the graph database for the relationships between two nodes, and this edge represents only the connectivity relationship at the graph level. The edge is divided into different attribute layers according to the different source tables, that is, according to the different relationships represented by the original relational data, and the index information of the original relational data that gives rise to each relationship is recorded in the corresponding attribute layer. In this way, all relationships between two nodes in the global original relational data can be collated, and the index information of the original relational data that gives rise to those relationships can be organized. This data model translates and links between the graph data and the relational data of the data center, so that relationships and the original business data can be indexed and displayed together.
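The edge model described above can be sketched as a small in-memory structure. The sketch makes several assumptions that are not stated in the application: the edge is treated as undirected (keyed by an unordered node pair), each attribute layer is named after its source table, and the index information is a list of row identifiers. The class name `LayeredGraph` and all sample values are hypothetical.

```python
from collections import defaultdict


class LayeredGraph:
    """At most one edge per node pair; each edge holds attribute layers."""

    def __init__(self):
        # edge key (unordered node pair) -> {layer name -> list of row ids}
        self.edges = defaultdict(lambda: defaultdict(list))

    def add_relation(self, node_a, node_b, source_table, row_id):
        """Record index information only; the row itself stays in the
        relational database."""
        key = frozenset((node_a, node_b))
        self.edges[key][source_table].append(row_id)

    def layers(self, node_a, node_b):
        """Return the attribute layers on the single edge between two nodes."""
        key = frozenset((node_a, node_b))
        return sorted(self.edges[key]) if key in self.edges else []

    def index_info(self, node_a, node_b, layer):
        """Return the index information stored in one attribute layer."""
        key = frozenset((node_a, node_b))
        return list(self.edges.get(key, {}).get(layer, []))


graph = LayeredGraph()
graph.add_relation("n1", "n2", "transfer", 1)
graph.add_relation("n2", "n1", "transfer", 2)   # same node pair, same edge
graph.add_relation("n1", "n2", "guarantee", 7)  # new layer on the same edge

edge_count = len(graph.edges)
layers_n1_n2 = graph.layers("n1", "n2")
transfer_rows = graph.index_info("n1", "n2", "transfer")
```

However many source rows relate the two nodes, the graph holds a single edge between them, and the number of layers is bounded by the number of relationship types rather than by the number of records.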
FIG. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application. It should be noted that the data processing method may be applied to a data processing system. As shown in FIG. 2, the data processing method may include the following steps:
S210, determining a first object and a second object related to the first original relational data in response to the first input of the first user to the first original relational data;
S220, determining a first node corresponding to the first object in the graph database and a second node corresponding to the second object in the graph database;
S230, displaying a first attribute hierarchy corresponding to a first edge between the first node and the second node in the graph database;
S240, determining first index information stored in a target attribute hierarchy in response to a second input of the first user to the target attribute hierarchy in the first attribute hierarchy;
S250, acquiring second original relational data of the target attribute hierarchy between the first object and the second object from a relational database according to the first index information;
S260, displaying the second original relational data.
In the embodiment of the present application, a first object and a second object related to first original relational data are determined in response to a first input of a first user on the first original relational data, and a first node corresponding to the first object in the graph database and a second node corresponding to the second object in the graph database are determined. A first attribute hierarchy corresponding to a first edge between the first node and the second node in the graph database is then displayed. In response to a second input of the first user on a target attribute hierarchy in the first attribute hierarchy, first index information stored in the target attribute hierarchy is determined, second original relational data of the target attribute hierarchy between the first object and the second object is acquired from the relational database according to the first index information, and the second original relational data is displayed. Therefore, the original relational data does not need to be stored in the graph database. The original relational data is stored in the relational database, and the graph database stores only the index information of the original relational data in the relational database. As a result, the graph database can be set up so that no more than one edge is connected between every two nodes, where the edge indicates that a relationship exists between the objects corresponding to the two connected nodes. Different attribute layers are arranged on the edge, and the different attribute layers store the index information of the different types of original relational data in the relational database. When part of the data between two nodes needs to be acquired, the corresponding attribute layer can be selected directly, without traversing the data stored on each edge between the two nodes, which improves the efficiency of acquiring the required data from massive data and saves computing resources.
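Steps S210 to S260 can be walked through with a minimal sketch. Everything concrete here is an assumption for illustration: an in-memory SQLite database stands in for the relational database, plain dictionaries stand in for the graph database, the index information is a list of row identifiers, and the table names, column names, object names and function names are hypothetical.

```python
import sqlite3

# Relational database: holds the original relational data (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transfer  (id INTEGER PRIMARY KEY, payer TEXT, payee TEXT, amount INTEGER);
    CREATE TABLE guarantee (id INTEGER PRIMARY KEY, guarantor TEXT, debtor TEXT, amount INTEGER);
    INSERT INTO transfer  VALUES (1, 'acme', 'bolt', 100), (2, 'bolt', 'acme', 40),
                                 (3, 'acme', 'core', 75);
    INSERT INTO guarantee VALUES (1, 'acme', 'bolt', 900);
""")

# Graph side: object-to-node mapping, and one edge per node pair whose
# attribute layers store only row identifiers (the index information).
node_of = {"acme": "n1", "bolt": "n2", "core": "n3"}
edges = {
    frozenset(("n1", "n2")): {"transfer": [1, 2], "guarantee": [1]},
    frozenset(("n1", "n3")): {"transfer": [3]},
}
# Fixed whitelist of tables and their party columns; table names are only
# ever interpolated into SQL from this dictionary.
PARTY_COLUMNS = {"transfer": ("payer", "payee"),
                 "guarantee": ("guarantor", "debtor")}


def objects_of(table, row_id):
    """S210: find the two objects that a piece of original data relates."""
    col_a, col_b = PARTY_COLUMNS[table]
    return conn.execute(
        f"SELECT {col_a}, {col_b} FROM {table} WHERE id = ?", (row_id,)
    ).fetchone()


def attribute_layers(obj_a, obj_b):
    """S220 and S230: map objects to nodes and list the layers on their edge."""
    edge = edges[frozenset((node_of[obj_a], node_of[obj_b]))]
    return sorted(edge)


def fetch_layer(obj_a, obj_b, layer):
    """S240 and S250: read the layer's index information, then fetch the rows."""
    if layer not in PARTY_COLUMNS:
        raise ValueError(f"unknown layer: {layer}")
    row_ids = edges[frozenset((node_of[obj_a], node_of[obj_b]))][layer]
    marks = ",".join("?" * len(row_ids))
    return conn.execute(
        f"SELECT * FROM {layer} WHERE id IN ({marks}) ORDER BY id", row_ids
    ).fetchall()


first_obj, second_obj = objects_of("transfer", 1)            # first input
shown_layers = attribute_layers(first_obj, second_obj)       # displayed layers
result = fetch_layer(first_obj, second_obj, "guarantee")     # second input
print(result)                                                # S260: display
```

In this sketch the relational query touches only the rows named by the selected layer, and the graph side is never scanned edge by edge.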
Referring to S210, the first original relational data is data stored in a relational database, and may be business data, transaction data, or archive data. The first input may be an input for inputting an identifier of the first original relational data, for example, inputting a number of the first original relational data, or may be an input for clicking to select the first original relational data, or may be another input for the first original relational data, which is not limited herein. The first object may be a business or an individual and the second object may also be a business or an individual.
Specifically, after the first user makes the first input on the first original relational data, the data processing system may determine, in response to the first input, the first object and the second object to which the first original relational data relates. For example, if the first original relational data is transaction data, the first object and the second object may be the two parties to the transaction.
Referring to S220, the graph database may be applied to a knowledge graph, or may be applied to other graphs, which is not limited herein. Nodes in the graph database correspond to objects, and at most one edge connects any two nodes. The edge indicates that a relationship exists between the objects corresponding to the two nodes it connects. Different attribute hierarchies can be set on the edge, and each attribute hierarchy stores the index information of a different type of original relational data. Different types of original relational data may represent different relationships between the two objects to which the data relates. The index information may include a storage location, for example, numbers 0001 to 0100 in table A, and may also include other index information, which is not limited herein.
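The structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `Edge`, `Graph`, and `edge_key` are hypothetical, edges are assumed to be undirected, and index information is assumed to be a list of row identifiers in the relational database.

```python
from dataclasses import dataclass, field


def edge_key(node_a: str, node_b: str) -> frozenset:
    """Order-independent key, so a pair of nodes maps to at most one edge."""
    return frozenset((node_a, node_b))


@dataclass
class Edge:
    # Attribute hierarchy name -> index information. Here the index is a
    # list of row ids in the relational database (an assumption).
    hierarchies: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # edge_key -> Edge

    def get_or_create_edge(self, node_a: str, node_b: str) -> Edge:
        """Return the single edge between two nodes, creating it if absent."""
        self.nodes.update((node_a, node_b))
        return self.edges.setdefault(edge_key(node_a, node_b), Edge())


graph = Graph()
edge = graph.get_or_create_edge("a", "b")
edge.hierarchies.setdefault("fund_transfer", []).extend([1, 2])

# Asking for the same pair in the other order returns the same edge.
same_edge = graph.get_or_create_edge("b", "a")
```

Keying the edge on an unordered pair is one simple way to guarantee that no second edge can be created between the same two nodes.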
Specifically, the correspondence between the object and the node in the graph database may be stored in advance, for example, the correspondence between the object identifier and the node identifier may be stored. According to the corresponding relation between the prestored objects and the nodes, the corresponding first node of the first object in the graph database and the corresponding second node of the second object in the graph database can be determined.
Referring to S230, at most one edge connects any two nodes in the graph database, and the edge indicates that a relationship exists between the objects corresponding to the two nodes it connects. Therefore, if a relationship exists between the first object and the second object, exactly one edge exists between the first node and the second node, and once the first node and the second node are determined, the first edge between them can be determined. Different attribute hierarchies are set on the edge, and each can store the index information of a different type of original relational data. To make it easy for the first user to select and acquire the original relational data of the required type, the first attribute hierarchy corresponding to the first edge can be displayed. The first attribute hierarchy may include one or more attribute hierarchies.
Referring to S240, the target attribute hierarchy may include one or more attribute hierarchies. The second input may be an input of the first user clicking the selected target attribute level, or an input of the first user entering the target attribute level, or other inputs of the first user aiming at the target attribute level, which is not limited herein.
Specifically, the first user clicks on a target attribute level from the displayed first attribute level, and the data processing system may determine the first index information stored in the target attribute level in response to the click input, that is, the second input.
Referring to S250, the second original relational data may be the original relational data of a target type between the first object and the second object, where the target type corresponds to the target attribute hierarchy. The original relational data of the target type can reflect the target relationship between the first object and the second object. The first index information may be the storage location of the second original relational data in the relational database, and may also be other index information of the second original relational data in the relational database, which is not limited herein. Thus, the second original relational data can be acquired from the relational database based on the first index information.
Here, through the ODBC protocol, the graph database can be compatible with the import and export of the relational database. The user does not need to convert the original relational data in the relational database into a specific format such as character-separated values (CSV) and then import it into the graph database. Linked operation between the graph topology and the original relational data can thus be achieved, which reduces the difficulty of graph construction for the user and still allows analysis based on traditional analysis algorithms. Moreover, through the ODBC protocol, the user can display, in a graph visualization environment, the association relationships among different objects together with the specific original relational data under each relationship.
Referring to S260, after the second original relationship data is obtained, it may be displayed for viewing and use by the first user.
In some examples, if the first user wants to analyze whether a transaction is abnormal, the first user may determine whether the transaction is reasonable based on the relationship between the transaction parties, and therefore needs to obtain data that represents that relationship. To obtain such data, the first user may enter, in the data processing system, the transaction number "0001" of the original transaction data of the transaction, that is, the first original relational data. In response to this input, that is, the first input, the data processing system may determine the transaction parties of the original transaction data: enterprise A and enterprise B, that is, the first object and the second object. Then, node a corresponding to enterprise A in the graph database, that is, the first node, and node b corresponding to enterprise B in the graph database, that is, the second node, can be determined. After node a and node b are determined, edge z between node a and node b, that is, the first edge, can be determined, and attribute hierarchy m, attribute hierarchy n, and attribute hierarchy v corresponding to edge z, that is, the first attribute hierarchy, can be determined and displayed. Attribute hierarchy m may store storage position r, in the relational database, of the original relational data of the fund transfer class between enterprise A and enterprise B. Attribute hierarchy n may store storage position s, in the relational database, of the original relational data of the interpersonal relationship class between enterprise A and enterprise B. Attribute hierarchy v may store storage position t, in the relational database, of the original relational data of the tax bill record class between enterprise A and enterprise B.
The original relational data of the fund transfer class between enterprise A and enterprise B can represent the fund transfer relationship between them, the original relational data of the interpersonal relationship class can represent the interpersonal relationship between them, and the original relational data of the tax bill record class can represent the tax relationship between them. If the first user wants to analyze, according to the fund transfer relationship and the interpersonal relationship, whether the transaction between enterprise A and enterprise B is reasonable, the first user can click to select attribute hierarchy m and attribute hierarchy n, that is, the target attribute hierarchy. In response to the click input, that is, the second input, the data processing system can determine storage position r stored in attribute hierarchy m and storage position s stored in attribute hierarchy n, that is, the first index information. Then, according to storage position r and storage position s, the data processing system can obtain from the relational database the original relational data of the fund transfer class and of the interpersonal relationship class between enterprise A and enterprise B, that is, the second original relational data, and display it for the first user to view, so that the first user can analyze, according to the displayed data, whether the transaction between enterprise A and enterprise B is reasonable.
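The flow of S210 to S250 in this example can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in `sqlite3` module stands in for the relational database reached through ODBC in the text, and the table `relation_data`, its columns, its rows, and the helper names are all hypothetical. Storage positions are modelled as row ids.

```python
import sqlite3

# Stand-in relational database with a hypothetical schema and rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relation_data "
    "(id INTEGER PRIMARY KEY, kind TEXT, party_a TEXT, party_b TEXT, detail TEXT)"
)
conn.executemany(
    "INSERT INTO relation_data VALUES (?, ?, ?, ?, ?)",
    [
        (1, "fund_transfer", "A", "B", "transaction 0001"),
        (2, "interpersonal", "A", "B", "shared director"),
        (3, "tax_record", "A", "B", "invoice 77"),
    ],
)

# Graph side: one edge per node pair; each attribute hierarchy holds row ids only.
node_of = {"A": "a", "B": "b"}
edges = {frozenset(("a", "b")): {"m": [1], "n": [2], "v": [3]}}


def parties_of(record_id):
    """S210: find the two objects involved in a record."""
    return conn.execute(
        "SELECT party_a, party_b FROM relation_data WHERE id = ?", (record_id,)
    ).fetchone()


def hierarchies_between(obj_1, obj_2):
    """S220-S230: map objects to nodes and return the hierarchies on their edge."""
    return edges[frozenset((node_of[obj_1], node_of[obj_2]))]


def fetch(hierarchies, selected):
    """S240-S250: read index info from the selected hierarchies, then fetch rows."""
    ids = [i for name in selected for i in hierarchies[name]]
    marks = ",".join("?" * len(ids))
    query = (
        "SELECT id, kind, detail FROM relation_data "
        f"WHERE id IN ({marks}) ORDER BY id"
    )
    return conn.execute(query, ids).fetchall()


obj_1, obj_2 = parties_of(1)
available = hierarchies_between(obj_1, obj_2)   # hierarchies m, n and v
rows = fetch(available, ["m", "n"])             # user selects m and n
```

Only the rows referenced by the selected attribute hierarchies are read; the tax record indexed by hierarchy v is never touched.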
Based on the above, when a comprehensive analysis of the same event is required from different business dimensions, the user can use the attribute hierarchies corresponding to an edge to select the relationships that should participate in the analysis and the original relational data that reflects those relationships. The mechanism is as follows: using the attribute hierarchies recorded in the graph database for the selected edge, and the index information recorded in those attribute hierarchies for all original relational data that gave rise to the relationship, the original relational data is quickly gathered from the relational database through the Open Database Connectivity (ODBC) interface protocol, so that the data set that needs to participate in the analysis is located within massive data. Of course, the edge representing the relationship of a transaction can also be found from the transaction itself, so that other relationships between the two objects can be found, and the related original relational data can be retrieved in the same way as described above.
In some embodiments, the volume of original relational data may be very large, so retrieving part of the data from the massive data may take a long time. To address this, before S210, the method may further include:
Responding to a third input of the first user to at least one piece of original relational data, and acquiring the at least one piece of original relational data;
for each piece of original relational data in at least one piece of original relational data, respectively executing the following steps to obtain a graph database:
determining a second attribute hierarchy corresponding to the original relational data, second index information of the original relational data and third and fourth objects related to the original relational data according to the original relational data;
establishing a second edge between a third node corresponding to the third object and a fourth node corresponding to the fourth object;
creating a second attribute hierarchy which is the attribute hierarchy corresponding to the second edge;
the second index information is stored in a second attribute hierarchy.
Here, the range of the original relational data to be acquired may be determined first, and a graph database may be generated for the original relational data within that range. For example, a graph database may be generated for the original relational data of January 2022. Of course, the first user may also select any original relational data as required to generate the graph database, which is not limited herein. The third input may be an input that determines the at least one piece of original relational data used to generate the graph database. The second attribute hierarchy may be used to store the second index information of the original relational data in the relational database.
Specifically, the first user selects at least one piece of original relational data, and the data processing system can, in response to the selection input, that is, the third input, obtain the at least one piece of original relational data. For each piece of original relational data, the data processing system determines, according to a preset correspondence between types of original relational data and attribute hierarchies, the second attribute hierarchy corresponding to the original relational data, and also determines the second index information of the original relational data and the third object and fourth object related to the original relational data. Then, according to the pre-stored correspondence between objects and nodes, the third node corresponding to the third object and the fourth node corresponding to the fourth object can be determined. Because the original relational data relates to the third object and the fourth object, a relationship exists between them, and an edge between the third node and the fourth node, that is, the second edge, can be established. An edge may be a storage space that can be divided into a plurality of sub-storage spaces, each of which may be an attribute hierarchy for storing index information. Because the attribute hierarchy corresponding to the original relational data is the second attribute hierarchy, the second attribute hierarchy can be created as an attribute hierarchy corresponding to the second edge, and the second index information of the original relational data is then stored in the second attribute hierarchy.
It should be noted that after an edge is established between the third node and the fourth node, even if other original relational data still relate to the third object and the fourth object, the edge between the third node and the fourth node is not established any more, that is, only one edge is established at most between the two nodes.
After executing the method on each piece of original relation data in at least one piece of original relation data, a graph database corresponding to the at least one piece of original relation data can be obtained.
In some examples, the first user may want to check whether abnormal data exists in the original relational data of January 2022. The first user may enter "January 2022" into the data processing system, and the data processing system may, in response to this input, that is, the third input, obtain all the original relational data of January 2022 and process each piece separately. The processing of each piece of original relational data is described below using original relational data L as an example. According to the type of original relational data L and the correspondence between types and attribute hierarchies, attribute hierarchy w corresponding to original relational data L, that is, the second attribute hierarchy, is determined. Storage position u of original relational data L in the relational database, that is, the second index information, is also determined, together with enterprise C and enterprise D related to original relational data L, that is, the third object and the fourth object. Then, according to the pre-stored correspondence between objects and nodes, node c corresponding to enterprise C, that is, the third node, and node d corresponding to enterprise D, that is, the fourth node, can be determined. Because original relational data L relates to enterprise C and enterprise D, a relationship exists between enterprise C and enterprise D, and edge y between node c and node d, that is, the second edge, can be established. Then, because the attribute hierarchy corresponding to original relational data L is attribute hierarchy w, attribute hierarchy w can be created as an attribute hierarchy corresponding to edge y, and storage position u of original relational data L in the relational database can be stored in attribute hierarchy w.
Thus, the processing of the original relational data L is completed, and the graph database can be obtained by performing the above processing on each piece of the obtained original relational data of 1 month in 2022.
Of course, it should also be noted that at most one edge is established between any two nodes. For example, suppose the original relational data of January 2022 also contains original relational data H, and the enterprises related to original relational data H are enterprise C and enterprise D. In this case, no new edge between node c and node d is established. Instead, storage position q of original relational data H in the relational database is stored in an attribute hierarchy corresponding to edge y. Specifically, if the attribute hierarchy corresponding to original relational data H is also attribute hierarchy w, no new attribute hierarchy needs to be created, and storage position q is stored directly in the existing attribute hierarchy w. If the attribute hierarchy corresponding to original relational data H is not an existing attribute hierarchy but attribute hierarchy p, a new attribute hierarchy p corresponding to edge y is created, and storage position q is then stored in the newly created attribute hierarchy p.
In this way, with the graph database generated through the above process, the original relational data remains stored in the relational database, and only its index information in the relational database is stored in the graph database. At most one edge therefore needs to connect any two nodes in the graph database, and the edge indicates that a relationship exists between the objects corresponding to the two nodes it connects. Different attribute hierarchies are set on the edge, and each stores the index information of a different type of original relational data in the relational database. When part of the data between two nodes is needed, the corresponding attribute hierarchy can be selected directly, without traversing the data stored on every edge between the two nodes, so the required data can be acquired quickly from massive data and computing resources are saved.
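The per-record processing and the "at most one edge" rule can be sketched as follows. This is a minimal illustration under stated assumptions: the mapping `HIERARCHY_FOR_TYPE`, the record tuples, the type names, and the function `add_record` are hypothetical, and storage positions are represented by the letters used in the example.

```python
# Hypothetical preset correspondence between record type and attribute hierarchy.
HIERARCHY_FOR_TYPE = {"fund_transfer": "w", "tax_record": "p"}

# Hypothetical records: (storage position, type, third object, fourth object).
records = [
    ("u", "fund_transfer", "C", "D"),  # original relational data L
    ("q", "tax_record", "D", "C"),     # original relational data H, same enterprises
    ("k", "fund_transfer", "C", "D"),  # another record of the same type as L
]

# Pre-stored correspondence between objects and nodes.
node_of = {"C": "c", "D": "d"}

# Edge key (unordered node pair) -> {attribute hierarchy -> storage positions}.
edges = {}


def add_record(position, record_type, obj_3, obj_4):
    """Store one record's index information on the single edge of its node pair."""
    key = frozenset((node_of[obj_3], node_of[obj_4]))
    edge = edges.setdefault(key, {})            # create the edge only if absent
    level = HIERARCHY_FOR_TYPE[record_type]
    edge.setdefault(level, []).append(position)  # create the hierarchy only if absent


for record in records:
    add_record(*record)
```

After the loop, a single edge exists between node c and node d. It carries attribute hierarchy w with positions u and k, and attribute hierarchy p with position q.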
In addition, the storage space of the graph database can be saved by generating the graph database through the process.
Based on the concept of separating the graph from the data, when the original relational data is ontologized, only the objects and business relationships involved in the original relational data need an ontology redefinition, which reduces the difficulty of knowledge reconstruction for traditional original relational data.
In some embodiments, to facilitate rapid generation of the graph database, before the obtaining the at least one piece of original relational data in response to the third input of the at least one piece of original relational data by the first user, the method may further include:
acquiring an object list;
and according to the object list, establishing a node corresponding to each object in the graph database.
Here, the object list may be a list storing object data. In order to facilitate rapid generation of the graph database, a node corresponding to each object in the object list may be first established in the graph database. An object identifier may also be set for each object, and a node identifier may be set for each node. In addition, the correspondence between the object and the node may be stored, or the correspondence between the object identifier and the node identifier may be stored. Specifically, the nodes can be established in real time or periodically, or newly added nodes can be established before the graph database is generated each time.
In some examples, the object list may include enterprise A, enterprise B, enterprise C, enterprise D, enterprise E, and enterprise F. According to the object list, node a corresponding to enterprise A, node b corresponding to enterprise B, node c corresponding to enterprise C, node d corresponding to enterprise D, node e corresponding to enterprise E, and node f corresponding to enterprise F can be established in the graph database.
Thus, through the process, the corresponding nodes of each object can be pre-established before the graph database is generated, so that the graph database can be generated quickly.
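Pre-establishing nodes from an object list can be sketched as follows. This is only an illustration: the function `build_nodes` and the `node-N` identifier format are assumptions, since the text does not specify how node identifiers are formed.

```python
from itertools import count


def build_nodes(object_list):
    """Create one node per distinct object and record the object-to-node mapping."""
    node_numbers = count(1)
    object_to_node = {}
    for obj in object_list:
        if obj not in object_to_node:  # an object listed twice gets one node
            object_to_node[obj] = f"node-{next(node_numbers)}"
    return object_to_node


object_to_node = build_nodes(
    ["Enterprise A", "Enterprise B", "Enterprise A", "Enterprise C"]
)
```

The returned mapping plays the role of the stored correspondence between object identifiers and node identifiers that later steps look up.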
Based on the above method, global nodes are established at the graph database layer and all nodes are given global identifiers. It is agreed that only one edge, representing the graph-topology connectivity relationship, exists between two nodes. All business attributes are set as attribute hierarchies, and the index information of all original relational data that gives rise to the relationships is stored in the attribute hierarchies. This minimizes the number of edges between two nodes, which in turn minimizes the storage space of the graph and the amount of computation required for connectivity analysis of the graph.
In some embodiments, the user wants to analyze whether the relationship between two objects is abnormal, and an indirect relationship may exist between the two objects. For example, if a fund transfer relationship exists between object A and object B, and a fund transfer relationship exists between object B and object C, an indirect fund transfer relationship may exist between object A and object C. If, according to the relevant rules, such an indirect fund transfer relationship should not exist between object A and object C, the relationship between object A and object B may be considered abnormal. To determine whether an abnormal relationship exists between two objects, after the second index information is stored in the second attribute hierarchy, the method may further include:
Determining, in response to a fourth input of the first user to the fifth object, the sixth object, and the first condition, all first paths that communicate in the graph database a fifth node corresponding to the fifth object and a sixth node corresponding to the sixth object;
for each first path in all the first paths, the following steps are respectively executed:
judging whether a third side included in the first path meets a first condition;
the first path is displayed in a case where a third side included in the first path satisfies a first condition.
Here, the first condition may be defined by placing a limiting condition on the attribute hierarchies of the edges included in the first path. For example, the first condition may be that both attribute hierarchy m and attribute hierarchy n are set on every edge included in the first path, or that at least one of attribute hierarchy m and attribute hierarchy n is set on every edge included in the first path. Of course, the first condition may also be any other limitation on the attribute hierarchies of the edges included in the first path according to actual requirements, which is not limited herein. The first condition may be a condition satisfied by the edges included in an abnormal path that the first user wants to find. The fourth input may be an input that specifies the fifth object, the sixth object, and the first condition, for example, an entry input or a click input. All the first paths may be all paths in the graph database that connect the fifth node and the sixth node. Each first path may include the fifth node and the sixth node as well as all nodes and edges between them on the path. The third edge may be an edge included in the first path, and the third edge may include one or more edges.
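The two example forms of the first condition can be sketched as predicates over the set of attribute hierarchies on an edge. This is a minimal illustration, and the helper names `requires_all`, `requires_any`, and `path_satisfies` are hypothetical.

```python
def requires_all(*levels):
    """Condition: every listed attribute hierarchy is set on the edge."""
    required = set(levels)
    return lambda edge_levels: required <= set(edge_levels)


def requires_any(*levels):
    """Condition: at least one listed attribute hierarchy is set on the edge."""
    wanted = set(levels)
    return lambda edge_levels: bool(wanted & set(edge_levels))


def path_satisfies(edge_level_sets, condition):
    """A path satisfies the condition only if each of its edges does."""
    return all(condition(levels) for levels in edge_level_sets)


# Attribute hierarchies on the three edges of a hypothetical path.
path_edges = [{"m", "n"}, {"m"}, {"m", "n", "v"}]

both = path_satisfies(path_edges, requires_all("m", "n"))    # second edge lacks n
either = path_satisfies(path_edges, requires_any("m", "n"))
```

Because the check only inspects which attribute hierarchies exist on each edge, no original relational data has to be read from the relational database to evaluate the condition.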
Specifically, the first user may select a fifth object and a sixth object in the data processing system and set a first condition. The data processing system may determine, according to the pre-stored correspondence between objects and nodes, the fifth node corresponding to the fifth object in the graph database and the sixth node corresponding to the sixth object in the graph database, then determine all first paths that connect the fifth node and the sixth node, and then judge whether the edges included in each first path, that is, the third edges, satisfy the first condition. If so, the first path is abnormal and is displayed; if not, the first path is not displayed. Here, an abnormal first path may be displayed in a specific color on top of a display of the entire graph database, or only the abnormal first paths may be displayed while other paths are not displayed. When the number of abnormal first paths exceeds a preset number, that is, when there are too many of them, a preset number of abnormal first paths can be displayed first to avoid cluttering the display, and the remaining abnormal first paths can then be displayed, a preset number at a time, according to the input of the first user.
In some examples, a path between enterprise E and enterprise F is considered abnormal if attribute hierarchy m is set on every edge included in the path. Based on this, if the first user wants to analyze whether an abnormal relationship, that is, an abnormal path, exists between enterprise E and enterprise F, the first user may select enterprise E and enterprise F, that is, the fifth object and the sixth object, in the data processing system, and set the first condition to: attribute hierarchy m is set on every edge. The data processing system can determine, according to the correspondence between objects and nodes, node e corresponding to enterprise E, that is, the fifth node, and node f corresponding to enterprise F, that is, the sixth node, and can then determine, according to the connectivity of the graph database, all first paths between node e and node f: path g and path j. It is then judged, for path g and path j respectively, whether attribute hierarchy m is set on every edge. For example, path g is "node e, edge x, node c, edge h, node a, edge o, node f", so the edges it includes are edge x, edge h, and edge o, that is, the third edges. Attribute hierarchy m is set on each of edge x, edge h, and edge o, so the third edges included in path g satisfy the first condition, path g is abnormal, and path g is displayed. Path j is judged in the same way as path g, but the third edges included in path j do not satisfy the first condition, so path j is not displayed.
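Displaying a preset number of abnormal paths at a time can be sketched as simple batching. This is only an illustration: the function name `batches`, the path labels, and the preset number 2 are assumptions.

```python
def batches(paths, preset_number):
    """Yield successive groups of at most preset_number paths."""
    for start in range(0, len(paths), preset_number):
        yield paths[start:start + preset_number]


abnormal_paths = ["g", "j", "k", "r", "s"]   # hypothetical path labels
pages = list(batches(abnormal_paths, 2))
```

In an interactive system each group would be shown only after the user asks for the next one; collecting all groups into a list here is just for demonstration.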
Therefore, the path between the two nodes can be determined according to the communication relation of the graph database through the process, and the abnormal path is screened out through setting the first condition, so that the abnormal relation between the two objects is rapidly determined.
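The search for abnormal first paths can be sketched as follows, using the edges of path g from the example. This is a minimal illustration under stated assumptions: "all first paths" is interpreted as all simple paths (no repeated node), the edges of path j are invented so that one of them lacks attribute hierarchy m, and the function names are hypothetical.

```python
# Hypothetical graph: each edge maps to the attribute hierarchies set on it.
edges = {
    frozenset(("e", "c")): {"m", "v"},  # edge x
    frozenset(("c", "a")): {"m", "n"},  # edge h
    frozenset(("a", "f")): {"m", "p"},  # edge o
    frozenset(("e", "b")): {"m"},       # path j, first edge
    frozenset(("b", "f")): {"n"},       # path j, second edge (no hierarchy m)
}


def neighbours(node):
    """Yield the nodes connected to `node` by an edge."""
    for key in edges:
        if node in key:
            (other,) = key - {node}
            yield other


def simple_paths(start, goal, path=None):
    """Depth-first enumeration of all simple paths from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in neighbours(start):
        if nxt not in path:
            yield from simple_paths(nxt, goal, path)


def has_m(levels):
    """First condition from the example: attribute hierarchy m is set."""
    return "m" in levels


def satisfies(path, condition):
    """True if every edge along the path meets the condition."""
    return all(condition(edges[frozenset(pair)]) for pair in zip(path, path[1:]))


all_paths = list(simple_paths("e", "f"))
abnormal = [p for p in all_paths if satisfies(p, has_m)]
```

Here two paths connect node e and node f, and only the one corresponding to path g passes the condition. Exhaustive path enumeration can be expensive on large graphs, so a real system would likely bound the path length.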
In some embodiments, if it is determined that there is an abnormality in the relationship between the two objects, the abnormality cause needs to be analyzed by looking up corresponding original relationship data, and in order to facilitate the user to look up the corresponding original relationship data, after determining whether the third edge included in the first path meets the first condition, the method may further include:
displaying a third attribute hierarchy corresponding to a third side included in the first path under the condition that the third side included in the first path meets the first condition;
determining third index information stored in a fourth attribute hierarchy in response to a fifth input by the first user to the fourth attribute hierarchy;
and acquiring third original relational data from the relational database according to the third index information.
Here, the first path may include one or more third edges, each of which may correspond to one or more third attribute hierarchies. The fourth attribute hierarchy may be an attribute hierarchy of a fourth edge among the third edges, and the fifth input may be an input that clicks or enters the fourth attribute hierarchy. The third index information may be the storage location of the third original relational data in the relational database.
Specifically, if the third edges included in the first path satisfy the first condition, the first path is an abnormal path. In this case, in addition to the nodes and third edges included in the first path, the third attribute hierarchies corresponding to each third edge may be displayed, so that the first user can select, based on experience, a fourth attribute hierarchy from among the third attribute hierarchies. The data processing system may then obtain from the relational database, according to the third index information stored in the fourth attribute hierarchy selected by the first user, the third original relational data that the first user wants to view. The third original relational data may then be displayed for the first user to view and analyze the cause of the abnormality.
In some examples, it is determined that edge x, edge h, and edge o included in path g all satisfy the first condition. Therefore, attribute hierarchy m and attribute hierarchy v corresponding to edge x, attribute hierarchy m and attribute hierarchy n corresponding to edge h, and attribute hierarchy m and attribute hierarchy p corresponding to edge o may be displayed respectively. Based on experience, the first user suspects that original relational data T corresponding to attribute hierarchy n of edge h, that is, the third original relational data, may be abnormal, and can therefore click attribute hierarchy n, that is, the fourth attribute hierarchy. In response to the click input, that is, the fifth input, the data processing system determines storage position i stored in the fourth attribute hierarchy, that is, the third index information, then obtains original relational data T corresponding to attribute hierarchy n from the relational database according to storage position i, and displays original relational data T so that the first user can view it and analyze the cause of the abnormality.
Therefore, when the abnormal path is found, the user can conveniently and quickly index the original relation type data to be checked, and the abnormal reason is quickly analyzed.
In some embodiments, it is required to analyze whether there is an abnormality in all relationships related to a certain object, and in order to determine whether there is an abnormality in all relationships related to a certain object, after storing the second index information in the second attribute hierarchy, the method may further include:
responsive to a sixth input by the first user of the seventh object and the second condition, determining a seventh node for the seventh object in the graph database;
determining all second paths through the seventh node;
for each second path in all the second paths, the following steps are respectively executed:
judging whether a fifth edge included in the second path meets a second condition;
and displaying the second path in the case that the fifth side included in the second path meets the second condition.
Here, the second condition may be determined by setting a definition condition on an attribute hierarchy of the edge included in the second path. For example, the second condition may be that the attribute hierarchy m and the attribute hierarchy n are set for each side included in the second path, or the second condition may be that at least one of the attribute hierarchy m and the attribute hierarchy n is set for each side included in the second path. Of course, the first condition may also be other limitation of the attribute hierarchy of the edge included in the second path according to the actual requirement, which is not limited herein. The second condition may be a condition satisfied by an edge included in the abnormal path to be searched by the first user. The sixth input may be an input inputting the seventh object and the second condition, for example, may be an enter or click input. The total second path may be all paths in the graph database that pass through the seventh node. Each second path may include all nodes and edges in the path. The fifth edge may be an edge included in the second path, and the fifth edge may include one or more edges.
Specifically, the first user may select a seventh object in the data processing system and set a second condition. The data processing system may determine, according to a pre-stored correspondence between objects and nodes, the seventh node corresponding to the seventh object in the graph database, and then determine all second paths passing through the seventh node. For each second path, the system judges whether the edges included in that path, that is, the fifth edges, satisfy the second condition. If so, the second path is displayed, which indicates that the second path is abnormal; if not, the second path is not displayed. Here, an abnormal second path may be highlighted in a specific color while the entire graph database is displayed, or only the abnormal second paths may be displayed without the other paths. When the number of abnormal second paths exceeds a preset number, the display could become cluttered. To avoid this, the preset number of abnormal second paths may be displayed first, and the remaining abnormal second paths may then be displayed in batches of the preset number in response to further input by the first user.
In some examples, a path through enterprise E is considered abnormal if every edge included in the path has attribute hierarchy m. On this basis, if the first user wants to analyze whether there is an abnormal relationship between enterprise E and other enterprises, that is, whether there is an abnormal path, the first user may select enterprise E (the seventh object) in the data processing system and set the second condition as: every edge has attribute hierarchy m. The data processing system may determine node e (the seventh node) corresponding to enterprise E according to the correspondence between objects and nodes, and may then determine, according to the connectivity of the graph database, all second paths passing through node e: path g, path j, and path k. The system then judges, for each of path g, path j, and path k, whether every edge has attribute hierarchy m. For example, path g is "node e, edge x, node c, edge h, node a, edge o, node f", so its fifth edges are edge x, edge h, and edge o. Edge x, edge h, and edge o each have attribute hierarchy m, so the fifth edges of path g satisfy the second condition; path g is abnormal and is therefore displayed. Path j and path k are judged in the same way as path g, but their fifth edges do not satisfy the second condition, so path j and path k are not displayed.
In some examples, path g is "node e, edge x, node c, edge h, node a, edge o, node f" and path k is "node e, edge x, node c, edge h, node a, edge o, node f, edge l, node b". Edge x, edge h, and edge o each have attribute hierarchy m, so path g satisfies the second condition and may be displayed. Edge l does not have attribute hierarchy m, so path k does not satisfy the second condition and is not displayed.
In this way, the paths passing through the seventh node can be determined according to the connectivity of the graph database, and the abnormal paths can be screened out by setting the second condition, so that abnormal relationships between the seventh object and other objects can be determined quickly.
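The screening described above can be sketched in a few lines of Python. This is a minimal illustration only, not the patented implementation: the dictionary `EDGE_LEVELS`, the helper names, and the choice to enumerate simple paths that start at the selected node are all assumptions made for the example. The toy graph reuses nodes e, c, a, f, b and edges x, h, o, l from the example above.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph: each undirected edge carries a set of attribute hierarchies.
EDGE_LEVELS: Dict[Tuple[str, str], Set[str]] = {
    ("e", "c"): {"m"},        # edge x
    ("c", "a"): {"m", "n"},   # edge h
    ("a", "f"): {"m"},        # edge o
    ("f", "b"): {"n"},        # edge l (no attribute hierarchy m)
}


def levels_on(u: str, v: str) -> Set[str]:
    """Return the attribute hierarchies on the single edge between u and v."""
    return EDGE_LEVELS.get((u, v)) or EDGE_LEVELS.get((v, u)) or set()


def adjacency() -> Dict[str, List[str]]:
    """Build an undirected adjacency list from the edge table."""
    adj: Dict[str, List[str]] = {}
    for u, v in EDGE_LEVELS:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def paths_from(start: str) -> List[List[str]]:
    """Enumerate simple paths (no repeated node) that begin at start."""
    adj = adjacency()
    found: List[List[str]] = []

    def walk(path: List[str]) -> None:
        for nxt in adj.get(path[-1], []):
            if nxt not in path:
                found.append(path + [nxt])
                walk(path + [nxt])

    walk([start])
    return found


def satisfies(path: List[str], required: str) -> bool:
    """Assumed second condition: every edge on the path has `required`."""
    return all(required in levels_on(u, v) for u, v in zip(path, path[1:]))


def abnormal_paths(start: str, required: str) -> List[List[str]]:
    """Keep only the paths whose edges all satisfy the condition."""
    return [p for p in paths_from(start) if satisfies(p, required)]
```

Calling `abnormal_paths("e", "m")` keeps the path e, c, a, f and drops the longer path that continues over edge l to node b, because edge l lacks attribute hierarchy m. In this sketch the shorter prefixes of a matching path are also returned; whether a real system would report them is a design choice the text leaves open.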
In some embodiments, if it is determined that the relationship between an object and other objects is abnormal, the cause of the abnormality needs to be analyzed by examining the corresponding original relational data. To make it easier for the user to view that data, after judging whether the fifth edge included in the second path meets the second condition, the method may further include:
displaying a fifth attribute hierarchy corresponding to the fifth edge included in the second path in the case that the fifth edge included in the second path meets the second condition;
determining fourth index information stored in a sixth attribute hierarchy in response to a sixth input by the first user to the sixth attribute hierarchy in the fifth attribute hierarchy;
and acquiring fourth original relational data from the relational database according to the fourth index information.
Here, the specific process of acquiring the fourth original relational data is the same as that of acquiring the third original relational data, and is not repeated here.
Based on the above, when service connectivity needs to be analyzed, the connectivity of the service can first be found through connectivity analysis of the edges. All edges related to the service connectivity are then selected, and through these edges the relevant attribute hierarchies are found, together with the storage locations, recorded on those attribute hierarchies, of all the original service data from which the relationships were generated. The original service data can then be collected quickly from the relational database through the ODBC interface protocol, so that the data set that needs to take part in the analysis is located within massive data.
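The index-based collection step can be illustrated with a short Python sketch. It is only an assumption-laden stand-in: Python's built-in `sqlite3` module replaces the ODBC connection mentioned above so that the example is self-contained, and the table name `transfers`, its columns, and the idea that index information consists of a table name plus primary-key values are hypothetical.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Hypothetical whitelist so that a table name taken from index information
# is never interpolated into SQL unchecked.
KNOWN_TABLES = {"transfers"}


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory relational database holding original relational data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transfers ("
        "id INTEGER PRIMARY KEY, payer TEXT, payee TEXT, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO transfers VALUES (?, ?, ?, ?)",
        [(1, "E", "C", 100), (2, "C", "A", 250), (3, "C", "A", 75), (4, "A", "F", 40)],
    )
    return conn


def fetch_original_rows(
    conn: sqlite3.Connection, table: str, row_ids: Sequence[int]
) -> List[Tuple]:
    """Fetch only the rows named by the index information, without scanning
    or joining the rest of the table."""
    if table not in KNOWN_TABLES:
        raise ValueError(f"unknown table: {table}")
    if not row_ids:
        return []
    placeholders = ", ".join("?" for _ in row_ids)
    sql = (
        f"SELECT id, payer, payee, amount FROM {table} "
        f"WHERE id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, list(row_ids)).fetchall()
```

With the demo database, `fetch_original_rows(make_demo_db(), "transfers", [2, 3])` returns just the two rows between C and A. The point of the sketch is that the graph side hands over row identifiers, so the relational side can use a primary-key lookup instead of a traversal.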
According to the embodiments of the application, locking onto the analysis target through relationships and building indexes on the graph helps analysts pin down precisely the range of data that needs attention. This avoids the heavy computational overhead of data traversal or large-table joins in a traditional relational database and effectively improves query efficiency. In addition, when service connectivity is analyzed comprehensively from different service dimensions based on the same event, query operations can be more targeted, which further improves analysis and query efficiency.
Based on the same inventive concept, the embodiment of the application also provides a data processing device. The following describes in detail a data processing apparatus according to an embodiment of the present application with reference to fig. 3.
Fig. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
As shown in fig. 3, the data processing apparatus may include:
a first determining module 301, configured to determine, in response to a first input of first original relational data by a first user, a first object and a second object related to the first original relational data;
a second determining module 302, configured to determine a first node corresponding to the first object in the graph database and a second node corresponding to the second object in the graph database, where the nodes in the graph database correspond to the objects, no more than one edge is connected between every two nodes, the edge is used to indicate that a relationship exists between the objects corresponding to the two connected nodes, different attribute hierarchies are set on the edge, and the different attribute hierarchies are used to store index information of different types of original relational data;
a first display module 303, configured to display a first attribute hierarchy corresponding to a first edge between a first node and a second node in the graph database;
a third determining module 304, configured to determine, in response to a second input of the first user to a target attribute hierarchy in the first attribute hierarchy, first index information stored in the target attribute hierarchy;
a first obtaining module 305, configured to obtain, from the relational database, second original relational data of the target attribute hierarchy between the first object and the second object according to the first index information;
a second display module 306, configured to display second original relational data.
In the embodiment of the present application, a first object and a second object related to first original relational data are determined in response to a first input by a first user on the first original relational data; a first node corresponding to the first object and a second node corresponding to the second object are determined in the graph database; a first attribute hierarchy corresponding to a first edge between the first node and the second node is displayed; first index information stored in a target attribute hierarchy is determined in response to a second input by the first user on the target attribute hierarchy in the first attribute hierarchy; second original relational data of the target attribute hierarchy between the first object and the second object is acquired from the relational database according to the first index information; and the second original relational data is displayed. The original relational data therefore does not need to be stored in the graph database: it is stored in the relational database, and the graph database stores only the index information of the original relational data in the relational database. As a result, the graph database can be arranged so that no more than one edge connects any two nodes, the edge indicates that a relationship exists between the objects corresponding to the two connected nodes, and different attribute hierarchies on the edge store the index information of different types of original relational data in the relational database. When only part of the data between two nodes is needed, the corresponding attribute hierarchy can be selected directly, without traversing the data stored on every edge between the two nodes, which improves the efficiency of acquiring the required data from massive data and saves computing resources.
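The lookup flow summarized above can be pictured with a small in-memory structure. The following Python sketch is illustrative only: the type alias `EdgeStore`, the helper names, the hierarchy names "loan" and "transfer", and the use of row-identifier lists as index information are assumptions, not details taken from the application.

```python
from typing import Dict, FrozenSet, List

# One entry per unordered node pair, so at most one edge exists between two nodes.
# Each edge maps an attribute hierarchy name to index information (here: row ids).
EdgeStore = Dict[FrozenSet[str], Dict[str, List[int]]]


def edge_key(a: str, b: str) -> FrozenSet[str]:
    """Key an edge by its two endpoints, ignoring direction."""
    return frozenset((a, b))


def list_hierarchies(edges: EdgeStore, a: str, b: str) -> List[str]:
    """Attribute hierarchies to display for the edge between a and b."""
    return sorted(edges.get(edge_key(a, b), {}))


def index_for(edges: EdgeStore, a: str, b: str, hierarchy: str) -> List[int]:
    """Index information stored in the selected target attribute hierarchy."""
    return edges[edge_key(a, b)][hierarchy]


# Hypothetical contents: two types of relationship between objects A and B.
EDGES: EdgeStore = {
    edge_key("A", "B"): {"loan": [11, 12], "transfer": [7]},
}
```

Here `list_hierarchies(EDGES, "B", "A")` yields the two hierarchies on the single A-B edge, and `index_for(EDGES, "A", "B", "transfer")` yields the row identifiers to look up in the relational database. Only the selected hierarchy is read; the index information of other relationship types is never touched.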
In some embodiments, the volume of original relational data may be very large, so retrieving part of the data from massive data when a user needs it may take a long time. To acquire the required part of the data quickly, the apparatus may further include:
a second acquisition module, configured to acquire at least one piece of original relational data in response to a third input by the first user on the at least one piece of original relational data, before the first object and the second object related to the target service data are determined in response to the first input of the first user on the target service data;
a fourth determining module, configured to execute, for each piece of original relational data in the at least one piece of original relational data: determining, according to the original relational data, a second attribute hierarchy corresponding to the original relational data, second index information of the original relational data, and a third object and a fourth object related to the original relational data;
a first establishing module, configured to execute, for each piece of original relational data in the at least one piece of original relational data: establishing a second edge between a third node corresponding to the third object and a fourth node corresponding to the fourth object;
a creation module, configured to execute, for each piece of original relational data in the at least one piece of original relational data: creating the second attribute hierarchy as an attribute hierarchy corresponding to the second edge;
a storage module, configured to execute, for each piece of original relational data in the at least one piece of original relational data: storing the second index information into the second attribute hierarchy, thereby obtaining the graph database.
In some embodiments, to facilitate rapid generation of the graph database, the apparatus may further comprise:
a third obtaining module, configured to obtain an object list before obtaining at least one piece of original relational data in response to a third input of the at least one piece of original relational data by the first user;
and the second establishing module is used for establishing the node corresponding to each object in the graph database according to the object list.
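The construction steps performed by these modules can be sketched as follows. This is a hedged illustration under stated assumptions: the class name `IndexGraph`, the record shape `(row id, relation type, object, object)`, and the use of the relation type as the attribute hierarchy name are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical record shape: (row id, relation type, first object, second object).
Record = Tuple[int, str, str, str]


class IndexGraph:
    """Graph holding only index information; original rows stay in the relational database."""

    def __init__(self, objects: Iterable[str]) -> None:
        # One node per object in the object list.
        self.nodes: Set[str] = set(objects)
        # Unordered node pair -> {attribute hierarchy -> list of row ids}.
        self.edges: Dict[FrozenSet[str], Dict[str, List[int]]] = {}

    def add_record(self, record: Record) -> None:
        row_id, relation_type, obj_a, obj_b = record
        if obj_a not in self.nodes or obj_b not in self.nodes:
            raise KeyError("record refers to an object missing from the object list")
        # Create the edge only if the two nodes are not connected yet.
        hierarchies = self.edges.setdefault(frozenset((obj_a, obj_b)), {})
        # Create the attribute hierarchy on first use, then store the index information.
        hierarchies.setdefault(relation_type, []).append(row_id)


def build_graph(objects: Iterable[str], records: Iterable[Record]) -> IndexGraph:
    """Create the nodes first, then process each piece of original relational data."""
    graph = IndexGraph(objects)
    for record in records:
        graph.add_record(record)
    return graph
```

For example, feeding three records between A and B (two of type "transfer", one of type "loan") and one record between B and C produces two edges in total, with the A-B edge carrying two attribute hierarchies. Keying edges by an unordered pair is what enforces the rule that no more than one edge connects any two nodes.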
In some embodiments, the user wants to analyze whether the relationship between two objects is abnormal, and the two objects may be related indirectly. For example, if there is a funds transfer relationship between object A and object B and a funds transfer relationship between object B and object C, there may be an indirect funds transfer relationship between object A and object C. If, under the relevant regulations, no such indirect funds transfer relationship is permitted between object A and object C, the relationship between object A and object B may be considered abnormal. To determine whether there is an abnormal relationship between two objects, the apparatus may further include:
a fifth determining module, configured to determine, after the second index information is stored in the second attribute hierarchy and in response to a fourth input by the first user specifying a fifth object, a sixth object, and a first condition, all first paths that connect a fifth node corresponding to the fifth object and a sixth node corresponding to the sixth object in the graph database, where the first condition is determined by setting a limiting condition on the attribute hierarchy of the edges included in the first paths;
a first judging module, configured to execute, for each of the first paths: judging whether a third edge included in the first path meets the first condition;
a third display module, configured to execute, for each of the first paths: displaying the first path in the case that the third edge included in the first path meets the first condition.
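The behavior of these three modules can be sketched in Python using the funds transfer example. This is an assumption-based sketch rather than the claimed implementation: the dictionary `LEVELS`, the extra object D, the hierarchy names, and the representation of the first condition as a predicate on an edge's attribute hierarchies are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical graph: undirected edge -> attribute hierarchies set on that edge.
LEVELS: Dict[Tuple[str, str], Set[str]] = {
    ("A", "B"): {"transfer"},
    ("B", "C"): {"transfer", "guarantee"},
    ("A", "D"): {"guarantee"},
    ("D", "C"): {"transfer"},
}


def edge_levels(u: str, v: str) -> Set[str]:
    """Attribute hierarchies on the single edge between u and v."""
    return LEVELS.get((u, v)) or LEVELS.get((v, u)) or set()


def all_simple_paths(source: str, target: str) -> List[List[str]]:
    """Every path from source to target that never revisits a node."""
    adj: Dict[str, List[str]] = {}
    for u, v in LEVELS:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    found: List[List[str]] = []

    def walk(path: List[str]) -> None:
        if path[-1] == target:
            found.append(path)
            return
        for nxt in adj.get(path[-1], []):
            if nxt not in path:
                walk(path + [nxt])

    walk([source])
    return found


def matching_paths(
    source: str, target: str, condition: Callable[[Set[str]], bool]
) -> List[List[str]]:
    """Paths to display: those on which every edge satisfies the condition."""
    return [
        path
        for path in all_simple_paths(source, target)
        if all(condition(edge_levels(u, v)) for u, v in zip(path, path[1:]))
    ]
```

With the condition `lambda levels: "transfer" in levels`, `matching_paths("A", "C", ...)` keeps the path A, B, C and drops A, D, C, because the A-D edge has no "transfer" hierarchy. Passing the condition as a predicate keeps the sketch open to other limiting conditions, such as requiring several hierarchies at once.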
In some embodiments, if it is determined that the relationship between the two objects is abnormal, the cause of the abnormality needs to be analyzed by examining the corresponding original relational data. To make it easier for the user to view that data, the apparatus may further include:
a fourth display module, configured to display, after it is judged whether the third edge included in the first path meets the first condition, a third attribute hierarchy corresponding to the third edge in the case that the third edge meets the first condition;
a sixth determining module, configured to determine, in response to a fifth input by the first user on a fourth attribute hierarchy in the third attribute hierarchy, third index information stored in the fourth attribute hierarchy, where the fourth attribute hierarchy is an attribute hierarchy of the third edge;
and a fourth acquisition module, configured to acquire third original relational data from the relational database according to the third index information.
In some embodiments, it is necessary to analyze whether any relationship involving a given object is abnormal. To make this determination, the apparatus may further include:
a seventh determining module, configured to determine, after storing the second index information in the second attribute hierarchy, a seventh node corresponding to the seventh object in the graph database in response to a sixth input of the seventh object and the second condition by the first user;
an eighth determining module, configured to determine all second paths through the seventh node;
a second judging module, configured to execute, for each of the second paths: judging whether a fifth edge included in the second path meets the second condition, where the second condition is determined by setting a limiting condition on the attribute hierarchy of the edges included in the second path;
a fourth display module, configured to execute, for each of the second paths: displaying the second path in the case that the fifth edge included in the second path meets the second condition.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 4 shows a structural diagram of an exemplary hardware architecture of an electronic device 4 capable of implementing the data processing method and the data processing apparatus according to embodiments of the present application. The electronic device may refer to an electronic device in an embodiment of the present application.
The electronic device 4 may comprise a processor 401 and a memory 402 in which computer program instructions are stored.
In particular, the processor 401 described above may include a Central Processing Unit (CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or may be configured as one or more integrated circuits implementing embodiments of the present application.
Memory 402 may include mass storage for data or instructions. By way of example, and not limitation, memory 402 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 402 may include removable or non-removable (fixed) media, where appropriate. Memory 402 may be internal or external to the electronic device, where appropriate. In a particular embodiment, memory 402 is a non-volatile solid-state memory. In particular embodiments, memory 402 may include read-only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, or electrical, optical, or other physical/tangible memory storage devices. Thus, in general, memory 402 includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions, and when the software is executed (e.g., by one or more processors), it is operable to perform the operations described with reference to a method in accordance with an aspect of the application.
The processor 401 implements any of the data processing methods of the above embodiments by reading and executing computer program instructions stored in the memory 402.
In one example, the electronic device may also include a communication interface 403 and a bus 404. As shown in fig. 4, the processor 401, the memory 402, and the communication interface 403 are connected to each other by a bus 404 and perform communication with each other.
The communication interface 403 is mainly used to implement communication between each module, device, unit and/or apparatus in the embodiment of the present application.
Bus 404 includes hardware, software, or both, that couple components of the electronic device to one another. By way of example, and not limitation, bus 404 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of these. Bus 404 may include one or more buses, where appropriate. Although embodiments of the application have been described and illustrated with respect to a particular bus, the application contemplates any suitable bus or interconnect.
The electronic device may execute the data processing method in the embodiment of the present application, thereby implementing the data processing method and apparatus described in connection with fig. 1 to 3.
In addition, in combination with the data processing method in the above embodiment, the embodiment of the present application may be implemented by providing a computer storage medium. The computer storage medium has stored thereon computer program instructions; which when executed by a processor, implement any of the data processing methods of the above embodiments.
It should be understood that the application is not limited to the particular configurations and processes described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, a functional block may be, for example, an electronic circuit, an application-specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the application are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.
Aspects of the present application are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to being, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware which performs the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the foregoing, only the specific embodiments of the present application are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present application is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present application, and they should be included in the scope of the present application.

Claims (7)

1. A data processing method, wherein original relational data is stored in a relational database, and only index information of the original relational data in the relational database is stored in a graph database, comprising:
acquiring an object list;
according to the object list, establishing a node corresponding to each object in a graph database;
responding to a third input of the first user to at least one piece of original relational data, and acquiring the at least one piece of original relational data;
for each piece of original relational data in the at least one piece of original relational data, respectively executing the following steps to obtain the graph database:
determining a second attribute hierarchy corresponding to the original relational data, second index information of the original relational data and third and fourth objects related to the original relational data according to the original relational data;
establishing a second edge between a third node corresponding to the third object and a fourth node corresponding to the fourth object;
creating the second attribute hierarchy as the attribute hierarchy corresponding to the second edge;
storing the second index information into the second attribute hierarchy;
determining a first object and a second object related to first original relational data in response to first input of the first user on the first original relational data, wherein the first input is input for inputting or selecting identification of the first original relational data;
determining a first node corresponding to the first object in the graph database and a second node corresponding to the second object in the graph database, wherein the nodes in the graph database correspond to the objects, no more than one edge is connected between every two nodes, the edge is used for indicating that a relationship exists between the objects corresponding to the two connected nodes, different attribute hierarchies are arranged on the edge, and different attribute hierarchies are used for storing index information of original relational data of different types;
displaying a first attribute hierarchy corresponding to a first edge between a first node and a second node in the graph database;
determining first index information stored in a target attribute hierarchy in response to a second input by a first user to the target attribute hierarchy in the first attribute hierarchy;
acquiring second original relational data of a target attribute hierarchy between the first object and the second object from the relational database according to the first index information;
and displaying the second original relational data.
2. The method of claim 1, wherein after said storing said second index information into said second attribute hierarchy, said method further comprises:
determining, in response to a fourth input by a first user of a fifth object, a sixth object, and a first condition, all first paths that communicate a fifth node corresponding to the fifth object and a sixth node corresponding to the sixth object in the graph database, the first condition being determined by setting a constraint on an attribute hierarchy of an edge included in the first paths;
for each first path in all the first paths, respectively executing the following steps:
judging whether a third edge included in the first path meets the first condition;
and displaying the first path under the condition that the third edge included in the first path meets the first condition.
3. The method of claim 2, wherein after said judging whether the third edge included in the first path meets the first condition, the method further comprises:
displaying a third attribute hierarchy corresponding to a third edge included in the first path under the condition that the third edge included in the first path meets the first condition;
determining third index information stored in a fourth attribute hierarchy in the third attribute hierarchy in response to a fifth input of a first user to the fourth attribute hierarchy, wherein the fourth attribute hierarchy is an attribute hierarchy of the third edge;
and acquiring third original relational data from the relational database according to the third index information.
4. The method of claim 1, wherein after said storing said second index information into said second attribute hierarchy, said method further comprises:
responsive to a sixth input by the first user of a seventh object and a second condition, determining a seventh node to which the seventh object corresponds in the graph database;
determining all second paths through the seventh node;
for each second path in all the second paths, respectively executing the following steps:
judging whether a fifth edge included in the second path meets the second condition or not, wherein the second condition is determined by setting a limiting condition on the attribute hierarchy of the edge included in the second path;
and displaying the second path under the condition that the fifth edge included in the second path meets the second condition.
5. A data processing apparatus in which original relational data is stored in a relational database, and only index information of the original relational data in the relational database is stored in a graph database, the apparatus comprising:
the third acquisition module is used for acquiring an object list;
the second establishing module is used for establishing a node corresponding to each object in the graph database according to the object list;
the second acquisition module is used for responding to the third input of the first user on at least one piece of original relational data and acquiring the at least one piece of original relational data;
the processing module is used for respectively executing the following steps for each piece of original relational data in the at least one piece of original relational data to obtain the graph database: determining a second attribute hierarchy corresponding to the original relational data, second index information of the original relational data and third and fourth objects related to the original relational data according to the original relational data; establishing a second edge between a third node corresponding to the third object and a fourth node corresponding to the fourth object; creating the second attribute hierarchy as the attribute hierarchy corresponding to the second edge; storing the second index information into the second attribute hierarchy;
the first determining module is used for responding to first input of a first user on first original relational data, determining a first object and a second object related to the first original relational data, wherein the first input is input for inputting or selecting identification of the first original relational data;
a second determining module, configured to determine a first node corresponding to the first object in a graph database and a second node corresponding to the second object in the graph database, where the nodes in the graph database correspond to the objects, no more than one edge is connected between every two nodes, the edge is used to indicate that a relationship exists between the objects corresponding to the two connected nodes, the edge is provided with different attribute levels, and different attribute levels are used to store index information of different types of original relational data;
a first display module, configured to display a first attribute hierarchy corresponding to a first edge between the first node and the second node in the graph database;
a third determining module, configured to determine first index information stored in a target attribute hierarchy in the first attribute hierarchy in response to a second input of the first user on the target attribute hierarchy;
a first acquisition module, configured to acquire, from the relational database according to the first index information, second original relational data of the target attribute hierarchy between the first object and the second object; and
a second display module, configured to display the second original relational data.
6. An electronic device, the device comprising: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the data processing method according to any one of claims 1-4.
7. A computer storage medium, characterized in that the computer storage medium has stored thereon computer program instructions which, when executed by a processor, implement the data processing method according to any one of claims 1-4.
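
The build and lookup flow recited in the device claim can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that each relational table holds one type of original relational data (so the table name serves as the attribute hierarchy), that SQLite row ids serve as the index information, and that the in-memory dictionary stands in for a graph database. The table names, the column names (obj_a, obj_b, detail) and all function names are hypothetical.

```python
import sqlite3
from collections import defaultdict


def build_graph(conn, objects, table_names):
    """Create one node per object and at most one edge per object pair.

    edges[(a, b)][level] holds the row ids (the assumed "index information")
    of the original relational rows of that type between a and b.
    """
    nodes = set(objects)
    edges = defaultdict(lambda: defaultdict(list))
    for table in table_names:  # table names are trusted, hypothetical inputs
        query = f"SELECT rowid, obj_a, obj_b FROM {table}"
        for rowid, a, b in conn.execute(query):
            if a in nodes and b in nodes:
                key = tuple(sorted((a, b)))  # undirected: one edge per pair
                edges[key][table].append(rowid)
    return nodes, edges


def attribute_levels(edges, first, second):
    """List the attribute hierarchies stored on the edge between two objects."""
    return sorted(edges.get(tuple(sorted((first, second))), {}))


def fetch_original(conn, edges, first, second, level):
    """Use the stored row ids to read the original rows back from the tables."""
    rowids = edges.get(tuple(sorted((first, second))), {}).get(level, [])
    if not rowids:
        return []
    marks = ",".join("?" * len(rowids))
    sql = (f"SELECT obj_a, obj_b, detail FROM {level} "
           f"WHERE rowid IN ({marks}) ORDER BY rowid")
    return conn.execute(sql, rowids).fetchall()


def demo():
    """Build a tiny in-memory relational database and the graph over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transfer (obj_a TEXT, obj_b TEXT, detail TEXT)")
    conn.execute("CREATE TABLE guarantee (obj_a TEXT, obj_b TEXT, detail TEXT)")
    conn.executemany(
        "INSERT INTO transfer VALUES (?, ?, ?)",
        [("A", "B", "100"), ("B", "A", "250"), ("A", "C", "75")],
    )
    conn.execute("INSERT INTO guarantee VALUES (?, ?, ?)", ("A", "B", "loan-1"))
    nodes, edges = build_graph(conn, ["A", "B", "C"], ["transfer", "guarantee"])
    return conn, nodes, edges
```

In this sketch the three transfer rows and one guarantee row collapse into two edges, (A, B) and (A, C). Selecting a hierarchy on an edge then reads only the matching rows from the relational store, which mirrors the claimed split between a lightweight graph and the original relational data.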

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210390235.0A CN114996297B (en) 2022-04-14 2022-04-14 Data processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210390235.0A CN114996297B (en) 2022-04-14 2022-04-14 Data processing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN114996297A CN114996297A (en) 2022-09-02
CN114996297B CN114996297B (en) 2023-09-26

Family

ID=83023471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210390235.0A Active CN114996297B (en) 2022-04-14 2022-04-14 Data processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114996297B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1349081A1 (en) * 2002-03-28 2003-10-01 LION Bioscience AG Method and apparatus for querying relational databases
CN106874422A (en) * 2017-01-25 2017-06-20 东南大学 A graph query method for relational databases
CN109726305A (en) * 2018-12-30 2019-05-07 中国电子科技集团公司信息科学研究院 A complex relational data storage and retrieval method based on graph structure
EP3511842A1 (en) * 2018-01-16 2019-07-17 Palantir Technologies Inc. Concurrent automatic adaptive storage of datasets in graph databases
CN111930958A (en) * 2020-07-13 2020-11-13 车智互联(北京)科技有限公司 Graph database construction method, computing device and readable storage medium
CN112507354A (en) * 2020-12-04 2021-03-16 北京神州泰岳软件股份有限公司 Graph database-based authority management method and system
CN112988752A (en) * 2021-03-29 2021-06-18 北京大米科技有限公司 Resource management method, device, storage medium and electronic equipment
CN112988915A (en) * 2021-01-27 2021-06-18 厦门市健康医疗大数据中心(厦门市医药研究所) Data display method and device
CN112988758A (en) * 2021-04-26 2021-06-18 北京芯愿景软件技术股份有限公司 Target object positioning method and device, electronic equipment and storage medium
CN113901279A (en) * 2021-12-03 2022-01-07 支付宝(杭州)信息技术有限公司 Graph database retrieval method and device
CN114116716A (en) * 2021-11-19 2022-03-01 天翼数字生活科技有限公司 Hierarchical data retrieval method, device and equipment
CN114118816A (en) * 2021-11-30 2022-03-01 建信金融科技有限责任公司 Risk assessment method, device and equipment and computer storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10579680B2 (en) * 2016-05-13 2020-03-03 Tibco Software Inc. Using a B-tree to store graph information in a database
US10445321B2 (en) * 2017-02-21 2019-10-15 Microsoft Technology Licensing, Llc Multi-tenant distribution of graph database caches

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
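Research on the construction and query methods of a coal mine domain knowledge graph based on Neo4j; Ye Shuai; Information Science and Technology Series; full text *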

Also Published As

Publication number Publication date
CN114996297A (en) 2022-09-02

Similar Documents

Publication Publication Date Title
US11372851B2 (en) Systems and methods for rapid data analysis
CN104756107B (en) Using location information profile data
US9507824B2 (en) Automated creation of join graphs for unrelated data sets among relational databases
US10140325B2 (en) Data source identification mapping in blended data operations
CN103605651A (en) Data processing showing method based on on-line analytical processing (OLAP) multi-dimensional analysis
CN111259627A (en) Document analysis method and device, computer storage medium and equipment
WO2016029230A1 (en) Automated creation of join graphs for unrelated data sets among relational databases
CN112753029A (en) System and method for graph-based query analysis
CN114153980A (en) Knowledge graph construction method and device, inspection method and storage medium
CN115328883A (en) Data warehouse modeling method and system
US11573987B2 (en) System for detecting data relationships based on sample data
WO2017065891A1 (en) Automated join detection
CN114996297B (en) Data processing method, device, equipment and medium
CN110990423B (en) SQL statement execution method, device, equipment and storage medium
CN113761185A (en) Main key extraction method, equipment and storage medium
CN113760891A (en) Data table generation method, device, equipment and storage medium
CN107609110B (en) Mining method and device for maximum multiple frequent patterns based on classification tree
CN110008239A (en) Logic based on precomputation optimization executes optimization method and system
US20100121837A1 (en) Apparatus and Method for Utilizing Context to Resolve Ambiguous Queries
CN115658680A (en) Data storage method, data query method and related device
Cavoretto et al. Node-bound communities for partition of unity interpolation on graphs
Oujdi et al. C4. 5 decision tree algorithm for spatial data, alternatives and performances
Cai et al. Application of association rule algorithm in distributed new SQL database design
Shao et al. An abnormal data analysis and processing method for genealogy graph databases
US20220342879A1 (en) Data searching system, device, method and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant