Detailed Description
In order to enable those skilled in the art to better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification are described clearly and completely below with reference to the drawings in the embodiments of the present specification. Obviously, the described embodiments are only a part of the embodiments of the present specification, and not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without inventive effort shall fall within the scope of protection of the present specification.
The embodiments of the present specification provide a data processing method, which can be applied to a system comprising a server and a terminal device, as shown in fig. 1. The terminal device can be connected to the server in a wired or wireless manner to exchange data.
Specifically, when a user (or an organization or a platform) wants to determine whether a first target object (e.g., a natural person or an associated company) related to a target enterprise (which may be referred to as a second target object) is an enterprise beneficiary of the target enterprise, the user may first collect business data related to the target enterprise, such as an announcement posted by the target enterprise, a contract or agreement established by the target enterprise with other entity objects, registration information of the target enterprise queried at an administrative authority, and the like. The business data collected by the user does not directly indicate that the first target object is an enterprise beneficiary of the target enterprise.
Further, the user can use the terminal device to generate a corresponding data processing request according to the business data related to the target enterprise. The data processing request may carry the business data related to the target enterprise. Specifically, the data processing request may request a prediction, based on the business data related to the target enterprise, of whether the first target object is an enterprise beneficiary of the target enterprise (which is equivalent to predicting whether a preset relationship, namely that the first target object is an enterprise beneficiary of the second target object, exists between the first target object and the second target object). The terminal device then sends the data processing request to the server.
After receiving the data processing request, the server may obtain the business data related to the target enterprise carried in the data processing request. In response to the data processing request, the server may use the business data related to the target enterprise to predict whether the first target object is an enterprise beneficiary of the target enterprise as a processing result, and give interpretation data for supporting the processing result.
Specifically, the server may first construct a knowledge graph of the target enterprise according to the business data related to the target enterprise. The knowledge graph of the target enterprise includes a plurality of entity objects (e.g., natural persons, enterprises, cases, domain names, etc.) related to the target enterprise, which are determined based on the business data, and known relationships between the entity objects, which are also determined based on the business data (e.g., enterprise A is a subsidiary of the target enterprise, a payment account of natural person B and a payment account of natural person C are friends, enterprise D and the target enterprise share the same registered address, etc.). The knowledge graph of the target enterprise includes the first target object and the second target object, and no known relationship indicating that the first target object is an enterprise beneficiary of the second target object exists between them.
Further, the server may determine a plurality of paths leading from the first target object to the second target object according to the relationship type of the preset relationship to be predicted and the knowledge graph of the target enterprise. The server can then process the plurality of paths by calling a pre-trained preset processing model comprising a feature layer, an LSTM layer and an Attention layer, so as to determine whether the first target object is an enterprise beneficiary of the target enterprise as a processing result. Meanwhile, the server can also use the preset processing model to determine, from the plurality of paths, a target path strongly associated with the processing result, and use the target path as interpretation data for supporting the processing result.
In specific implementation, the server may first call the feature layer in the preset processing model to perform feature processing on the plurality of paths respectively, so as to obtain a plurality of sets of path features corresponding to the plurality of paths. For example, the identity of each entity object on a path, the entity type of the entity object, the relationship type of the known relationship between entity objects, and the like are determined. The server then calls the LSTM layer in the preset processing model to process the plurality of sets of path features respectively, so as to obtain a plurality of feature vectors corresponding to the plurality of sets of path features. Next, the server calls the Attention layer in the preset processing model to calculate weight values of the plurality of paths according to the plurality of feature vectors, and to calculate, according to the paths and the weight values of the paths, a judgment probability that the preset relationship exists between the first target object and the second target object. Finally, the preset processing model outputs the judgment probability and the weight values of the paths.
The server can determine whether the first target object is an enterprise beneficiary of the target enterprise according to the judgment probability output by the preset processing model, and obtain a corresponding processing result. Meanwhile, according to the weight values of the paths output by the preset processing model, the server may further find one or more paths with the largest weight values from the plurality of paths as target paths having a stronger association with the processing result, and determine the target paths as the interpretation data for supporting the processing result.
The server may feed back the processing result to the terminal device together with the interpretation data.
The terminal device may receive and present the above-described processing results to the user, as well as interpretation data for supporting the processing results. The user can judge whether the processing result is reliable or not by combining the interpretation data displayed on the terminal equipment, and further determine whether to adopt and use the processing result or not.
In this embodiment, the server may specifically include a background server that is applied to a network platform side and is capable of implementing functions such as data transmission and data processing. Specifically, the server may be, for example, an electronic device having data operation, storage function and network interaction function. Alternatively, the server may also be a software program that runs in the electronic device and provides support for data processing, storage, and network interaction. In this embodiment, the number of servers included in the server is not particularly limited. The server may specifically be one server, or may also be several servers, or a server cluster formed by several servers.
In this embodiment, the terminal device may specifically include a front-end device that is applied to a user side and can implement functions such as data acquisition and data transmission. Specifically, the terminal device may be, for example, a desktop computer, a tablet computer, a notebook computer, a smart phone, a digital assistant, a smart wearable device, and the like. Alternatively, the terminal device may be a software application capable of running in the electronic device. For example, it may be an app running on a mobile phone.
Referring to fig. 2, an embodiment of the present disclosure provides a data processing method. The method can be applied to the server side in particular. In particular implementations, the method may include the following.
S201: business data related to the target enterprise is obtained.
In some embodiments, the business data related to the target enterprise may be specifically understood as data including an association relationship between the target enterprise and other entity objects. Specifically, the business data related to the target enterprise may include: an announcement posted externally by the target enterprise, a contract or agreement established by the target enterprise with other entity objects, information registered by the target enterprise at the administrative authority, and so forth. Of course, the business data listed above is only a schematic illustration. In specific implementation, according to a specific application scenario, other types of business data besides those listed may also be introduced as the business data related to the target enterprise. The present specification is not limited to these.
In some embodiments, when implemented, the server may receive a data processing request concerning the target enterprise from the terminal device, and obtain, from the data processing request, the business data related to the target enterprise provided by the terminal device.
The server may also actively collect business data associated with the target enterprise based on the identification information of the target enterprise (e.g., the name or enterprise number of the target enterprise). For example, the server may access a website of the target enterprise and collect announcements of the target enterprise posted on the website as one type of business data associated with the target enterprise. The server may also query the registration information of the target enterprise at the administrative authority as one type of business data associated with the target enterprise. The server may also obtain, from other entity objects (e.g., enterprises or natural persons) that cooperate with the target enterprise, an agreement, contract, or cooperation plan established by those entity objects with the target enterprise, as one type of business data associated with the target enterprise.
S202: constructing a knowledge graph of the target enterprise according to the business data related to the target enterprise; wherein the knowledge graph of the target enterprise includes a plurality of entity objects related to the target enterprise, including at least a first target object and a second target object, and known relationships between the entity objects.
In some embodiments, the knowledge-graph may be specifically understood as a data structure including a set of nodes and a set of edges between the nodes.
Further, the aforementioned target enterprise knowledge graph may be specifically understood as a data structure that includes a set of entity objects related to the target enterprise and a set of known relationships between the entity objects, which are determined based on business data related to the target enterprise. The entity object may correspond to a node in the target enterprise's knowledge graph, and the known relationship between the entity objects may correspond to a connecting edge between two nodes in the target enterprise's knowledge graph.
In some embodiments, the entity object may be specifically understood as an object directly or indirectly associated with the target enterprise, which is determined based on business data related to the target enterprise. Specifically, the entity object may include: enterprises (e.g., subsidiaries of the target enterprise, parent companies of the target enterprise, etc.), natural persons (e.g., directors of the target enterprise, legal representatives of the target enterprise, etc.), cases (e.g., risk events involving the target enterprise), domain names (e.g., the ICP of a website registered by the target enterprise, etc.), and the like. The risk event may specifically include an illegal action in which the target enterprise directly or indirectly participated. The ICP (Internet Content Provider) may be, for example, a web content provider. Of course, the various entity types listed above are merely illustrative. In specific implementation, the entity object may further include other entity types according to a specific application scenario and specific characteristics of the target enterprise. The present specification is not limited to these.
In some embodiments, the known relationships described above may be specifically understood as relationships between entity objects that have been determined based on business data associated with the target enterprise. Specifically, the relationships may include: legal representative relationships of an enterprise (e.g., natural person F is the legal representative of enterprise Q, etc.), subsidiary or parent company relationships of an enterprise (e.g., enterprise A is a subsidiary of enterprise U, enterprise V is a parent company of enterprise U, etc.), friend relationships of payment accounts (e.g., the payment account of natural person B and the payment account of enterprise C are friends), and the like. Of course, the various relationship types listed above are merely illustrative. In specific implementation, the relationship types may also include other relationship types according to specific application scenarios and specific characteristics of the target enterprise, for example, a registered address relationship between enterprises (e.g., the registered address of enterprise A is the same as that of enterprise U), etc. The present specification is not limited to these.
In some embodiments, the plurality of entity objects in the knowledge graph of the target enterprise further include at least a first target object and a second target object, for which it is to be predicted whether a preset relationship exists. Whether the preset relationship exists between the first target object and the second target object is determined based on the knowledge graph of the target enterprise.
In some embodiments, the first target object may be a natural person X, the second target object may be a target enterprise Y, and the preset relationship may be that the natural person X is an enterprise beneficiary of the target enterprise Y. Accordingly, the data processing goal of the server is to predict whether the natural person X is an enterprise beneficiary of the target enterprise Y, i.e., to determine whether the preset relationship described above exists between the natural person X (which may be denoted as a first target object X) and the target enterprise Y (which may be denoted as a second target object Y).
Of course, the first target object, the second target object and the preset relationship listed above are only a schematic illustration. In specific implementation, the first target object, the second target object, and the preset relationship may also be other types of entity objects and other types of relationships according to different application scenarios and processing requirements. For example, the first target object may also be an enterprise M, the second target object may also be a target enterprise Y, and the preset relationship may also be that the enterprise M is a parent company of the target enterprise Y. Accordingly, the data processing goal of the server is to predict whether the enterprise M is a parent company of the target enterprise Y, or the like.
In some embodiments, referring to fig. 3, the above-mentioned building a knowledge graph of a target enterprise according to the business data related to the target enterprise may be implemented by the following steps: determining an entity object related to the target enterprise, an identity of the entity object, an entity type of the entity object, a known relationship between the entity objects and a relationship type of the known relationship according to the business data related to the target enterprise; establishing a node according to the identity of the entity object; and establishing connecting edges among the nodes according to the entity types of the entity objects, the known relations among the entity objects and the relation types of the known relations so as to obtain the corresponding knowledge graph of the target enterprise.
The identity of the entity object may be specifically understood as identification information that corresponds to the entity object and distinguishes it from other entity objects. Specifically, the identity of the entity object may be an ID of the entity object, or a name of the entity object.
In some embodiments, when the target enterprise knowledge graph is specifically constructed, a node marked with an identity of an entity object may be first established in the knowledge graph as a node corresponding to the entity object. Meanwhile, the entity type of the entity object can be marked on the node.
According to the method, after the nodes corresponding to the entity objects are established in the graph, two nodes corresponding to the two entity objects with known relations in the graph can be found according to the known relations between the entity objects, and the edges corresponding to the relation types are selected to connect the two nodes according to the relation types of the known relations.
Or two nodes corresponding to two entity objects with known relationship in the graph are found according to the known relationship between the entity objects, and the two nodes are connected by using a connecting edge; and marking the connected edges according to the relationship type of the known relationship so as to mark the relationship type corresponding to the connected edges.
In the above manner, after the connecting edges corresponding to the known relationships are connected, the knowledge graph of the target enterprise is obtained. The knowledge graph of the target enterprise comprises nodes of a plurality of entity objects related to the target enterprise, and the nodes of entity objects with known relationships are connected through connecting edges corresponding to the relationship types.
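The construction described above can be sketched as follows. This is a minimal illustration only, assuming the entity objects and known relationships have already been extracted from the business data; the class name `KnowledgeGraph`, the helper `build_graph`, and the entity and relationship labels are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Nodes keyed by entity identity; edges labeled with a relationship type."""
    # identity -> entity type (e.g. "natural_person", "enterprise")
    nodes: dict = field(default_factory=dict)
    # list of (source identity, relationship type, target identity)
    edges: list = field(default_factory=list)

    def add_entity(self, identity, entity_type):
        # One node per entity identity, marked with its entity type.
        self.nodes[identity] = entity_type

    def add_relation(self, source, relation_type, target):
        # A connecting edge is only created between nodes that already exist.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both entity objects must be added before relating them")
        self.edges.append((source, relation_type, target))


def build_graph(entities, relations):
    """Build the graph from already-extracted entities and known relationships."""
    graph = KnowledgeGraph()
    for identity, entity_type in entities:
        graph.add_entity(identity, entity_type)
    for source, relation_type, target in relations:
        graph.add_relation(source, relation_type, target)
    return graph


# Hypothetical records, assumed to have been extracted from business data.
entities = [("A", "enterprise"), ("U", "enterprise"), ("B", "natural_person")]
relations = [("A", "subsidiary_of", "U"), ("B", "payment_account_friend", "A")]
graph = build_graph(entities, relations)
```

Here each edge carries its relationship type as a label, which corresponds to the second variant described above (connect the nodes, then mark the edge with the relationship type).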
S203: and determining a plurality of paths leading from the first target object to the second target object according to the knowledge graph of the target enterprise.
In some embodiments, the path may be a path obtained by combining and connecting edges between nodes by using a node corresponding to the first target object as a start node and a node corresponding to the second target object as an end node. Some of the paths may not have other nodes between the start node and the end node, and some of the paths may have other nodes between the start node and the end node.
In some embodiments, referring to fig. 3, in a specific implementation, according to a relationship type of a preset relationship, the server may traverse paths from a start node to an end node in the target enterprise's knowledge graph and connect through connecting edges corresponding to the known relationship to obtain multiple paths from the first target object to the second target object. For example, path 1, path 2, path 3 … …, path N.
In some embodiments, if the knowledge graph of the target enterprise is complex and the number of paths involved is large, a length threshold for the paths to be traversed may be set according to the accuracy requirement. The length of a path may be specifically understood as the number of nodes (including the start node and the end node) on the path. The greater the number of nodes on a path, the longer the path. Conversely, the smaller the number of nodes on a path, the shorter the path.
According to the length threshold, the server may traverse only the paths whose length is less than or equal to the length threshold to find a plurality of paths leading from the first target object to the second target object. In this way, the data processing amount on the server side can be reduced and the overall processing efficiency improved.
In some embodiments, specifically, for example, referring to fig. 4, in the knowledge graph of the target enterprise D, the following two paths from the first target object to the second target object may be determined by traversing the knowledge graph of the target enterprise D, with the node A (corresponding to the first target object: natural person A) as the start node and the node D (corresponding to the second target object: target enterprise D) as the end node: path 1 and path 2.
Path 1 may be specifically represented as: A (P1) B (P2) C (P3) D. Node B and node C are intermediate nodes located between node A and node D on path 1. P1 denotes the relationship type corresponding to the connecting edge between node A and node B. P2 denotes the relationship type corresponding to the connecting edge between node B and node C. P3 denotes the relationship type corresponding to the connecting edge between node C and node D. The path length of path 1 is 4.
Similarly, path 2 may be specifically represented as: A (P2) E (P3) M (P4) D. Node E and node M are intermediate nodes located between node A and node D on path 2. P2 denotes the relationship type corresponding to the connecting edge between node A and node E. P3 denotes the relationship type corresponding to the connecting edge between node E and node M. P4 denotes the relationship type corresponding to the connecting edge between node M and node D. The path length of path 2 is 4.
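The traversal with a length threshold can be sketched as a depth-first search. The sketch below is an illustration under stated assumptions: the function name `find_paths` is hypothetical, edges are followed only in the direction in which they are stored, and the example graph is assumed to contain only the six edges of path 1 and path 2 (the specification does not list the full contents of fig. 4).

```python
def find_paths(edges, start, end, max_nodes):
    """Enumerate simple paths from start to end with at most max_nodes nodes.

    Each path is returned as an alternating list:
    [node, relation_type, node, ..., node].
    """
    # Adjacency list: node -> list of (relation_type, neighbor).
    adjacency = {}
    for source, relation_type, target in edges:
        adjacency.setdefault(source, []).append((relation_type, target))

    paths = []

    def walk(node, visited, trail):
        if node == end:
            paths.append(list(trail))
            return
        if len(visited) >= max_nodes:
            # Adding another node would exceed the length threshold.
            return
        for relation_type, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                trail.extend([relation_type, neighbor])
                walk(neighbor, visited, trail)
                del trail[-2:]
                visited.remove(neighbor)

    walk(start, {start}, [start])
    return paths


# Assumed edge set: only the edges that make up path 1 and path 2.
edges = [
    ("A", "P1", "B"), ("B", "P2", "C"), ("C", "P3", "D"),
    ("A", "P2", "E"), ("E", "P3", "M"), ("M", "P4", "D"),
]
paths = find_paths(edges, "A", "D", max_nodes=4)
```

With a length threshold of 4 nodes, both example paths are found; with a threshold of 3, neither is, since each contains four nodes.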
S204: and processing the plurality of paths by calling a preset processing model to determine whether a preset relationship exists between the first target object and the second target object as a processing result, and determining a target path associated with the processing result as interpretation data for the processing result.
In some embodiments, the preset processing model may be specifically understood as a data processing model trained in advance, and capable of predicting whether a preset relationship exists between a first target object and a second target object that do not originally have a preset relationship in a knowledge graph according to a plurality of paths determined based on the knowledge graph of the target enterprise. Meanwhile, the preset processing model can also output the weight values of the paths which are calculated and used when whether the preset relation exists between the first target object and the second target object is determined according to the paths.
Specifically, as shown in fig. 5, the preset processing model at least includes the following three-layer network structure: a feature layer, an LSTM layer, and an Attention layer. Wherein, the characteristic layer is connected with the LSTM layer, and the LSTM layer is connected with the Attention layer.
The feature layer may be a network layer responsible for feature extraction. The path features extracted by the feature layer are transmitted to the LSTM layer.
The LSTM (Long Short-Term Memory) layer may be specifically understood as a recurrent network layer that can learn long-term dependencies across the steps of a sequence and can convert the received path features into a feature vector, corresponding to a path, that can be interpreted by the model. The feature vectors obtained via the LSTM layer are transmitted to the Attention layer.
The Attention layer (also referred to as an attention mechanism layer) may be specifically understood as a network layer that, through training and learning, can comprehensively analyze a plurality of paths according to their feature vectors to determine weight values of the paths, and can determine, according to the paths and the weight values of the paths, a judgment probability that a preset relationship exists between the first target object and the second target object. The Attention layer uses a fully connected layer or a convolution layer to enable the model network to automatically learn and evaluate the weight values of different paths in an unsupervised manner.
In some embodiments, referring to fig. 3, in implementation, the server may call the preset processing model, input the plurality of paths into the preset processing model as model inputs, and run the preset processing model. The preset processing model processes the input paths through the network layers described above, and finally outputs, as model output, the judgment probability that the preset relationship exists between the first target object and the second target object and the weight values of the plurality of paths.
In some embodiments, in specific implementation, when the server invokes a preset processing model to process the plurality of paths, the server may first invoke a feature layer in the preset processing model to perform feature processing on the plurality of paths, respectively, so as to obtain a plurality of sets of path features corresponding to the plurality of paths, respectively.
Wherein the path characteristics include at least one of: the identity of the entity objects on the path, the entity type of the entity objects, the relationship type of known relationships between the entity objects, and the like. Of course, the above listed path features are only illustrative. In particular implementations, other types of features than those listed above may also be introduced as path features. For example, the strength of a known relationship, an entity feature of an entity object, and the like may also be introduced as a path feature. The present specification is not limited to these.
In some embodiments, in the above manner, the feature processing may be performed on the multiple paths respectively, so as to obtain multiple sets of path features. Each set of path features may correspond to a path.
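The feature processing described above can be illustrated with a small sketch that turns one path into a set of per-node features. The function name `path_features`, the `"<end>"` marker, and the entity types assigned to nodes B and C are hypothetical choices made for illustration; the specification does not prescribe a concrete encoding.

```python
def path_features(path, entity_types):
    """Turn an alternating [node, relation, node, ...] path into per-step features.

    Each step is (identity, entity type, relation type of the outgoing edge).
    The last node has no outgoing edge, which is marked with "<end>".
    """
    nodes = path[0::2]
    relations = path[1::2] + ["<end>"]
    return [(n, entity_types[n], r) for n, r in zip(nodes, relations)]


# Path 1 from the example; entity types of B and C are assumed for illustration.
path_1 = ["A", "P1", "B", "P2", "C", "P3", "D"]
entity_types = {
    "A": "natural_person",
    "B": "enterprise",
    "C": "enterprise",
    "D": "enterprise",
}
features_1 = path_features(path_1, entity_types)
```

In a trained model these symbolic features would typically be mapped to numeric vectors before being passed on; that mapping is omitted here.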
In some embodiments, in specific implementation, when the server invokes a preset processing model to process the plurality of paths, the server may further invoke an LSTM layer in the preset processing model to process the plurality of sets of path features respectively, so as to obtain a plurality of feature vectors corresponding to the plurality of sets of path features respectively.
The feature vector may be specifically understood as an implicit expression for a path based on path features; and the above-mentioned feature vector is the data that the model (or the Attention layer) can interpret and process. The feature vector corresponds to a path.
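To make the role of the LSTM layer concrete, the sketch below runs a standard LSTM cell over the steps of one path and keeps the final hidden state as the feature vector of that path. It is an illustration only: it assumes each step has already been converted into a numeric vector, the weights are random rather than trained, and the helper names (`lstm_step`, `encode_path`, `random_weights`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, weights):
    """One LSTM step. weights[gate] is a (W, b) pair; W has one row per hidden
    unit and len(x) + len(h) columns."""
    z = x + h  # list concatenation of the input and the previous hidden state

    def affine(gate):
        W, b = weights[gate]
        return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]

    i = [sigmoid(a) for a in affine("input")]        # input gate
    f = [sigmoid(a) for a in affine("forget")]       # forget gate
    o = [sigmoid(a) for a in affine("output")]       # output gate
    g = [math.tanh(a) for a in affine("candidate")]  # candidate cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def encode_path(steps, weights, hidden_size):
    """Run the cell over all steps of a path; the last hidden state is the vector."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in steps:
        h, c = lstm_step(x, h, c, weights)
    return h


def random_weights(input_size, hidden_size, seed=0):
    """Untrained placeholder weights, used here only to make the sketch runnable."""
    rng = random.Random(seed)
    cols = input_size + hidden_size
    return {
        gate: (
            [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(hidden_size)],
            [0.0] * hidden_size,
        )
        for gate in ("input", "forget", "output", "candidate")
    }


# Four steps (one per node on a path of length 4), each an assumed 3-dim vector.
steps = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
weights = random_weights(input_size=3, hidden_size=4)
feature_vector = encode_path(steps, weights, hidden_size=4)
```

Each path yields exactly one vector of length `hidden_size`, regardless of how many nodes the path contains, which is what allows paths of different lengths to be compared by the next layer.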
In some embodiments, in specific implementation, when the server calls a preset processing model to process the plurality of paths, the server may finally call an Attention layer in the preset processing model to calculate weight values of the plurality of paths according to the plurality of feature vectors; calculating a judgment probability that a preset relation exists between the first target object and the second target object according to the paths and the weight values of the paths by using the Attention layer; and outputting the decision probability and the weight values of the paths.
The weight value of the path may be used to represent a degree of dependence of the Attention layer on the path corresponding to the weight value when the plurality of paths are integrated to determine the decision probability.
Specifically, for example, if the weight value of a path is larger, when the Attention layer integrates a plurality of paths to determine the decision probability, the higher the dependency degree on the path is, and accordingly, the stronger the association between the path and the finally obtained decision probability is, the larger the interpretation effect of the path on the decision probability is. Conversely, if the weight value of a path is smaller, the degree of dependence on the path is lower when the Attention layer integrates a plurality of paths to determine the decision probability, and accordingly, the correlation between the path and the finally obtained decision probability is weaker, and the interpretation effect of the path on the decision probability is smaller.
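One common way to realize such a weighting is softmax attention pooling, sketched below. This is an assumption made for illustration, not the formula of the specification: each path vector is scored by a linear layer, the scores are normalized into weight values that sum to 1, the vectors are combined using those weights, and a sigmoid output produces the probability. The function name `attention_pool` and all parameter values are hypothetical.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, score_weights, out_weights, out_bias):
    """Return (probability, path weight values) for a list of path feature vectors."""
    # Linear scoring of each path vector, then normalization into weight values.
    scores = [sum(w * v for w, v in zip(score_weights, vec)) for vec in vectors]
    weights = softmax(scores)
    # Weighted combination of the path vectors.
    dim = len(vectors[0])
    pooled = [sum(a * vec[k] for a, vec in zip(weights, vectors)) for k in range(dim)]
    # Sigmoid output layer giving the probability that the relationship exists.
    logit = sum(w * p for w, p in zip(out_weights, pooled)) + out_bias
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, weights


# Two assumed path feature vectors and untrained, hand-picked parameters.
vectors = [[1.0, 0.0], [0.0, 1.0]]
probability, weights = attention_pool(
    vectors, score_weights=[2.0, 0.0], out_weights=[1.5, -0.5], out_bias=0.0
)
```

Because the pooled vector is dominated by the paths with the largest weight values, those weight values indicate how strongly the probability depends on each path.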
In some embodiments, the server may determine, according to a model output of a preset processing model, whether the first target object and the second target object have a preset relationship as a processing result, and determine a target path associated with the processing result as interpretation data for the processing result.
In some embodiments, referring to fig. 3, the server may compare the determination probability that the first target object and the second target object in the model output have the preset relationship with a preset probability threshold to obtain a corresponding comparison result. According to the comparison result, in the case that it is determined that the determination probability is greater than or equal to the preset probability threshold, it may be determined that the first target object and the second target object have the preset relationship as the corresponding processing result. The specific numerical value of the preset probability threshold can be flexibly set according to historical data and specific precision requirements.
On the contrary, according to the comparison result, in the case where it is determined that the above determination probability is smaller than the preset probability threshold, it may be determined that the first target object and the second target object do not have the preset relationship as the corresponding processing result.
In some embodiments, referring to fig. 3, while determining the processing result according to the model output, the server may further select, according to the weight values of the plurality of paths in the model output, one or more paths with the largest weight value from the plurality of paths as the target path associated with the processing result. The target path may then be determined as interpretation data for the processing result.
Thus, the server can determine whether the first target object and the second target object have the preset relationship as the processing result (or called the prediction result); and determining a target path with stronger relevance as interpretation data supporting the processing result.
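The post-processing of the model output can be sketched as a threshold comparison followed by a selection of the highest-weighted paths. The function name `decide`, the threshold of 0.5, and the input values are illustrative assumptions; the specification leaves the threshold to be set from historical data and precision requirements.

```python
def decide(probability, path_weights, paths, threshold=0.5, top_k=1):
    """Return (has_relation, target_paths) from the model output."""
    # The preset relationship is judged to exist when the probability
    # is greater than or equal to the preset probability threshold.
    has_relation = probability >= threshold
    # Rank the paths by weight value, largest first, and keep the top_k.
    ranked = sorted(zip(path_weights, paths), key=lambda pair: pair[0], reverse=True)
    target_paths = [path for _, path in ranked[:top_k]]
    return has_relation, target_paths


# Illustrative model output: a probability and one weight value per path.
has_relation, target_paths = decide(
    probability=0.75,
    path_weights=[0.8, 0.2],
    paths=["path 1", "path 2"],
)
```

With these inputs, the relationship is judged to exist and path 1 is returned as the target path, that is, as the interpretation data for the processing result.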
It should be noted that the target path determined and used as the interpretation data in the above manner is based on the path dimension to interpret the processing result, and is relatively more reasonable, has higher accuracy and reference value, and is more suitable for data processing scenarios like link prediction.
In some embodiments, for example, as shown in fig. 6, path 1 (i.e., A (P1) B (P2) C (P3) D) and path 2 (i.e., A (P2) E (P3) M (P4) D) are input into a preset processing model as model inputs, and the two paths are processed by the preset processing model.
During specific processing, feature extraction can be performed on the two paths through a feature layer in a preset processing model, so that two groups of path features are obtained and recorded as: a first set of path features (corresponding to path 1) and a second set of path features (corresponding to path 2). The two sets of path features are then input to the LSTM layer, which is connected to the feature layer.
Then, the two sets of path features are processed through an LSTM layer in a preset processing model, so as to generate corresponding feature vectors that can be interpreted by the model according to the two sets of path features, and the feature vectors are respectively recorded as: the feature vector for path 1 (corresponding to path 1) and the feature vector for path 2 (corresponding to path 2). The two feature vectors are then input to the Attention layer, which is connected to the LSTM layer.
Further, the two feature vectors are processed through an Attention layer in a preset processing model, so that the weight value of the path corresponding to the feature vector is determined according to the two feature vectors. For example, the weight value of path 1 is determined to be 0.8 according to the feature vector of path 1, and the weight value of path 2 is determined to be 0.2 according to the feature vector of path 2. Further, the two paths may be comprehensively used according to the weights of the two paths, and a determination probability that a connection edge (corresponding to a preset relationship, that is, a relationship in which the natural person a is an enterprise beneficiary of the target enterprise D) exists between the start node a (corresponding to the natural person a) and the end node D (corresponding to the target enterprise D) in the two paths is predicted to be 0.75.
Finally, referring to fig. 6, after the above processing, the preset processing model may output, as the model output, the above decision probability of 0.75 together with the weight values of the two paths (i.e., 0.8 for path 1 and 0.2 for path 2) calculated by the Attention layer in the process of determining the decision probability. The server may obtain, from the model output, the decision probability 0.75 that natural person A and target enterprise D have the preset relationship, as well as the weight value 0.8 of path 1 and the weight value 0.2 of path 2 that the preset processing model determined and used when determining the decision probability.
Further, the server may first compare the decision probability with a preset probability threshold (e.g., 0.6). Since the comparison shows that the decision probability is greater than the preset probability threshold, it may be determined that the preset relationship exists between natural person A and target enterprise D, which gives the processing result: natural person A is the enterprise beneficiary of target enterprise D.
In addition, the server may numerically compare the weight value of path 1 with the weight value of path 2. According to the comparison result, when the weight value of the path 1 is determined to be greater than that of the path 2, it can be judged that the dependency degree on the path 1 is higher in the process that the preset processing model processes the two paths to obtain the judgment probability 0.75 (corresponding processing result: the natural person A is the enterprise beneficiary of the target enterprise D). Further, it can be determined that the path 1 has a higher degree of correlation with the processing result than the path 2. Therefore, the above path 1 can be determined as the interpretation data for the processing result.
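The numeric example above can be reproduced in a few lines. This is only an illustrative sketch: the variable names are hypothetical, the numbers are the ones given in the text, and the comparison uses "greater than or equal to", as in the general description of the threshold comparison.

```python
# Values taken from the worked example in the text; all names are illustrative.
decision_probability = 0.75
path_weights = {"path 1": 0.8, "path 2": 0.2}
probability_threshold = 0.6

# Processing result: does the preset relationship exist?
has_preset_relationship = decision_probability >= probability_threshold

# Interpretation data: the path the model relied on most.
target_path = max(path_weights, key=path_weights.get)

print(has_preset_relationship, target_path)
```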
The server may feed back the processing result, together with the interpretation data for the processing result, to the user through the terminal device. After the user learns from the processing result that natural person A is the enterprise beneficiary of target enterprise D, the user can verify the processing result by using the interpretation data.
In addition, the user can further analyze and process the relationship between the entity objects in the knowledge graph of the target enterprise more specifically according to the processing result and the knowledge graph of the target enterprise in combination with the interpretation data, and detect the enterprise risk of the target enterprise according to the analysis and processing result so as to determine whether the target enterprise has enterprise risks such as money laundering risk, illegal trade risk, transaction credit risk and the like.
In the embodiment, a plurality of paths between a first target object and a second target object which are to be predicted and have a preset relationship are determined according to a knowledge graph of a target enterprise, which is constructed based on business data related to the target enterprise; and processing the plurality of paths by using a preset processing model, determining whether a preset relation exists between the first target object and the second target object according to the model output to serve as a processing result, and determining and utilizing a target path related to the processing result to serve as interpretation data for the processing result. Therefore, the processing result of whether the first target object and the second target object have the preset relationship can be determined, and simultaneously, more accurate and reasonable interpretation data about the processing result with higher reference value can be obtained.
In some embodiments, the building a target enterprise knowledge graph according to the business data related to the target enterprise may be implemented by: determining entity objects related to the target enterprise, identity marks of the entity objects, known relations among the entity objects and relation types of the known relations according to the business data related to the target enterprise; establishing a node according to the identity of the entity object; and establishing connecting edges between the nodes according to the known relationship between the entity objects and the relationship type of the known relationship so as to obtain the corresponding knowledge graph of the target enterprise.
In some embodiments, in specific implementation, the entity type of the entity object may be determined according to the business data related to the target enterprise, and the entity type of the entity object is marked on a node corresponding to the entity object in the graph. For example, a data tag is set on the node to characterize the entity type.
In some embodiments, the entity type may specifically include: enterprises, natural people, cases, domain names, etc.
In some embodiments, the relationship type may specifically include at least one of: corporate relationship of the enterprise, parent company relationship of the enterprise, co-registered address relationship, subsidiary account friend relationship, and the like.
Of course, the entity types and relationship types listed above are merely illustrative. In specific implementation, according to specific situations and processing requirements, other entity types and relationship types besides the listed entity types and relationship types may be included. The present specification is not limited to these.
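The graph construction described above (nodes from identity marks, entity types marked on nodes, connecting edges from known relationships and their relationship types) might be sketched as follows. The function name, the tuple layouts, and the choice of directed edges are assumptions made for illustration; the specification does not prescribe a data structure.

```python
from collections import defaultdict


def build_knowledge_graph(entities, relations):
    """Build a simple adjacency-list knowledge graph.

    entities:  iterable of (identity, entity_type) pairs (assumed layout).
    relations: iterable of (source_identity, relation_type, target_identity)
               triples, treated here as directed edges (an assumption).
    """
    # One node per identity mark, tagged with its entity type.
    nodes = {identity: {"entity_type": entity_type}
             for identity, entity_type in entities}

    # One connecting edge per known relationship, labelled with its type.
    edges = defaultdict(list)
    for source, relation_type, target in relations:
        if source not in nodes or target not in nodes:
            raise KeyError(f"unknown entity in relation: {source} -> {target}")
        edges[source].append((relation_type, target))
    return nodes, dict(edges)


# Entities and relations loosely following path 1 and path 2 of fig. 6.
entities = [("A", "natural person"), ("B", "enterprise"), ("C", "enterprise"),
            ("D", "enterprise"), ("E", "enterprise"), ("M", "enterprise")]
relations = [("A", "P1", "B"), ("B", "P2", "C"), ("C", "P3", "D"),
             ("A", "P2", "E"), ("E", "P3", "M"), ("M", "P4", "D")]
nodes, edges = build_knowledge_graph(entities, relations)
```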
In some embodiments, the determining a plurality of paths from the first target object to the second target object according to the knowledge graph of the target enterprise may include the following steps: according to the relationship type of the preset relationship, traversing paths which are connected through the known relationship and are from the starting node to the ending node in the knowledge graph of the target enterprise by taking the first target object as the starting node and the second target object as the ending node in the knowledge graph of the target enterprise.
In some embodiments, in order to reduce the amount of computation and improve the processing efficiency, some paths with longer path length and weaker relevance may be filtered in advance when finding the path. Specifically, a path length threshold may be set according to the precision requirement; and according to the relationship type of the preset relationship, when traversing and searching a path from the starting node to the ending node in the target enterprise knowledge graph, only traversing and searching a path with the path length less than or equal to the length threshold value of the path to obtain a plurality of paths meeting the requirements. Thereby effectively reducing the number of paths to be traversed.
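A length-limited traversal of this kind can be sketched as a depth-first search that stops extending a path once it reaches the length threshold. The adjacency-list format, the function name `find_paths`, and the convention of measuring path length in edges are assumptions for illustration only.

```python
def find_paths(edges, start, end, max_length):
    """Enumerate simple paths from start to end with at most max_length edges.

    edges maps a node to a list of (relation_type, neighbor) pairs.
    Each returned path alternates nodes and relation types,
    e.g. ("A", "P1", "B", "P2", "C", "P3", "D").
    """
    paths = []

    def dfs(node, path, visited, hops):
        if node == end:
            paths.append(tuple(path))
            return
        if hops == max_length:
            return  # path length threshold reached; prune this branch
        for relation_type, neighbor in edges.get(node, []):
            if neighbor in visited:
                continue  # keep paths simple (no repeated nodes)
            visited.add(neighbor)
            path.extend([relation_type, neighbor])
            dfs(neighbor, path, visited, hops + 1)
            del path[-2:]
            visited.remove(neighbor)

    dfs(start, [start], {start}, 0)
    return paths


# Two 3-edge paths as in fig. 6, plus a hypothetical 4-edge detour via X, Y, Z.
example_edges = {
    "A": [("P1", "B"), ("P2", "E"), ("P5", "X")],
    "B": [("P2", "C")], "C": [("P3", "D")],
    "E": [("P3", "M")], "M": [("P4", "D")],
    "X": [("P1", "Y")], "Y": [("P1", "Z")], "Z": [("P1", "D")],
}
short_paths = find_paths(example_edges, "A", "D", max_length=3)
```

With a threshold of 3 the detour is never fully explored, which is the reduction in traversal work that the paragraph describes.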
In some embodiments, the preset process model may include at least: a feature layer, an LSTM layer, an Attention layer and other network structures.
In some embodiments, the Attention layer in the preset processing model can be replaced by a Pooling layer based on the Attention mechanism. According to the calculation method used, the Pooling layer may be a maximum pooling layer (Max Pooling), a local mean pooling layer (Local Mean Pooling), a global mean pooling layer (Global Mean Pooling), and the like.
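For orientation, two of the pooling variants named above can be sketched as element-wise reductions over the per-path feature vectors. How pooling would be wired into the preset processing model is not specified here, so the functions below are purely illustrative and their names are assumptions.

```python
def max_pool(vectors):
    """Element-wise maximum over a list of equal-length feature vectors."""
    return [max(column) for column in zip(*vectors)]


def global_mean_pool(vectors):
    """Element-wise mean over a list of equal-length feature vectors."""
    return [sum(column) / len(column) for column in zip(*vectors)]


# Two hypothetical path feature vectors.
pooled_max = max_pool([[1.0, 4.0], [3.0, 2.0]])
pooled_mean = global_mean_pool([[1.0, 4.0], [3.0, 2.0]])
```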
In some embodiments, processing the plurality of paths by invoking a preset processing model, so as to determine whether the first target object and the second target object have a preset relationship as a processing result and to determine a target path associated with the processing result as interpretation data for the processing result, may include the following. The determined paths are input into the preset processing model as the model input. The preset processing model is run, and a plurality of network layers in the preset processing model are invoked to process the plurality of paths, so as to obtain, as the model output, a decision probability that the first target object and the second target object have the preset relationship, together with the path weight values of the plurality of paths determined and used by the preset processing model in the process of determining the decision probability. Further, whether the preset relationship exists between the first target object and the second target object can be determined as the processing result according to the decision probability in the model output; at the same time, according to the path weight value of each path, one or more paths with the strongest correlation with the processing result (or the decision probability) are screened out from the paths as target paths, and the target paths are used as interpretation data for the processing result.
In some embodiments, the processing the plurality of paths by invoking a preset processing model may include, in specific implementation: and calling a feature layer in the preset processing model to perform feature processing on the plurality of paths respectively so as to obtain a plurality of groups of path features corresponding to the plurality of paths respectively. Wherein the path characteristics include at least one of: the identity of the entity objects on the path, the entity type of the entity objects, the relationship type of known relationships between the entity objects, and the like. Wherein each set of path features corresponds to a path.
The path features listed above are only illustrative. In specific implementation, other types of path features may be introduced as the case may be, for example, entity characteristics of the entity objects.
In some embodiments, the entity characteristics of the entity object on the path may be used to replace the identity of the entity object on the path as a path characteristic to be input to a subsequent LSTM layer for processing.
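As a rough illustration of the path features listed above, the sketch below splits one path into the identity marks of its entity objects, their entity types, and the relationship types between them. The alternating node/relation path layout and the function name are assumptions; the actual feature layer would typically map these values to numeric embeddings.

```python
def extract_path_features(path, entity_types):
    """Return the three feature groups for one path.

    path alternates entity identities and relationship types,
    e.g. ["A", "P1", "B", "P2", "C", "P3", "D"] (assumed layout).
    entity_types maps an entity identity to its entity type.
    """
    entity_ids = list(path[0::2])       # identities of entity objects
    relation_types = list(path[1::2])   # relationship types between them
    return {
        "entity_ids": entity_ids,
        "entity_types": [entity_types[e] for e in entity_ids],
        "relation_types": relation_types,
    }


features = extract_path_features(
    ["A", "P1", "B", "P2", "C", "P3", "D"],
    {"A": "natural person", "B": "enterprise",
     "C": "enterprise", "D": "enterprise"},
)
```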
In some embodiments, in order to make the length of data input to the LSTM layer through the feature layer consistent, before invoking the feature layer in the preset processing model to perform feature processing on the plurality of paths, the method may further include the following steps: detecting whether a length of the plurality of paths is equal to a length threshold matching the LSTM layer. When the length of the path is determined to be greater than the length threshold, the path may be cut off first; in the event that the length of the path is determined to be less than the length threshold, a padding process (e.g., padding) is performed on the path. So that the lengths of the multiple paths are matched to subsequent LSTM layers.
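The length alignment described above can be sketched as a small helper. The padding token, the choice to keep the head of an over-long path, and the function name are assumptions made for illustration.

```python
PAD = "<PAD>"  # hypothetical padding token


def fit_path_length(path, target_length, pad_token=PAD):
    """Truncate or pad a path so that it has exactly target_length elements."""
    if len(path) > target_length:
        # Too long: cut off the tail (keeping the head is an assumption).
        return list(path[:target_length])
    # Too short (or already equal): append padding tokens as needed.
    return list(path) + [pad_token] * (target_length - len(path))


truncated = fit_path_length(["A", "P1", "B", "P2", "C"], 3)
padded = fit_path_length(["A", "P1", "B"], 5)
```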
In some embodiments, processing the plurality of paths by invoking a preset processing model may further include, in specific implementation: invoking an LSTM layer in the preset processing model to process the multiple groups of path features respectively, so as to obtain multiple feature vectors respectively corresponding to the multiple groups of path features. Each feature vector corresponds to one path, and the feature vectors can be understood as data that can be interpreted and processed by the Attention layer.
In some embodiments, the processing the plurality of paths by invoking a preset processing model may further include, in specific implementation: calling an Attention layer in the preset processing model to calculate the weighted values of a plurality of paths according to the plurality of feature vectors; calculating a judgment probability that a preset relation exists between the first target object and the second target object according to the paths and the weight values of the paths by using the Attention layer; and outputting the judgment probability and the weighted values of the paths as a model output of a preset processing model.
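One common way to realize such a step is dot-product attention followed by a sigmoid readout; the sketch below shows that variant. It is not stated in the specification that this exact formulation is used: the query vector, output weights, and bias stand in for learned parameters and are assumptions, as are all function names.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_readout(feature_vectors, query, out_weights, out_bias):
    """Return (decision_probability, path_weights) for the given paths.

    feature_vectors: one vector per path (e.g. produced by the LSTM layer).
    query, out_weights, out_bias: hypothetical learned parameters.
    """
    # Weight value of each path, normalized so the weights sum to 1.
    weights = softmax([dot(query, h) for h in feature_vectors])
    # Combine the paths according to their weights.
    dim = len(feature_vectors[0])
    context = [sum(w * h[i] for w, h in zip(weights, feature_vectors))
               for i in range(dim)]
    # Map the combined representation to a probability in (0, 1).
    probability = 1.0 / (1.0 + math.exp(-(dot(out_weights, context) + out_bias)))
    return probability, weights


probability, weights = attention_readout(
    feature_vectors=[[1.0, 0.5], [0.2, 0.1]],
    query=[1.0, 1.0], out_weights=[1.0, 1.0], out_bias=0.0,
)
```

Because the weights are produced inside the same computation that yields the probability, they can be returned alongside it as the model output.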
In some embodiments, in implementation, after outputting the decision probability and the weight values of the multiple paths, the method may further include: comparing the judgment probability with a preset probability threshold value to obtain a comparison result; and determining whether a preset relation exists between the first target object and the second target object according to the comparison result to obtain a corresponding processing result.
If it is determined that the decision probability is greater than or equal to the preset probability threshold according to the comparison result, it may be determined that a preset relationship exists between the first target object and the second target object. If the judgment probability is determined to be smaller than the preset probability threshold according to the comparison result, it can be determined that the preset relationship does not exist between the first target object and the second target object.
In some embodiments, in implementation, after outputting the decision probability and the weight values of the multiple paths, the method may further include: according to the weight values of the paths, one or more paths with the maximum weight values are screened out from the paths to serve as target paths related to the processing results; determining the target path as interpretation data for the processing result.
Specifically, one or more paths with the largest weight values may be selected from the multiple paths according to the weight values of the multiple paths, and the selected path is used as the target path. The paths with the weight values larger than or equal to the preset weight threshold value can be screened out from the paths according to the weight values of the paths, and the paths can be used as target paths.
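Both screening strategies mentioned above (largest weight values, or weight values at or above a preset weight threshold) can be sketched as follows. Function names and the dictionary layout are illustrative assumptions.

```python
def top_k_paths(path_weights, k=1):
    """Return the k path names with the largest weight values."""
    ranked = sorted(path_weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def paths_above_threshold(path_weights, weight_threshold):
    """Return path names whose weight is >= the preset weight threshold."""
    return [name for name, weight in path_weights.items()
            if weight >= weight_threshold]


example_weights = {"path 1": 0.8, "path 2": 0.2}
best = top_k_paths(example_weights, k=1)
strong = paths_above_threshold(example_weights, 0.5)
```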
In some embodiments, in implementation, the target path with a higher correlation with the processing result may be determined as interpretation data for supporting the processing result. Of course, the interpretation data for interpreting the processing result may also be generated based on the target path in combination with the business data related to the target enterprise.
In some embodiments, after determining whether the first target object and the second target object have a preset relationship as a processing result and a target path associated with the processing result as interpretation data for the processing result, the method further includes: and generating a data processing report about the target enterprise according to the interpretation data and the processing result. The data processing report of the target enterprise can be specifically used for characterizing the operation condition of the target enterprise, and/or the risk condition of the target enterprise, and/or the internal relationship of the target enterprise, and the like. Furthermore, the user can more comprehensively and accurately analyze the relevant conditions of the target enterprise according to the data processing report of the target enterprise.
In some embodiments, after obtaining the interpretation data, the user may use the interpretation data to determine the basis on which the preset processing model arrived at the processing result. The user can then understand why the preset processing model gave that processing result and learn the judgment logic inside the preset processing model, which avoids treating the preset processing model as a black box, so that the preset processing model can be better used for link prediction.
In some embodiments, after obtaining the interpretation data, the user may further use the interpretation data as a reference, and perform data processing in a related scenario more comprehensively and accurately according to the corresponding processing result. For example, in a manual auditing scenario, whether a target enterprise has a corresponding enterprise risk may be comprehensively determined according to the interpretation data and the processing result. As another example, in a customer appeal scenario, whether the target of the customer appeal needs to bear the corresponding responsibility may be comprehensively determined according to the interpretation data and the processing result.
In some embodiments, the preset relationship to be predicted between the first target object and the second target object may specifically include: the first target object is an enterprise beneficiary of the second target object. Of course, the preset relationship listed above is only illustrative. In specific implementation, the preset relationship may further include relationships of other relationship types according to specific application scenarios and processing requirements. For example, the preset relationship may be that the first target object is a parent company of the second target object, that the first target object is a creditor of the second target object, or the like.
In some embodiments, before implementation, model training may be performed according to sample data to establish a preset processing model.
In some embodiments, the preset processing model may be specifically established as follows.
S1: acquiring sample data, and determining a type label of the sample data according to the sample type of the sample data; wherein the sample types include positive samples and negative samples.
S2: and constructing a corresponding sample knowledge graph according to the sample data, and determining a plurality of sample paths from the sample knowledge graph.
S3: training an initial processing model according to the type label of the sample data and the plurality of sample paths to obtain a preset processing model meeting requirements; wherein the initial processing model comprises at least an initial LSTM layer and an initial Attention layer.
In some embodiments, in specific implementation, corresponding positive samples and negative samples may be obtained as sample data according to a preset relationship for a preset processing model to be trained.
Specifically, for example, the preset relationship for the preset processing model to be trained is that the first target object is the enterprise beneficiary of the second target object.
In this case, business data related to a target enterprise B in which, for example, natural person A is not an enterprise beneficiary of target enterprise B may be collected in a targeted manner as negative sample data, and a type label indicating a negative sample may be set on the sample data. Business data related to a target enterprise D in which, for example, enterprise C is an enterprise beneficiary of target enterprise D is collected as positive sample data, and a type label indicating a positive sample is set on the sample data.
In some embodiments, in specific implementation, a corresponding knowledge graph of an enterprise may be constructed according to the sample data, and the constructed knowledge graph is used as a sample knowledge graph. And determining a plurality of sample paths according to the sample knowledge graph.
In some embodiments, in determining the sample path, it is considered that a directly found path may be absent from a sample knowledge-graph constructed based on negative sample data. For example, a path from a starting node C (corresponding business C) to an ending node D (corresponding target business D) connected by a set of edges corresponding to known relationships may not be found in the sample knowledge graph. In this case, the start node and the end node may be fixed, and then a path from the start node to the end node may be found as a sample path by means of a jump (e.g., an N-degree jump).
In some embodiments, the initial processing model may specifically include network structures such as an initial feature layer, an initial LSTM layer, and an initial Attention layer.
In some embodiments, the training of the initial processing model according to the type label of the sample data and the plurality of sample paths may include the following steps: inputting a plurality of sample paths corresponding to one piece of sample data into the initial processing model to obtain a corresponding probability value and the weight values of the plurality of sample paths; determining a processing result according to the probability value; calculating a loss function according to the type label of the sample data, the processing result, and the weight values of the plurality of sample paths; and, according to the loss function, adjusting the network parameters in the initial processing model in a targeted manner, such as the network parameters of the feature layer, the network parameters of the LSTM layer, and the network parameters of the Attention layer, until the value of the loss function is smaller than a preset error value.
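The stopping rule described above (adjust parameters until the loss falls below a preset error value) can be illustrated with a deliberately tiny stand-in. The sketch assumes a binary cross-entropy loss and replaces the feature/LSTM/Attention network with a two-parameter logistic model over one hypothetical score per sample; neither choice comes from the specification, and the sketch ignores the path weight values that the described loss also takes into account.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(labels, probabilities, eps=1e-12):
    """Mean binary cross-entropy (assumed loss; labels are 1 or 0)."""
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(labels, probabilities)) / len(labels)


def train(scores, labels, error_threshold=0.1, learning_rate=0.5, max_steps=1000):
    """Gradient descent on a toy model p = sigmoid(w * score + b)."""
    w, b = 0.0, 0.0
    loss = float("inf")
    for _ in range(max_steps):
        probs = [sigmoid(w * x + b) for x in scores]
        loss = bce_loss(labels, probs)
        if loss < error_threshold:
            break  # loss is below the preset error value: stop adjusting
        n = len(scores)
        grad_w = sum((p - y) * x for p, y, x in zip(probs, labels, scores)) / n
        grad_b = sum(p - y for p, y in zip(probs, labels)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss


# One positive sample (label 1) and one negative sample (label 0).
w, b, final_loss = train(scores=[2.0, -2.0], labels=[1, 0])
```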
According to the method, the initial processing model can be trained and adjusted for multiple times by utilizing a plurality of sample paths corresponding to a plurality of sample data, so that the preset processing model with the accuracy meeting the requirement is obtained.
As can be seen from the above, in the data processing method provided in the embodiments of the present specification, a plurality of paths between a first target object and a second target object, which are to be predicted to have a preset relationship, are determined according to a target enterprise knowledge graph constructed based on business data related to the target enterprise; and processing the plurality of paths by using a preset processing model, determining whether a preset relation exists between the first target object and the second target object according to the model output to serve as a processing result, and determining and utilizing a target path related to the processing result to serve as interpretation data for the processing result. Therefore, the processing result of whether the first target object and the second target object have the preset relationship can be determined, and simultaneously, more accurate and reasonable interpretation data about the processing result with higher reference value can be obtained.
Referring to fig. 7, an embodiment of the present specification further provides a data processing method. When the method is implemented, the following contents may be included.
S701: and acquiring target service data.
S702: constructing a target knowledge graph according to the target service data; wherein the target knowledge-graph comprises a plurality of data objects including at least a first target object and a second target object, and a plurality of known relationships between the data objects.
S703: and determining a plurality of paths leading from the first target object to the second target object according to the target knowledge graph.
S704: and processing the plurality of paths by calling a preset processing model to determine whether a preset relationship exists between the first target object and the second target object as a processing result, and determining a target path associated with the processing result as interpretation data for the processing result.
In some embodiments, the target service data may specifically be service data that includes at least a plurality of data objects, such as the first target object and the second target object, together with features of the data objects and relationships among the plurality of data objects.
In some embodiments, the data object may specifically include at least one of: entity objects, account objects, merchandise objects, and so forth. Of course, the above listed data objects are only illustrative. In specific implementation, the method can also be applied to other types of data objects according to specific application scenarios and processing requirements. The present specification is not limited to these.
By the method, the processing result of whether the first target object and the second target object in the data object have the preset relationship can be determined efficiently, and meanwhile, more accurate and reliable interpretation data for supporting the processing result can be obtained.
Referring to fig. 8, an embodiment of the present disclosure further provides a data processing method. When the method is implemented, the following contents may be included.
S801: acquiring sample data, and determining a type label of the sample data according to the sample type of the sample data; wherein the sample types include positive samples and negative samples;
S802: constructing a corresponding sample knowledge graph according to the sample data, and determining a plurality of sample paths from the sample knowledge graph;
S803: training an initial processing model according to the type label of the sample data and the plurality of sample paths to obtain a preset processing model meeting requirements; wherein the initial processing model comprises at least an initial LSTM layer and an initial Attention layer; the preset processing model is used for determining, according to a knowledge graph, the decision probability that a first target object and a second target object in the knowledge graph have a preset relationship, and the weight values of a plurality of paths leading from the first target object to the second target object in the knowledge graph.
By the method, a processing model can be obtained through training that outputs a probability value for determining whether a preset relationship exists between the first target object and the second target object and, at the same time, outputs the weight values of the multiple paths determined and used in the process of determining the probability value. This helps a user determine, according to the probability value, the processing result of whether the preset relationship exists between the first target object and the second target object, and screen out from the multiple paths, according to their weight values, a target path with high relevance to the processing result to serve as interpretation data supporting the processing result.
Embodiments of the present specification further provide a server, including a processor and a memory for storing processor-executable instructions, where the processor, when implemented, may perform the following steps according to the instructions: acquiring business data related to a target enterprise; constructing a knowledge graph of the target enterprise according to the business data related to the target enterprise; wherein the target business's knowledge-graph includes a plurality of entity objects related to the target business, including at least a first target object and a second target object, and known relationships between the entity objects; determining a plurality of paths from the first target object to the second target object according to the knowledge graph of the target enterprise; and processing the plurality of paths by calling a preset processing model to determine whether a preset relationship exists between the first target object and the second target object as a processing result, and determining a target path associated with the processing result as interpretation data for the processing result.
In order to more accurately complete the above instructions, referring to fig. 9, another specific server is provided in the embodiments of the present specification, where the server includes a network communication port 901, a processor 902, and a memory 903, and the above structures are connected by an internal cable, so that the structures may perform specific data interaction.
The network communication port 901 may be specifically configured to obtain business data related to a target enterprise.
The processor 902 may be specifically configured to construct a knowledge graph of the target enterprise according to the service data related to the target enterprise; wherein the target business's knowledge-graph includes a plurality of entity objects related to the target business, including at least a first target object and a second target object, and known relationships between the entity objects; determining a plurality of paths from the first target object to the second target object according to the knowledge graph of the target enterprise; and processing the plurality of paths by calling a preset processing model to determine whether a preset relationship exists between the first target object and the second target object as a processing result, and determining a target path associated with the processing result as interpretation data for the processing result.
The memory 903 may be specifically configured to store a corresponding instruction program.
In this embodiment, the network communication port 901 may be a virtual port that is bound to different communication protocols, so that different data can be sent or received. For example, the network communication port may be a port responsible for web data communication, a port responsible for FTP data communication, or a port responsible for mail data communication. In addition, the network communication port can also be a physical communication interface or communication chip. For example, it may be a wireless mobile network communication chip, such as a GSM or CDMA chip; it may also be a Wi-Fi chip; it may also be a Bluetooth chip.
In this embodiment, the processor 902 may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. The description is not intended to be limiting.
In this embodiment, the memory 903 may exist at multiple levels. In a digital system, anything that can store binary data may be a memory; in an integrated circuit, a circuit that has a storage function but no physical form is also called a memory, such as a RAM or a FIFO; in a system, a storage device in physical form is also called a memory, such as a memory bank or a TF card.
The present specification further provides a computer storage medium based on the above data processing method, where the computer storage medium stores computer program instructions, and when the computer program instructions are executed, the computer storage medium implements: acquiring business data related to a target enterprise; constructing a knowledge graph of the target enterprise according to the business data related to the target enterprise; wherein the target business's knowledge-graph includes a plurality of entity objects related to the target business, including at least a first target object and a second target object, and known relationships between the entity objects; determining a plurality of paths from the first target object to the second target object according to the knowledge graph of the target enterprise; and processing the plurality of paths by calling a preset processing model to determine whether a preset relationship exists between the first target object and the second target object as a processing result, and determining a target path associated with the processing result as interpretation data for the processing result.
In this embodiment, the storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a cache, a Hard Disk Drive (HDD), or a memory card. The memory may be used to store computer program instructions. The network communication unit may be an interface for network communication that is configured in accordance with a standard prescribed by a communication protocol.
In this embodiment, the functions and effects specifically realized by the program instructions stored in the computer storage medium can be understood with reference to the other embodiments, and are not described again here.
Referring to fig. 10, at the software level, an embodiment of the present specification further provides a data processing apparatus, which may specifically include the following structural modules.
The obtaining module 1001 may be specifically configured to obtain business data related to a target enterprise.
The constructing module 1002 may be specifically configured to construct a knowledge graph of the target enterprise according to the business data related to the target enterprise; wherein the knowledge graph of the target enterprise includes a plurality of entity objects related to the target enterprise, including at least a first target object and a second target object, and known relationships between the entity objects.
The first determining module 1003 may be specifically configured to determine, according to the knowledge graph of the target enterprise, a plurality of paths from the first target object to the second target object.
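By way of illustration only, the path determination performed by the first determining module could be sketched as a depth-first enumeration of simple paths. The sketch assumes the knowledge graph is held as an adjacency list of (relationship type, neighbor) pairs and that paths are bounded by a maximum hop count; the function name `find_paths`, the parameter `max_hops`, and the example entities are hypothetical and are not taken from this specification.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical adjacency list: node -> list of (relation_type, neighbor).
Graph = Dict[str, List[Tuple[str, str]]]


def find_paths(graph: Graph, start: str, goal: str, max_hops: int = 4) -> List[List[str]]:
    """Enumerate simple paths from start to goal using at most max_hops edges.

    Each path is a flat list alternating entity, relation, entity, ...
    """
    paths: List[List[str]] = []

    def dfs(node: str, trail: List[str], visited: Set[str], hops: int) -> None:
        if node == goal:
            paths.append(list(trail))
            return
        if hops == max_hops:
            return  # hop budget exhausted without reaching the goal
        for relation, neighbor in graph.get(node, []):
            if neighbor in visited:
                continue  # keep paths simple (no repeated entities)
            visited.add(neighbor)
            trail.extend([relation, neighbor])
            dfs(neighbor, trail, visited, hops + 1)
            del trail[-2:]  # backtrack
            visited.remove(neighbor)

    dfs(start, [start], {start}, 0)
    return paths


# Illustrative data only.
example_graph: Graph = {
    "person_A": [("shareholder_of", "company_B"), ("director_of", "company_C")],
    "company_B": [("controls", "target_enterprise")],
    "company_C": [("invests_in", "target_enterprise")],
}
example_paths = find_paths(example_graph, "person_A", "target_enterprise")
```

Bounding the hop count is one possible way to keep the number of candidate paths manageable on a dense graph; other pruning strategies would serve equally well.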
The second determining module 1004 may be specifically configured to process the multiple paths by invoking a preset processing model, so as to determine whether a preset relationship exists between the first target object and the second target object as a processing result, and determine a target path associated with the processing result as interpretation data for the processing result.
In some embodiments, the preset processing model may include at least: a feature layer, an LSTM layer, an Attention layer, and the like.
In some embodiments, the second determining module 1004 may be specifically configured to invoke a feature layer in the preset processing model to perform feature processing on the multiple paths respectively, so as to obtain multiple sets of path features corresponding to the multiple paths respectively; wherein the path features include at least one of: the identities of the entity objects on the path, the entity types of the entity objects, the relationship types of the known relationships between the entity objects, and the like.
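As a minimal sketch of the feature processing described above, each step on a path could be encoded as a triple of integer indices covering the identity of the entity object, its entity type, and the relationship type leading to the next entity. The `Vocabulary` class, the function `extract_path_features`, and the convention of reserving index 0 for a missing relationship are assumptions made for illustration, not the feature layer of this specification.

```python
from typing import Dict, List, Optional, Tuple

# One path step: (entity identity, entity type, relation to the next entity).
# The final step of a path has no outgoing relation, represented as None.
Step = Tuple[str, str, Optional[str]]


class Vocabulary:
    """Assigns a stable integer index to each distinct token; 0 means 'none'."""

    def __init__(self) -> None:
        self.index: Dict[str, int] = {}

    def encode(self, token: Optional[str]) -> int:
        if token is None:
            return 0
        if token not in self.index:
            self.index[token] = len(self.index) + 1
        return self.index[token]


def extract_path_features(
    path: List[Step],
    entity_vocab: Vocabulary,
    type_vocab: Vocabulary,
    relation_vocab: Vocabulary,
) -> List[Tuple[int, int, int]]:
    """Turn one path into a sequence of (entity, entity type, relation) indices."""
    return [
        (entity_vocab.encode(e), type_vocab.encode(t), relation_vocab.encode(r))
        for e, t, r in path
    ]


# Illustrative data only.
example_path: List[Step] = [
    ("person_A", "natural_person", "shareholder_of"),
    ("company_B", "company", "controls"),
    ("target_enterprise", "company", None),
]
example_features = extract_path_features(
    example_path, Vocabulary(), Vocabulary(), Vocabulary()
)
```

In a trained model, such indices would typically be mapped to learned embedding vectors before being passed to the sequence layer.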
In some embodiments, the second determining module 1004 may be further configured to call an LSTM layer in the preset processing model to respectively process the multiple sets of path features, so as to obtain multiple feature vectors respectively corresponding to the multiple sets of path features.
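For illustration, the way an LSTM layer reduces one set of path features to a single feature vector can be sketched with a standard LSTM cell written in plain Python. The class `TinyLSTM`, its random untrained weights, and the example inputs are assumptions; a practical implementation would normally use a trained layer from a deep learning framework rather than this hand-written cell.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """A single-layer LSTM with small random (untrained) weights."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        width = input_size + hidden_size
        self.hidden_size = hidden_size
        # One weight matrix and bias per gate:
        # i = input gate, f = forget gate, g = candidate state, o = output gate.
        self.weights: Dict[str, List[List[float]]] = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                   for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.biases: Dict[str, List[float]] = {
            gate: [0.0] * hidden_size for gate in "ifgo"
        }

    def _gate(self, name: str, z: List[float],
              activation: Callable[[float], float]) -> List[float]:
        return [
            activation(sum(w * v for w, v in zip(row, z)) + b)
            for row, b in zip(self.weights[name], self.biases[name])
        ]

    def encode(self, sequence: Sequence[Sequence[float]]) -> List[float]:
        """Run the sequence through the cell and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            z = list(x) + h  # concatenate current input and previous hidden state
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            o = self._gate("o", z, sigmoid)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


# Illustrative path features (one numeric triple per step on the path).
example_steps = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 2.0, 0.0]]
example_vector = TinyLSTM(input_size=3, hidden_size=4).encode(example_steps)
```

The final hidden state serves here as the feature vector of the path; each path would be encoded independently in the same way.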
In some embodiments, the second determining module 1004 may be further configured to: call an Attention layer in the preset processing model to calculate weight values of the plurality of paths according to the plurality of feature vectors; calculate, by using the Attention layer, a decision probability that the preset relationship exists between the first target object and the second target object according to the plurality of paths and the weight values of the plurality of paths; and output the decision probability and the weight values of the plurality of paths.
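One common way to realize such a computation, given here only as a sketch under stated assumptions, is dot-product attention: each path vector is scored against a query vector, the scores are normalized with softmax to give the path weights, and the weighted sum of path vectors is passed through a logistic output to give the decision probability. The query vector, the output weights, and the function names below are hypothetical; the specification does not fix the exact form of the Attention layer.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_decision(
    vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    out_weights: Sequence[float],
    out_bias: float = 0.0,
) -> Tuple[float, List[float]]:
    """Return (decision probability, per-path weight values)."""
    # Score each path vector against the (assumed, learned) query vector.
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    weights = softmax(scores)
    # Weighted sum of the path vectors.
    dim = len(vectors[0])
    context = [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]
    # Logistic output layer over the context vector.
    logit = sum(a * c for a, c in zip(out_weights, context)) + out_bias
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, weights


# Illustrative values only.
example_vectors = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
example_probability, example_weights = attention_decision(
    example_vectors, query=[2.0, 0.0], out_weights=[1.0, 1.0]
)
```

Because the weights sum to one, each weight can be read as the relative contribution of its path to the decision, which is what makes the weights usable as interpretation data.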
In some embodiments, the second determining module 1004 may be further configured to: screen out, according to the weight values of the multiple paths, the one or more paths with the largest weight values from the multiple paths as target paths associated with the processing result; and use the target paths as the interpretation data for the processing result.
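The screening step can be sketched as a simple ranking by weight value. The function name `select_target_paths`, the parameter `top_k`, and the example paths and weights are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def select_target_paths(
    paths: Sequence[str], weights: Sequence[float], top_k: int = 1
) -> List[Tuple[str, float]]:
    """Return the top_k (path, weight) pairs, ordered by descending weight."""
    if len(paths) != len(weights):
        raise ValueError("paths and weights must have the same length")
    ranked = sorted(zip(paths, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Illustrative data only.
candidate_paths = [
    "person_A -director_of-> company_C -invests_in-> target_enterprise",
    "person_A -shareholder_of-> company_B -controls-> target_enterprise",
    "person_A -employee_of-> company_D -supplies-> target_enterprise",
]
candidate_weights = [0.2, 0.7, 0.1]
interpretation = select_target_paths(candidate_paths, candidate_weights, top_k=2)
```

Returning the weight alongside each path lets the user see how strongly each selected path supported the processing result.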
In some embodiments, the constructing module 1002 may be specifically configured to: determine, according to the business data related to the target enterprise, the entity objects related to the target enterprise, the identities of the entity objects, the known relationships between the entity objects, and the relationship types of the known relationships; establish nodes according to the identities of the entity objects; and establish connecting edges between the nodes according to the known relationships between the entity objects and the relationship types of the known relationships, so as to obtain the knowledge graph of the target enterprise.
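A minimal sketch of this construction follows, assuming the entity objects and known relationships have already been extracted from the business data. The `KnowledgeGraph` class, the function `build_knowledge_graph`, and the example records are hypothetical and serve only to show nodes keyed by identity and edges labeled with a relationship type.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class KnowledgeGraph:
    # identity of entity object -> entity type
    nodes: Dict[str, str] = field(default_factory=dict)
    # directed, typed edges: (source identity, relation type, target identity)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, identity: str, entity_type: str) -> None:
        self.nodes[identity] = entity_type

    def add_relationship(self, source: str, relation_type: str, target: str) -> None:
        # Edges may only connect nodes that have already been established.
        for identity in (source, target):
            if identity not in self.nodes:
                raise KeyError(f"unknown entity: {identity}")
        self.edges.append((source, relation_type, target))


def build_knowledge_graph(
    entities: Iterable[Tuple[str, str]],
    relationships: Iterable[Tuple[str, str, str]],
) -> KnowledgeGraph:
    """Establish nodes first, then connecting edges between them."""
    graph = KnowledgeGraph()
    for identity, entity_type in entities:
        graph.add_entity(identity, entity_type)
    for source, relation_type, target in relationships:
        graph.add_relationship(source, relation_type, target)
    return graph


# Illustrative records only.
example_kg = build_knowledge_graph(
    entities=[
        ("person_A", "natural_person"),
        ("company_B", "company"),
        ("target_enterprise", "company"),
    ],
    relationships=[
        ("person_A", "shareholder_of", "company_B"),
        ("company_B", "controls", "target_enterprise"),
    ],
)
```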
In some embodiments, the apparatus may further include a training module, which may be configured to establish the preset processing model in the following manner: acquiring sample data, and determining a type label of the sample data according to the sample type of the sample data, wherein the sample types include positive samples and negative samples; constructing a corresponding sample knowledge graph according to the sample data, and determining a plurality of sample paths from the sample knowledge graph; and training an initial processing model according to the type label of the sample data and the plurality of sample paths to obtain a preset processing model meeting requirements; wherein the initial processing model includes at least an initial LSTM layer and an initial Attention layer.
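As an illustrative sketch of the labeling step and of a training objective, positive and negative samples could be mapped to the labels 1 and 0, and the model's predicted probabilities compared with those labels using binary cross-entropy. The label values, the choice of binary cross-entropy, and the function names are assumptions; the specification does not state which loss function is used.

```python
import math
from typing import Sequence

# Assumed label convention: positive sample -> 1, negative sample -> 0.
TYPE_LABELS = {"positive": 1, "negative": 0}


def type_label(sample_type: str) -> int:
    """Determine the type label of a sample from its sample type."""
    return TYPE_LABELS[sample_type]


def binary_cross_entropy(
    probabilities: Sequence[float], labels: Sequence[int], eps: float = 1e-12
) -> float:
    """Mean binary cross-entropy between predicted probabilities and labels."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Illustrative values only.
example_labels = [type_label("positive"), type_label("negative")]
good_loss = binary_cross_entropy([0.9, 0.1], example_labels)
poor_loss = binary_cross_entropy([0.4, 0.6], example_labels)
```

Training would then adjust the parameters of the initial LSTM layer and the initial Attention layer to reduce this loss over the sample paths until the model meets the stated requirements.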
It should be noted that, the units, devices, modules, etc. illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. It is to be understood that, in implementing the present specification, functions of each module may be implemented in one or more pieces of software and/or hardware, or a module that implements the same function may be implemented by a combination of a plurality of sub-modules or sub-units, or the like. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
As can be seen from the above, the data processing apparatus provided in the embodiments of the present specification enables a user, while obtaining the processing result indicating whether the preset relationship exists between the first target object and the second target object, to also obtain interpretation data for that processing result that is more accurate, more reasonable, and of higher reference value.
Although the present specification provides method steps as described in the examples or flowcharts, additional or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one of many possible orders of execution and does not represent the only order of execution. When an actual apparatus or client product executes the steps, it may execute them sequentially or in parallel (e.g., in a parallel processor or multithreaded processing environment, or even in a distributed data processing environment) according to the embodiments or the methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded. Terms such as "first" and "second" are used to denote names and do not indicate any particular order.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be implemented by logically programming the method steps such that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be considered a hardware component, and the means included therein for performing the various functions may also be considered structures within the hardware component. Alternatively, the means for performing the functions may even be regarded as both software modules for performing the method and structures within a hardware component.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present specification can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the present specification may be essentially embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and which includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present specification.
The embodiments in the present specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. This specification may be used in numerous general purpose or special purpose computing system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
While the specification has been described with examples, those skilled in the art will appreciate that there are numerous variations and permutations of the specification that do not depart from the spirit of the specification, and it is intended that the appended claims include such variations and modifications that do not depart from the spirit of the specification.