CN116976902A

CN116976902A - Data processing method, device, equipment and readable storage medium

Info

Publication number: CN116976902A
Application number: CN202310197894.7A
Authority: CN
Inventors: 陈萍
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2023-02-22
Filing date: 2023-02-22
Publication date: 2023-10-31

Abstract

The application discloses a data processing method, apparatus, device, and readable storage medium. The method includes: performing graph structure optimization processing on a sample heterogeneous network graph through an optimized graph structure learning model to obtain an attribute result for a sample service object set and an estimated isomorphic network graph; adjusting the parameters of the optimized graph structure learning model according to the attribute labels for the sample service object set and the attribute result to obtain an adjusted optimized graph structure learning model, where the estimated isomorphic network graph is used for continued graph structure optimization processing in the next training iteration when the adjusted optimized graph structure learning model does not meet the convergence condition; and, if the adjusted optimized graph structure learning model meets the convergence condition, determining the node attribute prediction layer in the adjusted optimized graph structure learning model as a node attribute prediction model. The application can improve the identification coverage and identification accuracy for illegal service objects.

Description

Data processing method, device, equipment and readable storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and readable storage medium.

Background

In the commercial system of online merchant transactions, illegal merchants exist, and the illegal merchants seriously influence the data security of users, so that the identification of the illegal merchants is important.

In existing technical schemes for identifying illegal merchants, abnormal transactions are usually picked out of massive transaction flow information by means of identification rules set from manual experience; the sources of the abnormal transactions are then identified and analyzed so that the illegal merchant accounts generating them are marked, and merchant accounts associated with the marked accounts can be quickly identified. However, the number of identification rules that can be set from manual experience is very limited, so the number of illegal merchant accounts that can be determined is also limited. Moreover, the rules are easily discovered, and illegal merchants can frequently change merchant accounts to evade them, so the identification coverage and accuracy for illegal merchant accounts are low.

Disclosure of Invention

The embodiment of the application provides a data processing method, a device, equipment and a readable storage medium, which can improve the identification coverage rate and the identification accuracy rate of illegal service objects.

In one aspect, an embodiment of the present application provides a data processing method, including:

Acquiring a sample heterogeneous network diagram aiming at a sample service object set; nodes in the sample heterogeneous network graph respectively correspond to different sample service objects in the sample service object set; the edge weight of the connecting edge between the two nodes is used for representing the association degree of the two sample business objects;

carrying out graph structure optimization processing on the sample heterogeneous network graph through an optimized graph structure learning model to obtain an attribute result aiming at a sample service object set and an estimated isomorphic network graph generated based on the attribute result;

parameter adjustment is carried out on the optimized graph structure learning model according to the attribute labels for the sample service object set and the attribute result, and an adjusted optimized graph structure learning model is obtained; the estimated isomorphic network graph is used for continued graph structure optimization processing in the next training iteration when the adjusted optimized graph structure learning model does not meet the convergence condition;

if the adjusted optimization graph structure learning model meets the convergence condition, determining a node attribute prediction layer in the adjusted optimization graph structure learning model as a node attribute prediction model; the node attribute prediction model is used for performing attribute prediction processing on the target heterogeneous network graph of the target service object set to obtain a target attribute prediction result aiming at the target service object set.
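The iteration described in the steps above can be sketched as a simple loop. This is only an illustrative sketch under stated assumptions: the function `train`, the class `ToyModel`, its methods `optimize`, `adjust`, `converged` and `prediction_layer`, and the `max_iters` cap are hypothetical names that do not appear in the application, and the toy model merely stands in for the optimized graph structure learning model.

```python
class ToyModel:
    """Hypothetical stand-in model that 'converges' after a fixed number of adjustments."""

    def __init__(self, steps_to_converge):
        self.steps_to_converge = steps_to_converge
        self.adjustments = 0
        self.seen_graphs = []

    def optimize(self, graph):
        # Graph structure optimization: returns an attribute result and an estimated graph.
        self.seen_graphs.append(graph)
        attribute_result = {"predicted": "placeholder"}
        estimated_graph = f"estimated({graph})"
        return attribute_result, estimated_graph

    def adjust(self, labels, attribute_result):
        # Parameter adjustment based on labels and the attribute result (placeholder).
        self.adjustments += 1

    def converged(self):
        return self.adjustments >= self.steps_to_converge

    def prediction_layer(self):
        return "node-attribute-prediction-layer"


def train(model, sample_graph, labels, max_iters=100):
    """Iterate optimize -> adjust until the model reports convergence."""
    graph = sample_graph
    for _ in range(max_iters):
        attribute_result, estimated_graph = model.optimize(graph)
        model.adjust(labels, attribute_result)
        if model.converged():
            # On convergence, the prediction layer becomes the prediction model.
            return model.prediction_layer()
        # Otherwise the estimated graph is the input of the next training iteration.
        graph = estimated_graph
    raise RuntimeError("model did not converge within max_iters")
```

The point of the sketch is the data flow: the first iteration consumes the sample heterogeneous network graph, and every later iteration consumes the graph estimated by the previous one.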

In one aspect, an embodiment of the present application provides a data processing apparatus, including:

the acquisition module is used for acquiring a sample heterogeneous network diagram aiming at the sample service object set; nodes in the sample heterogeneous network graph respectively correspond to different sample service objects in the sample service object set; the edge weight of the connecting edge between the two nodes is used for representing the association degree of the two sample business objects;

the optimization module is used for carrying out graph structure optimization processing on the sample heterogeneous network graph through the optimization graph structure learning model to obtain an attribute result aiming at the sample service object set and an estimated isomorphic network graph generated based on the attribute result;

the adjustment module is used for carrying out parameter adjustment on the optimized graph structure learning model according to the attribute labels for the sample service object set and the attribute result to obtain an adjusted optimized graph structure learning model; the estimated isomorphic network graph is used for continued graph structure optimization processing in the next training iteration when the adjusted optimized graph structure learning model does not meet the convergence condition;

the determining module is used for determining a node attribute prediction layer in the adjusted optimization graph structure learning model as a node attribute prediction model if the adjusted optimization graph structure learning model meets the convergence condition; the node attribute prediction model is used for performing attribute prediction processing on the target heterogeneous network graph of the target service object set to obtain a target attribute prediction result aiming at the target service object set.

Wherein the acquisition module includes:

the information acquisition unit is used for acquiring at least two pieces of history service information;

the object set generating unit is used for generating a historical service object set according to the historical service objects contained in the at least two pieces of historical service information;

the trusted filtering unit is used for performing trusted filtering processing on the historical service object set to obtain a sample service object set;

the correlation determining unit is used for determining the degree of correlation between every two sample service objects in the sample service object set according to the historical service information associated with the sample service object set in the at least two historical service information;

and the diagram construction unit is used for constructing a sample heterogeneous network diagram aiming at the sample service object set according to the sample service object set and the association degree.
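As a rough illustration of the graph construction unit, the sketch below builds a weighted edge map from a sample service object set and a pairwise association function. The names `build_sample_graph` and `association` are hypothetical, and omitting pairs whose association degree is zero is an assumption made here for compactness, not something the application states.

```python
from itertools import combinations


def build_sample_graph(sample_objects, association):
    """Return {(a, b): weight} for each unordered pair with a positive association degree."""
    edges = {}
    for a, b in combinations(sorted(sample_objects), 2):
        weight = association(a, b)  # association degree between two sample objects
        if weight > 0:
            edges[(a, b)] = weight  # edge weight represents the association degree
    return edges
```

For example, with three objects of which only `m1` and `m2` are associated, the result contains the single edge `("m1", "m2")`.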

Wherein the trusted filter unit comprises:

an object determination subunit, configured to obtain a trusted service object set; the set of trusted business objects includes one or more trusted business objects;

the object determining subunit is further configured to determine, as the historical service object to be filtered, the historical service object in the set of historical service objects, which is the same as any one of the one or more trusted service objects;

And the deleting subunit is used for deleting the historical service object to be filtered from the historical service object set to obtain a sample service object set.
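The trusted filtering described above amounts to removing every historical service object that also appears in the trusted set. A minimal sketch follows, assuming service objects can be compared by a hashable identifier; the function name `trusted_filter` is hypothetical.

```python
def trusted_filter(historical_objects, trusted_objects):
    """Drop historical service objects that match any trusted service object."""
    trusted = set(trusted_objects)
    # Objects found in the trusted set are the "objects to be filtered"; the rest are kept.
    return [obj for obj in historical_objects if obj not in trusted]
```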

The sample service object set comprises a sample service object M and a sample service object N; the historical service information comprises service attribute information corresponding to a historical service object;

an association determination unit including:

an information obtaining subunit, configured to obtain, from at least two pieces of history service information, service attribute information associated with the sample service object M as first service attribute information;

the information acquisition subunit is further configured to acquire, from at least two pieces of historical service information, service attribute information associated with the sample service object N as second service attribute information;

and the first determining subunit is used for determining the association degree between the sample service object M and the sample service object N according to the coincidence degree of the first service attribute information and the second service attribute information.
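The application does not state how the coincidence degree of the two pieces of service attribute information is measured. One possible reading, shown purely as an assumption, is the Jaccard overlap between the two attribute sets; the function name `coincidence_degree` and the string encoding of attributes are hypothetical.

```python
def coincidence_degree(first_attrs, second_attrs):
    """Jaccard overlap of two attribute collections (an assumed measure, not the patent's)."""
    first, second = set(first_attrs), set(second_attrs)
    union = first | second
    if not union:
        return 0.0  # no attributes at all: treat as no coincidence
    return len(first & second) / len(union)
```

Under this assumption, two objects that share one attribute out of three distinct attributes in total would have a coincidence degree of one third.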

The sample service object set comprises a sample service object M and a sample service object N; a history service information comprises a service interaction relation between a history service object and a service interaction object;

an association determination unit including:

The object acquisition subunit is used for determining a service interaction object with a service interaction relation with the sample service object M according to at least two pieces of historical service information, and taking the service interaction object with the service interaction relation with the sample service object M as a first service interaction object set;

the object acquisition subunit is further configured to determine, according to at least two pieces of historical service information, a service interaction object having a service interaction relationship with the sample service object N, and use the service interaction object having the service interaction relationship with the sample service object N as a second service interaction object set;

a calculating subunit, configured to determine, according to the first service interaction object set and the second service interaction object set, a service harmonic mean degree between the sample service object M and the sample service object N;

the second determining subunit is configured to determine, according to the service harmonic mean degree, the association degree between the sample service object M and the sample service object N if the service harmonic mean degree is greater than or equal to the harmonic mean threshold;

the second determining subunit is further configured to determine the default association degree as the association degree between the sample service object M and the sample service object N if the service harmonic mean degree is less than the harmonic mean threshold.

The calculating subunit is specifically configured to determine, as common service interaction objects, the service interaction objects that appear in both the first service interaction object set and the second service interaction object set; determine a first common object proportion according to the first service interaction object set and the common service interaction objects; determine a second common object proportion according to the second service interaction object set and the common service interaction objects; and determine the service harmonic mean degree between the sample service object M and the sample service object N according to the first common object proportion and the second common object proportion.

The calculating subunit is specifically further configured to determine the product of the first common object proportion and the second common object proportion as a first value; determine the sum of the first common object proportion and the second common object proportion as a second value; and determine the ratio of the first value to the second value as the service harmonic mean degree between the sample service object M and the sample service object N.
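The product-over-sum ratio described above can be sketched as follows. Several details are assumptions rather than statements from the application: each common object proportion is taken to be the number of common interaction objects divided by the size of the respective set, the association degree is taken to equal the ratio when the threshold is met, and the default association degree is taken to be 0.0. The function names are hypothetical.

```python
def product_over_sum_degree(first_set, second_set):
    """Ratio of the product to the sum of the two common object proportions."""
    first, second = set(first_set), set(second_set)
    common = first & second
    if not common:
        return 0.0  # no common interaction objects, so both proportions are zero
    p1 = len(common) / len(first)   # first common object proportion (assumed definition)
    p2 = len(common) / len(second)  # second common object proportion (assumed definition)
    return (p1 * p2) / (p1 + p2)


def association_degree(first_set, second_set, threshold, default=0.0):
    """Use the ratio when it reaches the threshold, otherwise the default degree."""
    degree = product_over_sum_degree(first_set, second_set)
    return degree if degree >= threshold else default
```

Note that the ratio as worded in the text is p1·p2 / (p1 + p2), which is half of the conventional harmonic mean 2·p1·p2 / (p1 + p2); the sketch follows the wording of the text.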

Wherein the attribute result comprises an attribute prediction result and an attribute observation result; the optimized graph structure learning model comprises a node attribute prediction layer, a node attribute observation layer and a graph estimation layer; the node attribute prediction layer comprises K node attribute prediction sublayers; K is a positive integer;

An optimization module comprising:

the prediction unit is used for carrying out attribute prediction processing on the sample heterogeneous network graph through the K node attribute prediction sublayers to obtain K node hidden-layer features and an attribute prediction result for the sample service object set;

the observation unit is used for carrying out attribute observation processing on the sample heterogeneous network graph and on the K node hidden-layer features, respectively, through the node attribute observation layer to obtain the attribute observation result;

the estimation unit is used for carrying out expectation-maximization estimation processing on the attribute prediction result, the attribute observation result and the attribute labels for the sample service object set through the graph estimation layer to obtain an estimated isomorphic adjacency matrix;

and the filtering unit is used for carrying out isomorphic filtering processing on the estimated isomorphic adjacency matrix according to the isomorphism threshold to obtain the estimated isomorphic network graph.

Wherein the estimated isomorphic adjacency matrix includes an estimated isomorphic adjacency value Q_ij; i is a positive integer less than or equal to the total number of nodes in the sample heterogeneous network graph; j is a positive integer less than or equal to the total number of nodes in the sample heterogeneous network graph; the estimated isomorphic adjacency value Q_ij represents the probability that a connecting edge exists between the i-th node and the j-th node in the sample heterogeneous network graph;

A filtering unit comprising:

an updating subunit, configured to traverse the estimated isomorphic adjacency matrix and obtain the estimated isomorphic adjacency value Q_ij;

the updating subunit is further configured to, if the estimated isomorphic adjacency value Q_ij is less than the isomorphism threshold, update the estimated isomorphic adjacency value Q_ij to the default isomorphic adjacency value; the default isomorphic adjacency value indicates that no connecting edge exists between the i-th node and the j-th node in the sample heterogeneous network graph;

and the graph determining subunit is used for determining the estimated isomorphic network graph according to the updated estimated isomorphic adjacency matrix when the traversal of the estimated isomorphic adjacency matrix is complete.
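The threshold filtering performed by the filtering unit can be illustrated with a plain nested-list matrix. In this sketch the default adjacency value is assumed to be 0.0, indices are 0-based rather than 1-based, and the helper names `filter_adjacency` and `edges_from` are hypothetical.

```python
def filter_adjacency(q, threshold, default=0.0):
    """Replace every entry below the threshold with the default (no-edge) value."""
    return [[value if value >= threshold else default for value in row] for row in q]


def edges_from(q, default=0.0):
    """List the (i, j) pairs with i < j whose entry differs from the default value."""
    n = len(q)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if q[i][j] != default]
```

A short usage example: with threshold 0.5, the entries 0.2 are reset to the default, so only the edges between nodes 0 and 1 and between nodes 1 and 2 remain.

```python
q = [[0.0, 0.9, 0.2],
     [0.9, 0.0, 0.5],
     [0.2, 0.5, 0.0]]
filtered = filter_adjacency(q, threshold=0.5)
remaining = edges_from(filtered)
```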

Wherein the adjustment module includes:

the first adjusting unit is used for adjusting parameters of the node attribute prediction layer based on the attribute prediction result and the attribute label to obtain an adjusted node attribute prediction layer;

the second adjusting unit is used for carrying out parameter adjustment on the graph estimation layer based on the attribute prediction result, the attribute observation result and the attribute label to obtain an adjusted graph estimation layer;

and the adjustment determining unit is used for determining the adjusted node attribute prediction layer, the adjusted graph estimation layer and the node attribute observation layer as an adjusted optimized graph structure learning model.

Wherein the above data processing apparatus further includes:

the model application module is used for acquiring at least two target business information associated with the target business object set;

the model application module is also used for constructing a target heterogeneous network diagram aiming at the target service object set according to at least two target service information;

the model application module is also used for carrying out attribute prediction processing on the target heterogeneous network graph through the node attribute prediction model to obtain a target attribute prediction result aiming at the target service object set;

and the model application module is also used for respectively carrying out service processing on the target service objects in the target service object set according to the target attribute prediction result.

In one aspect, an embodiment of the present application provides a computer device, including: a processor, a memory, a network interface;

the processor is connected to the memory and the network interface, where the network interface is used to provide a data communication function, the memory is used to store a computer program, and the processor is used to call the computer program to execute the method in the embodiment of the present application.

In one aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored therein, the computer program being adapted to be loaded by a processor and to perform a method according to embodiments of the present application.

In one aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium, the computer instructions being read from the computer-readable storage medium by a processor of a computer device, the computer instructions being executed by the processor, causing the computer device to perform a method according to an embodiment of the present application.

In the embodiment of the application, a sample heterogeneous network graph for a sample service object set is first obtained; nodes in the sample heterogeneous network graph respectively correspond to different sample service objects in the sample service object set, and the edge weight of a connecting edge between two nodes represents the association degree of the two sample service objects. Graph structure optimization processing is then performed on the sample heterogeneous network graph through an optimized graph structure learning model to obtain an attribute result for the sample service object set and an estimated isomorphic network graph generated based on the attribute result, and the parameters of the optimized graph structure learning model are adjusted according to the attribute labels for the sample service object set and the attribute result to obtain an adjusted optimized graph structure learning model. If the adjusted optimized graph structure learning model does not meet the convergence condition, parameter adjustment continues on the adjusted model according to the estimated isomorphic network graph; if it meets the convergence condition, the node attribute prediction layer in the adjusted optimized graph structure learning model can be determined as a node attribute prediction model, which can be used to perform attribute prediction processing on the target heterogeneous network graph of a target service object set to obtain a target attribute prediction result for the target service object set.

In the embodiment of the application, the optimized graph structure learning model can extract the similarity relationships among the nodes in the sample heterogeneous network graph and convert them into the estimated isomorphic network graph; the adjusted model then continues to extract the similarity relationships among the nodes in the estimated isomorphic network graph. Iterating these steps finally yields a node attribute prediction model that can accurately identify node attributes. For a target service object set containing a large number of target service objects with different attributes, the node attribute prediction model can directly mine the similarity relationships among the nodes in the target heterogeneous network graph for that set to obtain a target attribute prediction result, thereby improving the identification coverage and identification accuracy for illegal service objects.

Drawings

In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a network architecture according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a scenario for batch identification of business objects according to an embodiment of the present application;

FIG. 3 is a schematic flow chart of a data processing method according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an optimization graph structure learning model according to an embodiment of the present application;

FIG. 5 is a schematic flow chart of a data processing method according to an embodiment of the present application;

FIG. 6 is a schematic diagram of construction of a sample heterogeneous network diagram according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning and decision-making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field therefore involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.

Machine learning (Machine Learning, ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

Deep learning (Deep Learning, DL) learns the inherent rules and representation hierarchy of sample data, and the information obtained in the learning process greatly helps the interpretation of data such as text, images and sounds. Its ultimate goal is to give machines the same analytical learning ability as humans, so that they can recognize text, image, and sound data. Deep learning is a complex machine learning algorithm whose results in speech and image recognition far exceed those of earlier techniques.

The scheme provided by the embodiment of the application relates to artificial intelligence natural language processing technology, machine learning, deep learning and other technologies, and is specifically described by the following embodiment.

Referring to fig. 1, fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present application. As shown in fig. 1, the system may comprise a service server 100 and a terminal device cluster 10, and the terminal device cluster 10 may comprise terminal device 10a, terminal device 10b, …, and terminal device 10n. Communication connections may exist within the terminal device cluster 10; for example, a communication connection exists between terminal device 10a and terminal device 10b, and between terminal device 10b and terminal device 10n. Any terminal device in the terminal device cluster 10 may also have a communication connection with the service server 100; for example, a communication connection exists between terminal device 10a and the service server 100, and between terminal device 10b and the service server 100.

It should be understood that each terminal device in the terminal device cluster 10 shown in fig. 1 may be installed with an application client of the target application, and when the application client of the target application runs in each terminal device, data interaction may be performed between the application client and the service server 100 shown in fig. 1, so that the service server 100 may receive service data from each terminal device. The application client of the target application can be an application client of a game application, a video editing application, a live broadcast application, a short video application, a music application, a shopping application, a novel application, a payment application, a browser and the like, which has the function of displaying data information such as characters, images, audio and video. The application client may be an independent client, or may be an embedded sub-client integrated in a client (such as a game client, a payment client, a video client, etc.), which is not limited herein.

As shown in fig. 1, each terminal device in the terminal device cluster 10 may obtain a service request of a service object through the running application client of the target application and then send the service request to the service server 100. A service object is an object with permission to use the target application, so one service user may correspond to multiple service objects; for example, the service object may be an application account, and one service user may register multiple application accounts. The service request may be an asset transfer request for transferring an asset associated with one service object to another service object, such as a transfer, reservation, or commodity purchase. For example, after receiving a service request sent by a terminal device through an application client, the service server 100 performs the corresponding asset transfer operation according to the service request, that is, transfers the asset associated with the service object to another service object, generates corresponding service information, and records that service information. The service information may include information such as the service initiation time and the service type. It can be understood that the service objects may include illegal service objects that illegal users obtain by registering with fake data or by other means; illegal users use these illegal service objects to obtain the assets of the service users behind other service objects, and if the illegal service objects are allowed to remain in use, more service users will suffer asset losses.

In one possible embodiment, taking a game application as an example, the service object may be a game account. Some scarce game equipment obtained by clearing dungeon instances or by other means can be transferred between game accounts, so many skilled players obtain such equipment by repeatedly clearing instances and then trade it with ordinary players through the game application; that is, after an ordinary player pays a certain amount of resources, the equipment is transferred to that player's game account. However, some illegal users do not actually hold the equipment: after an ordinary player pays the agreed amount to the illegal user through the game application, the illegal user does not transfer the agreed equipment and instead moves on to trade with the next ordinary player. In addition, for ordinary players who are easily deceived, an illegal user may switch to a different game account to keep contacting the same player and illegally obtain assets.

In one possible embodiment, taking a payment application as an example, the service object may be a merchant account. Because merchant accounts in a payment application are concealed, an illegal user can use the payment application to communicate about illegal transactions and transfer the related assets. Even if an ordinary user notices a problem with a transaction, it is difficult to trace the illegal user through the merchant account, because the illegal user may have registered the account with fake data, or bought or rented the merchant accounts of other users. In addition, for concealment, illegal users generally register merchant accounts in batches and spread incoming payments from ordinary users across multiple merchant accounts, so that no single account is detected as receiving an excessive amount of assets.

In order to prevent illegal users from carrying out illegal services with a plurality of illegal service objects obtained in batches, the service server 100 may periodically identify the attributes of the service objects that have performed services through the target application, mark the illegal service objects, and restrict the service requests of the marked service objects, thereby protecting the assets of normal service objects. Referring to fig. 2 together, fig. 2 is a schematic view of a scenario of batch identification of service objects according to an embodiment of the present application.

As shown in fig. 2, any terminal device in the terminal device cluster 20 (which may be the terminal device 10 shown in fig. 1) may send a service request to the service server 200 (which may be the service server 100 shown in fig. 1) through an installed target application. The service server 200 responds to the received service request, performs the corresponding service operation, generates corresponding service information, and writes the service information into the service log information 201. For example, within a target time period (e.g., 1 day), terminal device 20a may send service request 1 and service request 2 to service server 200, terminal device 20b may send service request 3 to service server 200, …, and terminal device 20n may send service request m to service server 200. It will be appreciated that different terminal devices may send different service requests to the service server 200 within the target time period, and that the service objects associated with different service requests sent by the same terminal device may be different; for example, service request 1 may be associated with service object 1 and service request 2 may be associated with service object 2. After the service server 200 responds to service request 1, the corresponding service information 1 is generated; service information 1 may include service object 1 associated with service request 1 and other service-related information, such as the service start time and the service type. Similarly, after the service server 200 responds to the other service requests, the other service information, for example, service information 2, …, and service information m, is written into the service log information 201.

As shown in fig. 2, after determining that the target time period has elapsed, the service server 200 may obtain the m pieces of service information in the service log information 201, and then determine, according to the m pieces of service information, a service object set 202 whose attributes need to be determined, where the service object set 202 may include the service object associated with each piece of service information. The service server 200 may then construct a heterogeneous network graph 203 for the service object set 202 from the m pieces of service information. Each node in the heterogeneous network graph 203 corresponds to a business object, e.g., node 2031 may correspond to business object 1 and node 2032 may correspond to business object 2. In addition, the edge weight of the edge between two nodes in the heterogeneous network graph 203 is used to characterize the degree of association of the two business objects: the greater the edge weight, the greater the degree of association. For example, an edge weight of 0.3 for the edge between node 2031 and node 2032 may be used to characterize the degree of association of business object 1 and business object 2. It is understood that the degree of association of business object 1 and business object 2 may be determined based on the business information associated with business object 1 and business object 2. After obtaining the heterogeneous network graph 203 for the service object set 202, the service server 200 may input the heterogeneous network graph 203 into the node attribute prediction model 204, that is, perform attribute prediction processing on the heterogeneous network graph 203 through the node attribute prediction model 204, to obtain an attribute prediction result 205. The attribute prediction result 205 may include an anomaly (or risk) probability corresponding to each node, that is, the anomaly probability of each service object.
When the anomaly probability of a service object is higher than the anomaly threshold, the service server 200 may determine that the service object is an illegal object or an abnormal object, and the service server 200 may directly mark the illegal object and refuse to respond to service requests associated with it. For the process of obtaining the node attribute prediction model 204, refer to the model training process described in steps S101 to S104 in the embodiment corresponding to fig. 3 below; for the process of determining the degree of association between business objects, refer to the description of steps S201 to S204 in the embodiment corresponding to fig. 5 below.
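For illustration only, the marking and rejection step described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `flag_illegal_objects` and `should_reject`, the object identifiers, the probabilities, and the threshold value 0.8 are all hypothetical and are not taken from the application.

```python
from typing import Dict, Set


def flag_illegal_objects(anomaly_prob: Dict[str, float], threshold: float) -> Set[str]:
    """Return the service objects whose anomaly probability is higher than the threshold."""
    return {obj for obj, prob in anomaly_prob.items() if prob > threshold}


def should_reject(request_object: str, flagged: Set[str]) -> bool:
    """A service request is rejected when its associated service object has been marked."""
    return request_object in flagged


# Hypothetical attribute prediction result: one anomaly probability per service object.
attribute_prediction = {"object_1": 0.91, "object_2": 0.12, "object_3": 0.55}
flagged = flag_illegal_objects(attribute_prediction, threshold=0.8)
```

Keeping the marked objects in a set makes the per-request check a constant-time lookup, which matters when the service server handles a large volume of requests.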

Therefore, the service server can directly obtain all service objects subjected to the service according to massive service log information, determine the association degree between the service objects according to the service log information, thereby constructing a heterogeneous network map, and then perform attribute prediction processing on the heterogeneous network map through the node attribute prediction model 204 to directly obtain the abnormal (or risk) probabilities corresponding to all the service objects respectively. By adopting the batch business object identification method provided by the embodiment of the application, the identification coverage rate and the identification accuracy rate of illegal business objects can be improved.

It should be noted that the above data processing scheme can be applied to various scenes such as games, videos, commodity purchase and the like where service requests need to be initiated.

It should be understood that the above data connection is not limited to a connection manner, and may be directly or indirectly connected through a wired communication manner, may be directly or indirectly connected through a wireless communication manner, or may be connected through other connection manners, which is not limited herein.

It can be understood that the data processing method provided by the embodiment of the present application may be executed by a computer device, where the computer device includes, but is not limited to, the service server and the terminal device described above. The service server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal device may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc.

It will be appreciated that, in the specific embodiments of the present application, related data such as service requests and service information are involved. When the above embodiments of the present application are applied to specific products or technologies, user permission or consent must be obtained, and the collection, use and processing of the related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

Further, referring to fig. 3, fig. 3 is a flow chart of a data processing method according to an embodiment of the application. Wherein the method may be performed by a computer device, such as a service server in the embodiment described above with respect to fig. 1. The following will describe an example of the method being executed by a computer device, wherein the data processing method may at least comprise the following steps S101-S104:

Step S101, a sample heterogeneous network diagram aiming at a sample service object set is obtained; nodes in the sample heterogeneous network graph respectively correspond to different sample service objects in the sample service object set; the edge weight of the connecting edge between two nodes is used to characterize the degree of association of two sample business objects.

Specifically, a service object refers to an object that has service processing authority in a target application, for example, an application account. The target application may be a game application, a video editing application, a live broadcast application, a short video application, a music application, a shopping application, a novel application, a payment application, a browser, or another application that has a service function such as asset transfer. As computer technology evolves, more and more business users tend to transfer assets in target applications through their corresponding business objects, thereby completing transactions over a network. Taking the payment application as an example, ordinary user 1 can negotiate the transaction content with merchant user 2 through the payment application, for example, that after ordinary user 1 transfers 500 yuan to merchant user 2, merchant user 2 will transfer article 3 to ordinary user 1. After the negotiation is completed, ordinary user 1 can log in to payment account 4 of the payment application and transfer the agreed amount to merchant account 5 of merchant user 2. It can be understood that the platform side corresponding to the target application cannot guarantee the trustworthiness of the seller (or merchant) in the transaction, that is, it cannot guarantee that merchant user 2 will transfer article 3 to ordinary user 1 after ordinary user 1 makes the transfer. If merchant user 2 does not fulfill the commitment, ordinary user 1 can hardly recover the assets transferred to merchant user 2, because merchant user 2 may have obtained merchant account 5 with a fake identity, or by purchasing or renting the merchant account of another, normal merchant user.
Therefore, it can be understood that the service objects include some illegal service objects obtained in batches by illegal service users, and the illegal service users can obtain the assets of other ordinary service users through these illegal service objects. For this reason, the platform side corresponding to the target application needs to periodically examine the massive set of business objects, so as to determine the illegal business objects and apply risk control processing to them.

Specifically, in order to improve the detection coverage rate and the detection accuracy for illegal service objects, the computer device may first obtain a sample heterogeneous network graph for an existing sample service object set. Here, a graph is a set of nodes and edges, and a heterogeneous network graph refers to a graph in which nodes and their neighboring nodes often belong to different categories. In the sample heterogeneous network graph for the sample service object set, the nodes are the sample service objects in the sample service object set; it can be understood that different nodes correspond to different sample service objects, and that the attributes (or categories) of different nodes, that is, of different sample service objects, can differ. In the sample heterogeneous network graph for the sample service object set, the edge weight of the connecting edge between two nodes is used to represent the degree of association of the two sample service objects; for example, the greater the edge weight, the greater the degree of association between the two sample service objects.
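The statement that nodes and their neighbors often belong to different categories can be made concrete with a simple measure. The sketch below computes a weighted edge homophily ratio, that is, the share of total edge weight carried by edges whose two endpoints have the same category. This measure is an assumption introduced for illustration and is not defined in the application; the function name `edge_homophily`, the edge weights and the labels are hypothetical.

```python
from typing import Dict, Tuple


def edge_homophily(edges: Dict[Tuple[str, str], float], labels: Dict[str, int]) -> float:
    """Share of the total edge weight on edges whose endpoints share a category."""
    total = sum(edges.values())
    if total == 0:
        return 0.0
    same = sum(w for (u, v), w in edges.items() if labels[u] == labels[v])
    return same / total


# Hypothetical weighted graph over three sample service objects.
edges = {("a", "b"): 0.3, ("a", "c"): 0.5, ("b", "c"): 0.2}
labels = {"a": 1, "b": 0, "c": 1}  # 1 = abnormal, 0 = normal
ratio = edge_homophily(edges, labels)
```

Under this reading, a graph with a low ratio behaves like the heterogeneous network graph described above, while the estimated isomorphic network graph produced later would be expected to score higher.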

Step S102, carrying out graph structure optimization processing on the sample heterogeneous network graph through an optimized graph structure learning model to obtain an attribute result aiming at the sample service object set and an estimated isomorphic network graph generated based on the attribute result.

Specifically, the attribute result may include an attribute prediction result and an attribute observation result; the optimized graph structure learning model may include a node attribute prediction layer, a node attribute observation layer and a graph estimation layer; and the node attribute prediction layer may include K node attribute prediction sub-layers, where K is a positive integer. In this case, one feasible implementation of performing graph structure optimization processing on the sample heterogeneous network graph through the optimized graph structure learning model, to obtain the attribute result for the sample service object set and the estimated isomorphic network graph generated based on the attribute result, may be: performing attribute prediction processing on the sample heterogeneous network graph through the K node attribute prediction sub-layers to obtain K node hidden layer features and an attribute prediction result for the sample service object set; performing attribute observation processing on the sample heterogeneous network graph and on each of the K node hidden layer features through the node attribute observation layer to obtain an attribute observation result; performing expectation maximization estimation processing on the attribute prediction result, the attribute observation result and the attribute labels for the sample service object set through the graph estimation layer to obtain an estimated isomorphic adjacency matrix; and performing isomorphic filtering processing on the estimated isomorphic adjacency matrix according to an isomorphism threshold to obtain the estimated isomorphic network graph.

The node attribute prediction layer may be implemented with the GraphSAGE (neighbor node aggregation network) algorithm or the GCN (graph convolutional network) algorithm. It learns the structural features of the input sample heterogeneous network graph and then outputs a classification for each node, that is, the predicted probability that the node is an abnormal node. The core idea of GraphSAGE is to replace sampling over the whole graph with sampling over the neighbors of the current node, and the overall flow is roughly divided into 3 steps: 1. randomly sampling the neighbors of a node; 2. aggregating the sampled neighbors of the node; 3. learning the node representation from the aggregated information. Therefore, when the node attribute prediction layer adopts the GraphSAGE algorithm, it can generally comprise K node attribute prediction sub-layers, each of which outputs the neighborhood information of the sample heterogeneous network graph at the corresponding order. Finally, the hidden layer feature vector output by the K-th node attribute prediction sub-layer is input into a fully connected network to obtain the attribute prediction result, namely the classification result of each node, for example, the predicted probability that the node is an abnormal node. The node attribute observation layer may be implemented with the KNN (K-nearest neighbor) algorithm. The principle of KNN is that, when the category of a new value x is predicted, the category of x is determined by the categories of the K points nearest to it. The node attribute observations are used to extract potential similarity relations among the nodes and to reduce the influence of noise on the algorithm, thereby improving the homogeneity of the graph.
The graph estimator (i.e., the graph estimation layer) is used to remove as much noise as possible from the sample heterogeneous network graph according to the attribute prediction result, the attribute observation result and the attribute labels for the sample service object set, and to strengthen the potential similarity relations among the nodes. That is, it converts the sample heterogeneous network graph into an estimated isomorphic adjacency matrix with high homogeneity, and then performs isomorphic filtering processing on the estimated isomorphic adjacency matrix with the isomorphism threshold to obtain an estimated isomorphic network graph with high homogeneity. In short, through the node attribute prediction layer, the node attribute observation layer and the graph estimation layer in the optimized graph structure learning model, a highly heterogeneous network such as the sample heterogeneous network graph for the sample service object set can be converted into an estimated isomorphic network graph with high homogeneity, so that illegal service objects can be detected in batches.
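The sample, aggregate and learn flow of a node attribute prediction sub-layer can be sketched as below. This is a simplified illustration, not the model of the application: it uses mean aggregation and plain concatenation, and it omits the learned weight matrix, the non-linearity and the fully connected output network. The names `sample_neighbors`, `mean_vector` and `sage_layer`, and the toy graph, are hypothetical.

```python
import random
from typing import Dict, List

Vector = List[float]


def sample_neighbors(adj: Dict[int, List[int]], node: int,
                     num_samples: int, rng: random.Random) -> List[int]:
    """Step 1: randomly sample at most num_samples neighbors of a node."""
    neighbors = adj.get(node, [])
    if len(neighbors) <= num_samples:
        return list(neighbors)
    return rng.sample(neighbors, num_samples)


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    """Step 2: aggregate neighbor features by element-wise mean."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def sage_layer(adj: Dict[int, List[int]], features: Dict[int, Vector],
               num_samples: int, rng: random.Random) -> Dict[int, Vector]:
    """Step 3 (simplified): combine each node's own feature with the aggregate.

    A trained layer would apply a weight matrix and a non-linearity here;
    this sketch only concatenates, so the feature width doubles per layer.
    """
    dim = len(next(iter(features.values())))
    hidden = {}
    for node, own in features.items():
        sampled = sample_neighbors(adj, node, num_samples, rng)
        aggregated = mean_vector([features[n] for n in sampled], dim)
        hidden[node] = own + aggregated  # list concatenation
    return hidden


rng = random.Random(0)
adj = {0: [1, 2], 1: [0], 2: [0]}
features = {0: [1.0], 1: [3.0], 2: [5.0]}
h1 = sage_layer(adj, features, num_samples=2, rng=rng)  # first-order neighborhood
h2 = sage_layer(adj, h1, num_samples=2, rng=rng)        # second-order neighborhood
```

Stacking the layer K times gives K hidden features, each of which summarizes a neighborhood one hop wider than the previous one.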

Specifically, the estimated isomorphic adjacency matrix includes estimated isomorphic adjacency values Q_ij, where i and j are each positive integers less than or equal to the total number of nodes in the sample heterogeneous network graph, and the estimated isomorphic adjacency value Q_ij represents the probability that a connecting edge exists between the i-th node and the j-th node in the sample heterogeneous network graph. In this case, one feasible implementation of performing isomorphic filtering processing on the estimated isomorphic adjacency matrix according to the isomorphism threshold, to obtain the estimated isomorphic network graph, may be: traversing the estimated isomorphic adjacency matrix to obtain each estimated isomorphic adjacency value Q_ij; if the estimated isomorphic adjacency value Q_ij is less than the isomorphism threshold, updating Q_ij to the default isomorphic adjacency value, which indicates that no connecting edge exists between the i-th node and the j-th node in the sample heterogeneous network graph; and, when the traversal of the estimated isomorphic adjacency matrix is complete, determining the estimated isomorphic network graph according to the updated estimated isomorphic adjacency matrix. For ease of understanding, assume that the isomorphism threshold is 0.2 and that the estimated isomorphic adjacency matrix is:

wherein the estimated isomorphic adjacency value Q_ij characterizes the probability that a connecting edge exists between the i-th node and the j-th node in the sample heterogeneous network graph. For example, an estimated isomorphic adjacency value Q_12 of 0.1 indicates that the probability of a connecting edge between the 1st node and the 2nd node is 0.1 (or that the edge weight of the connecting edge between the 1st node and the 2nd node is 0.1). The computer device may traverse the estimated isomorphic adjacency matrix and update the values less than 0.2 to 0 (i.e., the default isomorphic adjacency value described above), so that the estimated isomorphic adjacency matrix is updated to:

at this time, in the estimated isomorphic network diagram constructed according to the updated estimated isomorphic adjacency matrix, the edge weight between the node 1 and the node 2 is 0, and it can be considered that there is no connecting edge between the node 1 and the node 2.
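The isomorphic filtering step can be sketched in a few lines. The 3×3 matrix below is a hypothetical example chosen only so that the value between the 1st and 2nd nodes is 0.1 as in the text; it is not the matrix of the application, and the function name `filter_adjacency` is likewise an assumption.

```python
from typing import List


def filter_adjacency(q: List[List[float]], threshold: float,
                     default: float = 0.0) -> List[List[float]]:
    """Replace every entry below the threshold with the default value (no edge)."""
    return [[default if value < threshold else value for value in row] for row in q]


# Hypothetical estimated isomorphic adjacency matrix (symmetric, zero diagonal).
q = [
    [0.0, 0.1, 0.6],
    [0.1, 0.0, 0.3],
    [0.6, 0.3, 0.0],
]
filtered = filter_adjacency(q, threshold=0.2)
```

With a threshold of 0.2, the entry of 0.1 between the 1st and 2nd nodes becomes 0, while the entries 0.6 and 0.3 are kept as edge weights.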

For ease of understanding, the optimized graph structure learning model is described by taking as an example a GraphSAGE network layer (i.e., the node attribute prediction layer), a KNN network layer (i.e., the node attribute observation layer), and a graph estimator (i.e., the graph estimation layer). Referring to fig. 4, fig. 4 is a schematic structural diagram of an optimization graph structure learning model according to an embodiment of the application. As shown in fig. 4, the optimization graph structure learning model includes a GraphSAGE network layer 41, a KNN network layer 42, and a graph estimator 43. The GraphSAGE network layer 41 includes K GraphSAGE sub-network layers, for example, GraphSAGE sub-network layer 41a, GraphSAGE sub-network layer 41b, …, and GraphSAGE sub-network layer 41k. As shown in fig. 4, each GraphSAGE sub-network layer samples different nodes in the sample heterogeneous network graph 44, and aggregates and learns from their neighboring nodes, so that different GraphSAGE sub-network layers learn different neighborhood information. Each GraphSAGE sub-network layer can thus obtain a node hidden layer feature that represents this neighborhood information; for example, GraphSAGE sub-network layer 41a can output the node hidden layer feature H(1), GraphSAGE sub-network layer 41b can output the node hidden layer feature H(2), …, and GraphSAGE sub-network layer 41k can output the node hidden layer feature H(K). Further, the GraphSAGE network layer 41 may output a predicted value Z, which is the attribute prediction result described above and includes, for each node, the probability that the node is an abnormal node.

As shown in fig. 4, for the node features X corresponding to the sample heterogeneous network graph 44 and the K node hidden layer features, the computer device may perform attribute observation processing, that is, KNN classification processing, through the KNN network layer 42 to obtain an observed value O(0), an observed value O(1), …, and an observed value O(K). These observed values serve as observations of the optimal graph of the sample heterogeneous network graph 44 and, together with the sample heterogeneous network graph 44 (denoted by A), form an observation set O = {A, O(0), O(1), …, O(K)}.
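One way to picture the observation set is sketched below, assuming that each observation O(0), …, O(K) is a K-nearest-neighbor similarity graph built from the corresponding feature matrix. That reading is an assumption, since the application does not spell out the form of the observed values; the function name `knn_graph`, the Euclidean distance and the toy features are hypothetical.

```python
import math
from typing import List


def knn_graph(features: List[List[float]], k: int) -> List[List[int]]:
    """Connect every node to its k nearest nodes by Euclidean distance."""
    n = len(features)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted((math.dist(features[i], features[j]), j)
                       for j in range(n) if j != i)
        for _, j in dists[:k]:
            adj[i][j] = 1
            adj[j][i] = 1  # keep the observation symmetric
    return adj


# Hypothetical node features X; hidden features H(1)..H(K) would be handled the same way.
x = [[0.0], [0.1], [5.0], [5.2]]
o0 = knn_graph(x, k=1)

# Hypothetical original adjacency A, combined with the observation into the set O.
a = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
observation_set = [a, o0]
```

In this toy case the original graph links node 0 to node 2 although their features are far apart, while the KNN observation links only the nodes that look alike, which is the kind of similarity signal the graph estimator can then weigh.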

As shown in fig. 4, after obtaining the observation set O and the predicted value Z, the computer device may further obtain an attribute tag Y, where the attribute tag Y represents the true classification result of each node; for example, node A is an abnormal node and node B is not an abnormal node. The computer device may then send the observation set O, the predicted value Z, and the attribute tag Y to the graph estimator 43, where the graph estimator 43 may estimate an adjacency matrix Q with community structure by an Expectation Maximization (EM) algorithm, where Q_ij represents the probability that an edge exists between node V_i and node V_j. Finally, the computer device filters and updates the adjacency matrix Q with the isomorphism threshold to obtain the estimated isomorphic network graph of the first round, namely the estimated graph S(1).
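The application does not give the formulas of the graph estimator, so the following is only a simplified stand-in that shows the general shape of an expectation maximization edge estimator. It assumes that every node pair has the same prior edge probability `rho`, that each observation graph reports a true edge with rate `alpha` and a spurious edge with rate `beta`, and it ignores the predicted value Z, the attribute tag Y and any community structure. All names and initial values are hypothetical.

```python
from typing import List, Tuple


def em_edge_posterior(counts: List[int], num_obs: int, iters: int = 50,
                      eps: float = 1e-6) -> Tuple[List[float], float, float, float]:
    """counts[p] is the number of the num_obs observation graphs that link node pair p."""
    alpha, beta, rho = 0.8, 0.2, 0.5  # assumed starting values

    def clip(x: float) -> float:
        return min(max(x, eps), 1.0 - eps)

    q = [rho] * len(counts)
    for _ in range(iters):
        # E-step: posterior probability Q that each pair is a true edge.
        q = []
        for e in counts:
            edge = rho * alpha ** e * (1 - alpha) ** (num_obs - e)
            no_edge = (1 - rho) * beta ** e * (1 - beta) ** (num_obs - e)
            q.append(edge / (edge + no_edge))
        # M-step: re-estimate the observation rates and the prior from Q.
        sum_q = sum(q)
        sum_not_q = len(q) - sum_q
        alpha = clip(sum(qi * e for qi, e in zip(q, counts)) / max(num_obs * sum_q, eps))
        beta = clip(sum((1 - qi) * e for qi, e in zip(q, counts)) / max(num_obs * sum_not_q, eps))
        rho = clip(sum_q / len(q))
    return q, alpha, beta, rho


# Six hypothetical node pairs, each seen in 0 to 3 of 3 observation graphs.
counts = [3, 3, 0, 0, 1, 0]
q, alpha, beta, rho = em_edge_posterior(counts, num_obs=3)
```

Pairs that appear in every observation end up with a posterior close to 1 and pairs that appear in none end up close to 0; the resulting values play the role of the entries Q_ij that are subsequently filtered with the isomorphism threshold.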

Step S103, performing parameter adjustment on the optimized graph structure learning model according to the attribute labels for the sample service object set and the attribute result, to obtain an adjusted optimized graph structure learning model; the estimated isomorphic network graph is used for continued graph structure optimization processing in the next round of training iteration when the adjusted optimized graph structure learning model does not meet the convergence condition.

Specifically, when the attribute result includes an attribute prediction result and an attribute observation result, and the optimization graph structure learning model may include a node attribute prediction layer, a node attribute observation layer and a graph estimation layer, parameter adjustment is performed on the optimization graph structure learning model according to an attribute tag for a sample service object set and the attribute result, so as to obtain a feasible implementation process of the adjusted optimization graph structure learning model, which may be: based on the attribute prediction result and the attribute label, carrying out parameter adjustment on the node attribute prediction layer to obtain an adjusted node attribute prediction layer; based on the attribute prediction result, the attribute observation result and the attribute label, carrying out parameter adjustment on the graph estimation layer to obtain an adjusted graph estimation layer; and determining the adjusted node attribute prediction layer, the adjusted graph estimation layer and the node attribute observation layer as an adjusted optimized graph structure learning model.

For ease of understanding, refer again to the optimization graph structure learning model shown in fig. 4. After obtaining the predicted value Z, the computer device may perform semi-supervised learning on the GraphSAGE network layer 41 according to the predicted value Z and the attribute tag, so as to adjust the parameters of the GraphSAGE network layer 41. Similarly, the computer device may perform parameter adjustment on the graph estimator 43 based on the attribute prediction result, the attribute observation result, and the attribute tag, to obtain an adjusted graph estimator.

Step S104, if the adjusted optimization graph structure learning model meets the convergence condition, determining a node attribute prediction layer in the adjusted optimization graph structure learning model as a node attribute prediction model; the node attribute prediction model is used for performing attribute prediction processing on the target heterogeneous network graph of the target service object set to obtain a target attribute prediction result aiming at the target service object set.

Specifically, when the computer device adjusts the optimization graph structure learning model, the cross entropy loss value can be determined according to the error condition between the attribute prediction result and the attribute label. If the cross entropy loss value is smaller than or equal to the cross entropy loss threshold value, the computer equipment can determine that the adjusted optimization graph structure learning model meets the convergence condition; if the cross entropy loss value is greater than the cross entropy loss threshold, the computer device may determine that the adjusted optimization graph structure learning model does not satisfy the convergence condition.
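The convergence test described above can be sketched as follows, assuming a binary (abnormal or normal) label per node and a mean binary cross entropy over the labeled nodes. The function names, the example probabilities and the threshold values are hypothetical.

```python
import math
from typing import List


def binary_cross_entropy(probs: List[float], labels: List[int],
                         eps: float = 1e-12) -> float:
    """Mean binary cross entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def has_converged(loss: float, loss_threshold: float) -> bool:
    """The convergence condition is met when the loss does not exceed the threshold."""
    return loss <= loss_threshold


# Hypothetical attribute prediction result and attribute labels for two labeled nodes.
loss = binary_cross_entropy([0.9, 0.1], [1, 0])
```

For these two nodes the loss equals -ln(0.9), roughly 0.105, so the model would count as converged under a threshold of 0.2 but not under a threshold of 0.05.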

Optionally, if the adjusted optimization graph structure learning model does not meet the convergence condition, the computer device continues to perform parameter adjustment on it. In this case, however, the adjusted optimization graph structure learning model (for ease of understanding, referred to as the first-round iteratively optimized graph structure learning model) is adjusted based on the obtained estimated isomorphic network graph (referred to as the first-round estimated isomorphic network graph, for example, the estimated graph S(1) in the embodiment corresponding to fig. 4). Because the homogeneity of the first-round estimated isomorphic network graph is higher than that of the sample heterogeneous network graph, the second-round iteratively optimized graph structure learning model, obtained by adjusting the parameters of the first-round model with the first-round estimated isomorphic network graph, can generate more accurate attribute prediction results and attribute observation results. It can be seen that, in this iterative optimization process, the learning of the optimization graph structure learning model and the inference of the graph structure reinforce each other. Finally, the node attribute prediction layer in the optimization graph structure learning model that meets the convergence condition can accurately predict the target attribute prediction result for the target heterogeneous network graph of the target service object set.
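The alternation between parameter adjustment and graph re-estimation can be outlined as a loop. In the sketch below, `run_round` is a hypothetical callable standing for one full round (prediction, observation, graph estimation and parameter adjustment) that returns the loss and the estimated graph for the next round; `max_rounds` is an added safety limit that the application does not mention.

```python
from typing import Callable, Tuple, TypeVar

Graph = TypeVar("Graph")


def train_until_converged(sample_graph: Graph,
                          run_round: Callable[[Graph], Tuple[float, Graph]],
                          loss_threshold: float,
                          max_rounds: int = 50) -> Tuple[int, float]:
    """Feed each round's estimated graph into the next round until the loss is low enough."""
    graph = sample_graph
    loss = float("inf")
    for round_index in range(1, max_rounds + 1):
        loss, estimated_graph = run_round(graph)
        if loss <= loss_threshold:
            return round_index, loss
        graph = estimated_graph  # the next round trains on the estimated graph
    return max_rounds, loss


def toy_round(noise_level: int) -> Tuple[float, int]:
    """Toy stand-in: the 'graph' is a noise level that drops by one per round."""
    return noise_level * 0.1, noise_level - 1


rounds, final_loss = train_until_converged(5, toy_round, loss_threshold=0.25)
```

The toy round only mimics the intended behavior, in which each estimated graph is cleaner than its input and therefore yields a lower loss in the following round.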

According to the method provided by the embodiment of the application, the computer device extracts similar (homogeneous) relations from the sample heterogeneous network graph through the node attribute prediction layer, the node attribute observation layer and the graph estimation layer in the optimized graph structure learning model, and converts them into an estimated isomorphic network graph. Based on the estimated isomorphic network graph, the computer device then continues to adjust the parameters of the optimized graph structure learning model that was previously adjusted based on the sample heterogeneous network graph. By iterating these steps, a sufficiently homogeneous estimated isomorphic network graph is finally obtained, and with it an optimized graph structure learning model that meets the convergence condition. Attribute prediction processing can then be performed directly on the target heterogeneous network graph through the node attribute prediction layer in that model to determine the target attribute prediction result for the target service object set, without performing rule matching on each target service object, so that the detection coverage rate and the detection accuracy for illegal service objects can be improved.

Further, referring to fig. 5, fig. 5 is a flow chart of a data processing method according to an embodiment of the application. The method is a possible embodiment of constructing a sample heterogeneous network map according to the embodiment corresponding to fig. 3, and the method may be performed by a computer device (for example, the service server 100 in the embodiment corresponding to fig. 1). The following will describe an example of the method being executed by a computer device, wherein the data processing method may at least comprise the following steps S201-S204:

Step S201, at least two pieces of history service information are obtained, and a history service object set is generated according to the historical service objects contained in the at least two pieces of history service information.

Specifically, the historical service information refers to information related to a service generated by the computer device in the process of responding to a service request to execute a corresponding service operation, and the historical service information can include information such as service initiation time, service objects, service types and the like. For example, in a payment application, a historical business message may be a transaction stream message, which may include information such as an order identifier, an asset payment account number, an asset receiving account number, an asset amount, etc. It should be noted that, the historical business object refers to an object that needs to determine an attribute, for example, in a scenario of asset transfer of a payment application, it is generally only required to determine whether the asset receiver is a merchant account with an abnormal attribute, and at this time, all the historical business objects included in the set of historical business objects are merchant accounts.

Step S202, performing trusted filtering processing on the historical service object set to obtain a sample service object set.

Specifically, one feasible implementation of performing trusted filtering processing on the historical service object set to obtain the sample service object set may be: acquiring a trusted service object set, where the trusted service object set includes one or more trusted service objects; determining, in the historical service object set, each historical service object that is identical to any one of the one or more trusted service objects as a historical service object to be filtered; and deleting the historical service objects to be filtered from the historical service object set to obtain the sample service object set. It can be appreciated that a business system may contain some super business objects, i.e., business objects with public credibility and a large business volume. Such super business objects are trusted, and their attributes do not need to be predicted, so they can be written into the trusted business object set as trusted business objects. Then, after the historical business object set is determined, trusted filtering can be performed on it according to the trusted business object set to obtain the sample business object set.
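Steps S201 and S202 can be sketched together as follows. The record layout, the field name `receiving_account`, and the account identifiers are hypothetical and chosen to match the payment example, in which only the asset receiving accounts need their attributes determined.

```python
from typing import Dict, Iterable, List


def collect_historical_objects(records: Iterable[Dict[str, str]]) -> List[str]:
    """Step S201 (sketch): gather the distinct receiving accounts, keeping first-seen order."""
    return list(dict.fromkeys(record["receiving_account"] for record in records))


def trusted_filter(historical_objects: List[str],
                   trusted_objects: Iterable[str]) -> List[str]:
    """Step S202 (sketch): drop every historical object found in the trusted set."""
    trusted = set(trusted_objects)
    return [obj for obj in historical_objects if obj not in trusted]


# Hypothetical transaction flow records.
records = [
    {"order_id": "o1", "paying_account": "u1", "receiving_account": "m1"},
    {"order_id": "o2", "paying_account": "u2", "receiving_account": "m_super"},
    {"order_id": "o3", "paying_account": "u3", "receiving_account": "m1"},
    {"order_id": "o4", "paying_account": "u1", "receiving_account": "m2"},
]
historical_set = collect_historical_objects(records)
sample_set = trusted_filter(historical_set, trusted_objects={"m_super"})
```

Removing the trusted objects before graph construction keeps very high-volume accounts from becoming hub nodes that connect otherwise unrelated sample service objects.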

Step S203, determining the degree of association between every two sample service objects in the sample service object set according to the history service information associated with the sample service object set in the at least two history service information.

Specifically, in one implementation, the sample service object set includes a sample service object M and a sample service object N, and the historical service information includes service attribute information corresponding to a historical service object. One possible implementation of determining the association degree between every two sample service objects in the sample service object set according to the historical service information associated with the sample service object set in the at least two pieces of historical service information may be: acquiring service attribute information associated with the sample service object M from the at least two pieces of historical service information as first service attribute information; acquiring service attribute information associated with the sample service object N from the at least two pieces of historical service information as second service attribute information; and determining the association degree between the sample service object M and the sample service object N according to the coincidence degree of the first service attribute information and the second service attribute information. The service attribute information may include attribute information such as identity information, contact information, and address information. One possible way of determining the coincidence degree of the first service attribute information and the second service attribute information may be: determining the total number of attribute types covered by the first service attribute information and the second service attribute information; determining the number of identical attribute types, that is, attribute types for which the first service attribute information and the second service attribute information hold the same attribute information; and dividing the number of identical attribute types by the total number of attribute types to obtain the coincidence degree. For example, if the total number of attribute types is 8 and the number of identical attribute types is 2, the coincidence degree is 2/8 = 0.25.
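The coincidence degree calculation can be illustrated with a short sketch. It assumes that each object's service attribute information is held as a dictionary keyed by attribute type and that the total number of attribute types is the union of the types present in either dictionary; both are assumptions for illustration, as are the function name `coincidence_degree` and the attribute values.

```python
def coincidence_degree(first_attrs, second_attrs):
    """Share of attribute types whose values are identical in both records."""
    # Assumption: "total number of attribute types" is the union of both key sets.
    attribute_types = set(first_attrs) | set(second_attrs)
    if not attribute_types:
        return 0.0
    same = sum(
        1
        for t in attribute_types
        if t in first_attrs and t in second_attrs and first_attrs[t] == second_attrs[t]
    )
    return same / len(attribute_types)


# Hypothetical records with 8 attribute types, 2 of which match (phone, address).
attrs_m = {"id": "x1", "phone": "111", "address": "A", "email": "m@e",
           "bank": "b1", "device": "d1", "ip": "1.1.1.1", "license": "L1"}
attrs_n = {"id": "x2", "phone": "111", "address": "A", "email": "n@e",
           "bank": "b2", "device": "d2", "ip": "2.2.2.2", "license": "L2"}
degree = coincidence_degree(attrs_m, attrs_n)
```

With these records the result matches the 2/8 example in the text.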

Alternatively, in another implementation, the sample service object set includes a sample service object M and a sample service object N, and one piece of historical service information includes a service interaction relation between a historical service object and a service interaction object. One possible implementation of determining the association degree between every two sample service objects in the sample service object set according to the historical service information associated with the sample service object set in the at least two pieces of historical service information may be: determining, according to the at least two pieces of historical service information, the service interaction objects that have a service interaction relation with the sample service object M, and taking them as a first service interaction object set; determining, according to the at least two pieces of historical service information, the service interaction objects that have a service interaction relation with the sample service object N, and taking them as a second service interaction object set; determining the service harmonic average degree between the sample service object M and the sample service object N according to the first service interaction object set and the second service interaction object set; if the service harmonic average degree is greater than or equal to the harmonic average threshold, determining the association degree between the sample service object M and the sample service object N according to the service harmonic average degree; and if the service harmonic average degree is less than the harmonic average threshold, determining the default association degree as the association degree between the sample service object M and the sample service object N.

Wherein, one possible implementation of determining the service harmonic average degree between the sample service object M and the sample service object N according to the first service interaction object set and the second service interaction object set may be: determining the service interaction objects that appear in both the first service interaction object set and the second service interaction object set as common service interaction objects; determining a first common object proportion according to the first service interaction object set and the common service interaction objects; determining a second common object proportion according to the second service interaction object set and the common service interaction objects; and determining the service harmonic average degree between the sample service object M and the sample service object N according to the first common object proportion and the second common object proportion.

One possible implementation of determining the service harmonic average degree between the sample service object M and the sample service object N according to the first common object proportion and the second common object proportion may be: determining the product of the first common object proportion and the second common object proportion as a first value; determining the sum of the first common object proportion and the second common object proportion as a second value; and determining the ratio of the first value to the second value as the service harmonic average degree between the sample service object M and the sample service object N.

From the above, the service harmonic average degree between the sample service object M and the sample service object N can be determined as follows:

Number of common service interaction objects = number of service interaction objects that have a service interaction relation with both the sample service object M and the sample service object N

First common object proportion = number of common service interaction objects / number of service interaction objects in the first service interaction object set

Second common object proportion = number of common service interaction objects / number of service interaction objects in the second service interaction object set

Service harmonic average degree = (first common object proportion × second common object proportion) / (first common object proportion + second common object proportion).
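The calculation above, together with the threshold rule, can be sketched as follows. The function names, the threshold of 0.1 (taken from the fig. 6 example), the default association degree of 0, and the sample identifiers are assumptions for illustration rather than values fixed by the application.

```python
def harmonic_average_degree(first_set, second_set):
    """Product of the two common-object proportions divided by their sum."""
    first_set, second_set = set(first_set), set(second_set)
    common = first_set & second_set
    if not common:
        # No shared interaction object: both proportions are 0, so return 0.
        return 0.0
    p1 = len(common) / len(first_set)   # first common object proportion
    p2 = len(common) / len(second_set)  # second common object proportion
    return (p1 * p2) / (p1 + p2)


def association_degree(first_set, second_set, threshold=0.1, default=0.0):
    """Use the harmonic average degree if it reaches the threshold, else the default."""
    degree = harmonic_average_degree(first_set, second_set)
    return degree if degree >= threshold else default


# Hypothetical interaction partners of sample service objects M and N.
partners_m = {1, 2, 3, 4}
partners_n = {1, 3}
degree_mn = harmonic_average_degree(partners_m, partners_n)  # (0.5 * 1.0) / 1.5

# A weakly connected pair: one shared partner out of ten on each side.
weak_m = set(range(10))
weak_n = {0} | set(range(100, 109))
weak_degree = association_degree(weak_m, weak_n)
```

Note that the formula as written is half of the conventional harmonic mean 2ab/(a+b); the sketch follows the text as stated.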

That is, one service interaction object may have an interaction relation with multiple sample service objects, and different sample service objects may be connected through a common service interaction object, so that a sample heterogeneous network graph can be constructed. For ease of understanding, please refer to fig. 6, which is a schematic diagram of the construction of a sample heterogeneous network graph according to an embodiment of the present application. As shown in fig. 6, the computer device first constructs a graph in the "B-C" network form, that is, a "sample service object-service interaction object" graph, which is used to determine the service interaction objects shared by different sample service objects. For example, the service interaction objects shared by service object A and service object B may be service interaction object 1 and service interaction object 3, and the service interaction objects shared by service object C and service object D may be service interaction object 1, service interaction object 2, and service interaction object 4. The computer device may then construct a graph in the "B-B" network form, that is, a "sample service object-sample service object" graph, based on the common service interaction objects: if two sample service objects share a common service interaction object, a connecting edge exists between them. The edge weight of the connecting edge between two sample service objects may be the service harmonic average degree between them. After the edge weight of each connecting edge is determined, connecting edges with lower edge weights can be removed. As shown in fig. 6, assuming that the harmonic average threshold is 0.1, the connecting edge between service object A and service object C and the connecting edge between service object A and service object D can be removed, that is, the edge weight between service object A and service object C and the edge weight between service object A and service object D are both updated to 0.
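The two-stage construction described above, from the "B-C" graph to the thresholded "B-B" graph, can be sketched as follows. The interaction data are invented for illustration and are not the values shown in fig. 6; the function name `build_object_graph` and the representation of the graph as a dictionary of weighted edges are likewise assumptions.

```python
from itertools import combinations


def build_object_graph(interactions, threshold=0.1):
    """Build weighted edges between sample objects from (object, partner) pairs."""
    # Stage 1, the "B-C" view: collect the interaction objects of each sample object.
    neighbours = {}
    for obj, partner in interactions:
        neighbours.setdefault(obj, set()).add(partner)

    # Stage 2, the "B-B" view: connect objects that share at least one partner.
    edges = {}
    for a, b in combinations(sorted(neighbours), 2):
        common = neighbours[a] & neighbours[b]
        if not common:
            continue
        p_a = len(common) / len(neighbours[a])
        p_b = len(common) / len(neighbours[b])
        weight = (p_a * p_b) / (p_a + p_b)
        # Edges below the threshold are dropped, i.e. their weight is treated as 0.
        if weight >= threshold:
            edges[(a, b)] = weight
    return edges


# Hypothetical data: A and B share both partners; C shares one of its ten with each.
interactions = [("A", 1), ("A", 3), ("B", 1), ("B", 3)]
interactions += [("C", k) for k in (1, 2, 4, 5, 6, 7, 8, 9, 10, 11)]
edges = build_object_graph(interactions)
```

Here the A-C and B-C edges have weight of about 0.083 and fall below the 0.1 threshold, so only the A-B edge is kept. Comparing every pair of objects is quadratic in the number of sample objects; for massive data a practical implementation would more likely enumerate pairs per shared interaction object.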

And step S204, constructing a sample heterogeneous network diagram aiming at the sample service object set according to the sample service object set and the association degree.

With the method provided by this embodiment of the application, a sample heterogeneous network graph for a massive number of sample service objects can be constructed quickly, and only edges with a relatively high harmonic average degree between nodes are retained; the optimized graph structure learning model can then better remove noise from the sample heterogeneous network graph and learn potential similarity relations between nodes.

Referring to fig. 7, fig. 7 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus may be a computer program (including program code) running on a computer device; for example, the data processing apparatus may be application software. The apparatus can be used to execute the corresponding steps in the data processing method provided by the embodiments of the application. As shown in fig. 7, the data processing apparatus 1 may include: an obtaining module 11, an optimizing module 12, an adjusting module 13, and a determining module 14.

An obtaining module 11, configured to obtain a sample heterogeneous network map for a sample service object set; nodes in the sample heterogeneous network graph respectively correspond to different sample service objects in the sample service object set; the edge weight of the connecting edge between the two nodes is used for representing the association degree of the two sample business objects;

The optimizing module 12 is configured to perform graph structure optimization processing on the sample heterogeneous network graph through an optimized graph structure learning model, so as to obtain an attribute result for the sample service object set and an estimated isomorphic network graph generated based on the attribute result;

the adjustment module 13 is configured to perform parameter adjustment on the optimization graph structure learning model according to the attribute label for the sample service object set and the attribute result, so as to obtain an adjusted optimization graph structure learning model; the estimated isomorphic network graph is used for continued graph structure optimization processing in the next training iteration when the adjusted optimization graph structure learning model does not meet the convergence condition;

the determining module 14 is configured to determine a node attribute prediction layer in the adjusted optimization graph structure learning model as a node attribute prediction model if it is determined that the adjusted optimization graph structure learning model meets the convergence condition; the node attribute prediction model is used for performing attribute prediction processing on the target heterogeneous network graph of the target service object set to obtain a target attribute prediction result aiming at the target service object set.

The specific functional implementation manners of the obtaining module 11, the optimizing module 12, the adjusting module 13, and the determining module 14 may refer to the descriptions of step S101 to step S104 in the corresponding embodiment of fig. 3, and are not described herein.
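The interplay of the optimizing, adjusting, and determining modules can be sketched as a training loop in which the estimated isomorphic network graph of one round becomes the input graph of the next. Everything about the model interface here is hypothetical: the method names `optimize`, `adjust`, `converged`, and `prediction_layer`, the `max_rounds` limit, the error raised when it is reached, and the `ToyModel` stub exist only to make the control flow runnable.

```python
def train_until_converged(model, graph, labels, max_rounds=100):
    """Iterate graph structure optimization until the model reports convergence."""
    for _ in range(max_rounds):
        # Optimizing module: attribute result plus estimated isomorphic graph.
        attribute_result, estimated_graph = model.optimize(graph)
        # Adjusting module: update parameters from labels and the attribute result.
        model.adjust(labels, attribute_result)
        # Determining module: on convergence, keep only the prediction layer.
        if model.converged():
            return model.prediction_layer()
        # Otherwise the estimated graph is optimized again in the next round.
        graph = estimated_graph
    # Assumption: the source does not say what happens without convergence.
    raise RuntimeError("model did not converge within max_rounds")


class ToyModel:
    """Stand-in model that converges after a fixed number of rounds."""

    def __init__(self, rounds_needed):
        self.rounds_needed = rounds_needed
        self.rounds_done = 0
        self.graphs_seen = []

    def optimize(self, graph):
        self.graphs_seen.append(graph)
        return {"scores": []}, graph + "'"

    def adjust(self, labels, attribute_result):
        self.rounds_done += 1

    def converged(self):
        return self.rounds_done >= self.rounds_needed

    def prediction_layer(self):
        return "node-attribute-prediction-layer"


model = ToyModel(rounds_needed=3)
layer = train_until_converged(model, "G", labels={})
```

In the stub, graphs are plain strings so that the feedback of each estimated graph into the next round is easy to inspect through `model.graphs_seen`.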

Wherein, the acquisition module 11 comprises: an information acquisition unit 111, an object set generation unit 112, a trusted filter unit 113, an association determination unit 114, and a graph construction unit 115.

An information acquisition unit 111 for acquiring at least two pieces of history service information;

an object set generating unit 112, configured to generate a historical service object set according to the historical service objects included in the at least two historical service information;

a trusted filtering unit 113, configured to perform trusted filtering processing on the historical service object set to obtain a sample service object set;

an association determining unit 114, configured to determine, according to the history service information associated with the sample service object set, a degree of association between every two sample service objects in the sample service object set, from among the at least two history service information;

the graph construction unit 115 is configured to construct a sample heterogeneous network graph for the sample service object set according to the sample service object set and the association degree.

The specific functional implementation manners of the information obtaining unit 111, the object set generating unit 112, the trusted filtering unit 113, the association determining unit 114, and the graph constructing unit 115 may refer to the descriptions of step S201 to step S204 in the corresponding embodiment of fig. 5, and are not repeated here.

Wherein the trusted filtering unit 113 comprises: an object determination subunit 1131 and a deletion subunit 1132.

An object determination subunit 1131, configured to obtain a trusted service object set; the set of trusted business objects includes one or more trusted business objects;

the object determining subunit 1131 is further configured to determine, as the historical service object to be filtered, the historical service object in the set of historical service objects that is the same as any one of the one or more trusted service objects;

and the deleting subunit 1132 is configured to delete the historical service object to be filtered from the historical service object set, so as to obtain a sample service object set.

The specific functional implementation manner of the object determining subunit 1131 and the deleting subunit 1132 may refer to the description of step S202 in the corresponding embodiment of fig. 5, and will not be described herein.

Wherein the association determination unit 114 includes: an information acquisition subunit 1141 and a first determination subunit 1142.

An information obtaining subunit 1141, configured to obtain, from at least two pieces of history service information, service attribute information associated with the sample service object M as first service attribute information;

The information obtaining subunit 1141 is further configured to obtain, from at least two pieces of history service information, service attribute information associated with the sample service object N, as second service attribute information;

the first determining subunit 1142 is configured to determine, according to the coincidence degree of the first service attribute information and the second service attribute information, a degree of association between the sample service object M and the sample service object N.

The specific functional implementation manner of the information obtaining subunit 1141 and the first determining subunit 1142 may refer to the description of step S203 in the corresponding embodiment of fig. 5, and will not be described herein.

Wherein the association determination unit 114 includes: an object acquisition subunit 1143, a computation subunit 1144, and a second determination subunit 1145.

An object obtaining subunit 1143, configured to determine, according to at least two pieces of historical service information, a service interaction object having a service interaction relationship with the sample service object M, and use the service interaction object having the service interaction relationship with the sample service object M as a first service interaction object set;

The object obtaining subunit 1143 is further configured to determine, according to at least two pieces of historical service information, a service interaction object having a service interaction relationship with the sample service object N, and use the service interaction object having the service interaction relationship with the sample service object N as a second service interaction object set;

a computing subunit 1144, configured to determine, according to the first service interaction object set and the second service interaction object set, the service harmonic average degree between the sample service object M and the sample service object N;

a second determining subunit 1145, configured to determine, if the service harmonic average degree is greater than or equal to the harmonic average threshold, the association degree between the sample service object M and the sample service object N according to the service harmonic average degree;

the second determining subunit 1145 is further configured to determine the default association degree as the association degree between the sample service object M and the sample service object N if the service harmonic average degree is less than the harmonic average threshold.

The specific functional implementation manner of the object obtaining subunit 1143, the calculating subunit 1144, and the second determining subunit 1145 may refer to the description of step S203 in the corresponding embodiment of fig. 5, and will not be described herein.

The computing subunit 1144 is specifically configured to determine the service interaction objects that appear in both the first service interaction object set and the second service interaction object set as common service interaction objects; determine a first common object proportion according to the first service interaction object set and the common service interaction objects; determine a second common object proportion according to the second service interaction object set and the common service interaction objects; and determine the service harmonic average degree between the sample service object M and the sample service object N according to the first common object proportion and the second common object proportion.

The specific functional implementation of the computing subunit 1144 may refer to the description of step S203 in the corresponding embodiment of fig. 5, and will not be described herein.

Wherein the computing subunit 1144 is specifically further configured to determine the product of the first common object proportion and the second common object proportion as a first value; determine the sum of the first common object proportion and the second common object proportion as a second value; and determine the ratio of the first value to the second value as the service harmonic average degree between the sample service object M and the sample service object N.

Wherein the optimization module 12 comprises: a prediction unit 121, an observation unit 122, an estimation unit 123, and a filtering unit 124.

The prediction unit 121 is configured to perform attribute prediction processing on the sample heterogeneous network map through a K-layer node attribute prediction sublayer, so as to obtain K node hidden layer features and attribute prediction results for the sample service object set;

The observation unit 122 is configured to perform attribute observation processing on the sample heterogeneous network graph and the K node hidden layer features through the node attribute observation layer, so as to obtain an attribute observation result;

an estimation unit 123, configured to perform expectation maximization estimation processing on the attribute prediction result, the attribute observation result, and the attribute label for the sample service object set through the graph estimation layer, so as to obtain an estimated isomorphic adjacency matrix;

and the filtering unit 124 is configured to perform isomorphic filtering processing on the estimated isomorphic adjacency matrix according to the isomorphic threshold, so as to obtain an estimated isomorphic network graph.

The specific functional implementation manners of the prediction unit 121, the observation unit 122, the estimation unit 123, and the filtering unit 124 may refer to the description of step S102 in the corresponding embodiment of fig. 3, and will not be described herein.

Wherein the filtering unit 124 includes: an update subunit 1241 and a graph determination subunit 1242.

An update subunit 1241, configured to traverse the estimated isomorphic adjacency matrix to obtain an estimated isomorphic adjacency value Q_ij;

The update subunit 1241 is further configured to update the estimated isomorphic adjacency value Q_ij to a default isomorphic adjacency value if Q_ij is less than the isomorphic threshold; the default isomorphic adjacency value indicates that no connecting edge exists between the i-th node and the j-th node in the sample heterogeneous network graph;

the graph determining subunit 1242 is configured to determine, when traversal of the estimated isomorphic adjacency matrix is complete, the estimated isomorphic network graph according to the updated estimated isomorphic adjacency matrix.

The specific functional implementation manner of the updating subunit 1241 and the diagram determining subunit 1242 may refer to the description of step S102 in the corresponding embodiment of fig. 3, and will not be described herein.
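The isomorphic filtering performed by the filtering unit 124 can be sketched as an element-wise threshold over the estimated isomorphic adjacency matrix. The matrix values, the threshold of 0.5, the default isomorphic adjacency value of 0, and the helper names are assumptions chosen for illustration.

```python
def isomorphic_filter(q, threshold, default=0.0):
    """Replace every Q_ij below the threshold with the default (no-edge) value."""
    return [[v if v >= threshold else default for v in row] for row in q]


def edges_from_matrix(q, default=0.0):
    """List the (i, j, weight) edges of an undirected graph from its upper triangle."""
    return [
        (i, j, v)
        for i, row in enumerate(q)
        for j, v in enumerate(row)
        if i < j and v != default
    ]


# Hypothetical estimated isomorphic adjacency matrix for three nodes.
q_estimated = [
    [0.0, 0.8, 0.2],
    [0.8, 0.0, 0.6],
    [0.2, 0.6, 0.0],
]
q_filtered = isomorphic_filter(q_estimated, threshold=0.5)
graph_edges = edges_from_matrix(q_filtered)
```

The filter returns a new matrix rather than modifying its input, so the unfiltered estimate remains available for comparison.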

Wherein the adjustment module 13 comprises: a first adjustment unit 131, a second adjustment unit 132, and an adjustment determination unit 133.

The first adjusting unit 131 is configured to perform parameter adjustment on the node attribute prediction layer based on the attribute prediction result and the attribute tag, to obtain an adjusted node attribute prediction layer;

a second adjusting unit 132, configured to perform parameter adjustment on the graph estimation layer based on the attribute prediction result, the attribute observation result, and the attribute tag, to obtain an adjusted graph estimation layer;

The adjustment determining unit 133 is configured to determine the adjusted node attribute prediction layer, the adjusted graph estimation layer, and the node attribute observation layer as an adjusted optimized graph structure learning model.

The specific functional implementation manner of the first adjusting unit 131, the second adjusting unit 132, and the adjustment determining unit 133 may refer to the description of step S103 in the corresponding embodiment of fig. 3, and the description thereof will not be repeated here.

Wherein, the above-mentioned data processing apparatus 1, further include: model application module 15.

A model application module 15, configured to obtain at least two target service information associated with a target service object set;

the model application module 15 is further configured to construct a target heterogeneous network graph for the target service object set according to at least two target service information;

the model application module 15 is further configured to perform attribute prediction processing on the target heterogeneous network map through a node attribute prediction model, so as to obtain a target attribute prediction result for the target service object set;

the model application module 15 is further configured to perform service processing on the target service objects in the target service object set according to the target attribute prediction result.

The specific functional implementation of the model application module 15 may refer to the optional description of step S104 in the corresponding embodiment of fig. 3, which is not described herein.

Referring to fig. 8, fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 8, the data processing apparatus 1 in the embodiment corresponding to fig. 7 may be applied to a computer device 1000. The computer device 1000 may include a processor 1001, a network interface 1004, and a memory 1005, and may further include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the processor 1001. As shown in fig. 8, the memory 1005, which is a type of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application.

In the computer device 1000 shown in fig. 8, the network interface 1004 may provide a network communication function, the user interface 1003 is mainly used to provide an input interface for the user, and the processor 1001 may be used to invoke the device control application stored in the memory 1005 to implement the data processing method described in the foregoing embodiments.

It should be understood that the computer device 1000 described in the embodiments of the present application can perform the data processing method described in any of the foregoing embodiments corresponding to fig. 3 and fig. 5, which is not repeated here. The beneficial effects, being the same as those of the method, are likewise not repeated.

Furthermore, it should be noted here that the embodiments of the present application further provide a computer-readable storage medium that stores the computer program executed by the aforementioned data processing apparatus 1. The computer program includes program instructions which, when executed by the aforementioned processor, can perform the data processing method described in any of the foregoing embodiments corresponding to fig. 3 and fig. 5, so the description is not repeated here. The beneficial effects, being the same as those of the method, are likewise not repeated. For technical details not disclosed in the embodiments of the computer-readable storage medium according to the present application, please refer to the description of the method embodiments of the present application.

The computer-readable storage medium may be an internal storage unit of the data processing apparatus provided in any of the foregoing embodiments or of the computer device, for example, a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.

Furthermore, it should be noted here that: embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the computer device to perform the method provided by any of the corresponding embodiments of fig. 3 and 5 above.

The terms first, second and the like in the description and in the claims and drawings of embodiments of the application are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the term "include" and any variations thereof is intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, article, or device that comprises a list of steps or elements is not limited to the list of steps or modules but may, in the alternative, include other steps or modules not listed or inherent to such process, method, apparatus, article, or device.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the elements and steps of the examples have been described above generally in terms of their functions. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the solution. The skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the present application.

The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims

1. A method of data processing, comprising:

carrying out graph structure optimization processing on the sample heterogeneous network graph through an optimization graph structure learning model to obtain an attribute result aiming at the sample service object set and an estimated isomorphic network graph generated based on the attribute result;

parameter adjustment is carried out on the optimization graph structure learning model according to the attribute labels aiming at the sample service object set and the attribute results, and an adjusted optimization graph structure learning model is obtained; the estimated isomorphic network diagram is used for continuing to be subjected to diagram structure optimization processing in the next round of training iteration when the adjusted diagram structure learning model does not meet the convergence condition;

2. The method of claim 1, wherein the obtaining a sample heterogeneous network map for a sample set of business objects comprises:

acquiring at least two pieces of history service information;

generating a history service object set according to the history service objects contained in the at least two history service information;

performing trusted filtering processing on the historical service object set to obtain a sample service object set;

determining the association degree between every two sample service objects in the sample service object set according to the history service information associated with the sample service object set in the at least two history service information;

and constructing a sample heterogeneous network diagram aiming at the sample service object set according to the sample service object set and the association degree.

3. The method of claim 2, wherein the performing trusted filtering processing on the historical service object set to obtain the sample service object set comprises:

acquiring a trusted service object set; the trusted service object set includes one or more trusted service objects;

determining, as historical service objects to be filtered, the historical service objects in the historical service object set that are the same as any one of the one or more trusted service objects;

and deleting the historical service objects to be filtered from the historical service object set to obtain the sample service object set.
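The trusted filtering of this claim amounts to removing every historical object that also appears in the trusted set. A minimal sketch follows, assuming service objects are represented by hashable identifiers such as strings; the function name `filter_trusted` is hypothetical.

```python
def filter_trusted(historical_objects, trusted_objects):
    """Return the historical objects that are not in the trusted set.

    Objects are assumed to be hashable identifiers (an assumption of
    this sketch); the input order of the remaining objects is kept.
    """
    trusted = set(trusted_objects)
    return [obj for obj in historical_objects if obj not in trusted]
```

For example, `filter_trusted(["a", "b", "c"], ["b"])` keeps `"a"` and `"c"` as the sample set.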

4. The method according to claim 2, wherein the sample service object set comprises a sample service object M and a sample service object N; the historical service information comprises service attribute information corresponding to a historical service object;

the determining the association degree between every two sample service objects in the sample service object set, according to the historical service information associated with the sample service object set among the at least two pieces of historical service information, comprises:

acquiring, from the at least two pieces of historical service information, service attribute information associated with the sample service object M as first service attribute information;

acquiring, from the at least two pieces of historical service information, service attribute information associated with the sample service object N as second service attribute information;

and determining the association degree between the sample service object M and the sample service object N according to the coincidence degree between the first service attribute information and the second service attribute information.
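The claim does not fix how the coincidence degree is measured. One possible reading, shown here purely as an assumption, treats each object's attribute information as a set of values and uses the Jaccard overlap of the two sets; the function name `attribute_overlap` is hypothetical.

```python
def attribute_overlap(first_attrs, second_attrs):
    """Jaccard overlap between two collections of attribute values.

    Using Jaccard overlap as the coincidence degree is an assumption of
    this sketch; the claim only requires some coincidence measure.
    """
    first, second = set(first_attrs), set(second_attrs)
    union = first | second
    if not union:
        # No attribute information on either side: no evidence of association.
        return 0.0
    return len(first & second) / len(union)
```

Under this reading, two objects that share one attribute value out of three distinct values overall would have a coincidence degree of one third.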

5. The method according to claim 2, wherein the sample service object set comprises a sample service object M and a sample service object N; each piece of historical service information comprises a service interaction relation between a historical service object and a service interaction object;

determining, according to the at least two pieces of historical service information, the service interaction objects having a service interaction relation with the sample service object M, and taking these service interaction objects as a first service interaction object set;

determining, according to the at least two pieces of historical service information, the service interaction objects having a service interaction relation with the sample service object N, and taking these service interaction objects as a second service interaction object set;

determining a service harmonic mean degree between the sample service object M and the sample service object N according to the first service interaction object set and the second service interaction object set;

if the service harmonic mean degree is greater than or equal to a harmonic mean threshold, determining the association degree between the sample service object M and the sample service object N according to the service harmonic mean degree;

and if the service harmonic mean degree is smaller than the harmonic mean threshold, determining a default association degree as the association degree between the sample service object M and the sample service object N.

6. The method of claim 5, wherein the determining a service harmonic mean degree between the sample service object M and the sample service object N according to the first service interaction object set and the second service interaction object set comprises:

determining the service interaction objects that are the same in the first service interaction object set and the second service interaction object set as common service interaction objects;

determining a first common object proportion according to the first service interaction object set and the common service interaction objects;

determining a second common object proportion according to the second service interaction object set and the common service interaction objects;

and determining the service harmonic mean degree between the sample service object M and the sample service object N according to the first common object proportion and the second common object proportion.

7. The method of claim 6, wherein the determining the service harmonic mean degree between the sample service object M and the sample service object N according to the first common object proportion and the second common object proportion comprises:

determining the product of the first common object proportion and the second common object proportion as a first value;

determining the sum of the first common object proportion and the second common object proportion as a second value;

and determining the ratio of the first value to the second value as the service harmonic mean degree between the sample service object M and the sample service object N.
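The computation in claims 5 to 7 can be sketched as follows. Two points are assumptions of this sketch rather than statements of the claims: each common object proportion is taken as the number of common interaction objects divided by the size of the respective set, and the association degree above the threshold is taken to be the degree itself. The function names and the default `threshold` and `default` values are hypothetical. Note that the ratio of product to sum, as recited in claim 7, is half of the classical harmonic mean 2ab/(a+b).

```python
def harmonic_mean_degree(first_set, second_set):
    """Product-over-sum of the two common object proportions (claim 7)."""
    first, second = set(first_set), set(second_set)
    common = first & second
    if not common:
        # No shared interaction objects (covers empty sets as well).
        return 0.0
    # Assumed definition: share of each set made up of common objects.
    p1 = len(common) / len(first)
    p2 = len(common) / len(second)
    return (p1 * p2) / (p1 + p2)


def association_degree(first_set, second_set, threshold=0.2, default=0.0):
    """Keep the degree if it reaches the threshold, else use the default."""
    degree = harmonic_mean_degree(first_set, second_set)
    return degree if degree >= threshold else default
```

With interaction sets `{"a", "b", "c", "d"}` and `{"c", "d"}`, the proportions are 0.5 and 1.0, so the degree is 0.5 / 1.5, that is, one third.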

8. The method of claim 1, wherein the attribute result comprises an attribute prediction result and an attribute observation result; the optimization graph structure learning model comprises a node attribute prediction layer, a node attribute observation layer and a graph estimation layer; the node attribute prediction layer comprises K node attribute prediction sub-layers; K is a positive integer;

the performing graph structure optimization processing on the sample heterogeneous network graph through the optimization graph structure learning model, to obtain the attribute result for the sample service object set and the estimated isomorphic network graph generated based on the attribute result, comprises:

performing attribute prediction processing on the sample heterogeneous network graph through the K node attribute prediction sub-layers to obtain K node hidden layer features and the attribute prediction result for the sample service object set;

performing attribute observation processing on the sample heterogeneous network graph and on the K node hidden layer features respectively through the node attribute observation layer to obtain the attribute observation result;

performing expectation maximization estimation processing on the attribute prediction result, the attribute observation result and the attribute labels for the sample service object set through the graph estimation layer to obtain an estimated isomorphic adjacency matrix;

and performing isomorphic filtering processing on the estimated isomorphic adjacency matrix according to an isomorphic threshold to obtain the estimated isomorphic network graph.

9. The method of claim 8, wherein the estimated isomorphic adjacency matrix comprises an estimated isomorphic adjacency value Q_ij; i is a positive integer less than or equal to the total number of nodes in the sample heterogeneous network graph; j is a positive integer less than or equal to the total number of nodes in the sample heterogeneous network graph; the estimated isomorphic adjacency value Q_ij is used for representing the probability that a connecting edge exists between the i-th node and the j-th node in the sample heterogeneous network graph;

the performing isomorphic filtering processing on the estimated isomorphic adjacency matrix according to the isomorphic threshold to obtain the estimated isomorphic network graph comprises:

traversing the estimated isomorphic adjacency matrix to obtain the estimated isomorphic adjacency value Q_ij;

if the estimated isomorphic adjacency value Q_ij is less than the isomorphic threshold, updating the estimated isomorphic adjacency value Q_ij to a default isomorphic adjacency value; the default isomorphic adjacency value is used for representing that no connecting edge exists between the i-th node and the j-th node in the sample heterogeneous network graph;

and when the traversal of the estimated isomorphic adjacency matrix is completed, determining the estimated isomorphic network graph according to the updated estimated isomorphic adjacency matrix.
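The threshold filtering of this claim can be sketched on a matrix stored as a list of rows. The function name `filter_adjacency` and the choice of 0.0 as the default value meaning "no edge" are assumptions of this sketch; the claim only requires some default value with that meaning.

```python
def filter_adjacency(q, threshold, default=0.0):
    """Replace every entry below the threshold with the default value.

    `q` is a square matrix given as a list of rows, where q[i][j] is the
    estimated probability of an edge between node i and node j. A new
    matrix is returned; the input is left unchanged.
    """
    return [
        [value if value >= threshold else default for value in row]
        for row in q
    ]
```

For instance, filtering `[[0.9, 0.2], [0.2, 0.7]]` with a threshold of 0.5 keeps the two diagonal entries and sets the two off-diagonal entries to the default value.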

10. The method of claim 8, wherein the performing parameter adjustment on the optimization graph structure learning model according to the attribute labels for the sample service object set and the attribute result, to obtain the adjusted optimization graph structure learning model, comprises:

performing parameter adjustment on the node attribute prediction layer based on the attribute prediction result and the attribute labels to obtain an adjusted node attribute prediction layer;

performing parameter adjustment on the graph estimation layer based on the attribute prediction result, the attribute observation result and the attribute labels to obtain an adjusted graph estimation layer;

and determining the adjusted node attribute prediction layer, the adjusted graph estimation layer and the node attribute observation layer as the adjusted optimization graph structure learning model.

11. The method as recited in claim 1, further comprising:

acquiring at least two pieces of target service information associated with a target service object set;

constructing a target heterogeneous network graph for the target service object set according to the at least two pieces of target service information;

performing attribute prediction processing on the target heterogeneous network graph through the node attribute prediction model to obtain a target attribute prediction result aiming at a target service object set;

and respectively carrying out service processing on the target service objects in the target service object set according to the target attribute prediction result.

12. A data processing apparatus, comprising:

the optimization module is used for performing graph structure optimization processing on a sample heterogeneous network graph through an optimization graph structure learning model, to obtain an attribute result for a sample service object set and an estimated isomorphic network graph generated based on the attribute result;

the adjustment module is used for performing parameter adjustment on the optimization graph structure learning model according to attribute labels for the sample service object set and the attribute result, to obtain an adjusted optimization graph structure learning model; the estimated isomorphic network graph is used for continued graph structure optimization processing in the next round of training iteration when the adjusted optimization graph structure learning model does not meet the convergence condition;

the determining module is used for determining a node attribute prediction layer in the adjusted optimization graph structure learning model as a node attribute prediction model if the adjusted optimization graph structure learning model meets a convergence condition; the node attribute prediction model is used for performing attribute prediction processing on the target heterogeneous network graph of the target service object set to obtain a target attribute prediction result aiming at the target service object set.

13. A computer device, comprising: a processor, a memory, and a network interface;

the processor is connected to the memory and the network interface, the network interface is used for providing data communication functions, the memory is used for storing program code, and the processor is used for invoking the program code to perform the method of any one of claims 1-11.

14. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program adapted to be loaded by a processor and to perform the method of any of claims 1-11.

15. A computer program product comprising computer programs/instructions which, when executed by a processor, are adapted to carry out the method of any one of claims 1-11.