CN115577283A - Entity classification method and device, electronic equipment and storage medium - Google Patents

Entity classification method and device, electronic equipment and storage medium

Info

Publication number
CN115577283A
Authority
CN
China
Prior art keywords
entity
node
label
target
representation
Prior art date
Legal status
Pending
Application number
CN202211132032.8A
Other languages
Chinese (zh)
Inventor
刘砺志
王钰
蒋海俭
闵青
Current Assignee
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date
Filing date
Publication date
Application filed by China Unionpay Co Ltd filed Critical China Unionpay Co Ltd
Priority to CN202211132032.8A
Publication of CN115577283A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides an entity classification method, an entity classification device, an electronic device and a storage medium. The method comprises the following steps: acquiring a plurality of entity relationship networks corresponding to a plurality of entity nodes, wherein the same entity node corresponds to different entity relations in different entity relationship networks; performing multi-label prediction on the plurality of entity relationship networks based on a plurality of trained base learners to obtain a multi-label prediction result of each entity node in each entity relationship network; and determining a final multi-label prediction result corresponding to each entity node based on the multi-label prediction result of each entity node in each entity relationship network. By performing multi-label prediction based on multiple entity relationship networks from the viewpoint of multi-view learning, the relationships among entities can be mined more fully, so that the multi-label prediction results for the entity nodes are more accurate.

Description

Entity classification method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer application technologies, and in particular, to an entity classification method, an entity classification device, an electronic device, and a storage medium.
Background
With the development of information technology, people have entered the era of big data, and massive and diverse data are generated every day. On the one hand, this greatly enriches the semantic information of entity objects such as users, articles and commodities; on the other hand, it makes it difficult for people to discover the required pattern features from such a huge amount of data.
In order to describe and analyze the characteristics of entities in a systematic, standardized and fine-grained way, people usually establish a set of standardized labels and annotate the entities with them. However, given the enormous size of the data, manually labeling the data alone is clearly an impossible task. Accordingly, label classification prediction techniques have been developed and widely used in many fields such as computer vision, natural language processing, bioinformatics and information retrieval.
Label classification prediction is a technique that performs label discrimination and prediction on new data by summarizing the occurrence rules of known entity labels. Conventional classification problems assume that an instance is associated with only one label, i.e., single-label classification prediction. In practice, however, one sample typically has multiple labels, and conventional classification techniques are no longer applicable.
Disclosure of Invention
The embodiment of the disclosure at least provides an entity classification method, an entity classification device, an electronic device and a storage medium, so as to realize multi-view classification prediction with high prediction accuracy.
In a first aspect, an embodiment of the present disclosure provides an entity classification method, including:
acquiring a plurality of entity relationship networks corresponding to a plurality of entity nodes; the same entity node corresponds to different entity relations in different entity relation networks;
performing multi-label prediction on the entity relationship networks based on the trained base learners to obtain a multi-label prediction result of each entity node in each entity relationship network;
and determining a final multi-label prediction result corresponding to each entity node based on the multi-label prediction result of each entity node in each entity relationship network.
In one possible embodiment, where the multi-label prediction result comprises prediction scores for a plurality of candidate labels; determining a final multi-label prediction result corresponding to each entity node based on the multi-label prediction result of each entity node in each entity relationship network comprises:
for a target candidate label in the plurality of candidate labels, determining a label feature vector of each entity node for the target candidate label based on a multi-label prediction result of each entity node in each entity relationship network;
and performing sequencing learning on the label feature vectors of the plurality of entity nodes aiming at the target candidate labels based on the trained meta-learner, and determining a final multi-label prediction result corresponding to each entity node.
In one possible embodiment, the determining, based on the multi-label prediction result of each entity node in each entity relationship network, a label feature vector of each entity node for the target candidate label includes:
for a target entity relationship network in the entity relationship networks, selecting a target prediction score matched with a target candidate label from prediction scores of the entity node for the candidate labels in the target entity relationship network;
and combining the target prediction scores respectively selected from the entity relationship networks to obtain the label feature vectors of the entity nodes aiming at the target candidate labels.
In a possible implementation manner, the performing order learning on the label feature vectors of the plurality of entity nodes for the target candidate label based on the trained meta-learner to determine a final multi-label prediction result corresponding to each entity node includes:
for a target entity node in the entity nodes, inputting the label feature vector of the target entity node for the target candidate label into a trained meta-learner, and determining the sequenced prediction scores of the target entity node for the candidate labels;
and determining a multi-label prediction result corresponding to the target entity node based on the sorted prediction scores.
In a possible embodiment, the determining a multi-label prediction result corresponding to the target entity node based on the sorted prediction scores includes:
and under the condition that the sequenced prediction score is larger than a preset threshold value, determining the candidate label corresponding to the sequenced prediction score as a multi-label prediction result corresponding to the target entity node.
In one possible embodiment, the meta learner is trained as follows:
obtaining a sample training set comprising a plurality of sample characteristic vectors, wherein each dimension value of the sample characteristic vectors points to an entity node label pair;
traversing each sample feature vector in the sample training set, and determining the lambda-gradient of each sample feature vector;
constructing a regression tree based on the plurality of sample feature vectors and the lambda-gradient of each sample feature vector;
and updating the ranking score of each entity node in the meta-learner to be trained based on the corresponding relation between each leaf node of the regression tree and each entity node to obtain the trained meta-learner.
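A minimal sketch of this training loop is given below, assuming a LambdaMART-style learning-to-rank setup; the binary relevance targets, the pairwise form of the lambda-gradient, the tree depth and the learning rate are illustrative assumptions rather than details fixed by the disclosure. In practice an off-the-shelf gradient-boosted ranker could play the same role.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_meta_learner(feature_vecs, relevance, n_trees=10, lr=0.1):
    """Hypothetical LambdaMART-style meta-learner training.

    feature_vecs: (N, V) array; row i is the label feature vector of one
                  (entity node, candidate label) pair across the V views.
    relevance:    (N,) array of 0/1 ground-truth relevance for each pair.
    """
    scores = np.zeros(len(feature_vecs))          # ranking score per pair
    trees = []
    for _ in range(n_trees):
        # Traverse every sample feature vector and compute its lambda-gradient.
        # A simple pairwise lambda is used here: push relevant pairs above
        # irrelevant ones (illustrative; the patent does not fix this form).
        lambdas = np.zeros_like(scores)
        for i in np.where(relevance == 1)[0]:
            for j in np.where(relevance == 0)[0]:
                rho = 1.0 / (1.0 + np.exp(scores[i] - scores[j]))
                lambdas[i] += rho
                lambdas[j] -= rho
        # Construct a regression tree from the feature vectors and their lambda-gradients.
        tree = DecisionTreeRegressor(max_depth=4)
        tree.fit(feature_vecs, lambdas)
        # Update the ranking score of each pair via the leaf it falls into.
        scores += lr * tree.predict(feature_vecs)
        trees.append(tree)
    return trees, scores
```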
In a possible embodiment, the multi-label prediction of the entity relationship networks based on the trained multiple base learners to obtain a multi-label prediction result of each entity node in each entity relationship network includes:
and aiming at a target entity relationship network in the entity relationship networks, performing multi-label prediction on the target entity relationship network by using a target base learner corresponding to the target entity relationship network to obtain a multi-label prediction result of each entity node in the target entity relationship network.
In a possible implementation manner, in a case that the target base learner includes a full connection input layer, a representation learning module, and a full connection output layer, the performing multi-label prediction on the target entity relationship network by using the target base learner corresponding to the target entity relationship network to obtain a multi-label prediction result of each entity node in the target entity relationship network includes:
inputting the original feature vector of each entity node in the target entity relationship network into the fully connected input layer included in the target base learner, and determining the dimension-reduced feature vector for each entity node output by the fully connected input layer; and
inputting the dimension-reduced feature vector into the representation learning module included in the target base learner, and determining a hidden feature vector containing low-order signals and similar node information;
and inputting the hidden feature vector into a full-connection output layer included by the target base learner, and determining a multi-label prediction result of each entity node in the target entity relationship network.
In one possible implementation, the target base learner, which includes a fully connected input layer, a representation learning module, and a fully connected output layer, is trained as follows:
obtaining a sample entity relationship network, wherein part of entity nodes in the sample entity relationship network have multi-label labeling results;
performing dimensionality reduction representation on each entity node in the sample entity relationship network by using the fully-connected input layer, and determining node dimensionality reduction implicit representation of each entity node after dimensionality reduction conversion;
performing attention learning of low-order signals on the node dimension reduction implicit expression by using a graph convolution layer included in the expression learning module to obtain node attention implicit expression of each entity node after the attention learning; performing node similarity learning on the node attention implicit expression by using a conditional random field layer included by the expression learning module to obtain node similarity implicit expression of each entity node after the node similarity learning is performed;
performing multi-label prediction on the node similarity implicit representation of each entity node by using the full-connection output layer to obtain a prediction result;
and adjusting the target base learner based on the prediction result and the multi-label labeling result to obtain the trained target base learner.
In a possible implementation, in the case that multiple graph convolution layers are included, the performing, by using the graph convolution layer included in the representation learning module, attention learning of a low-order signal on the node dimension-reduced implicit representation to obtain a node attention implicit representation of each entity node after attention learning includes:
using the node dimensionality reduction implicit representation as an initial representation of each of the graph convolution layers;
for a current graph convolutional layer other than the first graph convolutional layer, performing the following steps:
inputting the node attention implicit representation output by the previous graph convolutional layer and the sample entity relationship network into the graph attention layer of the current graph convolutional layer, and determining the node attention implicit representation output by the graph attention layer;
determining a node attention implicit representation of the current graph convolutional layer output based on the initial representation, the node attention implicit representation of the graph attention layer output, and training parameters of the current graph convolutional layer.
In one possible implementation, the determining a node attention implicit representation of the current graph convolution layer output based on the initial representation, the node attention implicit representation of the graph attention layer output, and training parameters of the current graph convolution layer includes:
determining a first graph convolution operator based on a first weighted sum operation between the initial representation and a node attention implicit representation of the graph attention layer output; determining a second graph convolution operator based on a second weighted summation operation between the training parameters of the current graph convolution layer and an identity mapping matrix corresponding to the current graph convolution layer;
determining a node attention implicit representation of the current graph convolution layer output based on the first graph convolution operator and the second graph convolution operator.
In a possible implementation manner, the performing node similarity learning on the node attention implicit expression by using a conditional random field layer included in the expression learning module to obtain a node similarity implicit expression of each entity node after performing node similarity learning includes:
aiming at each entity node, constructing a maximized conditional probability function corresponding to the entity node; the maximized conditional probability function is determined by a first difference between the node similarity implicit representation of the entity node and the node attention implicit representation of the entity node, and a second difference between the node similarity implicit representation of the entity node and the node similarity implicit representations of the other entity nodes in the plurality of entity nodes except the entity node;
determining the node similarity implicit representation of each entity node in the case that the maximized conditional probability function is determined to reach its maximum function value.
In a possible embodiment, the constructing a maximized conditional probability function corresponding to the entity node includes:
for each entity node, obtaining a first difference value between a node similarity implicit representation of the entity node and a node attention implicit representation of the entity node, a second difference value between the node similarity implicit representation of the entity node and node similarity implicit representations of other entity nodes except the entity node in the entity nodes, and a node similarity between the entity node and the other entity nodes;
performing product operation on the second difference and the node similarity to determine a product result;
performing summation operation on the product results between the entity node and each other entity node to obtain a second difference sum;
determining the maximized conditional probability function based on a third weighted sum operation between the first difference and the second difference sum.
In a possible implementation manner, the adjusting the target base learner based on the prediction result and the multi-label labeling result to obtain a trained target base learner includes:
acquiring a first weight parameter for adjusting the numbers of positive and negative samples and a second weight parameter for adjusting the contribution of hard-to-distinguish samples;
determining a target loss function value of the target base learner based on the first weight parameter, the second weight parameter, and a difference result between the prediction result and the multi-label labeling result;
and carrying out at least one round of adjustment on the training parameter values of the target base learner based on the target loss function value to obtain the trained target base learner.
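The two weight parameters suggest an α-balanced, focal-style weighting of the per-label binary cross-entropy. The sketch below shows one possible form; only the roles of the two parameters follow the text, while the exact functional form, parameter values and tensor shapes are assumptions:

```python
import torch

def weighted_multilabel_loss(pred, target, alpha=0.25, gamma=2.0):
    """Illustrative target loss: alpha re-balances positive/negative samples,
    gamma down-weights easy samples so that hard samples contribute more."""
    pred = pred.clamp(1e-7, 1 - 1e-7)                # predicted probabilities, shape (N, q)
    pt = pred * target + (1 - pred) * (1 - target)   # probability assigned to the true class
    w = alpha * target + (1 - alpha) * (1 - target)  # first weight parameter (pos/neg balance)
    return (-w * (1 - pt) ** gamma * pt.log()).mean()  # gamma: second weight parameter
```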
In a second aspect, an embodiment of the present disclosure further provides an entity classification apparatus, including:
an acquisition module, configured to acquire a plurality of entity relationship networks corresponding to a plurality of entity nodes, wherein the same entity node corresponds to different entity relations in different entity relation networks;
the prediction module is used for carrying out multi-label prediction on the entity relationship networks based on the trained base learners to obtain a multi-label prediction result of each entity node in each entity relationship network;
and the classification module is used for determining a final multi-label prediction result corresponding to each entity node based on the multi-label prediction result of each entity node in each entity relationship network.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the entity classification method according to the first aspect and any of its various embodiments.
In a fourth aspect, the disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the entity classification method according to the first aspect and any one of the various implementation manners thereof.
By adopting the entity classification method, the entity classification device, the electronic equipment and the storage medium described above, when a plurality of entity relationship networks corresponding to a plurality of entity nodes are obtained, multi-label prediction can be performed on the plurality of entity relationship networks based on a plurality of trained base learners to obtain a multi-label prediction result of each entity node in each entity relationship network, and then a final multi-label prediction result corresponding to each entity node is determined based on the multi-label prediction result of each entity node in each entity relationship network. That is, by performing multi-label prediction on multiple entity relationship networks from the viewpoint of multi-view learning, the relationships between the entities can be mined more fully, so that the multi-label prediction results for the entity nodes are more accurate.
Other advantages of the present disclosure will be explained in more detail in conjunction with the following description and the accompanying drawings.
It should be understood that the above description is only an overview of the technical solutions of the present disclosure, so that the technical solutions of the present disclosure can be more clearly understood and implemented according to the contents of the specification. In order to make the aforementioned and other objects, features and advantages of the present disclosure comprehensible, specific embodiments thereof are described below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; other related drawings can be derived from them by those skilled in the art without inventive effort.
Also, like reference numerals are used to refer to like elements throughout. In the drawings:
FIG. 1 is a flow chart illustrating a method for entity classification provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating a specific method for training a base learner in the entity classification method provided by the embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a specific method for constructing a node attention implicit representation in the entity classification method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating an application of the entity classification method provided by the embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an entity classification apparatus provided by an embodiment of the present disclosure;
fig. 6 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the description of the embodiments of the present disclosure, it is to be understood that terms such as "including" or "having" are intended to indicate the presence of the features, numbers, steps, actions, components, parts, or combinations thereof disclosed in the specification, and are not intended to preclude the presence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.
Unless otherwise stated, "/" indicates an "or" relationship; for example, A/B may indicate A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean that A exists alone, A and B exist simultaneously, or B exists alone.
The terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of such features. In the description of the embodiments of the present disclosure, "a plurality" means two or more unless otherwise specified.
Research shows that the multi-label classification prediction problem generally has the following characteristics: (1) Entities in the real world are not independent of each other but are related. For example, social relationships exist among users of an e-commerce platform, and similar social relationships generally imply similar purchasing preferences. In the multi-label learning process, adding the correlation between entities into the model as prior knowledge allows the hidden distribution rules to be captured more accurately. (2) In an actual task, the same thing may be described in many ways or from many perspectives, which together constitute multiple views describing that thing. For example, a video clip contains information of various modalities such as images, audio and subtitles, and a news article has several parts such as a title, an abstract and a body text. Compared with a single-view characterization, multi-view features contain more complementary information and can describe objects more comprehensively and from more angles, thereby helping the learner achieve better prediction performance. (3) In a real task, the distribution of label annotation frequencies is often unbalanced and generally follows a scale-free-like distribution: a small portion of the labels annotate a large number of samples, while the remaining labels are only used to annotate a small number of samples, so the proportion of positive and negative samples for a label is seriously unbalanced.
A series of studies on multi-label learning have been conducted in the related art. For example, reference 1 of the related art provides a lazy multi-label learning algorithm (ML-KNN) based on the conventional k-nearest neighbor algorithm (KNN) and the maximum a posteriori probability rule. The main idea is that the label set of a sample can be determined by the neighbors of the sample.
The ML-KNN algorithm can be divided into five steps:
a) Step 1: Given a training set of already-annotated samples T = {(x_1, Y_1), (x_2, Y_2), …, (x_m, Y_m)}, where each Y_i ⊆ 𝒴 = {y_1, y_2, …, y_q} is a label set. For each sample x_i in T (1 ≤ i ≤ m), compute its k nearest neighbors N(x_i).
b) Step 2: For each label y_j (1 ≤ j ≤ q) in the label set 𝒴, compute the prior probability that y_j appears in the label set of the sample x to be classified:

P(H_j) = (s + Σ_{i=1}^{m} [y_j ∈ Y_i]) / (2s + m),    P(¬H_j) = 1 − P(H_j),

where H_j denotes the event that label y_j occurs, m is the total number of training samples, s is a smoothing term (usually taken as 1), and [·] is the indicator function that equals 1 when its argument holds and 0 otherwise. Then compute the frequency arrays κ_j and κ'_j:

κ_j[r] = Σ_{i=1}^{m} [y_j ∈ Y_i] · [δ_j(x_i) = r],    r = 0, 1, …, k,
κ'_j[r] = Σ_{i=1}^{m} [y_j ∉ Y_i] · [δ_j(x_i) = r],    r = 0, 1, …, k.

Here κ_j[r] denotes the number of training samples that are labeled with the j-th label y_j and whose k nearest neighbors contain exactly r samples labeled with y_j, where δ_j(x_i) = Σ_{x_a ∈ N(x_i)} [y_j ∈ Y_a] is the total number of occurrences of the j-th label y_j among the k neighbors of sample x_i; κ'_j[r] is defined analogously for the samples not labeled with y_j.
c) Step 3: Compute the k nearest neighbors N(x) of the sample x to be classified.
d) Step 4: For each label y_j (1 ≤ j ≤ q) in the label set, count the frequency with which the j-th label y_j appears among the k neighbors N(x) of the sample x to be classified:

C_j = Σ_{x_a ∈ N(x)} [y_j ∈ Y_a].

e) Step 5: Predict the labeling result of the sample x to be classified:

Y(x) = { y_j | P(H_j | C_j) > P(¬H_j | C_j), 1 ≤ j ≤ q },

where, according to Bayes' theorem,

P(H_j | C_j) ∝ P(H_j) · P(C_j | H_j),    P(¬H_j | C_j) ∝ P(¬H_j) · P(C_j | ¬H_j),

with

P(C_j | H_j) = (s + κ_j[C_j]) / (s(k + 1) + Σ_{r=0}^{k} κ_j[r]),
P(C_j | ¬H_j) = (s + κ'_j[C_j]) / (s(k + 1) + Σ_{r=0}^{k} κ'_j[r]).

Here P(C_j | H_j) is the probability that, given that sample x is labeled with y_j, exactly C_j samples among its k neighbors are labeled with y_j; P(C_j | ¬H_j) is the probability that, given that x is not labeled with y_j, exactly C_j samples among its k neighbors are labeled with y_j.
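For reference, a compact sketch of the ML-KNN procedure described above; Euclidean distance for the k nearest neighbours and a smoothing term s = 1 are assumptions, and the function names are purely illustrative:

```python
import numpy as np

def ml_knn_predict(X_train, Y_train, x, k=10, s=1.0):
    """Sketch of ML-KNN. X_train: (m, d) features; Y_train: (m, q) 0/1 labels;
    x: (d,) test sample. Returns the predicted 0/1 label vector for x."""
    m, q = Y_train.shape

    def knn(vec, exclude=None):
        d = np.linalg.norm(X_train - vec, axis=1)
        if exclude is not None:
            d[exclude] = np.inf
        return np.argsort(d)[:k]

    # Step 1: k nearest neighbours of every training sample.
    neigh = [knn(X_train[i], exclude=i) for i in range(m)]
    # delta[i, j]: how many neighbours of x_i carry label y_j.
    delta = np.array([Y_train[neigh[i]].sum(axis=0) for i in range(m)])

    # Step 2: priors and frequency arrays kappa / kappa'.
    prior = (s + Y_train.sum(axis=0)) / (2 * s + m)
    kappa = np.zeros((q, k + 1))
    kappa_c = np.zeros((q, k + 1))
    for i in range(m):
        for j in range(q):
            r = int(delta[i, j])
            if Y_train[i, j] == 1:
                kappa[j, r] += 1
            else:
                kappa_c[j, r] += 1

    # Steps 3-4: neighbours of the test sample and label counts C_j.
    C = Y_train[knn(x)].sum(axis=0).astype(int)

    # Step 5: maximum a posteriori decision per label.
    y_pred = np.zeros(q, dtype=int)
    for j in range(q):
        p_c_h = (s + kappa[j, C[j]]) / (s * (k + 1) + kappa[j].sum())
        p_c_nh = (s + kappa_c[j, C[j]]) / (s * (k + 1) + kappa_c[j].sum())
        y_pred[j] = int(prior[j] * p_c_h > (1 - prior[j]) * p_c_nh)
    return y_pred
```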
However, the algorithm has the following technical defects:
1) The algorithm does not explicitly exploit real entity relationship network information. In fact, based on the nearest-neighbor idea, an entity relationship network is constructed implicitly: each node has k adjacent nodes, and the edge weight is the Euclidean distance calculated from the feature vectors of the nodes. However, a graph constructed in this way obviously cannot accurately describe the association relationships between nodes: some nodes may be similar to only a few nodes, while other nodes may be associated with many nodes, so simply fixing a single threshold on the node degree is not reasonable, and the constructed k-nearest-neighbor graph usually deviates to some extent from the entity relationship network that actually exists in the practical problem.
2) The algorithm predicts rare classes poorly when the samples are unbalanced. Because the algorithm produces its prediction by counting how often a label appears among the k neighbors of the sample, if the label occurs very rarely in the training sample set, almost no samples labeled with it will appear among the k neighbors, and the calculated posterior probability will be heavily distorted.
3) The algorithm is based on a single-view design, i.e. simply combining all sample features into a single feature vector, and cannot be directly applied to the case of multi-view data.
In addition, reference 2 of the related art provides a collaboration-based multi-label propagation algorithm, CMLP. Compared with the classical single-label propagation algorithm, CMLP incorporates inter-label associations into the model based on a "collaborative assumption", holding that the prediction result for a single label comes not only from the contribution of that label itself but also from the contributions of other labels.
Denote the feature matrix of the instances by X ∈ R^{n×p}, where n is the number of instances and p is the feature dimension. Y ∈ {−1, +1}^{l×q} denotes the label matrix of the labeled instances, where l (l < n) is the number of labeled instances and q is the number of labels. The non-negative matrix W = [w_ij]_{n×n} is the adjacency weight matrix of the entity relationship network. Let P = D^{−1/2} W D^{−1/2} be the normalized propagation matrix of W, where D = diag(d_1, d_2, …, d_n) is a diagonal matrix with d_i = Σ_j w_ij.
The CMLP algorithm comprises two major steps:
a) Step 1: Estimate the label correlation matrix.
Based on the "collaborative assumption", the authors introduce a label correlation matrix R = [r_ij]_{q×q}, where r_ij reflects the magnitude of the contribution of the i-th label to the j-th label, and r_ii = 0. The authors therefore assume that the final predictor should be a compromise between the original predictor f(X) and the predictor obtained after introducing the correlation matrix R, i.e.

F = (1 − α) f(X) + α f(X) R.

To solve for the correlation matrix R, the authors derive a ridge regression optimization problem from the above equation:

min_{r_j} ‖(1 − α) y_j + α Y r_j − y_j‖^2 + γ ‖r_j‖^2,    j = 1, …, q,

where y_j and r_j denote the j-th columns of Y and R respectively, α is the degree of agreement, and γ is the regularization parameter.
b) Step 2: Obtain the prediction result.
Denote the output of the model as F̂ = F Q, where Q = (1 − α) I + α R. The objective function of CMLP is defined over the prediction matrix F and an introduced intermediate variable Z, with balance term parameters μ and λ; f_i denotes the i-th row of the matrix F, and ‖·‖_F is the Frobenius norm. In the paper, the optimization is solved by alternating iteration, with initial values F^(0) and Z^(0) = Y.
First, Z is fixed and F is updated by a gradient-descent recursion step with learning rate β. Then, F is fixed and Z is updated by its closed-form solution. The two steps are repeated until convergence, and the output is converted into the final prediction result by thresholding (taking the sign of the entries of F Q).
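A rough sketch of Step 1 under the ridge-regression form given above; the per-column closed-form solution and the way r_jj = 0 is enforced are assumptions, and the alternating F/Z updates of Step 2 depend on the exact objective in reference 2 and are not reproduced here:

```python
import numpy as np

def estimate_label_correlation(Y, alpha=0.5, gamma=1.0):
    """Illustrative estimate of the label correlation matrix R (q x q) from the
    labeled matrix Y (l x q), assuming y_j is approximated by
    (1 - alpha) * y_j + alpha * Y @ r_j with ridge penalty gamma."""
    l, q = Y.shape
    R = np.zeros((q, q))
    for j in range(q):
        mask = np.arange(q) != j          # enforce r_jj = 0 by dropping column j
        A = alpha * Y[:, mask]
        b = alpha * Y[:, j]
        # Ridge regression: min_r ||A r - b||^2 + gamma ||r||^2
        r = np.linalg.solve(A.T @ A + gamma * np.eye(q - 1), A.T @ b)
        R[mask, j] = r
    return R
```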
However, the algorithm has the following technical defects:
1) The entity relationship network in the algorithm is actually a k-nearest-neighbor graph constructed from entity feature vectors, so the entity information actually utilized by the algorithm is only the entity features themselves, not a real network. In some practical scenarios, the feature vectors of entities and an entity relationship network often exist at the same time, but the algorithm does not take this into account.
2) Although the algorithm explicitly utilizes the entity relationship network through label propagation, the receptive field of the label propagation algorithm is limited to the first-order neighborhood of a node, and broader connectivity information cannot be acquired. In addition, the label propagation algorithm can only capture linear graph topology structures, and cannot capture complex nonlinear structures.
3) The algorithm is designed for the case in which an entity has only one relationship network, and cannot be applied to scenarios with multiple entity relationship networks. If it were applied, the multi-view networks would have to be compressed into a single-view network in advance by means such as weighted averaging or consistency integration, causing loss of latent information and introduction of noise, which affects the final prediction performance of the model.
Although a great deal of research has focused on entity multi-label classification algorithms, after searching a large volume of prior art material the applicant finds that, at present, no existing work performs multi-label prediction based on multiple entity relationship networks from the viewpoint of multi-view learning.
To fill this gap, the applicant proposes to extract the information contained in the entity relationship networks based on a deep Graph Convolutional Network (GCN), and to integrate and fuse the information among the multiple views through a learning-to-rank algorithm, so as to obtain a better prediction effect.
To facilitate understanding of the present embodiment, first, an entity classification method disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the entity classification method provided in the embodiments of the present disclosure is generally an electronic device with certain computing capability, and the electronic device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a computing device, or a server or other processing device. In some possible implementations, the entity classification method may be implemented by way of a processor invoking computer readable instructions stored in a memory.
Referring to fig. 1, a flowchart of an entity classification method provided in the embodiment of the present disclosure is shown, where the method includes steps S101 to S103, where:
S101: acquiring a plurality of entity relationship networks corresponding to a plurality of entity nodes; the same entity node corresponds to different entity relations in different entity relation networks;
S102: performing multi-label prediction on a plurality of entity relationship networks based on a plurality of trained base learners to obtain a multi-label prediction result of each entity node in each entity relationship network;
S103: and determining a final multi-label prediction result corresponding to each entity node based on the multi-label prediction result of each entity node in each entity relationship network.
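In outline, steps S101 to S103 amount to per-view prediction followed by cross-view fusion. The sketch below is schematic only; the `predict` and `score` interfaces, and the stacking of per-view scores into label feature vectors fed to a meta-learner, are assumptions consistent with the embodiments described later:

```python
def classify_entities(relation_networks, node_features, base_learners, meta_learner):
    """relation_networks: one adjacency structure per view (S101).
    base_learners: one trained learner per view (S102).
    meta_learner: trained ranker fusing the per-view scores (S103)."""
    # S102: multi-label prediction on every entity relationship network.
    per_view_scores = [
        learner.predict(net, node_features)          # (num_nodes, num_labels) scores
        for net, learner in zip(relation_networks, base_learners)
    ]
    # S103: for each (node, label) pair, stack the scores of all views into a
    # label feature vector and let the meta-learner produce the final score.
    final = {}
    num_nodes, num_labels = per_view_scores[0].shape
    for i in range(num_nodes):
        for j in range(num_labels):
            feature_vec = [scores[i, j] for scores in per_view_scores]
            final[(i, j)] = meta_learner.score(feature_vec)
    # A threshold on the returned scores then yields the final multi-label result.
    return final
```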
In order to facilitate understanding of the entity classification method provided by the embodiments of the present disclosure, an application scenario of the method is first described in detail below. The entity classification method in the embodiment of the disclosure can be mainly applied to any application field requiring multi-tag prediction, for example, can be applied to an application scenario in which social media performs multi-tag classification on a user. The final multi-label prediction result corresponding to each entity node obtained through prediction can be multi-label prediction of whether a user is loyal or not, whether the user is active or not, topic preference, consumption preference and the like.
In the entity classification method provided by the embodiment of the disclosure, under the condition that a plurality of entity relationship networks (corresponding to a plurality of views) corresponding to a plurality of entity nodes are obtained, multi-label prediction can be performed on the plurality of entity relationship networks based on a plurality of trained base learners, so that entity relationships among the plurality of entity nodes under the plurality of views are better mined, and thus the determined multi-label prediction result is more accurate.
Before multi-label prediction is carried out by using the base learner, the entity relationship network can be modeled by using the input entity set, so that the modeled entity relationship network is more suitable for the characteristic that complex association exists among samples in the real world instead of mutual independence. Through the input entity relationship network, the embodiment of the disclosure can more easily discover the implicit similar features among the samples, and improve the accuracy of multi-label classification by utilizing the similarity.
The embodiment of the disclosure can specially perform modeling design aiming at the condition that multiple relation networks exist among entities, effectively utilizes the characteristics of complementarity and consistency commonly possessed by multiple views, makes up the problem that classification information contained in a single view is weak, and is beneficial to improving the final prediction performance of a model by utilizing the advantages of different views.
In addition, the problem of over-learning caused by stacking various data sources into high-dimensional single-view data is avoided through multi-view learning, namely, the phenomenon that data noise is also learned into the classification rule due to the fact that the model is too complex. Meanwhile, in the embodiment of the disclosure, each entity relationship network can correspond to one base learner to perform multi-label prediction, that is, the multi-view learning framework allows each view to train its own classifier, so that the characteristics of each data source can be retained.
Regardless of which base learner is used, the overall framework may be the same; for example, it may include, in order, a fully connected input layer that performs feature dimension reduction, a representation learning module capable of performing graph convolution operations, and a fully connected output layer suitable for prediction and classification.
In practical application, aiming at a target entity relationship network in a plurality of entity relationship networks, a target base learner corresponding to the target entity relationship network is used for carrying out multi-label prediction on the target entity relationship network to obtain a multi-label prediction result of each entity node in the target entity relationship network.
The target entity relationship network may be any one of a plurality of entity relationship networks, may also be each one of the plurality of entity relationship networks, and may also be all the networks in the plurality of entity relationship networks, which is not specifically limited in this disclosure. Considering that different entity relationship networks correspond to node features capable of representing different views, the target entity relationship network can be each network, and thus multi-label prediction can be performed on each network by using a correspondingly trained base learner, so that a multi-label prediction result of each entity node in each entity relationship network is obtained.
In the process of actually performing multi-label prediction on a target entity relationship network, the method can be specifically realized by the following steps:
inputting an original feature vector of each entity node in a target entity network into a full-connection input layer included by a target base learner, and determining a dimension reduction feature vector aiming at each entity node output by the full-connection input layer;
inputting the dimension reduction characteristic vector into a representation learning module included by a target base learner, and determining a hidden characteristic vector containing low-order signals and similar node information;
and step three, inputting the hidden feature vectors into a full-connection output layer included by the target base learner, and determining a multi-label prediction result of each entity node in the target entity relationship network.
Here, the fully-connected input layer included by the base learner can be used for extracting the dimensionality reduction feature vector for each node, so that the hidden feature vector can be better extracted through the representation learning module, and further, multi-label prediction can be realized through the fully-connected output layer.
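A minimal PyTorch-style sketch of this three-stage forward pass; the layer sizes, the sigmoid output and the internals of the representation learning module are placeholders (the attention and residual formulas appear later in the description):

```python
import torch
import torch.nn as nn

class BaseLearner(nn.Module):
    """Sketch: fully connected input layer -> representation learning module
    -> fully connected output layer, for one entity relationship network (view)."""
    def __init__(self, in_dim, hid_dim, num_labels, repr_module):
        super().__init__()
        self.fc_in = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.LeakyReLU(0.2))
        self.repr_module = repr_module          # graph convolution + CRF layers
        self.fc_out = nn.Linear(hid_dim, num_labels)

    def forward(self, x, adj):
        h0 = self.fc_in(x)                      # step 1: dimension reduction
        h = self.repr_module(h0, adj)           # step 2: low-order + similarity info
        return torch.sigmoid(self.fc_out(h))    # step 3: multi-label scores
```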
The representation learning module in the multi-label prediction scheme provided by the embodiment of the present disclosure solves the problem that the receptive field of the existing algorithms (such as reference 1 and reference 2) is limited to the first-order neighborhood of a node: by stacking graph convolution layers multiple times, it fully fuses the low-order and high-order topological information in the network so as to mine latent node similarity features. With the deeper network structure thus constructed, a wider receptive field can be obtained, and shallower features are combined to represent more complex abstract features. Benefiting from the deep architecture, the embodiment of the present disclosure can maintain a relatively global understanding of the whole network structure even when facing a large-scale entity relationship network.
It should be noted that "node" and "entity node" are used interchangeably in the following description.
This is primarily a consideration in that, in general, there may also be a close interaction between two indirectly adjacent nodes in a social network. For example, users a and B often share posts, and users a and C also often share posts. Although user B and user C do not know each other, it can be inferred that user B and user C have similar preferences for posting. In this example, user B and user C are not directly adjacent (1-hop neighbor), but pass through a secondary neighbor node (2-hop neighbor) of user A. Therefore, the representation learning module has more practical significance by considering the high-order network topology to help discover the potential entity similarity characteristics.
Considering the key role of the training of the base learner in realizing multi-label prediction, the following description of the training process will be exemplified by the training of the target base learner, and specifically includes the following steps:
step one, acquiring a sample entity relationship network, wherein part of entity nodes in the sample entity relationship network have multi-label labeling results;
step two, performing dimensionality reduction representation on each entity node in the sample entity relationship network by using a full-connection input layer, and determining node dimensionality reduction implicit representation of each entity node after dimensionality reduction conversion;
thirdly, performing attention learning of low-order signals on the node dimension reduction implicit representation by using a graph convolution layer included in the representation learning module to obtain node attention implicit representation of each entity node after the attention learning is performed; performing node similarity learning on the node attention implicit expression by using a condition random field layer included by the expression learning module to obtain node similarity implicit expression of each entity node after the node similarity learning is performed;
step four, performing multi-label prediction on the node similarity implicit expression of each entity node by utilizing a full-connection output layer to obtain a prediction result;
and step five, adjusting the target base learner based on the prediction result and the multi-label labeling result to obtain the trained target base learner.
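A schematic training loop for one base learner, assuming the BaseLearner sketch above and a plain binary cross-entropy loss in place of the weighted loss described later; only the nodes carrying multi-label annotations contribute to the loss, and the optimizer and hyperparameters are illustrative:

```python
import torch

def train_base_learner(model, adj, features, labels, label_mask, epochs=200, lr=1e-3):
    """Illustrative training loop for the base learner of one view.
    label_mask marks the entity nodes that carry multi-label annotations."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCELoss()     # placeholder for the weighted loss of the disclosure
    for _ in range(epochs):
        opt.zero_grad()
        pred = model(features, adj)                           # steps 2-4: forward pass
        loss = loss_fn(pred[label_mask], labels[label_mask])  # compare with annotations
        loss.backward()                                       # step 5: adjust the learner
        opt.step()
    return model
```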
If a large number of graph convolution modules are simply stacked, the low-order topology information is almost completely attenuated by the time it reaches the deep graph convolution layers, so the finally generated node embedded representations carry only weak low-order topology signals. In practical applications, although there may be relatively close interaction between some second-order adjacent nodes in a user's social network, the probability that a close relationship exists between a pair of higher-order nodes is generally low. Therefore, the contribution of low-order topology information to the final prediction result is higher than that of high-order topology information.
Based on this, the disclosed embodiments use two solutions: (1) based on the idea of skip connections, the initial representation of the nodes is introduced into each convolution layer, and an identity mapping is added to the transformation matrix, so that the initial node representation and the low-order topology information can be transmitted to deeper hidden layers; (2) a conditional random field layer is added behind each graph convolution module to ensure that the similarity information between adjacent nodes is retained, avoiding the loss of similarity information caused by repeated graph convolution transformations. Through these two strategies, the base learner in the embodiment of the present disclosure can effectively alleviate the over-smoothing problem and reasonably balance the contributions of low-order and high-order topological information to the node embedded representation, thereby obtaining a better multi-label prediction effect than the original graph convolution module.
In particular, the above-mentioned determination process regarding the implicit representation of node attention may be determined by the following steps:
step one, taking the node dimensionality reduction implicit representation as the initial representation of each graph convolution layer;
step two, aiming at the current graph convolution layer except the first graph convolution layer, executing the following steps:
(1) Inputting the node attention implicit representation output by the previous graph convolution layer and the sample entity relationship network into the graph attention layer of the current graph convolution layer, and determining the node attention implicit representation output by the graph attention layer;
(2) A node attention implicit representation of the current graph convolution layer output is determined based on the initial representation, the node attention implicit representation of the graph attention layer output, and training parameters of the current graph convolution layer.
Furthermore, the above determination process regarding the node-like implicit representation may be determined by the following steps:
step one, aiming at each entity node, constructing a maximized conditional probability function of the corresponding entity node; the maximized conditional probability function is determined by a first difference between the node-likeness implicit representation of the entity node and the node-attention implicit representation of the entity node, and a second difference between the node-likeness implicit representation of the entity node and node-likeness implicit representations of other entity nodes than the entity node among the plurality of entity nodes;
and step two, under the condition that the maximum conditional probability function is determined to reach the maximum function value, determining the node similarity implicit expression of each node.
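Reading the maximized conditional probability as minimizing a weighted sum of the two squared differences, a fixed-point refinement of the following form can be used; the weights alpha and beta, the similarity matrix, the number of iterations, and the closed form of the update are illustrative assumptions:

```python
import torch

def crf_refine(b, sim, alpha=0.9, beta=0.1, n_iter=5):
    """b:   (N, d) node attention implicit representations (unary term).
    sim: (N, N) pairwise node similarities g_ij (zero on the diagonal).
    Returns the node similarity implicit representations h."""
    h = b.clone()
    for _ in range(n_iter):
        # h_i <- (alpha * b_i + beta * sum_j g_ij h_j) / (alpha + beta * sum_j g_ij)
        num = alpha * b + beta * sim @ h
        den = alpha + beta * sim.sum(dim=1, keepdim=True)
        h = num / den
    return h
```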
The base learner takes the entity feature vectors as input and performs feature dimension reduction through the fully connected input layer. The preliminarily processed feature representation is then sent to the representation learning module, which runs graph convolution on the entity relationship network A^(k) for message passing, incorporating the topology information of the network, and uses a conditional random field to adjust the node representations so that local similarity features are preserved, thereby generating high-quality entity node embedded representations (i.e., the node similarity implicit representations).
These embedded representations are refined by the multi-layer perceptron of the fully connected output layer to obtain the final embedded representation, from which a likelihood prediction of whether an association exists between each node and each label is produced, so that the accuracy is higher.
To facilitate further understanding of the training process of the base learner, the following description is provided with reference to fig. 2 for the overall flow of the base learner.
Because the original feature vectors of the entity nodes are often high-dimensional sparse, the calculation pressure is increased sharply when the original feature vectors are directly sent into a graph convolution network without processing. For this purpose, the first step of the base learner is to perform dimensionality reduction on the original feature vectors of the entities through a fully connected input layer, so that the original feature vectors are mapped into a low-dimensional dense vector.
Specifically, the conversion function of the fully connected input layer in the embodiment of the present disclosure is defined as

h_i^{(k)[0]} = σ( W^{(k)[0]} x_i + b^{(k)[0]} ),

where x_i is the original feature vector of the i-th entity p_i, and h_i^{(k)[0]} denotes the node implicit representation after dimension-reduction conversion through the fully connected input layer of the k-th base learner. W^{(k)[0]} is a weight parameter matrix and b^{(k)[0]} is a bias parameter. σ(·) is the activation function; the leaky rectified linear unit LeakyReLU is chosen in the disclosed embodiment, defined as

LeakyReLU(x) = x if x ≥ 0, and ρx if x < 0,

where ρ ∈ (0, 1) is a preset constant representing the slope of the third-quadrant ray of the activation function.
As shown in FIG. 2, the representation learning module, as the core component of the base learner, can be divided into two sub-modules: a graph convolution module and a conditional random field module. First, in order to incorporate the topology information of the entity relationship network into the embedded representation of the nodes, the graph convolution network technique is used in the embodiment of the present disclosure. The graph convolution network takes the entity node embedded representation obtained from the previous layer as input and performs a message passing operation on the entity relationship network to generate a preliminary implicit representation. Then, in order to retain the similarity information in the direct neighborhood, the preliminary implicit representation is post-processed through a conditional random field to obtain a new node embedded representation. Such a representation learning module can be stacked over multiple layers (the total number of layers is denoted as L), with the output of the previous layer serving as the input of the next layer, so that the low-order and high-order topology in the network can be fully captured.
The graph convolution module in the embodiment of the present disclosure is mainly divided into four parts: graph convolution, batch normalization (BN), activation function, and random dropout (Dropout).
The graph convolution layer operates on the entity relationship network; it receives the embedded representation of the entity nodes obtained in the previous step as input, introduces the topological structure information of the entity relationship network through a conversion function, and outputs a new embedded representation of the entity nodes.
The graph convolution layer here contains two operations: first, message passing is performed on the entity relationship network through a graph attention layer (GAT); second, the result generated by the graph attention layer is reprocessed through the initial representation and the identity mapping to retain the low-order signals in the node representation. The specific implementation of the node attention implicit representation output by the current graph convolution layer can be found in FIG. 3.
First, unlike the native graph convolution operation, which assigns the same weight to all adjacent nodes and simply sums them, the embodiment of the present disclosure adopts a dynamic attention mechanism (dynamic attention) to adaptively learn the weights of the adjacent nodes, so that different neighbors contribute to the central node's information to different degrees. This relieves the interference of noise on model learning and enhances the robustness of the model. Taking entity node p_i as an example, the transfer function of the l-th layer is defined as:

h̃_i^{(k)[l]} = Σ_{j ∈ N_i} α_{i,j} Θ^{(k)[l]} h_j^{(k)[l−1]},

where the attention coefficient α_{i,j} is calculated as:

α_{i,j} = exp( a^{(k)[l]T} LeakyReLU( Θ^{(k)[l]} [ h_i^{(k)[l−1]} ‖ h_j^{(k)[l−1]} ] ) ) / Σ_{j' ∈ N_i} exp( a^{(k)[l]T} LeakyReLU( Θ^{(k)[l]} [ h_i^{(k)[l−1]} ‖ h_{j'}^{(k)[l−1]} ] ) ).

In the formula, h_i^{(k)[l−1]} is the embedded representation of entity p_i generated by the (l−1)-th representation learning module, and h_i^{(k)[0]} is the output representation obtained from the dimension reduction of the fully connected input layer. N_i denotes the set of adjacent nodes related to node p_i in the entity relationship network (not including p_i itself). Θ^{(k)[l]} is the weight parameter matrix to be learned of the l-th graph attention layer. a^{(k)[l]} is the self-attention parameter vector to be learned, used to compute the attention coefficients. The symbol ‖ denotes the vector concatenation operation.
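A sketch of one such dynamic-attention message-passing step (single attention head, dense adjacency mask); the GATv2-style placement of LeakyReLU inside the score, the negative slope 0.2, and the handling of isolated nodes are assumptions:

```python
import torch
import torch.nn as nn

class DynamicGraphAttention(nn.Module):
    """Single-head dynamic attention: score each neighbour pair, softmax over
    the neighbourhood, then aggregate the transformed neighbour messages."""
    def __init__(self, dim):
        super().__init__()
        self.theta = nn.Linear(dim, dim, bias=False)   # Theta^{(k)[l]}
        self.att = nn.Parameter(torch.randn(2 * dim))  # a^{(k)[l]}
        self.act = nn.LeakyReLU(0.2)

    def forward(self, h, adj):
        # h: (N, dim) node representations, adj: (N, N) 0/1 adjacency matrix.
        z = self.theta(h)
        n = h.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = self.act(pairs) @ self.att                 # (N, N) raw attention scores
        e = e.masked_fill(adj == 0, float('-inf'))     # keep only neighbours
        alpha = torch.softmax(e, dim=1)                # attention coefficients alpha_ij
        alpha = torch.nan_to_num(alpha)                # rows with no neighbours -> 0
        return alpha @ z                               # aggregated messages
```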
A graph convolution layer that is stacked too deep can produce an over-smoothing phenomenon, which compromises the resulting prediction accuracy. For this reason, the initial representation of the nodes and the identity mapping technique are introduced after the graph attention layer in the embodiment of the present disclosure, so that the lower-layer network signals are fully preserved. Specifically, taking entity node p_i as an example, the transfer function of the l-th layer is defined as:
h_i^{(k)[l]} = ( (1 − α^{(k)[l]}) h̃_i^{(k)[l]} + α^{(k)[l]} h_i^{(k)[0]} ) · ( (1 − β^{(k)[l]}) I + β^{(k)[l]} W^{(k)[l]} ),
where h_i^{(k)[0]}, the output representation obtained by dimension reduction through the fully connected input layer, serves as the initial representation of the node, I is the identity mapping matrix, and W^{(k)[l]} denotes the weight matrix (training parameters) of the current graph convolution layer. The hyperparameter α^{(k)[l]} controls the strength of the initial residual connection (corresponding to the first weighted summation operation), and β^{(k)[l]} controls the strength of the identity mapping (corresponding to the second weighted summation operation). In general, β^{(k)[l]} is calculated by the formula

β^{(k)[l]} = log( θ / l + 1 ),

where θ is a hyperparameter and l denotes the index of the layer at which the current model is located. The formula shows that as l increases, β^{(k)[l]} gradually decreases and the role of the identity mapping in the convolution becomes more prominent. Thus, the deeper the layer, the less new high-order topology information is introduced and the more low-order topology information is retained.
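The combination of the initial residual and the identity mapping described above can be sketched as follows; the GCNII-style product form and the variable names follow the reconstruction of the transfer function given above and are assumptions made for illustration.

    import numpy as np

    def residual_identity_update(H_att, H0, W, alpha, theta, layer):
        """Combine the graph attention output with the initial representation and identity mapping.

        H_att : (n, d) output of the graph attention step at this layer
        H0    : (n, d) initial representation (output of the fully connected input layer)
        W     : (d, d) learnable weight matrix of this graph convolution layer
        alpha : strength of the initial residual connection (first weighted summation)
        theta : hyperparameter controlling the identity-mapping schedule
        layer : 1-based index l of the current layer
        """
        beta = np.log(theta / layer + 1.0)                        # beta decays as the depth l grows
        mixed = (1.0 - alpha) * H_att + alpha * H0                # first weighted summation
        transform = (1.0 - beta) * np.eye(W.shape[0]) + beta * W  # second weighted summation
        return mixed @ transform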
In a deep neural network, the distribution of the inputs to each layer changes during training as the parameters of the previous layers change; this phenomenon is called internal covariate shift. For the hidden-layer outputs of the neural network, after the various transformation operations within a layer, their distribution differs from that of the input signal, and the difference is aggravated as the depth of the network increases. This causes the convergence of deep networks to become slower and makes them harder to train. In the embodiment of the present disclosure, batch normalization is used to normalize the input signal in each round of stochastic gradient descent during training, so that each dimension of the output signal has mean 0 and variance 1. This alleviates the internal covariate shift phenomenon, mitigates the gradient vanishing problem to a certain extent, and makes the training of deep networks faster and more stable.
The batch normalization includes four steps. Given a batch of input signals {h_1, h_2, …, h_B}, i.e., the outputs of the aforementioned graph convolution layer, their empirical mean and variance are respectively

μ_B = (1/B) Σ_{c=1}^{B} h_c   and   σ_B² = (1/B) Σ_{c=1}^{B} (h_c − μ_B)².

The input signal is then normalized by recentering and rescaling:

ĥ_c = (h_c − μ_B) / sqrt(σ_B² + ε),

where ε is an arbitrarily small constant used to ensure numerical stability. Finally, a scale transformation and an offset are applied to the normalized result to obtain the final batch-normalized output:

BN(h_c) = γ^{(k)[l]} ĥ_c + δ^{(k)[l]},

where γ^{(k)[l]} and δ^{(k)[l]} are both parameters to be learned.
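The four batch-normalization steps can be written compactly as in the following sketch (training-time statistics only, with assumed variable names):

    import numpy as np

    def batch_norm(H, gamma, delta, eps=1e-5):
        """H: (batch, d) outputs of the graph convolution layer for one mini-batch."""
        mu = H.mean(axis=0)                    # empirical mean per dimension
        var = H.var(axis=0)                    # empirical variance per dimension
        H_hat = (H - mu) / np.sqrt(var + eps)  # recentre and rescale
        return gamma * H_hat + delta           # learnable scale and offset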
The third step of the graph convolution module applies a non-linear activation function to the mapped results of the neurons. The LeakyReLU function may be used in the embodiment of the present disclosure, and the output result is then

h_c ← LeakyReLU( BN(h_c) ).
in addition, random inactivation is used as a common regularization method, and the overfitting problem of the model can be effectively relieved. The core idea of the method is that in the training process, the neuron stops working with a given probability and does not participate in the following forward propagation and backward propagation, so that the generalization capability of the model is improved.
In the training phase of the base learner, the random deactivation is formulated as

h_c ← r^{(k)[l]} ⊙ h_c,

where r^{(k)[l]} represents a mask vector in which each element is a random variable following a Bernoulli distribution with probability p, i.e., each neuron is preserved with probability p, and the symbol ⊙ represents element-wise multiplication of vectors.

In the model testing phase, random deactivation is not used and all neurons are active; the activations are instead scaled by the retention probability, i.e.,

h_c ← p · h_c.
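A minimal sketch of the training/testing asymmetry of the random deactivation described above, assuming classic (non-inverted) dropout:

    import numpy as np

    def dropout(H, keep_prob, training, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        if training:
            # each neuron is kept with probability keep_prob (Bernoulli mask r)
            r = rng.binomial(1, keep_prob, size=H.shape)
            return r * H
        # at test time all neurons are active; activations are scaled by keep_prob
        return keep_prob * H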
after performing a series of operations of the graph convolution module, feature learning of similar nodes may be performed using the conditional random field module. This is mainly considered that although graph convolution operation can encode connectivity information into the implicit representation of the node, with the increase of the number of nested layers, the similarity characteristics between nodes in the first-order neighborhood will be diluted, so that the finally obtained implicit representation of the node cannot accurately represent the actual context relationship of the node in the network. To this end, conditional random fields are used in the disclosed embodiments to preserve similarity relationships in the implicit representation of nodes.
Formally, given the node implicit representation produced by the graph convolution module (denoted h_i below for brevity), the conditional random field model intends, by maximizing the conditional probability

P(ĥ_i | h_i) = exp( −E(ĥ_i | h_i) ) / Z(h_i),

to predict a node representation ĥ_i that contains similarity information.
Here Z(·) denotes the partition function serving as the normalization factor, and E(·) denotes the energy function. The energy function contains two parts: a unary energy function and a binary (pairwise) energy function. In the embodiment of the present disclosure, the energy function of node p_i is defined as

E(ĥ_i | h_i) = ω^{(k)[l]} ‖ĥ_i − h_i‖² + ψ^{(k)[l]} Σ_{j≠i} g_{i,j} ‖ĥ_i − ĥ_j‖²,

where ω^{(k)[l]} and ψ^{(k)[l]} > 0 are hyperparameters (corresponding to a third weighted summation operation) for balancing the two parts of the energy function; ĥ_i − h_i represents the first difference, ĥ_i − ĥ_j represents the second difference, and g_{i,j} represents the similarity between nodes p_i and p_j (i.e., the node similarity). Obviously, this energy function on the one hand ensures that the difference between the node representation generated by the conditional random field and the original input node representation remains limited, and on the other hand makes the node representation generated by the conditional random field retain the similarity characteristics of the graph.
In the embodiment of the present disclosure, the conditional random field is solved by inference through the mean-field approximation method. The solving process comprises T^{(k)[l]} rounds of iteration, where the t-th round (1 ≤ t ≤ T^{(k)[l]}) is calculated as

ĥ_i^{(t)} = ( ω^{(k)[l]} h_i + ψ^{(k)[l]} Σ_{j≠i} g_{i,j} ĥ_j^{(t−1)} ) / ( ω^{(k)[l]} + ψ^{(k)[l]} Σ_{j≠i} g_{i,j} ),

where the similarity g_{i,j} between nodes p_i and p_j can be calculated by a Gaussian function:

g_{i,j} = exp( −‖h_i − h_j‖² / (2 (σ^{(k)[l]})²) ),

in which σ^{(k)[l]} is a parameter to be learned. Through repeated iteration, the implicit representation of node p_i finally obtained is

ĥ_i = ĥ_i^{(T^{(k)[l]})}.
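The mean-field refinement above can be sketched as follows; the closed-form update used here is written to match the formulas as reconstructed above and is an assumption rather than a verbatim implementation of the embodiment.

    import numpy as np

    def crf_mean_field(H_in, sigma, omega, psi, n_iters):
        """Mean-field refinement that keeps similar nodes close in the refined representation.

        H_in    : (n, d) node representations produced by the graph convolution module
        sigma   : learnable bandwidth of the Gaussian similarity
        omega   : hyperparameter weighting the unary energy term
        psi     : hyperparameter weighting the pairwise (similarity) energy term
        n_iters : number of mean-field iterations T
        """
        # pairwise Gaussian similarity g_{i,j}; the diagonal is excluded
        sq_dist = ((H_in[:, None, :] - H_in[None, :, :]) ** 2).sum(axis=-1)
        G = np.exp(-sq_dist / (2.0 * sigma ** 2))
        np.fill_diagonal(G, 0.0)

        H_hat = H_in.copy()
        for _ in range(n_iters):
            num = omega * H_in + psi * (G @ H_hat)             # unary pull + similarity-weighted neighbours
            den = omega + psi * G.sum(axis=1, keepdims=True)   # per-node normalisation
            H_hat = num / den
        return H_hat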
after repeated iteration of the L-layer representation learning module, the node p which fully concentrates the network topology structure and similarity information is obtained i Is implicitly represented by
Figure BDA0003850489310000255
Using this representation, node p can be finally generated by a fully connected output layer i Multi-label prediction results of (2):
Figure BDA0003850489310000256
wherein φ (-) is a sigmoid function, i.e.
Figure BDA0003850489310000257
e is the base of the natural logarithm. W (k)[L+1] Is a weight matrix of the parameters to be learned.
Figure BDA0003850489310000258
The j (th) element of (1)
Figure BDA0003850489310000259
Representation node p i And a label t j The predicted likelihood score.
On an actual multi-label classification dataset, many labels are used to annotate only a small number of samples, and with a traditional cross-entropy loss function the final model's prediction effect on such low-frequency labels is not ideal. In order to alleviate the imbalance between positive and negative samples, in the embodiment of the present disclosure, the training parameters may be adjusted in combination with a first weight parameter for adjusting the number of positive and negative samples and a second weight parameter for adjusting the contribution degree of hard-to-distinguish samples.
In practical applications, a Focal loss function may be used. By balancing the weight between positive and negative samples, this loss function dynamically adjusts the contribution of easy-to-classify and hard-to-classify samples to the loss, so that the model training process pays more attention to the hard-to-classify samples, thereby obtaining a better training effect.
Specifically, the objective function is defined as

J^{(k)} = − Σ_i Σ_j [ α (1 − ŷ_{i,j}^{(k)})^γ y_{i,j} log ŷ_{i,j}^{(k)} + (1 − α) (ŷ_{i,j}^{(k)})^γ (1 − y_{i,j}) log(1 − ŷ_{i,j}^{(k)}) ],

where y_{i,j} represents the true label value on the training data set (corresponding to the multi-label labeling result) and ŷ_{i,j}^{(k)} is the value predicted by the base learner (corresponding to the prediction result). The parameter α (corresponding to the first weight parameter) adjusts the weight between positive and negative samples, and the parameter γ (corresponding to the second weight parameter) adjusts the weight of hard-to-distinguish samples.
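A sketch of this objective, assuming the standard alpha-balanced binary focal loss form shown above:

    import numpy as np

    def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
        """y_true, y_pred: (n_entities, n_labels) label matrix and sigmoid scores."""
        y_pred = np.clip(y_pred, eps, 1.0 - eps)
        pos = alpha * (1.0 - y_pred) ** gamma * y_true * np.log(y_pred)
        neg = (1.0 - alpha) * y_pred ** gamma * (1.0 - y_true) * np.log(1.0 - y_pred)
        return -(pos + neg).sum()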
It can be seen that the embodiment of the present disclosure proposes to train the base learner with the Focal loss function. By giving larger weights to hard-to-distinguish samples, the influence of such samples in the loss function is enlarged, prompting the loss function to focus on the hard-to-distinguish samples. This effectively alleviates the adverse effect that the high sparsity of positive samples and the extreme label imbalance have on model training, solves the problem of existing methods (such as comparison technique 1) in which the influence of positive samples is too small and the decision interface is therefore biased toward negative samples, resulting in poor prediction accuracy, and in particular achieves better prediction performance on low-frequency labels.
In addition, in the embodiment of the present disclosure, an Adam optimizer may be used to optimize the objective function. Adam is a stochastic gradient descent algorithm based on adaptive first-order and second-order momentum estimation; owing to its efficient computation and very low memory consumption, it is well suited to the optimization of large-scale neural networks. Parameters such as the learning rate and the exponential decay rates used for the first-order and second-order momentum estimates are set in advance; the objective function value is then continuously reduced by gradient descent, and the model parameters are updated through back-propagation until the objective function value converges, at which point training terminates.
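For reference, a single Adam update step on one parameter tensor looks as follows; the hyperparameter defaults shown are the usual ones and are not values prescribed by the embodiment.

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        m = beta1 * m + (1 - beta1) * grad           # first-order momentum estimate
        v = beta2 * v + (1 - beta2) * grad ** 2      # second-order momentum estimate
        m_hat = m / (1 - beta1 ** t)                 # bias correction
        v_hat = v / (1 - beta2 ** t)
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
        return param, m, v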
Here, in the case that each base learner has been used to obtain a corresponding multi-label prediction result, the final label prediction result may be determined in combination with a ranking learning method. That is, in the embodiment of the present disclosure, a ranking learning algorithm may be used as a meta-learner to integrate the preliminary prediction results respectively obtained by the base learners on each view, so as to improve the generalization performance of the prediction. This may be realized by the following steps:
step one, aiming at a target candidate label in a plurality of candidate labels, determining a label feature vector of each entity node aiming at the target candidate label based on a multi-label prediction result of each entity node in each entity relationship network;
and step two, sequencing and learning the label characteristic vectors of the plurality of entity nodes aiming at the target candidate labels based on the trained meta-learner, and determining a final multi-label prediction result corresponding to each entity node.
Similar to the target entity relationship network, the target candidate label may be one candidate label of the plurality of candidate labels, a part of the candidate labels, or each candidate label. In this way, a label feature vector for each candidate label may be determined for each entity node, that is, each entity node–label pair may correspond to one label feature vector, and the vector values correspond to the prediction scores of the base learners.
The determined label feature vector of each entity node–label pair is taken as the input of the meta-learner, the ranked prediction values of the plurality of candidate labels corresponding to each entity node are determined, and the multi-label prediction result corresponding to each entity node is then determined by screening with a preset threshold.
The following may focus on the training process of the meta-learner, including the following steps:
the method comprises the following steps of firstly, obtaining a sample training set comprising a plurality of sample characteristic vectors, wherein each dimension value of each sample characteristic vector points to an entity node label pair;
traversing each sample feature vector in the sample training set, and determining the lambda-gradient of each sample feature vector;
thirdly, constructing a regression tree based on the multiple sample feature vectors and the lambda-gradient of each sample feature vector;
and step four, updating the ranking score of each entity node in the meta-learner to be trained based on the corresponding relation between each leaf node and each entity node of the regression tree to obtain the trained meta-learner.
Here, the sample feature vectors of the meta-learner may first be constructed from the prediction scores of the base learners. Denote by f^{(k)} the base learner trained on the entity relationship network A^{(k)}; then the likelihood score with which it predicts that entity p_i is labeled with t_j is ŷ_{i,j}^{(k)}. For each label t_j, the prediction scores obtained by the v base learners are spliced into a score string that serves as the input feature vector of the ranking learning model (i.e., the meta-learner). Specifically, for the pair formed by entity p_i and label t_j, its feature vector is constructed as

x′_{i,j} = ( ŷ_{i,j}^{(1)}, ŷ_{i,j}^{(2)}, …, ŷ_{i,j}^{(v)} ),

where each score ŷ_{i,j}^{(k)} is the preliminary prediction score given by the base learner on the k-th view.
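Assuming the per-view base-learner scores are available as (m × n) matrices, the construction of the per-pair feature vectors can be sketched as follows; the helper name and the meta_score callable mentioned in the usage comment are hypothetical.

    import numpy as np

    def build_pair_features(score_matrices):
        """score_matrices: list of v arrays of shape (m_entities, n_labels), one per view.

        Returns an (m_entities, n_labels, v) tensor whose (i, j) slice is the
        feature vector x'_{i,j} = (score of label t_j for entity p_i under each view).
        """
        return np.stack(score_matrices, axis=-1)

    # Example usage (meta_score is a hypothetical trained meta-learner callable):
    # features = build_pair_features([scores_view1, scores_view2, scores_view3])
    # ranked = meta_score(features[i])              # ranked scores for entity p_i over all labels
    # predicted_labels = np.nonzero(ranked > 0.5)[0]  # screening with a preset threshold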
The final prediction result can then be given by the ranking learning model. Learning to Rank (LTR) is a machine learning technique widely used in search engines and recommendation systems; it is a supervised learning algorithm that, given a query keyword, ranks a candidate document set so that relevant documents are ranked higher.
In practical applications, LambdaMART may be used for the ranking learning. The core of this ranking learning method is to convert the ranking problem into a pairwise regression problem: given a query entity p_i and a pair of labels t_j and t_k, LambdaMART predicts a likelihood score that t_j is more relevant than t_k, and a higher value indicates that, compared with t_k, p_i is more likely to be labeled with t_j.
The LambdaMART algorithm integrates the advantages of a Multiple Additive Regression Tree (MART) and a neural network-based pairwise-ranking learning model LambdaRank.
Specifically, based on the idea of MART, LambdaMART is essentially a gradient boosted decision tree (GBDT) whose final output is a linear combination of the outputs of a set of regression trees. Moreover, LambdaMART replaces the gradient computation part of the gradient boosting tree with the λ-gradient from LambdaRank, thereby making MART suitable for the ranking task.
The flow of the LambdaMART training algorithm is as follows. For simplicity, denote the feature vector corresponding to the pair formed by entity p_i and label t_j as x_c = x′_{i,j} and the corresponding label as y_c = y_{i,j}, and denote the size of the training data set of the meta-learner as N, so that the training set is {(x_c, y_c)}, c = 1, …, N.
At initialization, the predicted values of all samples are set to 0. After initialization is completed, the ranking learning method generates T trees in an iterative manner, so as to continuously fit the residuals and improve the final prediction performance. Each round of iteration contains four parts.
The first part traverses the training set and computes the λ-gradient and its derivative for each sample. Specifically, for a pair of samples x_i and x_j in which x_i should be ranked above x_j, define

λ_{ij} = |ΔNDCG_{ij}| / (1 + e^{s_i − s_j}),

where s_i = F_{t−1}(x_i) and s_j = F_{t−1}(x_j) denote the predictions currently given by the model for samples x_i and x_j. |ΔNDCG_{ij}| evaluates the change in the Normalized Discounted Cumulative Gain (NDCG) metric after the ranking positions of x_i and x_j are exchanged. Let the ranking positions of x_i and x_j in the prediction result be r_i and r_j respectively; then

|ΔNDCG_{ij}| = (1/IDCG) · |2^{y_i} − 2^{y_j}| · |1/log₂(1 + r_i) − 1/log₂(1 + r_j)|,
where IDCG (ideal DCG) represents the DCG value under a perfect ordering, i.e., the DCG value calculated by arranging the samples with all positive samples first and all negative samples afterwards:

IDCG = Σ_c (2^{y_c} − 1) / log₂(1 + r_c*),

in which r_c* represents the ranking position of x_c under the perfect ordering.
Then, for sample x_i, the λ_{ij} values with respect to all other samples are accumulated to obtain

λ_i = Σ_{j:{i,j}∈I} λ_{ij} − Σ_{j:{j,i}∈I} λ_{ij},

where I is the set of {i, j} index pairs in which sample x_i should be ranked above sample x_j (each pair is included once). For convenience, write

ρ_{ij} = 1 / (1 + e^{s_i − s_j}),

so that λ_{ij} = |ΔNDCG_{ij}| · ρ_{ij}. Differentiating λ_i with respect to s_i then gives

w_i = ∂λ_i/∂s_i = Σ_j |ΔNDCG_{ij}| · ρ_{ij} (1 − ρ_{ij}).
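A sketch of the λ-gradient computation for one query (one entity), following the standard LambdaMART formulation; the pair convention (only pairs whose labels differ contribute) and the binary gain are assumptions consistent with the formulas above.

    import numpy as np

    def lambda_gradients(scores, labels):
        """scores: (n,) current model predictions; labels: (n,) binary relevance labels.

        Returns (lam, w): per-sample lambda-gradients and their derivatives.
        """
        n = len(scores)
        order = np.argsort(-scores)
        rank = np.empty(n, dtype=int)
        rank[order] = np.arange(1, n + 1)            # 1-based rank positions r_i
        ideal = np.sort(labels)[::-1]
        idcg = np.sum((2.0 ** ideal - 1) / np.log2(np.arange(2, n + 2)))
        lam, w = np.zeros(n), np.zeros(n)
        if idcg == 0:
            return lam, w                            # no positive sample in this query
        for i in range(n):
            for j in range(n):
                if labels[i] <= labels[j]:
                    continue                         # only pairs where i is more relevant than j
                rho = 1.0 / (1.0 + np.exp(scores[i] - scores[j]))
                delta = abs((2.0 ** labels[i] - 2.0 ** labels[j]) *
                            (1.0 / np.log2(1 + rank[i]) - 1.0 / np.log2(1 + rank[j]))) / idcg
                lam[i] += delta * rho                # push the more relevant sample up
                lam[j] -= delta * rho                # and the less relevant sample down
                w[i] += delta * rho * (1.0 - rho)
                w[j] += delta * rho * (1.0 - rho)
        return lam, w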
The second part, building a regression tree based on the feature vectors of the samples, fits the lambda-gradient. The heuristic node splitting criterion chosen here is the Minimum Square Error (MSE).
Specifically, for the data set {(x_c, z_c)}, c = 1, …, N, where the feature vector x_c = (x_{c,1}, x_{c,2}, …, x_{c,v}), x_{c,j} represents the j-th dimension feature value and z_c is the λ-gradient λ_c of sample x_c, all threshold values τ on all features j (1 ≤ j ≤ v) can be enumerated exhaustively to find the feature dimension j* and the threshold τ* on it that minimize the sum of squared errors:

(j*, τ*) = argmin_{j,τ} [ Σ_{c: x_{c,j} ≤ τ} (z_c − μ_L)² + Σ_{c: x_{c,j} > τ} (z_c − μ_R)² ],

where all samples satisfying x_{c,j*} ≤ τ* fall into the left subtree L, all samples satisfying x_{c,j*} > τ* fall into the right subtree R, and μ_L and μ_R represent the means of the target values of all samples falling into the left and right subtrees, respectively. In this way, a simplest binary regression tree with one root node and two leaf nodes is obtained. The leaf nodes are then split further according to their optimal thresholds; repeating the splitting L−1 times yields a regression tree containing L leaf nodes.
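The exhaustive split search can be sketched as follows (a plain O(N²·v) scan, purely for illustration):

    import numpy as np

    def best_split(X, z):
        """Exhaustively search the (feature, threshold) pair that minimises the squared error.

        X : (N, v) feature vectors (one per entity-label pair)
        z : (N,)   lambda-gradients to be fitted by the regression tree
        """
        best = (None, None, np.inf)
        for j in range(X.shape[1]):
            for tau in np.unique(X[:, j]):
                left, right = z[X[:, j] <= tau], z[X[:, j] > tau]
                if left.size == 0 or right.size == 0:
                    continue
                err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
                if err < best[2]:
                    best = (j, tau, err)
        return best[0], best[1]   # feature index j* and threshold tau*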
The third part computes the output value of each leaf node. For the l-th leaf node on the t-th tree, let R_{lt} represent the set of all samples falling into this leaf node; the value on it is calculated by a Newton step as

γ_{lt} = Σ_{x_c∈R_{lt}} λ_c / Σ_{x_c∈R_{lt}} w_c.
the last part is to add the regression tree learned in the current round to the existing sequencing learning model, and update the score of each sample:
Figure BDA0003850489310000311
wherein [ [ x ] c ∈R lt ]]Indicating sampleThis x c Whether or not it falls on the leaf node R lt If so, the value is 1, otherwise, the value is 0. The parameter η is the set learning rate. Practice shows that the generalization performance of the model is greatly improved by configuring a smaller learning rate (generally, σ < 0.1) compared with the learning rate not set (i.e., η = 1). This regularization is referred to as "attenuation" (shrinkage).
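The Newton-step leaf values and the shrinkage-scaled score update can be sketched together as follows, consistent with the λ-gradients and derivatives computed earlier; the function signature is an assumption.

    import numpy as np

    def update_scores(F, leaf_of, lam, w, eta=0.1, eps=1e-12):
        """One boosting step: compute each leaf's Newton value and update the sample scores.

        F       : (N,) current model scores F_{t-1}(x_c)
        leaf_of : (N,) index of the leaf node each sample falls into on the new tree
        lam, w  : (N,) lambda-gradients and their derivatives for this round
        eta     : learning rate (shrinkage)
        """
        F = F.copy()
        for leaf in np.unique(leaf_of):
            mask = leaf_of == leaf
            gamma = lam[mask].sum() / (w[mask].sum() + eps)   # Newton step on this leaf
            F[mask] += eta * gamma                            # shrinkage-scaled additive update
        return F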
By this time, we have completed a round of iteration to construct a regression tree.
In the embodiment of the present disclosure, the final ranking learning model

F_T(x) = Σ_{t=1}^{T} η Σ_l γ_{lt} [[ x ∈ R_{lt} ]]

is obtained by repeatedly completing the above four steps, where T is the configured number of iterations.
Based on the above description, it can be seen that the problem to be solved by the embodiment of the present disclosure is to perform multi-label classification prediction on unlabeled entities, given a plurality of entity relationship networks and the known labels of part of the entities. The prediction algorithm provided by the embodiment of the present disclosure is divided into two stages based on the Stacking strategy in ensemble learning: low-order and high-order topology information in the entity relationship networks is fully mined and captured by base learners whose main framework is a deep graph neural network, and a ranking learning algorithm is used as a meta-learner to integrate the preliminary prediction results respectively obtained by the base learners on each view, thereby improving the generalization performance of the model. To facilitate further understanding of the logic of these two stages, a detailed description may be given in conjunction with FIG. 4.
Formally, let the label set be {t_1, t_2, …, t_n}, where n is the total number of labels. In the set of m entities {p_1, p_2, …, p_m}, the first l entities are labeled, and the labeling of the i-th entity p_i is denoted y_i = (y_{i,1}, y_{i,2}, …, y_{i,n}), where y_{i,j} ∈ {0, 1} denotes whether p_i is tagged with label t_j: the value is 1 if it is tagged and 0 otherwise. Each entity p_i (i = 1, …, m) further has a feature vector x_i. In addition, there are v different relationship networks A^{(1)}, A^{(2)}, …, A^{(v)} among these entities, which describe the relationships between the entities from different perspectives; the (i, j)-th element A^{(k)}_{i,j} of the adjacency matrix of the interaction relationship network on the k-th view indicates whether entities p_i and p_j are related: the value is 1 if the relation exists and 0 otherwise.
Moreover, the entity relationship networks involved in the embodiment of the present disclosure are undirected, unweighted graphs and satisfy A^{(k)}_{i,j} = A^{(k)}_{j,i}.
The problem to be solved by this scheme is to use the known labels y_1, …, y_l and the multi-view entity relationship networks A^{(1)}, …, A^{(v)} to predict the labels y_{l+1}, …, y_m of the remaining u = m − l unlabeled entities.
A data set containing all m entities is given, in which the l labeled entities form the labeled entity set and the remaining u entities form the unlabeled entity set. In the process of model training, the labeled entity set is first divided into two parts: one part is used for training the base learners, and the other part is used for training the meta-learner. The unlabeled entity set is the set of entities whose labels are to be predicted.
As can be appreciated from fig. 4, unlike most current algorithms (e.g., compare techniques 1 and 2) that extract information from only a single relationship network, embodiments of the present disclosure employ the concept of "stacking" in an ensemble learning technique to integrate preliminary information captured from relationship networks between various entities through a ranking learning model. The primary prediction results of all the base learners are integrated through the meta-learner, valuable view information can be screened out through the model in a targeted manner, more excellent generalization performance than that of a single learner is obtained, and a final prediction result which contains more comprehensive information and has performance superior to that of the base learner is obtained. This is mainly considered that in a real-world scenario such as social media, users do not exist independently, but have a certain social relationship, and can be modeled as a social network. Since users have a variety of behaviors, there is often more than one social network among users. For example, the behaviors of attention, sharing, praise, comment, mention and the like among users can construct a corresponding relationship network. Each network reflects the closeness of social relationships among users from one side, and modeling from one relationship network alone causes a bias in portrayal.
In addition, the integrated learning framework is adopted to respectively train the base classifiers for the multiple views, and then the multiple views are integrated through the sequencing learning algorithm, so that the purposes of weak independent learning and strong integrated learning of the views are achieved, and the final prediction performance of the model is superior to that of any single-view base learner.
The entity classification method provided by the embodiment of the disclosure can be applied to various application fields. Taking a social media application as an example, an operator generally constructs a set of tag systems for users to describe the users. For example, the user tags may include loyalty, activity, topic preferences, consumption preferences, and the like.
Thus, the application scene of multi-label classification of one user is built. In addition, since there are many behaviors of users, there is often not only one kind of social network among users. For example, the behaviors of attention, sharing, praise, comment, mention and the like among users can construct a corresponding relationship network. This, in turn, builds up a multi-view inter-user social relationship network. The task of label prediction for users in social media can then be modeled as a multi-view entity multi-label classification problem as proposed by embodiments of the present disclosure.
Firstly, constructing a feature vector characterizing a user according to fact information such as basic attributes, posting information, historical browsing information, collection information and the like of the user. And constructing a corresponding social relationship network according to the behaviors of attention, sharing, praise, comment, mention and the like among the users. In addition, part of the users are selected to carry out multi-label labeling in a manual labeling mode, and therefore a standard data set is constructed.
And secondly, inputting the constructed social relationship network of the user, the characteristic vector of the user and the manual label into a base learner based on the graph convolution network provided by the embodiment of the disclosure, and respectively obtaining a preliminary label prediction result of the user which is not labeled under each view.
And finally, the preliminary prediction scores generated by the base learners in the previous step are spliced and input into the meta-learner based on the sequencing learning algorithm provided by the embodiment of the disclosure to obtain the final labeling prediction result of the user.
In the description of the present specification, reference to a description of the term "some possible embodiments," "some embodiments," "examples," "specific examples," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the various embodiments or examples and features of the various embodiments or examples described in this specification can be combined and combined by those skilled in the art without contradiction.
With regard to the method flow diagrams of the disclosed embodiments, certain operations are described as different steps performed in a certain order. Such flow diagrams are illustrative and not restrictive. Certain steps described herein may be grouped together and performed in a single operation, certain steps may be separated into sub-steps, and certain steps may be performed in an order different than presented herein. The various steps shown in the flowcharts may be implemented in any way by any circuit structure and/or tangible mechanism (e.g., by software running on a computer device, hardware (e.g., logical functions implemented by a processor or chip), etc., and/or any combination thereof).
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, an entity classification device corresponding to the entity classification method is also provided in the embodiments of the present disclosure, and as the principle of solving the problem of the device in the embodiments of the present disclosure is similar to the entity classification method described above in the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and the repeated parts are not described again.
Referring to fig. 5, a schematic diagram of an entity classification apparatus provided in an embodiment of the present disclosure is shown, the apparatus including: an acquisition module 501, a prediction module 502 and a classification module 503; wherein,
an obtaining module 501, configured to obtain multiple entity relationship networks corresponding to multiple entity nodes; the same entity node corresponds to different entity relations in different entity relation networks;
the prediction module 502 is configured to perform multi-label prediction on multiple entity relationship networks based on the trained multiple base learners to obtain a multi-label prediction result of each entity node in each entity relationship network;
a classifying module 503, configured to determine a final multi-label prediction result corresponding to each entity node based on the multi-label prediction result of each entity node in each entity relationship network.
By adopting the entity classification device, under the condition that a plurality of entity relationship networks corresponding to a plurality of entity nodes are obtained, multi-label prediction can be carried out on the plurality of entity relationship networks based on a plurality of trained base learners to obtain a multi-label prediction result of each entity node in each entity relationship network, and then a final multi-label prediction result corresponding to each entity node is determined based on the multi-label prediction result of each entity node in each entity relationship network. That is, the multi-label prediction method based on the multi-entity relationship network based on the multi-view learning can more fully mine the relationship between the entities, so that the multi-label prediction result for the entity node prediction is more accurate.
In one possible embodiment, where the multi-label prediction result includes prediction scores for a plurality of candidate labels; a classifying module 503, configured to determine a final multi-label prediction result corresponding to each entity node based on the multi-label prediction result of each entity node in each entity relationship network according to the following steps:
for a target candidate label in the plurality of candidate labels, determining a label feature vector of each entity node for the target candidate label based on a multi-label prediction result of each entity node in each entity relationship network;
and performing sequencing learning on the label feature vectors of the plurality of entity nodes aiming at the target candidate labels based on the trained meta-learner, and determining a final multi-label prediction result corresponding to each entity node.
In one possible implementation, the classification module 503 is configured to determine a label feature vector of each entity node for the target candidate label based on a multi-label prediction result of each entity node in each entity relationship network according to the following steps:
aiming at a target entity relationship network in a plurality of entity relationship networks, selecting a target prediction score matched with a target candidate label from the prediction scores of entity nodes aiming at a plurality of candidate labels in the target entity relationship network;
and combining the target prediction scores respectively selected from the entity relationship networks to obtain the label feature vectors of the entity nodes aiming at the target candidate labels.
In a possible implementation manner, the classification module 503 is configured to perform rank learning on the label feature vectors of the target candidate labels of the multiple entity nodes based on the trained meta-learner according to the following steps, and determine a final multi-label prediction result corresponding to each entity node:
aiming at a target entity node in a plurality of entity nodes, inputting the label feature vector of the target entity node aiming at a target candidate label into a trained meta-learner, and determining the sequenced prediction score of the target entity node corresponding to the plurality of candidate labels;
and determining a multi-label prediction result corresponding to the target entity node based on the sorted prediction scores.
In a possible implementation, the classifying module 503 is configured to determine a multi-label prediction result corresponding to the target entity node based on the sorted prediction scores according to the following steps:
and under the condition that the sequenced prediction score is larger than a preset threshold value, determining the candidate label corresponding to the sequenced prediction score as a multi-label prediction result corresponding to the target entity node.
In one possible implementation, the classification module 503 is configured to train the meta-learner according to the following steps:
obtaining a sample training set comprising a plurality of sample characteristic vectors, wherein each dimension value of the sample characteristic vectors points to an entity node label pair;
traversing each sample feature vector in the sample training set, and determining the lambda-gradient of each sample feature vector;
constructing a regression tree based on the plurality of sample feature vectors and the lambda-gradient of each sample feature vector;
and updating the ranking score of each entity node in the meta-learner to be trained based on the corresponding relation between each leaf node and each entity node of the regression tree to obtain the trained meta-learner.
In a possible implementation manner, the prediction module 502 is configured to perform multi-label prediction on a plurality of entity relationship networks based on a plurality of trained base learners according to the following steps to obtain a multi-label prediction result of each entity node in each entity relationship network:
and aiming at a target entity relationship network in the entity relationship networks, performing multi-label prediction on the target entity relationship network by using a target base learner corresponding to the target entity relationship network to obtain a multi-label prediction result of each entity node in the target entity relationship network.
In a possible implementation manner, in a case that the target base learner includes a fully connected input layer, a representation learning module, and a fully connected output layer, the predicting module 502 is configured to perform multi-label prediction on the target entity relationship network by using the target base learner corresponding to the target entity relationship network according to the following steps, so as to obtain a multi-label prediction result of each entity node in the target entity relationship network:
inputting the original characteristic vector of each entity node in the target entity network into a full-connection input layer included by the target base learner, and determining a dimension reduction characteristic vector aiming at each entity node and output by the full-connection input layer; and,

inputting the reduced-dimension feature vector into a representation learning module included in the target base learner, and determining a hidden feature vector containing low-order signals and similar node information; and,
and inputting the hidden feature vectors into a full-connection output layer included by the target base learner, and determining the multi-label prediction result of each entity node in the target entity relationship network.
In one possible implementation, the prediction module 502 is used to train a target-based learner that includes a fully-connected input layer, a representation learning module, and a fully-connected output layer, as follows:
acquiring a sample entity relationship network, wherein part of entity nodes in the sample entity relationship network have multi-label labeling results;
performing dimensionality reduction representation on each entity node in the sample entity relational network by using a full-connection input layer, and determining node dimensionality reduction implicit representation of each entity node after dimensionality reduction conversion;
performing attention learning of low-order signals on the node dimension reduction implicit representation by using a graph convolution layer included in the representation learning module to obtain node attention implicit representation of each entity node after the attention learning; performing node similarity learning on the node attention implicit expression by using a conditional random field layer included by the expression learning module to obtain node similarity implicit expression of each entity node after the node similarity learning is performed;
performing multi-label prediction on the node similarity implicit expression of each entity node by utilizing a full-connection output layer to obtain a prediction result;
and adjusting the target base learner based on the prediction result and the multi-label labeling result to obtain the trained target base learner.
In a possible implementation, in the case that multiple graph convolution layers are included, the prediction module 502 is configured to perform attention learning of a low-order signal on the node dimension-reduced implicit representation by using the graph convolution layers included in the representation learning module according to the following steps to obtain a node attention implicit representation of each entity node after performing attention learning:
using the node dimensionality reduction implicit representation as an initial representation of each graph convolution layer;
for a current map convolutional layer other than the first map convolutional layer, performing the steps of:
the node attention implicit expression of the graph attention layer output is determined according to the sample entity relationship network and the node attention implicit expression of the last graph convolutional layer output before the current graph convolutional layer, wherein the node attention implicit expression is input into other graph convolutional layers;
determining a node attention implicit representation of the current graph convolutional layer output based on the initial representation, the node attention implicit representation of the graph attention layer output, and training parameters of the current graph convolutional layer.
In one possible implementation, prediction module 502 is configured to determine a node attention implicit representation of the current graph convolution layer output based on the initial representation, the node attention implicit representation of the graph attention layer output, and training parameters of the current graph convolution layer as follows:
determining a first graph convolution operator based on a first weighted summation operation between the initial representation and a node attention implicit representation of the graph attention layer output; determining a second graph convolution operator based on a second weighted summation operation between the training parameters of the current graph convolution layer and the identity mapping matrix corresponding to the current graph convolution layer;
determining a node attention implication representation of the current graph convolution layer output based on the first graph convolution operator and the second graph convolution operator.
In a possible implementation manner, the prediction module 502 is configured to perform node similarity learning on the node attention implicit representation by using a conditional random field layer included in the representation learning module according to the following steps to obtain a node similarity implicit representation of each entity node after the node similarity learning is performed:
aiming at each entity node, constructing a maximum conditional probability function of the corresponding entity node; the maximized conditional probability function is determined by a first difference between the node similarity implicit representation of the entity node and the node attention implicit representation of the entity node, and a second difference between the node similarity implicit representation of the entity node and the node similarity implicit representations of other entity nodes than the entity node among the plurality of entity nodes;
in case it is determined that the maximum conditional probability function reaches the maximum function value, a node-similarity implicit representation of each node is determined.
In one possible embodiment, the prediction module 502 is configured to construct the maximized conditional probability function of the corresponding entity node according to the following steps:
for each entity node, acquiring a first difference value between the node similarity implicit representation of the entity node and the node attention implicit representation of the entity node, a second difference value between the node similarity implicit representation of the entity node and the node similarity implicit representations of other entity nodes except the entity node in the entity nodes, and node similarity between the entity node and other entity nodes;
performing product operation on the second difference and the node similarity to determine a product result;
summing the product results of the entity nodes and other entity nodes to obtain a second difference sum;
a maximum conditional probability function is determined based on a third weighted sum operation between the first difference and the second difference sum.
In a possible implementation manner, the prediction module 502 is configured to adjust the target-based learner based on the prediction result and the multi-label labeling result according to the following steps, so as to obtain a trained target-based learner:
acquiring a first weight parameter for adjusting the number of positive and negative samples and a second weight parameter for adjusting the contribution degree of difficultly distinguished samples;
determining a target loss function value of the target-based learner based on the first weight parameter, the second weight parameter and a difference result between the prediction result and the multi-label labeling result;
and performing at least one round of adjustment on the training parameter values of the target-based learner based on the target loss function values to obtain the trained target-based learner.
It should be noted that the apparatus in the embodiment of the present application can implement each process of the foregoing embodiment of the method, and achieve the same effect and function, which is not described herein again.
An embodiment of the present disclosure further provides an electronic device, as shown in fig. 6, which is a schematic structural diagram of the electronic device provided in the embodiment of the present disclosure, and includes: a processor 601, a memory 602, and a bus 603. The memory 602 stores machine-readable instructions executable by the processor 601 (for example, execution instructions corresponding to the obtaining module 501, the predicting module 502, and the classifying module 503 in the apparatus in fig. 5, and the like), when the electronic device is operated, the processor 601 and the memory 602 communicate via the bus 603, and when the machine-readable instructions are executed by the processor 601, the following processes are performed:
acquiring a plurality of entity relationship networks corresponding to a plurality of entity nodes; the same entity node corresponds to different entity relations in different entity relation networks;
performing multi-label prediction on a plurality of entity relationship networks based on a plurality of trained base learners to obtain a multi-label prediction result of each entity node in each entity relationship network;
and determining a final multi-label prediction result corresponding to each entity node based on the multi-label prediction result of each entity node in each entity relationship network.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the entity classification method in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product, where the computer program product carries a program code, and an instruction included in the program code may be used to execute the step of the entity classification method in the foregoing method embodiment.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK) or the like.
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for apparatus, device, and computer-readable storage medium embodiments, the description of which is simplified since it is substantially similar to the method embodiments, and where relevant, reference may be made to some descriptions of the method embodiments.
The apparatus, the device, and the computer-readable storage medium provided in the embodiments of the present application correspond to the method one to one, and therefore, the apparatus, the device, and the computer-readable storage medium also have similar advantageous technical effects to the corresponding method.
It will be appreciated by one skilled in the art that embodiments of the present disclosure may be provided as a method, apparatus (device or system), or computer-readable storage medium. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer-readable storage medium embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices or systems), and computer-readable storage media according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the present disclosure is not limited to the particular embodiments disclosed, nor is the division of aspects, which is for convenience only as the features in such aspects may not be combined to benefit. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (17)

1. An entity classification method, comprising:
acquiring a plurality of entity relationship networks corresponding to a plurality of entity nodes; the same entity node corresponds to different entity relations in different entity relation networks;
performing multi-label prediction on the entity relationship networks based on the trained base learners to obtain a multi-label prediction result of each entity node in each entity relationship network;
and determining a final multi-label prediction result corresponding to each entity node based on the multi-label prediction result of each entity node in each entity relationship network.
2. The method of claim 1, wherein, in the case that the multi-label prediction result comprises prediction scores for a plurality of candidate labels, the determining a final multi-label prediction result corresponding to each entity node based on the multi-label prediction result of each entity node in each entity relationship network comprises:
for a target candidate label in the plurality of candidate labels, determining a label feature vector of each entity node for the target candidate label based on a multi-label prediction result of each entity node in each entity relationship network;
and performing sequencing learning on the label feature vectors of the plurality of entity nodes aiming at the target candidate labels based on the trained meta-learner, and determining a final multi-label prediction result corresponding to each entity node.
3. The method of claim 2, wherein determining the label feature vector of each entity node for the target candidate label based on the multi-label prediction result of each entity node in each entity relationship network comprises:
for a target entity relationship network in the entity relationship networks, selecting a target prediction score matched with a target candidate label from prediction scores of the entity node for the candidate labels in the target entity relationship network;
and combining the target prediction scores respectively selected from the entity relationship networks to obtain the label feature vector of the entity node for the target candidate label.
4. The method of claim 2, wherein the performing rank learning on the label feature vectors of the target candidate labels for a plurality of entity nodes based on the trained meta-learner to determine a final multi-label prediction result corresponding to each entity node comprises:
for a target entity node in the entity nodes, inputting the label feature vector of the target entity node for the target candidate label into a trained meta-learner, and determining the sequenced prediction scores of the target entity node for the candidate labels;
and determining a multi-label prediction result corresponding to the target entity node based on the sorted prediction scores.
5. The method of claim 4, wherein determining the multi-labeled prediction result corresponding to the target entity node based on the sorted prediction scores comprises:
and under the condition that the sequenced prediction score is larger than a preset threshold value, determining the candidate label corresponding to the sequenced prediction score as a multi-label prediction result corresponding to the target entity node.
6. The method of any of claims 2 to 5, wherein the meta learner is trained according to the steps of:
obtaining a sample training set comprising a plurality of sample feature vectors, wherein each dimension value of the sample feature vectors points to an entity node label pair;
traversing each sample feature vector in the sample training set, and determining the lambda-gradient of each sample feature vector;
constructing a regression tree based on the plurality of sample feature vectors and the lambda-gradient of each sample feature vector;
and updating the ranking score of each entity node in the meta-learner to be trained based on the corresponding relation between each leaf node of the regression tree and each entity node to obtain the trained meta-learner.
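The training steps of claim 6 resemble a LambdaMART-style boosting round. The sketch below is a simplification (pairwise logistic lambda-gradients without NDCG delta weighting, and mean-gradient leaf updates instead of a Newton step); all names are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def lambda_gradients(scores, relevance):
    """Simplified lambda-gradient: for every pair where sample i is more
    relevant than sample j, push s_i up and s_j down, weighted by the
    logistic of the current score difference."""
    lambdas = np.zeros_like(scores, dtype=float)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if relevance[i] > relevance[j]:
                rho = 1.0 / (1.0 + np.exp(scores[i] - scores[j]))
                lambdas[i] += rho
                lambdas[j] -= rho
    return lambdas

def boosting_round(features, scores, relevance, lr=0.1, max_leaf_nodes=8):
    """One round in the style of claim 6: traverse the sample feature vectors,
    compute their lambda-gradients, fit a regression tree to the gradients,
    and update the ranking score of the samples that fall into each leaf."""
    lambdas = lambda_gradients(scores, relevance)
    tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
    tree.fit(features, lambdas)
    leaf_ids = tree.apply(features)
    for leaf in np.unique(leaf_ids):
        mask = leaf_ids == leaf
        scores[mask] += lr * lambdas[mask].mean()
    return scores, tree
```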
7. The method according to any one of claims 1 to 5, wherein the performing multi-label prediction on the entity relationship networks based on the trained base learners to obtain a multi-label prediction result of each entity node in each entity relationship network comprises:
and aiming at a target entity relationship network in the entity relationship networks, performing multi-label prediction on the target entity relationship network by using a target base learner corresponding to the target entity relationship network to obtain a multi-label prediction result of each entity node in the target entity relationship network.
8. The method according to claim 7, wherein in a case that the target base learner includes a fully connected input layer, a representation learning module, and a fully connected output layer, the performing multi-label prediction on the target entity relationship network by using the target base learner corresponding to the target entity relationship network to obtain a multi-label prediction result of each entity node in the target entity relationship network includes:
inputting the original feature vector of each entity node in the target entity relationship network into the fully connected input layer included in the target base learner, and determining a dimension-reduced feature vector output by the fully connected input layer for each entity node; and
inputting the dimension-reduced feature vector into the representation learning module included in the target base learner, and determining a hidden feature vector containing low-order signals and similar-node information; and
inputting the hidden feature vector into the fully connected output layer included in the target base learner, and determining a multi-label prediction result of each entity node in the target entity relationship network.
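The three-part base learner of claim 8 might be organized as in the following PyTorch sketch, where the representation learning module is passed in as a separate component; the class and parameter names are assumptions:

```python
import torch
import torch.nn as nn

class BaseLearner(nn.Module):
    """Claim-8 shape: a fully connected input layer for dimension reduction,
    a representation learning module, and a fully connected output layer
    that produces per-label prediction scores."""
    def __init__(self, in_dim, hidden_dim, num_labels, repr_module):
        super().__init__()
        self.fc_in = nn.Linear(in_dim, hidden_dim)      # dimension reduction
        self.repr_module = repr_module                   # e.g. graph conv + CRF layers
        self.fc_out = nn.Linear(hidden_dim, num_labels)  # multi-label scores

    def forward(self, x, adj):
        h = torch.relu(self.fc_in(x))         # dimension-reduced feature vectors
        h = self.repr_module(h, adj)          # hidden features with low-order signals
                                              # and similar-node information
        return torch.sigmoid(self.fc_out(h))  # multi-label prediction scores
```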
9. The method of claim 8, wherein the target-based learner, which includes a fully connected input layer, a representation learning module, and a fully connected output layer, is trained according to the steps of:
obtaining a sample entity relationship network, wherein part of entity nodes in the sample entity relationship network have multi-label labeling results;
performing dimension-reduced representation on each entity node in the sample entity relationship network by using the fully connected input layer, and determining a node dimension-reduced implicit representation of each entity node after the dimension-reduction conversion;
performing attention learning of low-order signals on the node dimension-reduced implicit representation by using a graph convolution layer included in the representation learning module to obtain a node attention implicit representation of each entity node after the attention learning; and performing node similarity learning on the node attention implicit representation by using a conditional random field layer included in the representation learning module to obtain a node similarity implicit representation of each entity node after the node similarity learning;
performing multi-label prediction on the node similarity implicit representation of each entity node by using the fully connected output layer to obtain a prediction result;
and adjusting the target base learner based on the prediction result and the multi-label labeling result to obtain the trained target base learner.
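A hedged sketch of the claim-9 training loop, assuming the `BaseLearner` module from the claim-8 sketch and a boolean mask marking the partially labeled entity nodes; plain binary cross-entropy stands in for the re-weighted loss of claim 14:

```python
import torch

def train_base_learner(model, features, adj, labels, labeled_mask, epochs=200):
    """Claim-9 style training sketch: forward pass over the sample entity
    relationship network, loss computed only on the entity nodes that carry
    multi-label annotations, then backpropagation."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = torch.nn.BCELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        predictions = model(features, adj)
        loss = criterion(predictions[labeled_mask], labels[labeled_mask])
        loss.backward()
        optimizer.step()
    return model
```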
10. The method of claim 9, wherein, in a case that multiple graph convolution layers are included, performing attention learning of low-order signals on the node dimension-reduced implicit representation by using the graph convolution layers included in the representation learning module to obtain a node attention implicit representation of each entity node after performing attention learning, comprises:
using the node dimension-reduced implicit representation as an initial representation of each of the graph convolution layers;
for a current graph convolution layer other than the first graph convolution layer, performing the following steps:
inputting the node attention implicit representation output by the previous graph convolution layer before the current graph convolution layer, together with the sample entity relationship network, into a graph attention layer included in the current graph convolution layer, and determining the node attention implicit representation output by the graph attention layer;
determining the node attention implicit representation output by the current graph convolution layer based on the initial representation, the node attention implicit representation output by the graph attention layer, and training parameters of the current graph convolution layer.
11. The method of claim 10, wherein the determining the node attention implicit representation output by the current graph convolution layer based on the initial representation, the node attention implicit representation output by the graph attention layer, and the training parameters of the current graph convolution layer comprises:
determining a first graph convolution operator based on a first weighted summation operation between the initial representation and the node attention implicit representation output by the graph attention layer; determining a second graph convolution operator based on a second weighted summation operation between the training parameters of the current graph convolution layer and an identity mapping matrix corresponding to the current graph convolution layer;
determining the node attention implicit representation output by the current graph convolution layer based on the first graph convolution operator and the second graph convolution operator.
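The layer update of claims 10 and 11 is close in spirit to a GCNII-style residual graph convolution. In the sketch below the graph attention step is reduced to propagation over a precomputed normalized adjacency matrix, and `alpha`/`beta` are assumed weighting hyperparameters:

```python
import torch
import torch.nn as nn

class ResidualGraphConvLayer(nn.Module):
    """Claims 10-11 sketch: the first operator is a weighted sum of the
    propagated previous-layer representation and the initial representation;
    the second operator mixes the layer's weight matrix with an identity
    mapping."""
    def __init__(self, dim, alpha=0.1, beta=0.5):
        super().__init__()
        self.weight = nn.Linear(dim, dim, bias=False)  # training parameters W
        self.alpha = alpha  # weight of the initial representation h0
        self.beta = beta    # weight of W relative to the identity mapping

    def forward(self, h, h0, adj_norm):
        propagated = torch.matmul(adj_norm, h)                   # neighbor aggregation
        first = (1 - self.alpha) * propagated + self.alpha * h0  # first operator
        out = (1 - self.beta) * first + self.beta * self.weight(first)  # second operator
        return torch.relu(out)
```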
12. The method according to claim 9, wherein the performing node similarity learning on the node attention implicit representation by using a conditional random field layer included in the representation learning module to obtain a node similarity implicit representation of each entity node after performing node similarity learning comprises:
aiming at each entity node, constructing a maximized conditional probability function corresponding to the entity node; the maximized conditional probability function is determined by a first difference between the node similarity implicit representation of the entity node and the node attention implicit representation of the entity node, and a second difference between the node similarity implicit representation of the entity node and the node similarity implicit representations of the other entity nodes in the plurality of entity nodes except the entity node;
and determining the node similarity implicit representation of each entity node in a case that it is determined that the maximized conditional probability function reaches a maximum function value.
13. The method of claim 12, wherein constructing the maximized conditional probability function for the entity node comprises:
for each entity node, obtaining a first difference value between a node similarity implicit representation of the entity node and a node attention implicit representation of the entity node, a second difference value between the node similarity implicit representation of the entity node and node similarity implicit representations of other entity nodes except the entity node in the entity nodes, and a node similarity between the entity node and the other entity nodes;
performing product operation on the second difference and the node similarity to determine a product result;
summing the product results over the other entity nodes to obtain a second difference sum;
determining the maximized conditional probability function based on a third weighted sum operation between the first difference and the second difference sum.
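One way to read claims 12 and 13 is as a CRF-style smoothing objective; the iterative update below approximates its maximizer under that reading, with the pairwise node similarities supplied as a dense matrix and the weights `alpha`/`beta` as assumptions for illustration:

```python
import torch

def crf_similarity_refine(attn_repr, similarity, alpha=0.9, beta=0.1, iters=2):
    """Claims 12-13 sketch: trade off (i) the first difference between a node's
    similarity representation and its attention representation against (ii) the
    similarity-weighted second differences to other nodes' representations.
    A mean-field style update approximates the maximizer; the patent's exact
    closed form may differ."""
    h = attn_repr.clone()
    for _ in range(iters):
        neighbor_sum = torch.matmul(similarity, h)      # sum_j g_ij * h_j
        degree = similarity.sum(dim=1, keepdim=True)    # sum_j g_ij
        h = (alpha * attn_repr + beta * neighbor_sum) / (alpha + beta * degree)
    return h
```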
14. The method of claim 9, wherein the adjusting the target-based learner based on the prediction result and the multi-label labeling result to obtain a trained target-based learner comprises:
acquiring a first weight parameter for balancing the numbers of positive and negative samples and a second weight parameter for adjusting the contribution of hard-to-distinguish samples;
determining a target loss function value of the target-based learner based on the first weight parameter, the second weight parameter, and a difference result between the prediction result and the multi-label labeling result;
and carrying out at least one round of adjustment on the training parameter values of the target base learner based on the target loss function value to obtain the trained target base learner.
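The two weight parameters of claim 14 behave like those of a focal-style loss; the following sketch is one plausible form under that assumption, not the patent's exact formula:

```python
import torch

def reweighted_multilabel_loss(pred, target, alpha=0.25, gamma=2.0):
    """Claim-14 style loss sketch: `alpha` is the first weight parameter,
    re-balancing positive versus negative samples; `gamma` is the second,
    down-weighting easy samples so hard-to-distinguish samples contribute
    more."""
    eps = 1e-8
    pos = -alpha * (1 - pred) ** gamma * target * torch.log(pred + eps)
    neg = -(1 - alpha) * pred ** gamma * (1 - target) * torch.log(1 - pred + eps)
    return (pos + neg).mean()
```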
15. An entity classification apparatus, comprising:
an acquisition module, which is used for acquiring a plurality of entity relationship networks corresponding to a plurality of entity nodes, wherein the same entity node corresponds to different entity relations in different entity relationship networks;
a prediction module, which is used for performing multi-label prediction on the entity relationship networks based on a plurality of trained base learners to obtain a multi-label prediction result of each entity node in each entity relationship network;
and a classification module, which is used for determining a final multi-label prediction result corresponding to each entity node based on the multi-label prediction result of each entity node in each entity relationship network.
16. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the entity classification method of any one of claims 1 to 14.
17. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the entity classification method of any one of claims 1 to 14.
CN202211132032.8A 2022-09-16 2022-09-16 Entity classification method and device, electronic equipment and storage medium Pending CN115577283A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211132032.8A CN115577283A (en) 2022-09-16 2022-09-16 Entity classification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211132032.8A CN115577283A (en) 2022-09-16 2022-09-16 Entity classification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115577283A true CN115577283A (en) 2023-01-06

Family

ID=84581117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211132032.8A Pending CN115577283A (en) 2022-09-16 2022-09-16 Entity classification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115577283A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117577214A (en) * 2023-05-19 2024-02-20 广东工业大学 Compound blood brain barrier permeability prediction method based on stack learning algorithm
CN117577214B (en) * 2023-05-19 2024-04-12 广东工业大学 Compound blood brain barrier permeability prediction method based on stack learning algorithm
CN117155786A (en) * 2023-08-09 2023-12-01 中山大学 Directed network optimization method and system for screening robust influence nodes

Similar Documents

Publication Publication Date Title
He et al. AutoML: A survey of the state-of-the-art
CN110263227B (en) Group partner discovery method and system based on graph neural network
CN112699247B (en) Knowledge representation learning method based on multi-class cross entropy contrast complement coding
Li et al. Autograph: Automated graph neural network
CN111061856A (en) Knowledge perception-based news recommendation method
CN108038492A (en) A kind of perceptual term vector and sensibility classification method based on deep learning
CN112906770A (en) Cross-modal fusion-based deep clustering method and system
Arsov et al. Network embedding: An overview
CN115577283A (en) Entity classification method and device, electronic equipment and storage medium
Zhang et al. PS-Tree: A piecewise symbolic regression tree
CN110264372B (en) Topic community discovery method based on node representation
Satyanarayana et al. Survey of classification techniques in data mining
CN115293919B (en) Social network distribution outward generalization-oriented graph neural network prediction method and system
Lai et al. Transconv: Relationship embedding in social networks
Valero-Mas et al. On the suitability of Prototype Selection methods for kNN classification with distributed data
CN115687610A (en) Text intention classification model training method, recognition device, electronic equipment and storage medium
CN114154557A (en) Cancer tissue classification method, apparatus, electronic device, and storage medium
CN113722439B (en) Cross-domain emotion classification method and system based on antagonism class alignment network
Zhang et al. Clustering optimization algorithm for data mining based on artificial intelligence neural network
Mudiyanselage et al. Feature selection with graph mining technology
CN116522232A (en) Document classification method, device, equipment and storage medium
Mishra et al. Unsupervised functional link artificial neural networks for cluster Analysis
CN111126443A (en) Network representation learning method based on random walk
CN116681128A (en) Neural network model training method and device with noisy multi-label data
Xu et al. Semi-supervised self-growing generative adversarial networks for image recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination