CN111723301A - Attention relation identification and labeling method based on hierarchical theme preference semantic matrix

Info

Publication number: CN111723301A
Application number: CN202010483759.5A
Authority: CN
Inventors: 郑建兴; 李沁文; 李德玉; 梁吉业
Original assignee: Shanxi University
Current assignee: Shanxi University
Priority date: 2020-06-01
Filing date: 2020-06-01
Publication date: 2020-09-29
Anticipated expiration: 2040-06-01
Also published as: CN111723301B

Abstract

The invention belongs to the technical field of interpretable link prediction for social networks and discloses an attention relationship identification and labeling method based on a hierarchical topic preference semantic matrix. First, a hierarchical topic preference semantic matrix is constructed and learned from the attention relationship network graph of social users and from user text content data; the attention relationships between network nodes are labeled with the hierarchical topics and their preference semantic matrices, and the preference semantic matrices are interpreted through user text content. Then, the relevance of the attention relationship between a new user and other users is calculated through the hierarchical topic preference semantic matrix; users with high relevance are identified as having an attention relationship, which is labeled with the hierarchical topic and preference semantic matrix and interpreted according to the users' text content. The method predicts user attention relationships at the level of hierarchical topics, improves the accuracy of identifying attention relationships between social network users, and in particular supports the identification of cross-region social attention relationships within online fraud groups.

Description

Attention relation identification and labeling method based on hierarchical theme preference semantic matrix

Technical Field

The invention relates to the technical field of social network interpretable link prediction methods, in particular to an attention relationship identification and labeling method based on a hierarchical theme preference semantic matrix.

Background

In a social network, the attention (follow) relationships between users form an attention network structure. In many such networks, users have different points of interest for different types and different levels of topics. By exploring the implicit semantic connections behind attention relationships under these topic types and levels, the interest motivation behind a user's attention relationships can be mined in depth, so that users a given user is likely to follow can be found more easily and more reliable explanations of attention relationship predictions can be established. If one user follows another, the first user may forward or like the microblog content of the second. By analyzing the microblog content published by two users, semantic similarity can be found on certain latent common interest features, and attention behavior between the users can be predicted. Moreover, user interests have a hierarchical structure: two users who establish an attention relationship on the CBA topic are more semantically interpretable than two users who establish one on the basketball topic. Extracting fine-grained, interpretable reasons for attention relationships from the latent interest topics shared by users can improve the link prediction performance of a recommendation system.

Link prediction for attention relationships in social networks generally depends on the network structure. Labeling techniques based on network structure analyze the link influence between users but ignore the rich interest information in users' microblog content, while techniques based on user behavior records focus on labeling with users' explicit topic keywords. However, the interests of social network users are diverse and multi-layered. For example, the microblog content published by the following user may include sports keywords, while the microblog content published by the followed user includes topic words such as basketball and CBA; the attention behavior between the users can then be extracted and interpreted through the latent semantic relationship among topic words such as sports, basketball and CBA. It is therefore necessary to mine the latent semantic interest motivation between users from their microblog content, to learn the attention relationships between users on a hierarchical interest topic preference semantic matrix, and thereby to predict attention relationships in the user network at the level of fine-grained interest topics, achieving interpretable and accurate prediction of user attention relationship links.

Disclosure of Invention

Aiming at the problems, the invention provides an attention relation identification and labeling method based on a hierarchical theme preference semantic matrix.

In order to achieve the purpose, the invention adopts the following technical scheme:

an attention relationship identification and labeling method based on a hierarchical theme preference semantic matrix comprises the following steps:

step S1, constructing an attention relation network graph initialized by a theme preference semantic matrix;

step S2, learning a hierarchical theme preference semantic matrix based on the translation model;

step S3, labeling the network node attention relation through the preference semantic matrix;

step S4, interpreting the hierarchical topic preference semantic matrix based on user text content;

step S5, calculating the relevance of the new user and other user nodes under the hierarchical theme;

step S6, selecting the hierarchical theme with the maximum correlation to label the attention relationship among users;

step S7, marking a hierarchical theme preference semantic matrix of the attention relationship;

step S8, interpreting the preference semantic matrix according to the user text content.

Further, in step S1, constructing the attention relationship network graph initialized by the topic preference semantic matrix further includes the following steps:

step S1.1, establishing an attention relationship graph G(V, R) according to the attention relationships of users, where V is the set of nodes and R is the set of edges; nodes in the graph represent users and edges represent attention relationships r between users; if user h follows (pays attention to) user t, an edge pointing from user h to user t is constructed, where h and t both denote users; the attention relationship graph describes the explicit social attention relationships among users and helps predict potential attention relationships among them;

step S1.2, extracting a keyword set S_h from the text content published by user h and a keyword set S_t from the text content published by user t; computing, for S_h and S_t respectively, the Jaccard similarity coefficient with each of the N coarse-grained topics at layer l of Chinese Wikipedia; selecting the m coarse-grained topics with high similarity to both user h and user t to explain the attention relationship between the users; and initializing a preference semantic matrix for each of the m coarse-grained topics;

on the basis of the coarse-grained topics at layer l, computing, for S_h and S_t respectively, the Jaccard similarity coefficient with each of the P fine-grained topics at layer l+1 of Chinese Wikipedia; selecting the q fine-grained topics with high similarity to both user h and user t to explain the attention relationship between the users; and initializing a preference semantic matrix for each of the q fine-grained topics.

The set of fine-grained topics of coarse-grained topic c is denoted child(c) = {c₁, c₂, ..., c_k, ..., c_b}. The initialized topic preference semantic matrices reflect the association between the text content of user h and user t and help reveal the reason for the attention relationship between the users.
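Step S1 can be illustrated with a minimal Python sketch. It builds the directed attention graph from (h, t) pairs and ranks topics by Jaccard similarity with the users' keyword sets. The function names, the toy keyword sets, and the rule for combining the two users' similarities (taking the smaller of the two values) are assumptions made for illustration; the text does not specify how the two similarities are combined.

```python
from collections import defaultdict

def build_attention_graph(follows):
    """Directed graph: an edge h -> t means user h follows user t."""
    graph = defaultdict(set)
    for h, t in follows:
        graph[h].add(t)
    return graph

def jaccard(a, b):
    """Jaccard similarity coefficient |a & b| / |a | b| of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_topics(s_h, s_t, topic_keywords, m):
    """Keep the m topics most similar to BOTH users' keyword sets.

    Assumption: the joint score is the smaller of the two Jaccard values.
    """
    scores = {
        topic: min(jaccard(s_h, words), jaccard(s_t, words))
        for topic, words in topic_keywords.items()
    }
    ranked = sorted(scores, key=lambda topic: (-scores[topic], topic))
    return ranked[:m]

# Hypothetical toy data, for illustration only.
graph = build_attention_graph([("h", "t")])
s_h = {"sports", "basketball", "CBA", "league"}
s_t = {"basketball", "CBA", "playoffs"}
topic_keywords = {
    "sports": {"sports", "basketball", "football", "league"},
    "music": {"concert", "album", "singer"},
    "finance": {"stock", "fund", "bank"},
}
top_topics = select_topics(s_h, s_t, topic_keywords, m=1)
print(top_topics)
```

The same `select_topics` call could be repeated with the fine-grained topics under each selected coarse-grained topic to obtain the q fine-grained topics.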

Further, in step S2, learning the hierarchical topic preference semantic matrix based on the translation model further includes the following steps:

step S2.1, establishing a triple (h, r, t) from user h, user t and their attention relationship on the attention relationship graph, where r is the attention relationship from user h to user t; then modeling the representations of users and relationships at the coarse-grained topic level based on a translation model: given a coarse-grained topic c, the vector representations of user h and user t are mapped into the relation space of coarse-grained topic c, yielding the vector representation of user h and the vector representation of user t in the aspect of coarse-grained topic c; a user's vector in the aspect of coarse-grained topic c represents the projection of the user's semantic interest onto that aspect, so the reason for the attention relationship between users can be explained according to the semantic interest of coarse-grained topic c;

step S2.2, on the basis of the explanation of the attention relationship by coarse-grained topic c, using the fine-grained topics child(c) = {c₁, c₂, ..., c_k, ..., c_b} of coarse-grained topic c to describe the attention relationship among users in further detail; let c_k be the k-th fine-grained topic of coarse-grained topic c (for example, when coarse-grained topic c is basketball, fine-grained topic c_k may be CBA), with its own topic preference semantic matrix under coarse-grained topic c; on the basis of the learned topic preference semantic matrix of fine-grained topic c_k, the vector representation of user h and the vector representation of user t in the aspect of fine-grained topic c_k are obtained, and a distance function of the attention relationship r between user h and user t on fine-grained topic c_k of coarse-grained topic c is established; the distance function is defined over the vector representation of user h in the aspect of c_k, the vector representation of user t in the aspect of c_k, and the representation of the attention relationship r in the relation space; the distance between user h and user t on the attention relationship in the aspect of fine-grained topic c_k reflects the difference in semantic interest between the users: the larger the distance, the greater the difference in interest, and the smaller the distance, the more similar the interests, which explains the reason for the attention relationship between the users;

step S2.3, according to the distance function of the triple (h, r, t), defining a translation-based Hinge Loss objective function over positive sample triples (h, r, t) that have a real attention relationship r and negative sample triples that do not have the attention relationship r; in this objective, c and t are coarse-grained topics at layer l, and c_i and c_j are fine-grained topics of coarse-grained topic c; the objective involves the preference semantic matrices of coarse-grained topics c and t, the preference semantic matrices of fine-grained topics c_i and c_j of coarse-grained topic c, an identity matrix, and the Hinge Loss function; S is the positive sample set of user attention relationships (h, r, t), in which user h and user t have the attention relationship r; the negative sample set of user attention relationships is obtained from (h, r, t) either by replacing user h with a user that has no attention relationship r with t, or by replacing user t with a user with which h has no attention relationship r; γ denotes the margin parameter, λ the regularization hyperparameter for coarse-grained topics, and η the regularization hyperparameter for fine-grained topics; the orthogonality constraint on the coarse-grained preference semantic matrices ensures that coarse-grained topics c and t learn different parameter matrices, and the orthogonality constraint on the fine-grained preference semantic matrices ensures that fine-grained topics c_i and c_j learn different parameter matrices.

Minimizing the Hinge Loss makes the distance between positive sample users smaller than the distance between negative sample users by at least the margin parameter γ. The distributed representations of users can thus be learned iteratively from the aspect of fine-grained topic semantic interest, which helps to predict the attention relationship between users at the fine-grained topic level and to explain that relationship according to the fine-grained topic.
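The following Python sketch illustrates the distance function and the margin term of the Hinge Loss from steps S2.2 and S2.3. Because the formulas themselves are not available in this text, the sketch assumes a TransR-style form: each user vector is multiplied by a topic preference semantic matrix, and the distance is the Euclidean norm of (projected h + r - projected t). The helper names are hypothetical, and the orthogonality regularizers weighted by λ and η are omitted.

```python
import math

def project(vec, matrix):
    """Row vector times matrix: map a user vector into a topic space."""
    return [sum(v * row[j] for v, row in zip(vec, matrix))
            for j in range(len(matrix[0]))]

def distance(h, r, t, matrix):
    """Assumed score: Euclidean norm of (h*M + r - t*M)."""
    h_p, t_p = project(h, matrix), project(t, matrix)
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h_p, r, t_p)))

def hinge_loss(positive, negative, r, matrix, gamma):
    """Margin term max(0, gamma + d(positive) - d(negative)) for one pair."""
    d_pos = distance(positive[0], r, positive[1], matrix)
    d_neg = distance(negative[0], r, negative[1], matrix)
    return max(0.0, gamma + d_pos - d_neg)

# Hypothetical 2-dimensional toy vectors, for illustration only.
identity = [[1.0, 0.0], [0.0, 1.0]]
h, r, t = [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]
t_corrupt = [4.0, 4.0]  # negative sample: t replaced by an unrelated user

loss_small_margin = hinge_loss((h, t), (h, t_corrupt), r, identity, gamma=1.0)
loss_large_margin = hinge_loss((h, t), (h, t_corrupt), r, identity, gamma=6.0)
print(loss_small_margin, loss_large_margin)
```

With the small margin the negative pair is already far enough away, so the loss is zero; with the larger margin the loss is positive and training would push the pairs further apart.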

Further, in step S3, labeling the network node attention relationships through the preference semantic matrix further includes the following steps:

step S3.1, interpreting the attention relationship between users with the preference semantic matrix of coarse-grained topic c learned in step S2.3;

step S3.2, interpreting the attention relationship between users with the preference semantic matrix of fine-grained topic c_k of coarse-grained topic c learned in step S2.3; explaining the reason for the attention relationship between user nodes comprehensively, from both the coarse-grained and fine-grained topic aspects, helps to accurately recommend topic-relevant products or users to a user.

Further, in step S4, the hierarchical topic preference semantic matrix is interpreted based on user text content as follows:

based on the word segmentation result of the text content published by the users, the attention relationship between users is explained and labeled with the topic-word text content associated with the coarse-grained topic c selected in step S3.1; likewise, the attention relationship between users is explained and labeled with the topic-word text content associated with the fine-grained topic c_k selected in step S3.2; the text content associated with the coarse-grained and fine-grained topics interprets the topic preference semantic matrices, makes it possible to identify interest-driven behavioral relationships among social network users and to monitor changes in attention relationships as users' text content changes, and plays an important role in identifying cross-region social relationships within online groups.
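One way to picture the text-based interpretation in step S4 is the following Python sketch. It takes the segmented words of two users' posts and returns the topic words of a selected topic that both users actually used, as textual evidence for the labeled relationship. The function name, the toy tokens, and the rule that a word must appear in both users' text are assumptions for illustration, not details given in the text.

```python
def explain_relationship(topic, tokens_h, tokens_t, topic_words):
    """Label a relationship with a topic and the topic words both users used."""
    shared = set(tokens_h) & set(tokens_t) & set(topic_words)
    return {"topic": topic, "evidence": sorted(shared)}

# Hypothetical segmented posts and topic vocabulary, for illustration only.
tokens_h = ["CBA", "finals", "tonight", "basketball"]
tokens_t = ["basketball", "CBA", "tickets"]
basketball_words = {"basketball", "CBA", "NBA", "dunk"}

label = explain_relationship("basketball", tokens_h, tokens_t, basketball_words)
print(label)
```

Running the same function with a fine-grained topic vocabulary would give the fine-grained label, and re-running it as users publish new content would show how the evidence changes over time.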

Further, in step S5, the relevance of the new user and other user nodes under the hierarchical topics is calculated as follows:

on the basis of the preference semantic matrix of coarse-grained topic c, the similarity between the vector representation of the new user u and the vector representation of an existing user t is computed, using the distance between user u and user t on the attention relationship r in the aspect of coarse-grained topic c; then, on the basis of the preference semantic matrix of fine-grained topic c_k of coarse-grained topic c, the similarity between the new user u and the existing user t is computed, using the distance between user u and user t on the attention relationship r in the aspect of fine-grained topic c_k; the smaller the distance, the more likely it is that user u follows user t, that is, that user u has an attention relationship with user t; coarse-grained topic c and fine-grained topic c_k thus predict the attention relationship between users from different aspects.
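The per-topic scoring of a new user in step S5 might look like the sketch below. As before, the projection by a topic matrix and the Euclidean distance are an assumed TransR-style form, and the matrices and vectors are hypothetical toy values rather than learned parameters.

```python
import math

def project(vec, matrix):
    """Row vector times matrix: map a user vector into a topic space."""
    return [sum(v * row[j] for v, row in zip(vec, matrix))
            for j in range(len(matrix[0]))]

def topic_distance(u, r, t, matrix):
    """Assumed distance of (u, r, t) in one topic aspect."""
    u_p, t_p = project(u, matrix), project(t, matrix)
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(u_p, r, t_p)))

def distances_by_topic(u, r, t, topic_matrices):
    """Distance between new user u and existing user t under every topic."""
    return {topic: topic_distance(u, r, t, matrix)
            for topic, matrix in topic_matrices.items()}

# Hypothetical toy values, for illustration only.
u, t, r = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
topic_matrices = {
    "basketball": [[1.0, 0.0], [0.0, 1.0]],
    "music": [[1.0, 0.0], [1.0, 0.0]],
}
result = distances_by_topic(u, r, t, topic_matrices)
print(result)
```

In this toy case the two users coincide after projection into the "music" aspect (distance 0) but stay apart in the "basketball" aspect, which shows how different topic matrices can give different predictions for the same pair.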

Further, in step S6, the hierarchical topic with the maximum relevance is selected to label the attention relationship between users as follows:

the relevance of the attention relationship between the new user u and user t can be calculated for different topics; from the set of relevant coarse-grained topics, the coarse-grained topic c with the maximum relevance, that is, the minimum distance, is selected to label the attention relationship between the users, and, under that coarse-grained topic c, the fine-grained topic c_k with the highest relevance is selected to label the attention relationship between the users; for a new user, this explains the attention behavior toward user t in terms of coarse-grained and fine-grained interest topics and improves the interpretability of friend recommendations made to the new user.
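The selection in step S6 amounts to taking the coarse-grained topic with the smallest distance and then the closest fine-grained topic among its children. The Python sketch below shows this under the assumption that distances have already been computed per topic; the dictionaries and their values are hypothetical.

```python
def label_relationship(coarse_dist, fine_dist, children):
    """Pick the closest coarse topic, then the closest of its child topics."""
    coarse = min(coarse_dist, key=lambda topic: coarse_dist[topic])
    candidates = children.get(coarse, [])
    fine = min(candidates, key=lambda topic: fine_dist[topic]) if candidates else None
    return coarse, fine

# Hypothetical distances between new user u and user t, for illustration only.
coarse_dist = {"basketball": 0.4, "music": 1.3}
fine_dist = {"CBA": 0.2, "NBA": 0.7, "pop": 0.1}
children = {"basketball": ["CBA", "NBA"], "music": ["pop"]}

labels = label_relationship(coarse_dist, fine_dist, children)
print(labels)
```

Note that "pop" has the smallest fine-grained distance overall but is not chosen, because the fine-grained topic is selected only among the children of the chosen coarse-grained topic.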

Further, in step S7, the hierarchical topic preference semantic matrix of the attention relationship is labeled as follows:

the preference semantic matrix of the coarse-grained topic c with the maximum relevance in step S6 is taken as the coarse-grained topic preference semantic matrix of the new user's attention relationship, and the preference semantic matrix of the fine-grained topic c_k with the maximum relevance under coarse-grained topic c is taken as the fine-grained topic preference semantic matrix of the new user's attention relationship; for a new user, this reveals, at both coarse and fine granularity, the degree of preference for the interest topics shared with user t, gives the reason for the attention relationship, and explains why the new user follows different users in different topic areas.

Further, in step S8, the preference semantic matrix is interpreted according to the text content of the users as follows:

based on the word segmentation result of the content published by the users, the associated topic-word text content is selected for coarse-grained topic c to explain and label the new attention relationship between users; likewise, the associated topic-word text content is selected for fine-grained topic c_k to explain and label the new attention relationship between users; the text content reflects user behavior, so interpreting the topic preference semantic matrices through text content identifies the interest-driven attention relationships among users and gives new users an intuitive explanation of each attention relationship, which can improve their understanding of and trust in the recommendation system; the text content also helps to track social attention behavior between users in different regions of the social network.

The invention provides an attention relationship identification and labeling method based on a hierarchical topic preference semantic matrix. First, for the existing user social relationship network graph and user text content data, a hierarchical topic preference semantic matrix is constructed based on a translation model, the attention relationships of network nodes are labeled through the preference semantic matrix, and the preference semantic matrix is interpreted through user text content. Then, the relevance of the attention relationship between a new user and other nodes is calculated through the hierarchical topic preference semantic matrix, users with high relevance are identified as having an attention relationship, the attention relationships between the new user and nodes in the network are updated, the hierarchical topic preference semantic matrix of each attention relationship is labeled, and the preference semantic matrix is interpreted according to user text content. The labeled social network attention relationships are the final output of the method.

Compared with the prior art, the invention has the following advantages:

The method provided by the invention differs from existing methods in that it constructs a hierarchical topic preference semantic matrix based on a translation model, labels the attention relationships of network nodes through the preference semantic matrix, calculates the relevance of the attention relationship between a new user and other nodes from text content and the hierarchical topic preference semantic matrix, identifies users with high relevance as having an attention relationship, and interprets the preference semantic matrix according to user text content. The method predicts user attention relationships at the level of hierarchical topics, improves the accuracy of identifying attention relationships between social network users, and in particular supports the identification of cross-region relationships within online fraud groups.

Drawings

FIG. 1 is a schematic diagram of an overall model architecture.

Detailed Description

The attention relationship identification and labeling method based on the user hierarchical topic preference semantic matrix is implemented by a computer program. The specific implementation of the proposed technical solution is described below step by step; the overall model architecture is shown in FIG. 1.


Step S3.1, interpreting the attention relationship between users with the preference semantic matrix of coarse-grained topic c learned in step S2.3;

Step S3.2, interpreting the attention relationship between users with the preference semantic matrix of fine-grained topic c_k of coarse-grained topic c learned in step S2.3.

Step S4, interpreting the hierarchical topic preference semantic matrix based on user text content; based on the word segmentation result of the text content published by the users, the attention relationship between users is explained and labeled with the topic-word text content associated with the coarse-grained topic c selected in step S3.1; likewise, the attention relationship between users is explained and labeled with the topic-word text content associated with the fine-grained topic c_k selected in step S3.2.

Step S5, calculating the relevance of the new user and other user nodes under the hierarchical topics; on the basis of the preference semantic matrix of coarse-grained topic c, the similarity between the vector representation of the new user u and the vector representation of an existing user t is computed, using the distance between user u and user t on the attention relationship r in the aspect of coarse-grained topic c; then, on the basis of the preference semantic matrix of fine-grained topic c_k of coarse-grained topic c, the similarity between the new user u and the existing user t is computed, using the distance between user u and user t on the attention relationship r in the aspect of fine-grained topic c_k.

Step S6, selecting the hierarchical topic with the maximum relevance to label the attention relationship between users; the relevance of the attention relationship between the new user u and user t can be calculated for different topics; from the set of relevant coarse-grained topics, the coarse-grained topic c with the maximum relevance, that is, the minimum distance, is selected to label the attention relationship between the users, and, under that coarse-grained topic c, the fine-grained topic c_k with the highest relevance is selected to label the attention relationship between the users.

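Step S7, labeling the hierarchical topic preference semantic matrix of the attention relationship; the preference semantic matrix of the coarse-grained topic c with the maximum relevance in step S6 is taken as the coarse-grained topic preference semantic matrix of the new user's attention relationship, and the preference semantic matrix of the fine-grained topic c_k with the maximum relevance under coarse-grained topic c is taken as the fine-grained topic preference semantic matrix of the new user's attention relationship.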

Step S8, interpreting the preference semantic matrix according to the text content of the users; based on the word segmentation result of the content published by the users, the associated topic-word text content is selected for coarse-grained topic c to explain and label the new attention relationship between users; likewise, the associated topic-word text content is selected for fine-grained topic c_k to explain and label the new attention relationship between users.

Evaluation of technical Effect

To verify the effectiveness and advancement of the proposed technical solution, several existing translation model methods were selected for comparison: TransE, TransD, TransH and TransR. The attention relationship link prediction results of each method on a microblog social network dataset were evaluated by mean rank (MeanRank) and Hits@K, and the results are shown in Table 1:

TABLE 1

The results in the table show that the technical scheme of the invention can obtain the detection result with better precision and reliability than the existing method when the link prediction of the social network is carried out.
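For readers unfamiliar with the two metrics, the following Python sketch shows how MeanRank and Hits@K are conventionally computed from the rank of the true followed user among all candidates sorted by distance. The distances and ranks below are hypothetical values chosen only to exercise the code; they are not results from the patent.

```python
def rank_of_true(distances, true_user):
    """1-based rank of the true user when candidates are sorted by distance."""
    true_d = distances[true_user]
    return 1 + sum(1 for d in distances.values() if d < true_d)

def mean_rank(ranks):
    """Average rank of the true user over all test triples (lower is better)."""
    return sum(ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of test triples whose true user is ranked in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)

# Hypothetical candidate distances for one test triple.
candidate_distances = {"a": 0.9, "b": 0.2, "c": 0.5}
example_rank = rank_of_true(candidate_distances, "c")

# Hypothetical ranks for four test triples.
ranks = [1, 3, 12, 2]
print(example_rank, mean_rank(ranks), hits_at_k(ranks, 10))
```

A lower MeanRank and a higher Hits@K both indicate that the true followed user tends to be ranked near the top of the candidate list.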

Those skilled in the art will appreciate that the invention may be practiced without these specific details. Although illustrative embodiments of the invention have been described above to help those skilled in the art understand it, the invention is not limited to the scope of these embodiments. Various changes that are apparent to those skilled in the art fall within the spirit and scope of the invention as defined by the appended claims, and all inventions that make use of the inventive concept are protected.

Claims

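1. An attention relationship identification and labeling method based on a hierarchical topic preference semantic matrix, characterized by comprising the following steps: step S1, constructing an attention relationship network graph initialized by a topic preference semantic matrix; step S2, learning a hierarchical topic preference semantic matrix based on the translation model; step S3, labeling the network node attention relationships through the preference semantic matrix; step S4, interpreting the hierarchical topic preference semantic matrix based on user text content; step S5, calculating the relevance of the new user and other user nodes under the hierarchical topics; step S6, selecting the hierarchical topic with the maximum relevance to label the attention relationship between users; step S7, labeling the hierarchical topic preference semantic matrix of the attention relationship; step S8, interpreting the preference semantic matrix according to the user text content.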

2. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein: in the step S1, the step of constructing the interest relationship network graph initialized by the theme preference semantic matrix further includes the following steps:

3. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein: in step S2, learning the hierarchical topic preference semantic matrix based on the translation model further includes the following steps:


4. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein: in the step S3, labeling the network node attention relationship by the preference semantic matrix further includes the following steps:

step S3.1, interpreting the attention relationship between users with the preference semantic matrix of coarse-grained topic c learned in step S2.3;

step S3.2, interpreting the attention relationship between users with the preference semantic matrix of fine-grained topic c_k of coarse-grained topic c learned in step S2.3.

5. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein: in step S4, the hierarchical topic preference semantic matrix is interpreted based on user text content as follows: based on the word segmentation result of the text content published by the users, the attention relationship between users is explained and labeled with the topic-word text content associated with the coarse-grained topic c selected in step S3.1; likewise, the attention relationship between users is explained and labeled with the topic-word text content associated with the fine-grained topic c_k selected in step S3.2.

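6. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein: in step S5, the relevance of the new user and other user nodes under the hierarchical topics is calculated as follows: on the basis of the preference semantic matrix of coarse-grained topic c, the similarity between the vector representation of the new user u and the vector representation of an existing user t is computed, using the distance between user u and user t on the attention relationship r in the aspect of coarse-grained topic c; then, on the basis of the preference semantic matrix of fine-grained topic c_k of coarse-grained topic c, the similarity between the new user u and the existing user t is computed, using the distance between user u and user t on the attention relationship r in the aspect of fine-grained topic c_k.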

7. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein: in the step S6, the method for selecting the hierarchical topic with the maximum relevance to label the attention relationship among the users is as follows: the relevance of the concern relationship between the new user u and the user t can be calculated according to different topics, the concern relationship between the users is marked by selecting the coarse-grained topic c with the maximum relevance and the minimum distance from the coarse-grained topic set with the relevance, and meanwhile, the fine-grained topic c with the highest relevance is selected on the basis of the coarse-grained topic c_kAnd marking attention relations among users.

8. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein: in step S7, the hierarchical topic preference semantic matrix of the attention relationship is labeled as follows: the preference semantic matrix of the coarse-grained topic c with the maximum relevance in step S6 is taken as the coarse-grained topic preference semantic matrix of the new user's attention relationship, and the preference semantic matrix of the fine-grained topic c_k with the maximum relevance under coarse-grained topic c is taken as the fine-grained topic preference semantic matrix of the new user's attention relationship.

9. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein: in step S8, the preference semantic matrix is interpreted according to the text content of the users as follows: based on the word segmentation result of the content published by the users, the associated topic-word text content is selected for coarse-grained topic c to explain and label the new attention relationship between users; likewise, the associated topic-word text content is selected for fine-grained topic c_k to explain and label the new attention relationship between users.