CN111723301A - Attention relation identification and labeling method based on hierarchical theme preference semantic matrix - Google Patents



Publication number
CN111723301A
Authority
CN
China
Prior art keywords
user
grained
attention
subject
preference
Legal status
Granted
Application number
CN202010483759.5A
Other languages
Chinese (zh)
Other versions
CN111723301B (en)
Inventor
郑建兴
李沁文
李德玉
梁吉业
Current Assignee
Shanxi University
Original Assignee
Shanxi University
Application filed by Shanxi University
Priority to CN202010483759.5A
Publication of CN111723301A
Application granted
Publication of CN111723301B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of interpretable link prediction for social networks and discloses an attention relationship identification and labeling method based on a hierarchical theme preference semantic matrix. First, for a social network's user attention relationship graph and the users' text content, a hierarchical topic preference semantic matrix is constructed and learned; the attention relationships between network nodes are labeled with hierarchical topics and their preference semantic matrices, and the preference semantic matrices are interpreted through the users' text content. Then, the relevance of the attention relationship between a new user and other users is calculated with the hierarchical topic preference semantic matrix; users with high relevance are identified as having an attention relationship with the new user, that relationship is labeled with the hierarchical topic and its preference semantic matrix, and the preference semantic matrix is interpreted according to the users' text content. By predicting user attention relationships at the level of hierarchical topics, the method improves the accuracy of identifying attention relationships among social network users and is particularly useful for identifying cross-regional social attention relationships within online fraud groups.

Description

Attention relation identification and labeling method based on hierarchical theme preference semantic matrix
Technical Field
The invention relates to the technical field of interpretable link prediction for social networks, and in particular to an attention relationship identification and labeling method based on a hierarchical theme preference semantic matrix.
Background
In a social network, the attention (following) relationships between users form an attention network. Users may be interested in different types of topics and in topics at different levels of granularity. By exploring the implicit semantic connections behind attention relationships under these different types and levels of topics, the interest-driven motivations behind users' attention relationships can be mined in depth, which makes it easier to discover users that a given user is likely to follow and to produce more reliable explanations for attention relationship predictions. If one user follows another, the follower may forward or like the followee's microblog posts. By analyzing the microblog content published by the two users, semantic similarity on certain latent shared interests can be found and used to predict the attention behavior between them. Moreover, user interests are hierarchical: two users who establish an attention relationship on the topic "CBA" are more semantically interpretable than two users who establish one on the broader topic "basketball". Extracting fine-grained, interpretable reasons for attention relationships from the latent interest topics shared by users can improve the link prediction performance of recommendation systems.
Attention relationship link prediction in social networks generally relies on the network structure: labeling techniques based on network structure analyze the link influence between users but ignore the rich interest information in users' microblog content, while attention relationship techniques based on user behavior records focus on labeling with users' explicit topic keywords. On the other hand, the interests of social network users are diverse and multi-level. For example, the microblog content published by a follower may contain sports keywords, while the content published by the followee contains subject terms such as basketball and CBA; the attention behavior between the two users can then be extracted and interpreted through the latent semantic relationships among subject terms such as sports, basketball and CBA. It is therefore necessary to mine the latent semantic interest motivations between users from their microblog content, learn the attention relationships between users on a hierarchical interest topic preference semantic matrix, predict user attention relationships at the level of fine-grained interest topics, and thereby achieve interpretable and accurate prediction of user attention relationship links.
Disclosure of Invention
To address these problems, the invention provides an attention relationship identification and labeling method based on a hierarchical theme preference semantic matrix.
To achieve this objective, the invention adopts the following technical solution:
An attention relationship identification and labeling method based on a hierarchical theme preference semantic matrix comprises the following steps:
Step S1, constructing an attention relationship network graph initialized with topic preference semantic matrices;
Step S2, learning the hierarchical topic preference semantic matrices based on a translation model;
Step S3, labeling the attention relationships between network nodes with the preference semantic matrices;
Step S4, interpreting the hierarchical topic preference semantic matrices based on user text content;
Step S5, calculating the relevance between a new user and other user nodes under the hierarchical topics;
Step S6, selecting the hierarchical topic with the maximum relevance to label the attention relationship between users;
Step S7, labeling the attention relationship with its hierarchical topic preference semantic matrix;
Step S8, interpreting the preference semantic matrix according to the users' text content.
Further, step S1, constructing the attention relationship network graph initialized with topic preference semantic matrices, comprises the following steps:
Step S1.1, establishing an attention relationship graph G(V, R) from the users' attention relationships, where V is the set of nodes and R is the set of edges. Nodes represent users and edges represent attention relationships r between users: if user h follows user t, a directed edge from h to t is added, where h and t both denote users. The graph captures the explicit social attention relationships among users and supports the prediction of latent attention relationships among them;
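Step S1.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `build_attention_graph` and the relation label `"follows"` are hypothetical, and the graph is stored as a node set plus a set of directed (h, r, t) triples, which is the form step S2.1 works with.

```python
from collections import defaultdict

RELATION = "follows"  # hypothetical label for the attention relationship r


def build_attention_graph(follow_pairs):
    """Build G(V, R) from (h, t) pairs meaning "user h follows user t".

    Returns the node set V, the edge set R as directed (h, r, t) triples,
    and a map from each user to the set of users they follow.
    """
    V = set()
    R = set()
    followees = defaultdict(set)
    for h, t in follow_pairs:
        V.update((h, t))
        R.add((h, RELATION, t))  # directed edge from h to t
        followees[h].add(t)
    return V, R, dict(followees)


V, R, followees = build_attention_graph(
    [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
)
print(sorted(V))                   # ['alice', 'bob', 'carol']
print(sorted(followees["alice"]))  # ['bob', 'carol']
```

Storing edges as triples rather than bare pairs keeps the relation r explicit, which matches the (h, r, t) notation used in the translation model.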
Step S1.2, extracting a keyword set $S_h$ from the text content published by user h and a keyword set $S_t$ from the text content published by user t; using $S_h$ and $S_t$, computing their Jaccard similarity coefficients with each of the N coarse-grained topics at layer l of the Chinese Wikipedia hierarchy, selecting the m coarse-grained topics with high similarity to both user h and user t to explain the attention relationship between them, and initializing the preference semantic matrices $M_c$ of these m coarse-grained topics. Building on the layer-l coarse-grained topics, computing the Jaccard similarity coefficients of $S_h$ and $S_t$ with each of the P fine-grained topics at layer l+1 of the Chinese Wikipedia hierarchy, selecting the q fine-grained topics with high similarity to both users to explain their attention relationship, and initializing the preference semantic matrices of these q fine-grained topics. The set of fine-grained topics of a coarse-grained topic c is denoted $child(c)=\{c_1,c_2,\ldots,c_k,\ldots,c_b\}$, and the preference semantic matrix of fine-grained topic $c_k$ is denoted $M_{c_k}$. The initialized topic preference semantic matrices reflect the association between the text content of user h and user t, which helps reveal the reasons for the attention relationship between them.
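The topic selection in step S1.2 can be sketched with the Jaccard coefficient |A ∩ B| / |A ∪ B|. This is a minimal sketch under assumptions: the topic keyword sets stand in for the contents of Chinese Wikipedia categories, the names `jaccard` and `select_topics` are hypothetical, and the rule that combines the two users' scores (taking the minimum, so that a topic ranks high only if it matches both users) is an assumption, since the text only says topics with "high similarity between user h and user t" are selected. Matrix initialization is not shown because the text does not specify it.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two keyword collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_topics(S_h, S_t, topic_keywords, m):
    """Return the m topics most similar to BOTH users, plus all scores.

    Assumption: a topic's score for the pair (h, t) is the minimum of the
    two users' Jaccard coefficients with that topic's keyword set.
    """
    scores = {
        topic: min(jaccard(S_h, words), jaccard(S_t, words))
        for topic, words in topic_keywords.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda topic: (-scores[topic], topic))
    return ranked[:m], scores


S_h = {"basketball", "NBA", "game"}
S_t = {"basketball", "CBA"}
topics = {  # hypothetical keyword sets for two layer-l topics
    "sports": {"basketball", "football", "CBA", "NBA"},
    "music": {"song", "concert"},
}
top, scores = select_topics(S_h, S_t, topics, m=1)
print(top, scores["sports"])  # ['sports'] 0.4
```

The same function can be reused for the fine-grained topics at layer l+1, passing their keyword sets and q instead of m.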
Further, step S2, learning the hierarchical topic preference semantic matrices based on a translation model, comprises the following steps:
Step S2.1, building a triple (h, r, t) from user h, user t and the attention relationship between them on the attention relationship graph, where r is the attention relationship from user h to user t. Then, at the coarse-grained topic level, the representations of users and relationships are modeled with a translation model: given a coarse-grained topic c with preference semantic matrix $M_c$, the vector representations of user h and user t in aspect c are denoted $\mathbf{h}_c$ and $\mathbf{t}_c$, where $\mathbf{h}$ and $\mathbf{t}$ are the vector representations of user h and user t, respectively. Mapping $\mathbf{h}$ and $\mathbf{t}$ through $M_c$ into the relation space of coarse-grained topic c yields $\mathbf{h}_c$ and $\mathbf{t}_c$, and the attention relationship in that space is represented by the vector $\mathbf{r}_c$. A user's vector in aspect c represents that user's semantic interest projected onto this topic, so the attention relationship between users can be explained by their semantic interest in coarse-grained topic c;
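The projection in step S2.1 resembles TransR-style translation models, in which a relation-specific matrix maps entity vectors into a relation space. The sketch below assumes that reading: it treats the preference semantic matrix of topic c (written `M_c` here) as a d_r × d matrix applied on the left, so that h_c = M_c h. The multiplication order and the dimensions are assumptions, because the original formula is given only as an image; `mat_vec` and `project_pair` are hypothetical helper names, and plain lists keep the sketch dependency-free.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    if any(len(row) != len(v) for row in M):
        raise ValueError("matrix column count must equal vector length")
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def project_pair(M_c, h, t):
    """Project users h and t into the relation space of coarse-grained topic c.

    Assumed form: h_c = M_c h and t_c = M_c t.
    """
    return mat_vec(M_c, h), mat_vec(M_c, t)


# A 2 x 3 matrix maps 3-dimensional user vectors into a 2-dimensional topic space.
M_c = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 1.0]]
h_c, t_c = project_pair(M_c, [1.0, 2.0, 3.0], [0.0, 1.0, -1.0])
print(h_c, t_c)  # [1.0, 5.0] [0.0, 0.0]
```

Because each topic has its own matrix, the same pair of user vectors can be close in one topic's space and far apart in another's, which is what lets a topic explain a particular attention relationship.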
Step S2.2, building on the explanation of the attention relationship by coarse-grained topic c, the fine-grained topics $child(c)=\{c_1,c_2,\ldots,c_k,\ldots,c_b\}$ of c are used to describe the attention relationship between users in further detail. Let $c_k$ be the k-th fine-grained topic of coarse-grained topic c; for example, when c is basketball, $c_k$ may be CBA, and $M_{c_k}$ denotes the topic preference semantic matrix of $c_k$ under c. On the basis of the learned matrix $M_{c_k}$, the vector representations of user h and user t in fine-grained topic $c_k$ are $\mathbf{h}_{c_k}$ and $\mathbf{t}_{c_k}$, and the attention relationship distance function of user h and user t on fine-grained topic $c_k$ of coarse-grained topic c is:

$$f_r^{c_k}(h,t)=\left\lVert \mathbf{h}_{c_k}+\mathbf{r}_{c_k}-\mathbf{t}_{c_k}\right\rVert$$

where $f_r^{c_k}(h,t)$ is the distance between user h and user t on attention relationship r in aspect $c_k$; $\mathbf{h}_{c_k}$ and $\mathbf{t}_{c_k}$ are the vector representations of user h and user t in aspect $c_k$; and $\mathbf{r}_{c_k}$ is the representation of the attention relationship r between user h and user t in the relation space of $c_k$. The attention relationship distance between user h and user t on fine-grained topic $c_k$ reflects the difference in their semantic interests: the larger the distance, the greater the difference in interests; the smaller the distance, the more similar their interests, which explains the attention relationship between them;
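The distance function of step S2.2 follows the usual translation-model score, written here as ||h_ck + r_ck - t_ck||: translating user h's topic vector by the relation vector should land near user t's topic vector when h follows t. The sketch below is an illustration under assumptions: it uses the L2 norm by default and offers L1 as an option, since the norm is not stated in the text, and `translation_distance` is a hypothetical name.

```python
import math


def translation_distance(h_ck, r_ck, t_ck, norm=2):
    """Translation-style distance ||h_ck + r_ck - t_ck|| in the space of topic c_k.

    A small value means that translating h by r lands close to t, i.e. the
    two users' interests in fine-grained topic c_k are similar.
    """
    if not (len(h_ck) == len(r_ck) == len(t_ck)):
        raise ValueError("vectors must have the same dimension")
    diff = [h + r - t for h, r, t in zip(h_ck, r_ck, t_ck)]
    if norm == 1:
        return sum(abs(x) for x in diff)
    if norm == 2:
        return math.sqrt(sum(x * x for x in diff))
    raise ValueError("norm must be 1 or 2")


print(translation_distance([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))  # 0.0
print(translation_distance([0.0, 0.0], [3.0, 4.0], [0.0, 0.0]))  # 5.0
```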
Step S2.3, using the distance function of triples (h, r, t), a hinge-loss objective based on the translation relationship is defined over positive triples (h, r, t), which have a real attention relationship r, and negative triples (h', r, t'), which do not have attention relationship r. The objective sums the hinge term $[\gamma+f_r^{c_k}(h,t)-f_r^{c_k}(h',t')]_+$ over the positive set S and the negative set S', and adds orthogonality regularization terms on the coarse-grained topic matrices (weighted by $\lambda$) and on the fine-grained topic matrices (weighted by $\eta$). In the objective: c and t are two coarse-grained topics at layer l (here t denotes a topic, not user t); $c_i$ and $c_j$ are fine-grained topics of coarse-grained topic c; $M_c$ and $M_t$ are the preference semantic matrices of coarse-grained topics c and t; $M_{c_i}$ and $M_{c_j}$ are the preference semantic matrices of fine-grained topics $c_i$ and $c_j$; I is the identity matrix; $[x]_+=\max(0,x)$ is the hinge loss function; S is the set of positive triples (h, r, t), in which user h has attention relationship r with user t; S' is the set of negative triples, in which h' is a user substituted for h in (h, r, t) such that h' has no attention relationship r with t, and t' is a user substituted for t such that h has no attention relationship r with t'; $\gamma$ is the margin parameter; $\lambda$ is the regularization hyperparameter of the coarse-grained topics; and $\eta$ is the regularization hyperparameter of the fine-grained topics. The orthogonality of $M_c$ and $M_t$ ensures that coarse-grained topics c and t learn different parameter matrices, and the orthogonality of $M_{c_i}$ and $M_{c_j}$ ensures that fine-grained topics $c_i$ and $c_j$ learn different parameter matrices. Minimizing the hinge loss pushes the distance of each positive pair below that of the corresponding negative pair by at least the margin $\gamma$. The users' distributed representations are thus learned iteratively from the perspective of fine-grained topic semantic interests, which supports predicting attention relationships between users at the fine-grained topic level and explaining them by fine-grained topic.
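The training signal of step S2.3 can be sketched as below: a negative triple is built by replacing either the head or the tail of a positive triple with a random user, and the hinge term [γ + f(h, t) - f(h', t')]_+ is summed over positive/negative pairs of distances. This is an illustrative sketch, not the patent's training procedure: the function names are hypothetical, the 50/50 head-or-tail choice is an assumption, and the orthogonality regularization terms weighted by λ and η are omitted because their exact form is given only in the original formula image.

```python
import random


def corrupt_triple(triple, users, positives, rng, max_tries=1000):
    """Replace the head or the tail of (h, r, t) with a random user.

    The candidate is rejected if it is a known positive triple or a
    self-loop, so h' has no relation r with t (or h has none with t').
    """
    h, r, t = triple
    for _ in range(max_tries):
        u = rng.choice(users)
        candidate = (u, r, t) if rng.random() < 0.5 else (h, r, u)
        if candidate not in positives and candidate[0] != candidate[2]:
            return candidate
    raise ValueError("could not sample a negative triple")


def hinge_loss(pos_distances, neg_distances, gamma):
    """Sum of [gamma + f(pos) - f(neg)]_+ over paired distances."""
    return sum(max(0.0, gamma + dp - dn)
               for dp, dn in zip(pos_distances, neg_distances))


positives = {("a", "follows", "b")}
rng = random.Random(0)
negative = corrupt_triple(("a", "follows", "b"), ["a", "b", "c"], positives, rng)
print(negative)
print(hinge_loss([0.5, 2.0], [1.0, 1.0], gamma=1.0))  # 2.5
```

A pair contributes zero loss once its negative distance exceeds its positive distance by at least γ, so training effort concentrates on pairs that are not yet separated by the margin.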
Further, step S3, labeling the attention relationships between network nodes with the preference semantic matrices, comprises the following steps:
Step S3.1, interpreting the attention relationship between users with the preference semantic matrix $M_c$ of coarse-grained topic c learned in step S2.3;
Step S3.2, interpreting the attention relationship between users with the preference semantic matrix $M_{c_k}$ of fine-grained topic $c_k$ of coarse-grained topic c learned in step S2.3. Explaining the reasons for the attention relationships among user nodes from both the coarse-grained and the fine-grained topic perspective helps recommend products or users related to each user's topic interests accurately.
Further, in step S4, the hierarchical topic preference semantic matrices are interpreted based on user text content as follows:
Based on the word segmentation results of the text content published by the users, the attention relationship between users is explained and labeled with the subject-word text associated with the coarse-grained topic c selected in step S3.1; at the same time, it is explained and labeled with the subject-word text associated with the fine-grained topic $c_k$ selected in step S3.2. Interpreting the topic preference semantic matrices through text associated with coarse-grained and fine-grained topics makes it possible to identify interest-driven behavioral relationships among social network users and to monitor changes in their attention relationships as their text content changes, which is important for identifying cross-regional social relationships within online groups.
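Steps S4 and S8 explain a topic through the subject words that appear in the users' segmented text. A minimal sketch, assuming the text has already been segmented into tokens (for example by a Chinese word segmenter) and that each topic comes with a hypothetical set of subject words: it keeps the topic's words that occur in both users' text and ranks them by their combined frequency. The name `explain_topic` and the ranking rule are illustrative assumptions.

```python
from collections import Counter


def explain_topic(tokens_h, tokens_t, topic_words, top_n=3):
    """Return up to top_n subject words of a topic shared by users h and t."""
    shared = set(tokens_h) & set(tokens_t) & set(topic_words)
    counts = Counter(w for w in list(tokens_h) + list(tokens_t) if w in shared)
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:top_n]


tokens_h = ["CBA", "game", "CBA", "coffee"]
tokens_t = ["CBA", "basketball", "game"]
result = explain_topic(tokens_h, tokens_t, {"CBA", "basketball", "game"})
print(result)  # ['CBA', 'game']
```

"basketball" is a subject word of the topic but is left out because only user t uses it; requiring the word in both users' text keeps the explanation tied to the relationship rather than to one user.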
Further, in step S5, the relevance between a new user and other user nodes under the hierarchical topics is calculated as follows:
Using the preference semantic matrix $M_c$ of coarse-grained topic c, the similarity between the vector representation $\mathbf{u}$ of a new user u and the vector representation $\mathbf{t}$ of an existing user t is computed; $f_r^{c}(u,t)$ denotes the distance between user u and user t on attention relationship r in aspect c. Then, using the preference semantic matrix $M_{c_k}$ of fine-grained topic $c_k$ of c, the similarity between the new user u and the existing user t is computed, and $f_r^{c_k}(u,t)$ denotes their distance on attention relationship r in aspect $c_k$. The smaller the distance, the more likely it is that user u follows user t, i.e., that user u has an attention relationship with user t. Coarse-grained topic c and fine-grained topic $c_k$ thus predict the attention relationship between users from different aspects.
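Step S5 can be illustrated by scoring one candidate followee t for a new user u under several topics. The sketch assumes the same translation-style form as step S2.2, applied in each topic's space, i.e. ||M u + r - M t|| with a left-multiplied topic matrix M; each topic is given as a hypothetical (matrix, relation vector) pair, and `topic_distances` is a hypothetical name. Fine-grained topics can be scored the same way with their own matrices.

```python
import math


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def topic_distances(u, t, topics):
    """Distance between new user u and existing user t under each topic.

    topics maps a topic name to (M, r): its preference semantic matrix and
    the attention-relation vector in that topic's space (assumed form).
    Smaller distance means higher relevance of "u follows t" for that topic.
    """
    result = {}
    for name, (M, r) in topics.items():
        u_c, t_c = mat_vec(M, u), mat_vec(M, t)
        diff = [a + b - c for a, b, c in zip(u_c, r, t_c)]
        result[name] = math.sqrt(sum(x * x for x in diff))
    return result


identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
topics = {"sports": (identity, [1.0, 0.0]), "music": (swap, [3.0, 5.0])}
distances = topic_distances([0.0, 1.0], [1.0, 1.0], topics)
print(distances)  # {'sports': 0.0, 'music': 5.0}
```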
Further, in step S6, the hierarchical topic with the maximum relevance is selected to label the attention relationship between users as follows:
The relevance of the attention relationship between the new user u and user t can be computed for each topic. From the set of coarse-grained topics for which relevance has been computed, the coarse-grained topic c with the maximum relevance, i.e., the minimum distance, is selected to label the attention relationship between the users; then, within that coarse-grained topic, the fine-grained topic $c_k$ with the highest relevance is selected to label it as well. For a new user, this explains the attention behavior toward user t in terms of both coarse-grained and fine-grained interest topics and improves interpretability when recommending users to follow.
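The two-level choice of step S6 can be sketched as follows: pick the coarse-grained topic with the smallest distance (highest relevance), then search only that topic's children child(c) for the closest fine-grained topic. The distance dictionaries would come from a computation like step S5; the function name and the alphabetical tie-breaking are hypothetical choices for the illustration.

```python
def select_hierarchical_topic(coarse_dist, fine_dist, children):
    """Return (c, c_k): the closest coarse topic and its closest fine topic.

    coarse_dist: {coarse topic: distance}; fine_dist: {fine topic: distance};
    children: {coarse topic: list of its fine-grained topics, i.e. child(c)}.
    Ties are broken alphabetically so the result is deterministic.
    """
    c = min(coarse_dist, key=lambda topic: (coarse_dist[topic], topic))
    candidates = [k for k in children.get(c, []) if k in fine_dist]
    if not candidates:
        return c, None  # no scored fine-grained topic under c
    c_k = min(candidates, key=lambda topic: (fine_dist[topic], topic))
    return c, c_k


coarse = {"sports": 0.3, "music": 0.9}
fine = {"basketball": 0.2, "football": 0.5, "pop": 0.1}
children = {"sports": ["basketball", "football"], "music": ["pop"]}
choice = select_hierarchical_topic(coarse, fine, children)
print(choice)  # ('sports', 'basketball')
```

"pop" has the smallest fine-grained distance overall but is not chosen, because it is not a child of the selected coarse-grained topic; the hierarchy constrains the fine-grained label.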
Further, in step S7, the hierarchical topic preference semantic matrix of the attention relationship is labeled as follows:
The preference semantic matrix $M_c$ of the coarse-grained topic c with the maximum relevance in step S6 is taken as the coarse-grained topic preference semantic matrix of the new user's attention relationship; within c, the preference semantic matrix $M_{c_k}$ of the fine-grained topic $c_k$ with the maximum relevance is taken as the fine-grained topic preference semantic matrix of the new user's attention relationship. For a new user, the method reveals the degree of preference for the interest topics shared with user t at both coarse and fine granularity, which gives the reasons for the attention relationship and explains why the new user follows different users in different topic areas.
Further, in step S8, the preference semantic matrix is interpreted according to the users' text content as follows:
Based on the word segmentation results of the content published by the users, related subject-word text is selected for coarse-grained topic c to explain and label the new attention relationship between users; at the same time, related subject-word text is selected for fine-grained topic $c_k$ to explain and label it. Text content reflects user behavior; explaining the topic preference semantic matrices through text content identifies the interest-driven attention relationships between users and gives new users an intuitive explanation of each attention relationship. This can improve new users' understanding of and trust in the recommendation system, and the text content also helps track social attention behavior between users in different regions of the social network.
The invention provides an attention relationship identification and labeling method based on a hierarchical theme preference semantic matrix. First, for an existing user social relationship graph and the users' text content, hierarchical topic preference semantic matrices are constructed based on a translation model, the attention relationships of network nodes are labeled with the preference semantic matrices, and the preference semantic matrices are interpreted through user text content. Then, the relevance of the attention relationship between a new user and other nodes is calculated with the hierarchical topic preference semantic matrices; users with high relevance are identified as having an attention relationship with the new user, the attention relationships between the new user and the nodes in the network are updated, the hierarchical topic preference semantic matrix of each attention relationship is labeled, and the preference semantic matrix is interpreted according to the users' text content. The labeled social network attention relationships are the final output of the method.
Compared with the prior art, the invention has the following advantages:
Unlike existing methods, the method of the invention constructs hierarchical topic preference semantic matrices based on a translation model, labels the attention relationships of network nodes with these matrices, calculates the relevance of attention relationships between a new user and other nodes from text content and the hierarchical topic preference semantic matrices, identifies users with high relevance as having an attention relationship, and interprets the preference semantic matrices according to the users' text content. By predicting user attention relationships at the level of hierarchical topics, the method improves the accuracy of identifying attention relationships in social networks and is particularly useful for identifying cross-regional relationships within online fraud groups.
Drawings
FIG. 1 is a schematic diagram of an overall model architecture.
Detailed Description
The method for identifying and labeling attention relationships based on the user hierarchical theme preference semantic matrix is implemented as a computer program. The specific implementation of the proposed technical solution is described below step by step; the overall model architecture is shown in Fig. 1.
An attention relationship identification and labeling method based on a hierarchical theme preference semantic matrix comprises the following steps:
Step S1, constructing an attention relationship network graph initialized with topic preference semantic matrices;
Step S1.1, establishing an attention relationship graph G(V, R) from the users' attention relationships, where V is the set of nodes and R is the set of edges. Nodes represent users and edges represent attention relationships r between users: if user h follows user t, a directed edge from h to t is added, where h and t both denote users. The graph captures the explicit social attention relationships among users and supports the prediction of latent attention relationships among them;
Step S1.2, extracting a keyword set $S_h$ from the text content published by user h and a keyword set $S_t$ from the text content published by user t; using $S_h$ and $S_t$, computing their Jaccard similarity coefficients with each of the N coarse-grained topics at layer l of the Chinese Wikipedia hierarchy, selecting the m coarse-grained topics with high similarity to both user h and user t to explain the attention relationship between them, and initializing the preference semantic matrices $M_c$ of these m coarse-grained topics. Building on the layer-l coarse-grained topics, computing the Jaccard similarity coefficients of $S_h$ and $S_t$ with each of the P fine-grained topics at layer l+1 of the Chinese Wikipedia hierarchy, selecting the q fine-grained topics with high similarity to both users to explain their attention relationship, and initializing the preference semantic matrices of these q fine-grained topics. The set of fine-grained topics of a coarse-grained topic c is denoted $child(c)=\{c_1,c_2,\ldots,c_k,\ldots,c_b\}$, and the preference semantic matrix of fine-grained topic $c_k$ is denoted $M_{c_k}$.
Step S2, learning a hierarchical theme preference semantic matrix based on the translation model;
Step S2.1, building a triple (h, r, t) from user h, user t and the attention relationship between them on the attention relationship graph, where r is the attention relationship from user h to user t. Then, at the coarse-grained topic level, the representations of users and relationships are modeled with a translation model: given a coarse-grained topic c with preference semantic matrix $M_c$, the vector representations of user h and user t in aspect c are denoted $\mathbf{h}_c$ and $\mathbf{t}_c$, where $\mathbf{h}$ and $\mathbf{t}$ are the vector representations of user h and user t, respectively. Mapping $\mathbf{h}$ and $\mathbf{t}$ through $M_c$ into the relation space of coarse-grained topic c yields $\mathbf{h}_c$ and $\mathbf{t}_c$, and the attention relationship in that space is represented by the vector $\mathbf{r}_c$.
Step S2.2, building on the explanation of the attention relationship by coarse-grained topic c, the fine-grained topics $child(c)=\{c_1,c_2,\ldots,c_k,\ldots,c_b\}$ of c are used to describe the attention relationship between users in further detail. Let $c_k$ be the k-th fine-grained topic of coarse-grained topic c; for example, when c is basketball, $c_k$ may be CBA, and $M_{c_k}$ denotes the topic preference semantic matrix of $c_k$ under c. On the basis of the learned matrix $M_{c_k}$, the vector representations of user h and user t in fine-grained topic $c_k$ are $\mathbf{h}_{c_k}$ and $\mathbf{t}_{c_k}$, and the attention relationship distance function of user h and user t on fine-grained topic $c_k$ of coarse-grained topic c is:

$$f_r^{c_k}(h,t)=\left\lVert \mathbf{h}_{c_k}+\mathbf{r}_{c_k}-\mathbf{t}_{c_k}\right\rVert$$

where $f_r^{c_k}(h,t)$ is the distance between user h and user t on attention relationship r in aspect $c_k$; $\mathbf{h}_{c_k}$ and $\mathbf{t}_{c_k}$ are the vector representations of user h and user t in aspect $c_k$; and $\mathbf{r}_{c_k}$ is the representation of the attention relationship r between user h and user t in the relation space of $c_k$;
Step S2.3, using the distance function of triples (h, r, t), a translation-based Hinge Loss objective function is defined over positive triples (h, r, t), in which the attention relationship r actually holds, and negative triples [Figure BDA00025181560700000917], in which it does not:
[Figure BDA0002518156070000101]
where c and t are coarse-grained topics at layer l (here t denotes a second topic, not user t); c_i and c_j are fine-grained topics of coarse-grained topic c; [Figure BDA0002518156070000102] and [Figure BDA0002518156070000103] are the preference semantic matrices of coarse-grained topics c and t; [Figure BDA0002518156070000104] are the preference semantic matrices of fine-grained topics c_i and c_j of coarse-grained topic c; [Figure BDA0002518156070000105] is the identity matrix;
[Figure BDA0002518156070000106] is the Hinge Loss function; S is the set of positive triples (h, r, t), in which user h has attention relationship r with user t; [Figure BDA0002518156070000107] is the set of negative triples; [Figure BDA0002518156070000108] is a user substituted for h in (h, r, t) such that [Figure BDA0002518156070000109] has no attention relationship r with t; [Figure BDA00025181560700001010] is a user substituted for t in (h, r, t) such that h has no attention relationship r with [Figure BDA00025181560700001011]; γ is the margin (boundary) parameter, λ is the regularization hyperparameter for coarse-grained topics, and η is the regularization hyperparameter for fine-grained topics;
the orthogonality constraint between [Figure BDA00025181560700001012] and [Figure BDA00025181560700001013] ensures that coarse-grained topics c and t learn different parameter matrices, and the orthogonality constraint between [Figure BDA00025181560700001014] and [Figure BDA00025181560700001015] ensures that fine-grained topics c_i and c_j learn different parameter matrices.
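A minimal sketch of the hinge (margin ranking) part of this objective, assuming negatives are drawn by replacing either the head or the tail user at random and that the distance function is supplied from elsewhere. The λ and η orthogonality regularizers are left out because their exact form is only in the formula image; the relation label `"follows"` and the toy distance table are hypothetical:

```python
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (head user h, relation r, tail user t)


def corrupt(triple: Triple, users: Sequence[str], positives: Set[Triple],
            rng: random.Random, max_tries: int = 100) -> Triple:
    """Sample a negative triple by replacing h (giving h') or t (giving t') with another user.

    Candidates that are known positive triples are rejected, so the replaced
    user has no observed attention relationship r with the other user.
    """
    h, r, t = triple
    for _ in range(max_tries):
        u = rng.choice(users)
        candidate = (u, r, t) if rng.random() < 0.5 else (h, r, u)
        if candidate not in positives:
            return candidate
    raise RuntimeError("could not sample a negative triple")


def hinge_loss(pairs: List[Tuple[Triple, Triple]],
               distance: Callable[[Triple], float], gamma: float) -> float:
    """Sum over (positive, negative) pairs of max(0, gamma + d(positive) - d(negative))."""
    return sum(max(0.0, gamma + distance(pos) - distance(neg)) for pos, neg in pairs)


# Toy distances standing in for the topic-level distance function.
toy_distance: Dict[Triple, float] = {
    ("a", "follows", "b"): 0.5,
    ("c", "follows", "b"): 2.0,   # beyond the margin: contributes 0
    ("a", "follows", "c"): 1.25,  # inside the margin: contributes 0.25
}
pairs = [
    (("a", "follows", "b"), ("c", "follows", "b")),
    (("a", "follows", "b"), ("a", "follows", "c")),
]
print(hinge_loss(pairs, toy_distance.__getitem__, gamma=1.0))  # 0.25

positives = {("a", "follows", "b")}
print(corrupt(("a", "follows", "b"), ["a", "b", "c"], positives, random.Random(0)))
```

A negative only adds to the loss while it sits within γ of its positive, which pushes true attention relationships to smaller distances than corrupted ones.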
Step S3, labeling the attention relationships between network nodes using the preference semantic matrices;
Step S3.1, interpreting the attention relationship between users with the preference semantic matrix [Figure BDA00025181560700001016] of coarse-grained topic c learned in step S2.3;
Step S3.2, interpreting the attention relationship between users with the preference semantic matrix [Figure BDA00025181560700001017] of fine-grained topic c_k of coarse-grained topic c learned in step S2.3.
Step S4, interpreting the hierarchical topic preference semantic matrices based on users' text content: based on the word segmentation of the text content posted by users, the attention relationship between users is explained and labeled with the topic words associated with the coarse-grained topic c selected in step S3.1; at the same time, it is explained and labeled with the topic words associated with the fine-grained topic c_k selected in step S3.2.
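How topic words are matched against users' posts is not spelled out. One simple reading, sketched under that assumption: after word segmentation (done beforehand by any Chinese word segmenter, not shown), keep the words of a topic's keyword set that occur in the posts of both users. The keyword set and token lists below are hypothetical:

```python
from typing import Iterable, List, Set


def explain_with_topic_words(tokens_h: Iterable[str], tokens_t: Iterable[str],
                             topic_words: Set[str]) -> List[str]:
    """Return, sorted, the topic words found in the segmented posts of both users h and t."""
    return sorted(set(tokens_h) & set(tokens_t) & topic_words)


# Hypothetical keyword set for coarse-grained topic "basketball" and already-segmented posts.
basketball_words = {"篮球", "CBA", "NBA", "扣篮"}
tokens_h = ["今天", "看", "CBA", "决赛", "篮球"]
tokens_t = ["篮球", "训练", "CBA"]
print(explain_with_topic_words(tokens_h, tokens_t, basketball_words))  # ['CBA', '篮球']
```

The same function applies unchanged to a fine-grained topic c_k, given that topic's own keyword set.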
Step S5, calculating the relevance between a new user and the other user nodes under the hierarchical topics: based on the preference semantic matrix [Figure BDA0002518156070000111] of coarse-grained topic c, the similarity between the vector representation [Figure BDA0002518156070000112] of new user u and the vector representation [Figure BDA0002518156070000113] of an existing user t is computed, where [Figure BDA0002518156070000114] denotes the distance between user u and user t on attention relationship r with respect to coarse-grained topic c. Then, based on the preference semantic matrix [Figure BDA0002518156070000115] of fine-grained topic c_k of coarse-grained topic c, the similarity between new user u and existing user t is computed, where [Figure BDA0002518156070000116] denotes the distance between user u and user t on attention relationship r with respect to fine-grained topic c_k.
Step S6, selecting the hierarchical topic with the highest relevance to label the attention relationship between users: the relevance of the attention relationship between new user u and user t can be calculated for each topic; from the set of coarse-grained topics, the coarse-grained topic c with the highest relevance, i.e. the smallest distance, is selected to label the attention relationship between the users, and then, within that coarse-grained topic, the fine-grained topic c_k with the highest relevance is selected to label the attention relationship.
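Once a distance has been computed for each topic in step S5 (for instance with a translation distance like the one in step S2.2), step S6 reduces to a top-down selection. A minimal sketch, assuming those distances are already available as dictionaries keyed by hypothetical topic names:

```python
from typing import Dict, Optional, Tuple


def select_hierarchical_topic(
    coarse_dist: Dict[str, float],
    fine_dist: Dict[str, Dict[str, float]],
) -> Tuple[str, Optional[str]]:
    """Pick the coarse-grained topic with the smallest distance, then the closest of its fine-grained topics.

    Smaller distance means higher relevance. Fine-grained topics of other
    coarse-grained topics are never considered, even if their distance is smaller.
    """
    if not coarse_dist:
        raise ValueError("no coarse-grained topics to choose from")
    best_coarse = min(coarse_dist, key=coarse_dist.__getitem__)
    children = fine_dist.get(best_coarse, {})
    best_fine = min(children, key=children.__getitem__) if children else None
    return best_coarse, best_fine


# Hypothetical distances between new user u and existing user t on relation r.
coarse = {"basketball": 0.3, "finance": 1.7, "music": 0.9}
fine = {
    "basketball": {"CBA": 0.2, "NBA": 0.6},
    "finance": {"stocks": 0.1},  # smaller, but its parent topic is not selected
}
print(select_hierarchical_topic(coarse, fine))  # ('basketball', 'CBA')
```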
Step S7, labeling the attention relationship with hierarchical topic preference semantic matrices: the preference semantic matrix [Figure BDA0002518156070000117] of the coarse-grained topic c with the highest relevance in step S6 is taken as the coarse-grained topic preference semantic matrix of the new user attention relationship; on the basis of coarse-grained topic c, the preference semantic matrix [Figure BDA0002518156070000118] of the fine-grained topic c_k with the highest relevance is taken as the fine-grained topic preference semantic matrix of the new user attention relationship.
Step S8, interpreting the preference semantic matrices according to users' text content: based on the word segmentation of the content posted by users, the associated topic words are selected for coarse-grained topic c to explain and label the new attention relationship between users; at the same time, the associated topic words are selected for fine-grained topic c_k to explain and label the new attention relationship.
Evaluation of technical effect
To verify the effectiveness and advancement of the proposed technical solution, several existing translation-model methods were selected for comparison: TransE, TransD, TransH and TransR. Link prediction of attention relationships on a microblog social network dataset was evaluated with mean rank (MeanRank) and Hits@K; the results are shown in Table 1:
Table 1
[Figure BDA0002518156070000121]
The results in Table 1 show that, for link prediction on the social network, the technical solution of the invention achieves higher accuracy and reliability than the existing methods it was compared with.
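Mean rank and Hits@K are standard link-prediction metrics. A minimal sketch of how they are usually computed, assuming the raw (unfiltered) setting and that a smaller distance means a more plausible attention relationship; the patent does not state its tie handling or filtering, and the ranks below are illustrative, not results from Table 1:

```python
from typing import Sequence


def rank_of_true(true_distance: float, candidate_distances: Sequence[float]) -> int:
    """Rank of the true triple among corrupted candidates (1 = best); smaller distance ranks higher.

    Raw setting: ties count in the true triple's favour and no candidates are filtered out.
    """
    return 1 + sum(1 for d in candidate_distances if d < true_distance)


def mean_rank(ranks: Sequence[int]) -> float:
    """Average rank of the true triples; lower is better."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of true triples ranked within the top k; higher is better."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


ranks = [1, 3, 12, 2]  # illustrative ranks only
print(rank_of_true(0.4, [0.9, 0.2, 1.5]))  # 2
print(mean_rank(ranks))                     # 4.5
print(hits_at_k(ranks, 3))                  # 0.75
```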
Those skilled in the art will appreciate that the invention may be practiced without these specific details. Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. Various changes that are apparent to those skilled in the art fall within the protection of the invention as long as they remain within its spirit and scope as defined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.

Claims (9)

1. An attention relationship identification and labeling method based on a hierarchical theme preference semantic matrix, characterized in that the method comprises the following steps:
step S1, constructing an attention relationship network graph initialized with topic preference semantic matrices;
step S2, learning hierarchical topic preference semantic matrices based on a translation model;
step S3, labeling the attention relationships between network nodes using the preference semantic matrices;
step S4, interpreting the hierarchical topic preference semantic matrices based on users' text content;
step S5, calculating the relevance between a new user and the other user nodes under the hierarchical topics;
step S6, selecting the hierarchical topic with the highest relevance to label the attention relationship between users;
step S7, labeling the attention relationship with hierarchical topic preference semantic matrices;
step S8, interpreting the preference semantic matrices according to users' text content.
2. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein step S1, constructing the attention relationship network graph initialized with topic preference semantic matrices, further comprises the following steps:
step S1.1, establishing an attention relationship graph G(V, R) from users' attention relationships, where V is the set of nodes and R is the set of edges; nodes represent users and edges represent attention relationships r between users: if user h pays attention to (follows) user t, a directed edge from user h to user t is constructed, where h and t both denote users; the attention relationship graph describes the explicit social attention relationships between users and helps predict potential attention relationships between them;
step S1.2, extracting a keyword set S_h from the text content posted by user h and a keyword set S_t from the text content posted by user t; computing the Jaccard similarity coefficients of S_h and S_t with each of the N coarse-grained topics at layer l of the Chinese Wikipedia hierarchy, selecting the m coarse-grained topics most similar to users h and t to explain the attention relationship between them, and initializing the preference semantic matrices of these m coarse-grained topics [Figure FDA0002518156060000011]; on the basis of the layer-l coarse-grained topics, computing the Jaccard similarity coefficients of S_h and S_t with each of the P fine-grained topics at layer l+1 of the Chinese Wikipedia hierarchy, selecting the q fine-grained topics most similar to users h and t to explain the attention relationship between them, and initializing the preference semantic matrices of these q fine-grained topics [Figure FDA0002518156060000021]; the set of fine-grained topics of coarse-grained topic c is denoted child(c) = {c_1, c_2, ..., c_k, ..., c_b}.
3. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein step S2, learning the hierarchical topic preference semantic matrices based on a translation model, further comprises the following steps:
step S2.1, constructing triples (h, r, t) on the attention relationship graph from user h, user t and the attention relationship between them, where r is the attention relationship from user h to user t; then modeling the representations of users and relationships with a translation model at the coarse-grained topic level: given coarse-grained topic c, the vector representation of user h with respect to c is [Figure FDA0002518156060000022] and that of user t is [Figure FDA0002518156060000023], where [Figure FDA0002518156060000024] are the vector representations of user h and user t, respectively; the vector representations [Figure FDA0002518156060000025] of users h and t are mapped into the relation space of coarse-grained topic c, giving their vector representations in that relation space: [Figure FDA0002518156060000026]
step S2.2, building on the interpretation of the attention relationship by coarse-grained topic c, the fine-grained topics of c, child(c) = {c_1, c_2, ..., c_k, ..., c_b}, are used to describe the attention relationship between users in finer detail; let c_k be the k-th fine-grained topic of coarse-grained topic c; for example, when c is basketball, c_k can denote CBA (the Chinese Basketball Association league); the topic preference semantic matrix of fine-grained topic c_k under coarse-grained topic c is denoted [Figure FDA0002518156060000027];
based on the learned topic preference semantic matrix [Figure FDA0002518156060000028] of fine-grained topic c_k, the vector representation of user h on fine-grained topic c_k is [Figure FDA0002518156060000029], and the vector representation of user t on fine-grained topic c_k is [Figure FDA00025181560600000210]; the attention relationship distance function of users h and t on fine-grained topic c_k of coarse-grained topic c is defined as:
[Figure FDA00025181560600000211]
where [Figure FDA00025181560600000212] is the distance function of users h and t on attention relationship r with respect to fine-grained topic c_k; [Figure FDA00025181560600000213] is the vector representation of user h with respect to fine-grained topic c_k; [Figure FDA00025181560600000214] is the vector representation of user t with respect to fine-grained topic c_k; and [Figure FDA0002518156060000031] is the relation-space representation of the attention relationship r between users;
step S2.3, using the distance function of triples (h, r, t), a translation-based Hinge Loss objective function is defined over positive triples (h, r, t), in which the attention relationship r actually holds, and negative triples [Figure FDA0002518156060000032], in which it does not:
[Figure FDA0002518156060000033]
where c and t are coarse-grained topics at layer l (here t denotes a second topic, not user t); c_i and c_j are fine-grained topics of coarse-grained topic c; [Figure FDA0002518156060000034] and [Figure FDA0002518156060000035] are the preference semantic matrices of coarse-grained topics c and t; [Figure FDA0002518156060000036] are the preference semantic matrices of fine-grained topics c_i and c_j of coarse-grained topic c; [Figure FDA0002518156060000037] is the identity matrix;
[Figure FDA0002518156060000038] is the Hinge Loss function; S is the set of positive triples (h, r, t), in which user h has attention relationship r with user t; [Figure FDA0002518156060000039] is the set of negative triples; [Figure FDA00025181560600000310] is a user substituted for h in (h, r, t) such that [Figure FDA00025181560600000311] has no attention relationship r with t; [Figure FDA00025181560600000312] is a user substituted for t in (h, r, t) such that h has no attention relationship r with [Figure FDA00025181560600000313]; γ is the margin (boundary) parameter, λ is the regularization hyperparameter for coarse-grained topics, and η is the regularization hyperparameter for fine-grained topics;
the orthogonality constraint between [Figure FDA00025181560600000314] and [Figure FDA00025181560600000315] ensures that coarse-grained topics c and t learn different parameter matrices, and the orthogonality constraint between [Figure FDA00025181560600000316] and [Figure FDA00025181560600000317] ensures that fine-grained topics c_i and c_j learn different parameter matrices.
4. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein step S3, labeling the attention relationships between network nodes using the preference semantic matrices, further comprises the following steps:
step S3.1, interpreting the attention relationship between users with the preference semantic matrix [Figure FDA00025181560600000318] of coarse-grained topic c learned in step S2.3;
step S3.2, interpreting the attention relationship between users with the preference semantic matrix [Figure FDA0002518156060000041] of fine-grained topic c_k of coarse-grained topic c learned in step S2.3.
5. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein step S4, interpreting the hierarchical topic preference semantic matrices based on users' text content, is performed as follows: based on the word segmentation of the text content posted by users, the attention relationship between users is explained and labeled with the topic words associated with the coarse-grained topic c selected in step S3.1; at the same time, it is explained and labeled with the topic words associated with the fine-grained topic c_k selected in step S3.2.
6. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein step S5, calculating the relevance between a new user and the other user nodes under the hierarchical topics, is performed as follows: based on the preference semantic matrix [Figure FDA0002518156060000042] of coarse-grained topic c, the similarity between the vector representation [Figure FDA0002518156060000043] of new user u and the vector representation [Figure FDA0002518156060000044] of an existing user t is computed, where [Figure FDA0002518156060000045] denotes the distance between user u and user t on attention relationship r with respect to coarse-grained topic c; then, based on the preference semantic matrix [Figure FDA0002518156060000046] of fine-grained topic c_k of coarse-grained topic c, the similarity between new user u and existing user t is computed, where [Figure FDA0002518156060000047] denotes the distance between user u and user t on attention relationship r with respect to fine-grained topic c_k.
7. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein step S6, selecting the hierarchical topic with the highest relevance to label the attention relationship between users, is performed as follows: the relevance of the attention relationship between new user u and user t can be calculated for each topic; from the set of coarse-grained topics, the coarse-grained topic c with the highest relevance, i.e. the smallest distance, is selected to label the attention relationship between the users, and then, within that coarse-grained topic, the fine-grained topic c_k with the highest relevance is selected to label the attention relationship.
8. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein step S7, labeling the attention relationship with hierarchical topic preference semantic matrices, is performed as follows: the preference semantic matrix [Figure FDA0002518156060000051] of the coarse-grained topic c with the highest relevance in step S6 is taken as the coarse-grained topic preference semantic matrix of the new user attention relationship; on the basis of coarse-grained topic c, the preference semantic matrix [Figure FDA0002518156060000052] of the fine-grained topic c_k with the highest relevance is taken as the fine-grained topic preference semantic matrix of the new user attention relationship.
9. The method for recognizing and labeling attention relationships based on the hierarchical topic preference semantic matrix as claimed in claim 1, wherein step S8, interpreting the preference semantic matrices according to users' text content, is performed as follows: based on the word segmentation of the content posted by users, the associated topic words are selected for coarse-grained topic c to explain and label the new attention relationship between users; at the same time, the associated topic words are selected for fine-grained topic c_k to explain and label the new attention relationship.
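Steps S1.1 and S1.2 of claim 2 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the relation label `"follows"`, the topic keyword sets (in the patent they come from Chinese Wikipedia topics at layers l and l+1) and the rule of adding the two users' Jaccard scores are all hypothetical:

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def build_attention_graph(follows: List[Tuple[str, str]]) -> Tuple[Set[str], Set[Triple]]:
    """Step S1.1: users become nodes V; each 'h follows t' pair becomes a directed triple (h, r, t)."""
    nodes: Set[str] = set()
    triples: Set[Triple] = set()
    for h, t in follows:
        nodes.update((h, t))
        triples.add((h, "follows", t))  # edge points from h to t; duplicates collapse
    return nodes, triples


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_topics(s_h: Set[str], s_t: Set[str], topics: Dict[str, Set[str]], m: int) -> List[str]:
    """Step S1.2 (assumed scoring): rank topics by J(S_h, T) + J(S_t, T), keep the best m, ties by name."""
    scores = {name: jaccard(s_h, kw) + jaccard(s_t, kw) for name, kw in topics.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))[:m]


follows = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("alice", "bob")]
nodes, triples = build_attention_graph(follows)
print(sorted(triples))

# Hypothetical layer-l topic keyword sets and user keyword sets.
topics = {"basketball": {"篮球", "CBA", "NBA"}, "finance": {"股票", "基金"}, "music": {"演唱会", "专辑"}}
s_h = {"CBA", "篮球", "决赛"}
s_t = {"篮球", "训练"}
print(top_topics(s_h, s_t, topics, 1))  # ['basketball']
```

The same `top_topics` call, run over the layer-(l+1) fine-grained topics with `q` in place of `m`, covers the second half of step S1.2.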

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010483759.5A CN111723301B (en) 2020-06-01 2020-06-01 Attention relation identification and labeling method based on hierarchical theme preference semantic matrix

Publications (2)

Publication Number Publication Date
CN111723301A true CN111723301A (en) 2020-09-29
CN111723301B CN111723301B (en) 2022-05-27

Family

ID=72565598

Country Status (1)

Country Link
CN (1) CN111723301B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077417A (en) * 2014-07-18 2014-10-01 中国科学院计算技术研究所 Figure tag recommendation method and system in social network
US20150293989A1 (en) * 2014-04-11 2015-10-15 Palo Alto Research Center Incorporated Computer-Implemented System And Method For Generating An Interest Profile For A User From Existing Online Profiles
CN108460153A (en) * 2018-03-27 2018-08-28 广西师范大学 A kind of social media friend recommendation method of mixing blog article and customer relationship
CN109033069A (en) * 2018-06-16 2018-12-18 天津大学 A kind of microblogging Topics Crawling method based on Social Media user's dynamic behaviour
CN109189936A (en) * 2018-08-13 2019-01-11 天津科技大学 A kind of label semanteme learning method measured based on network structure and semantic dependency
CN109325171A (en) * 2018-08-08 2019-02-12 微梦创科网络科技(中国)有限公司 User interest analysis method and system based on domain knowledge
US20190073410A1 (en) * 2017-09-05 2019-03-07 Estia, Inc. Text-based network data analysis and graph clustering
CN109766431A (en) * 2018-12-24 2019-05-17 同济大学 A kind of social networks short text recommended method based on meaning of a word topic model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
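FATTANE ZARRINKALAM et al.: "Mining user interests over active topics on social networks", INFORMATION PROCESSING & MANAGEMENT *
JIANXING ZHENG et al.: "Personalized recommendation based on hierarchical interest overlapping community", INFORMATION SCIENCES *
朱倩 (Zhu Qian): "Research on key technologies of fine-grained relation extraction for free text", China Doctoral Dissertations Full-text Database, Information Science and Technology *
郑建兴 (Zheng Jianxing): "Research on social user models and their application in recommender systems", China Doctoral Dissertations Full-text Database, Information Science and Technology *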

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807600A (en) * 2021-09-26 2021-12-17 河南工业职业技术学院 Link prediction method in dynamic social network
CN113807600B (en) * 2021-09-26 2023-07-25 河南工业职业技术学院 Link prediction method in dynamic social network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant