CN111026943A - User interest analysis method and system by using cooperative learning of multi-source social network - Google Patents

User interest analysis method and system by using cooperative learning of multi-source social network

Info

Publication number
CN111026943A
Authority
CN
China
Prior art keywords
source
information
user
weight matrix
source information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911101644.9A
Other languages
Chinese (zh)
Inventor
林俊宇
关惟俐
宋雪萌
甘甜
常晓军
聂礼强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN201911101644.9A priority Critical patent/CN111026943A/en
Publication of CN111026943A publication Critical patent/CN111026943A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a system for analyzing user interest by utilizing cooperative learning of a multi-source social network, wherein the method comprises the following steps: constructing a multi-source user information data set; for S information sources, S prediction models are defined for all social media users, and the S prediction models form a user interest prediction model; dividing the weight matrix corresponding to the S prediction models into a weight matrix representing the consistency of the multi-source information and a weight matrix representing the complementarity of the multi-source information; constructing a loss function for the weights and the confidence; endowing the loss function to the user interest prediction model, and performing iterative optimization; and inputting the social user multiple information sources to be analyzed into the optimized user interest prediction model, and outputting the prediction of the interest and hobby classification of the social user. The method makes full use of the multi-source information complementarity, and effectively improves the accuracy of the social user interest prediction.

Description

User interest analysis method and system by using cooperative learning of multi-source social network
Technical Field
The invention relates to the technical field of user interest analysis, in particular to a method and a system for analyzing user interest by utilizing cooperative learning of a multi-source social network.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the rapid growth of the internet industry, many social media sites have emerged, and people share their lives in these communities. In recent years, much research has addressed user behavior analysis across multiple social media platforms. The prior art mainly combines several social media sources and analyzes the consistency and confidence of the multi-source information to infer user interests. If the analysis relies only on the consistency and confidence of the multi-source information, the complementarity among the social media sources is ignored, intermediate association attributes are lost, the characteristic information of the social media user cannot be fully expressed, and accurate user interest analysis cannot be achieved.
Traditional multi-source social network models consider only the consistency and confidence of multi-source information when analyzing social users, and ignore the important fact that multiple social media sources are complementary. How to integrate complementarity, which has strong characterization power, into a social analysis model is a challenging problem. In addition, a social media user may use different user names on different platforms, and aligning such mismatched information is difficult.
Disclosure of Invention
In order to solve the above problems, the invention provides a method and a system for analyzing user interest by utilizing cooperative learning of a multi-source social network. The behavior of social media users is analyzed by combining multi-source information consistency, multi-source information complementarity and multi-source information confidence, so that the framework can infer the interests and hobbies of a user and thereby help a social media website to analyze and predict that user's behavior accurately.
In some embodiments, the following technical scheme is adopted:
a method for analyzing user interest by utilizing cooperative learning of a multi-source social network comprises the following steps:
constructing a multi-source user information data set;
for S information sources, S prediction models are defined for all social media users, and the S prediction models form a user interest prediction model;
dividing the weight matrix corresponding to the S prediction models into a weight matrix representing the consistency of the multi-source information and a weight matrix representing the complementarity of the multi-source information;
constructing a loss function related to the weight and the confidence coefficient according to the weight matrix representing the consistency of the multi-source information, the weight matrix representing the complementarity of the multi-source information and the weight of the confidence coefficient of the multi-source information;
endowing the user interest prediction model with the loss function, and performing iterative optimization until the whole loss function is completely converged;
and inputting the social user multiple information sources to be analyzed into the optimized user interest prediction model, and outputting the prediction of the interest and hobby classification of the social user.
In other embodiments, the following technical solutions are adopted:
a system for user interest analysis with collaborative learning for multi-source social networks, comprising:
means for collecting user information from a plurality of social media platforms, performing text preprocessing and effective text feature extraction on the user information, and constructing a multi-source user information data set from the extracted features;
means for dividing the weight matrices corresponding to the S prediction models into a weight matrix representing the consistency of the multi-source information and a weight matrix representing the complementarity of the multi-source information;
means for constructing a loss function with respect to weight and confidence based on a weight matrix representing consistency of the multi-source information, a weight matrix representing complementarity of the multi-source information, and a weight of confidence of the multi-source information;
means for assigning the loss function to the user interest prediction model and performing iterative optimization until the whole loss function has converged;
and means for inputting the multiple information sources of the social user to be analyzed into the optimized user interest prediction model and outputting the predicted interest and hobby classification of the social user.
In other embodiments, the following technical solutions are adopted:
a terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the user interest analysis method by utilizing the cooperative learning of the multi-source social network.
In other embodiments, the following technical solutions are adopted:
a computer-readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to execute the above-described method for user interest analysis using collaborative learning of a multi-source social network.
Compared with the prior art, the invention has the beneficial effects that:
Compared with early fusion, late fusion, MSNL, MvDA-VC, heterogeneous trees and breadth learning, the method achieves better precision and recall, improving precision from 0.181 to 0.205 and recall from 0.326 to 0.365. The method makes full use of the complementarity of multi-source information and effectively improves the accuracy of social user interest prediction. In addition, the user feature information generated by the method can be used for other prediction tasks, such as personal privacy disclosure detection and article recommendation, thereby improving social media analysis.
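Expressed as relative gains, and assuming that 0.181 and 0.326 are the figures of the strongest compared method (the text does not state which method they belong to), the reported improvement works out approximately as:

```latex
\[
\frac{0.205 - 0.181}{0.181} \approx 13.3\%,
\qquad
\frac{0.365 - 0.326}{0.326} \approx 12.0\%
\]
```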
Drawings
Fig. 1 is a flowchart of a method for analyzing user interest by using collaborative learning of a multi-source social network according to an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Interpretation of technical terms:
Multi-source information consistency: a user's information is consistent across multiple social network information sources; even on different social media platforms, the content a user shares remains closely related to that user.
Multi-source information complementarity: the information of a single user on different social networks does not necessarily overlap, so social information from different sources can supplement and complete one another, allowing user behavior to be analyzed more fully.
Multi-source information confidence: when the multiple social information sources of a user are analyzed, different sources represent the user's personal attributes to different degrees; the index measuring this proportion is the multi-source information confidence.
Loss function: in machine learning, a function that measures the error between the current model and the optimal model; the weights are updated and optimized by computing the derivative of the loss function with respect to each weight.
LDA model: a document topic generation model (latent Dirichlet allocation) that extracts the topics of a document from the relationship among words, topics and documents, and finally yields a low-dimensional feature vector representation for subsequent user behavior analysis.
Precision: the proportion of "correctly retrieved items" among all "actually retrieved items".
Recall: the proportion of "correctly retrieved items" among all "items that should be retrieved".
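The two indices above can be illustrated with a short Python sketch. The function name `precision_recall` and the example interest labels are hypothetical and serve only to show the arithmetic for a single user.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one user's predicted interests."""
    retrieved, relevant = set(retrieved), set(relevant)
    correct = len(retrieved & relevant)  # "correctly retrieved items"
    # Guard against empty sets so the ratios are always defined.
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four predicted interests, three true interests,
# two of which were predicted correctly.
predicted = ["music", "travel", "sports", "cooking"]
actual = ["music", "sports", "photography"]
p, r = precision_recall(predicted, actual)
print(f"precision={p:.3f} recall={r:.3f}")
```

Here precision is computed over the four predicted items and recall over the three true items, so the same two correct predictions give different values for the two indices.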
Example one
In one or more embodiments, a method for performing user behavior analysis on social media users using multi-source information consistency, complementarity and confidence is disclosed, referring to fig. 1, comprising the following steps:
S1: Collect user information from a plurality of social media platforms, perform text preprocessing on the user information, extract effective text features, and construct a multi-source user information data set.
The data set construction of step S1 further includes:
S11: Use a web crawler to crawl social user information from a plurality of social platforms (Quora, Twitter and Facebook), and collect and record the interest and hobby information of the social users according to the social media content they publish.
S12: Use an LDA (latent Dirichlet allocation) model to extract a low-dimensional feature vector representation of each user, obtaining 89-, 24- and 119-dimensional topic-level user features for the three social media platforms respectively, and store these features to form a multi-source social media user data set.
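As an illustration of what one record of such a data set might look like, the following Python sketch stores one topic-level vector per platform and checks the dimensions given in S12. The class `UserRecord`, the helper names, the random placeholder vectors and the pairing of each dimension with a platform (taken from the order in which S11 lists the platforms) are assumptions; real vectors would come from the LDA model.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

# Topic dimensions from S12. Which dimension belongs to which platform is an
# assumption based on the order in which S11 names the platforms.
TOPIC_DIMS: Dict[str, int] = {"Quora": 89, "Twitter": 24, "Facebook": 119}


@dataclass
class UserRecord:
    """One user of the multi-source data set (hypothetical layout)."""
    user_id: str
    features: Dict[str, List[float]]  # source name -> topic distribution
    interests: List[str]              # recorded interest labels


def random_topic_vector(dim: int, rng: random.Random) -> List[float]:
    """Placeholder for an LDA topic distribution: non-negative, sums to 1."""
    raw = [rng.random() for _ in range(dim)]
    total = sum(raw)
    return [value / total for value in raw]


def validate(record: UserRecord) -> None:
    """Raise ValueError if a source is missing or has the wrong dimension."""
    for source, dim in TOPIC_DIMS.items():
        vec = record.features.get(source)
        if vec is None or len(vec) != dim:
            raise ValueError(f"{record.user_id}: bad features for {source}")


rng = random.Random(0)
record = UserRecord(
    user_id="user-001",
    features={s: random_topic_vector(d, rng) for s, d in TOPIC_DIMS.items()},
    interests=["music", "travel"],
)
validate(record)
```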
S2: For S information sources, S prediction models are defined for all social media users, and the weight matrix corresponding to each of the S prediction models is W_i. The weight matrix W_i is split into a matrix A_i representing the consistency of the multi-source information and a matrix C_i representing the complementarity of the multi-source information.
The weight matrix splitting and loss function definition of step S2 further includes:
S21: The prediction model that gives the final result from the S information sources is divided into S prediction models, and the weight matrix corresponding to each prediction model is W_i. The weight matrix W_i is split into a matrix A_i representing the consistency of the multi-source information and a matrix C_i representing the complementarity of the multi-source information.
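A minimal Python sketch of how the split in S21 could be used at prediction time. The text states only that W_i is split into A_i and C_i; the additive form W_i = A_i + C_i, the use of one scalar confidence per source, the function names and the toy numbers are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def vec_mat(x: List[float], w: Matrix) -> List[float]:
    """Row vector x (length d) times matrix w (d x k) -> length k."""
    return [sum(x[j] * w[j][c] for j in range(len(x))) for c in range(len(w[0]))]


def predict(xs, consistency, complementarity, confidences):
    """Confidence-weighted sum of the S per-source predictions."""
    scores = [0.0] * len(consistency[0][0])
    for x, a, c, beta in zip(xs, consistency, complementarity, confidences):
        w = mat_add(a, c)  # assumed additive split: W_i = A_i + C_i
        per_source = vec_mat(x, w)
        scores = [s + beta * v for s, v in zip(scores, per_source)]
    return scores


# Toy example: two sources (feature sizes 2 and 1), two interest classes.
xs = [[1.0, 2.0], [3.0]]
A = [[[0.5, 0.0], [0.0, 0.5]], [[1.0, 0.0]]]
C = [[[0.0, 0.1], [0.1, 0.0]], [[0.0, 1.0]]]
betas = [0.5, 0.5]
scores = predict(xs, A, C, betas)
```

Keeping A_i and C_i as separate matrices, rather than storing only their sum, is what allows each part to receive its own penalty and its own update step during training.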
S22: For a given social media user information input X_i, following the principle of user interest and preference inference and using a squared-error loss, the loss function with respect to the weights and the confidence is obtained:
Figure BDA0002270044620000061
S23: A penalty term is computed on the weight parameters of the matrix A_i, as follows:
Figure BDA0002270044620000062
A penalty term is likewise computed on the matrix C_i:
Figure BDA0002270044620000063
The resulting optimization function is then as follows:
Figure BDA0002270044620000064
where λ_1, λ_2, λ_3, λ_4 are hyperparameters, A_i is the multi-source information consistency weight matrix, C_i is the multi-source information complementarity weight matrix, and β_i is the multi-source information confidence vector.
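The formulas of S22 and S23 are available here only as figure references, so the following Python sketch does not reproduce them. It shows the general shape of such an objective (a squared-error fit term plus weighted penalty terms) under stated assumptions: the additive split W_i = A_i + C_i, scalar confidences, and squared Frobenius norms as stand-in penalties. The names `objective`, `lam_a` and `lam_c` are hypothetical, and the sketch uses two hyperparameters rather than the four named in the text.

```python
def frob_sq(matrix):
    """Squared Frobenius norm: sum of squared entries."""
    return sum(v * v for row in matrix for v in row)


def objective(xs, y, consistency, complementarity, confidences, lam_a, lam_c):
    """Squared-error fit for one user plus stand-in penalties on A_i and C_i."""
    scores = [0.0] * len(y)
    for x, a, c, beta in zip(xs, consistency, complementarity, confidences):
        for col in range(len(y)):
            # Assumed additive split W_i = A_i + C_i, weighted by confidence.
            scores[col] += beta * sum(
                x[j] * (a[j][col] + c[j][col]) for j in range(len(x))
            )
    fit = sum((s - t) ** 2 for s, t in zip(scores, y))
    penalty = lam_a * sum(frob_sq(a) for a in consistency)
    penalty += lam_c * sum(frob_sq(c) for c in complementarity)
    return fit + penalty


# Toy example: two sources, two interest classes, one user.
xs = [[1.0, 2.0], [3.0]]
y = [2.0, 2.0]
A = [[[0.5, 0.0], [0.0, 0.5]], [[1.0, 0.0]]]
C = [[[0.0, 0.1], [0.1, 0.0]], [[0.0, 1.0]]]
value = objective(xs, y, A, C, [0.5, 0.5], lam_a=0.1, lam_c=0.1)
```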
S3: The loss is computed over the multi-source information consistency weight matrix, the multi-source information complementarity weight matrix and the multi-source information confidence weights, and iterative optimization is performed until the whole loss function has converged; the model is then trained and can be put into testing and use.
The iterative optimization of step S3 further includes:
S31: Fix the weight matrices A_i and C_i and iteratively optimize β. Using the Lagrange multiplier method, the optimization function O is defined as follows:
Figure BDA0002270044620000071
and β is updated accordingly, where δ represents a non-negative Lagrange multiplier.
S32: Fix β and the weight matrix C_i, and optimize A_i. By taking the derivative of the optimization function O with respect to A_i, the weights of the matrix A_i can be updated.
S33: Fix β and the weight matrix A_i, and optimize C_i. With β and A_i fixed, C_i is optimized according to the following function:
Figure BDA0002270044620000072
and the weights of the matrix C_i are updated.
S34: Repeat the iterative optimization until the whole loss function has converged, and store the weight parameters of the model network.
S35: Once the loss function has converged, training of the model is complete. In the testing stage, the pre-trained model network weight parameters are loaded, and user behavior analysis such as personal interest inference can be performed on the multi-source input of a given social user.
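The alternating scheme of S31 to S34 (fix two blocks of parameters, update the third, repeat until the loss stops changing) can be sketched in Python on a deliberately tiny problem. Everything below is an assumption for illustration: the parameters are scalars rather than matrices, the loss is a stand-in rather than the patent's objective, the updates are plain gradient steps with numerical derivatives rather than the derived update rules, and non-negativity of the confidence is enforced by simple clipping.

```python
def loss(a, c, beta, data, lam=0.01):
    """Stand-in loss: squared error of beta * x * (a + c) plus small penalties."""
    fit = sum((beta * x * (a + c) - y) ** 2 for x, y in data)
    return fit + lam * (a * a + c * c)


def num_grad(f, v, eps=1e-6):
    """Central-difference derivative of a one-argument function f at v."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)


def alternate(data, steps=500, lr=0.01, tol=1e-10):
    a, c, beta = 0.1, 0.1, 1.0
    prev = cur = loss(a, c, beta, data)
    for _ in range(steps):
        # Fix a and c, update beta (kept non-negative), as in S31.
        beta = max(0.0, beta - lr * num_grad(lambda b: loss(a, c, b, data), beta))
        # Fix beta and c, update a, as in S32.
        a -= lr * num_grad(lambda v: loss(v, c, beta, data), a)
        # Fix beta and a, update c, as in S33.
        c -= lr * num_grad(lambda v: loss(a, v, beta, data), c)
        cur = loss(a, c, beta, data)
        if abs(prev - cur) < tol:  # convergence check, as in S34
            break
        prev = cur
    return a, c, beta, cur


data = [(1.0, 2.0), (2.0, 4.0)]  # toy targets following y = 2x
initial = loss(0.1, 0.1, 1.0, data)
a, c, beta, final = alternate(data)
print(f"loss went from {initial:.3f} to {final:.5f}")
```

Updating one block at a time keeps each sub-problem simple even when the joint objective is not convex in all parameters together, which is the usual motivation for this kind of scheme.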
Example two
In one or more embodiments, a system for user interest analysis with collaborative learning for multi-source social networks is disclosed, comprising:
means for collecting user information from a plurality of social media platforms, performing text preprocessing and effective text feature extraction on the user information, and constructing a multi-source user information data set from the extracted features;
means for dividing the weight matrices corresponding to the S prediction models into a weight matrix representing the consistency of the multi-source information and a weight matrix representing the complementarity of the multi-source information;
means for constructing a loss function with respect to weight and confidence based on a weight matrix representing consistency of the multi-source information, a weight matrix representing complementarity of the multi-source information, and a weight of confidence of the multi-source information;
means for assigning the loss function to the user interest prediction model and performing iterative optimization until the whole loss function has converged;
and means for inputting the multiple information sources of the social user to be analyzed into the optimized user interest prediction model and outputting the predicted interest and hobby classification of the social user.
EXAMPLE III
In one or more embodiments, a terminal device is disclosed, which includes a server including a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor executes the computer program to implement the method for analyzing user interest using collaborative learning of a multi-source social network according to the first embodiment. For brevity, no further description is provided herein.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
The steps of the method described in connection with this embodiment may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in a memory; the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, this is not described in detail here.
Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.

Claims (7)

1. A method for analyzing user interest by cooperative learning of a multi-source social network is characterized by comprising the following steps:
constructing a multi-source user information data set;
for S information sources, S prediction models are defined for all social media users, and the S prediction models form a user interest prediction model;
dividing the weight matrix corresponding to the S prediction models into a weight matrix representing the consistency of the multi-source information and a weight matrix representing the complementarity of the multi-source information;
constructing a loss function related to the weight and the confidence coefficient according to the weight matrix representing the consistency of the multi-source information, the weight matrix representing the complementarity of the multi-source information and the weight of the confidence coefficient of the multi-source information;
endowing the user interest prediction model with the loss function, and performing iterative optimization until the whole loss function is completely converged;
and inputting the social user multiple information sources to be analyzed into the optimized user interest prediction model, and outputting the prediction of the interest and hobby classification of the social user.
2. The method for analyzing user interest using collaborative learning of a multi-source social network according to claim 1, wherein the constructing of the multi-source user information dataset specifically comprises:
collecting user information from a plurality of social media platforms;
and performing text preprocessing and text effective feature extraction on the user information, and constructing a multi-source user information data set by using the extracted features.
3. The method for analyzing user interest through cooperative learning of the multi-source social network as claimed in claim 1, wherein a loss function about weight and confidence is constructed according to a weight matrix representing consistency of multi-source information, a weight matrix representing complementarity of the multi-source information, and a weight of confidence of the multi-source information, specifically:
Figure FDA0002270044610000021
penalty terms are computed on the parameters of the weight matrix A_i representing the consistency of the multi-source information and of the weight matrix C_i representing the complementarity of the multi-source information, to obtain the following optimization function:
Figure FDA0002270044610000022
wherein λ_1, λ_2, λ_3, λ_4 are hyperparameters; X_i is the information input by the social media users; W_i is the weight matrix corresponding to each prediction model, the weight matrix W_i being split into a weight matrix A_i representing the consistency of the multi-source information and a weight matrix C_i representing the complementarity of the multi-source information; β is the set of multi-source information confidence vectors; β_i is the confidence vector corresponding to each source; and Y is the set of ground-truth information vectors formed by the interests and hobbies of each social user.
4. The method for analyzing user interest using collaborative learning of a multi-source social network according to claim 1, wherein the iterative optimization is performed on the loss function, specifically:
fixing the weight matrix A_i representing the consistency of the multi-source information and the weight matrix C_i representing the complementarity of the multi-source information, and iteratively optimizing the multi-source information confidence vector β;
fixing the multi-source information confidence vector β and the weight matrix C_i representing the complementarity of the multi-source information, and optimizing the weight matrix A_i representing the consistency of the multi-source information;
fixing the multi-source information confidence vector β and the weight matrix A_i representing the consistency of the multi-source information, and optimizing the weight matrix C_i representing the complementarity of the multi-source information.
5. A system for user interest analysis using collaborative learning for multi-source social networks, comprising:
means for collecting user information from a plurality of social media platforms, performing text preprocessing and effective text feature extraction on the user information, and constructing a multi-source user information data set from the extracted features;
means for dividing the weight matrices corresponding to the S prediction models into a weight matrix representing the consistency of the multi-source information and a weight matrix representing the complementarity of the multi-source information;
means for constructing a loss function with respect to weight and confidence based on a weight matrix representing consistency of the multi-source information, a weight matrix representing complementarity of the multi-source information, and a weight of confidence of the multi-source information;
means for assigning the loss function to the user interest prediction model and performing iterative optimization until the whole loss function has converged;
and means for inputting the multiple information sources of the social user to be analyzed into the optimized user interest prediction model and outputting the predicted interest and hobby classification of the social user.
6. A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; the computer-readable storage medium storing instructions adapted to be loaded by a processor and to perform the method for user interest analysis using collaborative learning with multi-source social networks of any of claims 1-4.
7. A computer-readable storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor of a terminal device and to execute the method for user interest analysis using collaborative learning for multi-source social networks according to any one of claims 1-4.
CN201911101644.9A 2019-11-12 2019-11-12 User interest analysis method and system by using cooperative learning of multi-source social network Pending CN111026943A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911101644.9A CN111026943A (en) 2019-11-12 2019-11-12 User interest analysis method and system by using cooperative learning of multi-source social network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911101644.9A CN111026943A (en) 2019-11-12 2019-11-12 User interest analysis method and system by using cooperative learning of multi-source social network

Publications (1)

Publication Number Publication Date
CN111026943A true CN111026943A (en) 2020-04-17

Family

ID=70205594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911101644.9A Pending CN111026943A (en) 2019-11-12 2019-11-12 User interest analysis method and system by using cooperative learning of multi-source social network

Country Status (1)

Country Link
CN (1) CN111026943A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122852A (en) * 2017-04-24 2017-09-01 无锡中科富农物联科技有限公司 A kind of microblog users interest Forecasting Methodology based on PMF
CN108874959A (en) * 2018-06-06 2018-11-23 电子科技大学 A kind of user's dynamic interest model method for building up based on big data technology
CN109033255A (en) * 2018-07-06 2018-12-18 合肥明高软件技术有限公司 A kind of on-line study point of interest analysis method and system
CN109948066A (en) * 2019-04-16 2019-06-28 杭州电子科技大学 A kind of point of interest recommended method based on Heterogeneous Information network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈唯: "《基于用户情境的实时兴趣模型研究及应用》" *

Similar Documents

Publication Publication Date Title
US10958748B2 (en) Resource push method and apparatus
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
EP2866421B1 (en) Method and apparatus for identifying a same user in multiple social networks
CN111523051A (en) Social interest recommendation method and system based on graph volume matrix decomposition
CN108536784B (en) Comment information sentiment analysis method and device, computer storage medium and server
CN108287857B (en) Expression picture recommendation method and device
CN109241412A (en) A kind of recommended method, system and electronic equipment based on network representation study
CN110209922A (en) Object recommendation method, apparatus, storage medium and computer equipment
CN113901327A (en) Target recommendation model training method, recommendation device and electronic equipment
CN107818491A (en) Electronic installation, Products Show method and storage medium based on user's Internet data
CN114780831A (en) Sequence recommendation method and system based on Transformer
CN112561031A (en) Model searching method and device based on artificial intelligence and electronic equipment
CN111695024A (en) Object evaluation value prediction method and system, and recommendation method and system
CN106803092B (en) Method and device for determining standard problem data
CN110781405B (en) Document context perception recommendation method and system based on joint convolution matrix decomposition
CN117216281A (en) Knowledge graph-based user interest diffusion recommendation method and system
CN110085292A (en) Drug recommended method, device and computer readable storage medium
CN110807693A (en) Album recommendation method, device, equipment and storage medium
CN111898766A (en) Ether house fuel limitation prediction method and device based on automatic machine learning
CN114547312B (en) Emotional analysis method, device and equipment based on common sense knowledge graph
CN116383521A (en) Subject word mining method and device, computer equipment and storage medium
CN111026943A (en) User interest analysis method and system by using cooperative learning of multi-source social network
US20160042277A1 (en) Social action and social tie prediction
CN115905518A (en) Emotion classification method, device and equipment based on knowledge graph and storage medium
CN115033700A (en) Cross-domain emotion analysis method, device and equipment based on mutual learning network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination