Data mining method and device based on a social platform
Technical field
The present invention relates to the computer field, and in particular to a data mining method and device based on a social platform.
Background art
At present, with the development of computer technology and the gradual spread of the Internet, more and more people obtain all kinds of information through the Internet. Correspondingly, the amount of information on the Internet has grown richer as computer technology has developed and the Internet has become widespread.
In recent years, with the rapid development of the mobile Internet, people have gradually become accustomed to obtaining information content through information clients on mobile terminals. As a result, the time users spend obtaining information over the network has become more fragmented. Against this background, accurately providing users with information that is both valuable and interesting to them has become increasingly important. In particular, providing valuable and interesting information to new users has become a problem in urgent need of a solution.
In the prior art, the cold-start problem of recommender systems is a major challenge for products such as information clients. The cold-start problem refers to a recommender system lacking enough data about a new user to capture the user's interests and make effective recommendations. Among the many solutions to this problem, one widely used approach is to encourage users to log in to the recommender system with a social networking service (SNS) account, for example a Weibo, Tencent QQ, or Renren account. The recommender system can then use the user's information on the social network platform (for example follow relationships, friend relationships, interest tags, and published content) to initialize the user's interest model and thereby make effective recommendations.
On the one hand, relying solely on the public data of a social network platform to recommend content (for example videos, articles, pictures, music, games, software, and friends) runs into many difficulties in practice. For example, content published on social network platforms is often short and messy, and users' self-chosen tags are often whimsical (for example "alien who will die without sleeping in" or "terminal trypophobia patient"), which makes them hard for machine learning algorithms to interpret and limits how much they help the recommendation service. For users who are inactive on social networks or have weak social ties, the public data on the social network platform does even less to improve recommendations. On the other hand, relatively mature content recommendation service providers with large user bases have often accumulated a large amount of user behavior information over long-term operation, such as the videos users have played on demand and the articles they have read or commented on. If this data could be fused effectively with public social network data, recommendation quality could be greatly improved. However, existing techniques focus almost entirely on mining a user interest model from the public data provided by the social network platform and recommending on that basis, which is difficult to implement and has low accuracy.
No effective solution has yet been proposed for the prior-art problem that targeted information cannot be provided to newly registered users because they have no browsing history.
Summary of the invention
The main object of the present invention is to provide a data mining method and device based on a social platform, so as to solve the prior-art problem that targeted information cannot be provided to newly registered users because they have no browsing history.
To achieve the above object, according to one aspect of the embodiments of the present invention, a data mining method based on a social platform is provided. The method includes: obtaining the interest tag dictionaries of registered users of an information client; obtaining first objects that have a follow relationship on the social platform with the registered users of the information client, and reading the relationship information between the registered users and the first objects; determining, according to the first objects with which a registered user has a follow relationship, a first follow set corresponding to that registered user; building an interest model according to the interest tag dictionaries of the registered users and the first follow sets, wherein the interest model characterizes the correspondence between registered users having the same first follow set and interest tags; obtaining second objects that have a follow relationship on the social platform with a newly registered user of the information client, and reading the relationship information between the newly registered user and the second objects; determining, according to the second objects with which the newly registered user has a follow relationship, a second follow set of the newly registered user; and matching the second follow set against the interest model, and determining recommended interest tags for the newly registered user according to the interest model.
To achieve the above object, according to another aspect of the embodiments of the present invention, a data mining device based on a social platform is provided. The device includes: a first acquisition module, configured to obtain the interest tag dictionaries of registered users of an information client; a second acquisition module, configured to obtain first objects that have a follow relationship on the social platform with the registered users of the information client, and to read the relationship information between the registered users and the first objects; a first determining module, configured to determine, according to the first objects with which a registered user has a follow relationship, a first follow set corresponding to that registered user; a first processing module, configured to build an interest model according to the interest tag dictionaries of the registered users and the first follow sets, wherein the interest model characterizes the correspondence between registered users having the same first follow set and interest tags; a third acquisition module, configured to obtain second objects that have a follow relationship on the social platform with a newly registered user of the information client, and to read the relationship information between the newly registered user and the second objects; a second determining module, configured to determine, according to the second objects with which the newly registered user has a follow relationship, a second follow set of the newly registered user; and a second processing module, configured to match the second follow set against the interest model and determine recommended interest tags for the newly registered user according to the interest model.
According to the embodiments of the invention, the interest tag dictionaries of registered users of the information client are obtained; first objects that have a follow relationship on the social platform with the registered users of the information client are obtained, and the relationship information between the registered users and the first objects is read; a first follow set corresponding to each registered user is determined according to the first objects with which that registered user has a follow relationship; an interest model is built according to the interest tag dictionaries of the registered users and the first follow sets, wherein the interest model characterizes the correspondence between registered users having the same first follow set and interest tags; second objects that have a follow relationship on the social platform with a newly registered user of the information client are obtained, and the relationship information between the newly registered user and the second objects is read; a second follow set of the newly registered user is determined according to the second objects with which the newly registered user has a follow relationship; and the second follow set is matched against the interest model, and recommended interest tags for the newly registered user are determined according to the interest model. This solves the prior-art problem that targeted information cannot be provided to newly registered users because they have no browsing history, and achieves the effect of providing users with targeted information based on a newly registered user's follow relationships on the social platform.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the present invention. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a data mining method based on a social platform according to Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a preferred data mining method based on a social platform according to Embodiment 1 of the present invention;
Fig. 3 is a schematic flowchart of matching registered users by their Weibo follow sets to generate registered user sets;
Fig. 4 is a schematic structural diagram of a data mining device based on a social platform according to Embodiment 2 of the present invention;
Fig. 5 is a schematic structural diagram of a preferred data mining device based on a social platform according to Embodiment 2 of the present invention; and
Fig. 6 is a schematic structural diagram of a preferred data mining device based on a social platform according to Embodiment 2 of the present invention.
Detailed description of the embodiments
It should be noted that, provided there is no conflict, the embodiments of this application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the present invention are used to distinguish similar objects and are not used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Embodiment 1
An embodiment of the present invention provides a data mining method based on a social platform.
Fig. 1 is a flowchart of a data mining method based on a social platform according to Embodiment 1 of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S11: obtain the interest tag dictionaries of registered users of the information client.
In step S11, the historical browsing behavior of the registered users is collected and analyzed to obtain an interest tag dictionary corresponding to each registered user.
Step S13: obtain first objects that have a follow relationship on the social platform with the registered users of the information client, and read the relationship information between the registered users and the first objects.
In step S13, the follow relationship information of the registered users on the social platform is read to determine the objects with which the registered users have a follow relationship.
In practice, the follow relationship may be a friend relationship in Tencent QQ, a follow relationship on Weibo, or a friend relationship on Renren.
Step S15: determine, according to the first objects with which a registered user has a follow relationship, the first follow set corresponding to that registered user.
In step S15, the first objects with which each registered user has a follow relationship are collated separately, thereby determining the first follow set of each registered user.
Step S17: build an interest model according to the interest tag dictionaries of the registered users and the first follow sets, wherein the interest model characterizes the correspondence between registered users having the same first follow set and interest tags.
In step S17, the follow sets of the registered users are analyzed, and registered users with different first follow sets are classified into several registered user sets, each corresponding to one first follow set. From the interest tag dictionaries of the registered users in each registered user set, a user set-tag dictionary corresponding to that first follow set is generated, thereby determining the correspondence between first follow sets and interest tags.
Step S19: obtain second objects that have a follow relationship on the social platform with a newly registered user of the information client, and read the relationship information between the newly registered user and the second objects.
In step S19, the follow relationship information of the newly registered user on the social platform is read to determine the second objects with which the newly registered user has a follow relationship.
In practice, the follow relationship may be a friend relationship in Tencent QQ, a follow relationship on Weibo, or a friend relationship on Renren.
Step S21: determine, according to the second objects with which the newly registered user has a follow relationship, the second follow set of the newly registered user.
In step S21, the second objects with which the newly registered user has a follow relationship are collated, thereby determining the second follow set of the newly registered user.
Step S23: match the second follow set against the interest model, and determine recommended interest tags for the newly registered user according to the interest model.
In step S23, the second follow set of the newly registered user is matched against the several first follow sets in the interest model to find the first follow set that matches it, and the interest tags of the newly registered user are determined through that first follow set.
Specifically, through steps S11 to S23, registered users who have the same first follow set on the social platform are grouped to obtain the registered user set corresponding to each first follow set, and the interest tag dictionaries of the registered users of the information client are then used to obtain the user set-tag dictionary corresponding to each registered user set. This yields an interest model that holds the correspondence between first follow sets and user set-tag dictionaries. Once the second follow set of a newly registered user has been obtained, matching it directly against the first follow sets in the interest model gives the recommended interest tags for that user.
In practice, relationships on a social platform can generally be regarded as reflecting the similarity of users' interests. Under different assumptions, different methods can be used to find other users on the social platform whose interests are similar to those of a given user. Different assumptions suit different types of social platform. For example, for social platforms that emphasize two-way communication, such as Tencent QQ and WeChat, it can be assumed that friends have similar interests. For social platforms that emphasize one-way following, such as Weibo, it can be assumed that users who follow the same accounts have similar interests; for example, two users who both follow Lei Jun and Huang Zhang are likely to both be interested in smartphones.
Taking Weibo as an example of the social platform, the Weibo follow lists of the registered users of the information client are screened: the followed accounts whose follower count exceeds a certain value, or the several followed accounts with the highest follower counts, are selected to form a first follow set. The Weibo follow lists of all registered users are screened in the same way to obtain a first follow set for each registered user. Registered users who have the same first follow set are grouped into registered user sets, each of which has a different first follow set. By collecting the interest tag dictionaries of the registered users in each registered user set, a user set-tag dictionary corresponding to that set is obtained. After a new user registers with the information client and authorizes it to access the user's public Weibo data, the new user's follow list is screened in the same way, and the resulting second follow set is matched against the first follow sets of the registered user sets. This determines the registered user set to which the new user belongs, and the user set-tag dictionary of that set gives the recommended interest tags for the newly registered user.
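The Weibo example can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the function names, account names, follower counts, and the screening value of 10,000 followers are all hypothetical, and only the exact-match case (identical first follow sets) is shown.

```python
from collections import defaultdict

def screen_follow_list(follow_list, follower_counts, min_followers):
    """Keep only followed accounts whose follower count exceeds min_followers."""
    return frozenset(
        account for account in follow_list
        if follower_counts.get(account, 0) > min_followers
    )

def build_registered_user_sets(follow_lists, follower_counts, min_followers):
    """Group registered users that share an identical screened follow set."""
    groups = defaultdict(set)
    for user, follows in follow_lists.items():
        key = screen_follow_list(follows, follower_counts, min_followers)
        if key:  # users whose screened follow set is empty are left ungrouped
            groups[key].add(user)
    return dict(groups)

# Hypothetical data: follower counts of followed accounts and users' follow lists.
follower_counts = {"tech_a": 900_000, "tech_b": 500_000,
                   "sport_a": 800_000, "friend_x": 120}
follow_lists = {
    "u1": ["tech_a", "tech_b", "friend_x"],
    "u2": ["tech_b", "tech_a"],
    "u3": ["sport_a"],
}
groups = build_registered_user_sets(follow_lists, follower_counts, 10_000)

# A newly registered user is screened in the same way and looked up by follow set.
new_user_key = screen_follow_list(["tech_a", "tech_b"], follower_counts, 10_000)
matched_users = groups.get(new_user_key, set())
```

Using a frozenset as the dictionary key makes the grouping independent of the order in which accounts appear in a follow list.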
In summary, the present invention solves the prior-art problem that targeted information cannot be provided to newly registered users because they have no browsing history, and achieves the effect of providing users with targeted information based on a newly registered user's follow relationships on the social platform.
Preferably, as shown in Fig. 2, in the preferred embodiment provided by this application, before step S11 of obtaining the interest tag dictionaries of registered users of the information client, the method includes:
Step S101: obtain recommended information.
Step S103: extract the interest tags of the recommended information from its content.
Step S105: obtain the historical behavior data of the registered users, wherein the historical behavior data records the registered users' operations on the recommended information.
Step S107: determine the tag weights of the interest tags according to the historical behavior data.
Step S109: determine the interest tag dictionary corresponding to each registered user according to the tag weights.
Specifically, through steps S101 to S109, the content of all recommended information in the information client is analyzed, and interest tags are extracted for each item of recommended information according to its content. When a registered user operates on an item of recommended information, the operation is recorded, the interest tags corresponding to that item are weighted according to the operation, and the weight of each interest tag for that registered user is calculated. When a tag weight exceeds a threshold, the tag is added to the interest tag dictionary corresponding to that user.
In practice, the recommendation service of the information client tags the information content it recommends with interest tags, for example content categories (technology, football, basketball, and so on), audience categories (tech geeks, outdoor enthusiasts, teenagers, and so on), and content keywords (iPhone, Tank Battle, Bayern Munich, and so on). These interest tags are sometimes assigned by human editors and sometimes identified by algorithms that automatically analyze the recommended information.
When every item the recommendation service can recommend carries interest tags, the interest tag dictionary of a user can be obtained by recording the registered user's behavior data while using the recommendation service (for example browsing, clicking, favoriting, or commenting on content) and combining it with the interest tags of the corresponding content. This interest tag dictionary describes which interest tags the user has and the weight of each one. It can be used as the interest model in subsequent steps.
Specifically, the tag weight of an interest tag can be calculated as follows:
First, a weight w is assigned to each type of user action act; for example, a click scores 1 point, a view without a click scores -0.2 points, and a favorite scores 5 points.
Given a sequence of user actions [act_1, act_2, ..., act_n], the user's interest tag vector is calculated as:
$V = \sum_{i=1}^{n} T_i \cdot w_i$
where T_i denotes the interest tag vector of the i-th user action and w_i denotes the weight of the i-th user action.
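The weighted sum can be sketched in Python as follows. This is an illustration under stated assumptions rather than the embodiment's code: the action names, the representation of T_i as a dictionary from tag to strength, and the threshold of 0.5 are hypothetical; the action weights are the example values given in the text.

```python
from collections import defaultdict

# Action weights from the example in the text; the key names are hypothetical.
ACTION_WEIGHTS = {"click": 1.0, "browse_no_click": -0.2, "favorite": 5.0}

def interest_tag_vector(actions):
    """Compute V = sum_i T_i * w_i.

    actions is a list of (action_type, tags) pairs, where tags maps each
    interest tag of the item acted on to its strength in T_i.
    """
    vector = defaultdict(float)
    for action_type, tags in actions:
        weight = ACTION_WEIGHTS[action_type]
        for tag, strength in tags.items():
            vector[tag] += strength * weight
    return dict(vector)

def interest_tag_dictionary(vector, threshold):
    """Keep only the tags whose accumulated weight exceeds the threshold."""
    return {tag: w for tag, w in vector.items() if w > threshold}

# Hypothetical action sequence of one registered user.
actions = [
    ("click", {"technology": 1.0, "smartphone": 1.0}),
    ("favorite", {"technology": 1.0}),
    ("browse_no_click", {"football": 1.0}),
]
vector = interest_tag_vector(actions)
dictionary = interest_tag_dictionary(vector, threshold=0.5)
```

In this example "technology" accumulates 1 + 5 = 6 points, while "football" ends at -0.2 and is therefore left out of the user's interest tag dictionary.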
Preferably, in the preferred embodiment provided by this application, step S17 of building the interest model according to the interest tag dictionaries of the registered users and the first follow sets includes:
Step S171: screen the first follow set to obtain a third follow set corresponding to the registered user, wherein the screening methods include at least: data screening, index screening, condition screening, and information screening.
Step S173: match the registered users by their third follow sets to generate registered user sets, wherein a registered user set contains the registered users who have the same third follow set.
Step S175: generate the user set-tag dictionary corresponding to each registered user set according to the interest tag dictionaries of the registered users contained in that set.
Specifically, through steps S171 to S175, the first follow set of a registered user is screened first. The first follow set can be screened by conditions such as number of follows, number of friends, and/or activity level, removing inactive accounts and accounts with few friends from the first follow set to produce the screened third follow set.
The screened registered users are matched by their third follow sets: registered users whose third follow sets match to a degree above a preset threshold, or whose third follow sets are identical, are placed in the same registered user set. Depending on how the third follow sets differ, there may be many registered user sets. Of course, third follow sets can also be defined manually, with registered users grouped according to the manual definitions into different registered user sets.
The user set-tag dictionary corresponding to the current registered user set is generated from the contents of the interest tag dictionaries of the registered users in that set.
Taking Weibo as the social platform again, Fig. 3 is a schematic flowchart of matching registered users by their Weibo follow sets to generate registered user sets.
The follow list of each registered user is obtained and, with follower count as the screening condition, accounts with few followers are filtered out of the follow list. A third follow set is generated from the screened follow list. Of course, for Weibo, third follow sets can also be defined manually. For example, specific Weibo users can be divided by category: users in the computer and Internet field such as Kai-Fu Lee, Lei Jun, Zhou Hongyi, and Robin Li can be assigned to one third follow set; users in the entertainment and media field such as He Jiong, Xie Na, and Dai Jun can be assigned to another; and users in the sports field such as Wei Kexing, Li Na, and Liu Xiang can be assigned to yet another.
Registered users are then classified according to the third follow sets: registered users who share a common third follow set are placed in the same registered user set, so as to form groups of users with similar interests.
Preferably, in the preferred embodiment provided by this application, step S175 of generating the user set-tag dictionary corresponding to a registered user set according to the interest tag dictionaries of the registered users contained in that set includes:
Step S1751: obtain a first user count, which is the number of registered users of the information client, and a second user count, which is the number of registered users in the registered user set.
Step S1753: calculate the weight distribution average of each interest tag according to the tag weights and the first user count.
Step S1755: calculate, according to the tag weights of the registered users in the registered user set and the second user count, the set weighted mean of each interest tag in the user set interest tag dictionary.
Step S1757: calculate, according to the weight distribution average and the set weighted mean, the registered user set weight of each interest tag in the user set interest tag dictionary.
Step S1759: compare, in turn, the registered user set weight of each interest tag in the user set interest tag dictionary with a preset noise threshold.
When the registered user set weight of an interest tag in the user set interest tag dictionary is greater than the preset noise threshold, the interest tag corresponding to that weight is retained in the user set-tag dictionary.
When the registered user set weight of an interest tag in the user set interest tag dictionary is less than or equal to the preset noise threshold, the interest tag corresponding to that weight is deleted from the user set-tag dictionary.
Specifically, regarding steps S1751 to S1759: in practice, taking Weibo as the social platform, once a group of users with similar interests has been found, the individual interest tag dictionaries of these users can be merged to obtain a group interest model. The simplest method is to add the users' tag vectors directly. In practice, however, this produces very noisy results. Major Weibo accounts in certain fields have a great many followers, and many people follow such an account only because it is famous, so the act of following does not reflect their own interests. If the interest tag vectors of these users are simply summed, the meaningful signal is easily drowned out by widely shared interests. As an example from an actual experiment, when analyzing the Weibo users who follow Wang Xing (the founder of Meituan), we found that the highest-weighted interest tags were not "Internet" or "O2O" but "entertainment" and "social news". This is because "entertainment" and "social news" are near-universal interest tags: many users with these two tags follow Wang Xing because he is the founder of Meituan but in fact pay little attention to "Internet" or "O2O". If all of these users are taken into account indiscriminately, "entertainment" and "social news" end up weighted higher than "Internet" and "O2O".
Removing this background noise is the core technique for mining group interests effectively. In practice, we first compute the weight distribution average over all registered users of the site:
$V_{base} = \frac{1}{N} \sum_{n=1}^{N} V_n$
where N denotes the number of all registered users and V_n denotes the interest tag weight distribution of the n-th user.
This formula gives the average weight V_base[i] of all users on interest tag i.
Then, for a registered user set that shares some condition in its follow relationships (for example, among all Weibo follow sets, the set of registered users who follow "Wang Xing"), given the group interest tag vector V of that registered user set, the denoised registered user set weight V' is obtained for each tag:
$V'[i] = V[i] / V_{base}[i]$
where V'[i] denotes the registered user set weight of interest tag i, V[i] denotes the set weighted mean of interest tag i, and V_base[i] denotes the average weight of all users on interest tag i.
The registered user set weight V' is compared with the preset noise threshold. When V' is less than the noise threshold, the interest tag is judged to be a noise tag and is removed from the current user set-tag dictionary; when V' is greater than or equal to the noise threshold, the interest tag is judged to be a non-noise tag and is retained in the current user set-tag dictionary.
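The background-noise removal can be sketched in Python as follows. This is a minimal illustration, not the embodiment's code: the function names, the sample tag vectors, and the noise threshold of 1.5 are hypothetical. It keeps a tag when V'[i] is greater than or equal to the threshold, and it assumes that tags with no positive site-wide average weight are skipped to avoid division by zero, a case the text does not address.

```python
def mean_vector(vectors):
    """Component-wise mean of tag-weight dicts; a missing tag counts as 0."""
    tags = set().union(*vectors) if vectors else set()
    count = len(vectors)
    return {t: sum(v.get(t, 0.0) for v in vectors) / count for t in tags}

def denoised_group_tags(all_user_vectors, group_vectors, noise_threshold):
    """Return V'[i] = V[i] / V_base[i] for tags that pass the noise threshold."""
    v_base = mean_vector(all_user_vectors)   # site-wide average per tag
    v_group = mean_vector(group_vectors)     # average within the user set
    result = {}
    for tag, weight in v_group.items():
        base = v_base.get(tag, 0.0)
        if base <= 0:
            continue  # assumption: no meaningful baseline, so skip the tag
        ratio = weight / base
        if ratio >= noise_threshold:
            result[tag] = ratio
    return result

# Hypothetical tag vectors: four registered users on the site, two in the set.
u1 = {"entertainment": 4.0, "internet": 1.0}
u2 = {"entertainment": 4.0, "internet": 3.0}
u3 = {"entertainment": 4.0}
u4 = {"entertainment": 4.0}
group_tags = denoised_group_tags([u1, u2, u3, u4], [u1, u2], noise_threshold=1.5)
```

Here "entertainment" has the largest raw weight inside the set, but its ratio to the site-wide average is only 1.0, so it is treated as noise, while "internet" has a ratio of 2.0 and is retained.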
Preferably, in the preferred embodiment that the application provides, the second concern set is carried out with interest model in step S23
It matches, in the recommendation interest tags that new registration user is determined according to interest model, step includes:
Step S231 screens the second concern set, obtains the 4th concern set corresponding with new registration user,
In, screening technique includes at least:Data screening method, index screening method, conditional filtering method and information sifting method.
Step S233 matches the 4th concern set with third concern collection, it is determining it is corresponding with new registration user
Registered user gathers.
Step S235 according to user's collection-label dictionary of registered users set corresponding with new registration user, is determined
The recommendation interest tags of new registration user.
Specifically, in steps S231 to S235, the second concern set of the newly registered user is screened first. The second concern set may be screened by conditions such as number of concerns and/or number of friends and/or activity level, so that inactive users and users with few friends are removed from it, generating the screened fourth concern set. The screening method may be the same as the one used in step 171, or another screening method may be used. The screening method is not limited, provided that it serves to optimize the second concern set.
The fourth concern set is then matched against each third concern set. When the degree of matching between the fourth concern set of the newly registered user and a third concern set exceeds a preset threshold, or when the two sets are identical, the newly registered user is determined to match that third concern set, which in turn determines the registered-user set to which the newly registered user belongs.
The tags to be recommended to the new user are determined according to the user-set tag dictionary of the registered-user set to which the newly registered user belongs.
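The matching of a fourth concern set against the third concern sets can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the text does not state how the degree of matching is computed, so Jaccard similarity is assumed here, and the names `jaccard` and `match_user_set` as well as the default threshold of 0.5 are hypothetical.

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed matching degree: |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_user_set(fourth_set: Set[str],
                   third_sets: Dict[str, Set[str]],
                   threshold: float = 0.5) -> Optional[str]:
    """Return the id of the best-matching third concern set, or None.

    A third concern set matches when it is identical to the fourth concern
    set or when the matching degree exceeds the threshold.
    """
    best_id, best_score = None, 0.0
    for set_id, third_set in third_sets.items():
        score = jaccard(fourth_set, third_set)
        if score > best_score:
            best_id, best_score = set_id, score
    if best_id is None:
        return None
    if third_sets[best_id] == fourth_set or best_score > threshold:
        return best_id
    return None


# Hypothetical third concern sets keyed by a registered-user-set id.
third_sets = {
    "tech": {"Lei Jun", "Huang Zhang", "Li Kaifu"},
    "sports": {"Li Na", "Liu Xiang"},
}
print(match_user_set({"Lei Jun", "Huang Zhang"}, third_sets))
```

The returned id would then be used to look up the user-set tag dictionary of the matched registered-user set.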
In practical applications, once the group interest model of a user group whose interests are similar to those of the newly registered user has been mined, the group interest model and the user's individual interest model can be fused according to certain weights, and content is then recommended according to the fused interest model. Specifically, given a fused interest model (an interest tag vector), the highest-quality content under each tag can be recommended in proportion to the weight of that interest tag.
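The fusion and proportional recommendation described above can be sketched as follows. The text only says that the two models are fused "according to certain weights", so the linear blend, the `group_weight` parameter and the largest-remainder rounding used to split a fixed number of recommendation slots are all assumptions made for illustration.

```python
from typing import Dict


def fuse_models(group: Dict[str, float],
                individual: Dict[str, float],
                group_weight: float = 0.7) -> Dict[str, float]:
    """Blend a group interest vector with an individual interest vector."""
    tags = set(group) | set(individual)
    return {
        tag: group_weight * group.get(tag, 0.0)
        + (1.0 - group_weight) * individual.get(tag, 0.0)
        for tag in tags
    }


def allocate_slots(model: Dict[str, float], total_slots: int) -> Dict[str, int]:
    """Split total_slots among tags in proportion to their positive weights."""
    positive = {tag: w for tag, w in model.items() if w > 0}
    total = sum(positive.values())
    if total == 0:
        return {}
    raw = {tag: total_slots * w / total for tag, w in positive.items()}
    slots = {tag: int(share) for tag, share in raw.items()}
    leftover = total_slots - sum(slots.values())
    # Hand the remaining slots to the tags with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda tag: raw[tag] - slots[tag], reverse=True)
    for tag in by_remainder[:leftover]:
        slots[tag] += 1
    return slots


# A new user has no individual model, so the group model is used alone.
fused = fuse_models({"internet": 3.0, "O2O": 1.0}, {}, group_weight=1.0)
print(allocate_slots(fused, 8))
```

Each tag's slot count would then be filled with the highest-quality items carrying that tag.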
It should be noted that, for a new user, no in-site behavior data is available, so an individual interest model cannot be obtained. However, if the new user logs in to the information client with a social platform account, the social relationships of the newly registered user on the social platform can be obtained, the in-site user group with similar interests can be mined, and content can be recommended to the user with that group's interest model, so that information is recommended in a targeted manner. In practice, this approach performs better than recommending at random or recommending the most popular content.
Preferably, in a preferred embodiment provided by the present application, after the second concern set is matched against the interest model in step S23 and the recommended interest tags of the newly registered user are determined according to the interest model, the method further includes:
Step S24: pushing recommended information to the newly registered user according to the recommended interest tags.
Specifically, in step S24, recommended information matching the interest tags determined for the newly registered user in the above steps is pushed to the newly registered user.
As can be seen from the technical solution, the present invention effectively combines the public data of the social network with the private data of the recommendation service to make recommendations for the user. Compared with using only the social network public data or only the recommendation service private data, fusing the two kinds of data helps to recommend personalized content more accurately. Moreover, the fusion method proposed by the present invention can also apply the fusion of the two kinds of data to new users (based on in-site data mining, in-site user interest models are transferred through the social network to newly registered users and to off-site users), an effect that conventional methods cannot achieve.
One feature of the present invention is that the more users a recommendation service provider has, the better the method works. The user base of such a provider covers a larger share of the social network's user base, so it is less likely that, for a given social account, most of the friends or followers are not in-site users and no group interest can be mined. For a product such as Jinri Toutiao, with users on the order of hundreds of millions, this is a significant competitive advantage, while for smaller recommendation products it is a technical barrier.
Embodiment 2
An embodiment of the present invention further provides a data mining device based on a social platform. As shown in Fig. 4, the device includes: a first acquisition module 30, a second acquisition module 32, a first determining module 34, a first processing module 36, a third acquisition module 38, a second determining module 40 and a second processing module 42.
The first acquisition module 30 is configured to obtain the interest tag dictionaries of registered users in the information client.
The first acquisition module 30 of the present application collects and analyzes the historical browsing behavior of the registered users to obtain the interest tag dictionary corresponding to each registered user.
The second acquisition module 32 is configured to obtain, on the social platform, first objects that have a concern relationship with a registered user of the information client, and to read the relationship information between the registered user and the first objects.
The second acquisition module 32 of the present application reads the concern relationship information of the registered user on the social platform to determine the objects that have a concern relationship with the registered user.
In practical applications, the concern relationship may be a friend relationship in the Tencent QQ software, a follow relationship on a microblog, or a friend relationship on Renren.
The first determining module 34 is configured to determine the first concern set corresponding to a registered user according to the first objects that have a concern relationship with that registered user.
The first determining module 34 of the present application collates, for each registered user, the first objects that have a concern relationship with that user, thereby determining the first concern set of each registered user.
The first processing module 36 is configured to build an interest model according to the interest tag dictionaries and the first concern sets of the registered users, wherein the interest model characterizes the correspondence between interest tags and registered users having the same first concern set.
The first processing module 36 of the present application analyzes the concern sets of the registered users, classifies the registered users by their first concern sets into registered-user sets, each corresponding to one first concern set, and generates, from the interest tag dictionaries of the registered users in each registered-user set, the user-set tag dictionary corresponding to that first concern set, thereby determining the correspondence between first concern sets and interest tags.
The third acquisition module 38 is configured to obtain, on the social platform, second objects that have a concern relationship with a newly registered user of the information client, and to read the relationship information between the newly registered user and the second objects.
The third acquisition module 38 of the present application reads the concern relationship information of the newly registered user on the social platform to determine the second objects that have a concern relationship with the newly registered user.
In practical applications, the concern relationship may be a friend relationship in the Tencent QQ software, a follow relationship on a microblog, or a friend relationship on Renren.
The second determining module 40 is configured to determine the second concern set of the newly registered user according to the second objects that have a concern relationship with the newly registered user.
The second determining module 40 of the present application collates the second objects that have a concern relationship with the newly registered user, thereby determining the second concern set of the newly registered user.
The second processing module 42 is configured to match the second concern set against the interest model and to determine the recommended interest tags of the newly registered user according to the interest model.
The second processing module 42 of the present application matches the second concern set of the newly registered user against the first concern sets in the interest model, obtains the first concern set that matches the second concern set, and determines the interest tags of the newly registered user from that first concern set.
Specifically, through the first acquisition module 30, the second acquisition module 32, the first determining module 34, the first processing module 36, the third acquisition module 38, the second determining module 40 and the second processing module 42, registered users having the same first concern set on the social platform are grouped to obtain the registered-user sets corresponding to the first concern sets, and the interest tag dictionaries of the registered users in the information client are collected to obtain the user-set tag dictionary corresponding to each registered-user set. In this way, an interest model holding the correspondence between first concern sets and user-set tag dictionaries is built. After the second concern set of a newly registered user is obtained, matching it directly against the first concern sets in the interest model yields the recommended interest tags of the newly registered user.
In practical applications, relationships on a social platform can generally be regarded as reflecting the similarity of users' interests. Based on different assumptions, different methods can be used to find, on the social platform, other users whose interests are similar to those of a given user. Different assumptions suit different types of social platform. For example, on platforms that emphasize two-way communication, such as Tencent QQ and WeChat, it can be assumed that friends have similar interests. On platforms that emphasize one-way following, such as microblogs, it can be assumed that users who follow the same accounts have similar interests; for example, two users who both follow Lei Jun and Huang Zhang are likely both interested in smartphones.
Taking a microblog as the social platform, the microblog concern list of each registered user in the information client is screened: either the followed accounts whose follower count exceeds a certain value are selected, or the several followed accounts with the highest follower counts are selected, to form a first concern set. The microblog concern lists of all registered users are screened in the same way to obtain the first concern set corresponding to each registered user, and registered users having the same first concern set are classified into registered-user sets, each of which has a different first concern set. By collecting the interest tag dictionaries of the registered users in each registered-user set, the user-set tag dictionary corresponding to each registered-user set is obtained. After a newly registered user registers with the information client and authorizes it to call the microblog public data, the concern list of the new user is screened in the same way, and the screened second concern set is matched against the first concern sets of the registered-user sets. This determines the registered-user set to which the new user belongs and yields the user-set tag dictionary corresponding to that registered-user set, that is, the recommended interest tags of the newly registered user.
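The microblog example above can be sketched end to end as follows, assuming the concern lists have already been screened. The function names `build_interest_model` and `recommend_tags` are hypothetical, matching is by identical concern sets only, and tag vectors are merged by direct addition, which is the simplest merging method the description itself mentions; the sketch omits any noise removal.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set


def build_interest_model(concern_sets: Dict[str, Set[str]],
                         tag_dicts: Dict[str, Dict[str, float]]
                         ) -> Dict[FrozenSet[str], Dict[str, float]]:
    """Group registered users by identical first concern set and merge their tags.

    concern_sets maps a registered user to the accounts that user follows;
    tag_dicts maps a registered user to that user's interest tag dictionary.
    """
    model: Dict[FrozenSet[str], Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for user, concerns in concern_sets.items():
        key = frozenset(concerns)
        for tag, weight in tag_dicts.get(user, {}).items():
            model[key][tag] += weight
    return {key: dict(tags) for key, tags in model.items()}


def recommend_tags(model: Dict[FrozenSet[str], Dict[str, float]],
                   second_concern_set: Iterable[str]) -> Dict[str, float]:
    """Look up the user-set tag dictionary for a new user's concern set."""
    return model.get(frozenset(second_concern_set), {})


# Hypothetical registered users and their interest tag dictionaries.
concern_sets = {
    "u1": {"Lei Jun", "Huang Zhang"},
    "u2": {"Huang Zhang", "Lei Jun"},
    "u3": {"Li Na"},
}
tag_dicts = {
    "u1": {"smartphone": 2.0},
    "u2": {"smartphone": 1.0, "internet": 0.5},
    "u3": {"tennis": 3.0},
}
model = build_interest_model(concern_sets, tag_dicts)
print(recommend_tags(model, ["Lei Jun", "Huang Zhang"]))
```

Using a `frozenset` as the dictionary key makes two concern lists with the same members fall into the same registered-user set regardless of order.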
In conclusion, the present invention solves the prior-art problem that targeted information cannot be provided to a newly registered user because that user has no historical browsing record, and achieves the effect of providing targeted information to the user through the newly registered user's concern relationships on the social platform.
Preferably, in a preferred embodiment provided by the present application, as shown in Fig. 5, the device further includes: a fourth acquisition module 281, an extraction module 283, a fifth acquisition module 285, a third determining module 287 and a fourth determining module 289.
The fourth acquisition module 281 is configured to obtain recommended information.
The extraction module 283 is configured to extract the interest tags of the recommended information from its content.
The fifth acquisition module 285 is configured to obtain the historical behavior data of registered users, wherein the historical behavior data records the operations performed by registered users on recommended information.
The third determining module 287 is configured to determine the tag weights of the interest tags according to the historical behavior data.
The fourth determining module 289 is configured to determine the interest tag dictionary corresponding to a registered user according to the tag weights.
Specifically, through the fourth acquisition module 281, the extraction module 283, the fifth acquisition module 285, the third determining module 287 and the fourth determining module 289, the content of all recommended information in the information client is analyzed, and interest tags are extracted for each item of recommended information according to its content. When a registered user operates on an item of recommended information, the operation is recorded, and the interest tags corresponding to that item are weighted according to the operation, yielding the weight of each interest tag for that registered user. When a tag weight exceeds a threshold, the tag is added to the interest tag dictionary corresponding to that user.
In practical applications, the recommendation service in the information client attaches interest tags to the content it recommends, for example: content categories such as technology, football and basketball; audience categories such as tech geek, outdoor enthusiast and teenager; and content keywords such as iPhone, tank competition and Bayern Munich. These interest tags are sometimes edited manually and sometimes identified by algorithms that analyze the recommended information automatically.
Once all information that the recommendation service can recommend carries interest tags, the behavior data of registered users using the recommendation service is recorded, such as browsing content and clicking, collecting or commenting on content, and the interest tag dictionary of each user is obtained from the interest tags corresponding to the content. The interest tag dictionary describes which interest tags a user has and the weight of each. It can serve as the interest model in subsequent steps.
Specifically, the tag weight of an interest tag may be calculated as follows:
First, a weight w is set for each user action act; for example, a click scores 1 point, a view without a click scores -0.2 points, and a collection scores 5 points.
Given a sequence of user actions [act_1, act_2, ..., act_n], the interest tag vector of the user is calculated as:
$V = \sum_{i=1}^{n} T_i \cdot w_i$
where $T_i$ is the interest tag vector of the i-th user action and $w_i$ is the weight of the i-th user action.
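The weighted sum above can be sketched as follows, using the example scores from the text (click 1, view without click -0.2, collection 5). The action names, the representation of each tag vector T_i as a dictionary, and the helper `tag_dictionary` with its threshold parameter are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Example action weights taken from the description.
ACTION_WEIGHTS = {"click": 1.0, "view_no_click": -0.2, "collect": 5.0}


def interest_vector(actions: List[Tuple[str, Dict[str, float]]]) -> Dict[str, float]:
    """Compute V = sum_i T_i * w_i over a sequence of (action, tag vector) pairs."""
    vector: Dict[str, float] = defaultdict(float)
    for action, tags in actions:
        weight = ACTION_WEIGHTS[action]
        for tag, value in tags.items():
            vector[tag] += value * weight
    return dict(vector)


def tag_dictionary(vector: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only tags whose weight exceeds the threshold."""
    return {tag: w for tag, w in vector.items() if w > threshold}


actions = [
    ("click", {"technology": 1.0}),
    ("collect", {"football": 1.0}),
    ("view_no_click", {"technology": 1.0}),
]
v = interest_vector(actions)
print(tag_dictionary(v, threshold=1.0))
```

In this sketch a negative action weight lowers a tag's score, so content that is shown but ignored gradually pushes its tags out of the dictionary.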
Preferably, in a preferred embodiment provided by the present application, the first processing module 36 includes: a first sub-processing module 361, a sub-matching module 363 and a first generation module 365.
The first sub-processing module 361 is configured to screen the first concern set to obtain a third concern set corresponding to the registered user, wherein the screening method includes at least: a data screening method, an index screening method, a condition screening method and an information screening method.
The sub-matching module 363 is configured to match registered users by their third concern sets to generate registered-user sets, wherein a registered-user set includes the registered users having the same third concern set.
The first generation module 365 is configured to generate the user-set tag dictionary corresponding to a registered-user set according to the interest tag dictionaries of the registered users included in that set.
Specifically, through the first sub-processing module 361, the sub-matching module 363 and the first generation module 365, the first concern set of a registered user is screened first. The first concern set may be screened by conditions such as number of concerns and/or number of friends and/or activity level, so that inactive users and users with few friends are removed from it, generating the screened third concern set.
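The screening step can be sketched as a simple filter over the accounts in a concern set. The `Account` fields, the function name `screen_concern_set` and the cut-off values are hypothetical; the description names the kinds of conditions (follower or friend counts, activity) but not concrete thresholds.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Account:
    """A followed account with the attributes used for screening (assumed fields)."""
    name: str
    follower_count: int
    friend_count: int
    active: bool


def screen_concern_set(accounts: Iterable[Account],
                       min_followers: int = 0,
                       min_friends: int = 0,
                       require_active: bool = True) -> Set[str]:
    """Drop inactive accounts and accounts below the follower/friend cut-offs."""
    return {
        account.name
        for account in accounts
        if account.follower_count >= min_followers
        and account.friend_count >= min_friends
        and (account.active or not require_active)
    }


first_concern_set = [
    Account("Lei Jun", follower_count=10_000_000, friend_count=500, active=True),
    Account("dormant_account", follower_count=2_000_000, friend_count=300, active=False),
    Account("small_account", follower_count=12, friend_count=3, active=True),
]
print(screen_concern_set(first_concern_set, min_followers=1000, min_friends=10))
```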
After screening, registered users are matched by their third concern sets: registered users whose third concern sets match to a degree exceeding a preset threshold, or whose third concern sets are identical, are placed in the same registered-user set. Depending on how the contents of the third concern sets differ, there may be many registered-user sets. Third concern sets may also be defined manually, in which case registered users are grouped into different registered-user sets according to the manual definitions.
The user-set tag dictionary corresponding to the current registered-user set is generated according to the content of the interest tag dictionary of each registered user in that set.
Taking a microblog as the social platform, Fig. 3 is a flow diagram of generating registered-user sets by matching the registered users' concern sets on the microblog.
The concern list of a registered user is obtained and, using follower count as the screening condition, accounts with few followers are filtered out of the list. The third concern set is generated from the screened concern list. For a microblog, third concern sets may also be defined manually. For example, specific microblog users may be divided by category: users in the computer and internet field, such as Li Kaifu, Lei Jun, Zhou Hongyi and Li Yanhong, may form one third concern set; users in the entertainment and media field, such as He Jiong, Xie Na and Dai Jun, may form another; and users in the sports field, such as Wei Kexing, Li Na and Liu Xiang, may form another.
Registered users are then classified according to the third concern sets, and registered users having a common third concern set are placed in one registered-user set, so as to obtain groups of users with similar interests.
Preferably, in a preferred embodiment provided by the present application, the first generation module 365 includes: a first sub-acquisition module 3651, a first sub-calculation module 3652, a second sub-calculation module 3653, a third sub-calculation module 3654 and a sub-judgment module 3655.
The first sub-acquisition module 3651 is configured to obtain a first user quantity, which is the number of registered users in the information client, and a second user quantity, which is the number of registered users in the registered-user set.
The first sub-calculation module 3652 is configured to calculate the weight distribution average of each interest tag according to the tag weights and the first user quantity.
The second sub-calculation module 3653 is configured to calculate, according to the tag weights of the registered users in the registered-user set and the second user quantity, the set weighted mean of each interest tag in the user-set interest tag dictionary.
The third sub-calculation module 3654 is configured to calculate the registered-user-set weight of each interest tag in the user-set interest tag dictionary according to the weight distribution average and the set weighted mean.
The sub-judgment module 3655 is configured to compare, in turn, the registered-user-set weight of each interest tag in the user-set interest tag dictionary with a preset noise threshold.
When the registered-user-set weight of an interest tag in the user-set interest tag dictionary is greater than the preset noise threshold, the interest tag corresponding to that weight is retained in the user-set tag dictionary.
When the registered-user-set weight of an interest tag in the user-set interest tag dictionary is less than or equal to the preset noise threshold, the interest tag corresponding to that weight is deleted from the user-set tag dictionary.
Specifically, the first generation module 365, comprising the first sub-acquisition module 3651, the first sub-calculation module 3652, the second sub-calculation module 3653, the third sub-calculation module 3654 and the sub-judgment module 3655, is applied in practice as follows. Taking a microblog as the social platform, once a user group with similar interests has been found, the individual interest tag dictionaries of these users can be merged to obtain a group interest model. The simplest method is to add the users' tag vectors directly. In practice, however, this result is found to be very noisy: big microblog accounts in some fields have very many followers, and many people follow such an account only because it is famous, so the follow itself does not reflect their own interests. If the interest tag vectors of these users are simply summed, the meaningful signal is easily drowned out by common interests. As an example from an actual experiment, when the microblog users who follow Wang Xing (founder of Meituan) were analyzed, the interest tags with the greatest weight turned out to be not "internet" and "O2O" but "entertainment" and "social news". This is because "entertainment" and "social news" are universal interest tags, and many users with these two tags follow Wang Xing because he is the founder of Meituan without actually paying much attention to "internet" or "O2O". If all of these users are considered without distinction, "entertainment" and "social news" end up weighted higher than "internet" and "O2O".
How to remove the background noise is the core technique for mining group interest effectively. In practice, the weight distribution average over all in-site registered users is computed first:
$V_{base} = \frac{1}{N} \sum_{n=1}^{N} V_n$
where N is the number of all registered users and $V_n$ is the interest tag weight distribution of the n-th user.
From this formula, the average weight $V_{base}[i]$ of all users on interest tag i is obtained.
Then, for a registered-user set that shares a given condition in its concern relationships (for example, on the microblog, the set of registered users whose concern sets include "Wang Xing"), given the group interest tag vector V of that registered-user set, the noise-removed registered-user-set weight V′ is obtained for each tag:
$V'[i] = V[i] / V_{base}[i]$
where V′[i] is the registered-user-set weight of interest tag i, V[i] is the set weighted mean of interest tag i, and $V_{base}[i]$ is the average weight of all users on interest tag i.
The registered-user-set weight V′ is compared with the preset noise threshold. When V′ is less than the noise threshold, the interest tag is deemed a noise tag and is removed from the current user-set tag dictionary; when V′ is greater than or equal to the noise threshold, the interest tag is deemed a non-noise tag and is retained in the current user-set tag dictionary.
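The noise removal can be sketched as follows: a baseline average over all registered users, a mean over the members of one registered-user set, and the ratio of the two compared against a noise threshold. The function names, the sample vectors and the threshold of 1.5 are hypothetical, and tags with a non-positive baseline are skipped as an assumption, since the description does not cover that case.

```python
from collections import defaultdict
from typing import Dict, List


def mean_vector(vectors: List[Dict[str, float]]) -> Dict[str, float]:
    """Average tag weight over a list of user vectors (a missing tag counts as 0)."""
    totals: Dict[str, float] = defaultdict(float)
    for vector in vectors:
        for tag, weight in vector.items():
            totals[tag] += weight
    return {tag: total / len(vectors) for tag, total in totals.items()}


def denoise(set_mean: Dict[str, float],
            base: Dict[str, float],
            noise_threshold: float) -> Dict[str, float]:
    """Return V'[i] = V[i] / V_base[i] for tags at or above the noise threshold."""
    kept = {}
    for tag, value in set_mean.items():
        baseline = base.get(tag, 0.0)
        if baseline <= 0:
            continue  # assumption: no baseline means the ratio is undefined
        ratio = value / baseline
        if ratio >= noise_threshold:
            kept[tag] = ratio
    return kept


# Hypothetical data: everyone likes "entertainment"; only the first two
# users, who follow the same account, also have "internet".
all_users = [
    {"entertainment": 4.0, "internet": 4.0},
    {"entertainment": 4.0, "internet": 2.0},
    {"entertainment": 4.0},
    {"entertainment": 4.0},
]
base = mean_vector(all_users)
group = mean_vector(all_users[:2])
print(denoise(group, base, noise_threshold=1.5))
```

In this sample, "entertainment" has a ratio of 1.0 because the group is no more interested in it than the site as a whole, so it is dropped, while "internet" is kept.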
Preferably, in a preferred embodiment provided by the present application, the second processing module 42 includes: a second sub-processing module 421, a first sub-determining module 423 and a second sub-determining module 425.
The second sub-processing module 421 is configured to screen the second concern set to obtain a fourth concern set corresponding to the newly registered user, wherein the screening method includes at least: a data screening method, an index screening method, a condition screening method and an information screening method.
The first sub-determining module 423 is configured to match the fourth concern set against the third concern sets to determine the registered-user set corresponding to the newly registered user.
The second sub-determining module 425 is configured to determine the recommended interest tags of the newly registered user according to the user-set tag dictionary of the registered-user set corresponding to the newly registered user.
Specifically, through the second sub-processing module 421, the first sub-determining module 423 and the second sub-determining module 425, the second concern set of the newly registered user is screened first. The second concern set may be screened by conditions such as number of concerns and/or number of friends and/or activity level, so that inactive users and users with few friends are removed from it, generating the screened fourth concern set. The screening method may be the same as the one used in step 171, or another screening method may be used. The screening method is not limited, provided that it serves to optimize the second concern set.
The fourth concern set is then matched against each third concern set. When the degree of matching between the fourth concern set of the newly registered user and a third concern set exceeds a preset threshold, or when the two sets are identical, the newly registered user is determined to match that third concern set, which in turn determines the registered-user set to which the newly registered user belongs.
The tags to be recommended to the new user are determined according to the user-set tag dictionary of the registered-user set to which the newly registered user belongs.
In practical applications, once the group interest model of a user group whose interests are similar to those of the newly registered user has been mined, the group interest model and the user's individual interest model can be fused according to certain weights, and content is then recommended according to the fused interest model. Specifically, given a fused interest model (an interest tag vector), the highest-quality content under each tag can be recommended in proportion to the weight of that interest tag.
It should be noted that, for a new user, no in-site behavior data is available, so an individual interest model cannot be obtained. However, if the new user logs in to the information client with a social platform account, the social relationships of the newly registered user on the social platform can be obtained, the in-site user group with similar interests can be mined, and content can be recommended to the user with that group's interest model, so that information is recommended in a targeted manner. In practice, this approach performs better than recommending at random or recommending the most popular content.
Preferably, in a preferred embodiment provided by the present application, as shown in Fig. 6, the device further includes a pushing module 43.
The pushing module 43 is configured to push recommended information to the newly registered user according to the recommended interest tags.
Specifically, through the pushing module 43, recommended information matching the interest tags determined for the newly registered user in the above steps is pushed to the newly registered user.
As can be seen from the technical solution, the present invention effectively combines the public data of the social network with the private data of the recommendation service to make recommendations for the user. Compared with using only the social network public data or only the recommendation service private data, fusing the two kinds of data helps to recommend personalized content more accurately. Moreover, the fusion method proposed by the present invention can also apply the fusion of the two kinds of data to new users (based on in-site data mining, in-site user interest models are transferred through the social network to newly registered users and to off-site users), an effect that conventional methods cannot achieve.
One feature of the present invention is that the more users a recommendation service provider has, the better the method works. The user base of such a provider covers a larger share of the social network's user base, so it is less likely that, for a given social account, most of the friends or followers are not in-site users and no group interest can be mined. For a product such as Jinri Toutiao, with users on the order of hundreds of millions, this is a significant competitive advantage, while for smaller recommendation products it is a technical barrier.
It should be noted that, for brevity, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, may each exist physically on their own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a mobile terminal, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.