CN108805598A

CN108805598A - Similarity information determination method, server, and computer-readable storage medium

Info

Publication number: CN108805598A
Application number: CN201710313158.8A
Authority: CN
Inventors: 姚伶伶; 项则远; 陈骥远; 王芊; 郭永; 何琪
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2017-05-05
Filing date: 2017-05-05
Publication date: 2018-11-13
Anticipated expiration: 2037-05-05
Also published as: CN108805598B

Abstract

An embodiment of the present invention discloses a similarity information determination method, including: obtaining first historical data corresponding to a first user and second historical data corresponding to a second user; if first similarity information between the first historical data and the second historical data is less than or equal to a preset threshold, obtaining first user characteristic information of the first user and second user characteristic information of the second user; and determining second similarity information between the first user and the second user from the first user characteristic information and the second user characteristic information by using a preset similarity training model, where the preset similarity training model is a similarity function relationship model trained from sample users and sample user characteristic information. The present invention also provides a server and a computer-readable storage medium. With the preset similarity training model, the embodiments can compute more reliable similarity information, and the server makes recommendations according to that similarity information, which improves the reliability of the recommendations.

Description

Similarity information determination method, server, and computer-readable storage medium

Technical field

The present invention relates to the field of Internet communications, and in particular to a similarity information determination method and a server.

Background art

With the continuous development of Internet technology, more and more Internet platforms use personalized recommendation to meet the needs of different users, for example by recommending commodities or tourist attractions that a user may be interested in.

At present, collaborative filtering is the most widely used recommendation approach on Internet platforms. The specific recommendation approach is shown in Fig. 1, a schematic diagram of collaborative filtering recommendation in the prior art: assuming that user A and user B have each purchased the same two commodities and user B has also purchased cola, the Internet platform will push cola to user A as well.

However, in practical applications, because the numbers of users and commodities are both very large, two users may have selected entirely different commodities, in which case the similarity between the two users is zero. The Internet platform then has difficulty capturing user preferences accurately from the similarity between users, which makes the recommendation results unreliable.

Summary of the invention

Embodiments of the present invention provide a similarity information determination method, a server, and a computer-readable storage medium. When the first similarity information between users is less than or equal to a preset threshold, a preset similarity training model can additionally be used to compute new, second similarity information, and the second similarity information serves as a key factor for capturing user preferences, thereby improving the reliability of recommendations.

In view of this, a first aspect of the present invention provides a similarity information determination method, including:

obtaining first historical data corresponding to a first user and second historical data corresponding to a second user;

if first similarity information between the first historical data and the second historical data is less than or equal to a preset threshold, obtaining first user characteristic information corresponding to the first user and second user characteristic information corresponding to the second user; and

determining, according to the first user characteristic information and the second user characteristic information, second similarity information between the first user and the second user by using a preset similarity training model, where the preset similarity training model is a similarity function relationship model trained from sample users and sample user characteristic information.

A second aspect of the present invention provides a server, including:

a first acquisition module, configured to obtain first historical data corresponding to a first user and second historical data corresponding to a second user;

a second acquisition module, configured to obtain, if first similarity information between the first historical data and the second historical data obtained by the first acquisition module is less than or equal to a preset threshold, first user characteristic information corresponding to the first user and second user characteristic information corresponding to the second user; and

a first determining module, configured to determine, according to the first user characteristic information and the second user characteristic information obtained by the second acquisition module, second similarity information between the first user and the second user by using a preset similarity training model, where the preset similarity training model is a similarity function relationship model trained from sample users and sample user characteristic information.

A third aspect of the present invention provides a server, including a memory, a processor, and a bus system;

where the memory is configured to store a program;

the processor is configured to execute the program in the memory, including the following step:

determining, according to the first user characteristic information and the second user characteristic information, second similarity information between the first user and the second user by using a preset similarity training model, where the preset similarity training model is a similarity function relationship model trained from sample users and sample user characteristic information;

and the bus system is configured to connect the memory and the processor so that the memory and the processor can communicate.

A fourth aspect of the present invention provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the methods described in the above aspects.

As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:

The embodiments of the present invention provide a similarity information determination method. The server first obtains first historical data corresponding to a first user and second historical data corresponding to a second user. If the first similarity information between the first historical data and the second historical data is less than or equal to a preset threshold, the server obtains first user characteristic information corresponding to the first user and second user characteristic information corresponding to the second user. The server then determines, according to the first user characteristic information and the second user characteristic information, second similarity information between the first user and the second user by using a preset similarity training model, where the preset similarity training model is a similarity function relationship model trained from sample users and sample user characteristic information. In this way, if the first similarity information between users is less than or equal to the preset threshold, the preset similarity training model can additionally be used to compute new, second similarity information, and the second similarity information serves as a key factor for capturing user preferences, thereby improving the reliability of recommendations.

Brief description of the drawings

Fig. 1 is a schematic diagram of collaborative filtering recommendation in the prior art;

Fig. 2 is a topological diagram of obtaining similarity information in an embodiment of the present invention;

Fig. 3 is a schematic diagram of one embodiment of the similarity information determination method in the embodiments of the present invention;

Fig. 4 is a schematic diagram of a collaborative recommendation interface in an embodiment of the present invention;

Fig. 5 is a schematic diagram of one embodiment of the server in the embodiments of the present invention;

Fig. 6 is a schematic diagram of another embodiment of the server in the embodiments of the present invention;

Fig. 7 is a schematic diagram of another embodiment of the server in the embodiments of the present invention;

Fig. 8 is a schematic diagram of another embodiment of the server in the embodiments of the present invention;

Fig. 9 is a schematic diagram of another embodiment of the server in the embodiments of the present invention;

Fig. 10 is a schematic diagram of another embodiment of the server in the embodiments of the present invention;

Fig. 11 is a schematic diagram of another embodiment of the server in the embodiments of the present invention;

Fig. 12 is a schematic structural diagram of the server in an embodiment of the present invention.

Detailed description

The terms "first", "second", "third", "fourth", and the like (if present) in the specification, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described herein can, for example, be implemented in an order other than that illustrated or described herein. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.

It should be understood that the present invention can be applied to scenarios in which a server recommends information according to the similarity between users, for example dynamically changing advertisement content according to differences in user interests. In some social applications, to improve advertising effectiveness, the server can upload users' commodity access behavior on its website, together with the commodity library, to an advertising platform. The advertising platform recommends commodities of interest to a user according to the user's commodity access behavior on the advertiser's website, and finally assembles the advertisement content and presents it to the user.

Under normal conditions, the server can collect more user data and use it to determine several associated users who are similar to the target user. Referring to Fig. 2, a topological diagram of obtaining similarity information in an embodiment of the present invention: as shown, assume that the target user is user Jia and that the server matches user Yi as a user associated with user Jia, where a similarity statistics algorithm is used to find neighboring users with similar hobbies or interests, which completes the matching of associated users. User Jia bought a selfie stick and a camera in the past month, while user Yi bought tea and biscuits in the past month. In this case the first similarity information between user Jia and user Yi is 0, and the server therefore cannot make a commodity recommendation from this pairing. The similarity determination method provided by this solution therefore combines multiple indicators of user Jia and user Yi, such as gender, age, occupation, keywords, and friend relationships, and uses the preset similarity training model to compute the second similarity information between the two.

Of course, in practical applications, user Yi may not be the only user associated with user Jia; there may also be, for example, user Bing and user Ding. The second similarity information between each of these users and user Jia can then be computed separately, the user with the largest second similarity information is selected as the recommending user, and the commodities that the recommending user has purchased are pushed to user Jia.

The similarity information determination method of the present invention is described below from the perspective of the server. Referring to Fig. 3, one embodiment of the similarity information determination method in the embodiments of the present invention includes:

101. Obtain first historical data corresponding to a first user and second historical data corresponding to a second user;

In this embodiment, the server first obtains the first historical data corresponding to the first user and the second historical data corresponding to the second user. The first user is the target user, that is, the user who is to receive pushed information, and the second user is a user that the server has associated with the first user according to similarity statistics. In practical applications there may be one second user or several, which is not limited here.

The first historical data may be a statistical result for one attribute of the first user over a past period, for example statistics on the categories of commodities that the first user bought on website A within one month. Similarly, the second historical data may be a statistical result for one attribute of the second user over a past period, for example statistics on the categories of commodities that the second user bought on website A within one month.

102. If the first similarity information between the first historical data and the second historical data is less than or equal to a preset threshold, obtain first user characteristic information corresponding to the first user and second user characteristic information corresponding to the second user;

In this embodiment, the first similarity information can be obtained by comparing the overlap rate of the first historical data and the second historical data. The first similarity information is usually expressed as a percentage, but it may also be expressed as a decimal, which is not limited here.

If the first similarity information is less than or equal to the preset threshold, the first user and the second user are considered to have low similarity. The server then further obtains the first user characteristic information corresponding to the first user and the second user characteristic information corresponding to the second user. User characteristic information differs from historical data: it brings in more information associated with the user, such as the user's basic attributes or behavioral habits.

103. According to the first user characteristic information and the second user characteristic information, determine second similarity information between the first user and the second user by using a preset similarity training model, where the preset similarity training model is a similarity function relationship model trained from sample users and sample user characteristic information.

In this embodiment, after obtaining the first user characteristic information and the second user characteristic information, the server inputs both into the preset similarity training model. Because the preset similarity training model is a similarity function relationship model trained from sample users and sample user characteristic information, the second similarity information between the first user and the second user can be computed with this model.
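Steps 101 to 103 can be sketched as a threshold-gated fallback. The following is a minimal illustration, not the patented implementation: the function names, the `toy_model` stand-in, and the choice to return the first similarity unchanged when it exceeds the threshold are all assumptions made for the example.

```python
from typing import Callable, Sequence, Tuple

FeatureVector = Sequence[float]
SimilarityModel = Callable[[FeatureVector, FeatureVector], float]


def second_similarity_if_needed(
    first_similarity: float,
    first_features: FeatureVector,
    second_features: FeatureVector,
    model: SimilarityModel,
    threshold: float,
) -> Tuple[float, str]:
    """Return (similarity, source) following the gate described in step 102."""
    if first_similarity <= threshold:
        # History-based similarity is too low: fall back to the trained model.
        return model(first_features, second_features), "model"
    # Assumption: above the threshold, the history-based value is used as is.
    return first_similarity, "history"


def toy_model(x: FeatureVector, y: FeatureVector) -> float:
    """Hypothetical stand-in for the trained model: share of matching features."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


# First similarity is 0, so the model decides.
result = second_similarity_if_needed(0.0, [1, 30, 0], [1, 30, 1], toy_model, threshold=0.0)
```

Passing the model in as a callable keeps the gate independent of how the preset similarity training model is actually trained.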

Under normal conditions, the server can separately compute the second similarity information between the first user and several different second users. Table 1 lists the second similarity information between the first user and each second user.

Table 1

First user	Second user	Second similarity information
User A	User a	26.7%
User A	User b	69.2%
User A	User c	11.9%
User A	User d	7.0%
User A	User e	49.3%

As Table 1 shows, the second similarity information between user A and user b is the largest. The server can therefore recommend to the first user (user A) the commodities that the second user (user b) has bought, the videos that user b has watched, or other information.
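Selecting the recommending user from Table 1 amounts to taking the maximum over the second similarity values. A short sketch using the figures from Table 1 (the function and variable names are illustrative assumptions):

```python
# Second similarity information of user A against each second user (Table 1).
second_similarity = {
    "User a": 0.267,
    "User b": 0.692,
    "User c": 0.119,
    "User d": 0.070,
    "User e": 0.493,
}


def pick_recommending_user(scores):
    """Return the second user with the largest second similarity."""
    return max(scores, key=scores.get)


best = pick_recommending_user(second_similarity)

# Full ranking, highest similarity first, in case several users are used.
ranked = sorted(second_similarity.items(), key=lambda kv: kv[1], reverse=True)
```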

Specifically, assume that the preset similarity training model is Ax+By=C, where A is a coefficient obtained by training, B is another coefficient obtained by training, x is the first user characteristic information, y is the second user characteristic information, and C is the second similarity information. This example is only illustrative; in practical applications the preset similarity training model may take other forms.

Optionally, on the basis of the embodiment corresponding to Fig. 3 above, in a first optional embodiment of the similarity information determination method provided by the embodiments of the present invention, before the second similarity information between the first user and the second user is determined by using the preset similarity training model, the method may further include:

obtaining a sample user set, a sample user historical data set, and a sample user characteristic information set, where the sample user set includes multiple sample users, the sample user historical data set includes the historical data corresponding to each sample user, and the sample user characteristic information set includes the sample user characteristic information corresponding to each sample user;

determining positive-example similarity samples and negative-example similarity samples according to the sample user set and the sample user historical data set; and

training on the positive-example similarity samples, the negative-example similarity samples, and the sample user characteristic information set to obtain the preset similarity training model.

In this embodiment, before the similarity information between users is determined by using the preset similarity training model, the server first needs to train the preset similarity training model.

Specifically, the server can first obtain the sample user set, the sample user historical data set, and the sample user characteristic information set from a local database or a cloud database; all of these sets are used for model learning and training. From the collected sample user set and sample user historical data set, the server can then build a user-item (User-Item) rating matrix. Table 2 shows one such User-Item rating matrix.

Table 2

User-Item	Item1	Item1	Item1	Item1
User 1	1	0	0	1
User j	1	0	0	1
User j+1	0	1	1	0

In the User-Item rating matrix given in Table 2, the cosine similarity of the historical data can be computed for any two users in the sample user set. The server selects positive-example similarity samples and negative-example similarity samples according to the cosine similarity, and trains on these two classes of samples together with the sample user characteristic information set to obtain the preset similarity training model.
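A User-Item rating matrix of the kind shown in Table 2 can be derived from per-user browsing histories. The sketch below is one possible construction; the item identifiers and user keys are hypothetical placeholders chosen so that the rows reproduce the 0/1 patterns of Table 2.

```python
def build_user_item_matrix(histories):
    """Map each user to a 0/1 row over the sorted set of all items seen."""
    items = sorted({item for browsed in histories.values() for item in browsed})
    matrix = {
        user: [1 if item in browsed else 0 for item in items]
        for user, browsed in histories.items()
    }
    return items, matrix


# Hypothetical browsing histories (item names are placeholders).
histories = {
    "user_1": {"item_a", "item_d"},
    "user_j": {"item_a", "item_d"},
    "user_j_plus_1": {"item_b", "item_c"},
}

items, matrix = build_user_item_matrix(histories)
# matrix["user_1"] is [1, 0, 0, 1]; matrix["user_j_plus_1"] is [0, 1, 1, 0]
```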

Secondly, in the embodiments of the present invention, the server needs to build the preset similarity training model before using it. That is, the server first obtains the sample user set, the sample user historical data set, and the sample user characteristic information set, then determines positive-example similarity samples and negative-example similarity samples according to the sample user set and the sample user historical data set, and finally trains on the positive-example similarity samples, the negative-example similarity samples, and the sample user characteristic information set to obtain the preset similarity training model. This describes how the server trains the preset similarity training model. On the one hand, it improves the practicability and feasibility of the solution; on the other hand, training on positive-example and negative-example similarity samples together also improves the accuracy of the model.
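The training step can be illustrated with a small logistic regression, which a later embodiment names as one possible basis for the model. This is a sketch under stated assumptions, not the patent's training procedure: each sample concatenates the two users' encoded features, positive pairs are labeled 1 and negative pairs 0, the feature values are fabricated toy numbers, and plain batch gradient descent is used for simplicity.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(samples, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for k, v in enumerate(x):
                grad_w[k] += error * v
            grad_b += error
        weights = [w - learning_rate * g / m for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / m
    return weights, bias


def predict_similarity(weights, bias, pair_features):
    """Score a user pair; the output lies strictly between 0 and 1."""
    return sigmoid(sum(w * v for w, v in zip(weights, pair_features)) + bias)


# Toy pair features [x, y]: one scaled feature per user (fabricated values).
pair_samples = [[1.0, 1.0], [0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
pair_labels = [1, 1, 1, 0, 0, 0]  # 1 = positive example, 0 = negative example

weights, bias = train_logistic_regression(pair_samples, pair_labels)
```

A production system would more likely rely on an established machine-learning library; the hand-written loop only keeps the example self-contained.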

Optionally, on the basis of the first optional embodiment corresponding to Fig. 3 above, in a second optional embodiment of the similarity information determination method provided by the embodiments of the present invention, determining positive-example similarity samples and negative-example similarity samples according to the sample user set and the sample user historical data set may include:

computing, according to the sample user historical data set, third similarity information between every two sample users in the sample user set;

if the third similarity information is not 0, determining that the two sample users form a positive-example similarity sample; and

if the third similarity information is 0, determining that the two sample users form a negative-example similarity sample.

In this embodiment, the User-Item rating matrix in Table 2 above can be used to compute the third similarity information between every two sample users in the sample user set. The third similarity information may be the cosine similarity, the Pearson correlation coefficient (PCC), or the Jaccard similarity coefficient (Jaccard Coefficient, JC). This embodiment is described using cosine similarity as the third similarity information, but this should not be construed as a limitation on the solution.
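For reference, the two alternative measures named above have the following standard definitions, written here for two users' item vectors $A$ and $B$ of dimension $n$ (with $\bar{A}$ and $\bar{B}$ their mean values) and, for the Jaccard coefficient, the corresponding item sets $S_A$ and $S_B$. These symbols are introduced for this illustration only.

```latex
\mathrm{PCC}(A, B) =
  \frac{\sum_{i=1}^{n} (A_i - \bar{A})(B_i - \bar{B})}
       {\sqrt{\sum_{i=1}^{n} (A_i - \bar{A})^2}\,
        \sqrt{\sum_{i=1}^{n} (B_i - \bar{B})^2}}
\qquad
\mathrm{JC}(S_A, S_B) = \frac{|S_A \cap S_B|}{|S_A \cup S_B|}
```

For 0/1 vectors with at least one nonzero entry each, the Jaccard coefficient is 0 exactly when the cosine similarity is 0, so the zero/non-zero labeling rule gives the same split under either measure; the Pearson coefficient does not share this property in general.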

Specifically, referring to Table 2, if user j has browsed a commodity i+1, the entry r_{j,i+1} of the User-Item rating matrix is set to 1. Assuming that any two sample users are user j and user j+1, the following formula can be used to compute the cosine similarity between the two users.

$$\text{cosine\_similarity} = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$

Here, cosine_similarity denotes the cosine similarity, A denotes the vector of commodities browsed by user j, B denotes the vector of commodities browsed by user j+1, and the dimension of the commodity vectors is n. When the cosine similarity is 0, user j and user j+1 can be taken as a negative-example similarity sample; when the cosine similarity is not 0, user j and user j+1 can be taken as a positive-example similarity sample.
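The labeling rule can be worked through on the three rows of Table 2. The sketch below computes the cosine similarity for every pair of users and sorts the pairs into positive and negative examples; the function names and the treatment of an all-zero row (similarity 0) are assumptions of this example.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a user with no history is similar to nobody
    return dot / (norm_a * norm_b)


def label_pairs(matrix):
    """Split all user pairs into positive (similarity != 0) and negative (== 0)."""
    positives, negatives = [], []
    for (u, a), (v, b) in combinations(matrix.items(), 2):
        if cosine_similarity(a, b) != 0:
            positives.append((u, v))
        else:
            negatives.append((u, v))
    return positives, negatives


# Rows of Table 2.
rating_matrix = {
    "User 1": [1, 0, 0, 1],
    "User j": [1, 0, 0, 1],
    "User j+1": [0, 1, 1, 0],
}

positives, negatives = label_pairs(rating_matrix)
```

Here User 1 and User j browsed the same items and form the only positive pair, while User j+1 shares no item with either of them.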

Again, in the embodiments of the present invention, when obtaining the positive-example similarity samples and negative-example similarity samples, the server computes the third similarity information between every two sample users in the sample user set according to the sample user historical data set, and then determines the two classes of samples from the computed third similarity information. In this way the samples can be classified more accurately, so that the trained preset similarity training model better matches actual conditions, which improves the operability of the solution.

Optionally, on the basis of the embodiment corresponding to Fig. 3 above, in a third optional embodiment of the similarity information determination method provided by the embodiments of the present invention, after the first historical data corresponding to the first user and the second historical data corresponding to the second user are obtained, the method may further include:

comparing the first historical data with the second historical data to obtain a comparison result; and

determining the first similarity information between the first historical data and the second historical data according to the comparison result.

In this embodiment, the server determines the first similarity information by comparing the first historical data with the second historical data.

Specifically, similarly to the second optional embodiment corresponding to Fig. 3 above, the cosine similarity between the first historical data and the second historical data can be computed and used as the first similarity information; the calculation is not repeated here.

Alternatively, the overlap rate of the first historical data and the second historical data can be compared directly. For example, if user Jia has purchased commodity A, commodity B, commodity C, and commodity D, and user Yi has purchased commodity A, commodity D, commodity E, and commodity F, then two of the four commodities purchased by each user (commodity A and commodity D) coincide, so the overlap rate of the commodities of user Jia and user Yi is 50%, that is, the first similarity information is 50%.
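The purchase example can be checked numerically. The text does not define the overlap rate precisely, so the sketch below assumes it is the number of shared commodities divided by the size of the larger purchase set; the cosine similarity of the corresponding 0/1 vectors is shown alongside for comparison. Function and variable names are illustrative.

```python
import math


def overlap_rate(first, second):
    """Shared items over the larger history size (assumed definition)."""
    if not first or not second:
        return 0.0
    return len(first & second) / max(len(first), len(second))


def set_cosine(first, second):
    """Cosine similarity of the 0/1 indicator vectors of two item sets."""
    if not first or not second:
        return 0.0
    return len(first & second) / math.sqrt(len(first) * len(second))


jia_purchases = {"A", "B", "C", "D"}
yi_purchases = {"A", "D", "E", "F"}

rate = overlap_rate(jia_purchases, yi_purchases)   # 2 shared of 4 -> 0.5
cosine = set_cosine(jia_purchases, yi_purchases)   # 2 / sqrt(4 * 4) -> 0.5
```

When both users bought the same number of commodities, as here, the two measures agree; they diverge once the purchase sets differ in size.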

Secondly, in the embodiments of the present invention, after obtaining the first historical data corresponding to the first user and the second historical data corresponding to the second user, the server can also compare the first historical data with the second historical data to obtain a comparison result, and then determine the first similarity information between the first historical data and the second historical data according to the comparison result. This provides a concrete basis for computing the first similarity information between users, which improves the practicability and feasibility of the solution.

Optionally, on the basis of the embodiment corresponding to Fig. 3 above, in a fourth optional embodiment of the similarity information determination method provided by the embodiments of the present invention, the first user characteristic information may include at least one of first user attribute information, first interest characteristic information, and first application interaction information, and the second user characteristic information includes at least one of second user attribute information, second interest characteristic information, and second application interaction information.

In this embodiment, in addition to historical data (such as commodity browsing information), user characteristic information can also be collected for a user. User characteristic information mainly includes user attribute information, interest characteristic information, and application interaction information, each of which is described below.

User attribute information may include, but is not limited to, the user's age, gender, education, place of origin, permanent residence, marital status, income level, and occupation.

Interest characteristic information may include, but is not limited to, the user's browsing records obtained through a browser, search records obtained through a search engine, or keywords, categories, and topics extracted from records in social applications.

Application interaction information may include, but is not limited to, business districts the user frequently visits, application installation and purchase behavior, social interaction behavior (including friend relationships), advertisement browsing or click behavior, and Tenpay or WeChat Pay payment behavior.

Secondly, the embodiments of the present invention describe the user characteristic information corresponding to a user: the first user characteristic information may include at least one of first user attribute information, first interest characteristic information, and first application interaction information, and the second user characteristic information includes at least one of second user attribute information, second interest characteristic information, and second application interaction information. In this way, the influence of many kinds of user characteristic information on similarity can be taken into account. Even for a new user with no historical data, the similarity information can still be computed from the user's own characteristic information, which improves the practicability of the solution.

Optionally, on the basis of the fourth optional embodiment corresponding to Fig. 3 above, in a fifth optional embodiment of the similarity information determination method provided by the embodiments of the present invention, the first user characteristic information is first user attribute information, and the second user characteristic information is second user attribute information;

determining, according to the first user characteristic information and the second user characteristic information, the second similarity information between the first user and the second user by using the preset similarity training model may include:

inputting the first user attribute information and the second user attribute information into the preset similarity training model, where the first user attribute information and the second user attribute information indicate basic state information of the users, and the preset similarity training model is a similarity function relationship model trained from sample users and the user attribute information of the sample users; and

determining the second similarity information according to the result output by the preset similarity training model.

In this embodiment, the preset similarity training model can be obtained based on logistic regression (LR). Assume that the currently trained preset similarity training model is a functional relationship between user attribute information and similarity information; the similarity between users can then be computed using, for example, the following possible functional relationship.

Similarity=Ax₁+By₁+Cx₂+Dy₂+Ex₃+Fy₃ (3)

Here, Similarity denotes the similarity information, and A, B, C, D, E, and F denote the coefficients computed by logistic regression. x₁ can denote the age in the first user attribute information of user Jia, and y₁ the age in the second user attribute information of user Yi; x₂ can denote the income level in the first user attribute information of user Jia, and y₂ the income level in the second user attribute information of user Yi; x₃ can denote the marital status in the first user attribute information of user Jia (for example, "1" for married and "0" for unmarried), and y₃ the marital status in the second user attribute information of user Yi. The above example is only illustrative and should not be construed as a limitation on the present invention.
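Formula (3) can be evaluated directly once the coefficients are known. The sketch below does so for one pair of users; the coefficient values, the attribute encodings (age in years, income as a level index, marital status as 0/1), and all names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def linear_similarity(coefficients, first_user, second_user):
    """Evaluate A*x1 + B*y1 + C*x2 + D*y2 + E*x3 + F*y3 as in formula (3).

    coefficients holds one (first-user, second-user) coefficient pair per
    attribute, i.e. [(A, B), (C, D), (E, F)].
    """
    return sum(
        cx * x + cy * y
        for (cx, cy), x, y in zip(coefficients, first_user, second_user)
    )


# Hypothetical trained coefficients (A, B), (C, D), (E, F).
coefficients = [(0.01, 0.01), (0.1, 0.1), (0.2, 0.2)]

# Hypothetical encodings: [age, income level, marital status].
user_jia = [30, 2, 1]
user_yi = [28, 3, 1]

score = linear_similarity(coefficients, user_jia, user_yi)
# 0.01*30 + 0.01*28 + 0.1*2 + 0.1*3 + 0.2*1 + 0.2*1 = 1.48
```

In practice the raw attributes would likely need scaling before training so that no single attribute, such as age, dominates the sum.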

Again, in the embodiments of the present invention, the server can use the preset similarity training model and the user attribute information to compute the second similarity information between the first user and the second user. This provides a feasible way to implement the solution and addresses the case in which the similarity between users is low: multiple indicators in the user attribute information are taken into account to obtain a reasonable result, which improves the reliability of the solution.

Optionally, on the basis of above-mentioned Fig. 3 corresponding 4th embodiment, similarity letter provided in an embodiment of the present invention It ceases in determining the 6th alternative embodiment of method, the first user's characteristic information is the first interest characteristics information, and second user is special Reference breath is the second interest characteristics information；

According to the first user's characteristic information and second user characteristic information, is determined using preset similarity training pattern The second similarity information between one user and second user may include：

By the first interest characteristics information and the second interest characteristics information input to preset similarity training pattern, wherein First interest characteristics information and the obtained key message of historical record that the second interest characteristics information is according to user, preset phase It is the similarity function relationship mould trained according to the interest characteristics information of sample of users and sample of users like degree training pattern Type；

In the present embodiment, can logic-based return to obtain preset similarity training pattern.Assuming that current training obtain it is pre- Set functional relation of the similarity training pattern between interest characteristics information and similarity information, you can to use following one kind can Can functional relation calculate the similarity between user.

Similarity=Gx₄+Hy₄+Ix₅+Jy₅+Kx₆+Ly₆ (4)

Here, Similarity denotes the similarity information, and G, H, I, J, K and L denote the coefficients obtained by logistic regression. x₄ may denote the search records in QQ Browser in the first interest characteristic information of user A, and y₄ the search records in QQ Browser in the second interest characteristic information of user B; x₅ may denote the article browsing records on web pages in the first interest characteristic information of user A, and y₅ the article browsing records on web pages in the second interest characteristic information of user B; x₆ may denote the records forwarded in WeChat Moments in the first interest characteristic information of user A, and y₆ the records forwarded in WeChat Moments in the second interest characteristic information of user B. The above example is merely illustrative and should not be construed as limiting the present invention.

Further, in this embodiment of the present invention, the server may use the preset similarity training model and the interest characteristic information to calculate the second similarity information between the first user and the second user. This provides a feasible way to implement the scheme and handles the case in which the similarity between users is low: multiple indicators in the interest characteristic information are taken into account to obtain a reasonable result, thereby improving the reliability of the scheme.

Optionally, on the basis of the fourth embodiment corresponding to Fig. 3, in a seventh optional embodiment of the similarity information determination method provided in the embodiments of the present invention, the first user characteristic information is first application interaction information, and the second user characteristic information is second application interaction information;

Inputting the first application interaction information and the second application interaction information into the preset similarity training model, where the first application interaction information and the second application interaction information indicate the interaction between a user and an application program, and the preset similarity training model is a similarity function relationship model trained according to sample users and the application interaction information of the sample users;

In this embodiment, the preset similarity training model may be obtained based on logistic regression. Assuming that the currently trained preset similarity training model is a functional relationship between application interaction information and similarity information, the similarity between users may be calculated using one possible functional relationship as follows.

Similarity=Mx₇+Ny₇+Ox₈+Py₈+Qx₉+Ry₉ (5)

Here, Similarity denotes the similarity information, and M, N, O, P, Q and R denote the coefficients obtained by logistic regression. x₇ may denote the business district visit frequency in the first application interaction information of user A, and y₇ the business district visit frequency in the second application interaction information of user B; x₈ may denote the number of friends in the first application interaction information of user A, and y₈ the number of friends in the second application interaction information of user B; x₉ may denote the amount spent in the first application interaction information of user A, and y₉ the amount spent in the second application interaction information of user B. The above example is merely illustrative and should not be construed as limiting the present invention.

Further, in this embodiment of the present invention, the server may use the preset similarity training model and the application interaction information to calculate the second similarity information between the first user and the second user. This provides a feasible way to implement the scheme and handles the case in which the similarity between users is low: multiple indicators in the application interaction information are taken into account to obtain a reasonable result, thereby improving the reliability of the scheme.

For ease of understanding, the similarity information determination process of the present invention is described in detail below with a concrete application scenario, specifically:

Taking an online store as an example, the online store may upload its in-site user-commodity behavior data and its commodity library to the server described in the present invention, and the server can provide multiple commodity recommendation methods. Referring to Fig. 4, Fig. 4 is a schematic diagram of a collaborative recommendation interface in an embodiment of the present invention. As shown in the figure, an advertiser user who needs to place advertisements can select different recommendation algorithms for advertisement placement.

An advertiser user wishes, on the one hand, that the commodity recommendation algorithm is good enough that recommended commodities, once presented to users, achieve a good click-through rate and conversion effect. On the other hand, the advertiser user wishes to recommend commodities of interest to new users (users without any in-site commodity browsing behavior), so that more new visitors click on advertisements and convert. The collaborative recommendation provided by the present invention can solve the above problems.

The server in the present invention is described in detail below. Referring to Fig. 5, Fig. 5 is a schematic diagram of an embodiment of the server in the embodiments of the present invention. The server 20 includes:

A first acquisition module 201, configured to obtain first historical data corresponding to a first user and second historical data corresponding to a second user;

A second acquisition module 202, configured to obtain first user characteristic information corresponding to the first user and second user characteristic information corresponding to the second user if first similarity information between the first historical data and the second historical data obtained by the first acquisition module 201 is less than or equal to a preset threshold;

A first determining module 203, configured to determine, according to the first user characteristic information and the second user characteristic information obtained by the second acquisition module 202, second similarity information between the first user and the second user using a preset similarity training model, where the preset similarity training model is a similarity function relationship model trained according to sample users and sample user characteristic information.

In this embodiment, the first acquisition module 201 obtains the first historical data corresponding to the first user and the second historical data corresponding to the second user. If the first similarity information between the first historical data and the second historical data obtained by the first acquisition module 201 is less than or equal to the preset threshold, the second acquisition module 202 obtains the first user characteristic information corresponding to the first user and the second user characteristic information corresponding to the second user. The first determining module 203 then determines, according to the first user characteristic information and the second user characteristic information obtained by the second acquisition module 202, the second similarity information between the first user and the second user using the preset similarity training model, where the preset similarity training model is a similarity function relationship model trained according to sample users and sample user characteristic information.

In this embodiment of the present invention, a server is provided. The server first obtains the first historical data corresponding to the first user and the second historical data corresponding to the second user. If the first similarity information between the first historical data and the second historical data is less than or equal to the preset threshold, the server obtains the first user characteristic information corresponding to the first user and the second user characteristic information corresponding to the second user. The server then determines, according to the first user characteristic information and the second user characteristic information, the second similarity information between the first user and the second user using the preset similarity training model, where the preset similarity training model is a similarity function relationship model trained according to sample users and sample user characteristic information. In this way, when the first similarity information between users is less than or equal to the preset threshold, the preset similarity training model can be used to calculate a new, second similarity information, which serves as a key factor in capturing user preferences and thereby improves the reliability of recommendation.
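The cooperation of modules 201 to 203 can be sketched as a simple fallback. The function name `resolve_similarity`, its parameters and the stub model are hypothetical. Returning the first similarity when it exceeds the threshold is an assumption, since this section only describes the case in which it is less than or equal to the threshold.

```python
from typing import Callable, Sequence

# Hypothetical type alias: a trained model that maps two feature vectors
# to a similarity value.
SimilarityModel = Callable[[Sequence[float], Sequence[float]], float]


def resolve_similarity(first_similarity: float,
                       threshold: float,
                       features_a: Sequence[float],
                       features_b: Sequence[float],
                       model: SimilarityModel) -> float:
    """Use the model-based second similarity only when the history-based
    first similarity is less than or equal to the preset threshold."""
    if first_similarity <= threshold:
        # Second acquisition module + first determining module.
        return model(features_a, features_b)
    # Assumption: otherwise the first similarity is used as is.
    return first_similarity


def stub_model(features_a: Sequence[float], features_b: Sequence[float]) -> float:
    """Stand-in for the preset similarity training model (illustrative only)."""
    return 0.42


low = resolve_similarity(0.0, 0.1, [30.0, 3.0], [20.0, 2.0], stub_model)
high = resolve_similarity(0.8, 0.1, [30.0, 3.0], [20.0, 2.0], stub_model)
```

A new user with no in-site history would typically have a first similarity of 0 with every other user, so the fallback branch is the one taken in the cold-start case described earlier.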

Optionally, on the basis of the embodiment corresponding to Fig. 5, referring to Fig. 6, in another embodiment of the server 20 provided in the embodiments of the present invention,

The server 20 further includes:

A third acquisition module 204, configured to obtain a sample user set, a sample user historical data set and a sample user characteristic information set before the first determining module 203 determines the second similarity information between the first user and the second user using the preset similarity training model, where the sample user set contains multiple sample users, the sample user historical data set contains the historical data corresponding to each sample user, and the sample user characteristic information set contains the sample user characteristic information corresponding to each sample user;

A second determining module 205, configured to determine positive similarity samples and negative similarity samples according to the sample user set and the sample user historical data set obtained by the third acquisition module 204;

A training module 206, configured to train on the positive similarity samples and the negative similarity samples determined by the second determining module 205 and on the sample user characteristic information set, so as to obtain the preset similarity training model.
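A minimal sketch of what the training module 206 might do is given below, using a hand-written logistic regression fitted by stochastic gradient descent. The pair-feature layout, learning rate, epoch count and toy data are all assumptions for illustration; a production system would more likely rely on an established library.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability that the user pair described by x is a positive (similar) pair."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic_regression(samples: Sequence[Sequence[float]],
                              labels: Sequence[int],
                              lr: float = 0.1,
                              epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y  # gradient of log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Toy pair features (hypothetical): label 1 = positive similarity sample,
# label 0 = negative similarity sample.
pair_features = [[1.0, 1.0], [0.9, 0.8], [0.0, 0.0], [0.1, 0.2]]
pair_labels = [1, 1, 0, 0]
weights, bias = train_logistic_regression(pair_features, pair_labels)
```

The learned `weights` play the role of the coefficients in formulas (3) to (5), with one weight per entry of the pair feature vector.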

Optionally, on the basis of the embodiment corresponding to Fig. 6, referring to Fig. 7, in another embodiment of the server 20 provided in the embodiments of the present invention,

The second determining module 205 includes:

A computing unit 2051, configured to calculate, according to the sample user historical data set, third similarity information between every two sample users in the sample user set;

A first determination unit 2052, configured to determine that the two sample users are a positive similarity sample if the third similarity information calculated by the computing unit 2051 is not 0;

A second determination unit 2053, configured to determine that the two sample users are a negative similarity sample if the third similarity information calculated by the computing unit 2051 is 0.
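The labeling rule of units 2051 to 2053 can be sketched as follows. The helper `overlap_count` is a hypothetical stand-in for the third similarity calculation, which is not specified in this section; only the "not 0 means positive, 0 means negative" rule comes from the text.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]


def overlap_count(h1: Set[str], h2: Set[str]) -> int:
    """Hypothetical third similarity: number of shared history items."""
    return len(h1 & h2)


def label_pairs(history: Dict[str, Set[str]],
                similarity: Callable[[Set[str], Set[str]], float] = overlap_count
                ) -> Tuple[List[Pair], List[Pair]]:
    """Split every pair of sample users into positive and negative samples."""
    positives: List[Pair] = []
    negatives: List[Pair] = []
    for u, v in combinations(sorted(history), 2):
        if similarity(history[u], history[v]) != 0:
            positives.append((u, v))   # first determination unit 2052
        else:
            negatives.append((u, v))   # second determination unit 2053
    return positives, negatives


# Illustrative sample user historical data set.
sample_history = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"d"}}
positives, negatives = label_pairs(sample_history)
```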

Optionally, on the basis of the embodiment corresponding to Fig. 5, referring to Fig. 8, in another embodiment of the server 20 provided in the embodiments of the present invention,

The server 20 further includes:

A comparing module 207, configured to compare the first historical data with the second historical data and obtain a comparison result after the first acquisition module 201 obtains the first historical data corresponding to the first user and the second historical data corresponding to the second user;

A third determining module 208, configured to determine the first similarity information between the first historical data and the second historical data according to the comparison result obtained by the comparing module 207.
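One way the comparing module 207 and the third determining module 208 could work is sketched below, using the Jaccard coefficient over sets of history items. The Jaccard coefficient is a substitute chosen for illustration and is not confirmed by this section as the formula the invention uses; the function name is likewise hypothetical.

```python
from typing import Set


def first_similarity(history_a: Set[str], history_b: Set[str]) -> float:
    """Compare two users' history items and return a value in [0, 1].

    Assumption: Jaccard coefficient = |intersection| / |union|.
    """
    union = history_a | history_b
    if not union:
        # Neither user has any history, e.g. two new users.
        return 0.0
    return len(history_a & history_b) / len(union)


sim_overlap = first_similarity({"a", "b", "c"}, {"b", "c", "d"})
sim_new_user = first_similarity(set(), {"b", "c", "d"})
```

With this measure, a user without any history always obtains a first similarity of 0, which is the situation in which the second similarity information becomes relevant.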

Optionally, on the basis of the embodiment corresponding to Fig. 5, in another embodiment of the server 20 provided in the embodiments of the present invention, the first user characteristic information includes at least one of first user attribute information, first interest characteristic information and first application interaction information, and the second user characteristic information includes at least one of second user attribute information, second interest characteristic information and second application interaction information.
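Because the user characteristic information may contain any one or more of the three kinds of information, an implementation might assemble a single feature vector from whichever groups are present. The sketch below assumes hypothetical group names and a fixed group order; a model trained on one combination of groups would need to be applied to the same combination.

```python
from typing import Dict, List, Sequence

# Hypothetical group names, in a fixed order so that vectors are comparable.
FEATURE_GROUPS = ("attribute", "interest", "app_interaction")


def build_feature_vector(groups: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the available feature groups; at least one is required."""
    if not any(name in groups for name in FEATURE_GROUPS):
        raise ValueError("at least one feature group is required")
    vector: List[float] = []
    for name in FEATURE_GROUPS:
        vector.extend(groups.get(name, ()))
    return vector


# Illustrative values only.
vector = build_feature_vector({"interest": [1.0], "attribute": [30.0, 3.0]})
```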

Optionally, on the basis of the embodiment corresponding to Fig. 5, referring to Fig. 9, in another embodiment of the server 20 provided in the embodiments of the present invention, the first user characteristic information is the first user attribute information, and the second user characteristic information is the second user attribute information;

The first determining module 203 includes:

A first input unit 2031, configured to input the first user attribute information and the second user attribute information into the preset similarity training model, where the first user attribute information and the second user attribute information indicate basic status information of a user, and the preset similarity training model is a similarity function relationship model trained according to the sample users and the user attribute information of the sample users;

A third determination unit 2032, configured to determine the second similarity information according to the result output by the preset similarity training model.

Optionally, on the basis of the embodiment corresponding to Fig. 5, referring to Fig. 10, in another embodiment of the server 20 provided in the embodiments of the present invention, the first user characteristic information is the first interest characteristic information, and the second user characteristic information is the second interest characteristic information;

The first determining module 203 includes:

A second input unit 2033, configured to input the first interest characteristic information and the second interest characteristic information into the preset similarity training model, where the first interest characteristic information and the second interest characteristic information are key information obtained from the users' history records, and the preset similarity training model is a similarity function relationship model trained according to the sample users and the interest characteristic information of the sample users;

A fourth determination unit 2034, configured to determine the second similarity information according to the result output by the preset similarity training model.

Optionally, on the basis of the embodiment corresponding to Fig. 5, referring to Fig. 11, in another embodiment of the server 20 provided in the embodiments of the present invention, the first user characteristic information is the first application interaction information, and the second user characteristic information is the second application interaction information;

The first determining module 203 includes:

A third input unit 2035, configured to input the first application interaction information and the second application interaction information into the preset similarity training model, where the first application interaction information and the second application interaction information indicate the interaction between a user and an application program, and the preset similarity training model is a similarity function relationship model trained according to the sample users and the application interaction information of the sample users;

A fifth determination unit 2036, configured to determine the second similarity information according to the result output by the preset similarity training model.

Fig. 12 is a schematic structural diagram of a server provided in an embodiment of the present invention. The server 300 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 322 (for example, one or more processors), a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transient or persistent storage. A program stored in the storage medium 330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the server 300, the series of instruction operations in the storage medium 330.

The server 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on. The server 300 may also include a bus system for connecting the memory and the processor so that the memory and the processor can communicate.

The steps performed by the server in the above embodiments may be based on the server structure shown in Fig. 12.

The central processing unit 322 is configured to execute the following steps:

If the first similarity information between the first historical data and the second historical data is less than or equal to the preset threshold, obtaining the first user characteristic information corresponding to the first user and the second user characteristic information corresponding to the second user;

Determining, according to the first user characteristic information and the second user characteristic information, the second similarity information between the first user and the second user using the preset similarity training model, where the preset similarity training model is a similarity function relationship model trained according to sample users and sample user characteristic information.

Optionally, the central processing unit 322 is further configured to execute the following steps:

Optionally, the central processing unit 322 is specifically configured to execute the following steps:

If the third similarity information is not 0, determining that the two sample users are a positive similarity sample;

If the third similarity information is 0, determining that the two sample users are a negative similarity sample.

A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system, apparatus and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

The above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A similarity information determination method, characterized by comprising:

If first similarity information between the first historical data and the second historical data is less than or equal to a preset threshold, obtaining first user characteristic information corresponding to the first user and second user characteristic information corresponding to the second user;

Determining, according to the first user characteristic information and the second user characteristic information, second similarity information between the first user and the second user using a preset similarity training model, wherein the preset similarity training model is a similarity function relationship model trained according to sample users and sample user characteristic information.

2. The method according to claim 1, characterized in that, before the determining of the second similarity information between the first user and the second user using the preset similarity training model, the method further comprises:

Obtaining a sample user set, a sample user historical data set and a sample user characteristic information set, wherein the sample user set contains multiple sample users, the sample user historical data set contains the historical data corresponding to each sample user, and the sample user characteristic information set contains the sample user characteristic information corresponding to each sample user;

Determining positive similarity samples and negative similarity samples according to the sample user set and the sample user historical data set;

Training on the positive similarity samples, the negative similarity samples and the sample user characteristic information set to obtain the preset similarity training model.

3. The method according to claim 2, characterized in that the determining of positive similarity samples and negative similarity samples according to the sample user set and the sample user historical data set comprises:

Calculating, according to the sample user historical data set, third similarity information between every two sample users in the sample user set;

If the third similarity information is not 0, determining that the two sample users are a positive similarity sample;

If the third similarity information is 0, determining that the two sample users are a negative similarity sample.

4. The method according to claim 1, characterized in that, after the obtaining of the first historical data corresponding to the first user and the second historical data corresponding to the second user, the method further comprises:

Comparing the first historical data with the second historical data to obtain a comparison result;

Determining the first similarity information between the first historical data and the second historical data according to the comparison result.

5. The method according to claim 1, characterized in that the first user characteristic information includes at least one of first user attribute information, first interest characteristic information and first application interaction information, and the second user characteristic information includes at least one of second user attribute information, second interest characteristic information and second application interaction information.

6. The method according to claim 5, characterized in that the first user characteristic information is the first user attribute information, and the second user characteristic information is the second user attribute information;

The determining, according to the first user characteristic information and the second user characteristic information, of the second similarity information between the first user and the second user using the preset similarity training model comprises:

Inputting the first user attribute information and the second user attribute information into the preset similarity training model, wherein the first user attribute information and the second user attribute information indicate basic status information of a user, and the preset similarity training model is a similarity function relationship model trained according to the sample users and the user attribute information of the sample users;

Determining the second similarity information according to the result output by the preset similarity training model.

7. The method according to claim 5, characterized in that the first user characteristic information is the first interest characteristic information, and the second user characteristic information is the second interest characteristic information;

Inputting the first interest characteristic information and the second interest characteristic information into the preset similarity training model, wherein the first interest characteristic information and the second interest characteristic information are key information obtained from the users' history records, and the preset similarity training model is a similarity function relationship model trained according to the sample users and the interest characteristic information of the sample users;

8. The method according to claim 5, characterized in that the first user characteristic information is the first application interaction information, and the second user characteristic information is the second application interaction information;

Inputting the first application interaction information and the second application interaction information into the preset similarity training model, wherein the first application interaction information and the second application interaction information indicate the interaction between a user and an application program, and the preset similarity training model is a similarity function relationship model trained according to the sample users and the application interaction information of the sample users;

9. A server, characterized by comprising:

A first acquisition module, configured to obtain first historical data corresponding to a first user and second historical data corresponding to a second user;

A second acquisition module, configured to obtain first user characteristic information corresponding to the first user and second user characteristic information corresponding to the second user if first similarity information between the first historical data and the second historical data obtained by the first acquisition module is less than or equal to a preset threshold;

A first determining module, configured to determine, according to the first user characteristic information and the second user characteristic information obtained by the second acquisition module, second similarity information between the first user and the second user using a preset similarity training model, wherein the preset similarity training model is a similarity function relationship model trained according to sample users and sample user characteristic information.

10. The server according to claim 9, characterized in that the server further comprises:

A third acquisition module, configured to obtain, before the first determining module determines the second similarity information between the first user and the second user by using the preset similarity training model, a sample user set, a sample user historical data set and a sample user characteristic information set, wherein the sample user set comprises a plurality of sample users, the sample user historical data set comprises the historical data corresponding to each sample user, and the sample user characteristic information set comprises the sample user characteristic information corresponding to each sample user;

A second determining module, configured to determine positive example similarity samples and negative example similarity samples according to the sample user set and the sample user historical data set obtained by the third acquisition module;

A training module, configured to perform training on the positive example similarity samples and the negative example similarity samples determined by the second determining module and on the sample user characteristic information set, to obtain the preset similarity training model.
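One way the training module could fit a similarity function from labelled pairs is sketched below. The claims do not name a model family, so logistic regression trained by stochastic gradient descent is an assumption, as are the per-feature absolute difference used as the pair representation and the names `pair_features`, `train_similarity_model` and `predict_similarity`.

```python
import math
from typing import List, Sequence, Tuple


def pair_features(fa: Sequence[float], fb: Sequence[float]) -> List[float]:
    """Assumed pair representation: absolute difference per feature."""
    return [abs(x - y) for x, y in zip(fa, fb)]


def predict_similarity(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Logistic score in (0, 1) for one pair representation."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_similarity_model(
    pairs: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 500,
    lr: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit weights on positive (1) and negative (0) example pairs."""
    w = [0.0] * len(pairs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(pairs, labels):
            # Gradient of the log loss for a single example.
            g = predict_similarity(w, b, x) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


if __name__ == "__main__":
    # Toy data: positive pairs differ little, negative pairs differ a lot.
    pairs = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.8]]
    labels = [1, 1, 0, 0]
    w, b = train_similarity_model(pairs, labels)
    print(predict_similarity(w, b, pair_features([0.5, 0.5], [0.55, 0.45])))
```

The returned weights and bias play the role of the preset similarity training model: given the characteristic information of two users, `predict_similarity` yields a score that can serve as the second similarity information.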

11. The server according to claim 10, characterized in that the second determining module comprises:

A computing unit, configured to calculate, according to the sample user historical data set, third similarity information between every two sample users in the sample user set;

A first determination unit, configured to determine, if the third similarity information calculated by the computing unit is not 0, that the two sample users are a positive example similarity sample;

A second determination unit, configured to determine, if the third similarity information calculated by the computing unit is 0, that the two sample users are a negative example similarity sample.
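The labelling rule of claim 11 can be illustrated with a short sketch. The zero versus non-zero split comes from the claim; the use of the overlap count between two history sets as the third similarity information, and the name `label_sample_pairs`, are assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def label_sample_pairs(
    histories: Dict[str, Set[str]],
) -> Tuple[List[Pair], List[Pair]]:
    """Split every pair of sample users into positive and negative examples."""
    positives: List[Pair] = []
    negatives: List[Pair] = []
    # Computing unit: visit every two sample users exactly once.
    for u, v in combinations(sorted(histories), 2):
        # Assumed third similarity: number of shared history items.
        third_similarity = len(histories[u] & histories[v])
        if third_similarity != 0:
            positives.append((u, v))  # first determination unit
        else:
            negatives.append((u, v))  # second determination unit
    return positives, negatives


if __name__ == "__main__":
    histories = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"d"}}
    print(label_sample_pairs(histories))
```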

12. A server, characterized by comprising: a memory, a processor and a bus system;

Wherein the memory is configured to store a program;

Determining, according to the first user characteristic information and the second user characteristic information, second similarity information between the first user and the second user by using a preset similarity training model, wherein the preset similarity training model is a similarity function relational model trained according to sample users and sample user characteristic information;

The bus system is configured to connect the memory and the processor, so that the memory and the processor communicate with each other.

13. The server according to claim 12, characterized in that the processor is further configured to execute the following steps:

14. The server according to claim 13, characterized in that the processor is configured to execute the following steps:

15. A computer-readable storage medium, comprising instructions that, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 8.