CN109740063A

CN109740063A - Information recall and information clustering methods, devices and equipment

Info

Publication number: CN109740063A
Application number: CN201910044328.6A
Authority: CN
Inventors: 马国伟
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2019-01-17
Filing date: 2019-01-17
Publication date: 2019-05-10

Abstract

Embodiments of the present invention provide information recall and information clustering methods, devices and equipment. The method comprises: determining the historical information clicked by a user for whom information recall is to be performed; for each existing information cluster, comparing the information contained in that cluster with the determined historical information, and determining the number of pieces of information in that cluster that are identical to the historical information, wherein an existing information cluster is an information cluster obtained by clustering the information to be clustered according to the feature of each piece of information to be clustered, and the feature of each piece of information to be clustered is obtained from the information of the users who clicked it; selecting a first preset number of information clusters from the existing information clusters; and taking the information contained in the selected first preset number of information clusters as the information to be recalled for the user. Recalling information with the scheme provided by the embodiments of the present invention can improve the efficiency of information recall.

Description

Information recall and information clustering methods, devices and equipment

Technical field

The present invention relates to the field of computer technology, and in particular to information recall and information clustering methods, devices and equipment.

Background art

Information recall refers to the process of obtaining, for a given user, information that the user may be interested in.

During information recall, in order to effectively obtain information that the user may be interested in, basic information about the user such as gender, age, occupation and hobbies is usually collected. The information stored in an information library is then traversed and filtered according to this basic information to select the information the user may be interested in, thereby achieving information recall.

However, in the course of implementing the present invention, the inventor found that the prior art has at least the following problem:

Because the amount of information stored in the information library is generally very large, and traversing and filtering that information is inefficient, determining the information a user may be interested in by the above approach is inefficient, which in turn easily leads to low information recall efficiency.

Summary of the invention

Embodiments of the present invention aim to provide information recall and information clustering methods, devices and equipment, so as to improve the efficiency of information recall. The specific technical solutions are as follows:

One aspect of the present invention provides an information recall method, the method comprising:

determining the historical information clicked by a user for whom information recall is to be performed;

for each existing information cluster, comparing the information contained in that cluster with the determined historical information, and determining the number of pieces of information in that cluster that are identical to the historical information, wherein an existing information cluster is an information cluster obtained by clustering the information to be clustered according to the feature of each piece of information to be clustered, and the feature of each piece of information to be clustered is a feature obtained from the information of the users who clicked that piece of information;

selecting a first preset number of information clusters from the existing information clusters, wherein the number of identical pieces of information in each selected information cluster is greater than the number of identical pieces of information in any unselected information cluster;

taking the information contained in the first preset number of information clusters as the information to be recalled for the user.

Optionally, the method further comprises:

obtaining each existing information cluster in the following manner:

for each piece of information to be clustered, determining the users who clicked that piece of information, and obtaining the feature of that piece of information from the information of the determined users;

calculating the similarity between every two pieces of information to be clustered according to the features of the information to be clustered;

clustering the information to be clustered according to the calculated similarities and the feature of each piece of information to be clustered.

Optionally, after the step of obtaining the feature of the information to be clustered from the information of the determined users, the method further comprises:

randomly selecting a second preset number of features from the obtained features of the information to be clustered as cluster centers;

the step of clustering the information to be clustered according to the calculated similarities and the feature of each piece of information to be clustered to obtain information clusters comprises:

for each piece of information to be clustered, determining, from the calculated similarities, the similarity between that piece of information and any of the cluster centers, and judging from the determined similarity whether that piece of information belongs to the information cluster corresponding to the cluster center; if the information to be clustered belongs to the information cluster corresponding to the cluster center, adding it to that information cluster;

for each information cluster, calculating the average feature of the cluster from the features of the information to be clustered that it contains; where the calculated average feature differs from the cluster center of the cluster, updating the cluster center to the calculated average feature and returning to the preceding step of determining, for each piece of information to be clustered, its similarity to the cluster centers and the information cluster it belongs to; and, once the cluster center of every information cluster is identical to the average feature of that cluster, taking the information clusters obtained at that point as the clustering result.
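The iterative procedure above has the shape of a k-means-style loop. The following is a minimal Python sketch under stated assumptions, not the claimed implementation: features are numeric click vectors, closeness is measured by squared Euclidean distance, and a piece of information is taken to belong to the cluster of its nearest center. The names `cluster_information`, `features` and `k` (standing in for the second preset number) are illustrative only.

```python
import random

def squared_distance(a, b):
    # Smaller distance is treated as higher similarity (assumption).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    # Average feature of a cluster, computed element by element.
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))

def cluster_information(features, k, seed=0, max_rounds=100):
    """features: dict mapping info id -> feature vector; k: number of clusters."""
    rng = random.Random(seed)
    # Randomly pick k existing features as the initial cluster centers.
    centers = [tuple(float(v) for v in features[i])
               for i in rng.sample(sorted(features), k)]
    for _ in range(max_rounds):
        clusters = [[] for _ in range(k)]
        for info_id, vec in features.items():
            nearest = min(range(k), key=lambda c: squared_distance(vec, centers[c]))
            clusters[nearest].append(info_id)
        # Recompute each center as the average feature of its members;
        # an empty cluster keeps its previous center (assumption).
        new_centers = [
            mean_vector([features[i] for i in members]) if members else centers[c]
            for c, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # every center already equals its cluster's average feature
        centers = new_centers
    return clusters
```

The `max_rounds` cap is a safeguard added for the sketch; the text itself only specifies stopping when every center equals the average feature of its cluster.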

Optionally, the step of calculating the similarity between every two pieces of information to be clustered according to the feature of each piece of information to be clustered comprises:

calculating the similarity coefficient between every two pieces of information to be clustered using the following formula, and determining the similarity between the two pieces of information from the calculated similarity coefficient:

s(j, k) = |U_·j & U_·k| / |U_·j | U_·k|

where s(j, k) denotes the similarity coefficient between information j to be clustered and information k to be clustered, U_·j denotes the feature vector of information j to be clustered, U_·k denotes the feature vector of information k to be clustered, |U_·j & U_·k| denotes the size of the intersection of the feature vectors of information j and information k, and |U_·j | U_·k| denotes the size of the union of the feature vectors of information j and information k.

Another aspect of the present invention further provides an information clustering method, the method comprising:

Another aspect of the present invention further provides an information recall device, the device comprising:

a first determining module, configured to determine the historical information clicked by a user for whom information recall is to be performed;

a comparison module, configured to, for each existing information cluster, compare the information contained in that cluster with the determined historical information and determine the number of pieces of information in that cluster that are identical to the historical information, wherein an existing information cluster is an information cluster obtained by clustering the information to be clustered according to the feature of each piece of information to be clustered, and the feature of each piece of information to be clustered is a feature obtained from the information of the users who clicked that piece of information;

a second determining module, configured to select a first preset number of information clusters from the existing information clusters, wherein the number of identical pieces of information in each selected information cluster is greater than the number of identical pieces of information in any unselected information cluster;

a designating module, configured to take the information contained in the first preset number of information clusters as the information to be recalled for the user.

Optionally, the device further comprises:

an obtaining module, configured to obtain each existing information cluster, the obtaining module comprising:

a determining submodule, configured to, for each piece of information to be clustered, determine the users who clicked that piece of information and obtain the feature of that piece of information from the information of the determined users;

a calculation submodule, configured to calculate the similarity between every two pieces of information to be clustered according to the features of the information to be clustered;

a clustering submodule, configured to cluster the information to be clustered according to the calculated similarities and the feature of each piece of information to be clustered.

Optionally, the device further comprises:

a selecting module, configured to randomly select a second preset number of features from the obtained features of the information to be clustered as cluster centers;

The cluster submodule, is specifically used for

Optionally, the calculation submodule is specifically configured to calculate the similarity coefficient between every two pieces of information to be clustered using the following formula, and to determine the similarity between the two pieces of information from the calculated similarity coefficient:

s(j, k) = |U_·j & U_·k| / |U_·j | U_·k|

where s(j, k) denotes the similarity coefficient between information j to be clustered and information k to be clustered, U_·j denotes the feature vector of information j to be clustered, U_·k denotes the feature vector of information k to be clustered, |U_·j & U_·k| denotes the size of the intersection of the feature vectors of information j and information k, and |U_·j | U_·k| denotes the size of the union of the feature vectors of information j and information k.

Another aspect of the present invention further provides an information clustering device, the device comprising:

a third determining module, configured to, for each piece of information to be clustered, determine the users who clicked that piece of information and obtain the feature of that piece of information from the information of the determined users;

a calculation module, configured to calculate the similarity between every two pieces of information to be clustered according to the features of the information to be clustered;

a clustering module, configured to cluster the information to be clustered according to the calculated similarities and the feature of each piece of information to be clustered.

Another aspect of the present invention further provides an electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement any of the above information recall methods when executing the program stored in the memory.

The memory is configured to store a computer program;

the processor is configured to implement any of the above information clustering methods when executing the program stored in the memory.

In another aspect of the present invention, a computer-readable storage medium is further provided, in which instructions are stored that, when run on a computer, cause the computer to execute any of the above information recall methods.

In another aspect of the present invention, a computer-readable storage medium is further provided, in which instructions are stored that, when run on a computer, cause the computer to execute any of the above information clustering methods.

In another aspect of the present invention, an embodiment of the present invention further provides a computer program product containing instructions that, when run on a computer, cause the computer to execute any of the above information recall methods.

In another aspect of the present invention, an embodiment of the present invention further provides a computer program product containing instructions that, when run on a computer, cause the computer to execute any of the above information clustering methods.

With the information recall method, device and equipment provided by the embodiments of the present invention, when information recall is performed for a user, the existing information clusters obtained by clustering information in advance are compared with the historical information clicked by the user, and the information clusters that share the most identical information with that historical information are determined; the information they contain is taken as the information to be recalled for the user. In other words, a category of information matching the user's interests is found according to those interests and used as the information to be recalled, without traversing and filtering the information stored in the information library, which improves the efficiency of information recall.

Brief description of the drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below.

Fig. 1 is a schematic flowchart of an information recall method provided by an embodiment of the present invention;

Fig. 2 is a schematic flowchart of an information clustering method provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an information recall device provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an information clustering device provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of another electronic device provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described below with reference to the drawings in the embodiments of the present invention.

Referring to Fig. 1, which is a schematic flowchart of an information recall method provided by an embodiment of the present invention, the method comprises:

S100: determine the historical information clicked by the user for whom information recall is to be performed.

The server that sends information to users can record the historical information each user has clicked and, in addition, the number of times each user has clicked each piece of information; for ease of description, these records are referred to as the history record. When the execution subject of the embodiment of the present invention is the server that sends information to users, the historical information clicked by the user can be determined directly from the history record. When the execution subject is a device other than that server, the device needs to obtain the history record from the server and then determine the historical information clicked by the user.

S110: for each existing information cluster, compare the information contained in that cluster with the determined historical information, and determine the number of pieces of information in that cluster that are identical to the historical information.

Here, an existing information cluster is an information cluster obtained by clustering the information to be clustered according to the feature of each piece of information to be clustered, and the feature of each piece of information to be clustered is a feature obtained from the information of the users who clicked that piece of information.

Whether two pieces of information are identical can be judged from their features: when the similarity between the features of two pieces of information is less than a predetermined similarity threshold, the two pieces of information may be considered identical. A feature may be a keyword in the information, the type to which the information belongs, and so on. For example, if two pieces of information both contain the keyword "L'Oreal", they may be considered identical; if two pieces of information both belong to the cosmetics type, they may also be considered identical.

Those skilled in the art will appreciate that pieces of information with the same feature are not necessarily identical in presentation. For example, when the keywords contained in the information are used as the feature, two pieces of information that both contain the keyword "L'Oreal" may be one about a L'Oreal facial mask and one about a L'Oreal face cream, and the two differ in presentation. Thus two pieces of information that differ in presentation but contain the same keyword may also be considered identical. If two pieces of information have the same presentation, they can certainly be considered identical.

For example, suppose an existing information cluster contains information A, B, C and D, and the historical information clicked by the user contains information A, D, E and F. Information A and information D have the same presentation in both sets, so the identical information shared by the existing information cluster and the user's click history is information A and information D, and the number of identical pieces of information is 2. In another case, although information B and information E differ in presentation, both contain the same keyword "L'Oreal", so information B and information E may be considered identical, and the number of identical pieces of information is then 3.
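The counting in this example can be sketched in Python as follows. This is an illustration only: the function name `count_identical` and the `keywords` mapping (information id to a set of keywords) are assumptions introduced here, and keyword overlap stands in for whichever feature comparison an implementation actually uses.

```python
def count_identical(cluster_items, history_items, keywords=None):
    """Count cluster items that are identical to at least one clicked item."""
    keywords = keywords or {}

    def same(a, b):
        if a == b:  # same presentation
            return True
        # Different presentation, but sharing a keyword (assumed feature).
        return bool(keywords.get(a, set()) & keywords.get(b, set()))

    return sum(
        1 for item in cluster_items
        if any(same(item, clicked) for clicked in history_items)
    )

cluster = ["A", "B", "C", "D"]
history = ["A", "D", "E", "F"]
print(count_identical(cluster, history))  # 2
print(count_identical(cluster, history,
                      {"B": {"L'Oreal"}, "E": {"L'Oreal"}}))  # 3
```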

To speed up information recall, in one implementation of the present invention, the operator that provides an information service to users may, while providing the service, cluster in real time the information it provides to obtain information clusters. For example, information clicked by users with the same interests is clustered together, which yields the existing information clusters.

The feature of a piece of information to be clustered may be characterized by attribute information such as the occupation, age and gender of the users who clicked it, or by the way in which it was clicked by users.

Specifically, in one implementation of the present invention, attributes such as the occupation, age and gender of the users who clicked a piece of information to be clustered are used to characterize its feature. For example, if the attributes of a user who clicked the information to be clustered are: female, 28 years old, actor, then the feature of the information to be clustered may include: female, 28 years old, actor.

In another implementation of the present invention, the feature of a piece of information to be clustered may be characterized by the specific way in which it was clicked by users: either by whether each user clicked it, or by the number of times each user clicked it.

In this implementation, the feature of a piece of information to be clustered can be expressed as a vector: the click status of that piece of information with respect to each of all the users forms the elements of the corresponding feature vector, that is, each element of the feature vector represents how the piece of information was clicked by one user.

Specifically, the click status may indicate whether the information to be clustered was clicked by a user. For example, when element Uij of the feature vector Ui of information i to be clustered equals 1, information i was clicked by user j; when Uij = 0, information i was not clicked by user j.

Alternatively, the click status may indicate the number of times the information to be clustered was clicked by a user. For example, when element Uij of the feature vector Ui of information i to be clustered equals 3, information i was clicked by user j 3 times; when Uij = 0, information i was clicked by user j 0 times.
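Both encodings of the click status can be derived from the same click log. A minimal Python sketch, assuming the log is a list of (user, information) click events and that `users` fixes the order of the vector elements; the function name `click_feature` and the log layout are illustrative, not part of the source.

```python
from collections import Counter

def click_feature(info_id, click_log, users, binary=True):
    """Build the feature vector Ui of one piece of information.

    click_log: iterable of (user_id, info_id) click events (assumed format).
    users: ordered list of all users; element j of the result refers to users[j].
    """
    counts = Counter(user for user, info in click_log if info == info_id)
    if binary:
        # Uij = 1 if user j clicked information i at least once, else 0.
        return [1 if counts[user] else 0 for user in users]
    # Uij = number of times user j clicked information i.
    return [counts[user] for user in users]

log = [("u1", "i1"), ("u1", "i1"), ("u1", "i1"), ("u2", "i2"), ("u3", "i1")]
users = ["u1", "u2", "u3"]
print(click_feature("i1", log, users))                # [1, 0, 1]
print(click_feature("i1", log, users, binary=False))  # [3, 0, 1]
```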

To make the features of the information to be clustered easier to process in subsequent steps, the feature of each piece of information to be clustered can be described over the same set of users, so that the feature vectors of all pieces of information to be clustered contain the same number of elements. The feature vectors of all the information to be clustered can therefore be represented by a matrix U.

In one case, each row of matrix U corresponds to one piece of information to be clustered and each column corresponds to one user, and each element of the matrix represents how one piece of information to be clustered was clicked by one user. Each row of matrix U can then serve as the feature vector of one piece of information to be clustered, representing how that piece of information was clicked by all users.

In another case, each column of matrix U corresponds to one piece of information to be clustered and each row corresponds to one user, and each element of the matrix represents how one piece of information to be clustered was clicked by one user. Each column of matrix U can then serve as the feature vector of one piece of information to be clustered, representing how that piece of information was clicked by all users.
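The two layouts of matrix U are transposes of each other. A short Python sketch, assuming click counts as the matrix elements and plain nested lists as the matrix representation; `click_matrix` and `transpose` are hypothetical helper names.

```python
def click_matrix(click_log, infos, users):
    """Rows correspond to pieces of information, columns to users."""
    row = {info: r for r, info in enumerate(infos)}
    col = {user: c for c, user in enumerate(users)}
    U = [[0] * len(users) for _ in infos]
    for user, info in click_log:
        U[row[info]][col[user]] += 1  # count each click event
    return U

def transpose(U):
    """Switch to the layout with users as rows and information as columns."""
    return [list(column) for column in zip(*U)]

log = [("u1", "i1"), ("u2", "i1"), ("u2", "i2")]
U = click_matrix(log, infos=["i1", "i2"], users=["u1", "u2", "u3"])
print(U)             # [[1, 1, 0], [0, 1, 0]]
print(transpose(U))  # [[1, 0], [1, 1], [0, 0]]
```

In the first layout `U[i]` is the feature vector of information i; in the transposed layout the same vector is read down column i.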

S120: select a first preset number of information clusters from the existing information clusters.

Here, the number of identical pieces of information in each selected information cluster is greater than the number of identical pieces of information in any unselected information cluster.

In one implementation, the existing information clusters are sorted in descending order of the number of identical pieces of information they contain, and the first preset number of clusters at the head of the ordering are selected.

In another implementation, a cluster is taken arbitrarily from the set of existing information clusters, the number of identical pieces of information in that cluster is determined, and it is judged whether that number is the largest among all clusters in the set. If so, the cluster is marked as selected and removed from the set of existing information clusters, and the procedure returns to the step of taking a cluster arbitrarily from the set, until the number of selected information clusters reaches the first preset number.
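The two selection strategies can be sketched in Python as follows, assuming the number of identical pieces of information per cluster has already been computed and stored in a dictionary `overlap` (cluster id to count). The function names are illustrative, and the second function simplifies the "take any cluster and check whether it is the maximum" loop into directly picking the current maximum.

```python
def select_by_sorting(overlap, n):
    """Sort clusters by overlap count, descending, and keep the first n."""
    ranked = sorted(overlap, key=lambda cluster: overlap[cluster], reverse=True)
    return ranked[:n]

def select_by_repeated_max(overlap, n):
    """Repeatedly remove the cluster with the largest overlap until n are chosen."""
    remaining = dict(overlap)  # copy so the caller's dict is left untouched
    chosen = []
    while remaining and len(chosen) < n:
        best = max(remaining, key=remaining.get)
        chosen.append(best)
        del remaining[best]
    return chosen

overlap = {"H": 5, "I": 4, "G": 3, "K": 2, "L": 1}
print(select_by_sorting(overlap, 3))       # ['H', 'I', 'G']
print(select_by_repeated_max(overlap, 3))  # ['H', 'I', 'G']
```

Sorting costs one pass over all clusters regardless of n, while repeated maximum selection does n scans; for a small first preset number either is adequate.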

The first preset number can be determined from the amount of information to be recalled. When the amount of information contained in each information cluster is fixed, the more information needs to be recalled, the more information clusters are needed, that is, the larger the first preset number.

For example, suppose 800 pieces of information need to be recalled, and the existing information clusters H, I, G, K and L, ordered from high to low by the number of pieces of information identical to the user's click history, contain 300, 300, 200, 200 and 300 pieces of information respectively. Information recall then uses the information contained in clusters H, I and G, and the first preset number is 3. In another case, when 900 pieces of information need to be recalled, the information contained in clusters H, I and G is not enough, so any 100 pieces of information from cluster K can additionally be taken to bring the recalled amount to 900; the first preset number is then 4.
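The arithmetic of this example can be reproduced with a small Python sketch. It assumes the clusters arrive already ranked by overlap with the user's click history, and it takes the first items of the last cluster where the text allows any items; `recall_with_quota` is a hypothetical name.

```python
def recall_with_quota(ranked_clusters, quota):
    """ranked_clusters: list of (cluster_id, items), highest overlap first.

    Returns the recalled items and the ids of the clusters that were used.
    """
    recalled, used = [], []
    for cluster_id, items in ranked_clusters:
        if len(recalled) >= quota:
            break
        used.append(cluster_id)
        # Take only as many items as are still needed to reach the quota.
        recalled.extend(items[: quota - len(recalled)])
    return recalled, used

sizes = [("H", 300), ("I", 300), ("G", 200), ("K", 200), ("L", 300)]
ranked = [(name, [f"{name}{i}" for i in range(size)]) for name, size in sizes]

items, used = recall_with_quota(ranked, 800)
print(len(items), used)  # 800 ['H', 'I', 'G']
items, used = recall_with_quota(ranked, 900)
print(len(items), used)  # 900 ['H', 'I', 'G', 'K']
```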

S130: take the information contained in the first preset number of information clusters selected in step S120 as the information to be recalled for the user.

The more identical pieces of information there are between an existing information cluster and the historical information clicked by the user, the more likely the user is to be interested in the information contained in that cluster. Information recall can therefore use the information contained in the existing information clusters that share the most identical information with the user's click history.

In each scheme provided by the embodiments of the present invention, when information recall is performed for a user, the existing information clusters obtained by clustering information in advance are compared with the historical information clicked by the user, and the information clusters that share the most identical information with that historical information are determined; the information they contain is taken as the information to be recalled for the user. In other words, a category of information matching the user's interests is found according to those interests and used as the information to be recalled, without traversing and filtering the information stored in the information library, which improves the efficiency of information recall.

In one implementation of the embodiment of the present invention, each existing information cluster can be obtained through the following steps E to J:

Step E: for each piece of information to be clustered, determine the users who clicked that piece of information, and obtain the feature of that piece of information from the information of the determined users.

In terms of time, the information to be clustered is information that has already been sent to users and clicked by them. In terms of content, the information to be clustered may be advertising information, short-video information, news information, and so on.

For details of how the feature of the information to be clustered is obtained, see S110; they are not repeated here.

Step F: calculate the similarity between every two pieces of information to be clustered according to the features of the information to be clustered.

In a first implementation, the similarity between every two pieces of information to be clustered can be calculated from their features using the cosine similarity algorithm.
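For reference, cosine similarity between two feature vectors is their dot product divided by the product of their lengths. A minimal Python sketch; returning 0.0 for an all-zero vector (an item nobody clicked) is a convention chosen here, not something the text specifies.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between feature vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # An all-zero vector has no direction; treat its similarity as 0 (assumption).
    return dot / norm if norm else 0.0
```

With binary click vectors the result lies between 0 (no common users) and 1 (exactly the same users).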

In a second implementation, the similarity between two pieces of information to be clustered can be calculated from their features using the following formula:

s(j, k) = |U_·j & U_·k| / |U_·j | U_·k|   (1)

where s(j, k) denotes the similarity coefficient between information j to be clustered and information k to be clustered, U_·j denotes the feature vector of information j to be clustered, U_·k denotes the feature vector of information k to be clustered, |U_·j & U_·k| denotes the size of the intersection of the feature vectors of information j and information k, and |U_·j | U_·k| denotes the size of the union of the feature vectors of information j and information k. Specifically, when the feature vector U_·j records which users clicked information j to be clustered and the feature vector U_·k records which users clicked information k to be clustered, |U_·j & U_·k| is the number of users who clicked both information j and information k, and |U_·j | U_·k| is the number of users who clicked information j or information k.
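A minimal Python sketch of this coefficient, assuming it is the ratio of the intersection size to the union size described above (the Jaccard coefficient) and that both click vectors are indexed by the same users. The function name and the handling of two items with no clicks at all are assumptions.

```python
def jaccard_similarity(u_j, u_k):
    """u_j, u_k: click vectors over the same users; nonzero means 'clicked'."""
    # Users who clicked both pieces of information.
    both = sum(1 for a, b in zip(u_j, u_k) if a and b)
    # Users who clicked at least one of the two pieces of information.
    either = sum(1 for a, b in zip(u_j, u_k) if a or b)
    return both / either if either else 0.0  # no clicks at all: treat as 0

print(jaccard_similarity([1, 1, 0, 1], [1, 0, 0, 1]))  # 2 shared of 3 -> 0.666...
```

Because only "clicked or not" matters here, the same function works for click-count vectors as well as binary ones.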

In a third implementation, where features are expressed as vectors, the distance between two vectors can be used to indicate their similarity: the smaller the distance between two vectors, the higher their similarity.

Specifically, the Euclidean distance between the feature vectors of two pieces of information to be clustered can be calculated and used to indicate the similarity between them: the smaller the calculated Euclidean distance, the higher the similarity between the two pieces of information to be clustered.
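As an illustration of using distance in place of similarity, the following Python sketch computes the Euclidean distance between two feature vectors and picks the closest candidate. The helper `most_similar` and its dictionary input are hypothetical, added only to show that "most similar" means "smallest distance".

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between feature vectors a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(target, candidates):
    """Return the key in candidates whose vector is closest to target."""
    return min(candidates, key=lambda name: euclidean_distance(target, candidates[name]))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
print(most_similar([1, 1, 0], {"x": [1, 1, 1], "y": [0, 0, 1]}))  # x
```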

Alternatively, the similarity coefficient between two pieces of information to be clustered can first be calculated with formula (1), and the distance between the two pieces of information then calculated with formula (2):

D(j, k) = 1 - s(j, k)   (2)

where s(j, k) denotes the similarity coefficient between information j to be clustered and information k to be clustered, U_·j denotes the feature vector of information j to be clustered, U_·k denotes the feature vector of information k to be clustered, |U_·j & U_·k| denotes the size of the intersection of the feature vectors of information j and information k, |U_·j | U_·k| denotes the size of the union of the feature vectors of information j and information k, and D(j, k) denotes the distance between information j to be clustered and information k to be clustered. Specifically, when the feature vector U_·j records which users clicked information j to be clustered and the feature vector U_·k records which users clicked information k to be clustered, |U_·j & U_·k| is the number of users who clicked both information j and information k, and |U_·j | U_·k| is the number of users who clicked information j or information k.
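Formula (2) turns the similarity coefficient into a distance that is 0 for items clicked by exactly the same users and 1 for items with no users in common. A Python sketch using sets of user indices, assuming s(j, k) is the intersection-over-union ratio described above; treating two items with no clicks as maximally distant is a choice made for this sketch.

```python
def jaccard_distance(u_j, u_k):
    """D(j, k) = 1 - s(j, k) for click vectors over the same users."""
    users_j = {i for i, v in enumerate(u_j) if v}  # users who clicked j
    users_k = {i for i, v in enumerate(u_k) if v}  # users who clicked k
    union = users_j | users_k
    if not union:
        return 1.0  # neither item was clicked: s is taken as 0 (assumption)
    s = len(users_j & users_k) / len(union)
    return 1.0 - s
```

A distance of this form can be fed to any clustering routine that expects "smaller means closer", which is how the third implementation uses it.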

Step J: cluster the information to be clustered according to the calculated similarity and the feature of each item of information to be clustered.

Before clustering, the number of clusters to be obtained, that is, the second preset number, can be set according to the number of information categories desired in the specific application. For example, if 3 information categories are desired, the number of clusters obtained after clustering can be set to 3 before clustering. Since each cluster has one cluster center during clustering, the number of cluster centers can be determined from the number of information clusters needed; that is, one cluster center represents one information cluster to be obtained.

Accordingly, the larger the second preset number, the more information clusters are obtained after clustering, that is, the finer the classification of the information to be clustered.

In one implementation, a hierarchical clustering algorithm can be used to cluster the information to be clustered to obtain the second preset number of information clusters, and the center of each information cluster obtained is then used as the cluster center of that information cluster.
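The embodiment does not specify which hierarchical clustering algorithm is used. The following is a minimal sketch under stated assumptions: bottom-up (agglomerative) merging, average linkage over Euclidean distance, and the element-wise mean of a cluster taken as its center. The names `agglomerative_clusters`, `linkage` and the sample data are hypothetical.

```python
import math
from itertools import combinations

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def agglomerative_clusters(features, k):
    """Merge the two closest clusters until only k clusters remain.

    features: dict mapping item id -> feature vector.
    Returns (clusters, centers); each center is the mean of its cluster.
    """
    clusters = [[item_id] for item_id in sorted(features)]

    def linkage(c1, c2):
        # Assumption: average linkage, i.e. mean pairwise distance.
        total = sum(euclidean_distance(features[a], features[b])
                    for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]

    centers = [[sum(col) / len(c) for col in zip(*(features[m] for m in c))]
               for c in clusters]
    return clusters, centers

features = {"a": [5, 5, 0, 0], "b": [5, 4, 0, 0],
            "c": [0, 0, 5, 5], "d": [0, 0, 4, 5]}
clusters, centers = agglomerative_clusters(features, k=2)
```

Here `k` plays the role of the second preset number; for the toy data the two resulting clusters are the items `a`, `b` and the items `c`, `d`.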

In another implementation, after the step of obtaining the feature of the information to be clustered according to the information of the determined users, the method further includes:

randomly selecting the second preset number of features, as cluster centers, from the obtained features of the information to be clustered.

Correspondingly, the above step J may include:

Step J1: for each item of information to be clustered, determine, according to the calculated similarity, the similarity between that item and any one of the cluster centers, and judge according to the determined similarity whether the item belongs to the information cluster corresponding to that cluster center; if the item of information to be clustered belongs to the information cluster corresponding to the cluster center, add the item to that information cluster.

Each cluster center is one of the features of the information to be clustered, and the feature of an item of information to be clustered is expressed by how users clicked it; a user having clicked an item indicates that the user is interested in it. Therefore, the higher the similarity between the feature of an item of information to be clustered and a cluster center, the more likely it is that the item and the information corresponding to the cluster center were clicked by users with the same interests, that is, the more likely the two are information of the same type. Since one cluster center represents one information cluster, finding the cluster center with high similarity to the feature of an item also determines the information cluster to which the item belongs.

Step J2: for each information cluster, calculate the average feature of the information cluster according to the features of the information to be clustered contained in it; when the calculated average feature differs from the cluster center of the information cluster, update the cluster center of the information cluster to the calculated average feature, and return to step J1; when the average feature of each information cluster is the same as the cluster center of that information cluster, take the information clusters obtained at that point as the clustering result.

The average feature is the average of the features of the information to be clustered contained in an information cluster; each information cluster has one corresponding average feature.

Consider first the case where the feature of the information to be clustered is determined by the number of times it was clicked by users. For example, an information cluster contains three items: the feature of information 1 is (2, 3, 4), the feature of information 2 is (5, 6, 5), and the feature of information 3 is (5, 9, 6). When the average feature is calculated, its first element is (2+5+5)/3 = 4, its second element is (3+6+9)/3 = 6, and its third element is (4+5+6)/3 = 5.

Consider next the case where the feature of the information to be clustered is determined by whether it was clicked by users. For example, an information cluster contains three items: the feature of information 1 is (1, 0, 1), the feature of information 2 is (1, 0, 1), and the feature of information 3 is (0, 0, 1). When the average feature is calculated, its first element is (1+1+0)/3 ≈ 0.67, its second element is (0+0+0)/3 = 0, and its third element is (1+1+1)/3 = 1.
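The two worked examples above can be reproduced with a short sketch; `average_feature` is a hypothetical helper name, and the feature vectors are the ones given in the examples.

```python
def average_feature(features):
    """Element-wise mean of the feature vectors contained in one cluster."""
    if not features:
        raise ValueError("a cluster must contain at least one feature vector")
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

# Features determined by the number of clicks.
count_features = [[2, 3, 4], [5, 6, 5], [5, 9, 6]]
# Features determined by whether the item was clicked.
binary_features = [[1, 0, 1], [1, 0, 1], [0, 0, 1]]

avg_counts = average_feature(count_features)   # first example
avg_binary = average_feature(binary_features)  # second example
```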

Since the average feature is the average of the features of the information to be clustered contained in an information cluster, a selected cluster center that differs from the calculated average feature is not the actual center of the information cluster, and the clustering result at that point may contain errors. The cluster center of the information cluster therefore needs to be updated to the calculated average feature. Since each cluster has one cluster center, a change of cluster center changes the corresponding information cluster as well; thus, after the cluster centers are updated, clustering needs to be performed again to improve the accuracy of the clustering result.

When the selected cluster center is the same as the calculated average feature, the items of information contained in the information cluster are evenly distributed around the cluster center and belong to the same type of information. When the selected cluster center is not the same as the calculated average feature, the error may be caused by information in the information cluster that does not belong to the same class as the other information; therefore, after the cluster centers are updated, clustering needs to be performed again to improve the accuracy of the clustering result.
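The iteration of steps J1 and J2 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: similarity is measured by Euclidean distance (smaller distance meaning higher similarity), each item is added to the cluster of its most similar center, an empty cluster keeps its previous center, and a `max_rounds` guard is added. The names `cluster_items`, `seed` and `max_rounds` are hypothetical.

```python
import math
import random

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cluster_items(features, k, seed=0, max_rounds=100):
    """features: dict item id -> feature vector. Returns (clusters, centers)."""
    rng = random.Random(seed)
    # Randomly select k of the item features as the initial cluster centers.
    centers = [list(features[i]) for i in rng.sample(sorted(features), k)]
    clusters = []
    for _ in range(max_rounds):
        # Step J1: add each item to the cluster of its most similar center.
        clusters = [[] for _ in range(k)]
        for item_id, vec in features.items():
            nearest = min(range(k),
                          key=lambda c: euclidean_distance(vec, centers[c]))
            clusters[nearest].append(item_id)
        # Step J2: recompute the average feature of every cluster.
        new_centers = []
        for c, members in enumerate(clusters):
            if not members:
                new_centers.append(centers[c])  # assumption: keep old center
                continue
            n = len(members)
            new_centers.append(
                [sum(col) / n for col in zip(*(features[m] for m in members))])
        if new_centers == centers:
            break  # every average feature equals its cluster center
        centers = new_centers
    return clusters, centers

features = {"a": [5, 5, 0, 0], "b": [5, 4, 0, 0],
            "c": [0, 0, 5, 5], "d": [0, 0, 4, 5]}
clusters, centers = cluster_items(features, k=2)
```

For this well-separated toy data the loop ends with `a`, `b` in one cluster and `c`, `d` in the other whichever two features are drawn as initial centers; with less clearly separated data the result can depend on the random initial choice.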

Since the existing information clusters are obtained by clustering existing information, an information clustering method is also provided. Referring to Fig. 2, which is a flow diagram of an information clustering method provided by an embodiment of the present invention, the method includes:

S200: for each item of information to be clustered, determine the users who clicked the item, and obtain the feature of the item according to the information of the determined users.

Each item of information has different features and can be described from many different angles. During experiments, the inventor found that the information clicked by a user is usually related to the user's interests; that is, the information clicked by users with the same interests is similar. In view of this, the embodiments of the present invention use how users clicked an item of information to characterize the feature of that information.

Based on the above, in one implementation of the present invention, the feature of an item of information to be clustered can be characterized by attributes such as the occupation, age and gender of the users who clicked it. For example, if the attributes of a user who clicked the information to be clustered are female, 28 years old, actor, then the feature of the information to be clustered may correspondingly include: female, 28 years old, actor.

In another implementation of the present invention, the feature of an item of information to be clustered can also be characterized by how the item was actually clicked by users. Specifically, the feature can be expressed as a vector: the click status of the item for each of all users forms one element of the feature vector corresponding to the feature of the item, that is, each element of the feature vector indicates how the item was clicked by one user.
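As an illustration of the vector representation described above, the following sketch builds one feature vector per item from a click log, with one element per user. The log format of `(user_id, item_id)` pairs, the function name `build_click_features`, and the `binary` switch between "clicked or not" and "number of clicks" are assumptions made for this sketch.

```python
def build_click_features(click_log, users, binary=True):
    """Build item feature vectors from (user_id, item_id) click events.

    Element i of an item's vector describes how users[i] clicked the item:
    1/0 when binary is True, otherwise the number of clicks.
    """
    index = {user_id: i for i, user_id in enumerate(users)}
    features = {}
    for user_id, item_id in click_log:
        vec = features.setdefault(item_id, [0] * len(users))
        if binary:
            vec[index[user_id]] = 1
        else:
            vec[index[user_id]] += 1
    return features

users = ["u1", "u2", "u3"]
click_log = [("u1", "news_1"), ("u3", "news_1"),
             ("u1", "news_1"), ("u2", "news_2")]

binary_features = build_click_features(click_log, users)
count_features = build_click_features(click_log, users, binary=False)
```

With this log, `news_1` was clicked by `u1` (twice) and `u3`, and `news_2` only by `u2`.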

S210: calculate the similarity between every two items of information to be clustered according to the features of the information to be clustered.

The similarity coefficient between two items of information to be clustered can first be calculated with formula (3), and the distance between the two items can then be calculated with formula (4):

s(j, k) = |U_·j & U_·k| / |U_·j | U_·k|  (3)

d(j, k) = 1 - s(j, k)  (4)

where the symbols have the same meanings as in formulas (1) and (2).
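Step S210 requires a similarity for every pair of items. A minimal sketch of that pairwise computation follows; it assumes 0/1 click vectors and the intersection-over-union similarity coefficient, and the names `pairwise_distances` and `jaccard_similarity` are hypothetical.

```python
from itertools import combinations

def jaccard_similarity(u_j, u_k):
    both = sum(1 for a, b in zip(u_j, u_k) if a and b)
    either = sum(1 for a, b in zip(u_j, u_k) if a or b)
    # Assumption: items that nobody clicked get similarity 0.
    return both / either if either else 0.0

def pairwise_distances(features):
    """Distance d(j, k) = 1 - s(j, k) for every unordered pair of items."""
    return {(j, k): 1.0 - jaccard_similarity(features[j], features[k])
            for j, k in combinations(sorted(features), 2)}

features = {"a": [1, 1, 0, 0], "b": [1, 1, 1, 0], "c": [0, 0, 1, 1]}
distances = pairwise_distances(features)
```

Each pair is stored once under the key `(j, k)` with `j` before `k` in sorted order, since the distance is symmetric.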

S220: cluster the information to be clustered according to the calculated similarity and the feature of each item of information to be clustered.

In one implementation, after the step of obtaining the feature of the information to be clustered according to the information of the determined users, the method further includes:

Accordingly, the larger the second preset number, the more information clusters are obtained after clustering, that is, the finer the classification of the information to be clustered.

In another implementation, the second preset number of features are randomly selected, as cluster centers, from the obtained features of the information to be clustered.

Correspondingly, the above step S220 may include:

Step M1: for each item of information to be clustered, determine, according to the calculated similarity, the similarity between that item and any one of the cluster centers, and judge according to the determined similarity whether the item belongs to the information cluster corresponding to that cluster center; if the item of information to be clustered belongs to the information cluster corresponding to the cluster center, add the item to that information cluster.

Step M2: for each information cluster, calculate the average feature of the information cluster according to the features of the information to be clustered contained in it; when the calculated average feature differs from the cluster center of the information cluster, update the cluster center of the information cluster to the calculated average feature, and return to step M1; when the average feature of each information cluster is the same as the cluster center of that information cluster, take the information clusters obtained at that point as the clustering result.

In each scheme provided by the embodiments of the present invention, the information clustering method clusters information in advance to obtain existing information clusters, so that when information recall is performed for a user, the information clusters that share the most items with the historical information clicked by the user are determined and used as the information to be recalled for the user. In other words, a category of information matching the user's interests can be found according to those interests and used as the information to be recalled, without traversing and filtering the information stored in the information library, which improves the efficiency of information recall.

Referring to Fig. 3, which is a structural schematic diagram of an information recall device provided by an embodiment of the present invention, the device includes:

a first determining module 300, configured to determine the historical information clicked by the user for whom information recall is to be performed;

a comparison module 310, configured to, for each existing information cluster, compare the information contained in the existing information cluster with the determined historical information, and obtain the number of identical items between the information contained in the existing information cluster and the determined historical information, wherein the existing information clusters are information clusters obtained by clustering information to be clustered according to the feature of each item of information to be clustered, and the feature of each item of information to be clustered is a feature obtained according to the information of the users who clicked that item;

a second determining module 320, configured to select a first preset number of information clusters from the existing information clusters, wherein the number of identical items in each selected information cluster is greater than the number of identical items in any unselected information cluster;

an "as" module 330, configured to take the information contained in the first preset number of information clusters as the information to be recalled for the user.
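The cooperation of the four modules above can be sketched as a single function. This is a hedged illustration only: the names `recall_for_user`, `clusters` and `top_n` (standing for the first preset number) are hypothetical, ties between clusters are broken by dictionary order, and items the user has already clicked are not filtered out, since the embodiment does not say whether they should be.

```python
def recall_for_user(clicked_history, clusters, top_n):
    """Return (chosen cluster ids, recalled items) for one user.

    clicked_history: items the user has clicked (first determining module).
    clusters: dict mapping cluster id -> list of items in that cluster.
    top_n: the first preset number of clusters to select.
    """
    history = set(clicked_history)
    # Comparison module: count the items each cluster shares with the history.
    overlap = {cid: len(history & set(items))
               for cid, items in clusters.items()}
    # Second determining module: keep the top_n clusters with most shared items.
    chosen = sorted(overlap, key=lambda cid: overlap[cid], reverse=True)[:top_n]
    # "As" module: the items of the chosen clusters are the recalled items.
    recalled = []
    for cid in chosen:
        recalled.extend(clusters[cid])
    return chosen, recalled

clusters = {
    "sports": ["s1", "s2", "s3"],
    "music": ["m1", "m2"],
    "film": ["f1", "f2", "f3"],
}
chosen, recalled = recall_for_user(["s1", "s2", "f1", "x9"], clusters, top_n=2)
```

Only one pass over the pre-built clusters is needed per user, which is the source of the efficiency gain the embodiment describes compared with traversing and filtering the whole information library.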

In one implementation of the embodiment of the present invention, the device further includes:

The clustering submodule is specifically configured to:

for each information cluster, calculate the average feature of the information cluster according to the features of the information to be clustered contained in it; when the calculated average feature differs from the cluster center of the information cluster, update the cluster center of the information cluster to the calculated average feature; and return to the step of, for each item of information to be clustered, determining, according to the calculated similarity, the similarity between that item and any one of the cluster centers, judging according to the determined similarity whether the item belongs to the information cluster corresponding to that cluster center, and, if the item belongs to the information cluster corresponding to the cluster center, adding the item to that information cluster, until the average feature of each information cluster is the same as the cluster center of that information cluster, at which point the information clusters obtained are taken as the clustering result.

In one implementation of the embodiment of the present invention, the calculation submodule is specifically configured to calculate the similarity coefficient between every two items of information to be clustered using the following formula, and to determine the similarity between the two items according to the calculated similarity coefficient:

With the information recall device provided by the embodiment of the present invention, when information recall is performed for a user, the information clusters that share the most items with the historical information clicked by the user are determined, among the existing information clusters obtained by clustering information in advance, and used as the information to be recalled for the user. In other words, a category of information matching the user's interests can be found according to those interests and used as the information to be recalled, without traversing and filtering the information stored in the information library, which improves the efficiency of information recall.

Referring to Fig. 4, which shows an information clustering device provided by an embodiment of the present invention, the device includes:

a third determining module 400, configured to, for each item of information to be clustered, determine the users who clicked the item, and obtain the feature of the item according to the information of the determined users;

a calculation module 410, configured to calculate the similarity between every two items of information to be clustered according to the features of the information to be clustered;

a clustering module 420, configured to cluster the information to be clustered according to the calculated similarity and the feature of each item of information to be clustered.

a random selection module, configured to randomly select the second preset number of features, as cluster centers, from the obtained features of the information to be clustered;

The clustering module 420 is specifically configured to:

for each information cluster, calculate the average feature of the information cluster according to the features of the information to be clustered contained in it; when the calculated average feature differs from the cluster center of the information cluster, update the cluster center of the information cluster to the calculated average feature; and return to the step of, for each item of information to be clustered, determining, according to the calculated similarity, the similarity between that item and any one of the cluster centers, judging according to the determined similarity whether the item belongs to the information cluster corresponding to that cluster center, and, if the item belongs to the information cluster corresponding to the cluster center, adding the item to that information cluster, until the average feature of each information cluster is the same as its cluster center, at which point the information clusters obtained are taken as the clustering result.

In one implementation of the embodiment of the present invention, the calculation module 410 is specifically configured to calculate the similarity coefficient between every two items of information to be clustered using the following formula, and to determine the similarity between the two items according to the calculated similarity coefficient:

s(j, k) = |U_·j & U_·k| / |U_·j | U_·k|

where s(j, k) denotes the similarity coefficient between information j to be clustered and information k to be clustered, U_·j denotes the feature vector of information j to be clustered, U_·k denotes the feature vector of information k to be clustered, |U_·j & U_·k| denotes the size of the intersection of the feature vector of information j and the feature vector of information k, and |U_·j | U_·k| denotes the size of the union of the feature vector of information j and the feature vector of information k.

The information clustering device provided by the embodiment of the present invention clusters information in advance to obtain existing information clusters, so that when information recall is performed for a user, the information clusters that share the most items with the historical information clicked by the user are determined and used as the information to be recalled for the user. In other words, a category of information matching the user's interests can be found according to those interests and used as the information to be recalled, without traversing and filtering the information stored in the information library, which improves the efficiency of information recall.

An embodiment of the present invention also provides an electronic device which, as shown in Fig. 5, includes a processor 001, a communication interface 002, a memory 003 and a communication bus 004, wherein the processor 001, the communication interface 002 and the memory 003 communicate with one another through the communication bus 004,

the memory 003 is configured to store a computer program;

the processor 001 is configured to implement the information recall method provided by the embodiments of the present invention when executing the program stored in the memory 003.

Specifically, the above information recall method includes:

for each existing information cluster, comparing the information contained in the existing information cluster with the determined historical information, and determining the number of items, among the information contained in the existing information cluster, that are identical to the historical information, wherein the existing information clusters are information clusters obtained by clustering information to be clustered according to the feature of each item of information to be clustered, and the feature of each item of information to be clustered is a feature obtained according to the information of the users who clicked that item;

It should be noted that the other embodiments of the information recall method implemented by the processor 001 executing the program stored in the memory 003 are the same as the embodiments provided in the foregoing method embodiment section and are not repeated here.

In each scheme provided by the embodiments of the present invention, when information recall is performed for a user, the electronic device determines, among the existing information clusters obtained by clustering information in advance, the information clusters that share the most items with the historical information clicked by the user, and uses them as the information to be recalled for the user. In other words, a category of information matching the user's interests can be found according to those interests and used as the information to be recalled, without traversing and filtering the information stored in the information library, which improves the efficiency of information recall.

An embodiment of the present invention also provides an electronic device which, as shown in Fig. 6, includes a processor 011, a communication interface 012, a memory 013 and a communication bus 014, wherein the processor 011, the communication interface 012 and the memory 013 communicate with one another through the communication bus 014,

the memory 013 is configured to store a computer program;

the processor 011 is configured to implement the information clustering method provided by the embodiments of the present invention when executing the program stored in the memory 013.

Specifically, the above information clustering method includes:

It should be noted that the other embodiments of the information clustering method implemented by the processor 011 executing the program stored in the memory 013 are the same as the embodiments provided in the foregoing method embodiment section and are not repeated here.

In each scheme provided by the embodiments of the present invention, the electronic device clusters information in advance to obtain existing information clusters, so that when information recall is performed for a user, the information clusters that share the most items with the historical information clicked by the user are determined and used as the information to be recalled for the user. In other words, a category of information matching the user's interests can be found according to those interests and used as the information to be recalled, without traversing and filtering the information stored in the information library, which improves the efficiency of information recall.

The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the bus is shown in the figures as a single thick line, which does not mean that there is only one bus or only one type of bus.

The communication interface is used for communication between the above electronic device and other devices.

The memory may include random access memory (RAM), and may also include non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In another embodiment provided by the present invention, a computer-readable storage medium is also provided. Instructions are stored in the computer-readable storage medium and, when run on a computer, cause the computer to execute any of the information recall methods in the above embodiments.

Specifically, the above information recall method includes:

It should be noted that the other embodiments of the information recall method implemented through the above computer-readable storage medium are the same as the embodiments provided in the foregoing method embodiment section and are not repeated here.

In each scheme provided by the embodiments of the present invention, when information recall is performed for a user, the computer-readable storage medium makes it possible to determine, among the existing information clusters obtained by clustering information in advance, the information clusters that share the most items with the historical information clicked by the user, and to use them as the information to be recalled for the user. In other words, a category of information matching the user's interests can be found according to those interests and used as the information to be recalled, without traversing and filtering the information stored in the information library, which improves the efficiency of information recall.

In another embodiment provided by the present invention, a computer-readable storage medium is also provided. Instructions are stored in the computer-readable storage medium and, when run on a computer, cause the computer to execute any of the information clustering methods in the above embodiments.

Specifically, the above information clustering method includes:

It should be noted that the other embodiments of the information clustering method implemented through the above computer-readable storage medium are the same as the embodiments provided in the foregoing method embodiment section and are not repeated here.

In each scheme provided by the embodiments of the present invention, the computer-readable storage medium makes it possible to cluster information in advance to obtain existing information clusters, so that when information recall is performed for a user, the information clusters that share the most items with the historical information clicked by the user are determined and used as the information to be recalled for the user. In other words, a category of information matching the user's interests can be found according to those interests and used as the information to be recalled, without traversing and filtering the information stored in the information library, which improves the efficiency of information recall.

In another embodiment provided by the present invention, a computer program product containing instructions is also provided which, when run on a computer, causes the computer to execute any of the information recall methods in the above embodiments.

Specifically, the above information recall method includes:

It should be noted that the other embodiments of the information recall method implemented through the above computer program product are the same as the embodiments provided in the foregoing method embodiment section and are not repeated here.

In each scheme provided by the embodiments of the present invention, when information recall is performed for a user, the computer program product makes it possible to determine, among the existing information clusters obtained by clustering information in advance, the information clusters that share the most items with the historical information clicked by the user, and to use them as the information to be recalled for the user. In other words, a category of information matching the user's interests can be found according to those interests and used as the information to be recalled, without traversing and filtering the information stored in the information library, which improves the efficiency of information recall.

In another embodiment provided by the present invention, a computer program product containing instructions is also provided which, when run on a computer, causes the computer to execute any of the information clustering methods in the above embodiments.

Specifically, the above information clustering method includes:

It should be noted that the other embodiments of the information clustering method implemented through the above computer program product are the same as the embodiments provided in the foregoing method embodiment section and are not repeated here.

In each scheme provided by the embodiments of the present invention, the computer program product makes it possible to cluster information in advance to obtain existing information clusters, so that when information recall is performed for a user, the information clusters that share the most items with the historical information clicked by the user are determined and used as the information to be recalled for the user. In other words, a category of information matching the user's interests can be found according to those interests and used as the information to be recalled, without traversing and filtering the information stored in the information library, which improves the efficiency of information recall.

The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)), and so on.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.

The embodiments in this specification are described in a related manner: identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device, electronic equipment, computer-readable storage medium, and computer program product embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims

1. An information recall method, characterized in that the method comprises:

For each existing info class cluster, comparing the information included in that info class cluster with the determined historical information, and determining the number of items of information in that info class cluster that are identical to the historical information, wherein the existing info class clusters are info class clusters obtained by clustering the information to be clustered according to the feature of each item of information to be clustered, and the feature of each item of information to be clustered is a feature obtained from the information of the users who clicked that item;

Selecting a first preset number of info class clusters from the existing info class clusters, wherein the number of identical items in each selected info class cluster is greater than the number of identical items in any info class cluster that is not selected;

Taking the information included in the first preset number of info class clusters as the information to be recalled for the user.
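The selection step of claim 1 can be illustrated with a minimal sketch. It assumes that items of information are represented by string identifiers and that each info class cluster is a set of such identifiers; the names `recall`, `history`, `clusters`, and `top_n` are hypothetical and do not appear in the claims. Ties between clusters are broken by cluster identifier, which is likewise an assumption, since the claim only requires that selected clusters have larger counts than unselected ones.

```python
from typing import Dict, List, Set


def recall(history: Set[str], clusters: Dict[str, Set[str]], top_n: int) -> List[str]:
    """Return the items of the top_n clusters sharing the most items with history."""
    # For each cluster, count the items that also appear in the click history.
    overlap = {cid: len(items & history) for cid, items in clusters.items()}
    # Keep the top_n clusters with the largest counts (ties broken by cluster id).
    chosen = sorted(overlap, key=lambda cid: (-overlap[cid], cid))[:top_n]
    # Everything contained in the chosen clusters is recalled for the user.
    recalled: Set[str] = set()
    for cid in chosen:
        recalled |= clusters[cid]
    return sorted(recalled)


clusters = {
    "c1": {"a", "b", "c"},
    "c2": {"c", "d"},
    "c3": {"e", "f"},
}
history = {"a", "b", "d"}
result = recall(history, clusters, top_n=1)
```

With this toy data, cluster `c1` shares two items with the history and is therefore chosen first when `top_n` is 1.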

2. The method according to claim 1, characterized in that the method further comprises:

Obtaining each existing info class cluster in the following manner:

For each item of information to be clustered, determining the users who clicked that item, and obtaining the feature of that item from the information of the determined users;
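As an illustration of this feature-acquisition step, the sketch below builds one feature per item from a click log. It assumes, purely for illustration, that the "information of the determined users" is simply their user identifiers, so that the feature of an item is the set of users who clicked it; the names `build_features` and `click_log` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_features(click_log: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each item id to the set of user ids that clicked it."""
    features: Dict[str, Set[str]] = defaultdict(set)
    for user_id, item_id in click_log:
        # Each (user, item) click adds the user to the feature of that item.
        features[item_id].add(user_id)
    return dict(features)


click_log = [("u1", "a"), ("u2", "a"), ("u1", "b"), ("u1", "a")]
features = build_features(click_log)
```

Repeated clicks by the same user on the same item collapse into a single entry, because the feature is modeled as a set.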

3. The method according to claim 1 or 2, characterized in that, after the step of obtaining the feature of the information to be clustered from the information of the determined users, the method further comprises:

The step of clustering the information to be clustered according to the calculated similarities and the feature information of each item of information to be clustered, to obtain info class clusters, comprises:

For each item of information to be clustered, determining, according to the calculated similarities, the similarity between that item and any cluster center, and judging according to the determined similarity whether the item belongs to the info class cluster corresponding to that cluster center; if the item of information to be clustered belongs to the info class cluster corresponding to the cluster center, adding the item to the info class cluster corresponding to that cluster center;

For each info class cluster, calculating the average feature of the info class cluster according to the features of the information to be clustered included in it, and, where the calculated average feature differs from the cluster center of the info class cluster, updating the cluster center of the info class cluster to the calculated average feature; and returning to the step of, for each item of information to be clustered, determining, according to the calculated similarities, the similarity between that item and any cluster center, judging according to the determined similarity whether the item belongs to the info class cluster corresponding to that cluster center, and, if the item belongs to that info class cluster, adding the item to it, until the cluster center of every info class cluster is identical to the average feature of that info class cluster, whereupon the info class clusters obtained at that point are taken as the clustering result.
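The iteration described above resembles a k-means-style loop, and can be sketched as follows under several assumptions that the claim does not fix: each feature is a numeric vector (for example, one 0/1 entry per user), an item is judged to belong to the cluster whose center is most similar to it, and the similarity between an item and a center is a generalized Jaccard ratio (sum of element-wise minima over sum of element-wise maxima). The names `similarity`, `mean`, `cluster`, and `initial_centers` are hypothetical, and the initial centers are passed in by the caller rather than drawn at random.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def similarity(a: Vector, b: Vector) -> float:
    """Generalized Jaccard ratio: sum of minima over sum of maxima (an assumption)."""
    denom = sum(max(x, y) for x, y in zip(a, b))
    return sum(min(x, y) for x, y in zip(a, b)) / denom if denom else 0.0


def mean(vectors: List[Vector]) -> List[float]:
    """Element-wise average of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster(features: Dict[str, Vector], initial_centers: List[str],
            max_iter: int = 100) -> List[List[str]]:
    """Assign items to the most similar center, then move centers to cluster averages."""
    centers = [list(features[name]) for name in initial_centers]
    groups: List[List[str]] = []
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for name, vec in features.items():
            # Assumed membership rule: the item joins its most similar center.
            best = max(range(len(centers)), key=lambda i: similarity(vec, centers[i]))
            groups[best].append(name)
        # An empty cluster keeps its previous center.
        new_centers = [mean([features[n] for n in g]) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # every center already equals its cluster average
        centers = new_centers
    return groups


features = {
    "a": [1, 1, 0, 0],
    "b": [1, 1, 1, 0],
    "c": [0, 0, 1, 1],
    "d": [0, 1, 1, 1],
}
groups = cluster(features, initial_centers=["a", "c"])
```

The `max_iter` cap is a safeguard added for the sketch; the claim itself stops only when every cluster center equals the average feature of its cluster.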

4. The method according to claim 1 or 2, characterized in that the step of calculating the similarity between every two items of information to be clustered according to the feature information of each item of information to be clustered comprises:

Calculating the similarity coefficient between every two items of information to be clustered using the following formula, and determining the similarity between the two items of information to be clustered according to the calculated similarity coefficient:

Wherein s(j, k) denotes the similarity coefficient between information to be clustered j and information to be clustered k, U_{·j} denotes the feature vector of information to be clustered j, U_{·k} denotes the feature vector of information to be clustered k, |U_{·j} & U_{·k}| denotes the intersection of the feature vector of information to be clustered j and the feature vector of information to be clustered k, and |U_{·j} | U_{·k}| denotes the union of the feature vector of information to be clustered j and the feature vector of information to be clustered k.
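The formula itself is not reproduced in this text. Judging only from the symbol definitions above, one plausible reading is that s(j, k) is the size of the intersection divided by the size of the union, that is, a Jaccard coefficient; this reading is an assumption, not something the text confirms. The sketch below illustrates it with each feature modeled as a set of user identifiers; the names `similarity_coefficient` and `pairwise_similarity` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def similarity_coefficient(u_j: Set[str], u_k: Set[str]) -> float:
    """Assumed reading of s(j, k): |intersection| / |union| of the two features."""
    union = u_j | u_k
    if not union:
        return 0.0  # two empty features: treat the coefficient as zero
    return len(u_j & u_k) / len(union)


def pairwise_similarity(features: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Similarity coefficient for every unordered pair of items."""
    return {
        (j, k): similarity_coefficient(features[j], features[k])
        for j, k in combinations(sorted(features), 2)
    }


features = {
    "a": {"u1", "u2"},
    "b": {"u2", "u3"},
    "c": {"u4"},
}
pairs = pairwise_similarity(features)
```

Here items `a` and `b` share one of three distinct users, while `c` shares no user with either of them.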

5. An information clustering method, characterized in that the method comprises:

6. An information recall device, characterized in that the device comprises:

A comparison module, configured to, for each existing info class cluster, compare the information included in that info class cluster with the determined historical information, and determine the number of items of information in that info class cluster that are identical to the historical information, wherein the existing info class clusters are info class clusters obtained by clustering the information to be clustered according to the feature of each item of information to be clustered, and the feature of each item of information to be clustered is a feature obtained from the information of the users who clicked that item;

A second determining module, configured to select a first preset number of info class clusters from the existing info class clusters, wherein the number of identical items in each selected info class cluster is greater than the number of identical items in any info class cluster that is not selected;

A designation module, configured to take the information included in the first preset number of info class clusters as the information to be recalled for the user.

7. The device according to claim 6, characterized in that the device further comprises:

A determining submodule, configured to, for each item of information to be clustered, determine the users who clicked that item, and obtain the feature of that item from the information of the determined users;

A clustering submodule, configured to cluster the information to be clustered according to the calculated similarities and the feature of each item of information to be clustered.

8. The device according to claim 6 or 7, characterized in that the device further comprises:

A selection module, configured to randomly select, from the obtained features of the information to be clustered, a second preset number of features as cluster centers;

The clustering submodule is specifically configured to:

For each info class cluster, calculate the average feature of the info class cluster according to the features of the information to be clustered included in it, and, where the calculated average feature differs from the cluster center of the info class cluster, update the cluster center of the info class cluster to the calculated average feature; and return to the step of, for each item of information to be clustered, determining, according to the calculated similarities, the similarity between that item and any cluster center, judging according to the determined similarity whether the item belongs to the info class cluster corresponding to that cluster center, and, if the item belongs to that info class cluster, adding the item to it, until the cluster center of every info class cluster is identical to the average feature of that info class cluster, whereupon the info class clusters obtained at that point are taken as the clustering result.

9. The device according to claim 6 or 7, characterized in that

The calculation submodule is specifically configured to calculate the similarity coefficient between every two items of information to be clustered using the following formula, and to determine the similarity between the two items of information to be clustered according to the calculated similarity coefficient:

10. An information clustering device, characterized in that the device comprises:

A third determining module, configured to, for each item of information to be clustered, determine the users who clicked that item, and obtain the feature of that item from the information of the determined users;

A calculation module, configured to calculate the similarity between every two items of information to be clustered according to the features of the information to be clustered;

A clustering module, configured to cluster the information to be clustered according to the calculated similarities and the feature of each item of information to be clustered.

11. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus;

The memory is configured to store a computer program;

The processor is configured to implement, when executing the program stored in the memory, the method steps of any one of claims 1-4, or the method steps of claim 5.

12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method steps of any one of claims 1-4, or the method steps of claim 5, are implemented.