CN112256979B - Control method and device for similar article recommendation - Google Patents

Control method and device for similar article recommendation

Info

Publication number
CN112256979B
CN112256979B CN202011541921.0A
Authority
CN
China
Prior art keywords
articles
article
item
items
embedding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011541921.0A
Other languages
Chinese (zh)
Other versions
CN112256979A (en)
Inventor
沈振雷
刘凡平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai 2345 Network Technology Co ltd
Original Assignee
Shanghai 2345 Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai 2345 Network Technology Co ltd filed Critical Shanghai 2345 Network Technology Co ltd
Priority to CN202011541921.0A priority Critical patent/CN112256979B/en
Publication of CN112256979A publication Critical patent/CN112256979A/en
Application granted granted Critical
Publication of CN112256979B publication Critical patent/CN112256979B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a control method for approximate article recommendation, which comprises the following steps: a. the feature information of all items to be recommended is fed as input to a BERT model for prediction, to determine the embedding feature vectors of the items to be recommended; b. the embedding feature vectors of the items to be recommended are stored in a vector retrieval database; c. a first item determined from user access information is matched in the vector retrieval database, and the embedding feature vector of the first item is determined; d. one or more second items whose embedding feature vectors are similar to that of the first item are determined by a nearest neighbor search algorithm. Because BERT is adopted for training, the textual context within an article can be effectively captured and a reliable embedding generated; the model can be trained on a large amount of data and, once trained, used for a long time without frequent updating. The method is simple, the flow is convenient and fast, the recommendation is accurate, training time is saved, the cold-start problem is solved, and the method has extremely high commercial value.

Description

Control method and device for similar article recommendation
Technical Field
The invention belongs to the field of Internet technology application, and particularly relates to a control method and device for approximate item recommendation.
Background
A recommendation system must solve the problem of matching massive numbers of users to massive numbers of items: the items a user is most likely to be interested in must be found from the mass of items within tens of milliseconds. Google developed a two-tower model in 2019 whose main idea is as follows: user features are mapped through a DNN to a user feature vector, which can be generated based on an embedding technique and represents the user's interests; at the same time, item features and the item ID are mapped through a DNN to an item feature vector representing the item. The model is then trained on the users' behavioral feedback data with the goal of placing each user closest to the items he likes, where closeness is obtained by an inner-product calculation. After training, the feature vectors (embeddings) generated from item features are put into a vector database; when a user arrives, the model generates the user's feature vector (embedding), the items with the most similar embeddings are searched for in the item vector database, and those items are recommended to the user.
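The two-tower retrieval idea described above can be sketched as follows. This is a minimal illustration, not the trained model: the "towers" are stand-in linear maps (a real system would use trained DNNs), and all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def user_tower(user_features: np.ndarray, W_user: np.ndarray) -> np.ndarray:
    """Map raw user features to a user embedding (stand-in for a DNN tower)."""
    return user_features @ W_user

def item_tower(item_features: np.ndarray, W_item: np.ndarray) -> np.ndarray:
    """Map raw item features to item embeddings (stand-in for a DNN tower)."""
    return item_features @ W_item

dim_in, dim_emb, n_items = 8, 4, 100
W_user = rng.normal(size=(dim_in, dim_emb))
W_item = rng.normal(size=(dim_in, dim_emb))

# Offline: put all item vectors into the "vector database"
item_vecs = item_tower(rng.normal(size=(n_items, dim_in)), W_item)

# Online: when the user arrives, compute the user vector and rank items
# by inner product -- the closeness measure used to train the model
user_vec = user_tower(rng.normal(size=dim_in), W_user)
scores = item_vecs @ user_vec
top5 = np.argsort(-scores)[:5]  # items most likely to interest the user
```

The key property is that both towers map into the same vector space, so retrieval reduces to a maximum-inner-product search over the item vectors.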
However, the prior art has certain problems. Both Google's two-tower model and the derivative models based on its idea are essentially designed for video recommendation, and the following disadvantages appear when they are applied to information-stream recommendation. The timeliness of an information stream is particularly strong: a large number of news articles appear every day and many go out of date quickly, so quickly finding the latest news a user is interested in is especially important. Under Google's two-tower model, however, the embedding of an item depends on users' behavioral feedback on that item, so the cold-start problem for new items cannot be solved. An information stream contains a large amount of text, which the two-tower model and its derivatives cannot exploit effectively; when an item has behavioral feedback from only a few people, the generated item embedding cannot accurately express the item's information. To generate embeddings for new items, the model must be trained frequently, which also changes the embeddings of old items, so the item vector library must be updated frequently; when the number of items is large, the overhead of this frequent updating is huge. Item embeddings that change frequently cannot be used to train the ranking model, because a change in an item's embedding makes the item's features at training time inconsistent with its features at ranking time. Finally, item-embedding generation has no transferability: each item must have its own embedding trained.
In particular, there is no good method for predicting the degree of association between one article and another, and no control method for achieving approximate item recommendation from a given item.
Disclosure of Invention
In view of the technical defects in the prior art, the present invention provides a control method and device for approximate item recommendation, and according to an aspect of the present invention, a control method for approximate item recommendation is provided, which includes the following steps:
a. feeding the feature information of all items to be recommended as input to a BERT model for prediction, to determine the embedding feature vectors of one or more items to be recommended, wherein the items to be recommended comprise at least a first item and a second item;
b. storing the embedding feature vectors of one or more to-be-recommended articles in a vector retrieval database;
c. matching the first item, determined from user access information, in the vector retrieval database and determining the embedding feature vector of the first item, wherein the user access information comprises at least the user's current access information and/or the user's historical access information;
d. determining one or more second items similar to the embedding feature vector of the first item based on a nearest neighbor lookup algorithm.
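Steps a–d can be sketched end to end as follows. This is a hedged illustration: a real system would use a trained BERT encoder and a vector retrieval database (e.g. an ANN index), whereas here the encoder is a hypothetical deterministic stand-in that hashes text into a unit vector, and the "database" is a plain dict searched by brute force.

```python
import hashlib
import numpy as np

def fake_bert_embed(text: str, dim: int = 16) -> np.ndarray:
    """Stand-in for the BERT model: deterministically map text to a unit vector."""
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

# a. predict embedding feature vectors for all items to be recommended
items = {
    "item1": "stock market closes higher on tech rally",
    "item2": "local football team wins the derby",
    "item3": "tech stocks lead the market rebound",
}
# b. store the vectors in the vector "retrieval database"
vector_db = {name: fake_bert_embed(text) for name, text in items.items()}

# c. the first item, as determined from the user's access information
first_item = "item1"

# d. nearest-neighbor lookup: rank the remaining items by cosine similarity
#    (the vectors are unit-normalized, so the inner product IS the cosine)
def nearest_neighbors(db: dict, query: str, k: int = 2) -> list:
    q = db[query]
    sims = {name: float(v @ q) for name, v in db.items() if name != query}
    return sorted(sims, key=sims.get, reverse=True)[:k]

second_items = nearest_neighbors(vector_db, first_item)
```

Because step b happens offline, step d at serving time is only a lookup plus a ranking over precomputed vectors.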
Preferably, before the step c, the method further comprises:
i: caching the user's historical access information.
Preferably, the BERT model is established by:
a1: calculating the similarity sim(a, b) between any two items among all the items to be recommended;
a2: taking one or more item pairs whose sim(a, b) is greater than a first threshold, together with their feature information, as positive samples, and one or more item pairs whose sim(a, b) is smaller than a second threshold, together with their feature information, as negative samples, and training the BERT model with equal numbers of positive and negative samples, wherein each item pair comprises 2 items.
Preferably, in the step a1, the similarity sim(a, b) between any two items among all the items to be recommended is calculated by a formula over the two items' user sets (the formula itself is rendered only as an image in the source), wherein A represents the set of users who like item a, B represents the set of users who like item b, f(x) represents the number of elements of a set, and a and b are any two items among all the items to be recommended.
Preferably, the value range of the first threshold is 0.05-1.
Preferably, the value range of the second threshold is 0-0.0015.
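The set-based similarity and the two thresholds can be sketched as follows. The exact formula appears only as an image in the source; the set-cosine form f(A ∩ B) / √(f(A) · f(B)) used here is an assumption consistent with the definitions of A, B, and f(x), and the threshold values are the preferred ones from the text.

```python
import math

def item_similarity(users_a: set, users_b: set) -> float:
    """Similarity of two items from their user sets.
    ASSUMPTION: set cosine f(A ∩ B) / sqrt(f(A) * f(B)); the patent's
    formula is shown only as an image."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / math.sqrt(len(users_a) * len(users_b))

FIRST_THRESHOLD = 0.1     # pairs above this become positive samples
SECOND_THRESHOLD = 0.001  # pairs below this become negative samples

A = {1, 2, 3, 4}          # users who like item a
B = {3, 4, 5, 6}          # users who like item b
sim = item_similarity(A, B)          # 2 common users / sqrt(4 * 4) = 0.5
is_positive = sim > FIRST_THRESHOLD
```

With four users per item and two in common, the similarity is 0.5, well above the positive-sample threshold of 0.1.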
Preferably, one or more item pairs whose similarity sim(a, b) is greater than the first threshold and whose common-user count f(A ∩ B) is greater than a third threshold are taken as positive samples, the third threshold being 3 and each item pair comprising 2 items.
Preferably, before the step c, the method further comprises:
ii: predicting feature information of one or more updated items as input in a BERT model to determine embedding feature vectors of the one or more updated items;
iii: storing the embedding feature vectors of the one or more updated items in a vector retrieval database.
Preferably, in the step d, the nearest neighbor search algorithm is any one of the following:
- a cosine similarity algorithm;
- a vector inner product algorithm; or
- a Euclidean distance algorithm.
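The three candidate measures can be written out directly. For cosine similarity and the inner product, larger means more similar; for Euclidean distance, smaller means more similar. The toy vectors are hypothetical.

```python
import math

def inner_product(x, y):
    """Vector inner product: multiply element-wise and sum."""
    return sum(a * b for a, b in zip(x, y))

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors."""
    nx = math.sqrt(inner_product(x, x))
    ny = math.sqrt(inner_product(y, y))
    return inner_product(x, y) / (nx * ny)

def euclidean_distance(x, y):
    """Absolute distance between two points in multidimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

u = [1.0, 2.0, 3.0]
v = [1.0, 2.0, 2.5]   # close to u
w = [-3.0, 0.0, 1.0]  # far from u

# v is nearer to u than w is, under all three measures
assert cosine_similarity(u, v) > cosine_similarity(u, w)
assert euclidean_distance(u, v) < euclidean_distance(u, w)
```

Note that only cosine similarity is scale-invariant; on unit-normalized vectors the three measures induce the same nearest-neighbor ranking.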
Preferably, when the nearest neighbor search algorithm is the cosine similarity algorithm, one or more items whose embedding feature vector similarity with the first item is smaller than a fourth threshold are removed as irrelevant items.
Preferably, the value range of the fourth threshold is 0-0.6.
Preferably, one or more items whose embedding feature vector similarity with the first item is greater than a fifth threshold are removed as duplicates of the first item.
Preferably, the value range of the fifth threshold is 0.997-1.
Preferably, the one or more second items are displayed sorted in descending order of similarity.
According to another aspect of the present invention, there is provided a control device for approximate item recommendation, which adopts the above control method and comprises:
the first determination means: determining an embedding feature vector of a first item based on a match of the first item determined by user access information in a vector retrieval database;
second determining means: determining one or more second items similar to the embedding feature vector of the first item based on a nearest neighbor lookup algorithm.
Preferably, the device further comprises:
third determining means: the characteristic information of all the to-be-recommended articles is used as input to be predicted in a BERT model so as to determine embedding characteristic vectors of one or more to-be-recommended articles;
a first storage device: and storing the embedding feature vectors of one or more to-be-recommended articles in a vector retrieval database.
A second storage device: and caching the historical access information of the user.
Preferably, the device further comprises:
the first computing device: calculating the similarity between any two articles in all the articles to be recommended
Figure 854074DEST_PATH_IMAGE005
A first processing device: will be provided with
Figure 231965DEST_PATH_IMAGE006
One or more article pairs greater than the first threshold and the characteristic information are used as positive samples
Figure 255547DEST_PATH_IMAGE006
One or more article pairs smaller than the second threshold value and the characteristic information thereof are used as negative samples, and the positive samples and the negative samples are counted according to the same proportionThe quantity pair BERT model was trained, wherein the pair of items contained 2 items.
Preferably, the device further comprises:
fourth determining means: predicting feature information of one or more updated items as input in a BERT model to determine embedding feature vectors of the one or more updated items;
a third storage device: storing the embedding feature vectors of the one or more updated items in a vector retrieval database.
The invention discloses a control method for approximate item recommendation, in which the embedding feature vector of a first item, determined from user access information, is found by matching in a vector retrieval database, the user access information comprising at least the user's current access information and/or historical access information; one or more second items similar to the embedding feature vector of the first item are then determined by a nearest neighbor search algorithm. BERT is trained on users' feedback on information-stream content, so that it can generate an embedding for a piece of content from its title and body; such embeddings express the user's interests well, and, starting from the embeddings of items the user likes, other related content the user is interested in can be found quickly by nearest neighbor search.
The invention has the following beneficial effects:
(1) the method depends on user behavior only during training; during prediction it depends only on the item's text content, such as the title and abstract/body. Therefore, for a new item a corresponding embedding can be generated in real time and added to the item library, and, from the embeddings of the items a user has historically clicked, or from the user's own embedding, items the user may be interested in can be found by nearest neighbor search and pushed, effectively solving the cold-start problem in information-stream recommendation;
(2) the BERT is adopted for training, so that the information of text context in an article can be effectively captured, and reliable Embedding is generated;
(3) training can use a large amount of data, and once finished the model can be used for a long time without frequent updating, saving training time while avoiding the online performance overhead of frequently updating the item vector library;
(4) because the embeddings remain stable for a long time, the item embedding features can be used directly for training the ranking model and for online prediction, without the training-versus-prediction inconsistency problem;
(5) the whole model has a certain transferability: a model trained in a mature, large-scale information-stream scenario can be used directly by a similar new information-stream product, solving the cold-start, no-data problem of a new project;
the method is simple, the flow is convenient and fast, the recommendation is accurate, the training time is saved, the cold start problem is solved, and the method has extremely high commercial value.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic flow chart of a control method for approximate item recommendation according to an embodiment of the present invention;
FIG. 2 is a detailed flow chart of a control method for approximate item recommendation according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of the specific process of building the BERT model according to a second embodiment of the present invention;
FIG. 4 is a schematic diagram of the specific process of storing the embedding feature vectors of one or more updated items in a vector retrieval database according to a third embodiment of the present invention; and
FIG. 5 is a schematic block diagram of a control device for approximate item recommendation according to another embodiment of the present invention.
Detailed Description
In order to better and clearly show the technical scheme of the invention, the invention is further described with reference to the attached drawings.
Fig. 1 is a flowchart of a control method for approximate item recommendation according to an embodiment of the present invention. The invention provides a control method for approximate item recommendation based on the user's click behavior, historical visits, collections, and favorite items: after the items are represented as vectors by a trained model, a nearest neighbor search is performed over them to determine the other items closest to a given item. Specifically, the control method for approximate item recommendation comprises the following steps:
firstly, step S101 is entered, feature information of all to-be-recommended articles is used as input to be predicted in a BERT model to determine embedding feature vectors of one or more to-be-recommended articles, the to-be-recommended articles at least comprise a first article and a second article, and step S101 needs to be completed before step S103, so that when a user accesses in real time, no more time is spent on calorie to complete conversion of the articles to the article vectors, and further real-time matching and real-time recommendation are directly performed, that is, the steps are completed when the actual user performs similar article recommendation, and can be completed in a background, so that calculation time is saved, calculation cost is reduced, and calculation efficiency is improved. Those skilled in the art understand that this step is a process of vectorizing and representing the feature information of the item to be recommended, and the BERT is a training model for implementing vectorization and representation, which will be further described in the following embodiments, BERT, i.e., Bidirectional Encoder responses from transformations, which essentially learns a good feature representation for words by running a self-supervised learning method on the basis of a large amount of linguistic data, so-called self-supervised learning refers to supervised learning that runs on data without artificial labeling, and the present invention applies it to similar recommendation of items.
Then, in step S102, the embedding feature vectors of the one or more items to be recommended are stored in a vector retrieval database; preferably, the embedding feature vectors of all items to be recommended are stored there, including but not limited to the first item, the second item, items the user likes, and items the user does not like. All of these vectors are computed, before the user actually accesses, by feeding the feature information of all items to be recommended into the BERT model for prediction, and are then stored in the vector retrieval database for later use. Further, when step S103 is executed, a match can be quickly found in the vector retrieval database and the embedding feature vector of the first item determined.
Then, in step S103, the first item, determined from user access information, is matched in the vector retrieval database and its embedding feature vector is determined, the user access information comprising at least the user's current access information and/or the user's historical access information. In such an embodiment the user access information may be the current access information, the historical access information, or both. Further, the current access information consists of behavioral operations such as clicking, browsing, collecting, and liking performed during the user's current access, while the historical access information refers to such operations performed by the user over a period of time. Through the behavioral operations of the current and/or historical access information, a first item preferred by the user can be determined; the first item is the item for which approximate item recommendation is needed.
Further, before step S103, the embedding feature vector corresponding to the first item is preferably already stored in the vector retrieval database, as further described in the detailed embodiments below; the first item is then matched in the vector retrieval database to determine its embedding feature vector.
Finally, in step S104, one or more second items similar to the embedding feature vector of the first item are determined by a nearest neighbor search algorithm, where the algorithm includes but is not limited to the cosine similarity algorithm, the vector inner product algorithm, or the Euclidean distance algorithm. Further, a plurality of second items may preferably be found for the first item and recommended after sorting by similarity.
Furthermore, cosine similarity evaluates the similarity of two vectors by computing the cosine of the angle between them, drawing the vectors in a vector space according to their coordinate values; the vector inner product multiplies the corresponding components of the two vectors one by one and sums the results; and the Euclidean distance, also called the Euclidean metric, is the most common distance measure, measuring the absolute distance between two points in a multidimensional space. Viewed in terms of the results the three algorithms give when searching for items, a larger cosine similarity or vector inner product means more similar, while a smaller Euclidean distance means more similar.
Further, when the nearest neighbor search algorithm is the cosine similarity algorithm, one or more items whose embedding feature vector similarity with the first item is smaller than a fourth threshold are removed as irrelevant items. In the present invention, a plurality of second items similar to the first item are preferably determined by the cosine similarity algorithm, and when an item's similarity to the first item's embedding feature vector is smaller than the fourth threshold, the item is considered to have low similarity to the first item and can be removed as irrelevant. Further, the fourth threshold is preferably 0.6; in other embodiments it may also be 0.4, 0.5, and so on, which does not affect the specific implementation of the present invention and is not repeated here.
Further, one or more items whose embedding feature vector similarity with the first item is greater than a fifth threshold are removed as duplicates of the first item. In such an embodiment, when an item's similarity to the first item's embedding feature vector is greater than the fifth threshold, the item is considered to be the same as the first item and must be removed. More specifically, the fifth threshold ranges from 0.997 to 1; in this application it may preferably be set to 0.997, i.e. one or more items whose embedding feature vector similarity with the first item is greater than 0.997 are removed as duplicates.
Further, if the similarities of the second items to the first item are 0.77, 0.75, 0.86, 0.92, 0.89, and 0.74 respectively, the second items are preferably sorted and displayed in descending order of similarity: 0.92, 0.89, 0.86, 0.77, 0.75, 0.74.
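The filtering and ordering just described can be sketched together. The six candidate scores are the example values from the text; the extra candidates n7 and n8, and all the item names, are hypothetical additions used to exercise both thresholds.

```python
FOURTH_THRESHOLD = 0.6    # below this: irrelevant, drop
FIFTH_THRESHOLD = 0.997   # above this: duplicate of the first item, drop

candidates = {
    "n1": 0.77, "n2": 0.75, "n3": 0.86,
    "n4": 0.92, "n5": 0.89, "n6": 0.74,
    "n7": 0.55,   # hypothetical: irrelevant (below fourth threshold)
    "n8": 0.999,  # hypothetical: duplicate (above fifth threshold)
}

# keep only items within [fourth, fifth] threshold band
kept = {name: s for name, s in candidates.items()
        if FOURTH_THRESHOLD <= s <= FIFTH_THRESHOLD}

# display in descending order of similarity
ranked = sorted(kept, key=kept.get, reverse=True)
```

On these scores, n7 and n8 are dropped and the remaining items come out in the order 0.92, 0.89, 0.86, 0.77, 0.75, 0.74, matching the example in the text.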
Fig. 2 shows a first embodiment of the present invention: a detailed flowchart of a control method for approximate item recommendation according to the first embodiment. Specifically, the method further comprises:
the steps S201 to S202 may refer to the foregoing steps S101 to S102, and further, the step S203 is entered to buffer the historical access information of the user, and the step S203 may be executed before, simultaneously with, or after the step S101, and is used to determine the first item when the user is currently accessing, preferably based on the historical access information of the user, so as to effectively solve the cold start problem in the information flow recommendation.
The steps S204 to S205 can refer to the steps S103 to S104, which are not described herein.
Fig. 3 shows the specific flow of building the BERT model according to the second embodiment of the present invention. Those skilled in the art will understand that the BERT model is built by the following steps:
first, step S301 is entered, and the similarity between any two items in all the items to be recommended is calculated
Figure 656573DEST_PATH_IMAGE006
In such an embodiment, if there are five items, respectively A, B, C, D and E, in all the items to be recommended, then according to step S301, the calculation is performed
Figure 719075DEST_PATH_IMAGE007
Figure 889157DEST_PATH_IMAGE008
Figure 395224DEST_PATH_IMAGE009
Figure 565437DEST_PATH_IMAGE010
Figure 120046DEST_PATH_IMAGE011
Figure 941372DEST_PATH_IMAGE012
Figure 867608DEST_PATH_IMAGE013
Figure 774384DEST_PATH_IMAGE014
Figure 867105DEST_PATH_IMAGE015
Figure 801391DEST_PATH_IMAGE016
Then, in step S302, one or more item pairs whose sim(a, b) is greater than the first threshold, together with their feature information, are taken as positive samples, and one or more item pairs whose sim(a, b) is smaller than the second threshold, together with their feature information, are taken as negative samples; the BERT model is trained with equal numbers of positive and negative samples, wherein each item pair comprises 2 items. In such an embodiment, because the similarity is computed between two items, it must be treated over the pair as a whole, i.e. as an item pair containing the two items whose overall similarity is being determined. The feature information is preferably the items' context information, description information, names, sources, and the like.
Further, in the step S301, the similarity sim(a, b) between any two items among all the items to be recommended is calculated by a formula over the two items' user sets (the formula itself is rendered only as an image in the source), wherein A represents the set of users who like item a, B represents the set of users who like item b, f(x) represents the number of elements of a set, and a and b represent any two items among all the items to be recommended.
Further, the first threshold ranges from 0.05 to 1 and is preferably 0.1; that is, one or more item pairs whose sim(a, b) is greater than 0.1, together with their feature information, are taken as positive samples. In other embodiments values such as 0.07, 0.5, or 0.8 may also be used.
Further, the second threshold ranges from 0 to 0.0015 and is preferably 0.001; that is, one or more item pairs whose sim(a, b) is smaller than 0.001, together with their feature information, are taken as negative samples. In other embodiments values such as 0.0001, 0.0007, or 0.0013 may also be used.
Further, one or more item pairs whose sim(a, b) is greater than the first threshold and whose common-user count f(A ∩ B) is greater than a third threshold are taken as positive samples, the third threshold being 3 and each item pair comprising 2 items. In such an embodiment, combined with the embodiment above, one or more item pairs whose sim(a, b) is greater than 0.1 and whose common-user count is greater than 3 are taken as positive samples.
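The sample-selection rules above can be sketched as follows. The set-cosine similarity is an assumption (the patent's formula is shown only as an image), the reading of the third threshold as a common-user count is an inference from the definitions, and the five items with their user sets are hypothetical.

```python
import math
from itertools import combinations

def item_similarity(ua: set, ub: set) -> float:
    """ASSUMED set-cosine similarity f(A ∩ B) / sqrt(f(A) * f(B))."""
    if not ua or not ub:
        return 0.0
    return len(ua & ub) / math.sqrt(len(ua) * len(ub))

# hypothetical items A..E with the sets of users who like each
likes = {
    "A": {1, 2, 3, 4, 5},
    "B": {1, 2, 3, 4, 6},
    "C": {7},
    "D": {8},
    "E": {1, 2, 3, 4, 7},
}

positives, negatives = [], []
for a, b in combinations(likes, 2):
    sim = item_similarity(likes[a], likes[b])
    common = len(likes[a] & likes[b])
    if sim > 0.1 and common > 3:      # first threshold AND third threshold
        positives.append((a, b))
    elif sim < 0.001:                 # second threshold
        negatives.append((a, b))
    # pairs in between are used in neither class

n = min(len(positives), len(negatives))
balanced = positives[:n] + negatives[:n]  # equal numbers of each class
```

Note how the third threshold matters: the pair (C, E) has similarity ≈ 0.45 but only one common user, so it is excluded from the positives despite passing the first threshold.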
Fig. 4 shows the specific flow of the third embodiment of the present invention, in which the embedding feature vectors of one or more updated items are stored in a vector retrieval database. Those skilled in the art will understand that Fig. 4 is used so that, during everyday use of the trained model, any newly appearing item can be vectorized in a timely manner and stored in the vector retrieval database. The present invention can be trained on a large amount of data and, once training is complete, used for a long time without frequent updating, which saves training time and avoids the online performance overhead of frequently updating the item vector library. Specifically, before step S101, the method further comprises:
Firstly, in step S401, the feature information of one or more updated articles is fed as input to the BERT model for prediction to determine the embedding feature vectors of the one or more updated articles. As those skilled in the art will understand, step S401 may refer to the aforementioned step S201; that is, the vectorized representation of an updated article is determined in the same way as that of an article to be recommended.
Then, in step S402, the embedding feature vectors of the one or more updated articles are stored in the vector retrieval database; step S402 may refer to the aforementioned step S202. More specifically, steps S401 to S402 may be performed at any stage of steps S101 to S104, and they serve as an auxiliary procedure for continuously updating the vector retrieval database and the trained model.
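Steps S401 to S402 (embed a newly appearing article, then store it without retraining) can be sketched as below. The `fake_bert_embed` stand-in and the in-memory store are assumptions for illustration; a real system would run the item's feature information through the trained BERT model and write to an actual vector retrieval database.

```python
import hashlib
import numpy as np

# A toy in-memory "vector retrieval database": item id -> embedding.
vector_db: dict[str, np.ndarray] = {}

def fake_bert_embed(feature_text: str, dim: int = 8) -> np.ndarray:
    """Stand-in for the BERT prediction step (S401): produces a
    deterministic pseudo-random unit vector from the feature text."""
    seed = int(hashlib.md5(feature_text.encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)  # L2-normalize so inner product = cosine

def add_new_items(items: dict[str, str]) -> None:
    """S401 + S402: embed each newly appearing item and store it in the
    vector database, without retraining the model."""
    for item_id, features in items.items():
        vector_db[item_id] = fake_bert_embed(features)

add_new_items({"item_1001": "red running shoes", "item_1002": "wireless mouse"})
```

Because insertion only appends vectors, the update can run at any stage of steps S101 to S104, matching the auxiliary role described above.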
Fig. 5 is a schematic block diagram of a control device for similar item recommendation according to another embodiment of the present invention. The control device, which adopts the control method described above, includes a first determining device 1 that determines the embedding feature vector of a first item based on matching the first item, identified from user access information, in the vector retrieval database; the working principle of the first determining device 1 may refer to the aforementioned step S103 and is not repeated here.
Further, the control device for approximate item recommendation further comprises a second determining device 2: the one or more second items similar to the embedding feature vector of the first item are determined based on the nearest neighbor searching algorithm, and the working principle of the second determining device 2 may refer to the step S104, which is not described herein again.
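The lookup performed by the second determining device 2 (step S104) can be sketched as a brute-force cosine-similarity search. This is a simplification: production systems would typically use an approximate nearest-neighbor index, and the item names and vectors below are made up for illustration.

```python
import numpy as np

def nearest_neighbors(query: np.ndarray, db: dict, k: int = 2):
    """Return the k item ids most similar to `query` by cosine similarity
    (one of the nearest-neighbor options named in the method)."""
    ids = list(db)
    mat = np.stack([db[i] for i in ids])
    q = query / np.linalg.norm(query)
    m = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    scores = m @ q                      # cosine similarity to each item
    order = np.argsort(-scores)[:k]     # indices of the k best scores
    return [(ids[i], float(scores[i])) for i in order]

# Hypothetical embedding database for three items.
db = {
    "sneaker": np.array([1.0, 0.1, 0.0]),
    "boot":    np.array([0.9, 0.2, 0.1]),
    "kettle":  np.array([0.0, 0.1, 1.0]),
}
result = nearest_neighbors(np.array([1.0, 0.0, 0.0]), db, k=2)
```

The scores returned here also support the filtering described in the claims: items below a fourth threshold can be dropped as irrelevant, and items above a fifth threshold dropped as duplicates.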
Further, the control device for approximate item recommendation further comprises a third determining device 3: the feature information of all the to-be-recommended articles is input into the BERT model for prediction to determine the embedding feature vectors of one or more to-be-recommended articles, and the working principle of the third determining device 3 may refer to the step S101, which is not described herein again.
Further, the control device for approximate item recommendation further comprises a first storage device 4: the embedding feature vectors of one or more to-be-recommended articles are stored in the vector retrieval database, and the working principle of the first storage device 4 may refer to the step S102, which is not described herein again.
Further, the control device for approximate item recommendation further comprises a second storage device 5: the user history access information is cached, and the working principle of the second storage device 5 may refer to the foregoing step S203, which is not described herein again.
Further, the control device for similar item recommendation further comprises a first computing device 6, which calculates the similarity sim(a, b) between any two articles among all the articles to be recommended; the working principle of the first computing device 6 may refer to the aforementioned step S301 and is not repeated here.
Further, the control device for similar item recommendation further comprises a first processing device 7, which takes one or more article pairs whose similarity sim(a, b) is greater than the first threshold, together with their characteristic information, as positive samples, and one or more article pairs whose similarity sim(a, b) is less than the second threshold, together with their characteristic information, as negative samples, and trains the BERT model with equal numbers of positive and negative samples, where each article pair comprises 2 articles; the working principle of the first processing device 7 may refer to the aforementioned step S302 and is not repeated here.
Further, the control device for approximate item recommendation further comprises a fourth determination device 8: the feature information of one or more updated items is used as input to be predicted in the BERT model to determine the embedding feature vectors of the one or more updated items, and the working principle of the fourth determining device 8 may refer to the foregoing step S401, which is not described herein again.
Further, the control device for approximate item recommendation further comprises a third storage device 9: the embedding feature vectors of one or more updated articles are stored in the vector retrieval database, and the working principle of the third storage device 9 may refer to the step S402, which is not described herein again.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims (15)

1. A control method for approximating item recommendations, comprising the steps of:
a. predicting characteristic information of all to-be-recommended articles in a BERT model as input to determine embedding characteristic vectors of one or more to-be-recommended articles, wherein the to-be-recommended articles at least comprise a first article and a second article;
b. storing the embedding feature vectors of one or more to-be-recommended articles in a vector retrieval database;
c. determining an embedding feature vector of a first article based on matching of the first article determined by user access information in a vector retrieval database, wherein the user access information at least comprises user current access information and/or user historical access information;
d. determining one or more second items similar to the embedding feature vector of the first item based on a nearest neighbor search algorithm, wherein the BERT model is built by the following steps:
a1: calculating the similarity sim(a, b) between any two articles among all the articles to be recommended;
a2: taking one or more article pairs whose sim(a, b) is greater than a first threshold, together with their characteristic information, as positive samples, and one or more article pairs whose sim(a, b) is less than a second threshold, together with their characteristic information, as negative samples, and training the BERT model with equal numbers of positive and negative samples, wherein each article pair comprises 2 articles,
wherein, in the step a1, the similarity between any two articles among all the articles to be recommended is calculated by the following formula:
sim(a, b) = f(A ∩ B) / sqrt(f(A) × f(B))
wherein A represents the set of users who like article a, B represents the set of users who like article b, f(x) represents the number of elements in a set, and a and b are any two of all the articles to be recommended.
2. The control method according to claim 1, characterized by, before said step c, further comprising:
i: and caching the historical access information of the user.
3. The control method according to claim 1, wherein the first threshold value ranges from 0.05 to 1.
4. The control method according to claim 2, wherein the second threshold value ranges from 0 to 0.0015.
5. The control method according to claim 2, characterized in that one or more article pairs whose similarity sim(a, b) is greater than the first threshold and whose number of common users f(A ∩ B) is greater than a third threshold of 3 are taken as positive samples, each article pair containing 2 articles.
6. The control method according to claim 1, 4 or 5, characterized by further comprising, before the step c:
ii: predicting feature information of one or more updated items as input in a BERT model to determine embedding feature vectors of the one or more updated items;
iii: storing the embedding feature vectors of the one or more updated items in a vector retrieval database.
7. Control method according to claim 1, characterized in that in step d, the nearest neighbor finding algorithm is any of the following ways:
cosine similarity calculation;
a vector inner product algorithm; or
Euclidean distance algorithm.
8. The control method of claim 7, wherein when the nearest neighbor search algorithm is a cosine similarity algorithm, one or more items having an embedding feature vector similarity with the first item less than a fourth threshold are removed as irrelevant items.
9. The control method according to claim 8, wherein the value of the fourth threshold ranges from 0 to 0.6.
10. The control method according to claim 8 or 9, characterized in that one or more items whose embedding feature vector similarity with the first item is greater than a fifth threshold are removed as being the same item.
11. The control method according to claim 10, wherein the value of the fifth threshold ranges from 0.997 to 1.
12. The control method according to claim 8, 9 or 11, wherein the one or more second articles are displayed after being sorted in the order of similarity from large to small.
13. A control apparatus that approximates item recommendations using the control method of any one of claims 1-12, comprising:
first determination means (1): determining an embedding feature vector of a first item based on a match of the first item determined by user access information in a vector retrieval database;
second determination means (2): determining one or more second items similar to the embedding feature vector of the first item based on a nearest neighbor searching algorithm;
third determination means (3): the characteristic information of all the to-be-recommended articles is used as input to be predicted in a BERT model so as to determine embedding characteristic vectors of one or more to-be-recommended articles;
first storage means (4): storing the embedding feature vectors of one or more to-be-recommended articles in a vector retrieval database;
second storage means (5): and caching the historical access information of the user.
14. The control device according to claim 13, characterized by further comprising:
first computing means (6): calculating the similarity sim(a, b) between any two articles among all the articles to be recommended;
first processing means (7): taking one or more article pairs whose sim(a, b) is greater than the first threshold, together with their characteristic information, as positive samples, and one or more article pairs whose sim(a, b) is less than the second threshold, together with their characteristic information, as negative samples, and training the BERT model with equal numbers of positive and negative samples, wherein each article pair comprises 2 articles.
15. The control device according to claim 13, characterized by further comprising:
fourth determination means (8): predicting feature information of one or more updated items as input in a BERT model to determine embedding feature vectors of the one or more updated items;
third storage means (9): storing the embedding feature vectors of the one or more updated items in a vector retrieval database.
CN202011541921.0A 2020-12-24 2020-12-24 Control method and device for similar article recommendation Active CN112256979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011541921.0A CN112256979B (en) 2020-12-24 2020-12-24 Control method and device for similar article recommendation


Publications (2)

Publication Number Publication Date
CN112256979A CN112256979A (en) 2021-01-22
CN112256979B true CN112256979B (en) 2021-06-04

Family

ID=74225269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011541921.0A Active CN112256979B (en) 2020-12-24 2020-12-24 Control method and device for similar article recommendation

Country Status (1)

Country Link
CN (1) CN112256979B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968506A (en) * 2012-12-14 2013-03-13 北京理工大学 Personalized collaborative filtering recommendation method based on extension characteristic vectors
CN107845025A (en) * 2017-11-10 2018-03-27 天脉聚源(北京)传媒科技有限公司 The method and device of article in a kind of recommendation video
CN110084658A (en) * 2018-01-26 2019-08-02 北京京东尚科信息技术有限公司 The matched method and apparatus of article
CN111046221A (en) * 2019-12-17 2020-04-21 腾讯科技(深圳)有限公司 Song recommendation method and device, terminal equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104166668B (en) * 2014-06-09 2018-02-23 南京邮电大学 News commending system and method based on FOLFM models
CN110059271B (en) * 2019-06-19 2020-01-10 达而观信息科技(上海)有限公司 Searching method and device applying tag knowledge network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968506A (en) * 2012-12-14 2013-03-13 北京理工大学 Personalized collaborative filtering recommendation method based on extension characteristic vectors
CN107845025A (en) * 2017-11-10 2018-03-27 天脉聚源(北京)传媒科技有限公司 The method and device of article in a kind of recommendation video
CN110084658A (en) * 2018-01-26 2019-08-02 北京京东尚科信息技术有限公司 The matched method and apparatus of article
CN111046221A (en) * 2019-12-17 2020-04-21 腾讯科技(深圳)有限公司 Song recommendation method and device, terminal equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Improved collaborative filtering algorithm based on user clustering and logistic function; Liu Rongcheng et al.; Electronic Design Engineering (电子设计工程); 31 Dec 2018 (No. 13); p. 31 *

Also Published As

Publication number Publication date
CN112256979A (en) 2021-01-22

Similar Documents

Publication Publication Date Title
CN111581510A (en) Shared content processing method and device, computer equipment and storage medium
CN104731861B (en) Multi-medium data method for pushing and device
CN111324752B (en) Image and text retrieval method based on graphic neural network structure modeling
CN106649658B (en) Recommendation system and method for user role non-difference treatment and data sparsity
CN111930518B (en) Knowledge graph representation learning-oriented distributed framework construction method
CN105426550B (en) Collaborative filtering label recommendation method and system based on user quality model
CN107145519B (en) Image retrieval and annotation method based on hypergraph
CN107239564B (en) Text label recommendation method based on supervision topic model
CN113254630B (en) Domain knowledge map recommendation method for global comprehensive observation results
CN112597389A (en) Control method and device for realizing article recommendation based on user behavior
CN111651678B (en) Personalized recommendation method based on knowledge graph
CN111190968A (en) Data preprocessing and content recommendation method based on knowledge graph
CN109408578A (en) One kind being directed to isomerous environment monitoring data fusion method
CN110147494A (en) Information search method, device, storage medium and electronic equipment
CN114186084A (en) Online multi-mode Hash retrieval method, system, storage medium and equipment
Tan et al. Attentional autoencoder for course recommendation in mooc with course relevance
CN110083766B (en) Query recommendation method and device based on meta-path guiding embedding
CN111753151B (en) Service recommendation method based on Internet user behavior
CN114564594A (en) Knowledge graph user preference entity recall method based on double-tower model
CN110674313A (en) Method for dynamically updating knowledge graph based on user log
CN108647295B (en) Image labeling method based on depth collaborative hash
Qiu et al. An embedded bandit algorithm based on agent evolution for cold-start problem
CN112256979B (en) Control method and device for similar article recommendation
CN115098728A (en) Video retrieval method and device
CN114153965A (en) Content and map combined public opinion event recommendation method, system and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant