CN112256979B

CN112256979B - Control method and device for similar article recommendation

Info

Publication number: CN112256979B
Application number: CN202011541921.0A
Authority: CN
Inventors: 沈振雷; 刘凡平
Original assignee: Shanghai 2345 Network Technology Co ltd
Current assignee: Shanghai 2345 Network Technology Co ltd
Priority date: 2020-12-24
Filing date: 2020-12-24
Publication date: 2021-06-04
Anticipated expiration: 2040-12-24
Also published as: CN112256979A

Abstract

The invention discloses a control method for approximate article recommendation, which comprises the following steps: a. the characteristic information of all the to-be-recommended articles is used as input to be predicted in a BERT model so as to determine embedding characteristic vectors of the to-be-recommended articles; b. storing the imbedding feature vector of the item to be recommended in a vector retrieval database; c. matching the first article determined based on the user access information in a vector retrieval database, and determining an embedding feature vector of the first article; d. determining a second item similar to the embedding feature vector of the first item based on a nearest neighbor finding algorithm. According to the invention, BERT is adopted for training, so that the information of text context in an article can be effectively captured, and reliable Embedding is generated; the training with a large amount of data can be used for a long time once the training is finished, and frequent updating is not needed. The method is simple, the flow is convenient and fast, the recommendation is accurate, the training time is saved, the cold start problem is solved, and the method has extremely high commercial value.

Description

Control method and device for similar article recommendation

Technical Field

The invention belongs to the field of Internet technology application, and particularly relates to a control method and device for approximate item recommendation.

Background

The recommendation system needs to solve the problem of matching of mass users and mass articles, the articles which are most likely to be interested by the users need to be found from the mass articles within tens of milliseconds, a double-tower model is developed in *** in 2019, and the main idea is as follows: the user features are mapped to a user feature vector through DNN, and the feature vector can be generated based on an Embedding technology and represents user interest features. And simultaneously mapping the item features and the item ID to item feature vectors through DNN to represent the item features. The model is then trained by the user's behavioral feedback data with the goal of having the user closest to his favorite items, and this distance is obtained by inner product calculations. The method comprises the steps of putting a feature vector (Embedding) generated by article features into a vector database through training a model, generating the feature vector (Embedding) of a user through the model when the user comes, searching articles with the most similar features (Embedding) in an article vector database, and recommending the articles to the user.

However, in the prior art, there is a certain problem that both the dual-tower model of Google and the derivative model based on the idea of the dual-tower model are basically recommended for video, but the following disadvantages exist when applied to information stream recommendation: the timeliness of the information flow is particularly strong, a large number of news appear every day, a lot of news are out of date quickly, and the fact that the latest news which are interested by a user can be found out quickly is particularly important. However, according to the double-tower model of ***, the embedding generation of the articles depends on the behavior feedback of the user on the articles, and the cold start problem of the new articles cannot be solved; the information flow contains a large amount of text information, the double-tower model and the derivative model thereof cannot effectively utilize the text information, and when some articles only have behavior feedback of few people, the generated article embedding cannot accurately express the information of the articles; in order to generate embedding for a new article, a frequent training model is needed, which changes the embedding of an old article at the same time, so that an article vector library needs to be updated frequently, and when the article is skilled more, the overhead of frequent updating is huge; the imbedding of the article is changed frequently and cannot be used for training the sequencing model, because the change of the imbedding of the article can cause the feature of the same article during training to be inconsistent with the feature of the same article during sequencing; the item feature embedding generation has no mobility, and each item needs to train own embedding.

Particularly, there is no good prediction method for the degree of association between an article and an article, and there is no control method for achieving approximate article recommendation through an article.

Disclosure of Invention

In view of the technical defects in the prior art, the present invention provides a control method and device for approximate item recommendation, and according to an aspect of the present invention, a control method for approximate item recommendation is provided, which includes the following steps:

a. predicting characteristic information of all to-be-recommended articles in a BERT model as input to determine embedding characteristic vectors of one or more to-be-recommended articles, wherein the to-be-recommended articles at least comprise a first article and a second article;

b. storing the embedding feature vectors of one or more to-be-recommended articles in a vector retrieval database;

c. determining an embedding feature vector of a first article based on matching of the first article determined by user access information in a vector retrieval database, wherein the user access information at least comprises user current access information and/or user historical access information;

d. determining one or more second items similar to the embedding feature vector of the first item based on a nearest neighbor lookup algorithm.

Preferably, before the step c, the method further comprises:

i: and caching the historical access information of the user.

Preferably, the BERT model is established by:

a 1: calculating the similarity between any two articles in all the articles to be recommended

；

a 2: will be provided with

One or more article pairs greater than the first threshold and the characteristic information are used as positive samples

And one or more article pairs and the characteristic information thereof which are smaller than a second threshold value are used as negative samples, and the BERT model is trained by the positive samples and the negative samples according to the same proportional quantity, wherein the article pairs comprise 2 articles.

Preferably, in the step a1, the calculation of the similarity between any two items in all the items to be recommended is realized by the following formula:

wherein, A represents the user set of favorite articles a, B represents the user set of favorite articles B, f (x) represents the number of set elements, a and B are any of all the articles to be recommendedTwo items.

Preferably, the value range of the first threshold is 0.05-1.

Preferably, the value range of the second threshold is 0-0.0015.

Preferably, will

Is greater than a first threshold value and

one or more item pairs greater than a third threshold value of 3, the item pairs containing 2 items, and as a positive sample.

Preferably, before the step c, the method further comprises:

ii: predicting feature information of one or more updated items as input in a BERT model to determine embedding feature vectors of the one or more updated items;

iii: storing the embedding feature vectors of the one or more updated items in a vector retrieval database.

Preferably, in the step b, the nearest neighbor searching algorithm is any one of the following manners:

-a cosine similarity algorithm;

-a vector inner product algorithm; or

-euclidean distance algorithm.

Preferably, when the nearest neighbor searching algorithm is a cosine similarity algorithm, one or more articles with the imbedding feature vector similarity smaller than a fourth threshold with the first article are removed as irrelevant articles.

Preferably, the value range of the fourth threshold is 0-0.6.

Preferably, the embedding feature vectors of one or more items with the embedding feature vector similarity greater than a fifth threshold with the first item are taken as the same items for removal.

Preferably, the value range of the fifth threshold is 0.997-1.

Preferably, one or more second articles are displayed after being sorted according to the sequence of similarity from big to small.

According to another aspect of the present invention, there is provided a control device for approximating item recommendation, which employs the control method, including:

the first determination means: determining an embedding feature vector of a first item based on a match of the first item determined by user access information in a vector retrieval database;

second determining means: determining one or more second items similar to the embedding feature vector of the first item based on a nearest neighbor lookup algorithm.

Preferably, the method further comprises the following steps:

third determining means: the characteristic information of all the to-be-recommended articles is used as input to be predicted in a BERT model so as to determine embedding characteristic vectors of one or more to-be-recommended articles;

a first storage device: and storing the embedding feature vectors of one or more to-be-recommended articles in a vector retrieval database.

A second storage device: and caching the historical access information of the user.

Preferably, the method further comprises the following steps:

the first computing device: calculating the similarity between any two articles in all the articles to be recommended

；

A first processing device: will be provided with

One or more article pairs smaller than the second threshold value and the characteristic information thereof are used as negative samples, and the positive samples and the negative samples are counted according to the same proportionThe quantity pair BERT model was trained, wherein the pair of items contained 2 items.

Preferably, the method further comprises the following steps:

fourth determining means: predicting feature information of one or more updated items as input in a BERT model to determine embedding feature vectors of the one or more updated items;

a third storage device: storing the embedding feature vectors of the one or more updated items in a vector retrieval database.

The invention discloses a control method for recommending approximate articles, which is characterized in that an embedding characteristic vector of a first article is determined based on the matching of the first article determined by user access information in a vector retrieval database, wherein the user access information at least comprises the current access information and/or the historical access information of a user; determining one or more second items similar to the embedding feature vector of the first item based on a nearest neighbor lookup algorithm. According to feedback information of a user on information flow information, the BERT is trained, so that the BERT can generate embedding of the information according to information titles and contents, the embedding can well express the interest of the user, and other related contents which are interested by the user can be quickly found out in a nearest neighbor searching mode through the embedding of articles which the user likes.

The invention has the following beneficial effects:

(1) the method only depends on the user behavior during training, does not depend on the user behavior during prediction, and only depends on the text content of the object, such as a title, an abstract/a text and the like. Therefore, for a new article, a corresponding embedding can be generated in real time and added into an article library, the embedding of the article or the embedding of the user is clicked according to the history of the user, and the user who may be interested in the article or the embedding of the user is found and pushed in a nearest neighbor searching mode, so that the cold start problem in information flow recommendation is effectively solved;

(2) the BERT is adopted for training, so that the information of text context in an article can be effectively captured, and reliable Embedding is generated;

(3) the training can be carried out by adopting a large amount of data, and the training can be used for a long time once being finished without frequent updating. The training time is saved, and meanwhile, the online performance expense caused by frequent updating of the object vector library is avoided;

(4) because the stability can be maintained for a long time, the embedding characteristic of the article can be directly used for training the sequencing model and predicting on line, and the problem of inconsistent training and predicting can not be caused;

(5) the whole model has certain mobility, the model trained in a mature large-scale information flow scene can be directly taken to be used by a similar new information flow product, and the problem that a project is cold-started and has no data is solved;

the method is simple, the flow is convenient and fast, the recommendation is accurate, the training time is saved, the cold start problem is solved, and the method has extremely high commercial value.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a schematic flow chart diagram illustrating a control method for approximating item recommendation according to an embodiment of the present invention;

FIG. 2 is a detailed flow chart diagram of a control method for approximating item recommendation according to a first embodiment of the present invention;

fig. 3 is a schematic diagram illustrating a specific process of building the BERT model according to the second embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating a specific process of storing embedding feature vectors of one or more updated items in a vector retrieval database according to a third embodiment of the present invention; and

fig. 5 is a schematic block diagram of a control device for approximating item recommendation according to another embodiment of the present invention.

Detailed Description

In order to better and clearly show the technical scheme of the invention, the invention is further described with reference to the attached drawings.

Fig. 1 is a flowchart illustrating a control method for approximate item recommendation according to an embodiment of the present invention, and the present invention provides a control method for approximate item recommendation based on click behavior of a user, historical visits of the user, collections of the user, and favorite items of the user, which performs nearest algorithm search on the items after vectorization representation through training of a training model, and further determines other items closest to the items, specifically, the control method for approximate item recommendation includes the following steps:

firstly, step S101 is entered, feature information of all to-be-recommended articles is used as input to be predicted in a BERT model to determine embedding feature vectors of one or more to-be-recommended articles, the to-be-recommended articles at least comprise a first article and a second article, and step S101 needs to be completed before step S103, so that when a user accesses in real time, no more time is spent on calorie to complete conversion of the articles to the article vectors, and further real-time matching and real-time recommendation are directly performed, that is, the steps are completed when the actual user performs similar article recommendation, and can be completed in a background, so that calculation time is saved, calculation cost is reduced, and calculation efficiency is improved. Those skilled in the art understand that this step is a process of vectorizing and representing the feature information of the item to be recommended, and the BERT is a training model for implementing vectorization and representation, which will be further described in the following embodiments, BERT, i.e., Bidirectional Encoder responses from transformations, which essentially learns a good feature representation for words by running a self-supervised learning method on the basis of a large amount of linguistic data, so-called self-supervised learning refers to supervised learning that runs on data without artificial labeling, and the present invention applies it to similar recommendation of items.

Then, step S102 is performed, and the embedding feature vectors of one or more to-be-recommended items are stored in a vector retrieval database, where the embedding feature vectors of all to-be-recommended items are preferably stored in the vector retrieval database, where the embedding feature vectors include, but are not limited to, a first item, a second item, an item preferred by a user, an item not liked by the user, and the like, and all of these items are stored in the vector retrieval database for standby after feature information of all to-be-recommended items is used as an input to be predicted in the BERT model to determine the embedding feature vectors of one or more to-be-recommended items before the user actually accesses. Further, when step S101 is executed, a match in the vector search database can be quickly found, and the embedding feature vector of the first article is determined.

Then, step S103 is entered, a first item determined based on user access information is matched in a vector retrieval database, and an embedding feature vector of the first item is determined, where the user access information at least includes user current access information and/or user historical access information, in such an embodiment, the user access information is user current access information, user historical access information, user current access information and user historical access information, further, the user current access information is behavior operations such as clicking, browsing, collecting, and liking performed in a user current access state, and the user historical access information refers to the behavior operations performed by a user within a period of time, and a first item preferred by the user can be determined through the behavior operations of the user current access information and/or the user historical access information, the first item is the item needing approximate item recommendation.

Further, before the step S103, the embedding feature vector of the first article corresponding to the first article is preferably stored in a vector search database, which will be further described in the following detailed description, and the first article is further matched in the vector search database to determine the embedding feature vector of the first article.

Finally, step S104 is performed, one or more second items similar to the imbedding feature vector of the first item are determined based on a nearest neighbor search algorithm, where the nearest neighbor search algorithm includes, but is not limited to, a cosine similarity algorithm, a vector inner product algorithm, or a euclidean distance algorithm, further, a plurality of second items may be preferably found by the first item, and further, based on the similarity, recommendation after sorting is performed. .

Furthermore, cosine similarity, also called cosine similarity, is to evaluate the similarity of two vectors by calculating the cosine value of the included angle between the vectors, and the cosine similarity draws the vectors into a vector space according to coordinate values; the inner product of the vectors is an operation of performing point multiplication operation on two vectors, multiplying corresponding bits of the two vectors one by one and then summing, and the euclidean distance, also called euclidean distance, is the most common distance measurement, and measures the absolute distance between two points in a multidimensional space, and from the viewpoint of three results for realizing the nearest search algorithm to search for an article, the larger the result of the cosine similarity algorithm and the vector inner product algorithm is, the more similar the result of the euclidean distance algorithm is, and the smaller the result is.

Further, when the nearest neighbor search algorithm is a cosine similarity algorithm, one or more articles having an embedding feature vector similarity smaller than a fourth threshold are removed as irrelevant articles, in the present invention, a plurality of second articles similar to the first article are preferably determined by the cosine similarity algorithm, and when the embedding feature vector similarity of the first article is smaller than the fourth threshold, it is considered that the similarity of the current article and the first article is low, and the current article and the irrelevant articles can be removed, further, the fourth threshold is preferably 0.6, and in other embodiments, the fourth threshold may also be 0.4, 0.5, and the like, which do not affect the specific implementation of the present invention, and are not described herein again.

Further, the embedding feature vectors of one or more articles with the degree of similarity to the embedding feature vector of the first article being greater than a fifth threshold are taken as the same article to be removed, in such an embodiment, when the degree of similarity to the embedding feature vector of the first article is greater than the fifth threshold, it is considered that the current article is the same as the first article, and then it needs to be removed, more specifically, the value range of the fifth threshold is 0.997-1, in this application, it may be preferably set to 0.997, that is, the embedding feature vectors of one or more articles with the degree of similarity to the embedding feature vector of the first article being greater than 0.997 are taken as the same article to be removed.

Further, if the similarity of one or more second articles and the similarity of the first article are 0.77, 0.75, 0.86, 0.92, 0.89, and 0.74, respectively, then preferably, the second articles are sorted and displayed in the order of photographic identity of 0.92, 0.89, 0.86, 0.77, 0.75, and 0.74.

Fig. 2 shows a first embodiment of the present invention, and fig. 2 shows a detailed flowchart of a control method for approximating item recommendation according to the first embodiment of the present invention, and specifically, the method further includes:

the steps S201 to S202 may refer to the foregoing steps S101 to S102, and further, the step S203 is entered to buffer the historical access information of the user, and the step S203 may be executed before, simultaneously with, or after the step S101, and is used to determine the first item when the user is currently accessing, preferably based on the historical access information of the user, so as to effectively solve the cold start problem in the information flow recommendation.

The steps S204 to S205 can refer to the steps S103 to S104, which are not described herein.

Fig. 3 shows a specific flowchart for building the BERT model according to the second embodiment of the present invention, and those skilled in the art understand that the BERT model is built by the following steps:

first, step S301 is entered, and the similarity between any two items in all the items to be recommended is calculated

In such an embodiment, if there are five items, respectively A, B, C, D and E, in all the items to be recommended, then according to step S301, the calculation is performed

、

、

、

、

、

、

、

、

、

。

Then, the process proceeds to step S302, where

One or more article pairs and characteristic information thereof which are smaller than a second threshold value are used as negative samples, the BERT model is trained by the positive samples and the negative samples according to the same proportional quantity, wherein the article pairs comprise 2 articles, in such an embodiment, because the similarity of the two articles is calculated, the similarity between the articles needs to be treated as a whole, namely as an article pair,the pair of items will contain two items for determining the similarity between the two items as a whole, and the characteristic information is preferably context information of the items, description information of the items, names of the items, sources of the items, and the like.

Further, in the step S301, the calculation of the similarity between any two items in all the items to be recommended is implemented by the following formula:

wherein, a represents the user set of favorite articles a, B represents the user set of favorite articles B, f (x) represents the number of set elements, and a and B represent any two articles in all the articles to be recommended.

Further, the value range of the first threshold is 0.05-1, and the value of the first threshold is preferably 0.1, that is, the value is about to be obtained

One or more article pairs greater than 0.1 and the characteristic information are taken as positive samples, and in other embodiments, values of 0.07, 0.5, 0.8, and so on may also be taken.

Further, the value range of the second threshold is 0-0.0015, and the value of the second threshold is preferably 0.001, that is, the value is about to be obtained

One or more item pairs and their characteristic information less than 0.001 are used as negative examples, but in other embodiments, values of 0.0001, 0.0007, 0.0013, etc. may be used.

Further, will

Is greater than a first threshold value and

one or more item pairs greater than a third threshold and as a positive sample, wherein the third threshold3, said pair of articles comprising 2 articles, in such an embodiment, in combination with the above-described embodiment, will be

Greater than 0.1 and

one or more article pairs greater than 3 and as positive samples.

Fig. 4 shows a specific flowchart of a third embodiment of the present invention, in which imbedding feature vectors of one or more updated articles are stored in a vector retrieval database, and it is understood by those skilled in the art that fig. 4 is used to implement that, in the daily use process of the training model, if an emerging article exists, the emerging article can be timely vectorized and represented, and then stored in the vector retrieval database, the present invention may employ a large amount of data training, once the training is completed, the emerging article can be used for a longer time, and frequent updating is not necessary, which not only saves training time, but also avoids online performance overhead caused by frequently updating the article vector library, and specifically, before step S101, the present invention further includes:

firstly, proceeding to step S401, predicting the feature information of one or more updated articles as input in the BERT model to determine the embedding feature vector of one or more updated articles, and as understood by those skilled in the art, the step S401 may refer to the aforementioned step S201, that is, the determination of the vectorized representation of the article to be recommended is the same as the determination of the vectorized representation of the updated article.

Then, step S402 is performed, and the embedding feature vectors of one or more updated articles are stored in a vector retrieval database, where the step S402 may refer to the step S202, more specifically, the steps S401 to S402 may be performed at any stage of step S101 to step S104, and the steps S401 to S402 serve as an auxiliary method for continuously updating the vector retrieval database and continuously updating the training model.

Fig. 5 is a schematic block diagram of a control device for approximating item recommendation according to another embodiment of the present invention. A control device for approximating item recommendation, which adopts the control method, includes a first determination device 1: the embedded feature vector of the first item is determined based on the matching of the first item determined by the user access information in the vector retrieval database, and the working principle of the first determining apparatus 1 may refer to the foregoing step S103, which is not described herein again.

Further, the control device for approximate item recommendation further comprises a second determining device 2: the one or more second items similar to the embedding feature vector of the first item are determined based on the nearest neighbor searching algorithm, and the working principle of the second determining device 2 may refer to the step S104, which is not described herein again.

Further, the control device for approximate item recommendation further comprises a third determining device 3: the feature information of all the to-be-recommended articles is input into the BERT model for prediction to determine the embedding feature vectors of one or more to-be-recommended articles, and the working principle of the third determining device 3 may refer to the step S101, which is not described herein again.

Further, the control device for approximate item recommendation further comprises a first storage device 4: the embedding feature vectors of one or more to-be-recommended articles are stored in the vector retrieval database, and the working principle of the first storage device 4 may refer to the step S102, which is not described herein again.

Further, the control device for approximate item recommendation further comprises a second storage device 5: the user history access information is cached, and the working principle of the second storage device 5 may refer to the foregoing step S203, which is not described herein again.

Further, the control device for approximate item recommendation further comprises a first computing device 6: calculating the similarity between any two articles in all the articles to be recommended

The operation principle of the first computing device 6 can refer to the aforementioned step S301, hereAnd are not described in detail.

Further, the control device for approximating the item recommendation further comprises a first processing device 7: will be provided with

One or more article pairs smaller than the second threshold and their feature information are used as negative samples, and the BERT model is trained on the positive samples and the negative samples according to the same proportional number, where the article pairs include 2 articles, and the working principle of the first processing device 7 may refer to the foregoing step S302, which is not described herein again.

Further, the control device for approximate item recommendation further comprises a fourth determination device 8: the feature information of one or more updated items is used as input to be predicted in the BERT model to determine the embedding feature vectors of the one or more updated items, and the working principle of the fourth determining device 8 may refer to the foregoing step S401, which is not described herein again.

Further, the control device for approximate item recommendation further comprises a third storage device 9: the embedding feature vectors of one or more updated articles are stored in the vector retrieval database, and the working principle of the third storage device 9 may refer to the step S402, which is not described herein again.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims

1. A control method for approximating item recommendations, comprising the steps of:

d. determining one or more second items similar to the embedding feature vector of the first item based on a nearest neighbor search algorithm, wherein the BERT model is built by the following steps:

；

a 2: will be provided with

One or more article pairs and characteristic information thereof which are smaller than a second threshold value are used as negative samples, the positive samples and the negative samples are used for training the BERT model according to the same proportional quantity, wherein the article pairs comprise 2 articles, the article pairs are obtained by the following steps,

in the step a1, the calculation of the similarity between any two items in all the items to be recommended is realized through the following formula:

wherein A represents a set of users who like item a, B represents a set of users who like item B, and f (x) represents a set elementThe number a and b are any two of all the articles to be recommended.

2. The control method according to claim 1, characterized by, before said step c, further comprising:

i: and caching the historical access information of the user.

3. The control method according to claim 1, wherein the first threshold value ranges from 0.05 to 1.

4. The control method according to claim 2, wherein the second threshold value ranges from 0 to 0.0015.

5. Control method according to claim 2, characterized in that

Is greater than a first threshold value and

6. The control method according to claim 1, 4 or 5, characterized by further comprising, before the step c:

7. Control method according to claim 1, characterized in that in step d, the nearest neighbor finding algorithm is any of the following ways:

cosine similarity calculation;

a vector inner product algorithm; or

And (4) Euclidean distance algorithm.

8. The control method of claim 7, wherein when the nearest neighbor search algorithm is a cosine similarity algorithm, one or more items having an embedding feature vector similarity with the first item less than a fourth threshold are removed as irrelevant items.

9. The control method according to claim 8, wherein the value of the fourth threshold ranges from 0 to 0.6.

10. The control method according to claim 8 or 9, characterized in that the embedding feature vectors of one or more items having an embedding feature vector similarity greater than a fifth threshold with the first item are removed as the same item.

11. The control method according to claim 10, wherein the value of the fifth threshold ranges from 0.997 to 1.

12. The control method according to claim 8, 9 or 11, wherein the one or more second articles are displayed after being sorted in the order of similarity from large to small.

13. A control apparatus that approximates item recommendations using the control method of any one of claims 1-12, comprising:

first determination means (1): determining an embedding feature vector of a first item based on a match of the first item determined by user access information in a vector retrieval database;

second determination means (2): determining one or more second items similar to the embedding feature vector of the first item based on a nearest neighbor searching algorithm;

third determination means (3): the characteristic information of all the to-be-recommended articles is used as input to be predicted in a BERT model so as to determine embedding characteristic vectors of one or more to-be-recommended articles;

first storage means (4): storing the embedding feature vectors of one or more to-be-recommended articles in a vector retrieval database;

second storage means (5): and caching the historical access information of the user.

14. The control device according to claim 13, characterized by further comprising:

first computing means (6): calculating the similarity between any two articles in all the articles to be recommended

；

First processing device (7): will be provided with

15. The control device according to claim 13, characterized by further comprising:

fourth determination means (8): predicting feature information of one or more updated items as input in a BERT model to determine embedding feature vectors of the one or more updated items;

third storage means (9): storing the embedding feature vectors of the one or more updated items in a vector retrieval database.