CN109902823B - Model training method and device based on generation countermeasure network - Google Patents

Model training method and device based on generation countermeasure network

Info

Publication number
CN109902823B
CN109902823B (application CN201811654623.5A)
Authority
CN
China
Prior art keywords
user
article
positive
model
negative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811654623.5A
Other languages
Chinese (zh)
Other versions
CN109902823A (en)
Inventor
刘志容
董振华
张宇宙
刘明瑞
郭贵斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201811654623.5A
Publication of CN109902823A
Priority to PCT/CN2019/128917 (WO2020135642A1)
Application granted
Publication of CN109902823B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Finance (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Credit Cards Or The Like (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present application provide a model training method and device based on a generative adversarial network. The method comprises the following steps: the device generates positive example forged articles and negative example forged articles for a first user through a generation model; the device trains on a plurality of real article pairs and a plurality of forged article pairs to obtain a discriminant model for discriminating the differences between the real article pairs and the forged article pairs, where each real article pair comprises a positive example real article and a negative example real article, and each forged article pair comprises one of the positive example forged articles and one of the negative example forged articles; and the device updates the generation model according to a loss function of the discriminant model. By adopting the embodiments of the present application, the generation capability of the generation model and the discrimination capability of the discriminant model can be improved.

Description

Model training method and device based on generation countermeasure network
Technical Field
The application relates to the field of big data, and in particular to a model training method and device based on a generative adversarial network.
Background
With the continuous development of informatization, people face an increasingly serious problem of information overload. A personalized recommendation system, as an effective information filtering tool, can provide various personalized recommendation services for users. The information retrieval generative adversarial network (Information Retrieval GAN, IRGAN) is a model that applies the generative adversarial network (Generative Adversarial Net, GAN) to the field of article recommendation: input article data is trained to obtain a generation model and a discriminant model, where the generation model is responsible for generating forged articles similar to real articles, and the discriminant model is responsible for distinguishing the generated forged articles from real samples. The training of the generation model and the discriminant model is interdependent. In an article recommendation scenario, the generation model is required to generate forged articles and scores for the articles, and the articles are then ranked according to the scores to obtain a recommendation result.
Common training methods for IRGAN include the point-wise method and the pair-wise method. The main idea of the point-wise method is to convert the recommendation problem into a classification or regression problem: assuming that the user's preference for each article is independent, features are extracted from the articles the user may like for training. The main idea of the pair-wise method is to convert the recommendation problem into a binary classification problem: no independence assumption is made on the articles, and model training instead takes article pairs as the minimum training unit, where each article pair usually comprises an article the user likes and an article the user dislikes. At present, the training effect of the pair-wise method is not as good as that of the point-wise method, and how to optimize the pair-wise method so as to improve the generation capability of the generation model and the discrimination capability of the discriminant model in a recommendation scenario is a technical problem being studied by persons skilled in the art.
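To make the pair-wise idea concrete, the following is a minimal sketch of a pair-wise update on (liked article, disliked article) pairs in the BPR style. The embedding dimensions, learning rate, and update rule are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 4, 6, 8
U = rng.normal(scale=0.1, size=(n_users, dim))   # user embeddings
V = rng.normal(scale=0.1, size=(n_items, dim))   # article embeddings

def score(u, i):
    """Predicted preference of user u for article i."""
    return float(U[u] @ V[i])

def pairwise_step(u, i_pos, i_neg, lr=0.05):
    """One SGD step on the pair-wise loss -log(sigmoid(s_pos - s_neg))."""
    x = score(u, i_pos) - score(u, i_neg)
    g = 1.0 / (1.0 + np.exp(x))      # = sigmoid(-x), the gradient coefficient
    u_vec = U[u].copy()
    U[u]     += lr * g * (V[i_pos] - V[i_neg])
    V[i_pos] += lr * g * u_vec
    V[i_neg] -= lr * g * u_vec

# train on one observed preference: user 0 likes article 1 over article 2
for _ in range(200):
    pairwise_step(0, 1, 2)
```

After training, the liked article of the pair ranks above the disliked one for that user, which is exactly the ordering constraint the pair-wise method encodes.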
Summary of the application
Embodiments of the present application disclose a model training method and device based on a generative adversarial network, which can improve the generation capability of the generation model and the discrimination capability of the discriminant model.
In a first aspect, an embodiment of the present application provides a model training method based on a generative adversarial network, the method comprising:
The device generates a positive example forged article and a negative example forged article for a first user through a generation model, where the negative example forged article is generated according to the positive example forged article, the positive example forged article of the first user is an article predicted to be paid attention to by the first user, and the negative example forged article of the first user is an article predicted not to be paid attention to by the first user. The device trains a plurality of real article pairs and a plurality of forged article pairs to obtain a discriminant model for discriminating differences between the plurality of real article pairs and the plurality of forged article pairs, where each real article pair comprises a positive example real article and a negative example real article, and each forged article pair comprises one of the positive example forged articles and one of the negative example forged articles; the positive example real article is an article determined, according to the operation behavior of the first user, to be paid attention to by the first user, and the negative example real article is an article determined, according to the operation behavior of the first user, not to be paid attention to by the first user. The device updates the generation model according to a loss function of the discriminant model.
By executing this method, the negative example forged article in a forged article pair is generated depending on the positive example forged article, fully considering the potential relationship between the negative example forged article and the positive example forged article, so that the forged article pair contains more information. This improves the training effect and enhances the generation capability of the generation model, so that the recommendation result generated by ranking the articles generated by the generation model together with the existing real articles has more reference value for users.
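The overall loop described above (generate fake pairs, train the discriminant model to separate real pairs from fake pairs, then update the generation model from the discriminator's feedback) can be sketched with toy linear models. The per-item weights, the stand-in for conditioning the negative fake on the positive fake, the update rules, and the reward definition below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items = 5
gen_logits = np.zeros(n_items)       # generator's per-article preference logits
disc_w = np.zeros(n_items)           # discriminator's per-article weights

real_pairs = [(0, 3), (1, 4)]        # (positive real, negative real) article pairs

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def disc_score(pair):                # higher means the pair "looks real"
    pos, neg = pair
    return disc_w[pos] - disc_w[neg]

for _ in range(300):
    # generator: sample a positive fake, then a negative fake from the
    # complementary distribution (a crude stand-in for conditioning on it)
    f_pos = rng.choice(n_items, p=softmax(gen_logits))
    f_neg = rng.choice(n_items, p=softmax(-gen_logits))
    # discriminator step: raise scores of real pairs, lower scores of fake pairs
    for pos, neg in real_pairs:
        disc_w[pos] += 0.01
        disc_w[neg] -= 0.01
    disc_w[f_pos] -= 0.01
    disc_w[f_neg] += 0.01
    # generator step: REINFORCE-style update, using the discriminator's
    # score of the fake pair as the reward signal
    reward = disc_score((f_pos, f_neg))
    gen_logits[f_pos] += 0.01 * reward
```

In this toy run the generator's preferences drift toward the articles the discriminator rates as real positives, which is the adversarial dynamic the method relies on.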
With reference to the first aspect, in a first possible implementation manner of the first aspect, after the device updates the generation model according to the loss function of the discriminant model, the method further includes: the device generates scores of forged articles through the updated generation model, where the forged articles comprise the positive example forged articles and the negative example forged articles generated for the first user; the device ranks the real articles and the forged articles according to the scores of the forged articles and the scores of the existing real articles, and recommends articles to the first user according to the order in the ranking. It can be understood that the recommendation result generated by ranking the articles generated by the generation model together with the existing real articles has more reference value for the user.
With reference to the first aspect, or any one of the foregoing possible implementation manners of the first aspect, in a second possible implementation manner of the first aspect, after the device generates the positive example forged article and the negative example forged article for the first user through the generation model, and before the device trains the plurality of real article pairs and the plurality of forged article pairs to obtain the discriminant model, the method further includes: the device matches a first negative example forged article with each of a plurality of first positive example forged articles to form a plurality of forged article pairs, where the first negative example forged article belongs to the negative example forged articles of the first user whose scores rank in the top M, M is the number of first positive example forged articles, and the first positive example forged article is a positive example forged article of the first user sampled from the positive example forged articles generated by the generation model; in addition, the device matches a first negative example real article with each of a plurality of first positive example real articles to form a plurality of real article pairs, where the first negative example real article belongs to the negative example real articles of the first user whose scores rank in the top N, N is the number of first positive example real articles, and the first positive example real article is a positive example real article sampled from the positive example real articles of the first user.
It can be understood that high-scoring articles are collected to form the article pairs, including the real article pairs and the forged article pairs. Since high-scoring articles receive more attention from users, the article pairs obtained in this way carry more information and less noise about the users, and training on these article pairs can fully analyze the characteristics the users pay attention to, so as to train a generation model with a stronger generation capability.
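The pairing step described above can be sketched as follows: sample M positive articles, then match each with one of the top-M highest-scored negative articles. The article names and scores below are hypothetical placeholders.

```python
import random

random.seed(0)

# positive example articles of the first user, and scored negative candidates
pos_items = ["p1", "p2", "p3", "p4"]
neg_scores = {"n1": 0.9, "n2": 0.2, "n3": 0.7, "n4": 0.4, "n5": 0.8}

def make_pairs(positives, neg_scores, m):
    """Sample m positives and match each with one of the top-m scored negatives."""
    sampled_pos = random.sample(positives, m)
    top_negs = sorted(neg_scores, key=neg_scores.get, reverse=True)[:m]
    return list(zip(sampled_pos, top_negs))

pairs = make_pairs(pos_items, neg_scores, m=3)
```

Only the highest-scored negatives ("n1", "n5", "n3" here) are paired, which mirrors the top-M / top-N selection that keeps the pairs informative and low-noise.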
With reference to the first aspect, or any one of the foregoing possible implementation manners of the first aspect, in a third possible implementation manner of the first aspect, the initial generation model includes a positive example generation model, a negative example generation model, and a score generation model; the device generating the positive example forged article and the negative example forged article for the first user through the generation model comprises:
The device generates the distribution of the positive counterfeit items of the first user through a positive generation model, wherein the positive generation model is as follows:
The device generates a distribution of negative example counterfeit items of the first user through a negative example generation model, wherein the negative example generation model is as follows:
the device generates a score for each positive example counterfeit item and a score for each negative example counterfeit item through a score generator;
where g+(f+|u) is the distribution of the positive example forged articles, e_u is the embedding vector of the first user, e_i is the embedding of article i, and b represents the bias value of the first user; g-(f-|u,f+) is the distribution of the negative example forged articles, and e_f- is the embedding of the negative example forged article to be generated.
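The two generators can be sketched as sampling from distributions over articles built from the symbols defined above (e_u, e_i, b). The exact generator formulas are not reproduced in the text, so the softmax parameterization below, including the way the negative generator conditions on the sampled positive's embedding, is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, n_items = 4, 6
e_u = rng.normal(size=dim)              # first user's embedding e_u
E = rng.normal(size=(n_items, dim))     # article embeddings e_i
b = 0.1                                 # first user's bias value b

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# g+(f+|u): distribution of positive example forged articles
g_pos = softmax(E @ e_u + b)
f_pos = rng.choice(n_items, p=g_pos)    # sample a positive forged article

# g-(f-|u, f+): distribution of negative example forged articles,
# conditioned on the sampled positive article's embedding
g_neg = softmax(E @ (e_u - E[f_pos]))
f_neg = rng.choice(n_items, p=g_neg)
```

The score generation model would then assign each sampled forged article a score (e.g. its logit), which downstream steps use for ranking.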
With reference to the first aspect, or any one of the foregoing possible implementation manners of the first aspect, in a fourth possible implementation manner of the first aspect, the updating, by the device, of the generation model according to the loss function of the discriminant model includes: the device determines an attention index of the first user for the articles, where the attention index of the first user for the articles is obtained by training the real article scores and forged article scores of the first user with an attention network; the device obtains a reward value (reward) according to the loss function of the discriminant model, and optimizes the reward value through the attention index of the first user for the articles to obtain a new reward value; and the device updates the generation model with the new reward value.
It can be understood that the importance of each article pair differs. By introducing an attention network to obtain an importance weight for each article pair, high-quality article pairs can be effectively selected and the negative influence of low-quality article pairs reduced, so that the generation model and discriminant model obtained in this way are more robust and adaptive. An article pair may be a real article pair or a forged article pair.
With reference to the first aspect, or any one of the foregoing possible implementation manners of the first aspect, in a fifth possible implementation manner of the first aspect, the determining, by the device, an attention index of the first user to an article includes:
The device calculates an attention index of the first user to the article according to the following formula by adopting an attention network;
α=softmax(g(r+,r-,f+,f-|u))
where α is the attention index of the first user u for the articles, w_u represents the trained weight of the first user, w_r+ represents the trained weight of the first user's positive example real article, w_r- represents the trained weight of the first user's negative example real article, w_f+ represents the trained weight of the first user's positive example forged article, w_f- represents the trained weight of the first user's negative example forged article, and b is the bias value of the first user.
With reference to the first aspect, or any one of the foregoing possible implementation manners of the first aspect, in a sixth possible implementation manner of the first aspect, the optimizing of the reward value reward through the attention index of the first user for the articles to obtain a new reward value includes: optimizing the reward value reward through the attention index α of the first user for the articles to obtain a reward value reward_1 corresponding to the first user, where the attention index α of the first user for the articles, the reward value reward, and the reward value reward_1 corresponding to the first user satisfy the following relation: reward_1 = α × reward; and determining the new reward value according to the reward value reward_1 corresponding to the first user.
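The attention-weighted reward above can be sketched as follows: a per-pair attention logit g(r+, r-, f+, f- | u) is computed, α = softmax(...) gives each pair's importance, and the discriminator-derived reward is rescaled as reward_1 = α × reward. The linear form of g(.) and all feature values below are assumptions consistent only with the weights w_u, w_r+, w_r-, w_f+, w_f- and bias b named in the text.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

# assumed trained weights of the attention network
w_u, w_rp, w_rn, w_fp, w_fn, b = 0.5, 1.0, -1.0, 1.0, -1.0, 0.0

# hypothetical per-pair feature scores (u, r+, r-, f+, f-)
samples = [
    (0.2, 0.9, 0.1, 0.8, 0.2),   # a high-quality, informative pair
    (0.2, 0.5, 0.5, 0.5, 0.5),   # an uninformative pair
    (0.2, 0.1, 0.9, 0.2, 0.8),   # a low-quality (noisy) pair
]
logits = [w_u*u + w_rp*rp + w_rn*rn + w_fp*fp + w_fn*fn + b
          for (u, rp, rn, fp, fn) in samples]
alpha = softmax(logits)                 # attention index per article pair

reward = np.array([1.0, 1.0, 1.0])      # reward from the discriminant model's loss
reward_1 = alpha * reward               # attention-optimized reward
```

The high-quality pair receives a larger share of the reward than the noisy pair, which is how the attention mechanism suppresses low-quality pairs during the generator update.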
In a second aspect, an embodiment of the present application provides a model training apparatus based on a generative adversarial network, the apparatus comprising:
a generation model, configured to generate a positive example forged article and a negative example forged article for a first user, where the negative example forged article is generated according to the positive example forged article, the positive example forged article of the first user is an article predicted to be paid attention to by the first user, and the negative example forged article of the first user is an article predicted not to be paid attention to by the first user;
a training model, configured to train a plurality of real article pairs and a plurality of forged article pairs to obtain a discriminant model for discriminating differences between the plurality of real article pairs and the plurality of forged article pairs, where each real article pair comprises a positive example real article and a negative example real article, and each forged article pair comprises one of the positive example forged articles and one of the negative example forged articles; the positive example real article is an article determined, according to the operation behavior of the first user, to be paid attention to by the first user, and the negative example real article is an article determined, according to the operation behavior of the first user, not to be paid attention to by the first user;
the training model is further configured to update the generation model according to the loss function of the discriminant model.
By running these units, the negative example forged article in a forged article pair is generated depending on the positive example forged article, fully considering the potential relationship between the negative example forged article and the positive example forged article, so that the forged article pair contains richer information. This improves the training effect and enhances the generation capability of the generation model, so that the recommendation result generated by ranking the articles generated by the generation model together with the existing real articles is more valuable to the user.
With reference to the second aspect, in a first possible implementation manner of the second aspect, the apparatus further includes a recommendation model, wherein:
after the training model updates the generation model according to the loss function of the discriminant model, the updated generation model is used to generate scores of forged articles, where the forged articles comprise the positive example forged articles and the negative example forged articles generated for the first user;
the recommendation model is used to rank the real articles and the forged articles according to the scores of the forged articles and the scores of the existing real articles, and to recommend articles to the first user according to the order in the ranking.
It will be appreciated that the recommendation results generated by ordering the items generated by the generative model and the existing real items are of more reference value to the user.
With reference to the second aspect, or any one of the foregoing possible implementation manners of the second aspect, in a second possible implementation manner of the second aspect, after the generation model generates the positive example forged article and the negative example forged article for the first user, and before the training model trains the plurality of real article pairs and the plurality of forged article pairs to obtain the discriminant model, the training model is further configured to:
match a first negative example forged article with each of a plurality of first positive example forged articles to form a plurality of forged article pairs, where the first negative example forged article belongs to the negative example forged articles of the first user whose scores rank in the top M, M is the number of first positive example forged articles, and the first positive example forged article is a positive example forged article of the first user sampled from the positive example forged articles generated by the generation model;
and match a first negative example real article with each of a plurality of first positive example real articles to form a plurality of real article pairs, where the first negative example real article belongs to the negative example real articles of the first user whose scores rank in the top N, N is the number of first positive example real articles, and the first positive example real article is a positive example real article sampled from the positive example real articles of the first user.
It can be understood that high-scoring articles are collected to form the article pairs, including the real article pairs and the forged article pairs. Since high-scoring articles receive more attention from users, the article pairs obtained in this way carry more information and less noise about the users, and training on these article pairs can fully analyze the characteristics the users pay attention to, so as to train a generation model with a stronger generation capability.
With reference to the second aspect, or any one of the foregoing possible implementation manners of the second aspect, in a third possible implementation manner of the second aspect, the initial generation model includes a positive example generation model, a negative example generation model, and a score generation model; the generation model is configured to generate a positive example forged article and a negative example forged article for the first user, specifically comprising:
generating, through a positive example generation model, a distribution of positive example forged articles of the first user, the positive example generation model being:
generating, through a negative example generation model, a distribution of negative example forged articles of the first user, the negative example generation model being:
and generating, through a score generation model, a score for each positive example forged article and a score for each negative example forged article;
where g+(f+|u) is the distribution of the positive example forged articles, e_u is the embedding vector of the first user, e_i is the embedding of article i, and b represents the bias value of the first user; g-(f-|u,f+) is the distribution of the negative example forged articles, and e_f- is the embedding of the negative example forged article to be generated.
With reference to the second aspect, or any one of the foregoing possible implementation manners of the second aspect, in a fourth possible implementation manner of the second aspect, the generation model is updated according to the loss function of the discriminant model, specifically by:
determining an attention index of the first user for the articles, where the attention index of the first user for the articles is obtained by training the real article scores and forged article scores of the first user with an attention network;
obtaining a reward value (reward) according to the loss function of the discriminant model, and optimizing the reward value through the attention index of the first user for the articles to obtain a new reward value;
and updating the generation model with the new reward value.
It can be understood that the importance of each article pair differs. By introducing an attention network to obtain an importance weight for each article pair, high-quality article pairs can be effectively selected and the negative influence of low-quality article pairs reduced, so that the generation model and discriminant model obtained in this way are more robust and adaptive. An article pair may be a real article pair or a forged article pair.
With reference to the second aspect, or any one of the foregoing possible implementation manners of the second aspect, in a fifth possible implementation manner of the second aspect, the training model determines an attention index of the first user to an article, specifically:
calculating an attention index of the first user to the article according to the following formula by adopting an attention network;
α=softmax(g(r+,r-,f+,f-|u))
where α is the attention index of the first user u for the articles, w_u represents the trained weight of the first user, w_r+ represents the trained weight of the first user's positive example real article, w_r- represents the trained weight of the first user's negative example real article, w_f+ represents the trained weight of the first user's positive example forged article, w_f- represents the trained weight of the first user's negative example forged article, and b is the bias value of the first user.
With reference to the second aspect, or any one of the foregoing possible implementation manners of the second aspect, in a sixth possible implementation manner of the second aspect, the optimizing of the reward value reward through the attention index of the first user for the articles to obtain a new reward value is specifically:
optimizing the reward value reward through the attention index α of the first user for the articles to obtain a reward value reward_1 corresponding to the first user, where the attention index α of the first user for the articles, the reward value reward, and the reward value reward_1 corresponding to the first user satisfy the following relation: reward_1 = α × reward;
and determining the new reward value according to the reward value reward_1 corresponding to the first user.
In a third aspect, embodiments of the present application provide an apparatus comprising a processor and a memory, wherein the memory is for storing program instructions and sample data required to train a model, the processor being for invoking the program instructions to perform the method described in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored therein program instructions which, when run on a processor, implement the method described in the first aspect or any of the possible implementations of the first aspect.
Drawings
The drawings used in the embodiments of the present application are described below.
Fig. 1A is a schematic view of an application scenario provided in an embodiment of the present application;
fig. 1B is a schematic view of another application scenario provided in an embodiment of the present application;
fig. 1C is a schematic view of another application scenario provided in an embodiment of the present application;
FIG. 1D is a schematic diagram of an apparatus according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a processing flow of a processor according to an embodiment of the present application;
FIG. 3 is a model training method based on a generative adversarial network provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a training process of a discriminant model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a scenario of an attention mechanism provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a training process for generating a model according to an embodiment of the present application;
FIG. 7 is a schematic view of a scenario for overall training of a discrimination model and a generation model provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an apparatus according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
A recommendation system aims to accurately predict a user's degree of preference for specific commodities. Its recommendation effect not only influences the user experience but also directly affects the revenue of the recommendation platform, so accurate recommendation is of great significance.
The following briefly describes the recommendation principle and objective of a recommendation system with reference to Table 1.
TABLE 1
User\Article   101    102    103    104    105    106
A               5      3     2.5     ?      ?      ?
B               2     2.5     5      2      ?      ?
C               2      4      ?      ?     4.5     5
D               5      3      ?     4.5     ?      4
E               4      3      2      4     3.5     4
The users illustrated in Table 1 include user A, user B, user C, user D, and user E, and the illustrated articles include article 101, article 102, article 103, article 104, article 105, and article 106. In addition, Table 1 illustrates how each user scores each article: a higher score for a certain article by a certain user represents a stronger preference of that user for that article. For example, user A scores article 101 at 5 points, indicating that user A has a very high preference for article 101. The question marks in Table 1 represent articles the user has not yet scored, and the goal of the recommendation system is to predict the preference of the corresponding user for the unrated articles. For example, user A's scores for articles 104, 105, and 106 need to be predicted, user B's scores for articles 105 and 106 need to be predicted, and so on. After the recommendation algorithm of the recommendation system runs, the recommendation system can fill in the users' scores for the unscored articles. If the recommendation system wants to recommend a new article to user A, as shown in Table 2, article 106 may be a good choice, because the recommendation system predicts a score of 5 points for article 106, higher than the other candidate articles, so there is a high likelihood that user A likes article 106.
TABLE 2
User\Article   101    102    103    104    105    106
A               5      3     2.5     2      4      5
B               2     2.5     5      2      2      4
C               2      4      3      4     4.5     5
D               5      3      3     4.5     3      4
E               4      3      2      4     3.5     4
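The recommendation step illustrated by Tables 1 and 2 can be sketched as follows: once the model has filled in predicted scores, recommend to a user the previously unrated article with the highest predicted score. The score dictionary below copies user A's row of Table 2, and which articles were originally unrated is taken from Table 1.

```python
# user A's completed scores (from Table 2)
predicted = {"A": {"101": 5, "102": 3, "103": 2.5, "104": 2, "105": 4, "106": 5}}
# articles user A had not scored (the question marks in Table 1)
unrated = {"A": ["104", "105", "106"]}

def recommend(user):
    """Recommend the unrated article with the highest predicted score."""
    return max(unrated[user], key=lambda item: predicted[user][item])
```

For user A this picks article 106 (predicted score 5), matching the example in the text.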
The model training method based on a generative adversarial network provided by the embodiments of the application can train a generation model with a better effect, so that when recommending articles, using this generation model as the basis for scoring forged articles can achieve a better recommendation effect.
The model training method based on a generative adversarial network in the embodiments of the application can be applied to many scenarios, such as advertisement click prediction, recommendation of the top-N articles of interest, and prediction of the answer most relevant to a question, as exemplified below.
In an advertisement recommendation scenario, the advertisement recommendation system needs to return one or more ordered advertisement lists to display to users. The embodiments of the application can predict which advertisements are popular with users, thereby improving the advertisement click-through rate. The application can combine an advertisement clicked by the user and an advertisement not clicked into a real article pair, where the clicked advertisement corresponds to a positive example real article and the unclicked advertisement corresponds to a negative example real article. Using the IRGAN technique, forged article pairs can be generated through the generation model, and the discriminant model attempts to distinguish which article pairs are generated and which are real. Under IRGAN adversarial training, the click probability of each advertisement (corresponding to the score of the article) can be estimated for the user. As shown in FIG. 1A, the predicted click probability of each advertisement by the user can be obtained by training on the user's historical behavior data for advertisements based on the model training method based on a generative adversarial network.
In a topN article recommendation scenario, the topN articles of most interest to a user need to be recommended to the user, so as to promote the user's consumption behavior on the articles; the articles may be e-commerce products, application-market APPs, and so on. An article the user has consumed or downloaded and scored highly can be combined with one the user scored lower into a real article pair, where the higher-scored article corresponds to a positive example real article and the lower-scored article to a negative example real article. Using the IRGAN technique, counterfeit article pairs can be produced by the generation model while the discrimination model tries to distinguish generated pairs from real pairs; under IRGAN adversarial training, how highly the user would rate each article (equivalent to the article's score) can be estimated. As shown in FIG. 1B, by training on the user's historical behavior data for articles with the model training method based on a generative adversarial network, a ranking of the user's degree of interest in each article can be obtained, and the topN articles of interest are output to the user accordingly.
In the question-answering scenario, the question-answering system needs to give answers that meet the user's requirements as closely as possible for the questions the user poses, so as to improve the user's goodwill toward the system. An answer the user received and scored highly can be combined with one the user scored lower into a real article pair, where the higher-scored answer corresponds to a positive example real article and the lower-scored answer to a negative example real article. Using the IRGAN technique, counterfeit article pairs can be produced by the generation model while the discrimination model tries to distinguish generated pairs from real pairs; under IRGAN adversarial training, how highly the user would rate each answer (equivalent to the article's score) can be estimated. As shown in FIG. 1C, by training on the user's historical behavior data for questions and answers with the model training method based on a generative adversarial network, a ranking of the user's satisfaction with each answer can be obtained, and N relatively satisfactory answers are output to the user accordingly.
An apparatus for performing the model training method based on a generative adversarial network is described below with reference to fig. 1D.
Referring to fig. 1D, fig. 1D is a schematic structural diagram of an apparatus according to an embodiment of the present application. The apparatus may be a single device, such as a server, or a cluster formed by several devices; its structure is briefly described below taking a server as an example. The device 10 comprises a processor 101, a memory 102 and a communication interface 103, interconnected by a bus, wherein:
The communication interface 103 is used to obtain data of existing items, such as identification of existing items, scoring, information of users scoring existing items, etc. Optionally, the communication interface 103 may establish a communication connection with other devices, so that the data of the existing article sent by the other devices may be received or the data of the existing article may be read from the other devices; alternatively, the communication interface 103 may be connected to an external readable storage medium, so that the data of the existing article may be read from the external readable storage medium; the communication interface 103 may also obtain data of existing items in other ways.
Memory 102 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM). The memory 102 is used to store the associated program instructions and the associated data, which may include data obtained via the communication interface 103, new data generated after processing such data, models, and model-based predictions, and so on.
The processor 101 may be one or more central processing units (CPU); where the processor 101 is a CPU, it may be single-core or multi-core. The processor 101 is configured to read the program instructions stored in the memory 102 and to perform the related operations involved in the model training method based on a generative adversarial network, such as training of the discrimination model, training of the generation model, and score prediction for articles. Referring to fig. 2, fig. 2 illustrates the general execution flow of the processor: information on existing articles, on the users who score them, and on the score values is input into the initial discrimination model 201, where the article information may include an article identification (ID) and the user information may include a user ID. The generation model 202 also generates some counterfeit articles and inputs their related information into the initial discrimination model 201, so that the discrimination model 201 is trained; the discrimination model 201 and the generation model 202 engage in continuous adversarial training, finally yielding a discrimination model 201 with a strong ability to distinguish real samples from counterfeit samples and a generation model 202 capable of generating counterfeit articles very close to real ones. The generation model 202 then generates scores for the counterfeit articles; the ranking prediction 203 produces, for any one user, a ranking of that user's articles based on the scores of all of the user's articles, from which an article recommendation list for that user is derived; optionally, the list includes both real articles and counterfeit articles.
In the embodiment of the present application, the discrimination model 201 includes a discriminator and an attention network: the discriminator is responsible for distinguishing real articles from counterfeit articles, and the attention network records the attention weights of different users for real and counterfeit articles, providing a reference for the generation model. The generation model 202 includes an article generator for generating counterfeit articles and a score generator for generating scores for the counterfeit articles; the article generator may be further divided into a positive example generator for generating positive example counterfeit articles and a negative example generator for generating negative example counterfeit articles. A dynamic sampling technique is employed in the article generator for sampling.
Optionally, the device 10 may further include an output component, such as a display, a sound, etc., for presenting parameters to be used by the training model to the developer, so that the developer may learn the parameters, modify the parameters, and input the modified parameters into the device 10 through an input component, which may include a mouse, a keyboard, etc., for example. In addition, the apparatus 10 may also present the trained model, and the results based on the model predictions, to a developer via an output component.
A model training method based on a generative adversarial network according to an embodiment of the present application is described in more detail below with reference to fig. 3.
Referring to fig. 3, fig. 3 is a model training method based on a generative adversarial network according to an embodiment of the present application; the method may be implemented based on the device 10 shown in fig. 1D, or based on other architectures, and includes the following steps:
Step S301: the apparatus generates counterfeit articles for the first user through the generation model.
Specifically, the embodiments of this application involve real articles and counterfeit articles. Counterfeit articles comprise positive example counterfeit articles and negative example counterfeit articles, and real articles comprise positive example real articles and negative example real articles; each of the multiple users has his or her own positive example counterfeit articles, negative example counterfeit articles, positive example real articles and negative example real articles. For any user: a positive example real article is an article the user has operated on and pays attention to; a negative example real article is an article the user has operated on but does not pay attention to; a positive example counterfeit article is an article the user has not operated on but is predicted to pay attention to; and a negative example counterfeit article is an article the user has not operated on and is predicted not to pay attention to. The first user in the embodiments of this application is one of the multiple users; for ease of understanding, the first user is taken as the example throughout, and the features of other users can be understood by reference to the description of the first user.
The first user's operation behaviors on articles displayed on a terminal include downloading, rating, clicking, browsing, and so on. The terminal can record these behaviors and score the corresponding articles accordingly; the score may be given by the user, or computed by the terminal or the device from the user's behavior data, and it measures the user's degree of attention to the article. A user's positive example real articles and negative example real articles can be divided according to the score of each article the user has operated on: for example, with scores ranging from 1 to 5 points, articles scored 4 to 5 points may be defined as the user's positive example real articles, and articles scored 1 to 3 points as the user's negative example real articles. The articles here may be applications (APPs), advertisements, videos, songs, answers of a question-answering system, and so on.
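As a minimal sketch of the threshold rule just described (function and variable names are illustrative, not from the patent), user A's scores from the table above can be split into positive and negative example real articles:

```python
# Hypothetical helper: split a user's rated articles into positive and
# negative example real articles using the 4-point threshold from the text.
def split_real_items(ratings, threshold=4.0):
    """ratings maps article ID -> score on a 1-5 scale."""
    positives = [item for item, score in ratings.items() if score >= threshold]
    negatives = [item for item, score in ratings.items() if score < threshold]
    return positives, negatives

# User A's row of the score table above.
user_a = {"101": 5, "102": 3, "103": 2.5, "104": 2, "105": 4, "106": 5}
pos, neg = split_real_items(user_a)  # pos: 101, 105, 106
```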
The generation model generates positive example counterfeit articles for the first user as predicted articles the first user pays attention to, and negative example counterfeit articles as predicted articles the first user does not pay attention to. For example, if the generation model generates comedy films 1, 2 and 3 as films the first user may like, and horror films 1, 2 and 3 as films the first user may not like, then the comedy films are positive example counterfeit articles of the first user and the horror films are negative example counterfeit articles of the first user. The generation model also generates scores for these six films; the generated scores are predicted scores representing the first user's preference for them. The principle by which the generation model generates positive and negative example counterfeit articles for other users can be understood from the description for the first user. The positive and negative example counterfeit articles of different users may be the same or different, as may the corresponding scores. The generation model is described below.
In particular, the goal of the generation model is to generate counterfeit article pairs, each comprising a positive example counterfeit article and a negative example counterfeit article, whose distribution approximates as closely as possible the correlation distribution of real article pairs, each comprising a positive example real article and a negative example real article. The correlation distribution of the generated counterfeit article pairs is shown in formula (1):
G(f|u)=G((f+,f-)|u)=g+(f+|u)·g-(f-|u,f+) (1)
In formula (1), f denotes the generated counterfeit article pair, f+ the generated positive example counterfeit article, and f- the generated negative example counterfeit article. The generation model can be divided into two sub-models, positive and negative: g+ denotes the positive example generator, g- the negative example generator, and u denotes the first user. The positive example generator g+ is used to generate the distribution of the first user u's positive example counterfeit articles, and the negative example generator g- generates the distribution of negative example counterfeit articles conditioned on the positive example counterfeit articles produced by g+. The distribution of positive example counterfeit articles generated by g+ is shown in formula (2):
In formula (2), e_u denotes the embedding vector (embedding) of the first user, e_i denotes the embedding of the i-th candidate counterfeit article, and b denotes the bias of the first user. In the embodiments of the present application, the embeddings and bias values may be configured with default values at the first initial training; embedding and bias are typically updated after each round of training.
In the embodiments of this application, some latent relationship exists between the positive example counterfeit article and the negative example counterfeit article produced by the generation model, so the negative example counterfeit article is generated after the positive example counterfeit article. For example, the negative example generator computes the relationship between the positive example counterfeit article and the negative example counterfeit article in the form of an inner product, yielding the distribution of generated negative example counterfeit articles shown in formula (3):
In formula (3), the additional embedding is that of the negative example counterfeit article to be generated. For instance, if a user likes comedies but dislikes horror films, the apparatus will generally learn this layer of "opponent" relationship between comedy and horror; after a comedy is generated as a positive example counterfeit article for the user by formula (2), a film of the opposing type, here a horror film, is likely to be generated as the negative example counterfeit article, and another comedy is unlikely to be. The horror film as negative example counterfeit article is generated according to the previously generated positive example counterfeit article (the comedy) rather than independently, which expresses the dependency of the negative example counterfeit article on the positive example counterfeit article.
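Formulas (2) and (3) are reproduced as images in the original publication. A common IRGAN-style reading, assumed here for illustration and not quoted from the patent, is a softmax over inner products e_u · e_i + b for the positive example generator, with the negative example distribution conditioned on the sampled positive example through an inner product:

```python
import numpy as np

def softmax(x):
    z = x - x.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def positive_dist(e_u, item_emb, b):
    # g+(f+|u): distribution over candidate positive example counterfeit articles
    return softmax(item_emb @ e_u + b)

def negative_dist(e_pos, item_emb, b):
    # g-(f-|u, f+): conditioned on the chosen positive example via inner product
    return softmax(item_emb @ e_pos + b)

rng = np.random.default_rng(0)
e_u = rng.normal(size=8)              # user embedding
item_emb = rng.normal(size=(10, 8))   # embeddings of 10 candidate articles
b = 0.1                               # user bias
p_pos = positive_dist(e_u, item_emb, b)
f_pos = int(np.argmax(p_pos))         # deterministic pick for illustration
p_neg = negative_dist(item_emb[f_pos], item_emb, b)
```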
It will be appreciated that a series of positive and negative example counterfeit articles can be generated in the manner described above; the apparatus may then generate a score for each of them through a score generation model. Optionally, the principle by which the score generation model generates a score may be as shown in formula (4):
r_u,t = e_u · e_t + b (4)
In formula (4), r_u,t denotes the generated score of the first user for the t-th counterfeit article, and e_t is the embedding of the t-th counterfeit article.
In the embodiment of the present application, after a series of positive example counterfeit articles with their scores and a series of negative example counterfeit articles with their scores have been generated in the above manner, some positive example counterfeit articles are sampled from the generated positive examples and some negative example counterfeit articles from the generated negative examples, so that the sampled articles form a plurality of counterfeit article pairs, each comprising one positive example counterfeit article and one negative example counterfeit article of the first user. The counterfeit article pairs may be generated as follows:
The apparatus matches a negative example counterfeit article with a first positive example counterfeit article to form a counterfeit article pair, where the negative example counterfeit article is among the top M negative example counterfeit articles of the first user ranked by score, M is the number of all positive example counterfeit articles of the first user (a positive integer), and the first positive example counterfeit article is any sampled one of the generated positive example counterfeit articles of the first user. Optionally, for one sampled positive example counterfeit article, the highest-scoring negative example counterfeit article is collected from the generated negative examples to form a counterfeit article pair with it, and is then removed from the sampling pool; for the next sampled positive example counterfeit article, the now highest-scoring negative example counterfeit article is collected to form a further pair; and so on, until each sampled positive example counterfeit article is matched with one negative example counterfeit article, yielding a plurality of counterfeit article pairs.
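The greedy pairing just described, in which each sampled positive example counterfeit article takes the currently highest-scoring negative example and that negative example then leaves the pool, can be sketched as follows (names are illustrative, not the patent's own implementation code):

```python
def pair_counterfeits(sampled_pos, neg_scores):
    """sampled_pos: sampled positive example counterfeit articles.
    neg_scores: dict mapping negative example counterfeit article -> score.
    Each positive is paired with the highest-scoring remaining negative,
    which is then removed from the pool (sampling without replacement)."""
    pool = dict(neg_scores)
    pairs = []
    for f_pos in sampled_pos:
        f_neg = max(pool, key=pool.get)  # highest-scoring negative left
        pairs.append((f_pos, f_neg))
        del pool[f_neg]                  # remove it from the sampling pool
    return pairs

pairs = pair_counterfeits(["c1", "c2"], {"h1": 0.9, "h2": 0.7, "h3": 0.8})
# c1 is paired with h1 (score 0.9), then c2 with h3 (score 0.8)
```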
Optionally, the apparatus matches a negative example real article with a first positive example real article to form a real article pair, where the negative example real article is among the top N negative example real articles of the first user ranked by score, N is the number of all positive example real articles of the first user (a positive integer), and the first positive example real article is any sampled one of the first user's existing positive example real articles. Optionally, for one sampled positive example real article, the highest-scoring negative example real article is collected to form a real article pair with it, and is then removed from the sampling pool; for the next sampled positive example real article, the now highest-scoring negative example real article is collected to form another real article pair; and so on, until each sampled positive example real article is matched with one negative example real article, yielding a plurality of real article pairs.
Step S302: the apparatus trains on a plurality of real article pairs and a plurality of counterfeit article pairs, with the goal of minimizing a loss function, to obtain the discrimination model.
Specifically, the discriminant model obtained by training is shown in formula (5):
In formula (5), v may be r or f. When v is f, p(f|u) denotes the distribution of the counterfeit article pairs generated by the generation model, e_u denotes the embedding of the first user, the two article embeddings are those of the positive example counterfeit article and the negative example counterfeit article, and b denotes the bias of the first user. When v is r, p(r|u) denotes the distribution of the real article pairs sampled from real articles, and the article embeddings are those of the positive example real article and the negative example real article. The discrimination model is responsible for distinguishing the distribution of the counterfeit article pairs from that of the real article pairs, and can be optimized with the cross-entropy loss function (6), so that it gains a strong ability to tell real articles from counterfeit ones.
D(r, f|u) = cross_entropy(p(r|u), p(f|u)) (6)
Optionally, in the process of training the discriminant model, the following procedure may be performed for each user:
1. sampling a real item pair (r +,r-) from a real dataset;
2. Generating a forged article by using the current generation model, and sampling the forged article to obtain a forged article pair (f +,f-);
3. giving (r +,r-) and (f +,f-) to the discrimination model together for training, and minimizing the loss function of the discrimination model;
4. Repeating the steps until all the scoring of the articles by the users is trained.
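Formula (5) is reproduced as an image in the original; assuming a simple pairwise discriminator D = sigmoid(e_u · (e+ - e-) + b), which is an assumption for illustration only, the cross-entropy objective of formula (6) over one real pair and one counterfeit pair can be sketched as:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pair_score(e_u, e_pos, e_neg, b):
    # Assumed pairwise form: probability the pair is "real", i.e. that the
    # user prefers the positive example over the negative example.
    diff = [p - n for p, n in zip(e_pos, e_neg)]
    return sigmoid(sum(u * d for u, d in zip(e_u, diff)) + b)

def discriminator_loss(e_u, real_pair, fake_pair, b):
    # Cross-entropy of formula (6): real pairs labelled 1, counterfeit pairs 0.
    d_real = pair_score(e_u, *real_pair, b)
    d_fake = pair_score(e_u, *fake_pair, b)
    return -math.log(d_real) - math.log(1.0 - d_fake)

loss = discriminator_loss([1.0, 0.0],
                          ([1.0, 0.0], [0.0, 1.0]),   # real pair (r+, r-)
                          ([0.0, 1.0], [1.0, 0.0]),   # counterfeit pair (f+, f-)
                          0.0)
```

Minimizing this loss pushes the discriminator to score real pairs toward 1 and counterfeit pairs toward 0, which is the behaviour the training loop above alternates against the generator.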
Alternatively, a preset number of training iterations n may be taken as the target; the training flow in this case is shown in fig. 4.
Step S303: the device updates the generative model according to a loss function of the discriminant model.
In an alternative solution, the apparatus updating the generation model according to the loss function of the discrimination model may include the following. First, the apparatus obtains a reward value from the loss function of the discrimination model shown in formula (6); the reward can be calculated from the term D(r, f|u) in formula (6), for example reward = log(1 - D(r, f|u)). Then the apparatus updates the generation model with this reward value to obtain a new generation model; the generation model can be trained in a policy gradient manner, yielding the updated generation model, and the policy-gradient formula is shown in formula (7):
In formula (7), the first term is the expectation function, f ~ G_u denotes f generated by the generator G(f|u), i runs from 1 to N, f_i denotes the i-th sample generated by the generator, and reward in formula (7) is the reward value obtained above.
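Formula (7) can be illustrated with a REINFORCE-style estimate for a softmax generator; the softmax parameterization is an assumption consistent with the surrounding description, not a quotation of the patent:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def policy_gradient(logits, samples, rewards):
    """Estimate of formula (7): grad = (1/N) * sum_i reward_i * d log G(f_i|u),
    taken with respect to the generator's logits."""
    p = softmax(logits)
    grad = np.zeros_like(logits)
    for f_i, r_i in zip(samples, rewards):
        one_hot = np.zeros_like(logits)
        one_hot[f_i] = 1.0
        grad += r_i * (one_hot - p)   # gradient of log softmax(logits)[f_i]
    return grad / len(samples)

grad = policy_gradient(np.zeros(3), samples=[0], rewards=[1.0])
# pushes probability toward sampled article 0 in proportion to its reward
```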
In yet another alternative, the apparatus updating the generation model according to the loss function of the discrimination model to obtain a new generation model may include three steps. First, the apparatus determines the attention index of the first user for articles, obtained by training an attention network on the first user's real article scores and counterfeit article scores. Second, the apparatus obtains a reward value reward from the loss function of the discrimination model, and optimizes it through the first user's attention index for articles to obtain a new reward value. Third, the apparatus updates the generation model with the new reward value. The first, second and third steps are described below.
The first step: the apparatus determines an attention index of the first user to an item.
Specifically, the attention index of the first user for articles is obtained by training an attention network on the first user's real and counterfeit articles. In many cases the first user's attention weight differs between a real article pair and a counterfeit article pair, and an attention network can be used to memorize these weights. Many latent factors lie behind article pairs. Taking movie scores as an example, some users give very high scores to movies they like and very low scores to movies they dislike, e.g. 5 points for the positive example movie and 1 point for the negative example movie; other users rate both closer to the middle, e.g. 4 points and 3 points. For a given article pair, the score difference therefore varies from user to user, and the pair-wise module should take these factors into account. An attention mechanism is used to remember these latent pair-wise factors. In this work, attention is represented by a series of weight vectors expressing the importance of different articles to each user; for a given article pair, the attention weights of different users are typically different, and the higher the attention weight, the more important the pair. The attention network may be a neural network of one or more layers associated with the user, the generated counterfeit article pairs and the sampled real article pairs; through it, the different weights of the two kinds of pairs can be learned for the first user. The network structure of the attention mechanism is shown in fig. 5.
Specifically, the attention index α of the first user for an article may be calculated by formula (8), as follows:
In formula (8), w_u denotes the attention weight of the first user; the four article attention weights are those of the first user for the positive example real article, the negative example real article, the positive example counterfeit article and the negative example counterfeit article, respectively; and b denotes the bias of the first user.
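Formula (8) is reproduced as an image in the original. The sketch below assumes a one-layer form, a sigmoid of the user's attention weight applied to the concatenated pair weights, purely for illustration, together with a reweighting of the reward value by α (assumed multiplicative):

```python
import math

def attention_index(w_u, pair_weights, b):
    """Assumed one-layer attention: w_u is the first user's attention weight
    vector, pair_weights concatenates the four attention weights (real
    positive, real negative, counterfeit positive, counterfeit negative),
    b is the user's bias. Returns alpha in (0, 1)."""
    z = sum(u * w for u, w in zip(w_u, pair_weights)) + b
    return 1.0 / (1.0 + math.exp(-z))

def reweight(alpha, reward):
    # per-user optimized reward value: reward_1 = alpha * reward (assumption)
    return alpha * reward

alpha = attention_index([1.0, 0.0, 0.0, 0.0], [2.0, 5.0, 5.0, 5.0], 0.0)
new_reward = reweight(alpha, 2.0)
```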
The second step: the apparatus obtains a reward value from the loss function of the discrimination model and optimizes it through the first user's attention index for articles to obtain a new reward value (the manner in which the reward is obtained has been described above).
Specifically, the apparatus optimizing the reward value reward through the first user's attention index for articles to obtain a new reward value may be as follows: the apparatus scales the reward value reward by the attention index α of the first user to obtain the reward value reward_1 corresponding to the first user, where α, reward and reward_1 satisfy the relation reward_1 = α · reward. The first user is one of the multiple users, and the reward values corresponding to the individual users form the new reward value; for example, the new reward value may be written as reward0 = (reward_1_1, reward_1_2, ..., reward_1_i, ..., reward_1_n), where reward_1_i is the reward value corresponding to the i-th of the multiple users.
The third step: the apparatus updates the generation model with the new reward value.
Specifically, the generation model may be trained by using a policy gradient (policy gradient), so as to obtain a new generation model, where a formula of the policy gradient is shown in the following formula (9):
The meaning of formula (9) can be understood by reference to formula (7); reward0 in formula (9) is the updated reward value obtained above.
The training process of the new generative model may include the following operations:
1. Generating a counterfeit item pair (f +,f-) using the current generation model;
2. Sampling a real item pair (r +,r-) from the real data set;
3. Feeding (r +,r-) and (f +,f-) to the discrimination model, and calculating the reward value reward;
4. Calculating α with the attention network;
5. Updating the reward value to obtain a new reward value reward0;
6. updating the generated model by using a new reward value reward 0;
7. repeating the above steps.
Alternatively, a preset number of training iterations m may be taken as the target; the training flow in this case is shown in fig. 6.
It can be understood that the importance of each article pair differs; by introducing the attention network, an importance weight is obtained for each pair, so that high-quality pairs can be selected effectively and the negative influence of low-quality pairs is reduced, making the generation model and the discrimination model obtained in this way more robust and adaptive.
In the embodiment of the present application, the training of the discrimination model and the training of the generation model are the critical parts. Their training processes have been described separately above; they are now described in combination to aid understanding of the embodiment, and fig. 7 is the corresponding flow diagram.
The preparation stage:
1. Initializing a generation model and a discrimination model by using random parameters theta and phi;
2. determining to pretrain with a dataset S of items;
training phase:
1、Repeat
Training the discrimination model
For d_epoch do
2. The parameters of the fixed generation model are unchanged;
3. Sampling a pair of real items (r +,r-) from a data set S of existing real items;
4. Generating a model to generate a counterfeit item and collecting a pair of counterfeit items from the counterfeit item (f +,f-);
5. Training a discriminant model with (r +,r-) and (f +,f-);
6、End for
Training the generation model;
For g_epoch do
7. The parameters of the fixed discrimination model are unchanged;
8. Generating a model to generate a counterfeit item and collecting a pair of counterfeit items from the counterfeit item (f +,f-);
9. Calculating a reward value reward through a judging module according to a strategy gradient algorithm;
10. Updating the reward value according to the attention network, and updating the generation model with the updated reward value reward0;
11. Until the discrimination model and the generation model converge.
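The preparation and training phases listed above reduce to an alternating loop; the callback arguments below are hypothetical placeholders for the actual per-step training routines:

```python
def adversarial_train(rounds, d_epochs, g_epochs,
                      train_discriminator, train_generator, converged):
    """Skeleton of the listing above: alternate d_epochs discriminator
    updates (generator parameters fixed) with g_epochs generator updates
    (discriminator parameters fixed) until convergence."""
    for _ in range(rounds):
        for _ in range(d_epochs):      # steps 2-6
            train_discriminator()
        for _ in range(g_epochs):      # steps 7-10
            train_generator()
        if converged():                # step 11
            return

calls = {"d": 0, "g": 0}
adversarial_train(5, 2, 3,
                  lambda: calls.__setitem__("d", calls["d"] + 1),
                  lambda: calls.__setitem__("g", calls["g"] + 1),
                  lambda: calls["d"] >= 4)   # toy convergence criterion
```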
In the embodiment of the present application, updating the generation model concretely means updating the embeddings and biases of formulas (2), (3) and (4) of the generation model.
Step S304: the device generates a score for the counterfeit item from the updated generation model.
Specifically, the counterfeit articles include the positive example and negative example counterfeit articles generated for each of the multiple users. After the new generation model has been trained, every previously generated positive example and negative example counterfeit article needs to be scored again by it; the scores generated by the new generation model have greater reference value.
Step S305: the device sorts the genuine articles and the counterfeit articles according to the scores of the counterfeit articles and the scores of the existing genuine articles, and recommends articles to the first user according to the order in the sorting.
Specifically, the apparatus may generate, for the first user, a ranking of the first user's real articles and counterfeit articles; the ranking may follow the rule of score from high to low, or any other predefined rule, and articles are then recommended to the user according to their order in the ranking. The apparatus may likewise rank the real and counterfeit articles of other users. For example, suppose user 1's counterfeit articles are positive example counterfeit article 1 (score 4.7), positive example counterfeit article 2 (score 4), negative example counterfeit article 1 (score 0.5), negative example counterfeit article 2 (score 1.1) and negative example counterfeit article 3 (score 1), and user 1's real articles are positive example real article 1 (score 4.9), positive example real article 2 (score 4.5), negative example real article 1 (score 3.5), negative example real article 2 (score 3.3) and negative example real article 3 (score 3.4). Ranking the scores from high to low then gives: positive example real article 1, positive example counterfeit article 1, positive example real article 2, positive example counterfeit article 2, negative example real article 1, negative example real article 3, negative example real article 2, negative example counterfeit article 2, negative example counterfeit article 3, negative example counterfeit article 1. These real and counterfeit articles are then recommended to user 1 in this order.
The foregoing has described the principles of the embodiments of the present application; a detailed walk-through with a specific example follows.
The first step: data input
The input data set contains the identification ID of each user and the IDs of the articles that each user has scored. Taking article recommendation as an example, this embodiment uses 10 articles in total; the input information is shown in Table 3:
TABLE 3 Table 3
Entry sequence number    User ID    Article ID
1                        U1         I1
2                        U1         I3
3                        U1         I5
4                        U1         I8
5                        U2         I2
6                        U2         I3
7                        U2         I4
In Table 3, the entry with sequence number 1 indicates that the user identified as U1 has evaluated article I1, the entry with sequence number 2 indicates that user U1 has evaluated article I3, and so on.
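As a non-authoritative illustration, the interaction records of Table 3 can be held in a simple per-user mapping; the variable names below are assumptions for this sketch, not identifiers from the patent.

```python
# Hypothetical sketch: hold the (user ID, article ID) records of Table 3
# as a per-user mapping of evaluated articles.
records = [
    ("U1", "I1"), ("U1", "I3"), ("U1", "I5"), ("U1", "I8"),
    ("U2", "I2"), ("U2", "I3"), ("U2", "I4"),
]

all_items = [f"I{k}" for k in range(1, 11)]  # the 10 articles in this example

evaluated = {}  # user ID -> set of article IDs the user has evaluated
for user, item in records:
    evaluated.setdefault(user, set()).add(item)

print(sorted(evaluated["U1"]))  # ['I1', 'I3', 'I5', 'I8']
```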
Second step: initialize the parameters of the generation model and the parameters of the discrimination model, including the dimensions of the user embedding (representation vector) and of the item embedding, the training batch size, and the learning rate, where the batch size characterizes the number of samples drawn at one time.
Third step: keep the parameters of the generation model fixed and train the discrimination model. During training, the number of sampled real article pairs equals the number of positive real articles, where a positive real article is an article the user has scored highly, for example 4 points or more. In this embodiment, for user U1, the evaluated articles I1, I3, I5, and I8 are positive real articles, and the articles I2, I4, I6, I7, I9, and I10 that user U1 has not evaluated are negative real articles. User U1 has 4 evaluated articles, so four real article pairs are sampled, as follows:
(I1,I2),(I3,I4),(I5,I9),(I8,I6);
The negative real articles I2, I4, I9, and I6 are drawn from the articles that user U1 has not evaluated; they may be drawn randomly or according to another predefined strategy. During training, the generation model must also produce counterfeit article pairs: the positive example generator in the generation module produces positive counterfeit articles, and the negative example generator produces negative counterfeit articles.
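The pair-sampling step above can be sketched as follows; random drawing of unevaluated articles is only one of the strategies the text allows, and all names are illustrative.

```python
import random

def sample_real_pairs(positive_items, all_items, rng):
    """Pair each positive real article (evaluated and high-scored) with one
    randomly drawn article the user has not evaluated (a negative real
    article). Random drawing is one permitted strategy, not the only one."""
    negatives = [i for i in all_items if i not in set(positive_items)]
    return [(p, rng.choice(negatives)) for p in positive_items]

all_items = [f"I{k}" for k in range(1, 11)]
pairs = sample_real_pairs(["I1", "I3", "I5", "I8"], all_items, random.Random(0))
# pairs holds four (positive, negative) tuples, e.g. ('I1', 'I2')
```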
For example, for user U1, the article pairs generated by the generation model may be:
(I1,I2),(I2,I6),(I5,I7),(I8,I9);
When training the discrimination model, the real article pairs (I1, I2), (I3, I4), (I5, I9), (I8, I6) and the generated counterfeit article pairs (I1, I2), (I2, I6), (I5, I7), (I8, I9) are fed to the discrimination model together. By minimizing its loss function, the discrimination model learns to distinguish real article pairs from counterfeit article pairs as well as possible, thereby improving its discrimination capability. Training of the discrimination model is repeated until every user's article pairs have been sufficiently trained on.
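The patent does not give the loss function's exact form, so the following is only a sketch under assumptions: a pairwise discriminator that scores a (positive, negative) article pair by an embedding margin and is trained with binary cross-entropy, pushing real pairs toward label 1 and generated pairs toward label 0. All names and the logistic form are ours.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pair_score(emb, user, pos, neg):
    """Margin between the user's affinity to the positive and negative item."""
    u = emb["user"][user]
    return dot(u, emb["item"][pos]) - dot(u, emb["item"][neg])

def discriminator_loss(emb, user, real_pairs, fake_pairs):
    """Binary cross-entropy: real pairs toward 1, counterfeit pairs toward 0."""
    eps = 1e-9
    total = 0.0
    for p, n in real_pairs:
        d = 1.0 / (1.0 + math.exp(-pair_score(emb, user, p, n)))
        total += -math.log(d + eps)
    for p, n in fake_pairs:
        d = 1.0 / (1.0 + math.exp(-pair_score(emb, user, p, n)))
        total += -math.log(1.0 - d + eps)
    return total / (len(real_pairs) + len(fake_pairs))

rng = random.Random(0)
emb = {"user": {"U1": [rng.gauss(0, 1) for _ in range(8)]},
       "item": {f"I{k}": [rng.gauss(0, 1) for _ in range(8)] for k in range(1, 11)}}
real_pairs = [("I1", "I2"), ("I3", "I4"), ("I5", "I9"), ("I8", "I6")]
fake_pairs = [("I1", "I2"), ("I2", "I6"), ("I5", "I7"), ("I8", "I9")]
loss = discriminator_loss(emb, "U1", real_pairs, fake_pairs)
```

Minimizing this loss over the embedding parameters is what "improving the discrimination capability" amounts to in this sketch.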
Fourth step: keep the parameters of the discrimination model fixed and train the generation model. As in the discrimination-model training stage, for each user a set of real article pairs is sampled from the existing real articles and a set of counterfeit article pairs is produced by the generation model; still taking user U1 as an example:
The real item pair for this user U1 may be as follows:
(I1,I2),(I3,I4),(I5,I9),(I8,I6);
the counterfeit pair for this user U1 may be as follows:
(I1,I2),(I2,I6),(I5,I7),(I8,I9)。
The difference from training the discrimination model is that here the discrimination model computes a reward value based on the two sets of article pairs it receives. The generation module updates its parameters according to the new reward value obtained by optimizing the reward, and training of the generation module is repeated until every user's article pairs have been sufficiently trained on.
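Since the generated articles are discrete, a reward-driven update of this kind is commonly done with a REINFORCE-style gradient: raise the log-probability of the sampled counterfeit item in proportion to the reward. The patent does not spell out its update rule, so the parameterization below (softmax over per-item logits) is an assumption.

```python
import math

def generator_update(theta, sampled_idx, reward, lr=0.1):
    """One REINFORCE-style step on softmax logits theta: the gradient of
    log p(sampled item) is (one-hot - probs), scaled by the reward."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    s = sum(exps)
    probs = [e / s for e in exps]
    return [t + lr * reward * ((1.0 if i == sampled_idx else 0.0) - p)
            for i, (t, p) in enumerate(zip(theta, probs))]

# One positive reward for sampling item index 6 raises its logit.
theta = generator_update([0.0] * 10, sampled_idx=6, reward=1.0)
```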
Fifth step: repeat steps 3-4 until the discrimination model and the generation model are trained to optimality.
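The alternating schedule of steps 3-5 can be sketched as a simple loop; the step counts and callback signatures are illustrative assumptions, and convergence checks are omitted for brevity.

```python
def train_gan(n_rounds, d_steps, g_steps, train_discriminator, train_generator):
    """Alternate training: hold one model fixed while the other trains,
    then repeat for the given number of rounds."""
    for _ in range(n_rounds):
        for _ in range(d_steps):
            train_discriminator()   # step 3: generator frozen
        for _ in range(g_steps):
            train_generator()       # step 4: discriminator frozen

calls = {"d": 0, "g": 0}
train_gan(3, 2, 1,
          lambda: calls.__setitem__("d", calls["d"] + 1),
          lambda: calls.__setitem__("g", calls["g"] + 1))
```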
Sixth step: the device scores the generated counterfeit articles using the generation model obtained from the final round of training.
Seventh step: a user ID for which predicted scores are desired, for example U1, is input into the device. The device ranks all articles for user U1 by score, where a higher score indicates a higher degree of preference; "all articles" here includes the existing real articles and the generated counterfeit articles. Table 4 shows an example ranking result:
TABLE 4 Table 4
User ID    Article ID    Score
U1         I3            2.54
U1         I5            2.35
U1         I7            1.93
U1         I1            1.54
U1         I8            1.32
U1         I2            1.14
U1         I4            0.97
U1         I10           0.78
U1         I9            0.76
U1         I6            0.54
From the recommendation list shown in Table 4, it can be seen that the article user U1 is likely to prefer is article I7, the highest-ranked article that user U1 has not already evaluated.
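The selection from Table 4 can be reproduced with a short sketch: sort by score and take the best article outside U1's evaluated set (the variable names are ours).

```python
# Scores from Table 4 and U1's already-evaluated articles (Table 3).
scores = {"I3": 2.54, "I5": 2.35, "I7": 1.93, "I1": 1.54, "I8": 1.32,
          "I2": 1.14, "I4": 0.97, "I10": 0.78, "I9": 0.76, "I6": 0.54}
already_evaluated = {"I1", "I3", "I5", "I8"}

ranking = sorted(scores, key=scores.get, reverse=True)
# Recommend the best-scored article the user has not evaluated yet.
recommendation = next(i for i in ranking if i not in already_evaluated)
print(recommendation)  # I7
```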
By executing this method, the negative counterfeit article in each counterfeit article pair is generated conditioned on the positive counterfeit article, so the latent relation between the two is fully taken into account. The counterfeit article pairs therefore carry more information, which improves the training effect and strengthens the generative capability of the generation model; as a result, the recommendation produced by ranking the articles generated by the model together with the existing real articles has more reference value for the user. Further, high-scoring articles are collected to form the article pairs, both real article pairs and counterfeit article pairs. Because users pay more attention to high-scoring articles, the pairs obtained in this way carry more information and less noise about the user, so training on them can fully analyze the features the user cares about and yield a generation model with stronger generative capability.
In order to better understand the concept of the present application, as shown in fig. 8, an embodiment of the present application further provides a model training device 80 based on a generative adversarial network, where the device includes a generation model 801, a training model 802, and a discrimination model 803, described as follows:
The generation model 801 is configured to generate a positive counterfeit article and a negative counterfeit article for a first user, where the negative counterfeit article is generated according to the positive counterfeit article, the positive counterfeit article of the first user is a predicted article the first user pays attention to, and the negative counterfeit article of the first user is a predicted article the first user does not pay attention to;
The training model 802 is configured to train a plurality of real article pairs and a plurality of counterfeit article pairs to obtain a discrimination model 803 for discriminating differences between the plurality of real article pairs and the plurality of counterfeit article pairs. Each real article pair comprises a positive real article and a negative real article, and each counterfeit article pair comprises one of the positive counterfeit articles and one of the negative counterfeit articles; a positive real article is an article determined, according to the first user's operation behavior, to be an article the first user pays attention to, and a negative real article is an article determined, according to the first user's operation behavior, to be an article the first user does not pay attention to;
the training model 802 is further configured to update the generation model according to the loss function of the discrimination model.
By running these units, the negative counterfeit article in each counterfeit article pair is generated conditioned on the positive counterfeit article, and the latent relation between them is fully taken into account; the counterfeit article pairs therefore carry more information, which improves training, strengthens the generative capability of the generation model, and makes the recommendation produced by ranking the generated articles together with the existing real articles more valuable to the user.
In an alternative, the device further comprises a recommendation model, wherein:
After the training model updates the generating model according to the loss function of the judging model, the updated generating model is used for generating scores of forged objects, and the forged objects comprise the positive forged objects and the negative forged objects generated for the first user;
The recommendation model is used for sorting the real articles and the forged articles according to the scores of the forged articles and the scores of the existing real articles, and recommending the articles to the first user according to the sorting order.
It will be appreciated that the recommendation results generated by ordering the items generated by the generative model and the existing real items are of more reference value to the user.
In yet another alternative, after the generation model generates the positive counterfeit articles and the negative counterfeit articles for the first user, and before the training model trains the plurality of real article pairs and the plurality of counterfeit article pairs to obtain the discrimination model, the training model is further configured to:
match each of a plurality of first positive counterfeit articles with a first negative counterfeit article to form the plurality of counterfeit article pairs, wherein the first negative counterfeit article is one of the first user's negative counterfeit articles whose score ranks in the top M, M is the number of first positive counterfeit articles, and each first positive counterfeit article is a positive counterfeit article of the first user sampled from the positive counterfeit articles generated by the generation model;
and match each of a plurality of first positive real articles with a first negative real article to form the plurality of real article pairs, wherein the first negative real article is one of the first user's negative real articles whose score ranks in the top N, N is the number of first positive real articles, and each first positive real article is a positive real article sampled from the first user's existing positive real articles.
It can be understood that high-scoring articles are collected to form the article pairs, both real article pairs and counterfeit article pairs. Because users pay more attention to high-scoring articles, the pairs obtained in this way carry more information and less noise about the user; training on them can therefore fully analyze the features the user cares about and yield a generation model with stronger generative capability.
In yet another alternative, the initial generation model includes a positive case generation model, a negative case generation model, and a score generation model; the generation model is used for generating a positive example forged article and a negative example forged article for a first user, and specifically comprises the following steps:
A distribution of positive counterfeit items for a first user is generated by a positive generation model, the positive generation model being:
A distribution for generating negative example counterfeits of a first user by a negative example generation model, the negative example generation model being:
for generating, by a score generator, a score for each positive example counterfeit item and a score for each negative example counterfeit item;
where g+(f+|u) is the distribution of the positive counterfeit articles, e_u is the embedded vector (embedding) of the first user, e_f+ is the embedding of the positive counterfeit article to be generated, e_i is the embedding of the i-th positive counterfeit article, and b represents the bias value of the first user; g-(f-|u,f+) is the distribution of the negative counterfeit articles, and e_f- is the embedding of the negative counterfeit article to be generated.
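The source does not reproduce the generator formula images, so the following is only a hypothetical form consistent with the symbols listed above: a softmax over items whose logit for candidate f+ is e_u · e_f+ + b. The function and variable names are ours.

```python
import math
import random

def generator_distribution(e_u, item_embs, b):
    """Hypothetical softmax form of g+(f+|u): the probability of generating
    item f+ grows with the inner product e_u . e_f+ plus the user bias b.
    (The patent's exact formula is not reproduced in the source text.)"""
    logits = [sum(u * v for u, v in zip(e_u, e_i)) + b for e_i in item_embs]
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

rng = random.Random(1)
e_u = [rng.gauss(0, 1) for _ in range(8)]
item_embs = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(10)]
dist = generator_distribution(e_u, item_embs, b=0.1)
```

A negative-example counterpart g-(f-|u,f+) would additionally condition on the embedding of the already-generated positive counterfeit article.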
In yet another alternative, the training model is configured to update the generation model according to the loss function of the discrimination model, specifically:
determining attention indexes of the first user to the articles, wherein the attention indexes of the first user to the articles are obtained by training real article scores and fake article scores of the first user by adopting an attention network;
Obtaining a reward value reward according to the loss function of the judging model, and optimizing the reward value reward through the attention index of the first user to the article to obtain a new reward value;
And updating the generation model by adopting the new rewards value.
It can be understood that the importance of each article pair is different, and the importance weight of each article pair is obtained by introducing the attention network, so that the high-quality article pair can be effectively selected, the negative influence of the low-quality article pair is reduced, and the generated model and the discrimination model obtained by the method are more robust and adaptive. The article pair may be a genuine article pair or a counterfeit article pair.
In yet another alternative, the training model determines an attention index of the first user to the item, specifically:
calculating an attention index of the first user to the article according to the following formula by adopting an attention network;
α=softmax(g(r+,r-,f+,f-|u))
wherein α is the attention index of the first user u to the articles, w_u represents the trained weight of the first user, w_r+ represents the trained weight of the first user's positive real articles, w_r- represents the trained weight of the first user's negative real articles, w_f+ represents the trained weight of the first user's positive counterfeit articles, and w_f- represents the trained weight of the first user's negative counterfeit articles; b is the bias value of the first user.
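Since the source does not reproduce the exact form of g(r+, r-, f+, f-|u), the sketch below treats it as an opaque per-article-pair score built from the trained weights and bias, and shows only the softmax normalization that yields α. All names are assumptions.

```python
import math

def attention_index(pair_scores):
    """Sketch of alpha = softmax(g(r+, r-, f+, f- | u)): pair_scores stands
    in for g(...), one score per article pair; softmax turns the scores into
    attention weights that sum to 1."""
    m = max(pair_scores)
    exps = [math.exp(s - m) for s in pair_scores]
    total = sum(exps)
    return [e / total for e in exps]

# Four hypothetical pair scores; the first pair gets the most attention.
alpha = attention_index([2.0, 1.0, 0.5, 0.5])
```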
In yet another alternative, the reward value reward is optimized by the attention index of the first user to the article to obtain a new reward value, specifically:
Optimizing the reward value reward through the first user's attention index α to the articles to obtain a reward value reward_1 corresponding to the first user, wherein the attention index α, the reward value reward, and the reward value reward_1 corresponding to the first user satisfy the relation: reward_1 = α · reward;
and determining a new rewarding value according to the rewarding value reward_1 corresponding to the first user.
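Under our reading of the relation in the text (reward_1 = α · reward), the reweighting step is a per-pair scaling of the discriminator-derived reward by the attention index, so high-attention pairs dominate the generator update. The names below are illustrative.

```python
def reweight_reward(alpha, reward):
    """reward_1 = alpha * reward: scale the reward for each article pair by
    its attention weight before it drives the generation-model update."""
    return [a * reward for a in alpha]

new_reward = reweight_reward([0.4, 0.3, 0.2, 0.1], reward=2.0)
```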
It should be noted that the implementation of each unit may also correspond to the model training method based on generating the countermeasure network described in the foregoing embodiment, for example, steps S301 to S305.
Embodiments of the present application also provide a computer-readable storage medium having instructions stored therein which, when executed on a processor, implement the model training method based on a generative adversarial network described in the foregoing embodiments, for example steps S301-S305.
The embodiments of the present application also provide a computer program product which, when run on a processor, implements the model training method based on a generative adversarial network described in the foregoing embodiments, for example steps S301-S305.
Those of ordinary skill in the art will appreciate that all or part of the above-described method embodiments may be implemented by a computer program instructing related hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the steps of the above-described method embodiments. The aforementioned storage medium includes: a ROM, a random access memory (RAM), a magnetic disk, an optical disc, or the like.

Claims (13)

1. A model training method based on a generative adversarial network, comprising:
generating, by a device through a generation model, a positive counterfeit article and a negative counterfeit article for a first user, wherein the negative counterfeit article is generated according to the positive counterfeit article, the positive counterfeit article of the first user is a predicted article the first user pays attention to, and the negative counterfeit article of the first user is a predicted article the first user does not pay attention to;
training, by the device, a plurality of real article pairs and a plurality of counterfeit article pairs to obtain a discrimination model for discriminating differences between the plurality of real article pairs and the plurality of counterfeit article pairs, wherein each real article pair comprises a positive real article and a negative real article, and each counterfeit article pair comprises one of the positive counterfeit articles and one of the negative counterfeit articles; the positive real article is an article determined, according to the operation behavior of the first user, to be an article the first user pays attention to, and the negative real article is an article determined, according to the operation behavior of the first user, to be an article the first user does not pay attention to;
The equipment updates the generation model according to the loss function of the discrimination model;
the device updates the generation model according to a loss function of the discrimination model, comprising:
the equipment determines attention indexes of the first user on the articles, wherein the attention indexes of the first user on the articles are obtained by training real article scores and fake article scores of the first user by adopting an attention network;
the equipment obtains a reward value reward according to the loss function of the judging model, and optimizes the reward value reward through the attention index of the first user to the article to obtain a new reward value;
The device updates the generative model with the new prize value.
2. The method of claim 1, wherein after updating the generative model according to the loss function of the discriminant model, the apparatus further comprises:
The device generates scores of counterfeit items through the updated generation model, wherein the counterfeit items comprise the positive counterfeit items and the negative counterfeit items generated for the first user;
The device sorts the genuine articles and the counterfeit articles according to the scores of the counterfeit articles and the scores of the existing genuine articles, and recommends articles to the first user according to the order in the sorting.
3. The method of claim 1 or 2, wherein after the device generates the positive counterfeit articles and the negative counterfeit articles for the first user through the generation model, and before the device trains the plurality of real article pairs and the plurality of counterfeit article pairs to obtain the discrimination model, the method further comprises:
matching, by the device, each of a plurality of first positive counterfeit articles with a first negative counterfeit article to form the plurality of counterfeit article pairs, wherein the first negative counterfeit article is one of the first user's negative counterfeit articles whose score ranks in the top M, M is the number of first positive counterfeit articles, and each first positive counterfeit article is a positive counterfeit article of the first user sampled from the positive counterfeit articles generated by the generation model;
and matching, by the device, each of a plurality of first positive real articles with a first negative real article to form the plurality of real article pairs, wherein the first negative real article is one of the first user's negative real articles whose score ranks in the top N, N is the number of first positive real articles, and each first positive real article is a positive real article sampled from the first user's existing positive real articles.
4. The method of claim 1 or 2, wherein the generative model comprises a positive example generative model, a negative example generative model, and a scoring generative model; the apparatus generates positive and negative counterfeit items for a first user by generating a model, comprising:
The device generates the distribution of the positive counterfeit items of the first user through a positive generation model, wherein the positive generation model is as follows:
The device generates a distribution of negative example counterfeit items of the first user through a negative example generation model, wherein the negative example generation model is as follows:
the device generates a score for each positive example counterfeit item and a score for each negative example counterfeit item through a score generator;
where g+(f+|u) is the distribution of the positive counterfeit articles, e_u is the embedded vector (embedding) of the first user, e_f+ is the embedding of the positive counterfeit article to be generated, e_i is the embedding of the i-th positive counterfeit article, and b represents the bias value of the first user; g-(f-|u,f+) is the distribution of the negative counterfeit articles, and e_f- is the embedding of the negative counterfeit article to be generated.
5. The method of claim 1 or 2, wherein the device determining an attention index of the first user to an item comprises:
The device calculates an attention index of the first user to the article according to the following formula by adopting an attention network;
α=softmax(g(r+,r-,f+,f-|u))
wherein α is the attention index of the first user u to the articles, w_u represents the trained weight of the first user, w_r+ represents the trained weight of the first user's positive real articles, w_r- represents the trained weight of the first user's negative real articles, w_f+ represents the trained weight of the first user's positive counterfeit articles, and w_f- represents the trained weight of the first user's negative counterfeit articles; b is the bias value of the first user, e_r- represents the embedded vector (embedding) of a negative real article of the first user, and e_r+ represents the embedded vector (embedding) of a positive real article of the first user.
6. The method according to claim 1 or 2, wherein said optimizing the prize value reward by the first user's attention index to the item to obtain a new prize value comprises:
Optimizing the reward value reward through the first user's attention index α to the articles to obtain a reward value reward_1 corresponding to the first user, wherein the attention index α, the reward value reward, and the reward value reward_1 corresponding to the first user satisfy the relation: reward_1 = α · reward;
and determining a new rewarding value according to the rewarding value reward_1 corresponding to the first user.
7. A model training apparatus based on a generative adversarial network, comprising:
a generation model for generating a positive counterfeit article and a negative counterfeit article for a first user, wherein the negative counterfeit article is generated according to the positive counterfeit article, the positive counterfeit article of the first user is a predicted article the first user pays attention to, and the negative counterfeit article of the first user is a predicted article the first user does not pay attention to;
a training model for training a plurality of real article pairs and a plurality of counterfeit article pairs to obtain a discrimination model for discriminating differences between the plurality of real article pairs and the plurality of counterfeit article pairs, wherein each real article pair comprises a positive real article and a negative real article, and each counterfeit article pair comprises one of the positive counterfeit articles and one of the negative counterfeit articles; the positive real article is an article determined, according to the operation behavior of the first user, to be an article the first user pays attention to, and the negative real article is an article determined, according to the operation behavior of the first user, to be an article the first user does not pay attention to;
the training model is configured to update the generation model according to a loss function of the discriminant model, and includes:
determining attention indexes of the first user to the articles, wherein the attention indexes of the first user to the articles are obtained by training real article scores and fake article scores of the first user by adopting an attention network;
Obtaining a reward value reward according to the loss function of the judging model, and optimizing the reward value reward through the attention index of the first user to the article to obtain a new reward value;
And updating the generation model by adopting the new rewards value.
8. The apparatus of claim 7, further comprising a recommendation model, wherein:
After the training model updates the generating model according to the loss function of the judging model, the updated generating model is used for generating scores of forged objects, and the forged objects comprise the positive forged objects and the negative forged objects generated for the first user;
The recommendation model is used for sorting the real articles and the forged articles according to the scores of the forged articles and the scores of the existing real articles, and recommending the articles to the first user according to the sorting order.
9. The apparatus of claim 7 or 8, wherein after the generating model generates the positive and negative counterfeit items for the first user, the training model is further configured to:
match each of a plurality of first positive counterfeit articles with a first negative counterfeit article to form the plurality of counterfeit article pairs, wherein the first negative counterfeit article is one of the first user's negative counterfeit articles whose score ranks in the top M, M is the number of first positive counterfeit articles, and each first positive counterfeit article is a positive counterfeit article of the first user sampled from the positive counterfeit articles generated by the generation model;
and match each of a plurality of first positive real articles with a first negative real article to form the plurality of real article pairs, wherein the first negative real article is one of the first user's negative real articles whose score ranks in the top N, N is the number of first positive real articles, and each first positive real article is a positive real article sampled from the first user's existing positive real articles.
10. The apparatus of claim 7 or 8, wherein the generative models comprise a positive example generative model, a negative example generative model, and a scoring generative model; the generation model is used for generating a positive example forged article and a negative example forged article for a first user, and specifically comprises the following steps:
A distribution of positive counterfeit items for a first user is generated by a positive generation model, the positive generation model being:
A distribution for generating negative example counterfeits of a first user by a negative example generation model, the negative example generation model being:
for generating, by a score generator, a score for each positive example counterfeit item and a score for each negative example counterfeit item;
Wherein g+(f+|u) is the distribution of the positive counterfeit articles, e_u is the embedded vector (embedding) of the first user, e_f+ is the embedding of the positive counterfeit article to be generated, e_i is the embedding of the i-th positive counterfeit article, and b represents the bias value of the first user; g-(f-|u,f+) is the distribution of the negative counterfeit articles, and e_f- is the embedding of the negative counterfeit article to be generated.
11. The apparatus according to claim 7 or 8, wherein the training model determines an attention index of the first user to an item, in particular:
calculating an attention index of the first user to the article according to the following formula by adopting an attention network;
α=softmax(g(r+,r-,f+,f-|u))
wherein α is the attention index of the first user u to the articles, w_u represents the trained weight of the first user, w_r+ represents the trained weight of the first user's positive real articles, w_r- represents the trained weight of the first user's negative real articles, w_f+ represents the trained weight of the first user's positive counterfeit articles, and w_f- represents the trained weight of the first user's negative counterfeit articles; b is the bias value of the first user, e_r- represents the embedded vector (embedding) of a negative real article of the first user, and e_r+ represents the embedded vector (embedding) of a positive real article of the first user.
12. The apparatus according to claim 7 or 8, characterized in that said reward value reward is optimized by the first user's attention index to the item to obtain a new reward value, in particular:
Optimizing the reward value reward through the first user's attention index α to the articles to obtain a reward value reward_1 corresponding to the first user, wherein the attention index α, the reward value reward, and the reward value reward_1 corresponding to the first user satisfy the relation: reward_1 = α · reward;
and determining a new rewarding value according to the rewarding value reward_1 corresponding to the first user.
13. A computer readable storage medium, characterized in that it has stored therein program instructions which, when run on a processor, implement the method of any of claims 1-6.
CN201811654623.5A 2018-12-29 2018-12-29 Model training method and device based on generation countermeasure network Active CN109902823B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811654623.5A CN109902823B (en) 2018-12-29 2018-12-29 Model training method and device based on generation countermeasure network
PCT/CN2019/128917 WO2020135642A1 (en) 2018-12-29 2019-12-26 Model training method and apparatus employing generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811654623.5A CN109902823B (en) 2018-12-29 2018-12-29 Model training method and device based on generation countermeasure network

Publications (2)

Publication Number Publication Date
CN109902823A CN109902823A (en) 2019-06-18
CN109902823B true CN109902823B (en) 2024-06-07

Family

ID=66943487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811654623.5A Active CN109902823B (en) 2018-12-29 2018-12-29 Model training method and device based on generation countermeasure network

Country Status (2)

Country Link
CN (1) CN109902823B (en)
WO (1) WO2020135642A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902823B (en) * 2018-12-29 2024-06-07 Huawei Technologies Co., Ltd. Model training method and device based on generative adversarial network
CN110827120A (en) * 2019-10-18 2020-02-21 Zhengzhou University GAN-based fuzzy recommendation method and device, electronic equipment and storage medium
CN110929085B (en) * 2019-11-14 2023-12-19 State Grid Co., Ltd. System and method for processing samples for a power customer-service message generation model based on meta-semantic decomposition
CN112395494B (en) * 2020-10-15 2022-10-14 Nanjing University of Posts and Telecommunications Bidirectional dynamic recommendation system based on generative adversarial network
CN113326400B (en) * 2021-06-29 2024-01-12 Hefei High-Dimensional Data Technology Co., Ltd. Model evaluation method and system based on deepfake video detection

Citations (6)

Publication number Priority date Publication date Assignee Title
CN104615767A (en) * 2015-02-15 2015-05-13 Baidu Online Network Technology (Beijing) Co., Ltd. Search ranking model training method and device, and search processing method
CN108564129A (en) * 2018-04-24 2018-09-21 University of Electronic Science and Technology of China Trajectory data classification method based on generative adversarial network
CN108595493A (en) * 2018-03-15 2018-09-28 Tencent Technology (Shenzhen) Co., Ltd. Media content pushing method and device, storage medium, and electronic device
CN108665058A (en) * 2018-04-11 2018-10-16 Xuzhou Institute of Technology Generative adversarial network method based on segmentation loss
CN108875766A (en) * 2017-11-29 2018-11-23 Beijing Megvii Technology Co., Ltd. Image processing method, apparatus, system, and computer storage medium
CN108921220A (en) * 2018-06-29 2018-11-30 Guoxin Youyi Data Co., Ltd. Image restoration model training method and device, and image restoration method and device

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10607319B2 (en) * 2017-04-06 2020-03-31 Pixar Denoising Monte Carlo renderings using progressive neural networks
CN109902823B (en) * 2018-12-29 2024-06-07 Huawei Technologies Co., Ltd. Model training method and device based on generative adversarial network

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN104615767A (en) * 2015-02-15 2015-05-13 Baidu Online Network Technology (Beijing) Co., Ltd. Search ranking model training method and device, and search processing method
CN108875766A (en) * 2017-11-29 2018-11-23 Beijing Megvii Technology Co., Ltd. Image processing method, apparatus, system, and computer storage medium
CN108595493A (en) * 2018-03-15 2018-09-28 Tencent Technology (Shenzhen) Co., Ltd. Media content pushing method and device, storage medium, and electronic device
CN108665058A (en) * 2018-04-11 2018-10-16 Xuzhou Institute of Technology Generative adversarial network method based on segmentation loss
CN108564129A (en) * 2018-04-24 2018-09-21 University of Electronic Science and Technology of China Trajectory data classification method based on generative adversarial network
CN108921220A (en) * 2018-06-29 2018-11-30 Guoxin Youyi Data Co., Ltd. Image restoration model training method and device, and image restoration method and device

Non-Patent Citations (1)

Title
IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models; Wang Jun et al.; SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval; 2017-08-11; pp. 515-520 *

Also Published As

Publication number Publication date
CN109902823A (en) 2019-06-18
WO2020135642A1 (en) 2020-07-02

Similar Documents

Publication Publication Date Title
CN109902823B (en) Model training method and device based on generative adversarial network
Horvat et al. The use of machine learning in sport outcome prediction: A review
WO2022041979A1 (en) Information recommendation model training method and related device
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
Nie et al. Data-driven answer selection in community QA systems
CN107463698B (en) Method and device for pushing information based on artificial intelligence
CN111242310B (en) Feature validity evaluation method and device, electronic equipment and storage medium
CN108304853B (en) Game correlation obtaining method and device, storage medium and electronic device
Alabdulrahman et al. Catering for unique tastes: Targeting grey-sheep users recommender systems through one-class machine learning
KR102203253B1 (en) Rating augmentation and item recommendation method and system based on generative adversarial networks
CN106874503B (en) Method and device for acquiring recommended data
CN114202061A (en) Article recommendation method, electronic device and medium based on generative adversarial network model and deep reinforcement learning
JP5012078B2 (en) Category creation method, category creation device, and program
US20210019635A1 (en) Group specific decision tree
Vall et al. Order, context and popularity bias in next-song recommendations
CN111737558A (en) Information recommendation method and device and computer readable storage medium
CN106846029B (en) Collaborative filtering recommendation algorithm based on genetic algorithm and novel similarity calculation strategy
CN111597446B (en) Content pushing method and device based on artificial intelligence, server and storage medium
CN113656699B (en) User feature vector determining method, related equipment and medium
CN117391824B (en) Method and device for recommending articles based on large language model and search engine
CN112598405B (en) Business project data management method and system based on big data
Wu et al. Collaborative filtering recommendation based on conditional probability and weight adjusting
Volkovs Two-stage approach to item recommendation from user sessions
CN112328881A (en) Article recommendation method and device, terminal device and storage medium
CN110727867A (en) Semantic entity recommendation method based on fuzzy mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant