CN111275205B - Virtual sample generation method, terminal equipment and storage medium - Google Patents


Info

Publication number
CN111275205B
CN111275205B (application CN202010032925.XA)
Authority
CN
China
Prior art keywords
virtual
model
information
virtual user
behavior information
Prior art date
Legal status
Active
Application number
CN202010032925.XA
Other languages
Chinese (zh)
Other versions
CN111275205A (en)
Inventor
谢宜廷
李延平
Current Assignee
Ud Network Co ltd
Original Assignee
Ud Network Co ltd
Priority date
Filing date
Publication date
Application filed by Ud Network Co ltd filed Critical Ud Network Co ltd
Priority to CN202010032925.XA
Publication of CN111275205A
Application granted
Publication of CN111275205B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

The application is applicable to the technical field of computers and provides a virtual sample generation method comprising the following steps: obtaining virtual user information and inputting it into a first machine learning model to obtain search behavior information of a virtual user; inputting the search behavior information into a second machine learning model to obtain selection behavior information of the virtual user; and combining the search behavior information and the selection behavior information to generate a virtual interaction track corresponding to the virtual user information, the virtual interaction track serving as a virtual sample. This provides training samples for a reinforcement-learning-based item recommendation model, so that the model can be trained offline rather than online, reducing the cost of model training.

Description

Virtual sample generation method, terminal equipment and storage medium
Technical Field
The application belongs to the technical field of computers, and particularly relates to a virtual sample generation method and terminal equipment.
Background
With the development of intelligent systems, existing recommendation systems can be combined with machine learning models to improve their recommendation capability. Currently, most machine learning models applied to recommendation systems are deep learning models, which can be trained offline on labeled offline data. Besides deep learning models, a recommendation system can also be combined with a reinforcement learning model, in which an agent learns online by continuously trying actions and updating its parameters. However, online learning is a process of real-time interaction between the model and the user; while the model is still insufficiently trained, it easily recommends wrong information to the user, which greatly harms the user experience and makes online model training very costly.
Disclosure of Invention
The embodiments of the application provide a virtual sample generation method, terminal equipment and a storage medium, which can solve the problem of the high training cost of a reinforcement learning model applied to a recommendation system.
In a first aspect, an embodiment of the present application provides a method for generating a virtual sample based on a virtual interaction environment, where the virtual interaction environment includes a first machine learning model and a second machine learning model, and the method includes:
obtaining virtual user information, inputting the virtual user information into a first machine learning model, and obtaining search behavior information of a virtual user;
inputting the search behavior information into a second machine learning model to obtain the selection behavior information of the virtual user;
and combining the search behavior information and the selection behavior information to generate a virtual interaction track corresponding to the virtual user information, and taking the virtual interaction track as a virtual sample.
In the embodiments of the present application, the item searching behavior of a user on a recommendation system and the user's selection behavior on items are simulated through the virtual interaction environment, and the search behavior and the selection behavior are combined into a virtual interaction track that serves as a virtual sample. This provides training samples for a reinforcement-learning-based item recommendation model, so that the model can be trained offline rather than online, reducing the cost of model training.
In a second aspect, an embodiment of the present application provides a method for generating a virtual interaction environment, where the virtual interaction environment includes a first machine learning model and a second machine learning model, and the method includes:
obtaining virtual user information, inputting the virtual user information into a first preset model, and obtaining search behavior information of a virtual user;
inputting the search behavior information into a second preset model to obtain the selection behavior information of the virtual user;
combining the search behavior information and the selection behavior information to generate a virtual interaction track corresponding to the virtual user information;
updating model parameters corresponding to the first preset model and the second preset model respectively according to a comparison result between the virtual interaction track and a preset real interaction track;
and taking the first preset model with updated model parameters as the first machine learning model, and taking the second preset model with updated model parameters as the second machine learning model.
In a third aspect, an embodiment of the present application provides a method for generating an item recommendation model based on a virtual interaction environment, and the method includes:
obtaining a plurality of virtual samples output by the virtual interaction environment;
and inputting each virtual sample into an item recommendation model to update the parameters of the item recommendation model.
In a fourth aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method for generating a virtual sample, the method for generating a virtual interaction environment, or the method for generating an item recommendation model.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the method for generating a virtual sample, the method for generating a virtual interaction environment, or the method for generating an item recommendation model.
In a sixth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to perform the steps of the above-described method for generating a virtual sample, method for generating a virtual interaction environment, or method for generating an item recommendation model.
It will be appreciated that the advantages of the second to sixth aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the embodiments or for the description of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a virtual interactive environment provided by an embodiment of the present application;
FIG. 2 is a flow chart of a method for generating virtual samples according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for generating a virtual sample according to another embodiment of the present application;
FIG. 4 is a flowchart of a method for generating a virtual sample according to another embodiment of the present application;
FIG. 5 is a flowchart illustrating a method for generating a virtual interactive environment according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for generating an item recommendation model according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a terminal device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
As described in the background, a recommendation system may make item recommendations based on a reinforcement learning model, in which an agent obtains rewards by interacting with the environment and learns how to act so as to maximize the expected benefit. According to this requirement of reinforcement learning, training a reinforcement learning model in the real environment is very challenging: it requires a large amount of sampling and trial and error, so training the model in the real environment is costly, and trial and error may even cause losses that are difficult to estimate.
Therefore, the embodiments of the present application provide a virtual sample generation method in which the item searching behavior of a user on a recommendation system and the user's selection behavior on items are simulated through a virtual interaction environment, and the search behavior and the selection behavior are combined into a virtual interaction track that serves as a virtual sample. This provides training samples for a reinforcement-learning-based item recommendation model, so that the model can be trained offline rather than online, reducing the cost of model training.
Fig. 1 shows a schematic diagram of a virtual interaction environment provided in an embodiment of the present application. The virtual interaction environment 100 comprises a user generation model 101, a first machine learning model 102 and a second machine learning model 103. The user generation model 101 is mainly used for generating virtual user information and may consist of an activation function, a fully connected layer and a softmax function. The first machine learning model 102 is a search behavior generation model, mainly used for simulating the search behavior of a user on a search engine, and may consist of an activation function and a fully connected layer. The second machine learning model 103 is a selection behavior generation model, mainly used for simulating the user's behavior of selecting among the searched items, and may consist of an activation function, a fully connected layer and a softmax function.
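The three-model pipeline above can be sketched as follows. The layer sizes, weights, choice of tanh as the activation, and the use of numpy are all illustrative assumptions; the patent specifies only the activation / fully connected / (softmax) structure of each model:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

class TinyModel:
    """Activation -> fully connected layer (-> optional softmax)."""
    def __init__(self, d_in, d_out, use_softmax=False):
        self.W = rng.standard_normal((d_in, d_out)) * 0.1
        self.b = np.zeros(d_out)
        self.use_softmax = use_softmax

    def __call__(self, x):
        h = np.tanh(x)             # activation function
        y = h @ self.W + self.b    # fully connected layer
        return softmax(y) if self.use_softmax else y

# Hypothetical dimensions: noise, user features, search features, choices.
DIM_Z, DIM_USER, DIM_SEARCH, N_CHOICES = 3, 8, 5, 4

user_gen = TinyModel(DIM_Z, DIM_USER, use_softmax=True)                     # model 101
search_gen = TinyModel(DIM_USER, DIM_SEARCH)                                # model 102
select_gen = TinyModel(DIM_USER + DIM_SEARCH, N_CHOICES, use_softmax=True)  # model 103

z = rng.standard_normal(DIM_Z)                             # Gaussian noise vector
user = user_gen(z)                                         # virtual user information
search = search_gen(user)                                  # search behavior information
choice_probs = select_gen(np.concatenate([user, search]))  # selection probabilities
```

The chain mirrors Fig. 1: noise feeds model 101, whose output feeds model 102, and the concatenated user and search information feeds model 103.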
The method for generating the virtual sample provided by the embodiment of the application can be applied to terminal equipment, wherein the terminal equipment can be a mobile phone, a tablet personal computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a server and the like, and the embodiment of the application does not limit the specific type of the terminal equipment.
Fig. 2 shows a schematic flowchart of a method for generating a virtual sample provided in the present application. By way of example and not limitation, the method may be applied to the above terminal device, with the above virtual interaction environment built on the terminal device.
S201, obtaining virtual user information, inputting the virtual user information into a first machine learning model, and obtaining search behavior information of a virtual user;
In step S201, the virtual user information is user information simulated by the system, and may include the user's gender, occupation, browsing records and the like, where the browsing records are the user's browsing records on the Internet. The first machine learning model recommends item information to the virtual user according to the virtual user information, so that the user's search behavior information can be simulated by combining the virtual user information and the item information. The search behavior information comprises the virtual user information and the item information, where the item information may be commodity information, news information and the like.
By way of example and not limitation, when a real user shops on a shopping platform, the user typically searches for the merchandise to be purchased, and the shopping platform returns an item recommendation list to the user terminal according to the search information input by the real user. In this embodiment, the first machine learning model predicts the commodity information that the virtual user may search for according to the virtual user information, and then obtains a commodity recommendation list according to that commodity information, thereby simulating the search behavior of a real user on the shopping platform. Specifically, the virtual user information input into the first machine learning model may be activated by an activation function, and features may then be extracted by the fully connected layer, so as to output the commodity information that the virtual user may search for.
S202, inputting the search behavior information into a second machine learning model to obtain the selection behavior information of the virtual user;
In S202, the second machine learning model predicts the virtual user's selection of an item. By way of example and not limitation, when the shopping platform returns an item recommendation list to the user terminal of a real user, the real user may make a variety of choices about an item, such as clicking on the item, purchasing the item, leaving the item page after clicking on it, or browsing the next item recommendation list. In this embodiment, the second machine learning model predicts the probability of each selection the virtual user may make about the item based on the search behavior information, and determines the virtual user's selection based on these probabilities. Specifically, the search behavior information input into the second machine learning model may be activated by an activation function, features may then be extracted by a fully connected layer, probability values of each selection may be output by a softmax function, and the selection behavior information may be determined according to these probability values.
And S203, combining the search behavior information and the selection behavior information to generate a virtual interaction track corresponding to the virtual user information, and taking the virtual interaction track as a virtual sample.
In S203, the virtual interaction track simulates the interaction track between a user and the item recommendation system. Optionally, the search behavior information and the selection behavior information may be spliced in the chronological order in which the search behaviors and selection behaviors occur. It should be appreciated that the same virtual user may have multiple pieces of search behavior information and multiple pieces of selection behavior information.
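The chronological splice can be sketched minimally as follows; the event names, payloads and timestamps are invented for illustration only:

```python
# Each event: (timestamp, kind, payload). One virtual user may produce
# several search events and several selection events.
search_events = [(0.0, "search", "phone"), (2.0, "search", "phone case")]
select_events = [(1.0, "click", "phone A"), (3.0, "purchase", "case B")]

# The virtual interaction track is the time-ordered merge of both streams.
trajectory = sorted(search_events + select_events, key=lambda e: e[0])
```

After the merge, the track alternates between search and selection behaviors in the order they occurred, which is the form consumed as a virtual sample.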
In the embodiments of the present application, the interaction track between a user and the item recommendation system is simulated through the virtual interaction environment, and the interaction track is used as a training sample for training the item recommendation model, so that the item recommendation model can be trained offline.
On the basis of the embodiment shown in fig. 2, another embodiment of a method for generating a virtual sample is provided. The acquisition of virtual user information in step S201 described above includes steps S2011 and S2012. It should be noted that the same steps as those in the embodiment of fig. 2 are not repeated here, please refer to the foregoing.
S2011, acquiring preset Gaussian noise, and acquiring a plurality of target vectors with preset dimensions from the Gaussian noise;
In S2011, the Gaussian noise is noise with a normal distribution, and the preset dimension is a dimension dim, such as 2 dimensions, 3 dimensions, and so on. Random sampling from the Gaussian noise yields m dim-dimensional target vectors z_1, z_2, …, z_m.
S2012, inputting the target vector into a user generation model to simulate user information, and obtaining the virtual user information corresponding to the target vector.
In S2012, the user generation model is used to simulate user information. Optionally, the target vectors z_1, z_2, …, z_m are input into the user generation model to obtain the virtual user information corresponding to each target vector. In the embodiments of the present application, the virtual user information is generated randomly from Gaussian noise, so that no user needs to input specific information, which reduces user operations; moreover, virtual user information generated from normally distributed Gaussian noise is more stable.
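Steps S2011 and S2012 can be sketched as follows; m, dim, the seed, and the tanh stand-in for the trained user generation model are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
m, dim = 4, 3                              # m target vectors of preset dimension dim
Z = rng.normal(0.0, 1.0, size=(m, dim))    # z_1 .. z_m sampled from Gaussian noise

# Stand-in for the user generation model: any map from a dim-vector to a
# user-feature vector would do here; tanh is only a placeholder.
def user_generation_model(z):
    return np.tanh(z)

virtual_users = np.array([user_generation_model(z) for z in Z])
```

Each row of `virtual_users` plays the role of one piece of virtual user information corresponding to one target vector.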
Fig. 3 shows a flowchart of another virtual sample generation method according to the embodiment of the present application, based on the embodiment shown in fig. 2. As shown in fig. 3, the above-mentioned step S201 of inputting the virtual user information into the first machine learning model, obtaining the search behavior information of the virtual user, specifically includes steps S301 and S302. It should be noted that the same steps as those in the embodiment of fig. 2 are not repeated here, please refer to the foregoing.
S301, extracting first feature information from the virtual user information, and obtaining a corresponding item recommendation list according to the first feature information, where the item recommendation list comprises the item information;
In S301, the virtual user information may be activated by the activation function, which introduces a nonlinear factor and avoids limiting the first machine learning model to linearly separable problems. Features are then extracted from the activated virtual user information by the fully connected network, the item information that the virtual user may search for is determined according to the extracted feature information, and the item recommendation list corresponding to that item information is obtained.
S302, combining the item information on the item recommendation list with the virtual user information to generate the search behavior information of the virtual user.
In S302, since the item information and the virtual user information are static data, the two are combined into the search behavior information of the virtual user searching for the item, thereby generating data that interacts with the environment.
Fig. 4 shows a flowchart of another virtual sample generation method according to the embodiment of the present application, based on the embodiment shown in fig. 2. As shown in fig. 4, the step S202 includes steps S401 and S402. It should be noted that the same steps as those in the embodiment of fig. 2 are not repeated here, please refer to the foregoing.
S401, extracting second feature information from the search behavior information, and classifying the second feature information through a preset classifier to obtain the probability value of each selection made by the virtual user on the item;
In S401, the search behavior information may be activated by the activation function, which introduces a nonlinear factor and avoids limiting the second machine learning model to linearly separable problems. Features are then extracted from the activated search behavior information by the fully connected network, and the extracted feature information is classified by a preset classifier (such as softmax) to obtain the probability value of each selection the virtual user may make about the item.
S402, determining the selection behavior information of the virtual user according to a plurality of probability values.
In S402, each selection the virtual user may make about the item corresponds to a probability value; optionally, the selection with the largest probability value is taken as the selection behavior information of the virtual user.
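The argmax rule in S402 can be sketched as follows; the four candidate selections and the probability values are invented for illustration:

```python
import numpy as np

# Hypothetical softmax output over four candidate selections.
choices = ["click item", "purchase item", "leave page", "next list"]
probs = np.array([0.10, 0.55, 0.20, 0.15])

# Take the selection with the largest probability value.
selected = choices[int(np.argmax(probs))]   # -> "purchase item"
```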
Fig. 5 shows a schematic flowchart of a method for generating a virtual interaction environment provided in the present application. By way of example and not limitation, the method may be applied to the above terminal device, with the virtual interaction environment built on the terminal device.
S501, obtaining virtual user information, and inputting the virtual user information into a first preset model to obtain search behavior information of a virtual user;
S502, inputting the search behavior information into a second preset model to obtain the selection behavior information of the virtual user;
S503, combining the search behavior information and the selection behavior information to generate a virtual interaction track corresponding to the virtual user information;
In steps S501 to S503, the first preset model is an initialization of the first machine learning model, and the second preset model is an initialization of the second machine learning model. For brevity, for the rest of the description please refer to S201 to S203.
S504, updating model parameters corresponding to the first preset model and the second preset model respectively according to a comparison result between the virtual interaction track and a preset real interaction track;
S505, taking the first preset model with updated model parameters as the first machine learning model, and taking the second preset model with updated model parameters as the second machine learning model.
In steps S504 and S505, the comparison result between the virtual interaction track and the real interaction track is obtained by a preset discriminator; the comparison result is used as the reward value for the first preset model and the second preset model, and the model parameters corresponding to the first preset model and the second preset model are updated according to the search behavior information, the selection behavior information and the reward value.
Based on the embodiment shown in fig. 5, another embodiment of a method for generating a virtual interactive environment is provided. Step S5011 is further included before the above step S501. It should be noted that the same steps as those in the embodiment of fig. 5 are not repeated here, please refer to the foregoing.
S5011, generating a user generation model by using a generative adversarial network (GAN) algorithm, where the user generation model is used for generating the virtual user information.
In S5011, the generative adversarial network algorithm is a deep learning algorithm comprising a generator and a discriminator. Specifically: Step A, collect a plurality of target vectors of a preset dimension from Gaussian noise, and input the target vectors into a preset generator to obtain first virtual user data. Step B, obtain a maximum deviation value between the first virtual user data and preset real user data through a preset discriminator, and update the parameters of the preset discriminator according to the maximum deviation value. Step C, collect a plurality of target vectors of the preset dimension from Gaussian noise, and input the target vectors into the preset generator to obtain second virtual user data. Step D, obtain a minimum deviation value between the second virtual user data and the real user data through the updated preset discriminator, and update the parameters of the preset generator according to the minimum deviation value. Step E, repeat steps A to D until the minimum deviation value reaches a second preset value, and take the preset generator as the user generation model.
In step B, the real user data may be a data set obtained by one-hot encoding user information such as the user's gender, occupation and browsing records. In order to distinguish the real data from the virtual data as far as possible, the distribution distance between the two must be maximized; optionally, the Wasserstein distance is used as the measure, yielding the formula:

max_D  E_{x~P_data}[D(x)] - E_{z~P_z}[D(G(z))],
The parameter updating formula is:

θ ← clip(θ + α ∇_θ (E_{x~P_data}[D(x)] - E_{z~P_z}[D(G(z))]), -c, c),

wherein D(x) is the output of the discriminator, G(z) is the output of the generator, θ is the parameter of the discriminator and α is the learning rate; θ is clipped so that it does not exceed the constant c, in order to prevent the parameter θ from becoming infinite.
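The clipping that keeps θ within the constant c can be shown in one line; the parameter values and c = 1.0 are illustrative:

```python
import numpy as np

# Weight clipping keeps the discriminator parameters theta within [-c, c],
# preventing them from growing without bound (values are illustrative).
c = 1.0
theta = np.array([7.5, -0.4, -12.0])
theta = np.clip(theta, -c, c)     # -> [ 1.0, -0.4, -1.0]
```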
In step D, contrary to step B, the Wasserstein distance is minimized, yielding the formula:

min_G  -E_{z~P_z}[D(G(z))];
The parameter updating formula is:

θ_G ← θ_G - α ∇_{θ_G}(-E_{z~P_z}[D(G(z))]),

wherein θ_G is the parameter of the generator.
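Steps A to D can be sketched as the following 1-D toy loop. Everything here is a deliberately simplified stand-in: the "real user data" is N(3, 1), the generator is G(z) = z + w_g, the critic is D(x) = w_d · x, and the rates and clipping bound are arbitrary; the patent's actual models are neural networks:

```python
import numpy as np

rng = np.random.default_rng(0)

w_g, w_d = 0.0, 0.0   # generator and critic parameters
lr, c = 0.05, 1.0     # learning rate and clipping constant

for _ in range(300):
    # Steps A-B: sample noise and real data, ascend the critic objective
    # E[D(x)] - E[D(G(z))], then clip the critic parameter to [-c, c].
    z = rng.standard_normal(64)
    x = rng.normal(3.0, 1.0, 64)
    grad_d = x.mean() - (z + w_g).mean()
    w_d = float(np.clip(w_d + lr * grad_d, -c, c))

    # Steps C-D: descend -E[D(G(z))] for the generator;
    # d/dw_g of E[D(z + w_g)] is simply w_d for this linear critic.
    grad_g = w_d
    w_g += lr * grad_g
```

With these toy choices the generator shift w_g drifts toward the real mean (3) while the critic parameter stays clipped, illustrating the alternating maximize/minimize structure of steps B and D.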
based on the embodiment shown in fig. 5, another embodiment of a method for generating a virtual interactive environment is provided. The above step S504 includes steps S5041 and S5042. It should be noted that the same steps as those in the embodiment of fig. 5 are not repeated here, please refer to the foregoing.
S5041, obtaining a comparison result between the virtual interaction track and the real interaction track through a preset discriminator, and taking the comparison result as the reward value for the first preset model and the second preset model;
In S5041, the preset discriminator distinguishes the virtually generated interaction track from the real interaction track, i.e., makes the distance between them larger. The parameter θ of the preset discriminator is updated using the formula of the original adversarial network, i.e., by finding the θ that maximizes the following formula:

E_{T_g}[log(D_θ(s, a))] + E_{T_c}[log(1 - D_θ(s, a))],

where s is the search behavior information and a is the selection behavior information. The θ corresponding to the maximum of this formula is obtained, the model parameters of the preset discriminator are updated according to θ by stochastic gradient descent or a similar method, and the updated preset discriminator then computes the minimum deviation value between the virtual interaction track and the real interaction track, which is used as the reward value.
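Evaluating the discriminator objective on a batch can be sketched as follows; the D_θ(s, a) outputs are invented numbers, and which trajectory set is generated versus collected follows the formula's T_g / T_c labels:

```python
import numpy as np

# Hypothetical discriminator outputs D_theta(s, a) in (0, 1) on a few
# (s, a) pairs from trajectories T_g and from trajectories T_c.
d_on_Tg = np.array([0.20, 0.30, 0.25])
d_on_Tc = np.array([0.15, 0.10, 0.20])

# The quantity maximized over theta, following the formula above:
# E_Tg[log D_theta(s, a)] + E_Tc[log(1 - D_theta(s, a))]
objective = np.log(d_on_Tg).mean() + np.log(1.0 - d_on_Tc).mean()
```

In practice this scalar would be ascended with stochastic gradients with respect to θ, and the trained discriminator's deviation value would serve as the reward.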
S5042, updating model parameters corresponding to the first preset model and the second preset model respectively according to the search behavior information, the selection behavior information and the rewarding value.
In S5042, the virtual user information in the search behavior information is taken as state s1, and the item information in the search behavior information is taken as action a1; the reward value is treated as the reward for taking action a1 in state s1 at a given time step, and the model parameters of the first preset model are updated using the DQN algorithm.
Similarly, the search behavior information is taken as state s2 and the selection behavior information as action a2; the reward value is treated as the reward for taking action a2 in state s2 at a given time step, and the model parameters of the second preset model are updated using the DQN algorithm.
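The DQN update in S5042 reduces, in its simplest tabular form, to the temporal-difference rule sketched below; the state and action names are hypothetical placeholders for (s1, a1) or (s2, a2), not values from the patent:

```python
# Hedged sketch of the DQN-style parameter update in S5042, reduced to the
# tabular Q-learning rule it builds on.
from collections import defaultdict

alpha, gamma = 0.5, 0.9
Q = defaultdict(float)          # Q[(state, action)]

def dqn_update(s, a, r, s_next, actions):
    """One temporal-difference update:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Example transition: in state "user_state" (virtual user info) the model
# chose action "item_A" (an item) and received the discriminator-derived
# reward 1.0; repeated updates converge toward r / (1 - gamma) = 10.
actions = ["item_A", "item_B"]
for _ in range(20):
    dqn_update("user_state", "item_A", 1.0, "user_state", actions)
print(Q[("user_state", "item_A")])
```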
Fig. 6 shows a schematic flowchart of a method for generating an item recommendation model provided in the present application. By way of example and not limitation, the method may be applied to the above terminal device, on which the virtual interaction environment is built.
S601, obtaining a plurality of virtual samples output by the virtual interaction environment;
s602, inputting each virtual sample into an article recommendation model to update parameters of the article recommendation model.
In the above steps S601 and S602, the virtual samples output by the virtual interaction environment are used as training samples for the article recommendation model. The article recommendation model therefore does not need to collect real-time data from the article recommendation system as training samples, which enables offline training of the reinforcement-learning-based article recommendation model and reduces its training cost and training risk.
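Steps S601 and S602 can be sketched as a small offline training loop; the sample format, the toy selection rule, and the one-parameter recommendation model are all illustrative assumptions rather than the patent's implementation:

```python
# Hedged sketch of S601-S602: consuming virtual samples (search/selection
# behavior) from a stand-in virtual environment to update an item
# recommendation model entirely offline.
import random

random.seed(2)

def virtual_environment(n):
    """Stand-in for the virtual interaction environment: yields virtual
    samples (user_feature, item_feature, selected) instead of live data."""
    for _ in range(n):
        u = random.gauss(0.0, 1.0)
        i = random.gauss(0.0, 1.0)
        selected = 1.0 if u * i > 0 else 0.0   # toy selection behavior
        yield u, i, selected

# S601: obtain a batch of virtual samples.
samples = list(virtual_environment(500))

# S602: update the recommendation-model parameters from the virtual samples
# (here, a single interaction weight trained by SGD on squared error).
w, lr = 0.0, 0.01
for _ in range(20):
    for u, i, y in samples:
        pred = w * u * i
        w += lr * (y - pred) * u * i   # gradient step on squared error
print(w)  # learned weight, trained without touching a live system
```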
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 7, the terminal device 7 of this embodiment includes: at least one processor 70 (only one is shown in fig. 7), a memory 71 and a computer program 72 stored in the memory 71 and executable on the at least one processor 70, the processor 70 implementing the steps in the above-described embodiments of the method of generating a virtual sample, the method of generating a virtual interaction environment or the method of generating an article recommendation model when executing the computer program 72.
The terminal device 7 may have functions for implementing the steps in the embodiments of the method for generating a virtual sample, the method for generating a virtual interactive environment, and the method for generating an article recommendation model. It can be understood that the steps in the embodiment of the method for generating the virtual sample, the method for generating the virtual interaction environment, or the method for generating the article recommendation model may be implemented on the same terminal device, or the steps in the embodiment of the method for generating the virtual sample, the method for generating the virtual interaction environment, or the method for generating the article recommendation model may be implemented on different terminal devices.
The terminal device 7 may be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server, etc. The terminal device may include, but is not limited to, a processor 70, a memory 71. It will be appreciated by those skilled in the art that fig. 7 is merely an example of the terminal device 7 and is not limiting of the terminal device 7, and may include more or fewer components than shown, or may combine certain components, or different components, such as may also include input-output devices, network access devices, etc.
The processor 70 may be a central processing unit (Central Processing Unit, CPU), and the processor 70 may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 71 may in some embodiments be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. The memory 71 may in other embodiments also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used for storing an operating system, application programs, boot loader (BootLoader), data, other programs, etc., such as program codes of the computer program. The memory 71 may also be used for temporarily storing data that has been output or is to be output.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the various method embodiments described above.
Embodiments of the present application also provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform the steps of the various method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application implements all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program can implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, the computer readable medium may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not detailed or described in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method of generating a virtual sample based on a virtual interactive environment, the virtual interactive environment comprising a first machine learning model and a second machine learning model, the method comprising:
obtaining virtual user information, inputting the virtual user information into a first machine learning model, and obtaining search behavior information of a virtual user, wherein the search behavior information comprises the virtual user information and article information;
inputting the search behavior information into a second machine learning model to obtain selection behavior information of the virtual user, wherein the second machine learning model is used for predicting the probability of each selection behavior of the virtual user on the article according to the search behavior information;
and combining the search behavior information and the selection behavior information to generate a virtual interaction track corresponding to the virtual user information, and taking the virtual interaction track as a virtual sample.
2. The method for generating a virtual sample according to claim 1, wherein the acquiring virtual user information includes:
acquiring preset Gaussian noise, and acquiring a plurality of target vectors with preset dimensions from the Gaussian noise;
and inputting the target vector into a user generation model to simulate user information, and obtaining the virtual user information corresponding to the target vector.
3. The method for generating a virtual sample according to claim 1 or 2, wherein inputting the virtual user information into a first machine learning model to obtain search behavior information of a virtual user comprises:
extracting first characteristic information of the virtual user information through the first machine learning model, and acquiring a corresponding article recommendation list according to the first characteristic information, wherein the article recommendation list contains article information;
and combining the item information on the item recommendation list with the virtual user information to generate the search behavior information of the virtual user.
4. The method for generating a virtual sample according to claim 1 or 2, wherein the inputting the search behavior information into a second machine learning model to obtain the selection behavior information of the virtual user includes:
extracting second characteristic information of the search behavior information through the second machine learning model, and classifying the second characteristic information through a preset classifier to obtain probability values of each selection made by the virtual user on the articles;
and determining the selection behavior information of the virtual user according to a plurality of probability values.
5. A method of generating a virtual interactive environment, the virtual interactive environment comprising a first machine learning model and a second machine learning model, the method comprising:
obtaining virtual user information, inputting the virtual user information into a first preset model, and obtaining search behavior information of a virtual user, wherein the search behavior information comprises the virtual user information and article information;
inputting the search behavior information into a second preset model to obtain the selection behavior information of the virtual user;
combining the search behavior information and the selection behavior information to generate a virtual interaction track corresponding to the virtual user information;
updating model parameters corresponding to the first preset model and the second preset model respectively according to a comparison result between the virtual interaction track and a preset real interaction track;
and taking the first preset model with updated model parameters as the first machine learning model, and taking the second preset model with updated model parameters as the second machine learning model, wherein the second machine learning model is used for predicting the probability of each selection action made by a virtual user on an article according to the search action information.
6. The method for generating a virtual interactive environment according to claim 5, wherein before the obtaining the virtual user information, further comprises:
and generating a user generation model by adopting a generative adversarial network algorithm, wherein the user generation model is used for generating the virtual user information.
7. The method for generating a virtual interactive environment according to claim 5, wherein updating model parameters corresponding to the first preset model and the second preset model respectively according to a comparison result between the virtual interactive track and a preset real interactive track comprises:
identifying a comparison result between the virtual interaction track and the real interaction track through a preset discriminator, and taking the comparison result as a reward value of the first preset model and the second preset model;
and updating model parameters respectively corresponding to the first preset model and the second preset model according to the search behavior information, the selection behavior information and the rewarding value.
8. A method of generating an item recommendation model based on the virtual interaction environment of any of claims 1-4, the method comprising:
obtaining a plurality of virtual samples output by the virtual interaction environment;
and inputting each virtual sample into an article recommendation model to update parameters of the article recommendation model.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 4, or any one of claims 5 to 7, or 8 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method of any one of claims 1 to 4, or any one of claims 5 to 7, or 8.
CN202010032925.XA 2020-01-13 2020-01-13 Virtual sample generation method, terminal equipment and storage medium Active CN111275205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010032925.XA CN111275205B (en) 2020-01-13 2020-01-13 Virtual sample generation method, terminal equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111275205A CN111275205A (en) 2020-06-12
CN111275205B true CN111275205B (en) 2024-03-22

Family

ID=71002997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010032925.XA Active CN111275205B (en) 2020-01-13 2020-01-13 Virtual sample generation method, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111275205B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016439B (en) * 2020-08-26 2021-06-29 上海松鼠课堂人工智能科技有限公司 Game learning environment creation method and system based on antagonistic neural network
CN112337097A (en) * 2020-10-27 2021-02-09 网易(杭州)网络有限公司 Game simulation method and device
CN112720504B (en) * 2021-01-20 2023-03-28 清华大学 Method and device for controlling learning of hand and object interactive motion from RGBD video
CN114404977B (en) * 2022-01-25 2024-04-16 腾讯科技(深圳)有限公司 Training method of behavior model and training method of structure capacity expansion model
CN114493781A (en) * 2022-01-25 2022-05-13 工银科技有限公司 User behavior prediction method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447730A (en) * 2015-12-25 2016-03-30 腾讯科技(深圳)有限公司 Target user orientation method and device
CN108764489A (en) * 2018-06-05 2018-11-06 北京百度网讯科技有限公司 Model training method based on virtual sample and equipment
CN109345302A (en) * 2018-09-27 2019-02-15 腾讯科技(深圳)有限公司 Machine learning model training method, device, storage medium and computer equipment
CN109509056A (en) * 2018-10-16 2019-03-22 平安科技(深圳)有限公司 Method of Commodity Recommendation, electronic device and storage medium based on confrontation network
CN110046952A (en) * 2019-01-30 2019-07-23 阿里巴巴集团控股有限公司 A kind of training method and device, a kind of recommended method and device of recommended models
CN110674408A (en) * 2019-09-30 2020-01-10 北京三快在线科技有限公司 Service platform, and real-time generation method and device of training sample

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11282590B2 (en) * 2017-07-12 2022-03-22 Fresenius Medical Care Holdings, Inc. Techniques for conducting virtual clinical trials


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Guillaume Coqueret, "Approximate NORTA simulations for virtual sample generation", Expert Systems With Applications, 2016-12-21, pp. 69-81 *

Also Published As

Publication number Publication date
CN111275205A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
CN111275205B (en) Virtual sample generation method, terminal equipment and storage medium
CN107613022B (en) Content pushing method and device and computer equipment
US20220198289A1 (en) Recommendation model training method, selection probability prediction method, and apparatus
AU2016225947B2 (en) System and method for multimedia document summarization
CN108427708B (en) Data processing method, data processing apparatus, storage medium, and electronic apparatus
US11146580B2 (en) Script and command line exploitation detection
CN112529663B (en) Commodity recommendation method, commodity recommendation device, terminal equipment and storage medium
CN108269122B (en) Advertisement similarity processing method and device
US20210390152A1 (en) Method, system, and non-transitory computer-readable record medium for providing multiple models of federated learning using personalization
CN111144952A (en) Advertisement recommendation method, device, server and storage medium based on user interests
WO2023000491A1 (en) Application recommendation method, apparatus and device, and computer-readable storage medium
CN110516033A (en) A kind of method and apparatus calculating user preference
CN111160624A (en) User intention prediction method, user intention prediction device and terminal equipment
CN114223012A (en) Push object determination method and device, terminal equipment and storage medium
CN117194772B (en) Content pushing method and device based on user tag
CN112328881B (en) Article recommendation method, device, terminal equipment and storage medium
CN118043802A (en) Recommendation model training method and device
CN112364198A (en) Cross-modal Hash retrieval method, terminal device and storage medium
CN116503608A (en) Data distillation method based on artificial intelligence and related equipment
CN113553501A (en) Method and device for user portrait prediction based on artificial intelligence
CN111159558A (en) Recommendation list generation method and device and electronic equipment
CN112257812A (en) Method and device for determining labeled sample, machine readable medium and equipment
CN111507471A (en) Model training method, device, equipment and storage medium
CN111461118A (en) Interest feature determination method, device, equipment and storage medium
CN111126033A (en) Response prediction device and method for article

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant