CN116186326A - Video recommendation method, model training method, electronic device and storage medium - Google Patents


Info

Publication number
CN116186326A
Authority
CN
China
Prior art keywords
interest
vectors
vector
feature vector
video
Prior art date
Legal status
Pending
Application number
CN202211727100.5A
Other languages
Chinese (zh)
Inventor
张燕飞
曲春燕
储蓉蓉
崔静宇
Current Assignee
Weimeng Chuangke Network Technology China Co Ltd
Original Assignee
Weimeng Chuangke Network Technology China Co Ltd
Priority date
Filing date
Publication date
Application filed by Weimeng Chuangke Network Technology China Co Ltd filed Critical Weimeng Chuangke Network Technology China Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/735 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application discloses a video recommendation method, a model training method, an electronic device and a storage medium. The method includes: in response to a video recommendation request, acquiring a basic feature vector of a user to be recommended and a plurality of sequence feature vectors corresponding to video materials, where the basic feature vector represents basic information of the user to be recommended and each sequence feature vector represents the user's behavior toward a video material; inputting the basic feature vector and the plurality of sequence feature vectors into a trained multi-interest generation model for vector transformation processing to obtain at least two target interest vectors, each representing an interest of the user to be recommended; and recommending video materials to the user to be recommended according to the at least two target interest vectors and the tag information of a plurality of video materials.

Description

Video recommendation method, model training method, electronic device and storage medium
Technical Field
The application belongs to the field of computer technology, and in particular relates to a video recommendation method, a model training method, an electronic device and a storage medium.
Background
With the development of computer network technology in recent years, personalized recommendation technology has begun to iterate rapidly. Current recommendation systems mainly comprise a recall stage and a ranking stage. Model-based recall in the recall stage typically learns user interests automatically through machine learning algorithms and recalls the materials a user is interested in based on the learned interests. At present, this is mainly done with a two-tower model: its two towers generate a user feature vector and a material feature vector respectively, and materials the user may be interested in are recalled according to the user feature vector.
However, a single user feature vector can hardly represent a user's multiple interests, so it is difficult to comprehensively recommend video materials of interest to the user.
Disclosure of Invention
The embodiments of the present application provide a video recommendation method, a model training method, an electronic device and a storage medium, which can solve the problem that it is difficult to comprehensively recommend video materials of interest to a user based on a single user feature vector.
In a first aspect, an embodiment of the present application provides a video recommendation method, including: responding to a video recommendation request, and acquiring a basic feature vector of a user to be recommended and a plurality of sequence feature vectors corresponding to video materials; the basic feature vector is used for representing basic information of the user to be recommended; the sequence feature vector is used for representing behavior information of the user to be recommended on the video material; inputting the basic feature vector and a plurality of sequence feature vectors into a trained multi-interest generation model to perform vector transformation processing to obtain at least two target interest vectors; the target interest vector is used for representing the interests of the user to be recommended; and recommending the video materials to the user to be recommended according to at least two target interest vectors and the label information of a plurality of video materials.
In a second aspect, an embodiment of the present application provides a training method for a multi-interest generation model, the method including: acquiring a sample basic feature vector and a plurality of sample sequence feature vectors, where the sample video materials have tag information; inputting the sample basic feature vector and the plurality of sample sequence feature vectors into an initial multi-interest generation model to obtain at least two target sample interest vectors; obtaining a loss value corresponding to the initial multi-interest generation model according to the at least two target sample interest vectors; and updating parameters of the initial multi-interest generation model based on the loss value until a convergence condition is met, so as to obtain a trained multi-interest generation model, where the convergence condition includes the loss value being less than a second preset value.
In a third aspect, an embodiment of the present application provides a video recommendation apparatus, including: the first acquisition module is used for responding to the video recommendation request and acquiring basic feature vectors of users to be recommended and a plurality of sequence feature vectors corresponding to video materials; the basic feature vector is used for representing basic information of the user to be recommended; the sequence feature vector is used for representing behavior information of the user to be recommended on the video material; the second acquisition module is used for inputting the basic feature vector and the plurality of sequence feature vectors into a trained multi-interest generation model to perform vector transformation processing to obtain at least two target interest vectors; the target interest vector is used for representing the interests of the user to be recommended; and the recommending module is used for recommending the video materials to the user to be recommended according to at least two target interest vectors and the label information of a plurality of video materials.
In a fourth aspect, embodiments of the present application provide an electronic device comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect or the second aspect.
In a fifth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which, when executed by a processor, implement the steps of the method according to the first or second aspect.
In a sixth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect or the second aspect.
In the embodiments of the present application, in response to a video recommendation request, a basic feature vector of a user to be recommended and a plurality of sequence feature vectors corresponding to video materials are obtained, where the basic feature vector represents basic information of the user to be recommended and the sequence feature vectors represent the user's behavior toward the video materials. The basic feature vector and the plurality of sequence feature vectors are input into a trained multi-interest generation model for vector transformation processing to obtain at least two target interest vectors, and video materials are recommended to the user to be recommended according to the at least two target interest vectors and the tag information of a plurality of video materials. Because the trained multi-interest generation model produces several target interest vectors, each expressing an interest of the user to be recommended, video materials can be recommended according to multiple interests at once, which solves the problem that it is difficult to comprehensively recommend video materials of interest based on a single user feature vector.
Drawings
Fig. 1 is a schematic flow chart of a video recommendation method according to an embodiment of the present application;
FIG. 2 is a flowchart of a training method of a multiple interest generation model according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a multiple interest generation model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a video recommendation device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of a type and not limited to the number of objects, e.g., the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The video recommendation method, the model training method, the electronic device and the storage medium provided by the embodiment of the application are described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
Fig. 1 is a flow chart illustrating a video recommendation method according to an embodiment of the present application, where the method may be performed by an electronic device, and the electronic device may include: a server and/or a terminal device, wherein the terminal device may be, for example, a vehicle terminal or a mobile phone terminal, etc. Referring to fig. 1, the method may include the steps of:
step 101: responding to a video recommendation request, and acquiring a basic feature vector of a user to be recommended and a plurality of sequence feature vectors corresponding to video materials; the basic feature vector is used for representing basic information of the user to be recommended, and the sequence feature vector is used for representing behavior information of the user to be recommended on the video material.
The basic feature vector of the user to be recommended may include gender, age and environmental features. The sequence feature vectors may include the user's recent interaction behavior, the unique identifier (mid) of each watched video material, its author identifier (aid), and its primary label (one_level_tag), secondary label (two_level_tag), tertiary label (three_level_tag) and keyword label (keyword_tag), where each of the label fields may hold multiple labels. For example, one sequence feature vector may be expressed as:
mid@aid@one_level_tag@two_level_tag|two_level_tag@three_level_tag@keyword_tag|keyword_tag|keyword_tag, which contains one primary label, two secondary labels, one tertiary label and three keyword labels.
Optionally, in the video recommendation scenario, the length of the sequence feature vector may be set to 50 based on the existing features; the length can also be extended to cover more sequence features.
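As an illustration, a parser for the sequence-feature string format above might look like the following sketch. The function name and example values are hypothetical; only the '@'/'|' layout comes from the text.

```python
def parse_sequence_feature(s: str) -> dict:
    """Split 'mid@aid@l1@l2a|l2b@l3@kw1|kw2|kw3' into named fields.

    '@' separates the six fields; '|' separates multiple labels
    within one label field.
    """
    mid, aid, l1, l2, l3, kw = s.split("@")
    return {
        "mid": mid,                        # unique id of the video material
        "aid": aid,                        # author id
        "one_level_tags": l1.split("|"),
        "two_level_tags": l2.split("|"),
        "three_level_tags": l3.split("|"),
        "keyword_tags": kw.split("|"),
    }

# Hypothetical example string with two secondary labels and three keywords.
feat = parse_sequence_feature(
    "v001@a042@sports@soccer|worldcup@match_highlights@goal|messi|final"
)
```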
Step 102: inputting the basic feature vector and a plurality of sequence feature vectors into a trained multi-interest generation model to perform vector transformation processing to obtain at least two target interest vectors; the target interest vectors are used for representing interests of users to be recommended, and different target interest vectors represent different interests of the users to be recommended respectively.
Step 103: and recommending the video materials to the user to be recommended according to at least two target interest vectors and the label information of a plurality of video materials.
Specifically, assuming there are k target interest vectors, the Top-N (i.e., the first N) video materials recalled by each of the k target interest vectors can be taken separately and output, as the final recall result, to the subsequent ranking flow. Alternatively, the results recalled by the k target interest vectors can be ranked together as a whole, and the overall Top-N video materials output to the subsequent ranking flow.
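The two recall-output strategies above can be sketched as follows. The (material id, score) pairs are stand-ins for the matching values between an interest vector and a material; the patent does not prescribe a data layout.

```python
def per_interest_topn(recalls_per_interest, n):
    """Strategy 1: take the Top-N materials recalled by each interest vector."""
    return [sorted(r, key=lambda m: m[1], reverse=True)[:n]
            for r in recalls_per_interest]

def global_topn(recalls_per_interest, n):
    """Strategy 2: rank all recalled materials together, keep the overall Top-N."""
    merged = [m for r in recalls_per_interest for m in r]
    return sorted(merged, key=lambda m: m[1], reverse=True)[:n]

recalls = [
    [("v1", 0.9), ("v2", 0.4)],   # recalled by interest vector 1
    [("v3", 0.8), ("v4", 0.7)],   # recalled by interest vector 2
]
```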
In the embodiment of the application, the basic feature vector used for representing the basic information of the user to be recommended and a plurality of sequence feature vectors corresponding to the video materials are obtained by responding to the video recommendation request, and the sequence feature vectors are used for representing the behavior information of the user to be recommended on the video materials; the basic feature vector and the sequence feature vector are input into a trained multi-interest generation model to obtain at least two target interest vectors capable of expressing interests of users to be recommended, so that the recommendation of video materials to the users to be recommended can be performed according to the at least two target interest vectors and the label information of the video materials, and the problem that the video materials of interest are difficult to comprehensively recommend to the users according to a single user feature vector can be solved.
In one implementation, the multiple interest generation model may include: a capsule network layer and at least two fully connected layers (Fully Connected layers, FC); the step 102 may include the following steps:
step 201: inputting the basic feature vector and a plurality of sequence feature vectors into the capsule network layer, carrying out weighting, fusion and nonlinear transformation on each sequence feature vector, and respectively carrying out connection processing on at least two obtained first interest vectors and the basic feature vector to obtain a second interest vector corresponding to each first interest vector;
Step 202: and carrying out convolution processing on each second interest vector through the at least two FC layers to obtain the corresponding target interest vector.
In the embodiment of the application, vector transformation processing is performed on a plurality of sequence feature vectors through a capsule network layer of a multi-interest generation model, then the sequence feature vectors subjected to vector transformation are connected with basic feature vectors, convolution processing is performed on each obtained second interest vector through at least two FC layers, and at least two target interest vectors are obtained, so that whether to recommend the video material to a user to be recommended can be judged according to the target interest vectors and label information of the video material.
In one implementation, the capsule network layer may include a first processing unit, a second processing unit, and a third processing unit; the step 201 may include the following steps:
step 2011: carrying out fixed weight weighting on each sequence feature vector through a first processing unit of the capsule network layer to obtain at least two first weighting vectors corresponding to each sequence feature vector; the number of the first weighting vectors corresponding to each sequence feature vector is equal;
The first processing unit of the capsule network layer performs fixed weight weighting on each sequence feature vector through multiplication of the sequence feature vector and a corresponding weight matrix.
Step 2012: dynamically weighting each first weighting vector through a second processing unit of the capsule network layer to obtain a second weighting vector corresponding to each first weighting vector, wherein the dynamic weights are obtained through dynamic routing; each sequence feature vector corresponds to at least two second weighting vectors, and the number of the second weighting vectors corresponding to each sequence feature vector is equal;
That is, the dynamic weights obtained through dynamic routing are applied to the first weighting vectors to obtain the second weighting vector corresponding to each first weighting vector. When fewer sequence feature vectors are input than the fixed expected number, a minimal weight is assigned to each position that has no sequence feature vector.
Optionally, the initial dynamic weights in the dynamic routing process can be chosen with a larger variance, to avoid the multi-interest generation model generating multiple similar target interest vectors.
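A minimal numerical sketch of steps 2011-2012 is shown below, assuming K interest capsules and d-dimensional sequence feature vectors. The shapes, the number of routing iterations, and the softmax-based routing update are illustrative assumptions in the style of capsule networks, not details taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 5, 8, 3                      # 5 behavior vectors, 3 interest capsules

seq = rng.normal(size=(n, d))          # sequence feature vectors
W = rng.normal(size=(K, d, d))         # fixed (learned) weight matrices

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

# Step 2011: fixed-weight weighting -> K first weighting vectors per input.
first = np.einsum("kij,nj->nki", W, seq)           # shape (n, K, d)

# Step 2012: dynamic weighting. Routing logits start with a large variance
# to discourage the K capsules from producing similar interest vectors.
b = rng.normal(scale=5.0, size=(n, K))             # high-variance init
for _ in range(3):                                 # routing iterations
    c = softmax(b, axis=1)                         # dynamic weights per input
    second = c[:, :, None] * first                 # second weighting vectors
    fused = second.sum(axis=0)                     # step 2013: fuse over inputs
    b = b + np.einsum("nkd,kd->nk", first, fused)  # agreement update
```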
Step 2013: for at least two second weighting vectors corresponding to the sequence feature vectors, respectively carrying out fusion processing on the same number of second weighting vectors corresponding to the sequence feature vectors one by one to obtain at least two fusion vectors;
step 2014: carrying out nonlinear transformation on each fusion vector through a third processing unit of the capsule network layer to obtain a corresponding first interest vector;
The third processing unit of the capsule network layer applies a nonlinear transformation to each fusion vector, compressing its length to less than 1 and reducing its dimensionality.
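The "compress the length to less than 1" transformation described above matches the capsule-network squash function; the sketch below assumes that form. The dimensionality reduction would come from the learned weight matrices, which are omitted here.

```python
import numpy as np

def squash(v: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Keep the direction of v but squash its norm into [0, 1)."""
    sq = np.sum(v * v)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

out = squash(np.array([3.0, 4.0]))   # input has norm 5.0
```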
Step 2015: and carrying out connection concat processing on each first interest vector and the basic feature vector to obtain a second interest vector.
By connecting the basic feature vector with the first interest vectors produced by the preceding layers, each resulting second interest vector contains both the basic feature information of the user to be recommended and the user's behavior information toward the video materials, so it can express the user's interests more accurately.
In one implementation, the multi-interest generation model further includes: a pooling (pooling) layer; the method may further include, before the step 201: and inputting a plurality of sequence feature vectors into a pooling layer to pool each sequence feature vector, so as to obtain the processed sequence feature vector.
The pooling of the sequence feature vectors can be implemented with an average pooling (average_pooling) method. Preprocessing the sequence feature vectors before the vector transformation, for example removing outliers and normalizing, improves the efficiency of subsequent processing.
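An average-pooling step with the preprocessing mentioned above might look like the following sketch. The clipping threshold used for outlier removal and the final normalization are illustrative choices, not values from the patent.

```python
import numpy as np

def average_pool(embs: np.ndarray, clip: float = 3.0) -> np.ndarray:
    """Clip outliers, then mean-pool the embeddings along the sequence axis."""
    embs = np.clip(embs, -clip, clip)              # simple outlier removal
    pooled = embs.mean(axis=0)                     # average pooling
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled   # normalization step

# Three 2-d behavior embeddings; the 5.0 entry is clipped to 3.0 first.
pooled = average_pool(np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 0.0]]))
```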
In one implementation, the step 103 may include the following steps:
step 301: acquiring video materials with matching values larger than a first preset value between the video materials and the target interest vectors according to at least two target interest vectors and label information of a plurality of video materials;
step 302: sorting the video materials according to the sequence from the large matching value to the small matching value;
step 303: and recommending the video materials with the preset quantity which are ranked in front to the user to be recommended.
In the embodiment of the present application, after at least two target interest vectors are obtained through the multi-interest generation model, video materials are searched according to the target interest vectors: by matching the target interest vectors against the tag information of the video materials, the video materials whose matching value with a target interest vector exceeds the first preset value are obtained. The obtained video materials are then sorted in descending order of their matching values, and the top preset number of materials are recommended to the user to be recommended. In this way, video materials matching several of the user's interests are recommended, which solves the problem that it is difficult to comprehensively recommend video materials of interest based on a single user feature vector.
Optionally, the video materials can be searched according to each target interest vector separately, with the preset number of highest-matching materials for each vector recommended to the user to be recommended; or the video materials can be searched according to the at least two target interest vectors as a whole, with the overall preset number of highest-matching materials recommended to the user to be recommended.
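Steps 301-303 can be sketched as follows. Cosine similarity stands in for the (unspecified) matching value between a target interest vector and a material's tag embedding; the threshold, material names and embeddings are hypothetical.

```python
import numpy as np

def recommend(interests, materials, threshold=0.5, top_n=2):
    """Filter materials above the first preset value, sort, keep top_n ids."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    scored = []
    for mat_id, emb in materials.items():
        score = max(cos(iv, emb) for iv in interests)  # best interest match
        if score > threshold:                          # step 301: filter
            scored.append((mat_id, score))
    scored.sort(key=lambda x: x[1], reverse=True)      # step 302: sort desc
    return [mat_id for mat_id, _ in scored[:top_n]]    # step 303: Top-N

interests = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # two user interests
materials = {
    "v_cat":  np.array([0.9, 0.1]),    # matches interest 1
    "v_news": np.array([0.1, 0.9]),    # matches interest 2
    "v_misc": np.array([-1.0, -1.0]),  # matches neither
}
recs = recommend(interests, materials)
```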
FIG. 2 is a flow chart of a training method for multiple interest generation models according to an embodiment of the present application. Referring to fig. 2, the method may include the steps of:
step 210: acquiring a sample basic feature vector and a plurality of sample sequence feature vectors;
step 220: inputting the sample basic feature vector and a plurality of sample sequence feature vectors into an initial multiple-interest generation model to obtain at least two target sample interest vectors;
step 230: obtaining a loss value corresponding to the initial multiple-interest generation model according to at least two target sample interest vectors;
step 240: updating parameters of the initial multi-interest generation model based on the loss value until convergence conditions are met, so as to obtain a trained multi-interest generation model; wherein the convergence condition includes the loss value being less than a second preset value.
In the embodiment of the application, at least two target sample interest vectors are obtained by inputting a sample basic feature vector and a plurality of sample sequence feature vectors into an initial multi-interest generation model; according to the target sample interest vector, acquiring a loss value corresponding to the initial multi-interest generation model, updating parameters of the initial multi-interest generation model based on the loss value, and performing iterative training on the initial multi-interest generation model until the loss value is smaller than a second preset value, so as to meet a convergence condition, thereby obtaining a trained multi-interest generation model, wherein the trained multi-interest generation model can generate interest vectors more conforming to user interests.
In one implementation, the initial multiple interest generation model includes: the system comprises a capsule network layer and at least two FC layers, wherein the capsule network layer comprises a first processing unit, a second processing unit and a third processing unit; the step 240 may include: updating the fixed weight of the first processing unit of the capsule network layer when the loss value is not smaller than the second preset value; updating parameters of the at least two FC layers.
The training of the initial multi-interest model specifically updates the fixed weights in the first processing unit of the capsule network layer and the parameters of the FC layers. When the loss value is less than the second preset value, the fixed weights in the first processing unit of the capsule network layer and the parameters of the FC layers are fixed, the training of the initial multi-interest model is complete, and the trained multi-interest generation model is obtained.
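The convergence rule in step 240 (keep updating parameters until the loss value falls below the second preset value) can be illustrated with a self-contained toy loop. The quadratic loss and gradient-descent update are stand-ins for the real model's Sampled Softmax loss and optimizer; only the stopping rule mirrors the patent.

```python
def train_until_converged(w, target, lr=0.1, second_preset_value=1e-4,
                          max_steps=10_000):
    """Gradient descent on loss = (w - target)^2 with the patent's stop rule."""
    for _ in range(max_steps):
        loss = (w - target) ** 2
        if loss < second_preset_value:     # convergence condition met
            return w, loss
        w -= lr * 2.0 * (w - target)       # parameter update from the loss
    return w, loss

w, loss = train_until_converged(w=0.0, target=2.0)
```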
In one implementation, the initial multi-interest generation model further includes: a pooling layer;
before said inputting said sample base feature vector and a plurality of said sample sequence feature vectors into said capsule network layer, further comprising:
and inputting a plurality of sample sequence feature vectors into a pooling layer to pool each sample sequence feature vector, so as to obtain the processed sample sequence feature vector.
In one implementation, the initial multiple interest generation model further includes: label-aware project layer and Sampled Softmax Loss layer; the step 230 may include the following steps:
step 231: inputting at least two target sample interest vectors into the Label-aware Attention layer for processing to obtain a third interest vector;
step 232: inputting the third interest vector and a plurality of sample video material vectors into the Sampled Softmax Loss layer, and calculating a loss value corresponding to the initial multiple-interest generation model;
the plurality of sample video material vectors are obtained according to label information of the plurality of sample video materials.
Optionally, pooling processing can be performed through the pooling layer according to the tag information of the plurality of sample video materials to obtain a plurality of sample video material vectors.
Alternatively, the tag information of the video materials may be selected by negative sampling within a batch (Batch) and filtering out hot (popular) materials.
Optionally, before inputting the third interest vector and the video material vectors into the Sampled Softmax Loss layer, normalization may be performed on them so that their component values lie between -1 and 1, for calculating the loss value.
In the embodiment of the present application, at least two target interest vectors are input into the Label-aware Attention layer for processing to obtain a third interest vector used for calculating the loss; the third interest vector and a plurality of sample video material vectors are then input into the Sampled Softmax Loss layer to calculate the loss value corresponding to the initial multi-interest model, and whether the parameters of the initial multi-interest model need to be updated is judged according to this loss value.
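The loss computation in steps 231-232 can be sketched as below, assuming the Label-aware layer is MIND-style label-aware attention (the sample material vector attends over the K interest vectors to yield the third interest vector) and the loss is a softmax over the L2-normalized positive and sampled-negative material vectors. The shapes, the attention power p, and the in-fence example data are all illustrative assumptions.

```python
import numpy as np

def l2norm(v):
    """L2 normalization so component values lie between -1 and 1."""
    return v / np.linalg.norm(v)

def label_aware_loss(interests, pos, negs, p=2.0):
    """interests: (K, d) target sample interest vectors; pos/negs: materials."""
    pos = l2norm(pos)
    att = np.exp(p * (interests @ pos))        # label-aware attention weights
    att = att / att.sum()
    third = l2norm(att @ interests)            # step 231: third interest vector
    logits = np.array([third @ pos] + [third @ l2norm(n) for n in negs])
    logits = logits - logits.max()             # stable softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                   # step 232: -log P(positive)

rng = np.random.default_rng(1)
interests = rng.normal(size=(3, 4))                  # K=3 interests, d=4
pos = interests[0] + 0.1 * rng.normal(size=4)        # positive near interest 0
loss = label_aware_loss(interests, pos,
                        negs=[rng.normal(size=4) for _ in range(5)])
```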
FIG. 3 is a schematic structural diagram of a multi-interest generation model according to an embodiment of the present application. The basic feature vector of the user to be recommended and a plurality of sequence feature vectors corresponding to video materials are input into the multi-interest generation model. The sequence feature vectors are first input into the pooling layer 31 for preprocessing; the preprocessed sequence feature vectors then pass through the first processing unit 32, the second processing unit 33 and the third processing unit 34 of the capsule network layer and are connected with the basic feature vector to obtain at least two second interest vectors; convolution processing is performed on the second interest vectors through at least two FC layers 35 and 36 to obtain target interest vectors that comprehensively express the interests of the user to be recommended. Video materials are then recommended to the user according to the at least two target interest vectors and the tag information of the video materials, solving the problem that it is difficult to comprehensively recommend video materials of interest based on a single user feature vector.
In one embodiment, applying the video recommendation method described in the present application yielded, among others, the following gains in video recommendation metrics: recommendation return on investment (Return on Investment, ROI) +2.55%; effective video play ROI (roi_3s_vv) +2.49%; per-user effective video play volume (Video Views, VV) in the recommendation slot +2.67%; per-user average effective duration in the recommendation slot +2.23%; effective video play duration ROI (roi_3s duration) +2.00%; effective play duration in the recommendation slot +2.09%; effective play volume in the recommendation slot +2.53%.
It should be noted that the execution subject of the video recommendation method provided in the embodiment of the present application may be a video recommendation device, or a control module in the video recommendation device for executing the video recommendation method. In the embodiment of the present application, the video recommendation device provided herein is described by taking a video recommendation device executing the video recommendation method as an example.
Fig. 4 is a schematic structural diagram of a video recommendation device according to an embodiment of the present application. As shown in fig. 4, the video recommendation device 40 includes: a first acquisition module 41, a second acquisition module 42 and a recommendation module 43.
A first obtaining module 41, configured to obtain, in response to a video recommendation request, a basic feature vector of a user to be recommended and a plurality of sequence feature vectors corresponding to video materials; the basic feature vector is used for representing basic information of the user to be recommended; the sequence feature vector is used for representing behavior information of the user to be recommended on the video material; the second obtaining module 42 is configured to input the basic feature vector and the plurality of sequence feature vectors to a trained multiple interest generating model for vector transformation processing, so as to obtain at least two target interest vectors; the target interest vector is used for representing the interests of the user to be recommended; and the recommending module 43 is configured to recommend video materials to the user to be recommended according to at least two target interest vectors and tag information of a plurality of video materials.
In one implementation, the multiple interest generation model includes: a capsule network layer and at least two FC layers; the second obtaining module 42 may be configured to input the basic feature vector and a plurality of the sequence feature vectors into the capsule network layer, perform weighting, fusion and nonlinear transformation on each of the sequence feature vectors, and perform connection processing on at least two obtained first interest vectors and the basic feature vector respectively to obtain a second interest vector corresponding to each of the first interest vectors; and carrying out convolution processing on each second interest vector through the at least two FC layers to obtain the corresponding target interest vector.
In one implementation, the capsule network layer includes a first processing unit, a second processing unit, and a third processing unit; the second obtaining module 42 may be configured to perform fixed weight weighting on each of the sequence feature vectors by using the first processing unit of the capsule network layer, so as to obtain at least two first weighted vectors corresponding to each of the sequence feature vectors; the number of the first weighting vectors corresponding to each sequence feature vector is equal; dynamically weighting each first weighting vector through a second processing unit of the capsule network layer to obtain a second weighting vector corresponding to each first weighting vector, wherein the dynamic weights are obtained through dynamic routing; each sequence feature vector corresponds to at least two second weighting vectors, and the number of the second weighting vectors corresponding to each sequence feature vector is equal; for at least two second weighting vectors corresponding to the sequence feature vectors, respectively carrying out fusion processing on the same number of second weighting vectors corresponding to the sequence feature vectors one by one to obtain at least two fusion vectors; carrying out nonlinear transformation on each fusion vector through a third processing unit of the capsule network layer to obtain a corresponding first interest vector; and carrying out connection concat processing on each first interest vector and the basic feature vector to obtain the corresponding second interest vector.
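The three processing units described above (fixed weighting, dynamic weighting via dynamic routing, fusion and nonlinear transformation) can be sketched together. This is a simplified, hypothetical rendition: the fixed-weight map `S`, the routing-iteration count, and the `squash` nonlinearity (a standard capsule-network choice) are all assumptions, not details given by the patent.

```python
import numpy as np

def squash(v):
    # Capsule-style nonlinear transform: preserves direction, bounds length to (0, 1).
    n2 = np.sum(v * v)
    return (n2 / (1.0 + n2)) * v / (np.sqrt(n2) + 1e-9)

def capsule_layer(base_vec, seq_vecs, num_interests=2, iters=3, seed=0):
    rng = np.random.default_rng(seed)
    n, d = seq_vecs.shape
    S = rng.normal(size=(d, d))                  # first unit: fixed-weight map
    u = seq_vecs @ S                             # first weighted vectors
    b = np.zeros((num_interests, n))             # routing logits
    for _ in range(iters):                       # second unit: dynamic routing
        c = np.exp(b) / np.exp(b).sum(axis=0, keepdims=True)  # dynamic weights
        fused = c @ u                            # fusion of the weighted vectors
        first_interest = np.stack([squash(f) for f in fused])  # third unit
        b = b + first_interest @ u.T             # agreement update
    # concat each first interest vector with the basic feature vector
    return np.concatenate(
        [first_interest, np.repeat(base_vec[None, :], num_interests, axis=0)], axis=1)

out = capsule_layer(np.zeros(4), np.ones((3, 4)))
```

Each row of `out` is one second interest vector: a squashed (norm < 1) interest capsule concatenated with the basic feature vector.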
In one implementation, the multiple interest generation model further includes a pooling layer; the second obtaining module 42 may be further configured to input a plurality of the sequence feature vectors into a pooling layer to pool each of the sequence feature vectors, so as to obtain a processed sequence feature vector.
In one implementation, the recommendation module 43 may be configured to obtain, according to at least two target interest vectors and tag information of a plurality of video materials, video materials whose matching values with the target interest vectors are greater than a first preset value; sort the video materials in descending order of matching value; and recommend a preset number of the top-ranked video materials to the user to be recommended.
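The threshold-and-sort recommendation step can be sketched as below. The choice of matching value (here, the maximum dot product between a material vector and any target interest vector) and the helper name are assumptions for illustration.

```python
import numpy as np

def recommend(target_interests, material_vecs, material_ids,
              first_preset=0.5, top_n=3):
    """Hypothetical ranking sketch: keep materials whose matching value exceeds
    the first preset value, then return the top-ranked ones."""
    scores = material_vecs @ target_interests.T   # (num_materials, num_interests)
    match = scores.max(axis=1)                    # best-matching interest per material
    keep = match > first_preset                   # first-preset-value filter
    order = np.argsort(-match[keep])              # descending by matching value
    return [material_ids[i] for i in np.flatnonzero(keep)[order][:top_n]]

interests = np.array([[1.0, 0.0], [0.0, 1.0]])
mats = np.array([[0.9, 0.1], [0.2, 0.2], [0.1, 0.8]])
result = recommend(interests, mats, ['a', 'b', 'c'])  # → ['a', 'c']
```

Material `b` is filtered out by the threshold; `a` and `c` are returned best-first. Because there are multiple interest vectors, materials matching any one of the user's interests can surface.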
In one implementation, the apparatus 40 may further include a training module 44 that may be configured to update the fixed weight of the first processing unit of the capsule network layer when the loss value is not less than the second preset value; updating parameters of the at least two FC layers.
In one implementation, the apparatus 40 may further include a calculation module 45, configured to input at least two of the target sample interest vectors into the Label-aware attention layer for processing to obtain a third interest vector; and input the third interest vector and a plurality of sample video material vectors into the Sampled Softmax Loss layer to calculate a loss value corresponding to the initial multi-interest generation model; the plurality of sample video material vectors are obtained according to label information of the plurality of sample video materials.
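The loss computation above can be sketched as follows. This is a hypothetical simplification: label-aware attention is assumed to pool the target interest vectors by their (power-scaled) similarity to the positive sample's vector, and a full softmax over the given sample material vectors stands in for the Sampled Softmax Loss layer (no negative sampling, for brevity).

```python
import numpy as np

def label_aware_loss(target_interests, label_vec, sample_vecs, pos_idx=0, p=2.0):
    """Hypothetical sketch: attention weights from similarity to the label vector
    produce the third interest vector; a softmax over sample video material
    vectors then yields a negative log-likelihood loss."""
    att = np.power(target_interests @ label_vec, p)     # label-aware attention scores
    att = np.exp(att) / np.exp(att).sum()               # normalized weights
    third_interest = att @ target_interests             # third interest vector
    logits = sample_vecs @ third_interest
    log_probs = logits - np.log(np.exp(logits).sum())   # softmax over samples
    return -log_probs[pos_idx]                          # loss for the positive sample

loss = label_aware_loss(np.eye(2), np.array([1.0, 0.0]),
                        np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]))
```

The attention exponent `p` controls how sharply the pooling focuses on the interest vector closest to the label; a production system would replace the full softmax with sampled softmax over a large material corpus.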
The video recommending apparatus in the embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a cell phone, tablet computer, notebook computer, palm computer, vehicle-mounted electronic device, wearable device, ultra-mobile personal computer (ultra-mobile personal computer, UMPC), netbook or personal digital assistant (personal digital assistant, PDA), etc., and the non-mobile electronic device may be a server, network attached storage (Network Attached Storage, NAS), personal computer (personal computer, PC), television (TV), teller machine or self-service machine, etc., and the embodiments of the present application are not limited in particular.
The video recommendation device in the embodiment of the application may be a device with an operating system. The operating system may be an Android operating system, an ios operating system, or other possible operating systems, which are not specifically limited in the embodiments of the present application.
The video recommending apparatus provided in the embodiment of the present application can implement each process implemented in the method embodiments of fig. 1 to 2, and in order to avoid repetition, a description is omitted here.
Based on the same technical concept, an embodiment of the present application further provides an electronic device configured to perform the video recommendation method described above; fig. 5 is a schematic structural diagram of an electronic device implementing the embodiments of the present application. The electronic device may vary considerably in configuration and performance, and may include a processor (processor) 501, a communication interface (Communications Interface) 502, a memory (memory) 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 communicate with one another via the communication bus 504. The processor 501 may call a computer program stored in the memory 503 and executable on the processor 501 to perform the steps of:
responding to a video recommendation request, and acquiring a basic feature vector of a user to be recommended and a plurality of sequence feature vectors corresponding to video materials; the basic feature vector is used for representing basic information of the user to be recommended; the sequence feature vector is used for representing behavior information of the user to be recommended on the video material; inputting the basic feature vector and a plurality of sequence feature vectors into a trained multi-interest generation model to perform vector transformation processing to obtain at least two target interest vectors; the target interest vector is used for representing the interests of the user to be recommended; and recommending the video materials to the user to be recommended according to at least two target interest vectors and the label information of a plurality of video materials.
The specific implementation steps can refer to the steps of the video recommendation method embodiment, and the same technical effects can be achieved, so that repetition is avoided, and details are not repeated here.
Alternatively, the processor may perform the following steps: acquiring a sample basic feature vector and a plurality of sample sequence feature vectors; inputting the sample basic feature vector and the plurality of sample sequence feature vectors into an initial multi-interest generation model to obtain at least two target sample interest vectors; obtaining a loss value corresponding to the initial multi-interest generation model according to the at least two target sample interest vectors; and updating parameters of the initial multi-interest generation model based on the loss value until a convergence condition is met, so as to obtain a trained multi-interest generation model; wherein the convergence condition includes the loss value being less than a second preset value.
The specific execution steps can be referred to the steps of the training method embodiment of the multi-interest generation model, and the same technical effects can be achieved, so that repetition is avoided, and details are not repeated here.
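The training procedure described above reduces to a standard loop: compute the loss, stop when it drops below the second preset value, otherwise update the parameters. A minimal sketch, with a toy quadratic loss standing in for the multi-interest model's actual loss and plain gradient descent standing in for whatever optimizer is used in practice:

```python
import numpy as np

def train(model_params, batches, loss_fn, grad_fn, lr=0.01,
          second_preset=0.05, max_steps=1000):
    """Hypothetical training loop: update the initial model's parameters from
    the loss value until the convergence condition (loss below a second
    preset value) is met."""
    for step in range(max_steps):
        batch = batches[step % len(batches)]
        loss = loss_fn(model_params, batch)
        if loss < second_preset:                  # convergence condition
            return model_params, loss
        grads = grad_fn(model_params, batch)
        model_params = model_params - lr * grads  # parameter update from the loss
    return model_params, loss

# toy stand-in: loss = p^2, gradient = 2p, so params shrink toward zero
params, final_loss = train(np.array([2.0]), [None],
                           loss_fn=lambda p, b: float(p[0] ** 2),
                           grad_fn=lambda p, b: 2 * p)
```

With these settings the loop exits early once the loss falls below the second preset value of 0.05, returning the trained parameters.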
The above electronic device structure does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than illustrated, may combine some components, or may arrange components differently. For example, an input unit may include a graphics processor (Graphics Processing Unit, GPU) and a microphone, and a display unit may be configured with a display panel in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit includes at least one of a touch panel and other input devices. A touch panel is also known as a touch screen. Other input devices may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse and a joystick, which are not described in detail herein.
The memory may be used to store software programs as well as various data. The memory may mainly include a first memory area storing programs or instructions and a second memory area storing data, wherein the first memory area may store an operating system, application programs or instructions (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like. Further, the memory may include volatile memory or nonvolatile memory, or both. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (Random Access Memory, RAM), static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), or direct Rambus RAM (DRRAM).
The processor may include one or more processing units; optionally, the processor integrates an application processor that primarily processes operations involving an operating system, user interface, application programs, and the like, and a modem processor that primarily processes wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the processes of the embodiments of the video recommendation method and the training method of the multiple interest generation model are implemented, and the same technical effects can be achieved, so that repetition is avoided, and no further description is given here.
Wherein the processor is a processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium such as a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
The embodiment of the application further provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction, implement each process of the embodiments of the video recommendation method and the training method of the multiple interest generation model, and achieve the same technical effect, so that repetition is avoided, and no further description is given here.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-on-chip chips, chip systems, or system-on-chip chips, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; the functions may also be performed in a substantially simultaneous manner or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment methods may be implemented by means of software plus a necessary general hardware platform, and of course may also be implemented by hardware, although in many cases the former is the preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied, essentially or in the part contributing to the prior art, in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disk), including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Those of ordinary skill in the art may, under the teaching of the present application, derive many other forms without departing from the spirit of the present application and the scope of the claims, all of which fall within the protection of the present application.

Claims (11)

1. A video recommendation method, comprising:
responding to a video recommendation request, and acquiring a basic feature vector of a user to be recommended and a plurality of sequence feature vectors corresponding to video materials; the basic feature vector is used for representing basic information of the user to be recommended, and the sequence feature vector is used for representing behavior information of the user to be recommended on the video material;
inputting the basic feature vector and a plurality of sequence feature vectors into a trained multi-interest generation model to perform vector transformation processing to obtain at least two target interest vectors; the target interest vector is used for representing the interests of the user to be recommended;
and recommending the video materials to the user to be recommended according to at least two target interest vectors and the label information of a plurality of video materials.
2. The method of claim 1, wherein the multiple interest generation model comprises: a capsule network layer and at least two fully connected layers FC;
inputting the basic feature vector and the sequence feature vectors into a trained multi-interest generation model to perform vector transformation processing to obtain at least two target interest vectors, wherein the method comprises the following steps of:
inputting the basic feature vector and a plurality of sequence feature vectors into the capsule network layer, carrying out weighting, fusion and nonlinear transformation on each sequence feature vector, and respectively carrying out connection processing on at least two obtained first interest vectors and the basic feature vector to obtain a second interest vector corresponding to each first interest vector;
and carrying out convolution processing on each second interest vector through the at least two full connection layers FC to obtain the corresponding target interest vector.
3. The method of claim 2, wherein the capsule network layer comprises a first processing unit, a second processing unit, and a third processing unit;
inputting the basic feature vector and a plurality of sequence feature vectors into the capsule network layer, carrying out weighting, fusion and nonlinear transformation on each sequence feature vector, respectively carrying out connection processing on at least two obtained first interest vectors and the basic feature vector to obtain a second interest vector corresponding to each first interest vector, wherein the method comprises the following steps:
carrying out fixed weight weighting on each sequence feature vector through a first processing unit of the capsule network layer to obtain at least two first weighting vectors corresponding to each sequence feature vector; the number of the first weighting vectors corresponding to each sequence feature vector is equal;
dynamically weighting each first weighting vector through a second processing unit of the capsule network layer to obtain a second weighting vector corresponding to each first weighting vector, wherein the dynamic weights are obtained through dynamic routing; each sequence feature vector corresponds to at least two second weighting vectors, and the number of the second weighting vectors corresponding to each sequence feature vector is equal;
for at least two second weighting vectors corresponding to the sequence feature vectors, respectively carrying out fusion processing on the same number of second weighting vectors corresponding to the sequence feature vectors one by one to obtain at least two fusion vectors;
carrying out nonlinear transformation on each fusion vector through a third processing unit of the capsule network layer to obtain a corresponding first interest vector;
and carrying out connection concat processing on each first interest vector and the basic feature vector to obtain the corresponding second interest vector.
4. The method of claim 2, wherein the multiple interest generation model further comprises: a pooling layer;
before said inputting of said base feature vector and a plurality of said sequence feature vectors into said capsule network layer, further comprising:
inputting a plurality of sequence feature vectors into the pooling layer to pool each sequence feature vector, so as to obtain the processed sequence feature vectors.
5. The method of claim 1, wherein the recommending video material to the user to be recommended based on at least two of the target interest vectors and tag information of a plurality of the video materials comprises:
acquiring video materials with matching values larger than a first preset value between the video materials and the target interest vectors according to at least two target interest vectors and label information of a plurality of video materials;
sorting the video materials in descending order of matching value;
and recommending a preset number of the top-ranked video materials to the user to be recommended.
6. A training method for a multiple interest generation model, comprising:
acquiring a sample basic feature vector and a plurality of sample sequence feature vectors corresponding to sample video materials; wherein the sample video materials have tag information;
inputting the sample basic feature vector and a plurality of sample sequence feature vectors into an initial multi-interest generation model to perform vector transformation processing to obtain at least two target sample interest vectors;
obtaining a loss value corresponding to the initial multiple-interest generation model according to at least two target sample interest vectors;
updating parameters of the initial multi-interest generation model based on the loss value until convergence conditions are met, so as to obtain a trained multi-interest generation model;
wherein the convergence condition includes the loss value being less than a second preset value.
7. The method of claim 6, wherein the initial multiple interest generation model comprises: the system comprises a capsule network layer and at least two full connection layers FC, wherein the capsule network layer comprises a first processing unit, a second processing unit and a third processing unit;
the updating parameters of the initial multiple-interest generation model based on the loss value includes:
and when the loss value is not smaller than the second preset value, updating the fixed weight of the first processing unit of the capsule network layer, and updating the parameters of the at least two full-connection layers FC.
8. The method of claim 6, wherein the initial multiple interest generation model further comprises: label-aware project layer and Sampled Softmax Loss layer;
the obtaining a loss value corresponding to the initial multiple-interest generation model according to at least two target sample interest vectors includes:
inputting at least two target sample interest vectors into the Label-aware attention layer for processing to obtain a third interest vector;
inputting the third interest vector and a plurality of sample video material vectors into the Sampled Softmax Loss layer, and calculating a loss value corresponding to the initial multiple-interest generation model;
the plurality of sample video material vectors are obtained according to label information of the plurality of sample video materials.
9. A video recommendation device, comprising:
the first acquisition module is used for responding to the video recommendation request and acquiring basic feature vectors of users to be recommended and a plurality of sequence feature vectors corresponding to video materials; the basic feature vector is used for representing basic information of the user to be recommended; the sequence feature vector is used for representing behavior information of the user to be recommended on the video material;
the second acquisition module is used for inputting the basic feature vector and the plurality of sequence feature vectors into a trained multi-interest generation model to perform vector transformation processing to obtain at least two target interest vectors; the target interest vector is used for representing the interests of the user to be recommended;
And the recommending module is used for recommending the video materials to the user to be recommended according to at least two target interest vectors and the label information of a plurality of video materials.
10. An electronic device comprising a processor, a memory and a program or instruction stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the method of any of claims 1 to 8.
11. A readable storage medium, characterized in that it stores thereon a program or instructions, which when executed by a processor, implement the steps of the method according to any of claims 1 to 8.
CN202211727100.5A 2022-12-30 2022-12-30 Video recommendation method, model training method, electronic device and storage medium Pending CN116186326A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211727100.5A CN116186326A (en) 2022-12-30 2022-12-30 Video recommendation method, model training method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211727100.5A CN116186326A (en) 2022-12-30 2022-12-30 Video recommendation method, model training method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN116186326A true CN116186326A (en) 2023-05-30

Family

ID=86450000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211727100.5A Pending CN116186326A (en) 2022-12-30 2022-12-30 Video recommendation method, model training method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN116186326A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116684480A (en) * 2023-07-28 2023-09-01 支付宝(杭州)信息技术有限公司 Method and device for determining information push model and method and device for information push
CN116684480B (en) * 2023-07-28 2023-10-31 支付宝(杭州)信息技术有限公司 Method and device for determining information push model and method and device for information push
CN117156184A (en) * 2023-08-11 2023-12-01 魔人传媒(杭州)有限公司 Intelligent video playing method, device, equipment and storage medium
CN117156184B (en) * 2023-08-11 2024-05-17 魔人传媒(杭州)有限公司 Intelligent video playing method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination