CN114926192A - Information processing method and device and computer readable storage medium - Google Patents

Information processing method and device and computer readable storage medium

Info

Publication number
CN114926192A
Authority
CN
China
Prior art keywords
vector
feature
user
video
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110138145.8A
Other languages
Chinese (zh)
Inventor
刘冲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110138145.8A
Publication of CN114926192A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • G06Q30/0269Targeted advertisements based on user profile or attribute
    • G06Q30/0271Personalized advertisement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the present application discloses an information processing method, an information processing apparatus, and a computer-readable storage medium, wherein the information processing method includes: obtaining a user feature vector converted from user feature information, a video feature vector converted from video feature information, and a popularization feature vector converted from popularization feature information; performing attention fusion processing on the user feature vector based on the video feature vector and the popularization feature vector, respectively, to obtain a user video fusion vector and a user popularization fusion vector; splicing the user feature vector, the video feature vector, the popularization feature vector, the user video fusion vector, and the user popularization fusion vector to obtain a joint vector; performing feature weighting training on a preset multi-task learning model according to the joint vector, the label information, and the different task types to obtain a trained preset multi-task learning model; and displaying the target video information and the target popularization information whose association degree with the target video information is greater than a preset threshold. The accuracy of information processing is thereby improved.

Description

Information processing method and device and computer readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an information processing method and apparatus, and a computer-readable storage medium.
Background
With the continuous development of artificial intelligence, recommendation systems have become more and more intelligent. For example, when recommending advertisements or videos, a system can intelligently recommend the advertisement types or video types a user is interested in based on that user's usage habits, achieving accurate recommendation.
However, the traditional CTR (Click-Through Rate) model only predicts whether a user will click on an advertisement; it does not predict whether the user likes the video associated with that advertisement. If the two predictions need to be combined, a separate video-preference model has to be built, which makes the process cumbersome, and the two tasks cannot be processed jointly. How to make accurate, multi-dimensional recommendations across multiple task types is therefore a problem that needs to be solved.
Disclosure of Invention
The embodiment of the application provides an information processing method, an information processing device and a computer readable storage medium, which can improve the accuracy of information processing.
In order to solve the above technical problem, an embodiment of the present application provides the following technical solutions:
an information processing method comprising:
acquiring a user characteristic vector after user characteristic information conversion, a video characteristic vector after video characteristic information conversion and a promotion characteristic vector after promotion characteristic information conversion;
respectively carrying out attention fusion processing on the user feature vector based on the video feature vector and the popularization feature vector to obtain a user video fusion vector and a user popularization fusion vector;
splicing the user feature vector, the video feature vector, the promotion feature vector, the user video fusion vector and the user promotion fusion vector to obtain a joint vector;
performing feature weighting training on a preset multi-task learning model according to the joint vector and the label information and different task types to obtain a trained preset multi-task learning model;
and displaying target video information and target popularization information whose association degree with the target video information is greater than a preset threshold, wherein the target video information and the target popularization information are pushed to the user by the trained preset multi-task learning model.
An information processing apparatus comprising:
the acquiring unit is used for acquiring the user characteristic vector after the user characteristic information is converted, the video characteristic vector after the video characteristic information is converted and the popularization characteristic vector after the popularization characteristic information is converted;
the attention processing unit is used for respectively carrying out attention fusion processing on the user feature vector based on the video feature vector and the popularization feature vector to obtain a user video fusion vector and a user popularization fusion vector;
the splicing unit is used for splicing the user characteristic vector, the video characteristic vector, the promotion characteristic vector, the user video fusion vector and the user promotion fusion vector to obtain a joint vector;
the training unit is used for performing feature weighting training on a preset multi-task learning model according to the joint vector and the label information and different task types to obtain the trained preset multi-task learning model;
and the display unit is used for displaying target video information and target popularization information whose association degree with the target video information is greater than a preset threshold, wherein the target video information and the target popularization information are pushed to the user by the trained preset multi-task learning model.
In some embodiments, the attention processing unit comprises:
the dimension reduction subunit is used for reducing the dimensions of the video feature vector and the promotion feature vector through a preset full connection layer to obtain a target video feature vector and a target promotion feature vector with preset sizes;
and the processing subunit is used for respectively carrying out attention fusion processing on the user characteristic vector according to the target video characteristic vector and the target popularization characteristic vector to obtain a user video fusion vector and a user popularization fusion vector.
In some embodiments, the processing subunit is to:
multiplying each user characteristic domain vector in the user characteristic vectors by a first preset matrix vector to obtain transition vectors with corresponding quantity;
transpose multiplying each transition vector with the target video feature vector to obtain a first weight value corresponding to each user feature domain vector;
weighting according to each user characteristic domain vector and the corresponding first weight value, and carrying out vector averaging processing on the weighted plurality of user characteristic domain vectors to obtain a user video fusion vector;
transpose multiplying each transition vector and the target promotion feature vector respectively to obtain a second weight value corresponding to each user feature domain vector;
and weighting according to each user characteristic domain vector and the corresponding second weight value, and carrying out vector averaging processing on the weighted plurality of user characteristic domain vectors to obtain a user popularization fusion vector.
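As a rough illustration of the attention fusion steps above, the following NumPy sketch computes a weight for each user feature-domain vector by multiplying it with a transition matrix and transpose-multiplying the result with a target vector, then averages the weighted vectors. The dimensions, the random values, and the softmax normalisation of the weights are assumptions for illustration; the matrix would be a learned parameter in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8           # per-field embedding size (assumed)
N_FIELDS = 3    # number of user feature domains (assumed)

user_fields = rng.normal(size=(N_FIELDS, D))   # user feature-domain vectors
video_vec = rng.normal(size=(D,))              # target video feature vector (after FC dimension reduction)
W = rng.normal(size=(D, D))                    # "first preset matrix"; learned in practice

def attention_fuse(fields, target, W):
    transitions = fields @ W                   # one transition vector per feature domain
    logits = transitions @ target              # transpose-multiply -> one raw weight per domain
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                   # softmax normalisation (an assumption here)
    weighted = fields * weights[:, None]       # weight each user feature-domain vector
    return weighted.mean(axis=0)               # vector averaging -> fusion vector

user_video_fusion = attention_fuse(user_fields, video_vec, W)
print(user_video_fusion.shape)  # (8,)
```

The same function applied with the target promotion feature vector in place of `video_vec` (and a second weight value per domain) would yield the user popularization fusion vector.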
In some embodiments, the training unit comprises:
the input subunit is used for inputting the joint vector to a preset multi-task learning model, and training a plurality of expert networks in the preset multi-task learning model to obtain a plurality of trained expert networks;
the determining subunit is used for determining a third weight value corresponding to each expert network under different task types;
the weighting subunit is used for performing weighting connection on the output of each expert network according to a third weight value corresponding to each task type;
the output subunit is used for loading the corresponding output after weighted connection to a corresponding task training network in a preset multi-task learning model according to the task type and outputting a target output result corresponding to each task type;
the comparison subunit is used for comparing the target output result of each task type with the corresponding label information to obtain a difference value;
and the adjusting subunit is used for adjusting the network parameters of the task training network according to the difference value until the difference value is converged to obtain a trained preset multi-task learning model.
In some embodiments, the determining subunit is configured to:
connecting the joint vector with a target video feature vector to obtain a first connecting vector;
multiplying the first connection vector by a second preset matrix vector to obtain a third weight value corresponding to each expert network under the video task type;
connecting the joint vector with the target popularization characteristic vector to obtain a second connection vector;
and multiplying the second connection vector by a second preset matrix vector to obtain a third weight value corresponding to each expert network under the promotion task type.
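The expert mixing and the task-conditioned gating described in the last two subsections can be sketched as follows. The dimensions, the softmax over the gate outputs, and the single-layer experts and task towers are illustrative assumptions; every matrix stands in for a learned parameter, and the real task training networks would be deeper.

```python
import numpy as np

rng = np.random.default_rng(2)
JOINT, TASKVEC, EXPERT_OUT, N_EXPERTS = 40, 8, 16, 4  # assumed sizes

experts = [rng.normal(size=(JOINT, EXPERT_OUT)) for _ in range(N_EXPERTS)]
# "Second preset matrix": maps [joint vector ; task feature vector] to one weight per expert.
gate_W = rng.normal(size=(JOINT + TASKVEC, N_EXPERTS))
towers = {"video": rng.normal(size=(EXPERT_OUT, 1)),
          "promo": rng.normal(size=(EXPERT_OUT, 1))}

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(joint, task_vec, task):
    gate_in = np.concatenate([joint, task_vec])   # first/second connection vector
    w = softmax(gate_in @ gate_W)                 # third weight value per expert
    outs = np.stack([joint @ E for E in experts]) # each expert's output
    mixed = (w[:, None] * outs).sum(axis=0)       # weighted connection of expert outputs
    logit = (mixed @ towers[task]).item()         # corresponding task training network
    return 1.0 / (1.0 + np.exp(-logit))           # output as a probability

joint = rng.normal(size=(JOINT,))
video_vec, promo_vec = rng.normal(size=(TASKVEC,)), rng.normal(size=(TASKVEC,))
p_video = forward(joint, video_vec, "video")
p_promo = forward(joint, promo_vec, "promo")
```

During training, each task's output would be compared against its label information and the resulting difference used to adjust the network parameters until convergence.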
In some embodiments, the weighting subunit is configured to:
performing weighted connection on the output of each expert network according to a corresponding third weight value under the video task type;
and performing weighted connection on the output of each expert network according to a corresponding third weight value under the promotion task type.
In some embodiments, the output subunit is to:
loading the output after the corresponding weighted connection to a corresponding first task training network in a preset multi-task learning model according to the video task type, and outputting a first output result corresponding to the video task type;
acquiring a first low-order cross feature corresponding to the video task type, inputting the first low-order cross feature into a factorization model, and outputting a second output result corresponding to the first low-order cross feature;
loading the corresponding weighted output to a corresponding second task training network in a preset multi-task learning model according to the promotion task type, and outputting a third output result corresponding to the promotion task type;
acquiring second low-order cross features corresponding to the promotion task type, inputting the second low-order cross features into a factorization machine model, and outputting a fourth output result corresponding to the second low-order cross features;
adding the first output result and the second output result to obtain a target output result corresponding to the video task type;
and adding the third output result and the fourth output result to obtain a target output result corresponding to the promotion task type.
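The factorization machine contribution added to each task tower's output in the steps above can be illustrated with the standard second-order FM formula. The feature count, the latent dimension, and the random parameters below are assumptions for illustration; which features serve as the low-order cross features is defined by the model, not by this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
N_FEATS, K = 10, 4        # number of low-order cross features and FM latent size (assumed)

w0 = 0.1                            # global bias
w = rng.normal(size=(N_FEATS,))     # first-order weights
V = rng.normal(size=(N_FEATS, K))   # latent factors for pairwise interactions

def fm(x):
    """Second-order factorization machine: bias + linear term + pairwise interactions."""
    linear = w0 + w @ x
    # Identity: sum_{i<j} x_i x_j <v_i, v_j> computed in O(N*K).
    inter = 0.5 * (((x @ V) ** 2).sum() - ((x ** 2) @ (V ** 2)).sum())
    return linear + inter

x = rng.normal(size=(N_FEATS,))     # low-order cross features for one task
tower_logit = 0.7                   # placeholder for the task tower's output result
target_logit = tower_logit + fm(x)  # add the two outputs to get the task's target result
prob = 1.0 / (1.0 + np.exp(-target_logit))
```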
In some embodiments, the obtaining unit is configured to:
acquiring user characteristic information, video characteristic information and popularization characteristic information;
vectorizing the feature identifier corresponding to each feature domain in the user feature information to obtain a user feature domain vector corresponding to each feature domain;
splicing according to the user characteristic domain vector corresponding to each characteristic domain to obtain a user characteristic vector;
vectorizing the feature identifier corresponding to each feature domain in the video feature information to obtain a video feature domain vector corresponding to each feature domain;
splicing according to the video feature domain vector corresponding to each feature domain to obtain a video feature vector;
vectorizing the feature identifier corresponding to each feature domain in the popularization feature information to obtain a popularization feature domain vector corresponding to each feature domain;
and splicing according to the promotion feature domain vector corresponding to each feature domain to obtain the promotion feature vector.
A computer readable storage medium, storing a plurality of instructions, the instructions being suitable for being loaded by a processor to execute the steps of the information processing method.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps in the information processing method when executing the computer program.
A computer program product or computer program comprising computer instructions stored in a storage medium. The processor of the computer device reads the computer instructions from the storage medium, and executes the computer instructions to enable the computer to perform the steps of the information processing method.
According to the embodiment of the present application, the user feature vector converted from the user feature information, the video feature vector converted from the video feature information, and the popularization feature vector converted from the popularization feature information are obtained; attention fusion processing is performed on the user feature vector based on the video feature vector and the popularization feature vector, respectively, to obtain a user video fusion vector and a user popularization fusion vector; the user feature vector, the video feature vector, the popularization feature vector, the user video fusion vector, and the user popularization fusion vector are spliced to obtain a joint vector; feature weighting training is performed on the preset multi-task learning model according to the joint vector, the label information, and the different task types to obtain a trained preset multi-task learning model; and the target video information and the target popularization information whose association degree with the target video information is greater than a preset threshold are displayed, wherein the target video information and the target popularization information are pushed to the user by the trained preset multi-task learning model.
In this way, by utilizing an attention mechanism, attention fusion processing is performed on the user feature vector through the video feature vector and the popularization feature vector, respectively, to obtain a user video fusion vector and a user popularization fusion vector. These fusion vectors represent the feature information in the user feature vector that the video task and the popularization task pay more attention to, so the commonality of the multiple tasks is preserved while the information each task focuses on is still captured. Further, the user feature vector, the video feature vector, the popularization feature vector, the user video fusion vector, and the user popularization fusion vector are spliced to obtain a joint vector, and feature weighting training is performed on the preset multi-task learning model according to the different task types to obtain the trained preset multi-task learning model; the target video information, together with the target popularization information whose association degree with it is greater than a preset threshold, is then output for display. Compared with the prediction method of the traditional CTR model, the trained preset multi-task learning model can focus, through the attention mechanism, on the features each task needs for learning, and can predict multiple tasks at the same time, so the output of the model is more accurate and the accuracy of information processing is greatly improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a scenario of an information processing system provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of an information processing method provided in an embodiment of the present application;
FIG. 3 is another schematic flow chart diagram of an information processing method provided in an embodiment of the present application;
FIG. 4a is a schematic product diagram of an information processing method according to an embodiment of the present application;
FIG. 4b is a block diagram of a multitask learning model according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an information processing apparatus provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides an information processing method, an information processing device and a computer readable storage medium.
Referring to fig. 1, fig. 1 is a schematic view of a scenario of an information processing system according to an embodiment of the present application, including: the terminal a and the server (the information processing system may also include other terminals besides the terminal a, and the specific number of the terminals is not limited herein), the terminal a and the server may be connected through a communication network, which may include a wireless network and a wired network, wherein the wireless network includes one or more of a wireless wide area network, a wireless local area network, a wireless metropolitan area network, and a wireless personal area network. The network includes network entities such as routers, gateways, etc., which are not shown in the figure. The terminal a may perform information interaction with the server through a communication network, for example, the terminal a sends the user feature information, the video feature information, and the popularization feature information to the server through an instant messaging application.
The information processing system may include an information processing apparatus, which may be specifically integrated in a server. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN, and big data and artificial intelligence platforms. As shown in fig. 1, the server receives the user feature information, the video feature information, and the popularization feature information sent by terminal A, and obtains the user feature vector converted from the user feature information, the video feature vector converted from the video feature information, and the popularization feature vector converted from the popularization feature information; performs attention fusion processing on the user feature vector based on the video feature vector and the popularization feature vector, respectively, to obtain a user video fusion vector and a user popularization fusion vector; splices the user feature vector, the video feature vector, the popularization feature vector, the user video fusion vector, and the user popularization fusion vector to obtain a joint vector; performs feature weighting training on a preset multi-task learning model according to the joint vector, the label information, and the different task types to obtain a trained preset multi-task learning model; and, based on the trained preset multi-task learning model, pushes to the terminal the target video information and the target popularization information whose association degree with the target video information is greater than a preset threshold, so that the terminal can display them and the video information and the popularization information are recommended at the same time.
Terminal A in the information processing system may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. Terminal A can install the various applications a user needs, such as video applications. When the user uses a video application, terminal A can collect the user's user feature information, video feature information, and popularization feature information and send them to the server; it can also receive the target video information and the target popularization information whose association degree with the target video information is greater than a preset threshold, pushed for the user by the server based on the trained preset multi-task learning model, and display them.
It should be noted that the scenario diagram of the information processing system shown in fig. 1 is only an example, and the information processing system and the scenario described in the embodiment of the present application are for more clearly illustrating the technical solution of the embodiment of the present application, and do not form a limitation on the technical solution provided in the embodiment of the present application.
The following are detailed below.
In this embodiment, the description is given from the perspective of an information processing apparatus, which may be specifically integrated in a server equipped with a storage unit and a microprocessor and having computing capability.
Referring to fig. 2, fig. 2 is a schematic flowchart of an information processing method according to an embodiment of the present disclosure. The information processing method includes:
in step 101, a user feature vector after the user feature information conversion, a video feature vector after the video feature information conversion, and a popularization feature vector after the popularization feature information conversion are obtained.
The user characteristic information can be composed of characteristic information of a plurality of user characteristic fields, such as preference characteristic information of a user to a video, user gender characteristic information and recently clicked label characteristic information. The video feature information may be composed of feature information of a plurality of video feature fields, such as classification feature information of the video and duration feature information of the video. The promotion feature information may be composed of feature information of a plurality of promotion feature domains, for example, classification feature information of the promotion information and price interval feature information of the promotion information, and the promotion information may be advertisement information.
Furthermore, the feature information of each feature domain of the user feature information, the video feature information, and the popularization feature information can be processed by embedding (dimension reduction) to obtain a vector for each feature domain as that domain's expression; embedding converts a large sparse vector into a low-dimensional space that preserves semantic relationships. The vectors of the feature domains in the user feature information are spliced to obtain the user feature vector, the vectors of the feature domains in the video feature information are spliced to obtain the video feature vector, and the vectors of the feature domains in the popularization feature information are spliced to obtain the popularization feature vector.
In some embodiments, the step of obtaining the user feature vector, the video feature vector, and the popularization feature vector after the user feature information, the video feature information, and the popularization feature information are converted includes:
(1) acquiring user characteristic information, video characteristic information and popularization characteristic information;
(2) vectorizing the feature identifier corresponding to each feature domain in the user feature information to obtain a user feature domain vector corresponding to each feature domain;
(3) splicing according to the user characteristic domain vector corresponding to each characteristic domain to obtain a user characteristic vector;
(4) vectorizing the feature identifier corresponding to each feature domain in the video feature information to obtain a video feature domain vector corresponding to each feature domain;
(5) splicing according to the video feature domain vector corresponding to each feature domain to obtain a video feature vector;
(6) vectorizing the feature identifier corresponding to each feature domain in the popularization feature information to obtain a popularization feature domain vector corresponding to each feature domain;
(7) and splicing according to the popularization feature domain vector corresponding to each feature domain to obtain the popularization feature vector.
The user feature information, the video feature information, and the popularization feature information are obtained, and the feature identifier corresponding to each feature domain in the user feature information is vectorized to obtain a user feature-domain vector for each feature domain. A feature domain can contain multiple feature identifiers (IDs), which are values produced by feature hashing; for example, the gender feature can be hashed to 0 or 1, where 0 may represent female and 1 may represent male. For a better understanding of the embodiments of the present application, consider the following example. Suppose the user features consist of three feature domains: the user's preference for videos (feature domain 1), the user's gender (feature domain 2), and the tags the user recently clicked (feature domain 3), where feature domain 1 contains 3 feature IDs, feature domain 2 contains 1 feature ID, and feature domain 3 contains 8 feature IDs. With the embedding size set to 8, after the embedding process feature domain 1 yields 3 vectors of dimension 1x8, feature domain 2 yields 1 vector of dimension 1x8, and feature domain 3 yields 8 vectors of dimension 1x8. The vectors within each feature domain are then average-pooled, so that each feature domain obtains one 1x8 vector as its expression, and the vectors of the three feature domains are spliced to obtain the user feature vector. By analogy, the video feature vector and the popularization feature vector are generated in the same way, which is not repeated here.
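The worked example above (per-field embedding lookup, average pooling within each feature domain, then splicing) can be sketched in NumPy as follows. The vocabulary sizes, field names, and feature IDs are hypothetical, and the embedding tables would be learned in practice rather than randomly initialised.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_SIZE = 8  # embedding size from the example above

# Hypothetical vocabulary sizes per feature domain (illustrative only).
field_vocab = {"video_preference": 100, "gender": 2, "recent_tags": 500}
emb_tables = {f: rng.normal(size=(v, EMB_SIZE)) for f, v in field_vocab.items()}

def field_vector(field, ids):
    """Look up each feature ID in the domain's table and average-pool into one 1x8 vector."""
    return emb_tables[field][ids].mean(axis=0)

# Feature IDs per domain, as in the worked example (3, 1, and 8 IDs).
user_ids = {"video_preference": [3, 17, 42], "gender": [1], "recent_tags": list(range(8))}

# Splice the three 1x8 domain expressions into the user feature vector.
user_vec = np.concatenate([field_vector(f, ids) for f, ids in user_ids.items()])
print(user_vec.shape)  # (24,) -> 3 domains x 8 dims
```

The video feature vector and the popularization feature vector would be built the same way from their own feature domains.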
In step 102, attention fusion processing is performed on the user feature vector based on the video feature vector and the popularization feature vector, so as to obtain a user video fusion vector and a user popularization fusion vector.
Artificial Intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills, and how to reorganize existing knowledge structures to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching-based learning.
The scheme provided in the embodiments of the present application relates to artificial intelligence technologies such as machine learning, and is specifically described by the following embodiments:
The Attention mechanism focuses limited attention on key information, thereby saving resources and obtaining the most effective information quickly. The attention operation here serves to better identify the unique focus that different task types place on the user feature information.
Based on this, in the embodiments of the present application, attention fusion processing is performed on the user feature vector based on the video feature vector and the popularization feature vector respectively, to obtain a user video fusion vector and a user popularization fusion vector. The user video fusion vector captures, under the video task type, the distribution of attention over the importance of each feature domain in the user feature vector: a user feature domain that is important for processing the video task type is typically assigned a large weight, indicating high attention, while a user feature domain that is unimportant for processing the video task type is assigned a small weight, indicating low attention.
Similarly, the user popularization fusion vector captures, under the popularization task type, the attention distribution over the importance of each feature domain in the user feature vector: a user feature domain that is important for popularization task processing is typically assigned a large weight, indicating high attention, while an unimportant one is assigned a small weight, indicating low attention.
Therefore, through attention processing, the user video fusion vector and the user popularization fusion vector can characterize the feature information in the user feature vector that the video task and the popularization task each focus on. This retains the commonality across the multiple tasks while capturing the information each task attends to, making subsequent multi-task training more accurate.
In some embodiments, the step of performing attention fusion processing on the user feature vector based on the video feature vector and the popularization feature vector to obtain a user video fusion vector and a user popularization fusion vector may include:
(1) reducing the dimensions of the video feature vector and the promotion feature vector through a preset full connection layer to obtain a target video feature vector and a target promotion feature vector of a preset size;
(2) performing attention fusion processing on the user feature vector according to the target video feature vector and the target promotion feature vector respectively, to obtain a user video fusion vector and a user promotion fusion vector.
The preset full connection layer may be a single full connection layer. The video feature vector and the promotion feature vector are reduced in dimension through this full connection layer to obtain the target video feature vector and the target promotion feature vector of the preset size, where the preset size may be 1-by-8 dimensions.
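A minimal sketch of this dimension reduction, assuming (per the later example) that the video and promotion feature vectors are 1-by-16 and the preset size is 1-by-8; the weight values here are random placeholders, not trained parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

video_vec = rng.normal(size=16)  # 1x16 video feature vector (two 1x8 domains spliced)
promo_vec = rng.normal(size=16)  # 1x16 promotion feature vector

# Full connection layer parameters projecting 16 -> 8.
W = rng.normal(size=(16, 8))
b = np.zeros(8)

def fc_reduce(x):
    # Dense projection down to the preset 1x8 size.
    return x @ W + b

target_video = fc_reduce(video_vec)
target_promo = fc_reduce(promo_vec)
```

In practice the layer weights would be learned jointly with the rest of the model; a separate layer per branch is equally plausible.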
Furthermore, the target video feature vector and the target popularization feature vector are used as context vectors in the attention operation, and attention fusion is performed on the vector of each feature domain in the user feature vector, so as to obtain, respectively, the user video fusion vector (the expression of the user feature information under the video task type) and the user popularization fusion vector (the expression of the user feature information under the popularization task type).
In some embodiments, the step of performing attention fusion processing on the user feature vector according to the target video feature vector and the target popularization feature vector to obtain a user video fusion vector and a user popularization fusion vector includes:
(1.1) multiplying each user characteristic domain vector in the user characteristic vectors by a first preset matrix vector to obtain transition vectors with corresponding quantity;
(1.2) transposing and multiplying each transition vector with the target video feature vector respectively to obtain a first weight value corresponding to each user feature domain vector;
(1.3) weighting according to each user characteristic domain vector and a corresponding first weight value, and carrying out vector averaging processing on a plurality of weighted user characteristic domain vectors to obtain a user video fusion vector;
(1.4) transposing and multiplying each transition vector and the target popularization characteristic vector respectively to obtain a second weight value corresponding to each user characteristic field vector;
and (1.5) weighting according to each user characteristic domain vector and the corresponding second weight value, and carrying out vector averaging processing on the weighted plurality of user characteristic domain vectors to obtain a user popularization fusion vector.
It can be understood that the attention operation serves to better identify the unique focus that different tasks place on the user features. For example, in the promotion prediction task the user's gender plays a very important role while the user's preference for videos plays only a small role, so the attention operation assigns a larger weight to the 1-by-8 vector of the user-gender feature domain and a much smaller weight to the vector of the video-preference feature domain. Correspondingly, in the video prediction task, the user's preference for videos likely matters greatly, so its attention weight is high, while the user's gender feature plays little role and its attention weight is low.
Each user feature domain vector in the user feature vector may be multiplied by a first preset matrix vector, which may be an 8-by-8 matrix, to obtain a 1-by-8 transition vector (also referred to as an intermediate vector) for each user feature domain. Each transition vector is then transpose-multiplied with the target video feature vector from the previous step to obtain a score for each feature domain vector. A softmax (normalization) is performed over all the scores to obtain a first weight value for each feature domain vector: the higher the weight value, the higher the attention of the video task type to that feature domain vector, and the lower the weight value, the lower the attention. Each user feature domain vector is weighted by its corresponding first weight value, and vector averaging is performed on the weighted user feature domain vectors to obtain the user video fusion vector as the attention-vector expression of the video-type task.
Furthermore, each transition vector is transpose-multiplied with the target popularization feature vector from the previous step to obtain a score for each feature domain vector. A softmax is performed over all the scores to obtain a second weight value for each feature domain vector: the higher the weight value, the higher the attention of the popularization task type to that feature domain vector, and the lower the weight value, the lower the attention. Each user feature domain vector is weighted by its corresponding second weight value, and vector averaging is performed on the weighted user feature domain vectors (the vectors are summed and then averaged) to obtain the user popularization fusion vector as the attention-vector expression of the popularization-type task.
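The attention fusion for both branches can be sketched in NumPy as below. This assumes 3 user feature domains of dimension 8, random placeholder values for the first preset matrix and the two task context vectors, and implements the transition-vector, transpose-multiply, softmax, weighting and averaging steps described above.

```python
import numpy as np

rng = np.random.default_rng(2)
num_domains, dim = 3, 8

user_domains = rng.normal(size=(num_domains, dim))  # one 1x8 vector per user feature domain
W1 = rng.normal(size=(dim, dim))                    # first preset matrix (8x8)
target_video = rng.normal(size=dim)                 # context vector for the video task
target_promo = rng.normal(size=dim)                 # context vector for the promotion task

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(domains, W, context):
    transitions = domains @ W              # 1x8 transition vector per feature domain
    scores = transitions @ context         # transpose-multiply with the context vector
    weights = softmax(scores)              # normalized attention weight per feature domain
    weighted = domains * weights[:, None]  # weight each domain vector
    return weighted.mean(axis=0)           # vector averaging -> 1x8 fusion vector

user_video_fusion = attention_fuse(user_domains, W1, target_video)
user_promo_fusion = attention_fuse(user_domains, W1, target_promo)
```

Note that both branches share the same transition vectors; only the context vector (video vs. promotion) changes, which is what gives each task its own weighting over the same user feature domains.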
In step 103, the user feature vector, the video feature vector, the promotion feature vector, the user video fusion vector and the user promotion fusion vector are spliced to obtain a joint vector.
The user feature vector, the video feature vector, the promotion feature vector, the user video fusion vector and the user promotion fusion vector can be spliced through a concat layer to obtain a joint vector, which serves as the input of the downstream multi-task learning model.
In step 104, feature weighting training is performed on the preset multi-task learning model according to the joint vector and the label information and according to different task types, so as to obtain the trained preset multi-task learning model.
In an embodiment, the preset multi-task learning model may be an MMoE (Multi-gate Mixture-of-Experts) model, which can handle loosely correlated tasks; for example, the preset multi-task learning model can recommend videos the user likes while simultaneously considering whether the user is interested in the goods carried by those videos.
The label information is the label information corresponding to each task type. Assume there are two task types, the video task type and the promotion task type. The label corresponding to the video task type is 0 or 1, where 0 indicates a video the user did not click and 1 indicates a video the user clicked. The label corresponding to the promotion task type is likewise 0 or 1, where 0 indicates the user did not click the promotion information and 1 indicates the user clicked it. The preset multi-task learning model is trained according to the joint vector and the label information: feature-weighted training is performed per task type through the user video fusion vector and the user promotion fusion vector in the joint vector, so that during training for each task type the model pays more attention to the user feature domains that yield positive benefit for that task. The network parameters of the preset multi-task learning model are adjusted according to the difference between the label information and the output result until the difference converges, yielding the trained preset multi-task learning model.
In some embodiments, the step of performing feature weighting training on the preset multi-task learning model according to different task types according to the joint vector and the label information to obtain the trained preset multi-task learning model may include:
(1) inputting the joint vector into a preset multi-task learning model, and training a plurality of expert networks in the preset multi-task learning model to obtain a plurality of trained expert networks;
(2) determining a third weight value corresponding to each expert network under different task types;
(3) performing weighted connection on the output of each expert network according to a corresponding third weight value under each task type;
(4) loading the output after the corresponding weighted connection to a corresponding task training network in a preset multi-task learning model according to the task type, and outputting a target output result corresponding to each task type;
(5) comparing the target output result of each task type with corresponding label information to obtain a difference value;
(6) adjusting the network parameters of the task training network according to the difference value until the difference value converges, to obtain the trained preset multi-task learning model.
The joint vector may be input into the preset multi-task learning model to train a plurality of expert networks in the model. The expert networks are standard Deep Neural Network (DNN) structures, their number can be set freely, and each expert network is free to learn different knowledge; for example, some expert networks may learn more knowledge about video-information preferences, while others may learn more knowledge about promotion-information preferences.
Furthermore, different tasks attend differently to the output of each expert network. For example, a video-type task attends more strongly to the expert networks that have learned video-information preference knowledge, while a promotion-type task attends more strongly to the expert networks that have learned promotion-information preference knowledge. A third weight value corresponding to each expert network is therefore determined for each task type, and the preset multi-task learning model performs a weighted connection of the expert-network outputs according to the third weight values under each task type. A separate task training network, composed of multiple fully connected layers, is set up for each task type in the preset multi-task learning model. The weighted-connection output corresponding to each task type is loaded into the corresponding task training network, which outputs a target output result for that task type. The target output result is a prediction score in [0, 1]: the closer it is to 0, the closer the sample is to a negative sample, and the closer it is to 1, the closer the sample is to a positive sample.
The target output result of each task type is compared with the corresponding label information to obtain a difference value, which represents the degree of difference between the model's predicted value and the true value. The network parameters of the corresponding task training network are adjusted according to the difference value so that the task training network predicts its task more and more accurately, until the difference value output by each task training network converges, indicating that training is complete and the trained preset multi-task learning model is obtained.
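The MMoE forward pass described in steps (1) to (4) can be sketched as follows. This is an untrained single-layer illustration: the joint-vector dimension, expert width and use of ReLU/sigmoid are assumptions for the sketch, with 3 experts and 2 task towers as in the surrounding text.

```python
import numpy as np

rng = np.random.default_rng(3)
joint_dim, expert_out, num_experts, num_tasks = 64, 16, 3, 2

joint = rng.normal(size=joint_dim)  # the spliced joint vector

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Each expert sketched as one dense layer with ReLU.
experts_W = rng.normal(size=(num_experts, joint_dim, expert_out))
expert_outputs = np.maximum(0, np.einsum("d,edo->eo", joint, experts_W))

# One gate per task yields a weight per expert (the "third weight values").
gates_W = rng.normal(size=(num_tasks, joint_dim, num_experts))
# One tower (task training network) per task, sketched as a single dense layer.
towers_W = rng.normal(size=(num_tasks, expert_out))

task_scores = []
for t in range(num_tasks):
    gate = softmax(joint @ gates_W[t])                    # expert weights for this task
    mixed = (gate[:, None] * expert_outputs).sum(axis=0)  # weighted connection of expert outputs
    task_scores.append(sigmoid(mixed @ towers_W[t]))      # target output result in (0, 1)
```

Training would then compare each `task_scores[t]` against its 0/1 label (e.g. with a cross-entropy loss) and backpropagate into the towers, gates and experts until the loss converges.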
In step 105, the target video information is displayed together with target promotion information whose association degree with the target video information is greater than a preset threshold.
The trained preset multi-task learning model can infer, from the user's feature information, whether the user will click the video information and whether the user will click the promotion information carried by the video information. Target video information and target promotion information that are highly relevant to the user feature information can therefore be pushed to the user: on top of pushing target video information the user is interested in, target promotion information strongly associated with that video information is pushed at the same time. The preset threshold is the critical value for judging whether the target video information and the target promotion information are strongly associated; when their association degree is greater than the preset threshold, they are judged to be strongly associated, meaning that the promotion information carried by a video the user is interested in is likely to interest the user as well. In an embodiment, the target video information and the target promotion information can be pushed to the terminal in real time for real-time display, improving the accuracy of information pushing.
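The push decision in step 105 reduces to a simple threshold check. A sketch, with the threshold value 0.8 and the function name chosen for illustration (the document does not fix a concrete value):

```python
def should_push_promotion(association, threshold=0.8):
    # Push the promotion alongside the video only when their association
    # degree exceeds the preset threshold (judged "strongly associated").
    return association > threshold

# Example: the model's association score for a (video, promotion) pair.
decisions = [should_push_promotion(a) for a in (0.92, 0.41)]
```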
As can be seen from the above, in the embodiments of the present application, the user feature vector converted from the user feature information, the video feature vector converted from the video feature information, and the popularization feature vector converted from the popularization feature information are obtained; attention fusion processing is performed on the user feature vector based on the video feature vector and the popularization feature vector respectively to obtain a user video fusion vector and a user popularization fusion vector; the user feature vector, the video feature vector, the popularization feature vector, the user video fusion vector and the user popularization fusion vector are spliced to obtain a joint vector; feature-weighted training is performed on the preset multi-task learning model per task type according to the joint vector and the label information to obtain the trained preset multi-task learning model; and the target video information, together with target popularization information whose association degree with the target video information is greater than a preset threshold, is displayed, where both are pushed to the user through the trained preset multi-task learning model.
Therefore, by utilizing an attention mechanism, attention fusion processing is performed on the user feature vector through the video feature vector and the popularization feature vector respectively, to obtain a user video fusion vector and a user popularization fusion vector. These fusion vectors characterize the feature information in the user feature vector that the video task and the popularization task each focus on, retaining the commonality of the multiple tasks while capturing the information each task attends to. Further, the user feature vector, the video feature vector, the popularization feature vector, the user video fusion vector and the user popularization fusion vector are spliced into a joint vector; feature-weighted training is performed on the preset multi-task learning model per task type to obtain the trained model, which outputs for display the target video information and the target popularization information whose association degree with the target video information exceeds a preset threshold. Compared with a traditional CTR-model prediction method, the trained preset multi-task learning model can, through the attention mechanism, focus its learning on the features each task needs and predict multiple tasks at the same time, so the accuracy of the model's output is higher and the accuracy of information processing is greatly improved.
The method described in connection with the above embodiments will be described in further detail below by way of example.
In the present embodiment, the information processing apparatus will be described by taking an example in which it is specifically integrated in a server, and specific reference will be made to the following description.
Referring to fig. 3, fig. 3 is another schematic flow chart of an information processing method according to an embodiment of the present disclosure. The method flow can comprise the following steps:
in step 201, the server obtains user feature information, video feature information, and popularization feature information.
To better describe the embodiments of the present application, please refer to fig. 4a, a schematic product diagram of an information processing method provided in an embodiment of the present application. The product interface 10 includes a video playing area 11 and a push information displaying area 12. The push information may be advertisement information used to link to a selling interface of a commodity; that is, when the product interface is pushed to a user, a video the user likes is recommended to the video playing area 11, while whether the user is interested in the push information carried by that video is considered at the same time.
In order to solve the above problem and perform more accurate push, the server needs to obtain user feature information, video feature information, and popularization feature information. The user characteristic information may be composed of characteristic information of a plurality of user characteristic fields, such as user preference characteristic information of a video, user gender characteristic information and tag characteristic information which is clicked recently. The video feature information may be composed of feature information of a plurality of video feature fields, such as classification feature information of the video and duration feature information of the video. The promotion feature information may be composed of feature information of a plurality of promotion feature domains, for example, classification feature information of the promotion information and price interval feature information of the promotion information.
In step 202, the server performs vectorization processing on the feature identifier corresponding to each feature domain in the user feature information to obtain a user feature domain vector corresponding to each feature domain, and performs splicing according to the user feature domain vectors corresponding to each feature domain to obtain a user feature vector.
Please also refer to fig. 4b, a schematic diagram of an architecture of a multi-task learning model provided in an embodiment of the present application. The user feature may be composed of three feature domains: the user's preference for videos (feature domain 1), the user's gender (feature domain 2), and the tags the user clicked recently (feature domain 3). Feature domain 1 includes 3 feature identifiers (IDs), feature domain 2 includes 1 feature identifier, and feature domain 3 includes 8 feature identifiers, with the embedding size set to 8. After the embedding process, feature domain 1 obtains 3 vectors of 1-by-8 dimensions, feature domain 2 obtains 1 vector of 1-by-8 dimensions, and feature domain 3 obtains 8 vectors of 1-by-8 dimensions. An average pooling operation is then performed on the vectors in each feature domain, so that each feature domain obtains one 1-by-8 vector as its expression, and the three feature domain vectors are spliced to obtain a user feature vector of dimension 1-by-24.
In step 203, the server performs vectorization processing on the feature identifier corresponding to each feature domain in the video feature information to obtain a video feature domain vector corresponding to each feature domain, performs splicing according to the video feature domain vector corresponding to each feature domain to obtain a video feature vector, performs vectorization processing on the feature identifier corresponding to each feature domain in the promotion feature information to obtain a promotion feature domain vector corresponding to each feature domain, and performs splicing according to the promotion feature domain vector corresponding to each feature domain to obtain a promotion feature vector.
The video feature vector and the promotion feature vector are generated by the same process as the user feature vector described above, yielding a 1-by-16 dimensional video feature vector and a 1-by-16 dimensional promotion feature vector; details are not repeated here.
In step 204, the server performs dimension reduction on the video feature vector and the promotion feature vector through a preset full connection layer to obtain a target video feature vector and a target promotion feature vector of a preset size.
As shown in fig. 4b, the server reduces the dimensions of the video feature vector and the promotion feature vector to the target video feature vector and the target promotion feature vector of the preset size of 1 by 8 through a full connection layer.
In step 205, the server multiplies each user feature domain vector in the user feature vectors by a first preset matrix vector to obtain a corresponding number of transition vectors, performs transposition multiplication on each transition vector and the target video feature vector respectively to obtain a first weight value corresponding to each user feature domain vector, performs weighting according to each user feature domain vector and the corresponding first weight value, and performs vector averaging processing on the weighted plurality of user feature domain vectors to obtain a user video fusion vector.
As shown in fig. 4b, the server multiplies the 1-by-8 dimensional vector of each user feature domain in the user feature vector by the 8-by-8 dimensional first preset matrix vector to obtain a 1-by-8 transition vector for each user feature domain. Each transition vector is transpose-multiplied with the target video feature vector from the previous step to obtain a score for each feature domain vector. A softmax (normalization) is performed over all the scores to obtain a first weight value for each feature domain vector: the higher the weight value, the higher the attention of the video task type to that feature domain vector, and the lower the weight value, the lower the attention. Each user feature domain vector is weighted by its corresponding first weight value, and vector averaging is performed on the weighted user feature domain vectors to obtain the user video fusion vector as the attention-vector expression of the video-type task.
In step 206, the server performs transposition multiplication on each transition vector and the target popularization eigenvector respectively to obtain a second weight value corresponding to each user eigenvector, performs weighting according to each user eigenvector and the corresponding second weight value, and performs vector averaging processing on the weighted plurality of user eigenvectors to obtain the user popularization fusion vector.
As shown in fig. 4b, each transition vector is transpose-multiplied with the target popularization feature vector from the previous step to obtain a score for each feature domain vector. A softmax is performed over all the scores to obtain a second weight value for each feature domain vector: the higher the weight value, the higher the attention of the promotion task type to that feature domain vector, and the lower the weight value, the lower the attention. Each user feature domain vector is weighted by its corresponding second weight value, and vector averaging is performed on the weighted user feature domain vectors to obtain the user popularization fusion vector as the attention-vector expression of the popularization-type task.
In step 207, the server splices the user feature vector, the video feature vector, the promotion feature vector, the user video fusion vector, and the user promotion fusion vector to obtain a joint vector.
Referring to fig. 4b, the server may splice the user feature vector, the video feature vector, the promotion feature vector, the user video fusion vector and the user promotion fusion vector through the concat input layer to obtain a joint vector, which serves as the input of the downstream multi-task learning model.
In step 208, the server inputs the joint vector into a preset multitask learning model, and trains a plurality of expert networks in the preset multitask learning model to obtain a plurality of trained expert networks.
As shown in fig. 4b, the server may input the joint vector into a plurality of expert networks in the preset multi-task learning model for training. The expert networks are standard deep neural networks; there may be three of them, where two experts learn more knowledge about the promotion information and the other expert learns more knowledge about the video information. The trained expert networks can then output their respective learned result information.
In step 209, the server connects the joint vector with the target video feature vector to obtain a first connection vector and multiplies the first connection vector by a second preset matrix vector to obtain a third weight value for each expert network under the video task type; the server likewise connects the joint vector with the target promotion feature vector to obtain a second connection vector and multiplies the second connection vector by the second preset matrix vector to obtain a third weight value for each expert network under the promotion task type.
Please refer to fig. 4b. Because different tasks attend differently to the output of each expert network, the joint vector and the target video feature vector may be connected by an attention-mechanism gate network (Attention Gate) to obtain a first connection vector, which is then multiplied by a second preset matrix vector. Assuming the first connection vector is a 1-by-50 vector and the second preset matrix vector is a 50-by-3 vector, 3 scores are obtained; normalizing these 3 scores yields a third weight value corresponding to each expert network under the video task type. These weights represent how much attention each of the three expert networks receives when processing video-type tasks: the higher the weight value, the higher the attention degree and the greater the benefit to the task processing.
Further, the joint vector and the target promotion feature vector can be connected through the attention-mechanism gate network to obtain a second connection vector, which is multiplied by the second preset matrix vector. Assuming the second connection vector is a 1-by-50 vector and the second preset matrix vector is a 50-by-3 vector, 3 scores are obtained; normalizing these 3 scores yields a third weight value corresponding to each expert network under the promotion task type, representing how much attention each of the three expert networks receives when processing promotion-type tasks. The higher the weight value, the higher the attention degree and the greater the benefit to the task processing.
In step 210, the server performs weighted connection on the output of each expert network according to a third weighted value corresponding to the video task type, and performs weighted connection on the output of each expert network according to a third weighted value corresponding to the promotion task type.
As shown in fig. 4b, the server performs weighted connection on the outputs of the expert networks according to the third weight values corresponding to the video task type through task merge layer A (Task A merge layer), and performs weighted connection on the outputs of the expert networks according to the third weight values corresponding to the promotion task type through task merge layer B (Task B merge layer).
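Steps 209 and 210 can be sketched together as follows; the vector dimensions and names are illustrative assumptions, chosen so the connection vector is 1-by-50 as in the example in the text:

```python
import numpy as np

def gate_weights(joint_vec, target_vec, W_gate):
    """Attention gate: connect the joint vector with a task's target
    feature vector, multiply by the second preset matrix, and softmax
    the scores into one weight per expert (the 'third weight values')."""
    conn = np.concatenate([joint_vec, target_vec])  # connection vector
    scores = conn @ W_gate                          # one score per expert
    e = np.exp(scores - scores.max())
    return e / e.sum()

rng = np.random.default_rng(2)
joint = rng.normal(size=42)          # joint vector (assumed dim)
tgt_video = rng.normal(size=8)       # target video feature vector (assumed dim)
tgt_promo = rng.normal(size=8)       # target promotion feature vector
W_gate = rng.normal(size=(50, 3))    # 50-by-3 second preset matrix, 3 experts

w_video = gate_weights(joint, tgt_video, W_gate)
w_promo = gate_weights(joint, tgt_promo, W_gate)

# Step 210: weighted connection of the three expert outputs, per task
expert_outputs = rng.normal(size=(3, 16))  # one 16-dim output per expert
merged_video = (w_video[:, None] * expert_outputs).sum(axis=0)
merged_promo = (w_promo[:, None] * expert_outputs).sum(axis=0)
print(merged_video.shape, w_video.sum())
```

The two merged vectors are what the task merge layers pass on to Tower A and Tower B respectively.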
In step 211, the server loads the output after the corresponding weighted connection to a corresponding first task training network in a preset multi-task learning model according to the video task type, outputs a first output result corresponding to the video task type, obtains a first low-order cross feature corresponding to the video task type, inputs the first low-order cross feature into the factorization model, and outputs a second output result corresponding to the first low-order cross feature.
It can be understood that, because the preset multi-task learning model captures only high-order cross features and no low-order cross features, low-order cross features are introduced in the embodiment of the present application in order to better help the preset multi-task learning model fit the data and avoid losing effective information, specifically as follows:
as shown in fig. 4b, the server may load the correspondingly weighted and connected output to the corresponding first task training network (Tower A) in the preset multi-task learning model through task merge layer A according to the video task type, and output a first output result corresponding to the video task type. To introduce low-order cross features into the combined calculation, the server may obtain a first low-order cross feature corresponding to the video task type, where the first low-order cross feature may include the user video fusion vector and the video feature vector; it inputs these into a factorization machine (FM) model and outputs a second output result corresponding to the prediction on the first low-order cross feature.
In step 212, the server loads the output after the corresponding weighted connection to a corresponding second task training network in a preset multi-task learning model according to the promotion task type, outputs a third output result corresponding to the promotion task type, obtains a second low-order cross feature corresponding to the promotion task type, inputs the second low-order cross feature into the factorization model, and outputs a fourth output result corresponding to the second low-order cross feature.
As shown in fig. 4b, the server may load the correspondingly weighted and connected output to the corresponding second task training network (Tower B) in the preset multi-task learning model through task merge layer B according to the promotion task type, and output a third output result corresponding to the promotion task type. To introduce low-order cross features into the combined calculation, the server may obtain a second low-order cross feature corresponding to the promotion task type, where the second low-order cross feature may include the user promotion fusion vector and the promotion feature vector; it inputs these into a factorization machine (FM) model and outputs a fourth output result corresponding to the prediction on the second low-order cross feature.
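A sketch of the factorization machine's pairwise-interaction term, using the standard O(nk) reformulation; the input here is an arbitrary vector standing in for the spliced low-order cross features, and all names and sizes are assumptions:

```python
import numpy as np

def fm_second_order(x, V):
    """Second-order FM term: sum over i<j of <v_i, v_j> * x_i * x_j,
    computed via the identity 0.5 * sum_f ((x@V)_f^2 - (x^2 @ V^2)_f)."""
    xv = x @ V
    return 0.5 * float(np.sum(xv ** 2 - (x ** 2) @ (V ** 2)))

def fm_predict(x, w0, w, V):
    """Full FM score: bias + linear term + pairwise interactions."""
    return w0 + float(x @ w) + fm_second_order(x, V)

rng = np.random.default_rng(4)
n, k = 6, 3                    # 6 input features, factor dimension 3 (assumed)
x = rng.normal(size=n)         # stands in for fusion vector + feature vector
V = rng.normal(size=(n, k))    # one latent factor vector per feature
w = rng.normal(size=n)
score = fm_predict(x, 0.1, w, V)

# Brute-force check of the pairwise-interaction identity
brute = sum(float(V[i] @ V[j]) * x[i] * x[j]
            for i in range(n) for j in range(i + 1, n))
print(abs(fm_second_order(x, V) - brute) < 1e-9)
```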
In step 213, the server adds the first output result and the second output result to obtain a target output result corresponding to the video task type, adds the third output result and the fourth output result to obtain a target output result corresponding to the promotion task type, compares the target output result of each task type with the corresponding label information to obtain a difference value, and adjusts the network parameters of the task training network according to the difference value until the difference value converges to obtain the trained preset multi-task learning model.
As shown in fig. 4b, the server adds the first output result and the second output result to obtain a target output result corresponding to the video task type, and because the output information corresponding to the low-order cross feature is introduced into the target output result corresponding to the video task type, the preset multi-task model can better perform data fitting, so that effective information corresponding to the low-order feature data is prevented from being lost, and the prediction result corresponding to the task of the video task type can be more accurate.
Furthermore, the server adds the third output result and the fourth output result to obtain a target output result corresponding to the promotion task type, and the information of the low-order cross feature is introduced into the target output result corresponding to the promotion task type, so that the prediction result corresponding to the task of the promotion task type is more accurate.
The label information is the label information corresponding to each task type. There are two task types: the video task type and the promotion task type. The label information corresponding to the video task type is 0 or 1, where 0 indicates the user did not click the video and 1 indicates the user clicked the video. The label information corresponding to the promotion task type is likewise 0 or 1, where 0 indicates the user did not click the promotion information and 1 indicates the user clicked it. The target output results of the video task type and the promotion task type are compared with the corresponding label information to obtain difference values, and the network parameters of the two task training networks are adjusted according to the difference values until the difference values converge, yielding the trained preset multi-task learning model.
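The comparison of each task's target output with its 0/1 click label can be sketched as a binary cross-entropy difference value; this is a common choice for click labels, and an assumption here, since the filing does not name the exact loss used in this step:

```python
import numpy as np

def bce(pred, label):
    """Difference value between a task's target output (a click
    probability) and its 0/1 click label, as binary cross-entropy."""
    p = np.clip(pred, 1e-7, 1 - 1e-7)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

# One sample per task: video label 1 (clicked), promotion label 0 (not clicked)
video_diff = bce(0.8, 1)     # video-task target output vs. its label
promo_diff = bce(0.3, 0)     # promotion-task target output vs. its label
total_diff = video_diff + promo_diff   # drives the parameter adjustment
print(float(total_diff))
```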
In step 214, the server displays the target video information and the target promotion information with the association degree with the target video information being greater than a preset threshold.
The trained preset multi-task learning model can infer, from a user's feature information, whether the user will click video information and, at the same time, whether the user will click the promotion information carried by that video information. Therefore, on the basis of pushing target video information the user is interested in, target promotion information strongly related to that target video information can be pushed simultaneously, so that the user receives both the video information of interest and the promotion information of interest carried by it. The server can directly display the target video information and the target promotion information whose association with the target video information is greater than a preset threshold, and push the display picture to the corresponding user terminal for real-time display, thereby improving the accuracy of information pushing.
In some embodiments, because the click-through rate of an advertisement is very low (only 0.35%) and the sample classes are therefore unbalanced, a Focal Loss is used when training the model to address this imbalance, placing more focus on the positive samples.
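A minimal sketch of the binary focal loss described above; the gamma and alpha values are the commonly used defaults, not values stated in the filing:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).
    The (1 - p_t)^gamma factor down-weights easy, well-classified
    samples, so the rare clicked (positive) samples drive the loss."""
    p = np.clip(np.asarray(p, dtype=float), 1e-7, 1 - 1e-7)
    y = np.asarray(y)
    pt = np.where(y == 1, p, 1 - p)            # prob of the true class
    at = np.where(y == 1, alpha, 1 - alpha)    # class-balancing factor
    return -at * (1 - pt) ** gamma * np.log(pt)

# An easy negative (p=0.01, y=0) contributes far less than a
# misclassified positive (p=0.01, y=1)
easy = focal_loss(0.01, 0)
hard = focal_loss(0.01, 1)
print(float(easy), float(hard))
```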
As can be seen from the above, in the embodiment of the application, the user feature vector after the conversion of the user feature information, the video feature vector after the conversion of the video feature information, and the popularization feature vector after the conversion of the popularization feature information are obtained; respectively carrying out attention fusion processing on the user feature vector based on the video feature vector and the popularization feature vector to obtain a user video fusion vector and a user popularization fusion vector; splicing the user feature vector, the video feature vector, the popularization feature vector, the user video fusion vector and the user popularization fusion vector to obtain a joint vector; performing feature weighting training on the preset multi-task learning model according to the joint vector and the label information and different task types to obtain a trained preset multi-task learning model; and displaying target video information and target promotion information of which the association degree with the target video information is greater than a preset threshold value, wherein the target video information and the target promotion information are obtained by pushing a user through a trained preset multi-task learning model. 
Therefore, by utilizing an attention mechanism, the user feature vector is subjected to attention fusion processing with the video feature vector and the promotion feature vector respectively to obtain a user video fusion vector and a user promotion fusion vector. These fusion vectors represent the feature information in the user feature vector that the video task and the promotion task each attend to most, so the commonality of the multiple tasks is preserved while the information of interest is captured for each task. Further, the user feature vector, the video feature vector, the promotion feature vector, the user video fusion vector, and the user promotion fusion vector are spliced to obtain a joint vector; the preset multi-task learning model is trained with feature weighting according to the different task types to obtain the trained preset multi-task learning model; and target video information and target promotion information whose association with the target video information is greater than a preset threshold are output for display. Compared with a conventional CTR-model prediction method, the trained preset multi-task learning model uses the self-attention mechanism to focus learning on the features each task needs and can predict multiple tasks at the same time, so the accuracy of the model's output results is higher and the accuracy of information processing is greatly improved.
Further, experiments show that the model of the embodiment of the present application is significantly superior to the MMoE model. [The specific experimental data table appears in the original filing only as an image: BDA0002927569210000221.]
in order to better implement the information processing method provided by the embodiment of the present application, the embodiment of the present application further provides a device based on the information processing method. The terms are the same as those in the above-described information processing method, and details of implementation may refer to the description in the method embodiment.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present disclosure, where the information processing apparatus may include an obtaining unit 301, an attention processing unit 302, a splicing unit 303, a training unit 304, a display unit 305, and the like.
An obtaining unit 301, configured to obtain a user feature vector after the user feature information is converted, a video feature vector after the video feature information is converted, and a popularization feature vector after the popularization feature information is converted.
In some embodiments, the obtaining unit 301 is configured to:
acquiring user characteristic information, video characteristic information and popularization characteristic information;
vectorizing the feature identifier corresponding to each feature domain in the user feature information to obtain a user feature domain vector corresponding to each feature domain;
splicing according to the user characteristic domain vector corresponding to each characteristic domain to obtain a user characteristic vector;
vectorizing the feature identifier corresponding to each feature domain in the video feature information to obtain a video feature domain vector corresponding to each feature domain;
splicing according to the video feature domain vector corresponding to each feature domain to obtain a video feature vector;
vectorizing the feature identifier corresponding to each feature domain in the promotion feature information to obtain a promotion feature domain vector corresponding to each feature domain;
and splicing according to the promotion feature domain vector corresponding to each feature domain to obtain the promotion feature vector.
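The vectorize-and-splice procedure of the obtaining unit can be sketched as follows; the feature-domain names, table sizes, and embedding dimension are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
EMB_DIM = 4   # assumed embedding dimension per feature domain

# One embedding table per feature domain (domain names and sizes assumed)
tables = {
    "gender": rng.normal(size=(2, EMB_DIM)),
    "age_bucket": rng.normal(size=(10, EMB_DIM)),
    "city": rng.normal(size=(100, EMB_DIM)),
}

def vectorize(feature_ids):
    """Vectorize the feature identifier of each feature domain by table
    lookup, then splice the feature-domain vectors into one feature vector."""
    domain_vecs = [tables[domain][fid] for domain, fid in feature_ids.items()]
    return np.concatenate(domain_vecs)

user_vec = vectorize({"gender": 1, "age_bucket": 3, "city": 42})
print(user_vec.shape)
```

The video feature vector and the promotion feature vector are built the same way from their own feature domains and embedding tables.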
The attention processing unit 302 is configured to perform attention fusion processing on the user feature vector based on the video feature vector and the popularization feature vector, respectively, to obtain a user video fusion vector and a user popularization fusion vector.
In some embodiments, the attention processing unit 302 includes:
the dimension reduction subunit is used for reducing the dimensions of the video feature vector and the promotion feature vector through a preset full connection layer to obtain a target video feature vector and a target promotion feature vector with preset sizes;
and the processing subunit is used for respectively carrying out attention fusion processing on the user characteristic vector according to the target video characteristic vector and the target popularization characteristic vector to obtain a user video fusion vector and a user popularization fusion vector.
In some embodiments, the processing subunit is to:
multiplying each user characteristic field vector in the user characteristic vectors by a first preset matrix vector to obtain transition vectors with corresponding quantity;
transpose and multiply each transition vector with the feature vector of the target video respectively to obtain a first weight value corresponding to each user feature domain vector;
weighting according to each user characteristic domain vector and a corresponding first weight value, and carrying out vector averaging processing on a plurality of weighted user characteristic domain vectors to obtain a user video fusion vector;
transpose and multiply each transition vector and the target popularization characteristic vector respectively to obtain a second weight value corresponding to each user characteristic domain vector;
and weighting according to each user characteristic domain vector and the corresponding second weight value, and carrying out vector averaging processing on the weighted plurality of user characteristic domain vectors to obtain a user popularization fusion vector.
And the splicing unit 303 is configured to splice the user feature vector, the video feature vector, the popularization feature vector, the user video fusion vector, and the user popularization fusion vector to obtain a joint vector.
And the training unit 304 is configured to perform feature weighting training on the preset multi-task learning model according to different task types according to the joint vector and the label information, so as to obtain the trained preset multi-task learning model.
In some embodiments, the training unit 304 includes:
the input subunit is used for inputting the joint vector to a preset multi-task learning model, and training a plurality of expert networks in the preset multi-task learning model to obtain a plurality of trained expert networks;
the determining subunit is used for determining a third weight value corresponding to each expert network under different task types;
the weighting subunit is used for performing weighting connection on the output of each expert network according to a third weighted value corresponding to each task type;
the output subunit is used for loading the corresponding output after weighted connection to a corresponding task training network in a preset multi-task learning model according to the task type and outputting a target output result corresponding to each task type;
the comparison subunit is used for comparing the target output result of each task type with the corresponding label information to obtain a difference value;
and the adjusting subunit is used for adjusting the network parameters of the task training network according to the difference value until the difference value is converged to obtain the trained preset multi-task learning model.
In some embodiments, the determining subunit is configured to:
connecting the joint vector with a target video feature vector to obtain a first connecting vector;
multiplying the first connection vector by a second preset matrix vector to obtain a third weight value corresponding to each expert network under the video task type;
connecting the joint vector with the target popularization characteristic vector to obtain a second connection vector;
and multiplying the second connection vector by a second preset matrix vector to obtain a third weight value corresponding to each expert network under the promotion task type.
In some embodiments, the weighting subunit is configured to:
performing weighted connection on the output of each expert network according to a corresponding third weight value under the video task type;
and performing weighted connection on the output of each expert network according to a corresponding third weight value under the promotion task type.
In some embodiments, the output subunit is configured to:
loading the output after the corresponding weighted connection to a corresponding first task training network in a preset multi-task learning model according to the video task type, and outputting a first output result corresponding to the video task type;
acquiring a first low-order cross feature corresponding to the video task type, inputting the first low-order cross feature into a factorization machine model, and outputting a second output result corresponding to the first low-order cross feature;
loading the corresponding weighted output to a corresponding second task training network in a preset multi-task learning model according to the promotion task type, and outputting a third output result corresponding to the promotion task type;
acquiring second low-order cross characteristics corresponding to the promotion task type, inputting the second low-order cross characteristics into a factorization model, and outputting a fourth output result corresponding to the second low-order cross characteristics;
adding the first output result and the second output result to obtain a target output result corresponding to the video task type;
and adding the third output result and the fourth output result to obtain a target output result corresponding to the promotion task type.
A display unit 305, configured to display target video information and target popularization information whose association with the target video information is greater than a preset threshold, where the target video information and the target popularization information are obtained by pushing a user through the trained preset multitask learning model.
The specific implementation of each unit can refer to the previous embodiment, and is not described herein again.
As can be seen from the above, in the embodiment of the present application, the obtaining unit 301 obtains the user feature vector after the user feature information is converted, the video feature vector after the video feature information is converted, and the popularization feature vector after the popularization feature information is converted; the attention processing unit 302 performs attention fusion processing on the user feature vector based on the video feature vector and the promotion feature vector to obtain a user video fusion vector and a user promotion fusion vector; the splicing unit 303 splices the user feature vector, the video feature vector, the popularization feature vector, the user video fusion vector and the user popularization fusion vector to obtain a joint vector; the training unit 304 performs feature weighted training on the preset multi-task learning model according to different task types according to the joint vector and the label information to obtain a trained preset multi-task learning model; the display unit 305 displays target video information and target promotion information having a degree of association with the target video information greater than a preset threshold, where the target video information and the target promotion information are obtained by pushing a user through a trained preset multitask learning model. 
Therefore, by utilizing an attention mechanism, the user feature vector is subjected to attention fusion processing with the video feature vector and the promotion feature vector respectively to obtain a user video fusion vector and a user promotion fusion vector. These fusion vectors represent the feature information in the user feature vector that the video task and the promotion task each attend to most, so the commonality of the multiple tasks is preserved while the information of interest is captured for each task. Further, the user feature vector, the video feature vector, the promotion feature vector, the user video fusion vector, and the user promotion fusion vector are spliced to obtain a joint vector; the preset multi-task learning model is trained with feature weighting according to the different task types to obtain the trained preset multi-task learning model; and target video information and target promotion information whose association with the target video information is greater than a preset threshold are output for display. Compared with a conventional CTR-model prediction method, the trained preset multi-task learning model uses the self-attention mechanism to focus learning on the features each task needs and can predict multiple tasks at the same time, so the accuracy of the model's output results is higher and the accuracy of information processing is greatly improved.
An embodiment of the present application further provides a computer device, as shown in fig. 6, which shows a schematic structural diagram of a server according to the embodiment of the present application, specifically:
the computer device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 6 does not constitute a limitation of the computer device, and may include more or fewer components than illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the computer device, connects various parts of the entire computer device using various interfaces and lines, performs various functions of the computer device and processes data by operating or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby integrally monitoring the computer device. Optionally, processor 401 may include one or more processing cores; optionally, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the server, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The computer device further comprises a power supply 403 for supplying power to the components, and optionally, the power supply 403 may be logically connected to the processor 401 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The computer device may also include an input unit 404, and the input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, so as to implement the various method steps provided by the foregoing embodiments, as follows:
acquiring a user characteristic vector after user characteristic information conversion, a video characteristic vector after video characteristic information conversion and a promotion characteristic vector after promotion characteristic information conversion; based on the video feature vector and the popularization feature vector, attention fusion processing is respectively carried out on the user feature vector to obtain a user video fusion vector and a user popularization fusion vector; splicing the user feature vector, the video feature vector, the promotion feature vector, the user video fusion vector and the user promotion fusion vector to obtain a joint vector; performing feature weighting training on a preset multi-task learning model according to the joint vector and the label information and different task types to obtain a trained preset multi-task learning model; and displaying target video information and target popularization information of which the association degree with the target video information is greater than a preset threshold, wherein the target video information and the target popularization information are obtained by pushing a user through the trained preset multitask learning model.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and parts that are not described in detail in a certain embodiment may refer to the above detailed description of the information processing method, which is not described herein again.
As can be seen from the above, the computer device according to the embodiment of the present application may obtain the user feature vector after the user feature information is converted, the video feature vector after the video feature information is converted, and the popularization feature vector after the popularization feature information is converted; respectively carrying out attention fusion processing on the user feature vector based on the video feature vector and the popularization feature vector to obtain a user video fusion vector and a user popularization fusion vector; splicing the user feature vector, the video feature vector, the popularization feature vector, the user video fusion vector and the user popularization fusion vector to obtain a joint vector; performing feature weighting training on the preset multi-task learning model according to the joint vector and the label information and different task types to obtain a trained preset multi-task learning model; and displaying the target video information and target popularization information of which the association degree with the target video information is greater than a preset threshold value, wherein the target video information and the target popularization information are obtained by pushing a user through a trained preset multitask learning model. 
Therefore, by using an attention mechanism, the user feature vector is subjected to attention fusion processing with the video feature vector and the promotion feature vector respectively to obtain a user video fusion vector and a user promotion fusion vector. These fusion vectors represent the feature information in the user feature vector that the video task and the promotion task each attend to most, so the commonality of the multiple tasks is preserved while the information of interest is captured for each task. Further, the user feature vector, the video feature vector, the promotion feature vector, the user video fusion vector, and the user promotion fusion vector are spliced to obtain a joint vector; the preset multi-task learning model is trained with feature weighting according to the different task types to obtain the trained preset multi-task learning model; and target video information and target promotion information whose association with the target video information is greater than a preset threshold are output for display. Compared with a conventional CTR-model prediction method, the trained preset multi-task learning model uses the self-attention mechanism to focus learning on the features each task needs and can predict multiple tasks at the same time, so the accuracy of the model's output results is higher and the accuracy of information processing is greatly improved.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions, or by associated hardware controlled by instructions, where the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer-readable storage medium storing a plurality of instructions that can be loaded by a processor to execute the steps of any information processing method provided in the embodiments of the present application. For example, the instructions may perform the following steps:
acquiring a user feature vector converted from user feature information, a video feature vector converted from video feature information, and a popularization feature vector converted from popularization feature information; performing attention fusion processing on the user feature vector based on the video feature vector and the popularization feature vector, respectively, to obtain a user video fusion vector and a user popularization fusion vector; splicing the user feature vector, the video feature vector, the popularization feature vector, the user video fusion vector, and the user popularization fusion vector to obtain a joint vector; performing feature weighting training on a preset multi-task learning model according to the joint vector and the label information and according to different task types, to obtain a trained preset multi-task learning model; and displaying target video information and target popularization information whose degree of association with the target video information is greater than a preset threshold, wherein the target video information and the target popularization information are pushed to the user by the trained preset multi-task learning model.
According to an aspect of the present application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method of the various alternative implementations described in the above embodiments.
Details of the above operations can be found in the foregoing embodiments and are not repeated here.
The computer-readable storage medium may include: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps of any information processing method provided in the embodiments of the present application, they can achieve the beneficial effects of any such method; for details, see the foregoing embodiments, which are not repeated here.
The foregoing has described in detail the information processing method, apparatus, and computer-readable storage medium provided in the embodiments of the present application. Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, those skilled in the art may, according to the idea of the present application, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (11)

1. An information processing method characterized by comprising:
acquiring a user characteristic vector after user characteristic information conversion, a video characteristic vector after video characteristic information conversion and a promotion characteristic vector after promotion characteristic information conversion;
respectively carrying out attention fusion processing on the user feature vector based on the video feature vector and the popularization feature vector to obtain a user video fusion vector and a user popularization fusion vector;
splicing the user feature vector, the video feature vector, the popularization feature vector, the user video fusion vector and the user popularization fusion vector to obtain a joint vector;
performing feature weighting training on a preset multi-task learning model according to the joint vector and the label information and according to different task types, to obtain a trained preset multi-task learning model;
and displaying target video information and target popularization information of which the degree of association with the target video information is greater than a preset threshold value, wherein the target video information and the target popularization information are pushed to the user by the trained preset multi-task learning model.
2. The information processing method according to claim 1, wherein the step of performing attention fusion processing on the user feature vector based on the video feature vector and the popularization feature vector to obtain a user video fusion vector and a user popularization fusion vector comprises:
reducing the dimensions of the video feature vector and the promotion feature vector through a preset full connection layer to obtain a target video feature vector and a target promotion feature vector with preset sizes;
and respectively carrying out attention fusion processing on the user characteristic vector according to the target video characteristic vector and the target popularization characteristic vector to obtain a user video fusion vector and a user popularization fusion vector.
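The dimension-reduction step of claim 2 amounts to passing each feature vector through a fully connected layer that projects it to the preset size. A minimal NumPy sketch, where the input sizes (64 and 48), the preset size (16), and the per-input projection weights are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def fully_connected(x, weight, bias):
    """A single dense layer: projects x down to the preset target size."""
    return x @ weight + bias

video_vec = rng.standard_normal(64)   # raw video feature vector (hypothetical size)
promo_vec = rng.standard_normal(48)   # raw popularization feature vector
target_dim = 16                       # preset size shared by both targets

# Independent (hypothetical) projection parameters for each input vector.
w_video, b_video = rng.standard_normal((64, target_dim)), np.zeros(target_dim)
w_promo, b_promo = rng.standard_normal((48, target_dim)), np.zeros(target_dim)

target_video_vec = fully_connected(video_vec, w_video, b_video)
target_promo_vec = fully_connected(promo_vec, w_promo, b_promo)
```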
3. The information processing method according to claim 2, wherein the step of performing attention fusion processing on the user feature vector according to the target video feature vector and the target popularization feature vector to obtain a user video fusion vector and a user popularization fusion vector comprises:
multiplying each user feature domain vector in the user feature vectors by a first preset matrix vector to obtain transition vectors with corresponding quantity;
transposing and multiplying each transition vector with the target video feature vector, respectively, to obtain a first weight value corresponding to each user feature domain vector;
weighting according to each user characteristic domain vector and the corresponding first weight value, and carrying out vector averaging processing on the weighted plurality of user characteristic domain vectors to obtain a user video fusion vector;
transposing and multiplying each transition vector with the target popularization feature vector, respectively, to obtain a second weight value corresponding to each user feature domain vector;
and weighting according to each user characteristic domain vector and the corresponding second weight value, and carrying out vector averaging processing on the weighted plurality of user characteristic domain vectors to obtain a user popularization fusion vector.
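The attention fusion of claim 3 can be sketched as follows. The number of user feature domains (5), the vector dimension (16), and the softmax normalization of the weight values are illustrative assumptions; the claim itself only specifies the matrix multiplication, the transposed products, the weighting, and the vector averaging:

```python
import numpy as np

rng = np.random.default_rng(2)
n_domains, d = 5, 16  # hypothetical: 5 user feature domains, 16-dim vectors

user_domains = rng.standard_normal((n_domains, d))  # one vector per feature domain
W1 = rng.standard_normal((d, d))                    # "first preset matrix vector"
target_video_vec = rng.standard_normal(d)
target_promo_vec = rng.standard_normal(d)

def attention_fuse(domains, transition_matrix, target_vec):
    # Step 1: multiply each domain vector by the preset matrix -> transition vectors.
    transitions = domains @ transition_matrix           # (n_domains, d)
    # Step 2: transposed product with the target vector -> one scalar weight per domain.
    weights = transitions @ target_vec                  # (n_domains,)
    weights = np.exp(weights - weights.max())
    weights /= weights.sum()                            # softmax normalization (assumed)
    # Step 3: weight each domain vector and average into a single fused vector.
    weighted = domains * weights[:, None]
    return weighted.mean(axis=0)

user_video_fused = attention_fuse(user_domains, W1, target_video_vec)
user_promo_fused = attention_fuse(user_domains, W1, target_promo_vec)
```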
4. The information processing method according to claim 1, wherein the step of performing feature weighting training on the preset multi-task learning model according to the joint vector and the label information and according to different task types, to obtain the trained preset multi-task learning model, comprises:
inputting the joint vector into a preset multi-task learning model, and training a plurality of expert networks in the preset multi-task learning model to obtain a plurality of trained expert networks;
determining a third weight value corresponding to each expert network under different task types;
performing weighted connection on the output of each expert network according to a corresponding third weight value under each task type;
loading the output after the corresponding weighted connection to a corresponding task training network in a preset multi-task learning model according to the task type, and outputting a target output result corresponding to each task type;
comparing the target output result of each task type with corresponding label information to obtain a difference value;
and adjusting the network parameters of the task training network according to the difference value until the difference value is converged to obtain a trained preset multi-task learning model.
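Claim 4 describes a mixture-of-experts forward pass: several shared expert networks transform the joint vector, and each task mixes the expert outputs with its own weights before its task tower. A sketch under assumed sizes; the tanh experts, softmax gates, and sigmoid task towers are illustrative choices, not specified by the claim:

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_exp, n_experts = 160, 32, 4   # hypothetical sizes

joint_vec = rng.standard_normal(d_in)
expert_ws = rng.standard_normal((n_experts, d_in, d_exp))  # one weight matrix per expert
tower_ws = {"video": rng.standard_normal(d_exp),
            "promo": rng.standard_normal(d_exp)}
gate_ws = {"video": rng.standard_normal((d_in, n_experts)),
           "promo": rng.standard_normal((d_in, n_experts))}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Each shared expert network transforms the joint vector independently.
expert_outs = np.stack([np.tanh(joint_vec @ w) for w in expert_ws])  # (n_experts, d_exp)

results = {}
for task in ("video", "promo"):
    gate = softmax(joint_vec @ gate_ws[task])   # "third weight value" per expert
    mixed = gate @ expert_outs                  # weighted connection of expert outputs
    # Task training network (a single sigmoid unit here) -> target output result.
    results[task] = 1.0 / (1.0 + np.exp(-(mixed @ tower_ws[task])))
```

In training, `results[task]` would be compared with the task's label to obtain the difference value, and the network parameters adjusted until convergence.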
5. The information processing method according to claim 4, wherein the step of determining a third weight value corresponding to each expert network under different task types comprises:
connecting the joint vector with a target video feature vector to obtain a first connecting vector;
multiplying the first connection vector by a second preset matrix vector to obtain a third weight value corresponding to each expert network under the video task type;
connecting the joint vector with the target popularization characteristic vector to obtain a second connection vector;
and multiplying the second connection vector by a second preset matrix vector to obtain a third weight value corresponding to each expert network under the promotion task type.
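The gate computation of claim 5 connects the joint vector with the task-specific feature vector and multiplies by a second preset matrix to obtain per-expert weights. A sketch; the dimensions and the softmax normalization are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
d_joint, d_task, n_experts = 160, 16, 4   # hypothetical sizes

joint_vec = rng.standard_normal(d_joint)
target_video_vec = rng.standard_normal(d_task)
W2 = rng.standard_normal((d_joint + d_task, n_experts))  # "second preset matrix vector"

# Connect (concatenate) the joint vector with the target video feature vector
# to form the first connection vector, then multiply by the preset matrix
# to get one weight value per expert for the video task type.
first_connection = np.concatenate([joint_vec, target_video_vec])
logits = first_connection @ W2
gate_video = np.exp(logits - logits.max())
gate_video /= gate_video.sum()   # softmax normalization (assumed)
```

The popularization-task gate would be computed the same way from a second connection vector built with the target popularization feature vector.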
6. The information processing method according to claim 5, wherein the step of connecting outputs of each expert network in a weighted manner according to the third weight value corresponding to each task type includes:
carrying out weighted connection on the output of each expert network according to a corresponding third weight value under the video task type;
and performing weighted connection on the output of each expert network according to a corresponding third weight value under the promotion task type.
7. The information processing method according to claim 6, wherein the step of loading the output after the corresponding weighted connection to a corresponding task training network in a preset multi-task learning model according to the task type and outputting a target output result corresponding to each task type comprises:
loading the output after the corresponding weighted connection to a corresponding first task training network in a preset multi-task learning model according to the video task type, and outputting a first output result corresponding to the video task type;
acquiring a first low-order cross feature corresponding to the video task type, inputting the first low-order cross feature into a factorization model, and outputting a second output result corresponding to the first low-order cross feature;
loading the output after the corresponding weighted connection to a corresponding second task training network in the preset multi-task learning model according to the promotion task type, and outputting a third output result corresponding to the promotion task type;
acquiring second low-order cross characteristics corresponding to the promotion task type, inputting the second low-order cross characteristics into a factorization model, and outputting a fourth output result corresponding to the second low-order cross characteristics;
adding the first output result and the second output result to obtain a target output result corresponding to the video task type;
and adding the third output result and the fourth output result to obtain a target output result corresponding to the promotion task type.
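Claim 7 adds each task tower's output to a factorization-machine output computed from the task's low-order cross features, a combination in the style of DeepFM. A sketch, where the second-order FM term uses the standard sum-of-squares identity and the concrete numbers (a tower output of 0.8, six 8-dimensional feature embeddings) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def fm_second_order(embeddings):
    """Second-order factorization-machine term over low-order cross features:
    0.5 * sum_k ((sum_i v_ik)^2 - sum_i v_ik^2)."""
    s = embeddings.sum(axis=0)
    return 0.5 * float((s * s - (embeddings * embeddings).sum(axis=0)).sum())

deep_out_video = 0.8  # "first output result": the video task training network
fm_out_video = fm_second_order(rng.standard_normal((6, 8)))  # "second output result"

# Target output result for the video task type: the sum of both branches
# (commonly followed by a sigmoid when a click probability is needed).
target_video = deep_out_video + fm_out_video
```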
8. The information processing method according to any one of claims 1 to 7, wherein the step of obtaining the user feature vector after the user feature information conversion, the video feature vector after the video feature information conversion, and the popularization feature vector after the popularization feature information conversion includes:
acquiring user characteristic information, video characteristic information and popularization characteristic information;
vectorizing the feature identifier corresponding to each feature domain in the user feature information to obtain a user feature domain vector corresponding to each feature domain;
splicing according to the user characteristic domain vector corresponding to each characteristic domain to obtain a user characteristic vector;
vectorizing the feature identifier corresponding to each feature domain in the video feature information to obtain a video feature domain vector corresponding to each feature domain;
splicing according to the video feature domain vector corresponding to each feature domain to obtain a video feature vector;
vectorizing the feature identifier corresponding to each feature domain in the popularization feature information to obtain a popularization feature domain vector corresponding to each feature domain;
and splicing according to the promotion feature domain vector corresponding to each feature domain to obtain the promotion feature vector.
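Claim 8's vectorization-then-splicing can be sketched as an embedding lookup per feature domain followed by concatenation. The feature-domain names, table sizes, and embedding dimension below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)
d = 8  # hypothetical embedding size per feature domain

# One (hypothetical) embedding table per user feature domain,
# indexed by the feature identifier of that domain.
tables = {
    "age_bucket": rng.standard_normal((10, d)),
    "gender":     rng.standard_normal((3, d)),
    "region":     rng.standard_normal((50, d)),
}
user_feature_ids = {"age_bucket": 2, "gender": 1, "region": 17}

# Vectorize each feature identifier via its domain's table, then splice
# the resulting feature domain vectors into the user feature vector.
domain_vecs = [tables[name][fid] for name, fid in user_feature_ids.items()]
user_vec = np.concatenate(domain_vecs)
```

The video feature vector and popularization feature vector would be built the same way from their own feature domains and tables.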
9. An information processing apparatus characterized by comprising:
the acquisition unit is used for acquiring the user characteristic vector after the user characteristic information is converted, the video characteristic vector after the video characteristic information is converted and the popularization characteristic vector after the popularization characteristic information is converted;
the attention processing unit is used for respectively carrying out attention fusion processing on the user feature vector based on the video feature vector and the popularization feature vector to obtain a user video fusion vector and a user popularization fusion vector;
the splicing unit is used for splicing the user characteristic vector, the video characteristic vector, the promotion characteristic vector, the user video fusion vector and the user promotion fusion vector to obtain a joint vector;
the training unit is used for performing feature weighting training on a preset multi-task learning model according to the joint vector and the label information and according to different task types, to obtain a trained preset multi-task learning model;
and the display unit is used for displaying target video information and target popularization information of which the degree of association with the target video information is greater than a preset threshold, wherein the target video information and the target popularization information are pushed to the user by the trained preset multi-task learning model.
10. A computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the information processing method according to any one of claims 1 to 8.
11. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps in the information processing method of any one of claims 1 to 8 when executing the computer program.
CN202110138145.8A 2021-02-01 2021-02-01 Information processing method and device and computer readable storage medium Pending CN114926192A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110138145.8A CN114926192A (en) 2021-02-01 2021-02-01 Information processing method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110138145.8A CN114926192A (en) 2021-02-01 2021-02-01 Information processing method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN114926192A true CN114926192A (en) 2022-08-19

Family

ID=82804115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110138145.8A Pending CN114926192A (en) 2021-02-01 2021-02-01 Information processing method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114926192A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385070A (en) * 2023-01-18 2023-07-04 中国科学技术大学 Multi-target prediction method, system, equipment and storage medium for short video advertisement of E-commerce
CN116385070B (en) * 2023-01-18 2023-10-03 中国科学技术大学 Multi-target prediction method, system, equipment and storage medium for short video advertisement of E-commerce

Similar Documents

Publication Publication Date Title
US11741711B2 (en) Video classification method and server
CN113254792B (en) Method for training recommendation probability prediction model, recommendation probability prediction method and device
CN113392359A (en) Multi-target prediction method, device, equipment and storage medium
CN112131430A (en) Video clustering method and device, storage medium and electronic equipment
CN113392317A (en) Label configuration method, device, equipment and storage medium
CN112165639B (en) Content distribution method, device, electronic equipment and storage medium
CN111563158A (en) Text sorting method, sorting device, server and computer-readable storage medium
CN113158554A (en) Model optimization method and device, computer equipment and storage medium
CN112036954A (en) Item recommendation method and device, computer-readable storage medium and electronic device
Lang et al. Movie recommendation system for educational purposes based on field-aware factorization machine
CN113379449A (en) Multimedia resource recall method and device, electronic equipment and storage medium
WO2024051707A1 (en) Recommendation model training method and apparatus, and resource recommendation method and apparatus
CN113821634A (en) Content classification method and device, electronic equipment and storage medium
CN114926192A (en) Information processing method and device and computer readable storage medium
CN115482021A (en) Multimedia information recommendation method and device, electronic equipment and storage medium
CN116975686A (en) Method for training student model, behavior prediction method and device
CN116957128A (en) Service index prediction method, device, equipment and storage medium
CN115129930A (en) Video information processing method and device, computer equipment and storage medium
CN114282606A (en) Object identification method and device, computer readable storage medium and computer equipment
CN113569067A (en) Label classification method and device, electronic equipment and computer readable storage medium
CN112446738A (en) Advertisement data processing method, device, medium and electronic equipment
CN113704544A (en) Video classification method and device, electronic equipment and storage medium
CN111897943A (en) Session record searching method and device, electronic equipment and storage medium
CN114912009A (en) User portrait generation method, device, electronic equipment and computer program medium
CN114817697A (en) Method and device for determining label information, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination