CN113742566B

CN113742566B - Recommendation method and device for multimedia information, electronic equipment and storage medium

Info

Publication number: CN113742566B
Application number: CN202010478099.1A
Authority: CN
Inventors: 杨晓宇; 卞俊杰; 叶璨
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2020-05-29
Filing date: 2020-05-29
Publication date: 2024-01-02
Anticipated expiration: 2040-05-29
Also published as: CN113742566A

Abstract

The disclosure relates to a recommendation method and device for multimedia information, an electronic device, and a storage medium. The recommendation method comprises the following steps: in response to a received multimedia information recommendation request, acquiring historical behavior characteristics of the account that sent the request, wherein the historical behavior characteristics at least comprise interaction information between the account and live broadcast content; determining, at least according to the interaction information between the account and live broadcast content, whether the account is a live broadcast active user; and, in the case that the account is not a live broadcast active user, judging, according to the basic characteristics and the historical behavior characteristics of the account, whether to return a live broadcast aggregation page to the account while returning a multimedia resource, wherein the historical behavior characteristics further comprise interaction information between the account and non-live broadcast content, the live broadcast aggregation page is used for gathering M live broadcast contents and displaying them through N display positions, M and N are natural numbers, and M is greater than N.

Description

Recommendation method and device for multimedia information, electronic equipment and storage medium

Technical Field

The disclosure relates to the field of internet, and in particular relates to a recommendation method and device for multimedia information, electronic equipment and a storage medium.

Background

With the rapid development of network technology and the widespread use of electronic equipment, applications (APPs) for users have developed rapidly and their functions have become increasingly rich; application software with various functions and forms now fills people's life, work, and study. At the same time, in an increasingly rich application market, application function recommendation is becoming a necessary means of improving the market coverage of application functions.

However, in the related art, recommendations are made to all users. A recommendation mode that does not distinguish user groups causes information interference for users who already have the habit of using the application function. Even when recommendations are made after user groups are distinguished, the recommendation rules in the related art are set by manual experience and are often limited by the dimensions of manual thinking and by subjective knowledge, so the users identified for recommendation are inaccurate and the recommendation efficiency is low.

Disclosure of Invention

The disclosure provides a recommendation method, a recommendation device, an electronic device and a storage medium for multimedia information, so as to at least solve the technical problems in the related art. The technical scheme of the present disclosure is as follows:

According to a first aspect of an embodiment of the present disclosure, a method for recommending multimedia information is provided, the method including:

responding to a received multimedia information recommendation request, and acquiring historical behavior characteristics of an account for sending the multimedia information recommendation request, wherein the historical behavior characteristics at least comprise interaction information of the account and live broadcast content;

determining whether the account is a live active user or not according to at least the interaction information of the account and live content;

and, in the case that the account is not a live broadcast active user, judging, according to the basic characteristics and the historical behavior characteristics of the account, whether to return a live broadcast aggregation page to the account while returning a multimedia resource, wherein the historical behavior characteristics further comprise interaction information between the account and non-live broadcast content, the live broadcast aggregation page is used for gathering M live broadcast contents and displaying them through N display positions, M and N are natural numbers, and M is greater than N.

Optionally, the returning the live broadcast aggregation page to the account includes:

displaying a live broadcast aggregation entry in a page of the application program logged in by the account;

and detecting triggering operation of the live broadcast aggregation entry, and jumping to a live broadcast aggregation page corresponding to the live broadcast aggregation entry.

Optionally, the method further comprises:

and returning the multimedia resource corresponding to the multimedia information recommendation request under the condition that the account is a live active user.

Optionally, the judging, according to the basic characteristics and the historical behavior characteristics of the account, whether to return the live broadcast aggregation page to the account while returning the multimedia resource includes:

vectorizing the basic features and the historical behavior features of the account to obtain account feature vectors corresponding to the basic features and the historical behavior features;

extracting features of the account feature vector based on the trained deep network model;

and determining whether to return the live broadcast aggregation page to the account while returning the multimedia resource according to the feature vector extracted by the deep network model.

Optionally, the deep network model includes a first network model and a second network model, and the method further includes:

determining, based on an experience pool serving as training samples, the immediate feedback r_j obtained after an agent executes a recommendation control action a_j in a user state s_j, so as to determine actual evaluation data y_j according to the immediate feedback r_j;

based on the actual evaluation data y_j and the predicted evaluation data Q(s_j, a_j) determined by the first network model of the deep network model, adjusting model parameters of the first network model, so that the deep network model including the first network model with adjusted parameters is taken as the trained deep network model.

Optionally, the determining actual evaluation data y_j according to the immediate feedback r_j comprises:

when it is detected that the interval duration between the user state s_j and the next user state s_{j+1} is greater than a preset duration threshold T_max, assigning the immediate feedback r_j to the actual evaluation data y_j;

when it is detected that the interval duration between the user state s_j and the next user state s_{j+1} is not greater than the preset duration threshold T_max, determining the actual evaluation data y_j according to the immediate feedback r_j and the result, obtained from the second network model of the deep network model, that corresponds to the next user state s_{j+1}.

Optionally, the determining whether the account is a live active user at least according to the interaction information of the account and the live content includes:

and extracting features of at least the interaction information of the account and the live content based on a neural network regression model, so as to determine whether the account is a live active account according to the extracted features, wherein the neural network regression model is pre-trained based on a user information sample set, and the user information sample set at least comprises interaction information samples of the live content and activity degree labeling information corresponding to the interaction information samples.

Optionally, the determining whether the account is a live active account according to the extracted features includes:

determining a feedback value of the account as a live active user according to the extracted characteristics; when the feedback value is detected to be lower than a preset feedback threshold value, determining that the account is a live inactive user; when the feedback value is detected not to be lower than a preset feedback threshold value, determining that the account is a live active user; or,

determining a first feedback value of the account as a live active user according to the extracted characteristics, and determining a second feedback value of the account as the live active user; when the first feedback value is detected to be lower than the second feedback value, determining that the account is a live inactive user; and when the first feedback value is detected not to be lower than the second feedback value, determining that the account is a live active user.

According to a second aspect of the embodiments of the present disclosure, there is provided a recommendation apparatus for multimedia information, the apparatus including:

the characteristic acquisition module is used for responding to the received multimedia information recommendation request and acquiring historical behavior characteristics of an account for sending the multimedia information recommendation request, wherein the historical behavior characteristics at least comprise interaction information of the account and live broadcast content;

the user determining module is used for determining whether the account is a live active user at least according to the interaction information of the account and the live content;

the operation determining module is used for judging whether to return the live broadcast aggregation page to the account while returning the multimedia resource according to the basic characteristics of the account and the historical behavior characteristics under the condition that the account is not the live broadcast active user, wherein the historical behavior characteristics further comprise: the account and the interaction information of the non-live broadcast content, the live broadcast aggregation page is used for gathering M live broadcast contents and displaying the live broadcast aggregation page through N display positions, M, N is a natural number, and M is more than N.

Optionally, the operation determining module is specifically configured to:

Optionally, the apparatus further comprises:

and the resource return module returns the multimedia resource corresponding to the multimedia information recommendation request under the condition that the account is a live active user.

Optionally, the operation determining module is further configured to:

Optionally, the deep network model includes a first network model and a second network model, and the apparatus further includes:

the immediate feedback determining module, used for determining, based on an experience pool serving as training samples, the immediate feedback r_j obtained after an agent executes a recommendation control action a_j in a user state s_j, so as to determine actual evaluation data y_j according to the immediate feedback r_j;

the model parameter adjustment module, used for adjusting model parameters of the first network model based on the actual evaluation data y_j and the predicted evaluation data Q(s_j, a_j) determined by the first network model of the deep network model, so that the deep network model including the first network model with adjusted parameters is taken as the trained deep network model.

Optionally, the operation determining module is further configured to:

when it is detected that the interval duration between the user state s_j and the next user state s_{j+1} is not greater than a preset duration threshold T_max, determining the actual evaluation data y_j according to the immediate feedback r_j and the result, obtained from the second network model of the deep network model, that corresponds to the next user state s_{j+1}.

Optionally, the user determining module is specifically configured to:

Optionally, the user determination module is further configured to:

According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, including:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the method for recommending multimedia information according to any of the embodiments described above.

According to a fourth aspect of the embodiments of the present disclosure, a storage medium is provided; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method for recommending multimedia information according to any of the embodiments described above.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product configured to perform the method of recommending multimedia information according to any of the embodiments described above.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

According to the embodiments of the present disclosure, when a multimedia information recommendation request is received, the historical behavior characteristics of the account that sent the request can be acquired, and whether the account is a live broadcast active user is determined according to the acquired historical behavior characteristics. In the case that the account is not a live broadcast active user, whether to return a live broadcast aggregation page to the account while returning a multimedia resource is judged according to the basic characteristics and the historical behavior characteristics of the account. By distinguishing whether a user is a live broadcast active user, the information interference caused by recommending live broadcast to users who already have live broadcast usage habits can be reduced. In addition, because whether to return the live broadcast aggregation page is determined from the basic characteristics and the historical behavior characteristics of the account, the low recommendation efficiency that results from fixed recommendation rules set by manual experience is avoided, and the efficiency of the live broadcast recommendation performed for the account is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.

FIG. 1 is a flowchart of a method for recommending multimedia information according to an exemplary embodiment of the present application;

FIG. 2 is a flowchart of a deep network model training method for multimedia information recommendation according to an exemplary embodiment of the present application;

FIG. 3 is a flowchart of another deep network model training method for multimedia information recommendation according to an exemplary embodiment of the present application;

FIG. 4 is a schematic diagram of a live broadcast function recommendation according to an exemplary embodiment of the present disclosure;

FIG. 5 is a schematic block diagram of a recommendation device for multimedia information according to a first exemplary embodiment of the present disclosure;

FIG. 6 is a schematic block diagram of a recommendation device for multimedia information according to a second exemplary embodiment of the present disclosure;

FIG. 7 is a schematic block diagram of a deep network model training apparatus for multimedia information recommendation according to a first exemplary embodiment of the present disclosure;

FIG. 8 is a schematic block diagram of a deep network model training apparatus for multimedia information recommendation according to a second exemplary embodiment of the present disclosure;

FIG. 9 is a schematic block diagram of a deep network model training apparatus for multimedia information recommendation according to a third exemplary embodiment of the present disclosure;

FIG. 10 is a schematic block diagram of a deep network model training apparatus for multimedia information recommendation according to a fourth exemplary embodiment of the present disclosure;

FIG. 11 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.

An APP may include application functions such as video recording, video editing, picture editing, and short video sharing communities. Each application function has its own audience group, and the audiences of different functions may overlap. By sending recommendation information about an application function to users outside its audience group, those users can come into contact with the recommended function through the recommendation information and may go on to explore it in depth, thereby increasing the size of the function's audience group.

Taking the live broadcast function as an example, when user A uses the video recording function, an access entry for other application functions, such as an aggregation entry for the live broadcast function, can be added to the video recording product interface. The user can then enter the product page of the live broadcast function through the aggregation entry and experience the live broadcast function, so that an audience that originally used only the video recording function becomes a shared audience of the video recording and live broadcast functions, which increases the number of users of the live broadcast function.

However, once a user has developed stable usage habits for the live broadcast function to be recommended, repeatedly recommending that function interferes with the user and degrades the interaction experience with the product containing the live broadcast function. In the related art, if the live broadcast function is recommended to all known users, every user receives recommendation information about it regardless of whether the user needs it, which causes information interference for users who already have live broadcast usage habits and affects their experience of the product. If the live broadcast function is instead recommended according to fixed rules determined by manual experience, the rules are often limited by the dimensions of manual thinking or by subjective cognition, which leads to problems such as inconsistent user group division and poor recommendation effect.

In view of the foregoing, the present disclosure provides a method, an apparatus, an electronic device, and a storage medium for recommending multimedia information, so as to at least solve the problems in the related art, and in order to explain the technical solutions of the present application, the technical solutions of the present application are described in detail below through a plurality of embodiments.

FIG. 1 is a flowchart of a method for recommending multimedia information according to an exemplary embodiment of the present application. As shown in FIG. 1, the method may include the following steps:

Step 101, in response to a received multimedia information recommendation request, acquiring historical behavior characteristics of the account that sent the request, wherein the historical behavior characteristics at least comprise interaction information between the account and live broadcast content.

In an embodiment, when the multimedia information recommendation request is received, a historical behavior characteristic of an account sending the multimedia information recommendation request may be obtained, where the historical behavior characteristic at least includes interaction information of the account and live content.

Specifically, the interaction information between the account and live broadcast content may be written into the multimedia information recommendation request, so that it can be determined by parsing the received request. Alternatively, user identification information, such as a user account or user identity credentials, may be determined from the received request, and the interaction information matching that user identification information is then looked up in a maintained association between user identification information and the interaction information between accounts and live broadcast content.
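The two ways of obtaining the interaction information described above can be sketched as follows. This is a minimal illustration only: the request field names (`live_interaction`, `user_id`) and the in-memory `INTERACTION_STORE` are hypothetical and do not come from the disclosure.

```python
from typing import Optional

# Hypothetical store that maintains the association between user
# identification info and live-content interaction info (assumption).
INTERACTION_STORE = {
    "user_42": {"live_views": 3, "live_watch_seconds": 120},
}


def resolve_live_interaction(request: dict) -> Optional[dict]:
    """Return the account's live-content interaction info for a request."""
    # Option 1: the interaction info is carried in the request itself.
    if "live_interaction" in request:
        return request["live_interaction"]
    # Option 2: look it up through the user identification info in the request.
    return INTERACTION_STORE.get(request.get("user_id"))
```

A request that carries neither field, or an unknown user, yields `None`, which a caller could treat as "no live interaction history".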

Further, in addition to the interaction information between the account and live broadcast content, the following may be included: user basic information, such as the number of accounts the user follows, the number of fans, age, gender, region, and number of active days; user behavior information, such as the number of exposures, the number of clicks, and the usage duration for a preset live broadcast; and context information, such as the exposure rate, click rate, and duration of entering live broadcast for a preset live broadcast over a past period of time. The present disclosure lists these kinds of user basic information, user behavior information, and context information only as examples and does not limit the specific information contained in each category; other information categories may also be added according to actual application needs.

Step 102, determining whether the account is a live active user at least according to the interaction information of the account and the live content.

In an embodiment, the basic features and historical behavior features of the account may be vectorized to obtain the corresponding account feature vector. Feature extraction is then performed on the account feature vector based on a trained deep network model, and whether to return the live broadcast aggregation page to the account while returning the multimedia resource is determined according to the feature vector extracted by the deep network model.

Specifically, the deep network model may be a mathematical model of the sequential decision class that simulates the stochastic policies and returns achievable by an agent in an environment with the Markov property, such as a Double Deep Q-Network (DDQN). Feature extraction is performed on the account feature vector through the trained deep network model, so that whether to return the live broadcast aggregation page to the account while returning the multimedia resource is determined according to the extracted feature vector.
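As a rough illustration of how a Q-network style model can drive this decision, the sketch below picks the action with the highest estimated value for the current account feature vector. The two-action encoding, the function names, and the toy value function are assumptions made for illustration; the disclosure does not specify them.

```python
from typing import Callable, Sequence

# Hypothetical action set (assumption): 0 = return only the multimedia
# resource, 1 = also return the live broadcast aggregation page.
ACTIONS = (0, 1)


def choose_action(q_fn: Callable[[Sequence[float], int], float],
                  state: Sequence[float]) -> int:
    """Pick the action with the highest estimated value Q(state, action)."""
    return max(ACTIONS, key=lambda action: q_fn(state, action))


def toy_q(state: Sequence[float], action: int) -> float:
    """Toy stand-in for a trained network (assumption, not a real model)."""
    # Returning the page is valued by the first feature; skipping it is 0.5.
    return state[0] if action == 1 else 0.5
```

With `toy_q`, an account vector whose first feature exceeds 0.5 leads to action 1 (return the aggregation page), otherwise to action 0.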

In practical applications, the deep network model may include a first network model and a second network model, and the training process of the deep network model used for feature extraction may include the following: based on an experience pool serving as training samples, determining the immediate feedback r_j obtained after an agent executes a recommendation control action a_j in a user state s_j, so as to determine actual evaluation data y_j according to the immediate feedback r_j; and then, based on the actual evaluation data y_j and the predicted evaluation data Q(s_j, a_j) determined by the first network model of the deep network model, adjusting the model parameters of the first network model, so that the deep network model including the first network model with adjusted parameters is taken as the trained deep network model.
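The parameter adjustment step can be illustrated with a deliberately small stand-in. The sketch below replaces the first network model with a linear function per action and applies one gradient step on the squared difference between y_j and Q(s_j, a_j); the linear form, the learning rate, and the sample values are assumptions, and a real system would use a deep network and an optimizer from a deep learning library.

```python
def q_value(weights, state, action):
    """Linear stand-in for the first network model: Q(s, a) = w[a] . s."""
    return sum(w * x for w, x in zip(weights[action], state))


def adjust_first_network(weights, s_j, a_j, y_j, lr=0.1):
    """One gradient step that moves Q(s_j, a_j) toward the target y_j."""
    error = y_j - q_value(weights, s_j, a_j)
    weights[a_j] = [w + lr * error * x for w, x in zip(weights[a_j], s_j)]
    return error


# Experience pool of (s_j, a_j, y_j) samples. The targets y_j are assumed
# to have been computed already; all values here are made up.
experience_pool = [
    ([1.0, 0.0], 1, 1.0),
    ([0.0, 1.0], 0, 0.0),
]
weights = {0: [0.0, 0.0], 1: [0.0, 0.0]}
for _ in range(50):
    for s_j, a_j, y_j in experience_pool:
        adjust_first_network(weights, s_j, a_j, y_j)
```

After the loop, the predicted value for the first sample is close to its target, which is the behavior the adjustment step is meant to produce.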

Further, the process of determining the actual evaluation data y_j according to the immediate feedback r_j may include: when it is detected that the interval duration between the user state s_j and the next user state s_{j+1} is greater than a preset duration threshold T_max, assigning the immediate feedback r_j to the actual evaluation data y_j; and when it is detected that the interval duration between the user state s_j and the next user state s_{j+1} is not greater than the preset duration threshold T_max, determining the actual evaluation data y_j according to the immediate feedback r_j and the result, obtained from the second network model of the deep network model, that corresponds to the next user state s_{j+1}.
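A minimal sketch of this two-branch target follows. The text above only says that y_j depends on r_j and on the second network model's result for the next user state; the discount factor `gamma` and the rule of selecting the next action with the first network and evaluating it with the second are assumptions borrowed from the standard Double DQN formulation, not details confirmed by the disclosure.

```python
def actual_evaluation(r_j, next_state, interval, t_max,
                      q_first, q_second, actions, gamma=0.9):
    """Compute y_j for one experience sample.

    interval: time between user state s_j and next user state s_{j+1}.
    q_first / q_second: callables (state, action) -> value standing in
    for the first and second network models.
    """
    # Gap longer than T_max: use the immediate feedback alone.
    if interval > t_max:
        return r_j
    # Otherwise bootstrap from the next state (Double DQN style, assumed):
    # the first network selects the action, the second network scores it.
    best_action = max(actions, key=lambda a: q_first(next_state, a))
    return r_j + gamma * q_second(next_state, best_action)
```

Treating a long gap as the end of an episode keeps stale next states from leaking into the target, which is one plausible reading of the T_max threshold.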

Before the user state information from which features are to be extracted is vectorized, the basic features and historical behavior features of the account may be segmented into words, and vectorization is then performed on the resulting words. Specifically, the word vectors corresponding to the basic information and historical behavior features of the account can be determined by methods such as a co-occurrence matrix or singular value decomposition, or by a language model such as CBOW.
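The co-occurrence approach can be sketched with the standard library alone. The example below covers only the counting step and uses each row of the co-occurrence matrix as a raw word vector; a dimensionality reduction such as singular value decomposition would normally follow and is omitted here. The token names are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(token_lists):
    """Count how often each unordered pair of tokens appears together."""
    counts = Counter()
    for tokens in token_lists:
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts


def cooccurrence_vector(word, vocab, counts):
    """Row of the co-occurrence matrix for `word`, ordered by `vocab`."""
    return [
        0 if other == word else counts[tuple(sorted((word, other)))]
        for other in vocab
    ]


# Hypothetical segmented account features (assumption).
samples = [
    ["female", "beijing", "watched_live"],
    ["male", "beijing", "watched_live"],
]
counts = cooccurrence_counts(samples)
vocab = sorted({token for tokens in samples for token in tokens})
```

Here `vocab` is `["beijing", "female", "male", "watched_live"]`, and the vector for `"beijing"` records one co-occurrence each with `"female"` and `"male"` and two with `"watched_live"`.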

In this embodiment, before feature extraction is performed on user state information based on the reinforcement learning model, basic information and historical behavior features of an account may be preprocessed, and the basic information and the historical behavior features of the account to be extracted may be vectorized, so as to improve feature extraction efficiency of the reinforcement learning model.

In another embodiment, before feature extraction is performed based on the deep network model, feature extraction may first be performed on at least the interaction information between the account and live broadcast content, and whether the account is a live active account is determined according to the extracted features. The deep network model then extracts features only from the basic features and historical behavior features of accounts of non-live-active users and need not do so for live active users, which improves overall processing efficiency.

Specifically, at least feature extraction is performed on interaction information of an account and live broadcast content based on a neural network regression model, so that whether the account is a live broadcast active account or not is determined according to the extracted features, wherein the neural network regression model is trained in advance based on a user information sample set, and the user information sample set at least comprises interaction information samples of the live broadcast content and activity degree labeling information corresponding to the interaction information samples.

Further, in the process of determining whether the account is a live active account according to the extracted features, a feedback value of the account being a live active user may be determined according to the extracted features; the account is determined to be a live inactive user when the feedback value is detected to be lower than a preset feedback threshold, and is determined to be a live active user when the feedback value is detected to be not lower than the preset feedback threshold. Alternatively, a first feedback value of the account as the live active user and a second feedback value of the account as the live active user are determined according to the extracted features; when the first feedback value is detected to be lower than the second feedback value, the account is determined to be a live inactive user, and when the first feedback value is detected to be not lower than the second feedback value, the account is determined to be a live active user.
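The two decision rules can be written down directly. The sketch below assumes the feedback values have already been produced by the neural network regression model; the function names are illustrative only.

```python
def is_live_active_by_threshold(feedback_value: float, threshold: float) -> bool:
    """Rule 1: active when the feedback value is not lower than the threshold."""
    return feedback_value >= threshold


def is_live_active_by_comparison(first_feedback: float, second_feedback: float) -> bool:
    """Rule 2: active when the first feedback value is not lower than the second."""
    return first_feedback >= second_feedback
```

In both rules equality counts as active, matching the "not lower than" wording above.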

In this embodiment, at least feature extraction may be performed on the interaction information of the account and the live content, so as to determine the probability of use of the account for live broadcast, and further determine whether the account is a live active user according to the probability of use of the account for live broadcast. In practical application, the interaction information with the live content may include historical interaction behavior with the live content, context information for interacting with the live content, and feature extraction may also be performed on other content except the interaction information of the account and the live content, where specific content may be set according to practical application conditions.

Step 103, in the case that the account is not a live broadcast active user, judging, according to the basic characteristics and the historical behavior characteristics of the account, whether to return the live broadcast aggregation page to the account while returning the multimedia resource, wherein the historical behavior characteristics further comprise interaction information between the account and non-live broadcast content, the live broadcast aggregation page is used for gathering M live broadcast contents and displaying them through N display positions, M and N are natural numbers, and M is greater than N.

In this embodiment, a live broadcast aggregation entry may be displayed in a page of the application program logged in by the account, and when a trigger operation on the live broadcast aggregation entry is detected, the application jumps to the live broadcast aggregation page corresponding to the entry. Because the live broadcast aggregation entry is displayed in the page of the application program instead of the live broadcast aggregation page being returned directly, other content can be shown to the user alongside the entry, and whether to return the live broadcast aggregation page can be decided according to whether a trigger operation on the entry is received, which enhances the user's interactive experience.
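Step 103 states that the aggregation page gathers M live broadcast contents and shows them through N display positions with M greater than N, but it does not say how the M items are mapped onto the N positions. One possible reading, sketched below purely as an assumption, is a rotating window over the gathered contents.

```python
def visible_contents(contents, n_positions, offset=0):
    """Return the N items currently shown out of the M gathered contents.

    The rotation by `offset` is a hypothetical display policy, not one
    described in the disclosure.
    """
    m = len(contents)
    if not 0 < n_positions < m:
        raise ValueError("expected 0 < N < M")
    return [contents[(offset + i) % m] for i in range(n_positions)]
```

For five gathered contents and two display positions, advancing `offset` cycles every item through the visible positions.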

In this embodiment, when the account is a live active user, a multimedia resource corresponding to the multimedia information recommendation request may be returned. When the account is detected to be the live broadcast active user, the multimedia resource corresponding to the multimedia information recommendation request is directly returned, and the live broadcast aggregation page is not returned to the account, so that the interference to the original live broadcast active user caused by the return of the live broadcast aggregation page is avoided.

According to this embodiment, when a multimedia information recommendation request is received, the historical behavior characteristics of the account sending the request can be acquired, and whether the account is a live broadcast active user is determined from them. When the account is not a live broadcast active user, whether to return the live broadcast aggregation page to the account while returning the multimedia resource is judged according to the basic characteristics and the historical behavior characteristics of the account. By distinguishing whether the user is a live broadcast active user, the information interference caused by recommending live broadcast to users who already have a live broadcast usage habit can be reduced. In addition, because whether to return the live broadcast aggregation page to the account is determined from the basic characteristics and the historical behavior characteristics of the account, the low recommendation efficiency that results from fixed recommendation rules set by manual experience is avoided, and the efficiency of live broadcast recommendation for the account is improved.
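The two-stage decision described in steps 101 to 103 can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the `Response` type, the `recommend` function, and the three callables passed into it are hypothetical names introduced here, standing in for the active-user model, the aggregation-page decision model, and the resource lookup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Response:
    """Hypothetical response: resources plus a flag for the aggregation page."""
    multimedia_resources: List[str]
    include_live_aggregation: bool


def recommend(
    basic_features: Dict[str, float],
    history_features: Dict[str, float],
    is_live_active: Callable[[Dict[str, float]], bool],
    should_show_aggregation: Callable[[Dict[str, float], Dict[str, float]], bool],
    fetch_resources: Callable[[], List[str]],
) -> Response:
    # The multimedia resource is returned in every case.
    resources = fetch_resources()
    # Live active users get the resources only, so they are not disturbed.
    if is_live_active(history_features):
        return Response(resources, False)
    # Otherwise a second decision uses basic and historical behavior features.
    show = should_show_aggregation(basic_features, history_features)
    return Response(resources, show)
```

The second callable is only consulted for accounts that are not live active, which mirrors the ordering of the steps above.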

Fig. 2 is a flowchart of a deep network model training method for multimedia information recommendation, according to an exemplary embodiment of the present application, and as shown in fig. 2, the deep network model may include a dual deep Q network, and the training method may include the following steps:

Step 201, initializing the network models of the current network Q and the target network Q^* in the dual deep Q network.

In one embodiment, the initialized current network Q and target network Q^* in the dual deep Q network may have the same network structure, for example the same number of network layers and consistent model parameters. Specifically, when the current network Q and the target network Q^* are initialized, their model parameters may be determined by random assignment, such as Gaussian random assignment, which is not limited in this disclosure.
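As a minimal sketch of this initialization, the snippet below draws the weights of a small two-layer network from a Gaussian distribution and copies them so that both networks start with the same structure and parameters. The layer sizes, the standard deviation, and the name `init_q_network` are assumptions made for illustration; the disclosure does not fix them.

```python
import copy
import random


def init_q_network(state_dim, hidden_dim, num_actions, std=0.01, seed=None):
    """Return Gaussian-initialized parameters for a small two-layer Q network."""
    rng = random.Random(seed)

    def gaussian_matrix(rows, cols):
        return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": gaussian_matrix(state_dim, hidden_dim),
        "b1": [0.0] * hidden_dim,
        "W2": gaussian_matrix(hidden_dim, num_actions),
        "b2": [0.0] * num_actions,
    }


# Current network Q and target network Q^* start from identical parameters.
current_q = init_q_network(state_dim=8, hidden_dim=16, num_actions=2, seed=0)
target_q = copy.deepcopy(current_q)
```

A deep copy is used so that later updates to the current network do not silently change the target network.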

Step 202, determining, from an experience pool serving as training samples, sample data containing the instant reward r_j obtained after the agent executes the actual recommendation control action a_j in the user state s_j.
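One possible shape for the sample data in the experience pool is sketched below. The `Transition` record and its field names are hypothetical; the `interval` field is included here only because later steps compare the time between consecutive user states with a threshold.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Transition:
    """One experience-pool sample (field names are illustrative assumptions)."""
    state: Sequence[float]       # user state s_j
    action: int                  # actual recommendation control action a_j
    reward: float                # instant reward r_j
    next_state: Sequence[float]  # next user state s_{j+1}
    interval: float              # duration between s_j and s_{j+1}


def sample_batch(pool: List[Transition], batch_size: int, rng: random.Random) -> List[Transition]:
    """Draw a random batch without replacement, capped at the pool size."""
    return rng.sample(pool, min(batch_size, len(pool)))
```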

Step 203, determining an actual value evaluation value y_j based on the instant reward r_j.

In one embodiment, the actual value evaluation value y_j may be determined in different ways depending on how the interval duration between the user state s_j and the next user state s_{j+1} compares with a preset duration threshold T_max. The user state at least comprises the basic characteristics and the historical behavior characteristics of the account, wherein the historical behavior characteristics comprise interaction information of the account and non-live content.

Specifically, when it is detected that the interval duration between the user state s_j and the next user state s_{j+1} is longer than the preset duration threshold T_max, the reward r_j is assigned to the actual value evaluation value y_j; when it is detected that the interval duration between the user state s_j and the next user state s_{j+1} is not longer than the preset duration threshold T_max, the actual value evaluation value y_j is determined according to the reward r_j and the value evaluation value corresponding to the next user state s_{j+1} obtained by the target network Q^*.

In this embodiment, the actual value evaluation value is determined according to the interval duration between the user state and the next user state: when the interval is short, the actual value evaluation value depends more on the value evaluation value obtained by the target network for the next user state, and when the interval is long, it is determined by the received instant reward. This makes the determination of the actual value evaluation value more accurate and practical.

Further, in determining the actual value evaluation value y_j according to the reward r_j and the value evaluation value corresponding to the next user state s_{j+1} obtained by the target network Q^*, a discount coefficient α_j may first be determined according to the interval duration T(s_j, s_{j+1}) between the user state s_j and the next user state s_{j+1}, where α_j = γ·exp(-T(s_j, s_{j+1}) / T_max). The actual value evaluation value y_j is then determined according to the reward r_j, the discount coefficient α_j, and the value evaluation value corresponding to the next user state s_{j+1} obtained by the target network Q^*.

In this embodiment, the related-art approach of determining the actual value evaluation value with a fixed discount coefficient is further improved: the fixed discount coefficient is replaced by a dynamic discount coefficient α_j that depends on the interval duration between the user state and the next user state. When that interval is long, the discount coefficient becomes smaller, so that the value evaluation value of the next user state has less influence and the instant reward has more influence on the actual value evaluation value. When the interval is short, the discount coefficient becomes larger, so that the value evaluation value of the next user state has more influence. The dynamic discount coefficient thus matches the determination of the actual value evaluation value to actual user thinking habits, makes that determination more effective and practical, and improves the optimization efficiency of the model parameters.
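The interval-dependent target can be illustrated with a short sketch. The discount coefficient follows the expression α_j = γ·exp(-T(s_j, s_{j+1}) / T_max) given in the text. How the reward, the coefficient, and the target-network value are combined is not legible in this text, so the additive form with a maximum over actions used below is an assumption borrowed from the standard Q-learning target, and the function names are hypothetical.

```python
import math


def discount_coefficient(interval, t_max, gamma):
    """alpha_j = gamma * exp(-T(s_j, s_{j+1}) / T_max), as stated in the text."""
    return gamma * math.exp(-interval / t_max)


def actual_value(reward, interval, t_max, gamma, next_state_values):
    """Compute y_j for one sample.

    next_state_values: values the target network Q^* assigns to each action
    in the next user state s_{j+1}.
    """
    # Long gap between states: the target is the instant reward alone.
    if interval > t_max:
        return reward
    alpha = discount_coefficient(interval, t_max, gamma)
    # ASSUMPTION: additive target with a max over actions (standard Q-learning).
    return reward + alpha * max(next_state_values)
```

With `gamma = 0.9` and `t_max = 10`, an interval of 0 leaves the coefficient at 0.9, while an interval of 10 shrinks it to 0.9/e, so later follow-up states contribute less to the target.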

Step 204, adjusting the model parameters of the current network Q based on the difference between the actual value evaluation value y_j and the predicted value evaluation value Q(s_j, a_j) determined by the current network Q.

In one embodiment, the following iterative process may be repeated until the number of iterations reaches a preset count threshold, after which the model parameters of the current network Q are assigned to the target network Q^*: acquiring the next user state of the agent after it executes the actual recommendation control action, and adjusting the model parameters of the current network Q according to the difference between the actual value evaluation value and the predicted value evaluation value determined from the sample data corresponding to the next user state.

In this embodiment, the model parameters of the current network Q are assigned to the target network Q^* only after a preset number of iterations, so that a certain amount of noise is retained in the parameter updates of the target network Q^*. This avoids distortion, due to overfitting, of the target network Q^* that participates in the error analysis, and effectively improves the training of the model parameters of the current network Q.
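The periodic assignment of the current network's parameters to the target network can be sketched as a loop. The `update_step` callable is a hypothetical stand-in for one parameter adjustment of the current network, and `sync_every` plays the role of the preset count threshold; neither name comes from the disclosure.

```python
import copy


def train_with_periodic_sync(current_params, target_params, update_step, num_steps, sync_every):
    """Update the current network each step; copy it to the target every sync_every steps."""
    for step in range(1, num_steps + 1):
        # One adjustment of the current network Q, using the frozen target Q^*.
        current_params = update_step(current_params, target_params)
        # After the preset number of iterations, assign Q's parameters to Q^*.
        if step % sync_every == 0:
            target_params = copy.deepcopy(current_params)
    return current_params, target_params
```

Between synchronizations the target network lags behind the current network, which is the intended effect of the delayed assignment.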

Step 205, performing feature extraction on at least the basic characteristics and the historical behavior characteristics of the account according to the trained dual deep Q network, so as to implement recommendation control with respect to live broadcast based on the optimal action determined from the extracted features.
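Choosing the optimal action from the trained network's outputs can be sketched as a greedy selection over per-action values. The two action labels are hypothetical names for the two outcomes discussed in this disclosure (returning the multimedia resource alone, or together with the live broadcast aggregation page).

```python
# Hypothetical labels for the two recommendation control actions.
ACTIONS = ("return_resources_only", "return_resources_with_live_aggregation")


def select_action(q_values):
    """Return the index of the action with the highest predicted value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

For example, `ACTIONS[select_action([0.2, 0.7])]` selects the second label because its value is larger.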

In one embodiment, the difference between the actual value evaluation value y_j and the predicted value evaluation value Q(s_j, a_j) determined by the current network Q may be determined from the mean square error loss value between them. Specifically, that mean square error loss value may be determined by a loss function.

The determined difference between the actual value evaluation value y_j and the predicted value evaluation value Q(s_j, a_j) determined by the current network Q is back-propagated, and the model parameters of the dual deep Q network are optimized by gradient descent. When the difference between y_j and Q(s_j, a_j) falls below a difference threshold, the dual deep Q network with the adjusted model parameters is determined to be the trained dual deep Q network.
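The loss-and-update cycle described here can be illustrated on a deliberately small model. In the sketch below a linear predictor stands in for the current network Q, so the gradient can be written by hand; the function names, learning rate, and stopping threshold are assumptions for illustration, and a real deep network would obtain its gradients through back propagation.

```python
def mse_loss(targets, predictions):
    """Mean square error between actual values y_j and predicted values Q(s_j, a_j)."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


def predict(weights, features):
    """Linear stand-in for the current network Q (an illustrative simplification)."""
    return sum(w * x for w, x in zip(weights, features))


def gradient_step(weights, batch, learning_rate):
    """One gradient-descent step on the MSE loss; batch is a list of (features, y)."""
    n = len(batch)
    grads = [0.0] * len(weights)
    for features, y in batch:
        error = predict(weights, features) - y
        for i, x in enumerate(features):
            grads[i] += 2.0 * error * x / n
    return [w - learning_rate * g for w, g in zip(weights, grads)]


def fit(weights, batch, learning_rate=0.1, loss_threshold=1e-6, max_steps=10_000):
    """Repeat gradient steps until the loss falls below the threshold."""
    for _ in range(max_steps):
        predictions = [predict(weights, f) for f, _ in batch]
        if mse_loss([y for _, y in batch], predictions) < loss_threshold:
            break
        weights = gradient_step(weights, batch, learning_rate)
    return weights
```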

From the above embodiments, it can be seen that the dual deep Q network, which determines the optimal action from features of the user state information, can be trained in advance on sample data from the experience pool containing the instant reward r_j obtained after the agent executes the actual recommendation control action a_j in the user state s_j. During training, the actual value evaluation value y_j is determined from the instant reward, and the model parameters of the current network Q are then adjusted according to the difference between the determined actual value evaluation value and the predicted value evaluation value determined by the current network Q. The model parameters of the current network Q are thus corrected in time to match the determined actual value evaluation value, which improves the update efficiency of model training. In addition, the pre-trained dual deep Q network analyzes the user state to determine the optimal action for live broadcast recommendation control, which improves the efficiency of determining the optimal action, avoids the low accuracy in identifying the user group to be recommended that results from the limited dimensions or subjective knowledge of manual reasoning, and improves the effectiveness of live broadcast recommendation.

In the present application, whether the account is a live active user may be determined by a pre-trained neural network model. The training process of the neural network model may be as shown in fig. 3, which is a flowchart of another deep network model training method for multimedia information recommendation according to an exemplary embodiment of the application. The training process may specifically involve the following steps:

Step 301, determining an active information sample set as a training sample, wherein the active information sample set contains user state information and active degree labeling information about live broadcast corresponding to the user state information.

In an embodiment, in the active information sample set determined as the training sample, the user state information may be selected from at least one of the following: basic characteristics of the account and historical behavior characteristics, wherein the historical behavior characteristics at least comprise interaction information of the account and live content, interaction information of the account and non-live content, and the like. The active information sample set may consist of user state information and the activity degree labeling information about live broadcast corresponding to that user state information. For example, when the user state information is user basic information a, user behavior information b and context information c, and the corresponding user account is an active user of live broadcast, the activity degree labeling information about live broadcast corresponding to the user state information may be expressed as an active state; when the user state information is user basic information d, user behavior information e and context information f, and the corresponding user account is an inactive user of live broadcast, the activity degree labeling information may be expressed as an inactive state.

Step 302, performing feature extraction on the user state information by the neural network model, so as to determine activity level prediction information according to the extracted features.

In an embodiment, the user state information may be vectorized to obtain a user state vector corresponding to the user state information, and then the neural network model performs feature extraction on the user state vector to determine activity level prediction information according to the extracted feature, where the activity level prediction information characterizes a probability value p that a user account corresponding to the user state information is in an active state for live broadcast.
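A minimal sketch of vectorizing the user state and mapping it to a probability p is given below. The feature names in `FEATURE_ORDER` are invented for illustration, and a single logistic unit stands in for the neural network model; the disclosure does not specify either.

```python
import math

# Hypothetical feature names; the disclosure does not enumerate them.
FEATURE_ORDER = ["age_bucket", "live_watch_count", "live_gift_count", "non_live_play_count"]


def vectorize(user_state):
    """Turn a dict of user state information into a fixed-order vector."""
    return [float(user_state.get(name, 0.0)) for name in FEATURE_ORDER]


def predict_active_probability(vector, weights, bias):
    """Probability p that the account is in an active state for live broadcast."""
    z = sum(w * v for w, v in zip(weights, vector)) + bias
    # Numerically stable sigmoid for both signs of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```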

Step 303, determining the difference between the activity level labeling information and the activity level prediction information, so as to adjust the model parameters of the neural network model according to the back-propagated difference.

In an embodiment, the difference between the activity level labeling information and the activity level prediction information may be determined by a loss function corresponding to the neural network model. Specifically, the loss function L may be L = -x·log(p) - (1 - x)·log(1 - p), where x represents the activity level labeling information and indicates whether the user account corresponding to the user state information input into the neural network model is actually an active user, and p represents the activity level prediction information, that is, the probability, predicted by the neural network model from the features it extracts from the user state information, that the account is in an active state.
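The loss L = -x·log(p) - (1 - x)·log(1 - p) can be written directly as a function. The clamping constant `eps` is an addition made here to avoid taking the logarithm of zero; it is not part of the formula in the text.

```python
import math


def activity_loss(x, p, eps=1e-12):
    """Cross-entropy loss for label x (1 = active, 0 = inactive) and prediction p."""
    # ASSUMPTION: clamp p away from 0 and 1 so the logarithms stay finite.
    p = min(max(p, eps), 1.0 - eps)
    return -x * math.log(p) - (1 - x) * math.log(1.0 - p)
```

The loss is small when the predicted probability agrees with the label and grows as the prediction moves toward the opposite class.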

Further, after the loss value between the activity level labeling information and the activity level prediction information is determined based on the loss function, the loss value may be back-propagated and the parameters of the neural network model optimized by gradient descent. When the loss value determined based on the loss function is lower than a preset loss threshold, the neural network model with the adjusted model parameters is determined to be the trained regression model.

Fig. 4 is a schematic diagram of live function recommendation according to an exemplary embodiment of the disclosure. As shown in fig. 4, after a multimedia information recommendation request is detected, feature extraction may be performed by a neural network model on the historical behavior characteristics of the account sending the request, where the historical behavior characteristics at least include interaction information of the account with live content, so as to determine whether the account is a live active user according to the extracted features. Specifically, the basic characteristics and the historical behavior characteristics of the account may be provided to the neural network model, so that the neural network model determines whether the account is a live active user according to the extracted features.

Specifically, in determining whether the account is a live active account according to the features extracted by the neural network model, in an embodiment, a feedback value of the account as a live active user may be determined according to the extracted features; when the feedback value is detected to be lower than a preset feedback threshold value, determining that the account is a live inactive user; and when the feedback value is detected not to be lower than a preset feedback threshold value, determining that the account is a live active user.

In another embodiment, a first feedback value indicating that the account is a live active user and a second feedback value indicating that the account is a live inactive user may be determined according to the extracted features; when the first feedback value is detected to be lower than the second feedback value, the account is determined to be a live inactive user; and when the first feedback value is detected to be not lower than the second feedback value, the account is determined to be a live active user.
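The two decision rules above (a single feedback value against a threshold, or an active score against an inactive score) can be sketched as two small functions. The function and parameter names are hypothetical; in both rules a tie counts as active, matching the "not lower than" wording.

```python
def is_live_active_by_threshold(feedback, threshold):
    """Single-value rule: active unless the feedback value is below the threshold."""
    return feedback >= threshold


def is_live_active_by_comparison(active_feedback, inactive_feedback):
    """Two-value rule: active unless the active score is below the inactive score."""
    return active_feedback >= inactive_feedback
```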

If the neural network model determines that the account is a live active user, returning the multimedia resource corresponding to the multimedia information recommendation request; if the account is determined by the neural network model not to be a live active user, providing at least basic features and historical behavior features of the account to the deep network model to determine whether to return live aggregate pages to the account while returning multimedia resources at least according to the basic features and the historical behavior features of the account, wherein the deep network model can comprise the pre-trained dual deep Q network mentioned in the above embodiment.

In practical application, the process of returning the live broadcast aggregation page to the account may be to display a live broadcast aggregation entry in a page of an application program logged in by the account; and further detecting triggering operation of the live broadcast aggregation entry, and jumping to a live broadcast aggregation page corresponding to the live broadcast aggregation entry.

For the foregoing method embodiments, for simplicity of explanation, the methodologies are shown as a series of acts, but one of ordinary skill in the art will appreciate that the present disclosure is not limited by the order of acts described, as some steps may occur in other orders or concurrently in accordance with the disclosure.

Further, those skilled in the art will appreciate that the embodiments described in the specification are all alternatives.

The present disclosure also proposes embodiments of a recommendation device for multimedia information corresponding to the foregoing embodiments of the recommendation method for multimedia information.

Fig. 5 is a schematic block diagram of a recommendation device for multimedia information according to one of the exemplary embodiments of the present disclosure. The recommendation device for multimedia information shown in the embodiment may be suitable for video playing applications, where the applications are suitable for terminals, and the terminals include, but are not limited to, mobile phones, tablet computers, wearable devices, personal computers, and other electronic devices. The video playing application can be an application program installed in a terminal, a web page application integrated in a browser, and a user can play videos through the video playing application, wherein the played videos can be long videos, such as movies and television dramas, or short videos, such as video clips, situation dramas and the like.

Referring to fig. 5, the apparatus may include a feature acquisition module 501, a user determination module 502, and an operation determination module 503; wherein,

the feature acquisition module 501 is used for responding to a received multimedia information recommendation request and acquiring historical behavior features of an account for sending the multimedia information recommendation request, wherein the historical behavior features at least comprise interaction information of the account and live broadcast content;

the user determining module 502 determines whether the account is a live active user or not at least according to the interaction information of the account and the live content;

an operation determining module 503, configured to determine, when the account is not a live active user, whether to return a live broadcast aggregation page to the account while returning the multimedia resource according to the basic characteristics of the account and the historical behavior characteristics, where the historical behavior characteristics further include interaction information of the account and non-live broadcast content, the live broadcast aggregation page is used for gathering M live broadcast contents and displaying them through N display positions, M and N are natural numbers, and M is greater than N.

Optionally, the operation determining module 503 is specifically configured to:

Optionally, the operation determining module 503 is further configured to:

Optionally, the depth network model includes a first network model and a second network model, and the operation determining module 503 is further configured to:

the immediate feedback determining module is configured to determine, from the experience pool serving as training samples, the immediate feedback r_j obtained after the agent executes the recommendation control action a_j in the user state s_j, so as to determine the actual evaluation data y_j according to the immediate feedback r_j;

Optionally, the operation determining module 503 is further configured to:

when it is detected that the interval duration between the user state s_j and the next user state s_{j+1} is longer than the preset duration threshold T_max, assigning the immediate feedback r_j to the actual evaluation data y_j;

Optionally, the user determining module 502 is specifically configured to:

Optionally, the user determining module 502 is further configured to:

As shown in fig. 6, fig. 6 is a schematic block diagram of a recommendation device for multimedia information according to a second exemplary embodiment of the present disclosure, where the embodiment may further include a resource return module 504 on the basis of the foregoing embodiment shown in fig. 5:

and the resource returning module 504 returns the multimedia resource corresponding to the multimedia information recommendation request when the account is a live active user.

Fig. 7 is a schematic block diagram of a deep network model training apparatus for multimedia information recommendation according to one of the exemplary embodiments of the present disclosure. The deep network model training device for multimedia information recommendation shown in the embodiment may be suitable for video playing application, where the application is suitable for a terminal, and the terminal includes, but is not limited to, mobile phones, tablet computers, wearable devices, personal computers, and other electronic devices. The video playing application can be an application program installed in a terminal, a web page application integrated in a browser, and a user can play videos through the video playing application, wherein the played videos can be long videos, such as movies and television dramas, or short videos, such as video clips, situation dramas and the like.

Referring to fig. 7, the apparatus may include a model initialization module 701, a sample data determination module 702, an evaluation value determination module 703, a first parameter adjustment module 704, a first feature extraction module 705; wherein,

model initialization module 701 initializes the network models of the current network Q and the target network Q^* in the dual deep Q network;

sample data determination module 702 determines, from the experience pool serving as training samples, sample data containing the instant reward r_j obtained after the agent executes the actual recommendation control action a_j in the user state s_j;

an evaluation value determination module 703, configured to determine an actual value evaluation value y_j based on the instant reward r_j;

the first parameter adjustment module 704 is configured to adjust the model parameters of the current network Q based on the difference between the actual value evaluation value y_j and the predicted value evaluation value Q(s_j, a_j) determined by the current network Q;

the first feature extraction module 705 performs feature extraction on at least the basic characteristics and the historical behavior characteristics of the account according to the trained dual deep Q network, so as to implement recommendation control for live broadcast according to the optimal action determined from the extracted features.

Optionally, the apparatus further comprises:

the model parameter determining module 712 determines the model parameter when the loss value is lower than a preset threshold as the model parameter of the neural network model after training.

As shown in fig. 8, fig. 8 is a schematic block diagram of a deep network model training apparatus for multimedia information recommendation according to a second exemplary embodiment of the present disclosure, which may further include, on the basis of the foregoing embodiment shown in fig. 7: a state acquisition module 706, a second parameter adjustment module 707, and a parameter assignment module 708; wherein,

The state acquisition module 706 acquires the next user state after the intelligent agent executes the actual recommended control action;

a second parameter adjustment module 707 that adjusts model parameters of the current network Q according to differences between the actual value evaluation value and the predicted value evaluation value determined from the sample data corresponding to the next user state;

the parameter assignment module 708 assigns the model parameters of the current network Q to the target network Q^* when the number of times the above two steps have been repeatedly executed reaches a preset count threshold.

As shown in fig. 9, fig. 9 is a schematic block diagram of a deep network model training apparatus for multimedia information recommendation according to a third exemplary embodiment of the present disclosure, which may further include, on the basis of the foregoing embodiment shown in fig. 7: a sample set determination module 709, a second feature extraction module 710, a third parameter adjustment module 711; wherein,

a sample set determining module 709 that determines an active information sample set as a training sample, the active information sample set including user status information and active level annotation information about live broadcast corresponding to the user status information;

a second feature extraction module 710, configured to perform feature extraction on the user state information by the neural network model, so as to determine activity level prediction information according to the extracted features;

The third parameter adjustment module 711 determines a difference between the activity level labeling information and the activity level prediction information to adjust model parameters of the neural network model according to the difference of back propagation.

As shown in fig. 10, fig. 10 is a schematic block diagram of a deep network model training apparatus for multimedia information recommendation according to a fourth exemplary embodiment of the present disclosure, which is based on the foregoing embodiment shown in fig. 7, the evaluation value determining module 703 may include: assignment submodule 7031 and evaluation value determination submodule 7032; wherein,

assignment submodule 7031, when it is detected that the interval duration between the user state s_j and the next user state s_{j+1} is longer than the preset duration threshold T_max, assigns the instant reward r_j to the actual value evaluation value y_j;

The evaluation value determination sub-module 7032, when it is detected that the interval duration between the user state s_j and the next user state s_{j+1} is not longer than the preset duration threshold T_max, determines the actual value evaluation value y_j according to the reward r_j and the value evaluation value corresponding to the next user state s_{j+1} obtained by the target network Q^*.

For the device embodiments, reference is made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments. The apparatus embodiments described above are merely illustrative, wherein the elements described above as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the objectives of the disclosed solution. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

The embodiment of the disclosure also proposes an electronic device, including:

a processor;

a memory for storing the processor-executable instructions;

Embodiments of the present disclosure also provide a storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method for recommending multimedia information according to any of the above embodiments.

Embodiments of the present disclosure also propose a computer program product configured to perform the recommendation method of multimedia information according to any of the embodiments described above.

Fig. 11 is a schematic block diagram of an electronic device shown in accordance with an embodiment of the present disclosure. For example, the electronic device 1100 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 11, an electronic device 1100 may include one or more of the following components: a processing component 1102, a memory 1104, a power component 1106, a multimedia component 1108, an audio component 1110, an input/output (I/O) interface 1113, a sensor component 1114, and a communication component 1116.

The processing component 1102 generally controls overall operation of the electronic device 1100, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1102 may include one or more processors 1120 to execute instructions to perform all or part of the steps of the recommendation method for multimedia information described above. Further, the processing component 1102 can include one or more modules that facilitate interactions between the processing component 1102 and other components. For example, the processing component 1102 may include a multimedia module to facilitate interaction between the multimedia component 1108 and the processing component 1102.

The memory 1104 is configured to store various types of data to support operations at the electronic device 1100. Examples of such data include instructions for any application or method operating on the electronic device 1100, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1104 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.

The power supply component 1106 provides power to the various components of the electronic device 1100. The power supply component 1106 can include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 1100.

The multimedia component 1108 includes a screen that provides an output interface between the electronic device 1100 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 1108 includes a front camera and/or a rear camera. When the electronic device 1100 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

The audio component 1110 is configured to output and/or input an audio signal. For example, the audio component 1110 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 1100 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 1104 or transmitted via the communication component 1116. In some embodiments, the audio component 1110 further comprises a speaker for outputting audio signals.

The I/O interface 1113 provides an interface between the processing component 1102 and peripheral interface modules, which may be a keyboard, click wheel, buttons, and the like. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.

The sensor assembly 1114 includes one or more sensors for providing status assessment of various aspects of the electronic device 1100. For example, the sensor assembly 1114 may detect an on/off state of the electronic device 1100, a relative positioning of components such as a display and keypad of the electronic device 1100, a change in position of the electronic device 1100 or a component of the electronic device 1100, the presence or absence of a user's contact with the electronic device 1100, an orientation or acceleration/deceleration of the electronic device 1100, and a change in temperature of the electronic device 1100. The sensor assembly 1114 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor assembly 1114 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1114 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 1116 is configured to facilitate wired or wireless communication between the electronic device 1100 and other devices. The electronic device 1100 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 1116 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 1116 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In one embodiment of the present disclosure, the electronic device 1100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the above-described recommendation method of multimedia information.

In an embodiment of the present disclosure, there is also provided a non-transitory computer-readable storage medium, such as the memory 1104, including instructions executable by the processor 1120 of the electronic device 1100 to perform the recommendation method of multimedia information described above. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The method and apparatus provided by the embodiments of the present disclosure have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present disclosure, and the description of the above embodiments is intended only to aid understanding of the method of the present disclosure and its core ideas. Meanwhile, a person of ordinary skill in the art may, in light of the ideas of the present disclosure, make changes to the specific implementations and the scope of application; accordingly, the contents of this specification should not be construed as limiting the present disclosure.

Claims

1. A method for recommending multimedia information, the method comprising:

2. The method of claim 1, wherein the returning the live aggregate page to the account comprises:

3. The method according to claim 1, wherein the method further comprises:

4. The method of claim 1, wherein the determining whether to return the live aggregate page to the account while returning the multimedia asset based on the base characteristics of the account and the historical behavioral characteristics comprises:

5. The method of claim 4, wherein the deep network model comprises a first network model and a second network model, the method further comprising:

determining, based on an experience pool serving as training samples, the instant feedback r_j obtained after an agent executes the recommended control action a_j in a user state s_j, so as to determine actual evaluation data y_j according to the instant feedback r_j;

adjusting model parameters of the first network model based on the actual evaluation data y_j and the predictive evaluation data Q(s_j, a_j) determined by the first network model of the deep network model, so as to determine the deep network model including the first network model with adjusted model parameters as the trained deep network model.
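
The training step recited above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes a DQN-style target in which the second network model supplies the bootstrap value, i.e. y_j = r_j + GAMMA * max over a' of Q_second(s_{j+1}, a'); this excerpt does not state that formula. The linear Q function, the constants GAMMA and LR, the action encoding, and all function names are hypothetical choices made for illustration.

```python
import random

GAMMA = 0.9       # discount factor (assumed; not given in the source)
LR = 0.1          # learning rate (assumed)
ACTIONS = (0, 1)  # hypothetical: 0 = omit the live aggregation page, 1 = return it


def q_value(weights, state, action):
    """Linear stand-in for a network model: Q(s, a) = w[a] . s"""
    return sum(w * x for w, x in zip(weights[action], state))


def target_value(second_weights, reward, next_state, done):
    """Actual evaluation data y_j under the assumed DQN-style target."""
    if done:
        return reward
    best_next = max(q_value(second_weights, next_state, a) for a in ACTIONS)
    return reward + GAMMA * best_next


def train_step(first_weights, second_weights, experience_pool, batch_size, rng):
    """Sample (s_j, a_j, r_j, s_next, done) and move Q(s_j, a_j) toward y_j."""
    batch = rng.sample(experience_pool, min(batch_size, len(experience_pool)))
    for state, action, reward, next_state, done in batch:
        y = target_value(second_weights, reward, next_state, done)
        error = y - q_value(first_weights, state, action)
        # Gradient step on the squared error (y_j - Q(s_j, a_j))^2 / 2;
        # only the first model's parameters are adjusted.
        first_weights[action] = [
            w + LR * error * x for w, x in zip(first_weights[action], state)
        ]
    return first_weights


# Usage: one terminal transition in the pool, both models initialised to zero.
pool = [((1.0, 0.0), 1, 1.0, (0.0, 1.0), True)]
first = {0: [0.0, 0.0], 1: [0.0, 0.0]}
second = {0: [0.0, 0.0], 1: [0.0, 0.0]}
train_step(first, second, pool, batch_size=1, rng=random.Random(0))
```

In this sketch the second model's parameters stay fixed during the step, which is the usual reason for keeping a separate target model in DQN-style training; how and when the second network model is updated is not specified in this excerpt.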

6. The method of claim 5, wherein the determining actual evaluation data y_j according to the instant feedback r_j comprises:

7. The method of claim 1, wherein the determining whether the account is a live active user based at least on interaction information of the account with live content comprises:

8. The method of claim 7, wherein the determining whether the account is a live active account based on the extracted features comprises:

9. A recommendation device for multimedia information, the device comprising:

10. The apparatus of claim 9, wherein the operation determination module is specifically configured to:

11. The apparatus as recited in claim 9, further comprising:

12. The apparatus of claim 9, wherein the operation determination module is further configured to:

13. The apparatus of claim 12, wherein the deep network model comprises a first network model and a second network model, the apparatus further comprising:

a model parameter adjustment module configured to adjust model parameters of the first network model based on the actual evaluation data y_j and the predictive evaluation data Q(s_j, a_j) determined by the first network model of the deep network model, so as to determine the deep network model including the first network model with adjusted model parameters as the trained deep network model.

14. The apparatus of claim 13, wherein the operation determination module is further configured to:

15. The apparatus of claim 9, wherein the user determination module is specifically configured to:

16. The apparatus of claim 15, wherein the user determination module is further configured to:

17. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute instructions to implement the recommendation method of multimedia information as claimed in any one of claims 1 to 8.

18. A storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the recommendation method of multimedia information according to any one of claims 1 to 8.