CN117217839A - Method, device, equipment and storage medium for issuing media resources


Info

Publication number: CN117217839A
Application number: CN202311279265.5A
Authority: CN (China)
Prior art keywords: issuing, candidate, resource, information, target
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 王山雨
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd; priority to CN202311279265.5A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method, an apparatus, a device and a storage medium for issuing media resources, belonging to the technical field of artificial intelligence. The method includes: acquiring a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times based on object information and behavior information of an object; determining target resource content, a target issuing scene and a target issuing time from the plurality of candidate resource contents, the plurality of candidate issuing scenes and the plurality of candidate issuing times through a multi-scene reinforcement learning model based on the object information, the behavior information and the candidates; and issuing the target resource content to the object at the target issuing time and in the target issuing scene. The method enables the issued media resources to achieve a promotion effect and promotes the conversion of the media resources. It also avoids the waste of traffic resources and of the scene resources of the issuing scenes, and reduces the processing pressure of the resource issuing platform.

Description

Method, device, equipment and storage medium for issuing media resources
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for issuing media resources.
Background
In service promotion, promotion is generally achieved by issuing media resources to objects. Media resource issuing scenes are various: for example, media resources can be issued through a notification message, or through a message popup window that appears within an application.
In the prior art, media resources are issued through a single issuing scene. However, different issuing scenes have different issuing mechanisms and may correspond to the same object pool, so the same media resource may be issued to the same object through multiple issuing scenes, causing a waste of traffic resources.
Disclosure of Invention
The embodiments of the application provide a method, an apparatus, a device and a storage medium for issuing media resources, which enable the issued media resources to achieve a promotion effect and promote the conversion of the media resources, avoid the waste of traffic resources and of the scene resources of the issuing scenes, and reduce the processing pressure of the resource issuing platform. The technical solution is as follows:
in one aspect, a method for issuing a media resource is provided, where the method includes:
acquiring a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times based on object information and behavior information of an object, wherein the behavior information is used for indicating interactive behaviors generated by the object based on the historically issued media resources;
determining target resource content, a target issuing scene and a target issuing time from the plurality of candidate resource contents, the plurality of candidate issuing scenes and the plurality of candidate issuing times through a multi-scene reinforcement learning model based on the object information, the behavior information, the plurality of candidate resource contents, the plurality of candidate issuing scenes and the plurality of candidate issuing times, wherein the multi-scene reinforcement learning model is obtained through reinforcement learning based on object information, behavior information, a plurality of sample resource contents, a plurality of sample issuing times and a plurality of sample issuing scenes of a plurality of sample objects in the plurality of sample issuing scenes;
and issuing the target resource content to the object at the target issuing time and in the target issuing scene.
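The three steps of the method can be read as a recall-select-deliver pipeline. The following Python sketch is illustrative only and assumes hypothetical names (recall_candidates, model.select, deliver); none of these identifiers come from the patent itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class DeliveryDecision:
    content: str     # target resource content
    scene: str       # target issuing scene
    time_slot: str   # target issuing time

def recall_candidates(obj_info: dict, behavior_info: dict
                      ) -> Tuple[List[str], List[str], List[str]]:
    # Stand-in for the coarse recall against the content, scene and time
    # libraries described later in the embodiments; returns toy values.
    contents = ["hot_news_item", "welfare_package"]
    scenes = ["in_terminal_push", "out_of_terminal_push", "rta"]
    times = ["08:00", "12:00", "20:00"]
    return contents, scenes, times

def issue_media_resource(obj_info: dict, behavior_info: dict,
                         model, deliver: Callable[[DeliveryDecision], None]
                         ) -> DeliveryDecision:
    # Step 1: acquire candidates based on object and behavior information.
    contents, scenes, times = recall_candidates(obj_info, behavior_info)
    # Step 2: the multi-scene RL model picks the target triple.
    decision = model.select(obj_info, behavior_info, contents, scenes, times)
    # Step 3: issue the target content at the target time in the target scene.
    deliver(decision)
    return decision
```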
In another aspect, an apparatus for issuing media resources is provided, where the apparatus includes:
an acquisition module, configured to acquire a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times based on object information and behavior information of an object, where the behavior information indicates interactive behaviors generated by the object based on historically issued media resources;
a determining module, configured to determine target resource content, a target issuing scene and a target issuing time from the plurality of candidate resource contents, the plurality of candidate issuing scenes and the plurality of candidate issuing times through a multi-scene reinforcement learning model based on the object information, the behavior information, the plurality of candidate resource contents, the plurality of candidate issuing scenes and the plurality of candidate issuing times, where the multi-scene reinforcement learning model is obtained through reinforcement learning based on object information, behavior information, a plurality of sample resource contents, a plurality of sample issuing times and a plurality of sample issuing scenes of a plurality of sample objects in the plurality of sample issuing scenes;
and an issuing module, configured to issue the target resource content to the object at the target issuing time and in the target issuing scene.
In some embodiments, the determining module is configured to:
inputting the object information, the behavior information, the plurality of candidate resource contents, the plurality of candidate issuing scenes and the plurality of candidate issuing times into the multi-scene reinforcement learning model;
determining a first number of resource contents, a second number of issuing scenes and a third number of issuing times from the plurality of candidate resource contents, the plurality of candidate issuing scenes and the plurality of candidate issuing times through the multi-scene reinforcement learning model;
and determining the target resource content, the target issuing scene and the target issuing time from the first number of resource contents, the second number of issuing scenes and the third number of issuing times.
In some embodiments, the determining module is configured to:
determining a first number of resource contents from the plurality of candidate resource contents based on the object information and the behavior information through the multi-scene reinforcement learning model, where the first number of resource contents are the top first-number candidate resource contents, among the plurality of candidate resource contents, with the highest degree of matching with the object information and the behavior information;
determining a second number of issuing scenes from the plurality of candidate issuing scenes based on the object information, the behavior information and the first number of resource contents through the multi-scene reinforcement learning model, where the second number of issuing scenes are the top second-number candidate issuing scenes, among the plurality of candidate issuing scenes, with the highest degree of matching with the object information, the behavior information and the first number of resource contents;
and determining a third number of issuing times from the plurality of candidate issuing times based on the object information, the behavior information, the first number of resource contents and the second number of issuing scenes through the multi-scene reinforcement learning model, where the third number of issuing times are the top third-number candidate issuing times, among the plurality of candidate issuing times, with the highest degree of matching with the object information, the behavior information, the first number of resource contents and the second number of issuing scenes.
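A cascaded top-k selection like the one described in this embodiment can be sketched as follows. The scoring functions and the k values are illustrative assumptions standing in for the model's internal matching-degree computation, not part of the patent.

```python
def top_k(candidates, score_fn, k):
    # Keep the k candidates with the highest matching score.
    return sorted(candidates, key=score_fn, reverse=True)[:k]

def cascaded_select(state, contents, scenes, times,
                    score_content, score_scene, score_time,
                    k1=10, k2=3, k3=5):
    # Stage 1: contents scored against the object/behavior state only.
    picked_contents = top_k(contents, lambda c: score_content(state, c), k1)
    # Stage 2: scenes scored against the state plus the picked contents.
    picked_scenes = top_k(
        scenes, lambda s: score_scene(state, picked_contents, s), k2)
    # Stage 3: times scored against the state, contents and scenes.
    picked_times = top_k(
        times, lambda t: score_time(state, picked_contents, picked_scenes, t), k3)
    return picked_contents, picked_scenes, picked_times
```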
In some embodiments, the apparatus further comprises a training module for:
acquiring a plurality of pieces of state information of a plurality of sample objects in a plurality of sample issuing scenes, wherein the state information comprises object information and behavior information of the sample objects;
for each piece of state information, inputting the state information into the multi-scene reinforcement learning model to obtain action information, and obtaining return information based on the state information and the action information, where the action information includes sample resource content, a sample issuing scene and a sample issuing time and instructs the resource issuing platform to issue the sample resource content to the sample object at the sample issuing time and in the sample issuing scene, and the return information indicates the return generated for the platform to which the resources belong by issuing the sample resource content to the sample object based on the action information;
and adjusting model parameters of the multi-scene reinforcement learning model based on the return information.
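The training module maps naturally onto a standard reinforcement learning loop over (state, action, return) triples. The patent does not name a specific algorithm, so the sketch below uses a REINFORCE-style policy-gradient update purely as one plausible realization; `policy` and `reward_fn` are assumed interfaces.

```python
import torch

def train_step(policy, optimizer, states, reward_fn):
    # states: object + behavior features of sample objects in sample scenes.
    losses = []
    for state in states:
        logits = policy(state)                 # scores over joint actions
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()                 # index of a (content, scene, time) triple
        reward = reward_fn(state, action)      # return generated by the issuing
        losses.append(-dist.log_prob(action) * reward)
    optimizer.zero_grad()
    loss = torch.stack(losses).mean()
    loss.backward()                            # adjust model parameters by the return
    optimizer.step()
    return loss.item()
```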
In some embodiments, the apparatus further comprises an execution module for performing at least one of:
if the object information and the behavior information of the object indicate that the object reaches a target behavior state, executing the step of acquiring a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times based on the object information and the behavior information of the object;
if a media resource issuing request of the object for a preset issuing scene is received, executing the step of acquiring a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times based on the object information and the behavior information of the object;
and if a target event occurs, executing the step of acquiring a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times based on the object information and the behavior information of the object.
In some embodiments, the acquisition module is further to:
acquiring any object from a candidate object set, where the object information and behavior information of each object in the candidate object set match at least one of reference object information and reference behavior information.
In some embodiments, the acquisition module is further to:
if a target event occurs, determining a plurality of supplemental objects corresponding to the target event based on the event type of the target event and an object library, and adding the plurality of supplemental objects to the candidate object set, where the object library stores a plurality of event types and the supplemental objects corresponding to each event type, each supplemental object corresponding to an event type being a candidate object for issuing media resources of that event type.
In some embodiments, the acquiring module is configured to:
recalling resource contents of media resources based on the object information, the behavior information and a content library of the media resources to obtain the plurality of candidate resource contents;
recalling issuing scenes of the media resources based on the object information, the behavior information and a scene library of the media resources to obtain the plurality of candidate issuing scenes;
and recalling issuing times of the media resources based on the object information, the behavior information and a time library of the media resources to obtain the plurality of candidate issuing times.
In some embodiments, the issuing module is configured to:
determining a target display style of the target resource content from a plurality of candidate display styles through a style model corresponding to the target issuing scene based on the object information, the behavior information and the target resource content, where the style model is used for determining the display style of media resources;
and issuing the target resource content in the target display style to the object at the target issuing time and in the target issuing scene.
In some embodiments, the apparatus further comprises a training module for:
acquiring a plurality of display style samples of media resources, where each display style sample indicates object information and behavior information of a sample object, sample resource content, and a display style of the sample resource content;
masking part of the features in the first display style sample to obtain a second display style sample, and forming a positive sample pair based on the first display style sample and the second display style sample, wherein the first display style sample is any one of the display style samples;
forming a negative sample pair based on a third display style sample and a fourth display style sample, the third display style sample and the fourth display style sample being different samples of the plurality of display style samples;
determining a first loss based on the positive sample pair, the negative sample pair, and the style model;
inputting the first display style sample into the style model to obtain a predicted click rate, and determining a second loss based on the predicted click rate and a reference click rate;
and adjusting model parameters of the style model based on the first loss and the second loss.
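Putting the two losses together, training of the style model might look like the sketch below. The InfoNCE-style contrastive form of the first loss and the MSE form of the second loss are assumptions: the patent specifies the sample pairs and the click-rate supervision but not the exact loss functions, and `model.encode`/`model.predict_ctr` are hypothetical interfaces.

```python
import torch
import torch.nn.functional as F

def style_model_loss(model, first_sample, other_sample, reference_ctr,
                     mask_ratio=0.15):
    # Positive pair: the first display style sample and a masked copy of it.
    masked = first_sample.clone()
    mask = torch.rand_like(masked) < mask_ratio
    masked[mask] = 0.0
    z_anchor = model.encode(first_sample)
    z_pos = model.encode(masked)
    # Negative pair: embedding of a different display style sample.
    z_neg = model.encode(other_sample)
    pos_sim = F.cosine_similarity(z_anchor, z_pos, dim=-1)
    neg_sim = F.cosine_similarity(z_anchor, z_neg, dim=-1)
    # First loss: pull the positive pair together, push the negative apart.
    first_loss = -torch.log(
        torch.exp(pos_sim) / (torch.exp(pos_sim) + torch.exp(neg_sim))).mean()
    # Second loss: predicted click rate vs. the reference click rate.
    second_loss = F.mse_loss(model.predict_ctr(first_sample), reference_ctr)
    return first_loss + second_loss
```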
In another aspect, a computer device is provided, where the computer device includes a processor and a memory, where the memory is configured to store at least one program, and the at least one program is loaded and executed by the processor to implement a method for issuing a media resource in an embodiment of the present application.
In another aspect, a computer readable storage medium is provided, where at least one program is stored, and the at least one program is loaded and executed by a processor to implement the method for issuing media resources in the embodiments of the present application.
In another aspect, a computer program product is provided, the computer program product including at least one program stored in a computer readable storage medium, the at least one program being read from the computer readable storage medium by a processor of a computer device, the processor executing the at least one program, causing the computer device to perform the method for issuing a media resource according to any of the above implementations.
The embodiment of the application provides a media resource issuing method, which determines, through a multi-scene reinforcement learning model, the issuing time, the issuing scene and the resource content of media resources for an object based on the object information and behavior information of the object. Because the multi-scene reinforcement learning model is obtained through reinforcement learning training on data from multiple scenes, the training data are diverse and rich, so the trained model has a good prediction effect and high accuracy. The most appropriate issuing time, issuing scene and resource content matching the object can therefore be determined based on the model, and media resources issued based on this determination can achieve a promotion effect and promote the conversion of the media resources. In addition, since repeated issuing of the same media resource through multiple issuing scenes is avoided, the waste of traffic resources and of the scene resources of the issuing scenes is avoided; and since the resource issuing platform avoids repeatedly issuing the media resource through multiple issuing scenes, the processing pressure of the resource issuing platform is reduced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for delivering media resources according to an embodiment of the present application;
FIG. 3 is a flowchart of another method for delivering media resources according to an embodiment of the present application;
FIG. 4 is a flowchart for establishing an index relationship according to an embodiment of the present application;
FIG. 5 is a schematic diagram of object partitioning according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a multi-scenario reinforcement learning model according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a model in a training phase according to an embodiment of the present application;
FIG. 8 is a flowchart of a media resource issuing process according to an embodiment of the present application;
FIG. 9 is a flowchart of another media resource issuing process according to an embodiment of the present application;
FIG. 10 is a block diagram of a device for delivering media resources according to an embodiment of the present application;
fig. 11 is a block diagram of a terminal according to an embodiment of the present application;
fig. 12 is a block diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The terms "first," "second," and the like in this disclosure are used for distinguishing between similar elements or items having substantially the same function and function, and it should be understood that there is no logical or chronological dependency between the terms "first," "second," and "n," and that there is no limitation on the amount and order of execution.
The term "at least one" in the present application means one or more, and the meaning of "a plurality of" means two or more.
It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals related to the present application are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of the related data is required to comply with the relevant laws and regulations and standards of the relevant countries and regions. For example, the object information and the behavior information involved in the present application are acquired with sufficient authorization.
The following describes the terms of art to which the present application relates:
artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and extend human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision. The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include, for example, sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, pre-training model technologies, operation/interaction systems, mechatronics, and the like. The pre-training model is also called a large model and a basic model, and can be widely applied to all large-direction downstream tasks of artificial intelligence after fine adjustment.
The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions. With the research and advancement of artificial intelligence technology, the research and application of artificial intelligence technology is developed in various fields, such as common smart home, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned, autopilot, unmanned, digital twin, virtual man, robot, artificial intelligence generated content (AIGC, artificial Intelligence Generated Content), conversational interactions, smart medical services, smart customer service, game AI, etc., and it is believed that with the development of technology, the artificial intelligence technology will find application in more fields and play an increasingly important value.
Machine Learning (ML) is a multi-domain interdisciplinary, involving multiple disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, etc. It is specially studied how a computer simulates or implements learning behavior of a human to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve own performance. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, induction learning, teaching learning, and the like. The pre-training model is the latest development result of deep learning, and integrates the technology.
The following describes an implementation environment according to the present application:
the method for issuing the media resource provided by the embodiment of the application can be executed by computer equipment, and the computer equipment can be provided as a server or a terminal. An implementation environment schematic diagram of the method for delivering media resources provided by the embodiment of the present application is described below.
Referring to fig. 1, fig. 1 is a schematic diagram of an implementation environment of a method for delivering media resources according to an embodiment of the present application, where the implementation environment includes a terminal 101 and a server 102. The terminal 101 and the server 102 can be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein. In some embodiments, the server 102 is configured to train a multi-scenario reinforcement learning model, where the trained multi-scenario reinforcement learning model is configured to determine a target delivery time, a target delivery scenario, and target resource content for the target delivery media resource based on the object information and behavior information of the object. Optionally, the server is a resource issuing platform and is used for issuing media resources so as to realize popularization of the service. In other embodiments, the terminal 101 has a multi-scenario reinforcement learning model embedded therein, and the terminal 101 determines, based on object information and behavior information of the object, a target delivery time, a target delivery scenario, and target resource content of the object delivery media resource through the multi-scenario reinforcement learning model.
In some embodiments, the terminal 101 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, a VR (Virtual Reality) device, an AR (Augmented Reality) device, and the like. In some embodiments, the server 102 is a server cluster or a distributed system formed by a plurality of servers, and may also be a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network) services, and basic cloud computing services such as big data and artificial intelligence platforms. In some embodiments, the server 102 takes on the primary computing work and the terminal 101 takes on the secondary computing work; alternatively, the server 102 takes on the secondary computing work and the terminal 101 takes on the primary computing work; alternatively, a distributed computing architecture is used for collaborative computing between the server 102 and the terminal 101.
Referring to fig. 2, fig. 2 is a flowchart of a method for delivering a media resource according to an embodiment of the present application, where the method is performed by a computer device, and the method includes the following steps.
201. The computer device acquires a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times based on object information and behavior information of the object, where the behavior information indicates interactive behaviors generated by the object based on historically issued media resources.
In the embodiment of the application, the object information of the object includes basic information such as the name, gender and age of the object, and also includes the object's interest tags for media resources. The behavior information of the object includes behavior statistics features, behavior sequence features, the number of behaviors executed within a preset time period, and the like, where the end time of the preset time period is the current time. The behavior statistics features indicate the number of times the object has generated various behaviors. The behavior sequence features indicate the order in which the object has performed various behaviors. The interactive behaviors include clicking and closing operations on media resources, as well as registration, transaction and other behaviors generated based on the media resources; the structure of this information is sketched below.
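As an illustration of how this information might be organized, the following dataclass sketch groups the features named above; the field names are assumptions for exposition, not identifiers from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ObjectProfile:
    name: str
    gender: str
    age: int
    interest_tags: List[str] = field(default_factory=list)

@dataclass
class BehaviorInfo:
    # Statistics feature: number of times each kind of behavior occurred.
    behavior_counts: Dict[str, int] = field(default_factory=dict)
    # Sequence feature: the order in which behaviors were performed.
    behavior_sequence: List[str] = field(default_factory=list)
    # Number of behaviors executed in a preset period ending at the current time.
    recent_behavior_count: int = 0
```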
The plurality of candidate resource contents are contents of media resources, including but not limited to contents stored in a new/hot content library, a preset-level content library, a welfare resource content library, and a landing page content library. The resource contents stored in the new/hot content library are contents of recently occurring news or hot events. The resource contents stored in the preset-level content library are resource contents that can be viewed by objects reaching a preset level. The resource contents stored in the welfare resource content library are resource contents for issuing virtual resources to objects, where a virtual resource may be a virtual item package.
The plurality of candidate issuing scenes include, but are not limited to, an in-terminal push scene, an out-of-terminal push scene, and an RTA (Real Time API) delivery scene. When the application runs in the foreground and the media resource is sent to the application, this is in-terminal push. When the application has run in the background long enough that the application process is terminated due to cleanup or other reasons, and a third-party platform transmits the media resource to the application, this is out-of-terminal push. A candidate issuing scene may be a notification message push scene within in-terminal push, a message popup scene that appears within an application, and the like.
The plurality of candidate issuing times may be hours, such as 8:00 or 12:00, or time points within each hour, such as one-half or one-third of the way through each hour. The plurality of candidate issuing times may also be coarser periods such as morning, noon or afternoon, or workdays and holidays, which are not specifically limited herein.
In the embodiment of the application, the object information and behavior information of the object are acquired only with authorization. An authorization interface is displayed on the terminal used by the object, showing prompt information, a consent control and a disagreement control. The prompt information indicates the object information and behavior information to be acquired. The consent control is used for the object to agree that the terminal acquires the object information and behavior information. In response to a trigger operation on the consent control, the terminal sends the object information and behavior information of the object to the computer device.
202. The computer device determines target resource content, a target issuing scene and a target issuing time from the plurality of candidate resource contents, the plurality of candidate issuing scenes and the plurality of candidate issuing times through a multi-scene reinforcement learning model based on the object information, the behavior information, the plurality of candidate resource contents, the plurality of candidate issuing scenes and the plurality of candidate issuing times, where the multi-scene reinforcement learning model is obtained through reinforcement learning based on object information, behavior information, a plurality of sample resource contents, a plurality of sample issuing times and a plurality of sample issuing scenes of a plurality of sample objects in the plurality of sample issuing scenes.
In the embodiment of the application, the multi-scene reinforcement learning model determines target resource content from a plurality of candidate resource contents, determines a target issuing scene from a plurality of candidate issuing scenes, and determines target issuing time from a plurality of candidate issuing times.
In the embodiment of the application, the multi-scene reinforcement learning model is obtained by training on sample data in a plurality of sample issuing scenes, and the plurality of sample issuing scenes include the plurality of candidate issuing scenes.
In an embodiment of the application, the training target of the multi-scene reinforcement learning model includes increasing the DAU (Daily Active Users) of the platform to which the resources belong. Accordingly, the trained multi-scene reinforcement learning model can select appropriate resource content, an issuing scene and an issuing time for the object based on the object information and behavior information of the object, and issuing media resources to the object based on this selection can increase the DAU of the platform to which the resources belong.
203. The computer device issues the target resource content to the object at the target issuing time and in the target issuing scene.
In the embodiment of the application, after receiving the target resource content, the terminal used by the object displays the target resource content in the target issuing scene. Before issuing the target resource content to the object, the computer device may process the target resource content based on a target display style, so as to issue the target resource content in the target display style to the object. Alternatively, the computer device issues both the target resource content and the target display style to the object, and the terminal used by the object processes the target resource content based on the target display style and displays the target resource content in the target display style. Alternatively, the computer device issues only the target resource content to the object; the terminal used by the object is pre-configured with a target display style matching the object information and behavior information of the object, and after processing the target resource content based on the target display style, the terminal displays the target resource content in that style, thereby realizing personalized display of the target resource content for different objects. The target display style includes at least one of a video style, a text style, and an image style.
The embodiment of the application provides a media resource issuing method, which determines, through a multi-scene reinforcement learning model, the issuing time, the issuing scene and the resource content of media resources for an object based on the object information and behavior information of the object. Because the multi-scene reinforcement learning model is obtained through reinforcement learning training on data from multiple scenes, the training data are diverse and rich, so the trained model has a good prediction effect and high accuracy. The most appropriate issuing time, issuing scene and resource content matching the object can therefore be determined based on the model, and media resources issued based on this determination can achieve a promotion effect and promote the conversion of the media resources. In addition, since repeated issuing of the same media resource through multiple issuing scenes is avoided, the waste of traffic resources and of the scene resources of the issuing scenes is avoided; and since the resource issuing platform avoids repeatedly issuing the media resource through multiple issuing scenes, the processing pressure of the resource issuing platform is reduced.
In the embodiment of the present application, the foregoing embodiment of FIG. 2 briefly describes the media resource issuing process, and the following embodiment of FIG. 3 describes it further. Referring to FIG. 3, FIG. 3 is a flowchart of a method for issuing media resources according to an embodiment of the present application. The method is performed by a computer device and includes the following steps.
301. The computer device determines a candidate object set, where the object information and behavior information of each object in the candidate object set match at least one of reference object information and reference behavior information.
In an embodiment of the application, the computer device determines the candidate object set based on the reference object information and the reference behavior information. Object information of objects in the candidate object set is matched with reference object information, or behavior information is matched with reference behavior information, or object information is matched with reference object information and behavior information is matched with reference behavior information.
In embodiments of the present application, the reference object information may include sub-object information of various aspects, and the computer device may perform object screening based on each aspect of sub-object information separately. For example, the reference object information indicates a new object, i.e., an object whose registration time does not exceed a preset duration. For another example, the reference object information indicates an object whose age is within a preset age range.
In embodiments of the present application, the reference behavior information may include sub-behavior information of various aspects, and the computer device may perform object screening based on each aspect of sub-behavior information separately. For example, the reference behavior information indicates that the object has generated positive interactive behaviors based on issued media resources, where positive interactive behaviors include clicking, registration, transaction and the like, realizing the conversion of the media resources. For example, the reference behavior information indicates that the object has acquired a virtual package or virtual resource and has traded using it, so as to select benefit-sensitive objects. For another example, the reference behavior information indicates that the object is online, so as to select online objects. For another example, the reference behavior information indicates that the object has generated positive interactive behaviors based on issued media resources more than a preset number of times, so as to select active objects. For another example, the reference behavior information indicates that the object has generated positive interactive behaviors based on issued new/hot media resources, so as to select objects sensitive to new/hot content.
In some embodiments, the computer device classifies a plurality of objects based on their object information and behavior information to obtain multiple types of objects, including a type conforming to the reference object information and a type conforming to the reference behavior information, and adds both types of objects to the candidate object set.
In some embodiments, the computer device screens objects through supervised classification models based on the object information and behavior information to obtain objects that meet at least one of the reference object information and the reference behavior information. The supervised classification models include, but are not limited to, at least one of a CTR (Click-Through Rate) model, a CVR (Conversion Rate) model, an LTV (Life Time Value) model, and a VP (Video Play) model. The CTR model predicts the click-through rate based on at least one of the object information and the behavior information, so as to screen out objects whose click-through rate meets a reference click-through rate. The CVR model predicts the conversion rate, so as to screen out objects whose conversion rate meets a reference conversion rate. The LTV model predicts object lifetime value, so as to screen out long-cycle objects, i.e., objects that generate interactive behaviors based on media resources over a long period. The VP model predicts the media resource play rate, so as to screen out objects whose play rate meets a reference play rate. A sketch of this screening step follows.
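A minimal sketch of the screening step, assuming each supervised model exposes a `predict` method and that reference thresholds are configured per model; both are illustrative assumptions rather than interfaces named in the patent.

```python
def screen_objects(objects, models, thresholds):
    # models: {"ctr": ctr_model, "cvr": cvr_model, ...}
    # thresholds: {"ctr": 0.05, "cvr": 0.01, ...} (reference values)
    selected = []
    for obj in objects:
        features = (obj["object_info"], obj["behavior_info"])
        # Keep the object if any model's prediction meets its reference value.
        if any(models[name].predict(features) >= thresholds[name]
               for name in models):
            selected.append(obj)
    return selected
```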
In some embodiments, the candidate object set may also be supplemented. For example, if a target event occurs, the computer device determines a plurality of supplemental objects corresponding to the target event based on the event type of the target event and an object library, and adds the plurality of supplemental objects to the candidate object set. The object library stores a plurality of event types and the supplemental objects corresponding to each event type, where each supplemental object corresponding to an event type is a candidate object for issuing media resources of that event type.
The target event includes an emergency event, i.e., an event that has not occurred before, such as a recently occurring current affairs, news or hot event, or a newly released television series or movie. In the embodiment of the application, when a target event occurs, media resources whose resource content is the target event need to be issued to interested objects in time. Therefore, the audience objects of the target event are added to the candidate object set, and media resources are then issued to the objects in the set. In this way, media resources can be issued to the supplemental objects, ensuring the completeness of the crowd targeted by resource issuing and further improving the promotion effect of the target event.
In some embodiments, the computer device establishes, in advance, an index relationship between event types and the objects corresponding to the event types. Optionally, the index relationship is established in the BETree manner (an index tree). For example, referring to FIG. 4, FIG. 4 is a flowchart for establishing an index relationship according to an embodiment of the present application. After acquiring the object information of a plurality of objects, the computer device converts the object information into a unified internal format corresponding to the BETree. Then, based on the object information, the plurality of objects are respectively assigned to a plurality of event types, and a Predicate Index is established. The computer device establishes an inverted index for each value in the rules to obtain an Inverted List composed of numerical values (Number) and strings (String), so as to improve retrieval efficiency. The inverted lists are conjunctive inverted lists, and the rules are sorted by length to facilitate subsequent pruning. Each rule includes at least one sub-rule, each sub-rule indicating one piece of attribute information of an object corresponding to the event type; for example, the sub-rules for an event type may include gender, age and the like, and the length refers to the number of sub-rules included. A BitMap algorithm is used to perform the calculations, converting the various information into DNF (Disjunctive Normal Form) expressions and obtaining the inverted lists of the conjunctions. By matching event types based on the index relationship, the objects corresponding to an event type can be obtained quickly, as sketched below.
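The core of the index can be illustrated with a simplified conjunctive inverted index. This is a toy sketch of the general technique (posting lists intersected shortest-first for pruning), not the patent's BETree implementation.

```python
from collections import defaultdict

class EventObjectIndex:
    def __init__(self):
        # (attribute, value) -> set of object ids (the inverted lists).
        self.inverted = defaultdict(set)

    def add_object(self, obj_id: str, attributes: dict) -> None:
        for key, value in attributes.items():
            self.inverted[(key, value)].add(obj_id)

    def match(self, rule: dict) -> set:
        # A rule is a conjunction of sub-rules (attribute predicates);
        # intersect posting lists shortest-first so an empty intersection
        # prunes the remaining work early.
        postings = sorted(
            (self.inverted.get(item, set()) for item in rule.items()), key=len)
        if not postings:
            return set()
        result = set(postings[0])
        for p in postings[1:]:
            result &= p
            if not result:
                break
        return result

# Usage:
# index = EventObjectIndex()
# index.add_object("u1", {"gender": "f", "age_band": "18-24"})
# index.match({"gender": "f", "age_band": "18-24"})  # -> {"u1"}
```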
In some embodiments, the object library is configured to store a plurality of event types and object types of supplemental objects corresponding to each of the plurality of event types. Optionally, the computer device establishes a first index relationship for the event type and the object type corresponding to the event type, where the first index relationship is the same as the above index relationship, and is not described herein again. Accordingly, after determining the object type based on the object library of event types, the computer device performs object retrieval based on the object type to obtain an object of the object type. Optionally, the computer device establishes a second index relationship for the object type and the object corresponding to the object type, where the second index relationship is the same as the above index relationship, and is not described herein again.
In some embodiments, the computer device directly adds the plurality of supplemental objects to the candidate object set. In other embodiments, the computer device screens the plurality of supplemental objects based on a click-rate estimation model to obtain target supplemental objects whose click rate exceeds a preset threshold, and adds the target supplemental objects to the candidate object set. This further improves the accuracy of the determined supplemental objects and makes the configured candidate object set more stable and reliable, so that the new operating model of "content finding people" over massive objects and content data becomes an efficient new paradigm.
In the above embodiment, object retrieval and screening are performed through the index relationship and the click-rate estimation model; that is, a unified tool for real-time hybrid-tag retrieval and vector nearest-neighbor retrieval is established through a data engine and a model engine, realizing millisecond-level online content retrieval over massive data and improving the object matching speed for target events from the day level to the second level, effectively improving retrieval speed.
302. The computer device obtains any object from the candidate object set.
In the embodiment of the application, the objects in the candidate object set are the selected objects to which media resources are to be issued. Accordingly, the computer device performs the following steps 303-308 for each object in the candidate object set, and may perform steps 303-308 on at least two objects simultaneously to improve issuing efficiency.
In the embodiment of the application, the candidate object set is determined based on the reference object information and the reference behavior information, namely the object to be issued the media resource is selected first, so that the media resource is issued only to the selected object, the accuracy of issuing the media resource is improved, and the waste of the flow resource is avoided.
It should be noted that, the computer device periodically executes step 301 to update the objects in the candidate object set in time, so as to improve the accuracy of the objects in the candidate object set, and further obtain the accurate objects from the candidate object set.
303. The computer device acquires a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times based on object information and behavior information of the object, where the behavior information indicates interactive behaviors generated by the object based on historically issued media resources.
In some embodiments, the process in which the computer device acquires a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times based on the object information and behavior information of the object includes the following steps: the computer device recalls resource contents of media resources based on the object information, the behavior information and a content library of the media resources to obtain the plurality of candidate resource contents; recalls issuing scenes of the media resources based on the object information, the behavior information and a scene library of the media resources to obtain the plurality of candidate issuing scenes; and recalls issuing times of the media resources based on the object information, the behavior information and a time library of the media resources to obtain the plurality of candidate issuing times.
The steps of acquiring the candidate resource contents, the candidate issuing scenes and the candidate issuing times can be performed simultaneously to improve efficiency. The content library stores a plurality of resource contents, the scene library stores a plurality of issuing scenes, and the time library stores a plurality of issuing times.
In some embodiments, the resource content in the content library is annotated with object information and behavior information, and the computer device selects matching resource content from the content library based on the object information and behavior information of the object. The issuing scenes in the scene library are marked with object information and behavior information, and the computer equipment selects matched issuing scenes from the scene library based on the object information and the behavior information of the objects. The delivery time in the time base is marked with object information and behavior information of the object, and the computer equipment selects matched delivery time from the time base based on the object information and the behavior information of the object.
In other embodiments, the resource contents, issuing scenes and issuing times are recalled based on a recommendation system. For example, the computer device determines the resource content preference, issuing scene preference and issuing time preference of the object based on the object information and behavior information of the object, then selects resource contents matching the resource content preference from the content library as candidate resource contents, selects issuing scenes matching the issuing scene preference from the scene library as candidate issuing scenes, and selects issuing times matching the issuing time preference from the time library as candidate issuing times. Optionally, the computer device determines the similarity between each resource content in the content library and the resource content preference, and the resource contents whose similarity satisfies a preset condition are the candidate resource contents, as sketched below. The recall processes for the candidate issuing scenes and the candidate issuing times are the same and are not described in detail here.
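A similarity-based recall of this kind might look like the following sketch; the embedding function and the similarity threshold are illustrative assumptions.

```python
import numpy as np

def recall_by_preference(preference_vec, library, embed, threshold=0.7):
    # Keep library entries whose cosine similarity to the object's
    # preference vector satisfies the preset condition.
    kept = []
    p = preference_vec / np.linalg.norm(preference_vec)
    for entry in library:
        v = embed(entry)
        sim = float(np.dot(p, v / np.linalg.norm(v)))
        if sim >= threshold:
            kept.append((entry, sim))
    # Highest-similarity candidates first.
    return [entry for entry, _ in sorted(kept, key=lambda x: x[1], reverse=True)]
```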
In the embodiment of the application, determining a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times based on the object information and behavior information of the object realizes coarse screening of the resource contents, issuing scenes and issuing times. Because the data volume at this stage is large and a high processing speed is required, performing the coarse screening by recall improves processing efficiency. Fine screening is then performed on the resource contents, issuing scenes and issuing times selected by the coarse screening; since the data volume has been reduced, the efficiency of the fine screening is also improved.
In some embodiments, if a target event occurs, the target event is supplemented into the plurality of candidate resource contents, so that the target event can be issued to interested objects.
In some embodiments, the computer device periodically performs step 303 for each object. In other embodiments, the computer device performs step 303 for the object in response to reaching a certain trigger opportunity. Optionally, the embodiment of the present application further includes at least one of the following three implementations.
In a first implementation, if the object information and behavior information of the object indicate that the object reaches the target behavior state, the computer device performs step 303.
The behavior information includes real-time behavior information and historical behavior information. The target behavior state can be set and changed as needed, and includes a newly registered object state, a positive feedback state, a negative feedback state, and the like. The positive feedback state indicates that the object has generated positive interactive behaviors based on issued media resources; for example, the object has just finished browsing a post or video. The negative feedback state indicates that the object has generated negative interactive behaviors based on issued media resources. Negative feedback behaviors include, for example, the object closing an issued media resource, or the object continuously flipping through multiple issued media resources with a viewing dwell time smaller than a preset duration, for example, sliding through more than 3 media resource pages within 1 minute, as sketched below.
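The negative-feedback rule just described (many quickly skipped pages) can be expressed as a simple check; the field names and the dwell-time threshold are illustrative assumptions.

```python
def is_negative_feedback(page_views, window_seconds=60,
                         max_pages=3, min_dwell_seconds=5.0):
    # page_views: [{"age_seconds": ..., "dwell_seconds": ...}, ...]
    recent = [v for v in page_views if v["age_seconds"] <= window_seconds]
    skipped = [v for v in recent if v["dwell_seconds"] < min_dwell_seconds]
    # More than `max_pages` quickly skipped pages in the window counts as
    # negative feedback (e.g. more than 3 pages slid within 1 minute).
    return len(skipped) > max_pages
```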
In some embodiments, the computer device may determine whether the object has reached the target behavior state based on a supervised model. An onion-type object layering model is established, dividing the plurality of objects into lightly interactive, moderately interactive and heavily interactive objects of the platform to which the resources belong, where the three levels indicate the number of interactive behaviors the object has generated on the platform's media resources. The computer device may treat objects at one or more of these levels as having reached the target behavior state.
For example, referring to FIG. 5, FIG. 5 is a schematic diagram of object division according to an embodiment of the present application. Objects are divided into 4 classes, stored respectively in a culture pool, a persistence pool, an early warning pool and a churn pool. For newly added objects, the computer device acquires them and stores them into the culture pool; for reflow objects, the computer device recalls them and stores them into the culture pool. If an object in the culture pool is cultivated successfully, it is stored into the persistence pool; if cultivation fails, it is stored into the churn pool. Cultivation success refers to generating positive interactive behaviors based on media resources; cultivation failure refers to generating negative interactive behaviors based on media resources. For an object in the persistence pool, if the activity level of its positive interactive behaviors decreases, it is stored into the early warning pool. For an object in the early warning pool, if the early warning is recovered, the object is stored back into the persistence pool; if the early warning fails and the object churns, it is stored into the churn pool. Early warning recovery refers to generating positive interactive behaviors based on issued media resources. The computer device treats the objects in at least one of the pools as objects that have reached the target behavior state.
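The pool lifecycle of FIG. 5 amounts to a small state machine. The sketch below encodes the transitions described above, with event names chosen for illustration only.

```python
# (current pool, event) -> next pool; unknown events leave the pool unchanged.
TRANSITIONS = {
    ("culture", "positive_interaction"): "persistence",        # cultivation success
    ("culture", "negative_interaction"): "churn",              # cultivation failure
    ("persistence", "activity_drop"): "early_warning",
    ("early_warning", "positive_interaction"): "persistence",  # warning recovered
    ("early_warning", "no_recovery"): "churn",
}

def next_pool(current_pool: str, event: str) -> str:
    return TRANSITIONS.get((current_pool, event), current_pool)

# Example: a newly added or reflow object starts in the culture pool.
assert next_pool("culture", "positive_interaction") == "persistence"
```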
In the embodiment of the application, in response to the object reaching the target behavior state, it is determined that media resources are to be issued to the object. Because reaching the target behavior state means the behavior state of the object has changed, determining in time to issue media resources allows the issuing strategy to be adjusted based on the object's real-time behavior state, so that media resources are issued according to the object's real-time preferences, improving issuing accuracy and reducing the churn rate.
In a second implementation, if a media resource issuing request of the object for a preset issuing scene is received, the computer device executes step 303. In the embodiment of the present application, the preset issuing scene may be any issuing scene. Receiving such a request indicates that the object is currently online and has a demand for issued media resources, so determining at this moment to issue media resources for the object guarantees timeliness. Moreover, because the issuing scene is then determined anew, the resource issuing request of one issuing scene can serve as a trigger signal source for another issuing scene, which ensures complementarity among the plurality of scenes; the incremental information thus introduced produces beneficial value for each issuing scene.
In some embodiments, if a media resource issuing request is received and the object makes a positive interactive behavior based on the media resource issued in response to that request, the computer device performs step 303. A positive interactive behavior indicates that the object is interested in the issued media resource; determining at this moment to issue media resources for the object, and executing step 303, conforms to the object's current preference for viewing media resources and further improves the conversion rate of the media resources. In some embodiments, if a media resource issuing request is received and the object makes a negative interactive behavior based on the media resource issued in response to that request, the computer device also performs step 303. A negative interactive behavior indicates that the object is not interested in the issued media resource; by determining at this moment to issue media resources for the object, other media resources can be issued again based on the object's current negative interactive behavior, so that through timely adjustment the object is issued media resources conforming to its preferences as much as possible, and the churn rate of objects can be reduced.
In a third implementation, if a target event occurs, the computer device performs step 303. In the embodiment of the application, when a target event occurs, media resources whose resource content relates to the target event need to be issued in time to interested objects. Therefore, when the target event occurs, it is determined that media resources are to be issued for the object, so that media resources whose content relates to the target event can be issued in time, timeliness is ensured, and popularization of the target event is promoted.
In other embodiments, if the issuing scene of the media resource includes a launch page of the application, the computer device performs step 303 when the application is launched. In other embodiments, if the number of consecutive launches of the application reaches a preset number, step 303 is performed. For another example, if the object plays a multimedia resource on the bottom page, step 303 is performed. The bottom page refers to the playing page of the multimedia resource. The multimedia resource may be an issued media resource, or may be another media resource, such as a television show or a movie.
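The trigger conditions of the three implementations and the additional embodiments above can be summarized, purely as a hedged sketch (the key names below are illustrative stand-ins, not from the original), as a single predicate deciding whether step 303 runs:

```python
def should_trigger(obj_state: dict) -> bool:
    """Return True when any of the issuing triggers described above fires."""
    return (
        obj_state.get("reached_target_behavior_state", False)  # first implementation
        or obj_state.get("received_issuing_request", False)    # second implementation
        or obj_state.get("target_event_occurred", False)       # third implementation
        or obj_state.get("app_launched", False)                # launch-page issuing scene
        or obj_state.get("consecutive_launches", 0) >= 3       # assumed preset number
        or obj_state.get("playing_on_bottom_page", False)      # bottom-page playback
    )

# e.g. an object that has just reached the target behavior state:
assert should_trigger({"reached_target_behavior_state": True})
```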
304. The computer device inputs the object information, the behavior information, the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times into a multi-scene reinforcement learning model, where the multi-scene reinforcement learning model is obtained by reinforcement learning based on object information and behavior information of a plurality of sample objects in a plurality of sample issuing scenes, a plurality of sample resource contents, a plurality of sample issuing times, and the plurality of sample issuing scenes.
In the embodiment of the application, the training process of the multi-scene reinforcement learning model includes the following steps. The computer device obtains a plurality of pieces of state information of a plurality of sample objects in a plurality of sample issuing scenes, where the state information includes the object information and the behavior information of a sample object. For each piece of state information, the state information is input into the multi-scene reinforcement learning model to obtain action information, and return information is obtained based on the state information and the action information; the action information includes sample resource content, a sample issuing scene, and a sample issuing time, and is used for instructing the resource issuing platform to issue the sample resource content for the sample object at the sample issuing time and in the sample issuing scene; the return information indicates the return generated for the platform to which the resources belong by issuing the sample resource content for the sample object based on the action information. Based on the return information, the model parameters of the multi-scene reinforcement learning model are adjusted.
In the embodiment of the present application, the return information may be a DAU (Daily Active Users) value, an ROI (Return On Investment), or a weighted value of the DAU and the ROI, which is not limited herein. Accordingly, based on the action information predicted by the multi-scene reinforcement learning model, the DAU and the ROI can be kept at a certain level and can further be improved. If multiple services are involved, the return information may be a weighted value across the multiple services.
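The following is a minimal, hedged sketch of this training loop, using a toy bandit-style value update in place of the actual multi-scene reinforcement learning model; the class and weight names are assumptions, and the simulated DAU/ROI feedback stands in for real platform returns:

```python
import random

W_DAU, W_ROI = 0.5, 0.5  # assumed weights for the weighted return

def compute_return(dau_delta: float, roi_delta: float) -> float:
    """Return information modeled as a weighted value of DAU and ROI."""
    return W_DAU * dau_delta + W_ROI * roi_delta

class ToyPolicy:
    """Toy stand-in for the multi-scene reinforcement learning model."""
    def __init__(self, actions):
        self.values = {a: 0.0 for a in actions}  # estimated return per action

    def predict(self, state):
        # Epsilon-greedy choice over (content, scene, time) action triples.
        # A real model would also condition on the state information.
        if random.random() < 0.1:
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, state, action, reward, lr=0.1):
        # Adjust the model parameters (here, one value estimate) from the return.
        self.values[action] += lr * (reward - self.values[action])

actions = [("id1", "sc1", "t1"), ("id1", "sc2", "t2"), ("id2", "sc1", "t2")]
policy = ToyPolicy(actions)
for _ in range(1000):
    state = {"object_info": "...", "behavior_info": "..."}  # sampled state information
    action = policy.predict(state)
    dau, roi = random.gauss(0.1, 0.05), random.gauss(0.05, 0.02)  # simulated feedback
    policy.update(state, action, compute_return(dau, roi))
```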
305. The computer device determines, via the multi-scenario reinforcement learning model, a first number of resource contents, a second number of delivery scenarios, and a third number of delivery times from among the plurality of candidate resource contents, the plurality of candidate delivery scenarios, and the plurality of candidate delivery times.
In embodiments of the application, the first number, the second number, and the third number may be the same or different. In some embodiments, the multi-scene reinforcement learning model is a hierarchical reinforcement learning model. The multi-scene reinforcement learning model sequentially determines resource content, a delivery scene and delivery time.
Accordingly, the process in which the computer device determines, through the multi-scene reinforcement learning model, a first number of resource contents, a second number of issuing scenes, and a third number of issuing times from the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times includes the following steps. The computer device determines, through the multi-scene reinforcement learning model, a first number of resource contents from the plurality of candidate resource contents based on the object information and the behavior information, where the first number of resource contents are the top first-number candidate resource contents with the highest matching degree with the object information and the behavior information. The computer device then determines, through the multi-scene reinforcement learning model, a second number of issuing scenes from the plurality of candidate issuing scenes based on the object information, the behavior information, and the first number of resource contents, where the second number of issuing scenes are the top second-number candidate issuing scenes with the highest matching degree with the object information, the behavior information, and the first number of resource contents. The computer device then determines, through the multi-scene reinforcement learning model, a third number of issuing times from the plurality of candidate issuing times based on the object information, the behavior information, the first number of resource contents, and the second number of issuing scenes, where the third number of issuing times are the top third-number candidate issuing times with the highest matching degree with the object information, the behavior information, the first number of resource contents, and the second number of issuing scenes.
In the embodiment of the application, the matching degree between the resource content and the object information and the behavior information is the similarity probability between the object information and the behavior information and the resource content. The matching degree between the issuing scene and the object information, the behavior information and the resource content is the preference probability of the issuing scene. The matching degree between the issuing time and the object information, the behavior information, the resource content and the issuing scene is the time preference probability of the issuing time.
In the embodiment of the application, the resource content, the issuing scene and the issuing time are determined in sequence, and because the determined resource content is considered when the issuing scene is determined and the determined resource content and the issuing scene are considered when the issuing time is determined, the determined resource content, the issuing scene and the issuing time are related and are matched, so that a better comprehensive effect can be achieved based on the determined resource content, the issuing scene and the issuing time, the object is promoted to make interactive behavior based on the issued media resource, and the conversion rate of the media resource is improved.
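Before detailing the two implementations of each stage below, the cascaded content-scene-time selection can be sketched as follows; the scoring functions are hypothetical placeholders for the model's similarity and preference heads, and all names are ours:

```python
def top_k(candidates, score, k):
    """Return the k candidates with the highest matching degree."""
    return sorted(candidates, key=score, reverse=True)[:k]

def cascade_select(state, contents, scenes, times, k1, k2, k3):
    # 1) top-k1 resource contents matched against object/behavior information
    picked_contents = top_k(contents, lambda c: match_content(state, c), k1)
    # 2) top-k2 issuing scenes, conditioned on the picked contents
    picked_scenes = top_k(scenes,
                          lambda s: match_scene(state, picked_contents, s), k2)
    # 3) top-k3 issuing times, conditioned on contents and scenes
    picked_times = top_k(times,
                         lambda t: match_time(state, picked_contents,
                                              picked_scenes, t), k3)
    return picked_contents, picked_scenes, picked_times

# Toy scorers standing in for the model's matching-degree outputs:
def match_content(st, c): return hash((st, c)) % 100
def match_scene(st, cs, s): return hash((st, tuple(cs), s)) % 100
def match_time(st, cs, ss, t): return hash((st, tuple(cs), tuple(ss), t)) % 100

contents, scenes, times = cascade_select(
    "obj+behavior", ["id1", "id2", "id3"], ["sc1", "sc2"], ["t1", "t2", "t3"],
    k1=2, k2=1, k3=2)
```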
The computer device determines the first number of resource contents in either of the following two implementations.
In a first implementation, the multi-scene reinforcement learning model determines the matching degree between the object information and behavior information of the object and each of the plurality of candidate resource contents, ranks the plurality of candidate resource contents based on the matching degree, and takes the top first-number ranked candidate resource contents as the first number of resource contents. This implementation determines a plurality of resource contents at one time, thereby improving the efficiency of determining resource contents.
In a second implementation, there is an association among the first number of resource contents. The multi-scene reinforcement learning model determines the matching degree between the object information and behavior information of the object and each of the plurality of candidate resource contents, and takes the first candidate resource content with the highest matching degree as one of the first number of resource contents. That first candidate resource content then acts on the object to adjust the state information of the object, where the state information includes the object information and behavior information of the object; the matching degree between the adjusted state information and the remaining candidate resource contents is determined, the second candidate resource content with the highest matching degree is taken as another of the first number of resource contents, and the above steps are repeated until the first number of resource contents have been determined. In this implementation, each later resource content depends on the already selected resource contents, so the determined resource contents take the temporal characteristics into account, which further improves their accuracy.
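A hedged sketch of this second implementation, where each pick adjusts the state before the next pick (the adjust_state update rule and scorer are assumptions):

```python
def select_sequentially(state, candidates, k, score, adjust_state):
    """Pick k resource contents one at a time; each pick first updates the state."""
    remaining, picked = list(candidates), []
    for _ in range(k):
        best = max(remaining, key=lambda c: score(state, c))
        picked.append(best)
        remaining.remove(best)
        state = adjust_state(state, best)  # the picked content acts on the object state
    return picked

picks = select_sequentially(
    state=0,
    candidates=[3, 1, 4, 1, 5],
    k=3,
    score=lambda s, c: c - abs(s - c) * 0.1,  # toy matching degree
    adjust_state=lambda s, c: s + c,          # toy state adjustment
)
```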
The computer device determines the second number of issuing scenes in either of the following two implementations.
In a first implementation, the computer device, through the multi-scene reinforcement learning model, directly determines the matching degrees of the plurality of candidate issuing scenes based on the object information, the behavior information, and the first number of resource contents, and then directly determines the second number of issuing scenes based on those matching degrees, thereby improving efficiency.
In a second implementation, for each resource content in the first number of resource contents, the computer device determines, through the multi-scene reinforcement learning model, a first sub-number of issuing scenes from the plurality of candidate issuing scenes based on the object information, the behavior information, and that resource content, where the first sub-number of issuing scenes are the top first-sub-number candidate issuing scenes with the highest matching degree with the object information, the behavior information, and that resource content. The first sub-numbers of issuing scenes corresponding to the first number of resource contents are then accumulated to obtain the second number of issuing scenes. The process in which the computer device determines the first sub-number of issuing scenes is the same as the process of determining the first number of resource contents, and is not described herein again. This implementation makes the determined issuing scenes targeted and better matched with the corresponding resource contents, so the accuracy is higher.
The computer device determines the third number of issuing times in either of the following two implementations.
In a first implementation, the computer device, through the multi-scene reinforcement learning model, directly determines the matching degrees of the plurality of candidate issuing times based on the object information, the behavior information, the first number of resource contents, and the second number of issuing scenes, and then directly determines the third number of issuing times based on those matching degrees, thereby improving efficiency.
In a second implementation, the computer device combines the first number of resource contents and the second number of issuing scenes pairwise to obtain a plurality of combinations, each combination including one resource content and one issuing scene. For each combination, a second sub-number of issuing times is determined from the plurality of candidate issuing times through the multi-scene reinforcement learning model based on the object information, the behavior information, and the combination, where the second sub-number of issuing times are the top second-sub-number candidate issuing times with the highest matching degree with the object information, the behavior information, and the combination. The second sub-numbers of issuing times corresponding to the plurality of combinations are then accumulated to obtain the third number of issuing times. The process in which the computer device determines the second sub-number of issuing times is the same as the process of determining the first number of resource contents, and is not described herein again. This implementation makes the determined issuing times targeted and matched with the corresponding resource contents and issuing scenes, so the accuracy is higher.
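This pairwise accumulation can be sketched as follows (itertools.product enumerates the (content, scene) combinations; the scorer is a hypothetical placeholder):

```python
from itertools import product

def select_times(state, contents, scenes, times, k_per_pair, score):
    """Accumulate the top candidate times of every (content, scene) combination."""
    picked = []
    for content, scene in product(contents, scenes):
        ranked = sorted(times, key=lambda t: score(state, content, scene, t),
                        reverse=True)
        picked.extend(ranked[:k_per_pair])  # second sub-number per combination
    return picked  # third number = accumulation over all combinations

picked = select_times(
    "obj+behavior", ["id1", "id2"], ["sc1"], ["t1", "t2", "t3"],
    k_per_pair=1, score=lambda st, c, s, t: hash((st, c, s, t)) % 100)
```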
For example, referring to fig. 6, fig. 6 is a schematic structural diagram of a multi-scene reinforcement learning model according to an embodiment of the present application. A recall subsystem recalls resource contents, issuing scenes, and issuing times to obtain a plurality of candidate resource contents, a plurality of candidate issuing scenes, and a plurality of candidate issuing times. The plurality of candidate resource contents, the plurality of candidate issuing scenes, the plurality of candidate issuing times, and the object information and behavior information of the object are then input into the multi-scene reinforcement learning model. The multi-scene reinforcement learning model first iteratively determines a first number of resource contents from the plurality of candidate resource contents based on the object information and the behavior information, then determines a second number of issuing scenes, and then determines a third number of issuing times. Here, id1, id2, etc. denote the plurality of candidate resource contents, sc1, sc2, etc. denote the plurality of candidate issuing scenes, and t1, t2, etc. denote the plurality of candidate issuing times. The multi-scene reinforcement learning model includes an actor network, which is a multi-task cascade perception network; as shown in the upper right corner of fig. 6, the actor network is used for selecting the resource content, the issuing scene, and the issuing time in sequence.
306. The computer device determines the target resource content, the target issuing scene, and the target issuing time from the first number of resource contents, the second number of issuing scenes, and the third number of issuing times.
In some embodiments, the first number of resource contents, the second number of issuing scenes, and the third number of issuing times may be freely combined to obtain a plurality of combinations, and a target combination is determined from the plurality of combinations, the target combination including the target resource content, the target issuing scene, and the target issuing time. In other embodiments, the computer device determines the target resource content from the first number of resource contents, determines the target issuing scene from the second number of issuing scenes, and determines the target issuing time from the third number of issuing times.
In some embodiments, the output of the multi-scene reinforcement learning model is a <resource content id, issuing scene id, issuing time id> triple, i.e., the three elements needed for issuing a media resource, indicating which resource content is presented, in which scene, and at what time.
In some embodiments, the multi-scene reinforcement learning model outputs only one triple, which includes the target resource content, the target issuing scene, and the target issuing time. In other embodiments, the multi-scene reinforcement learning model outputs a plurality of triples based on the first number of resource contents, the second number of issuing scenes, and the third number of issuing times; the computer device determines a target triple, and the resource content, issuing scene, and issuing time in the target triple are respectively taken as the target resource content, target issuing scene, and target issuing time.
In some embodiments, the first number of resource contents are arranged in ranked order, the second number of issuing scenes are arranged in ranked order, and the third number of issuing times are arranged in ranked order; the computer device takes the first-ranked resource content as the target resource content, the first-ranked issuing scene as the target issuing scene, and the first-ranked issuing time as the target issuing time.
In other embodiments, for any issuing scene, a first preset rank position and a second preset rank position respectively indicate which resource content and which issuing time correspond to that scene; the first preset rank position and the second preset rank position may be set and changed as needed, for example the first preset rank position is the first place and the second preset rank position is the second place. For any selected target issuing scene, the resource content at the first preset rank position among the first number of resource contents is taken as the target resource content, and the issuing time at the second preset rank position among the third number of issuing times is taken as the target issuing time. The computer device takes the issuing scenes at a third preset rank position among the second number of issuing scenes as the target issuing scenes; the third preset rank position may likewise be set and changed as needed, which is not specifically limited herein.
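As a hedged sketch of this rank-based final selection (the preset positions are illustrative; the original leaves them configurable):

```python
def pick_targets(contents, scenes, times,
                 content_pos=0, scene_pos=0, time_pos=1):
    """Pick the target triple from the ranked lists by preset rank positions.

    Positions are 0-based here; content_pos=0 and time_pos=1 mirror the
    'first place' and 'second place' example in the text above.
    """
    return contents[content_pos], scenes[scene_pos], times[time_pos]

target = pick_targets(["id2", "id1"], ["sc1", "sc2"], ["t3", "t1"])
# -> ("id2", "sc1", "t1"): a <resource content id, issuing scene id, issuing time id> triple
```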
In the embodiment of the present application, the process of determining the target resource content, the target issuing scene, and the target issuing time from the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times through the multi-scene reinforcement learning model, based on the object information, the behavior information, and the candidates, is implemented through the above steps 304-306. In this embodiment, a certain number of resource contents, issuing scenes, and issuing times are first screened out, and the target resource content, target issuing scene, and target issuing time are then screened out based on a ranking-based screening mechanism, which improves both flexibility and accuracy.
It should be noted that, the above steps 304-306 are only one alternative implementation manner of the process, and the computer device may implement the process in other alternative implementations, which are not described herein.
307. Based on the object information, the behavior information, and the target resource content, the computer device determines a target display style of the target resource content from a plurality of candidate display styles through a style model corresponding to the target issuing scene, where the style model is used for determining the display style of a media resource.
In some embodiments, the training process of the style model includes the following steps. The computer device obtains a plurality of display style samples of media resources, where a display style sample indicates the object information and behavior information of a sample object, sample resource content, and the display style of the sample resource content. The computer device masks part of the features in a first display style sample to obtain a second display style sample, and forms a positive sample pair based on the first display style sample and the second display style sample, the first display style sample being any one of the plurality of display style samples. The computer device forms a negative sample pair based on a third display style sample and a fourth display style sample, which are different samples among the plurality of display style samples. The computer device determines a first loss based on the positive sample pair, the negative sample pair, and the style model. The computer device inputs the first display style sample into the style model to obtain a predicted click rate, and determines a second loss based on the predicted click rate and a reference click rate. The computer device adjusts the model parameters of the style model based on the first loss and the second loss.
It should be noted that different issuing scenes correspond to different style models, and the style model under each issuing scene is trained based on the object information, behavior information, sample resource content, and display styles of sample objects under that issuing scene.
The computer device determines a cross entropy between the predicted click rate and the reference click rate to obtain the second loss. The computer device then performs a weighted summation of the first loss and the second loss to obtain a target loss, and adjusts the model parameters of the style model based on the target loss.
In an embodiment of the present application, the process in which the computer device masks part of the features in the first display style sample to obtain the second display style sample includes the following steps. For the continuous features of the first display style sample, the computer device replaces the masked part of the features with a preset embedding vector (embedding) to obtain the second display style sample. Continuous features refer to features that describe the display style sample along a single continuous dimension; for an image, for example, the pixel features that make up the image are continuous features. For the plurality of discrete features, i.e., multi-valued features, of the first display style sample, part of the features are randomly altered or discarded (dropout) according to a probability to obtain the second display style sample. Discrete features indicate parallel features of the display style sample in different dimensions; for example, a category feature of an image may be landscape or black-and-white, and the landscape feature of the image may be randomly discarded.
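A hedged NumPy sketch of this masking step (the zero mask embedding and the probability are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
MASK_EMBEDDING = np.zeros(8)  # assumed preset embedding vector used as the mask

def augment(continuous: np.ndarray, discrete: list, p: float = 0.2):
    """Derive a second display style sample from a first one.

    Continuous features: randomly replace positions with the preset embedding.
    Discrete (multi-valued) features: randomly discard values with probability p.
    """
    cont = continuous.copy()
    for i in range(len(cont)):
        if rng.random() < p:
            cont[i] = MASK_EMBEDDING  # mask -> preset embedding vector
    disc = [f for f in discrete if rng.random() >= p]  # dropout by probability
    return cont, disc

first = (rng.normal(size=(4, 8)), ["landscape", "black_and_white"])
second = augment(*first)  # (first, second) form a positive sample pair
```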
In an embodiment of the application, the computer device determines one sub-loss based on the positive sample pair and another sub-loss based on the negative sample pair, and a weighted sum of the two sub-losses yields the first loss; the first loss aims to make the distance between the samples in a positive pair smaller and the distance between the samples in a negative pair larger. The style model is used to predict click rates. Optionally, the computer device determines the click rates of the two samples in each sample pair respectively and determines a self-supervised loss based on the gap between the two click rates to obtain the first loss.
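A minimal sketch of the loss combination, assuming a margin-style contrastive distance loss for the sample pairs and cross entropy for the click-rate head (the margin and weights are assumptions):

```python
import math

def contrastive_loss(d_pos: float, d_neg: float, margin: float = 1.0) -> float:
    """First loss: pull positive pairs together, push negative pairs apart."""
    pos_sub_loss = d_pos ** 2                       # sub-loss from the positive pair
    neg_sub_loss = max(0.0, margin - d_neg) ** 2    # sub-loss from the negative pair
    return 0.5 * pos_sub_loss + 0.5 * neg_sub_loss  # weighted sum of the sub-losses

def click_rate_loss(predicted: float, reference: float) -> float:
    """Second loss: cross entropy between predicted and reference click rate."""
    eps = 1e-7
    predicted = min(max(predicted, eps), 1 - eps)
    return -(reference * math.log(predicted)
             + (1 - reference) * math.log(1 - predicted))

def target_loss(d_pos, d_neg, predicted, reference, w1=0.5, w2=0.5):
    """Weighted summation of the first and second losses."""
    return (w1 * contrastive_loss(d_pos, d_neg)
            + w2 * click_rate_loss(predicted, reference))
```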
In the embodiment of the application, the richness of sample data is improved by constructing the positive sample pair and the negative sample pair, and the model is trained based on the loss determined by the data, so that the generalization and the accuracy of the model can be improved, and the training effect is further improved.
For example, referring to fig. 7, fig. 7 is a schematic structural diagram of the model in the training phase according to an embodiment of the present application. The style model comprises three connection-layer modules whose model parameters are shared with one another. The left connection-layer module is used to obtain the second loss, and the middle and right connection-layer modules are used to process the positive sample pair and the negative sample pair respectively to obtain the first loss; alternatively, the middle and right connection-layer modules are used to process the continuous features and the discrete features of the display style sample respectively to obtain the first loss. The input data of the input layer of the style model includes a plurality of features of the sample resource content together with the object information and behavior information of the user. The plurality of features include an embedded feature of the cover image, a text feature of the video, and a multi-valued feature extracted from the cover image; these features represent the resource content and the display style. When the style model is used, it outputs the predicted click rate of each candidate display style, and a candidate display style whose click rate is greater than a click rate threshold is taken as the screened target display style. Optionally, among the candidate display styles whose click rate is greater than the click rate threshold, the one with the highest click rate is taken as the target display style.
In some embodiments, the computer device predicts the click rate of each of the plurality of candidate display styles through the style model, and takes the candidate display style with the highest click rate as the target display style. In other embodiments, the computer device obtains the actual click rate of each of the plurality of candidate display styles, performs a weighted summation of the click rate predicted by the style model and the actual click rate to obtain a comprehensive click rate for each candidate display style, and takes the candidate display style with the largest comprehensive click rate as the target display style. The actual click rate may be a click rate obtained through an experimental test, or may be a click rate determined based on the historical behavior information of a plurality of objects, which is not described herein again.
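The weighted combination can be sketched as follows (the weights and the example click rates are assumptions):

```python
def pick_style(styles, predicted_ctr, actual_ctr, w_pred=0.6, w_actual=0.4):
    """Choose the candidate display style with the largest comprehensive click rate."""
    combined = {s: w_pred * predicted_ctr[s] + w_actual * actual_ctr[s]
                for s in styles}
    return max(combined, key=combined.get)

style = pick_style(
    ["picture", "video", "text"],
    predicted_ctr={"picture": 0.12, "video": 0.18, "text": 0.05},  # style model output
    actual_ctr={"picture": 0.10, "video": 0.15, "text": 0.07},     # posterior actual data
)  # -> "video"
```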
In the embodiment of the application, after the issuing scene and the resource content are given, the target display style with the highest click rate is selected through the style model together with posterior actual data, so that the target display style is determined based on click rates from two dimensions. Because a comprehensive click rate across multiple dimensions is more accurate, the determined target display style better conforms to the object's preferences and promotes conversion of the issued media resources.
In embodiments of the application, the plurality of candidate display styles may be generated in batches manually or by artificial intelligence. Estimating the click rate of each display style through the style model effectively applies survival of the fittest among display styles, avoids uncontrolled growth in the number of online display styles, reduces trial-and-error costs, and improves efficiency.
308. The computer device issues the target resource content in the target display style to the object at the target issuing time and in the target issuing scene.
In the embodiment of the application, the computer device can issue the target resource content in the target display style to the object through in-app push and out-of-app push. When the method is applied to an in-app push scene, the delivery efficiency of media resources can be increased and the probability that the object redeems rights based on virtual resources can be improved; the rights may, for example, be rights to view certain multimedia resources. Accordingly, if the ROI of the application is used as the reinforcement target of the multi-scene reinforcement learning model, the method provided by the embodiment of the application can improve the ROI of the application, for example the ROI of objects redeeming rights based on virtual resources within the application.
For example, referring to fig. 8, fig. 8 is a flowchart of issuing a media resource according to an embodiment of the present application. First, objects are selected to obtain a plurality of candidate objects, including benefit-sensitive objects, new/hot-content-sensitive objects, active objects, online objects, and the like. The trigger timing is then judged to obtain the resource trigger time; triggers include timed triggering, triggering when the application is launched, triggering when a page is slid quickly, triggering when a video is played on the bottom page, triggering when the application is launched several times in a row, triggering when an advertisement request is sent, and the like. The multi-scene reinforcement learning model is then used to obtain the optimal issuing scene and resource content; the issuing scenes may include, for example, an out-of-app push scene and an RTA placement scene, and the resource content may be content from a new/hot content library, a preset-level content library, a welfare resource library, or a landing page library. The style model is then used to obtain the optimal display style, which includes at least one of a picture style, a video style, a text style, and the like.
For example, referring to fig. 9, fig. 9 is a flowchart of issuing a media resource according to an embodiment of the present application. Object selection is performed based on the behavior logs and in-app behavior logs of a plurality of objects: the object state, object behavior information, and the like are obtained from the behavior logs, including real-time behavior information and day-level behavior information, and object selection is performed based on the object state and the object behavior information to obtain a plurality of candidate objects. Based on the life cycle state diagram of the object shown in fig. 5, the current state of the object is obtained and taken as prior knowledge, and combined with the behavior information of the object to judge whether the resource trigger time of the object has been reached. If the resource trigger time of the object has been reached, the multi-scene reinforcement learning model is used for action prediction: a prediction is made based on the behavior information of the object, the object information, and the prior knowledge to obtain the target issuing scene, target resource content, and target issuing time. The target display style is then determined based on the style model corresponding to the target issuing scene, and the target resource content in the target display style is issued for the target object at the target issuing time and in the target issuing scene. The record of this media resource issuance is then stored into the behavior log of the object.
In the embodiment of the present application, the process of issuing the target resource content to the object at the target issuing time and in the target issuing scene is implemented through the above steps 307-308. In this embodiment, a display style matching the target resource content, the object information, and the behavior information is determined, and the resource content in that display style is issued to the object, so that the issued media resource better conforms to the object's display style preference and prompts the object to generate interactive behavior based on the media resource.
It should be noted that, the above steps 307 to 308 are only one alternative implementation manner of implementing the process, and the computer device may implement the process in other alternative implementations, which are not described herein.
In the embodiment of the application, the traffic resources of each scene are integrated through multi-scene growth joint modeling, and by comprehensively considering the use of a plurality of scenes, the growth strategy is optimized globally to increase the DAU. On this basis, a multi-scene reinforcement learning model is constructed so that it can recommend suitable content to suitable objects at a suitable time and in a suitable scene, realizing personalized growth operation of the service. For multi-scene growth services, the embodiment of the application provides a multi-scene joint modeling model that uses reinforcement learning to estimate an object's preference for growth resource content and for issuing scenes, thereby increasing the overall DAU of the platform to which the resources belong.
The embodiment of the application provides a media resource issuing method that determines, through a multi-scene reinforcement learning model, the issuing time, issuing scene, and resource content of a media resource for an object based on the object information and behavior information of the object. Because the multi-scene reinforcement learning model is obtained by reinforcement learning training based on data from a plurality of scenes, the training data are varied and rich, so the trained model has a good prediction effect and high accuracy. The most appropriate issuing time, issuing scene, and resource content matching the object can therefore be determined based on the multi-scene reinforcement learning model, and the media resource is then issued based on them, so the issued media resource achieves a promotion effect and conversion of the media resource is promoted. In addition, because repeated issuing of the same media resource through a plurality of issuing scenes is avoided, the waste of traffic resources and the waste of scene resources of the issuing scenes are avoided; and because the resource issuing platform avoids repeatedly issuing the media resource through a plurality of issuing scenes, the processing pressure of the resource issuing platform can be reduced.
Fig. 10 is a block diagram of a device for delivering media resources according to an embodiment of the present application. Referring to fig. 10, the apparatus includes:
the obtaining module 1001 is configured to obtain, based on object information and behavior information of an object, a plurality of candidate resource contents, a plurality of candidate delivery scenes, and a plurality of candidate delivery times, where the behavior information is used to indicate interactive behaviors generated by the object based on a media resource delivered historically;
the determining module 1002 is configured to determine, through a multi-scene reinforcement learning model, target resource content, a target issuing scene, and a target issuing time from the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times based on the object information, the behavior information, the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times, where the multi-scene reinforcement learning model is obtained by reinforcement learning based on object information, behavior information, a plurality of sample resource contents, a plurality of sample issuing times, and a plurality of sample issuing scenes of a plurality of sample objects in the plurality of sample issuing scenes;
and the issuing module 1003 is configured to issue the target resource content to the object under the target issuing time and the target issuing scene.
In some embodiments, the determining module 1002 is configured to:
inputting object information, behavior information, a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times into a multi-scene reinforcement learning model;
determining a first number of resource contents, a second number of delivery scenes and a third number of delivery times from the plurality of candidate resource contents, the plurality of candidate delivery scenes and the plurality of candidate delivery times through a multi-scene reinforcement learning model;
and determining target resource content, target issuing scenes and target issuing time from the first quantity of resource content, the second quantity of issuing scenes and the third quantity of issuing time.
In some embodiments, the determining module 1002 is configured to:
determining a first number of resource contents from the plurality of candidate resource contents based on the object information and the behavior information through the multi-scene reinforcement learning model, wherein the first number of resource contents are the top first-number candidate resource contents in the plurality of candidate resource contents with the highest matching degree with the object information and the behavior information;

determining a second number of issuing scenes from the plurality of candidate issuing scenes based on the object information, the behavior information and the first number of resource contents through the multi-scene reinforcement learning model, wherein the second number of issuing scenes are the top second-number candidate issuing scenes in the plurality of candidate issuing scenes with the highest matching degree with the object information, the behavior information and the first number of resource contents;

and determining a third number of issuing times from the plurality of candidate issuing times based on the object information, the behavior information, the first number of resource contents and the second number of issuing scenes through the multi-scene reinforcement learning model, wherein the third number of issuing times are the top third-number candidate issuing times in the plurality of candidate issuing times with the highest matching degree with the object information, the behavior information, the first number of resource contents and the second number of issuing scenes.
In some embodiments, the apparatus further comprises a training module for:
acquiring a plurality of pieces of state information of a plurality of sample objects in a plurality of sample issuing scenes, wherein the state information comprises object information and behavior information of the sample objects;
for each state information, inputting the state information into a multi-scene reinforcement learning model to obtain action information, and obtaining return information based on the state information and the action information, wherein the action information comprises sample resource content, a sample issuing scene and sample issuing time, and is used for indicating a resource issuing platform to issue sample resource content for a sample object in the sample issuing time and the sample issuing scene, and the return information is used for indicating return generated by the platform to which the resource belongs for issuing the sample resource content for the sample object based on the action information;
Based on the return information, model parameters of the multi-scene reinforcement learning model are adjusted.
In some embodiments, the apparatus further comprises an execution module to perform at least one of:
if the object information and the behavior information of the object indicate that the object reaches a target behavior state, executing the steps of acquiring a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times based on the object information and the behavior information of the object;
if a media resource issuing request of the object for a preset issuing scene is received, executing the step of acquiring a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times based on the object information and behavior information of the object;
and if the target event occurs, executing the steps of acquiring a plurality of candidate resource contents, a plurality of candidate issuing scenes and a plurality of candidate issuing times based on object information and behavior information of the object.
In some embodiments, the acquisition module 1001 is further to:
any object is acquired from the candidate object set, and the object information and the behavior information of any object in the candidate object set are matched with at least one item of reference object information and reference behavior information.
In some embodiments, the acquisition module 1001 is further to:
If the target event occurs, determining a plurality of supplementary objects corresponding to the target event based on the event type and the object library of the target event, adding the plurality of supplementary objects to the candidate object set, wherein the object library is used for storing the plurality of event types and the supplementary objects corresponding to the event types, and each supplementary object corresponding to the event type is a candidate object for issuing the media resource of the event type.
In some embodiments, the obtaining module 1001 is configured to:
recall the resource content of the media resource based on the object information, the behavior information and the content library of the media resource to obtain a plurality of candidate resource contents;
recall the issuing scenes of the media resources based on the object information, the behavior information and the scene library of the media resources to obtain a plurality of candidate issuing scenes;
and recalling the issuing time of the media resource based on the object information, the behavior information and the time base of the media resource to obtain a plurality of candidate issuing times.
In some embodiments, the issuing module 1003 is configured to:
determining a target display style of the target resource content from a plurality of candidate display styles by a style model corresponding to a target issuing scene based on the object information, the behavior information and the target resource content, wherein the style model is used for determining the display style of the media resource;
And under the target issuing time and the target issuing scene, issuing target resource content of a target display style to the object.
In some embodiments, the apparatus further comprises a training module for:
acquiring a plurality of display style samples of a media resource, wherein the display style samples are used for indicating object information, behavior information, sample resource content and display styles of the sample resource content of a sample object;
masking part of the features in the first display style sample to obtain a second display style sample, forming a positive sample pair based on the first display style sample and the second display style sample, wherein the first display style sample is any one of a plurality of display style samples;
forming a negative sample pair based on a third display style sample and a fourth display style sample, the third display style sample and the fourth display style sample being different samples of the plurality of display style samples;
determining a first loss based on the positive sample pair, the negative sample pair, and the style model;
inputting the first display style sample into a style model to obtain a predicted click rate, and determining a second loss based on the predicted click rate and a reference click rate;
model parameters of the style model are adjusted based on the first loss and the second loss.
The embodiment of the application provides a media resource issuing device that determines, through a multi-scene reinforcement learning model, the issuing time, issuing scene, and resource content of a media resource for an object based on the object information and behavior information of the object. Because the multi-scene reinforcement learning model is obtained by reinforcement learning training based on data from a plurality of scenes, the training data are varied and rich, so the trained model has a good prediction effect and high accuracy. The most appropriate issuing time, issuing scene, and resource content matching the object can therefore be determined based on the multi-scene reinforcement learning model, and the media resource is then issued based on them, so the issued media resource achieves a promotion effect and conversion of the media resource is promoted. In addition, because repeated issuing of the same media resource through a plurality of issuing scenes is avoided, the waste of traffic resources and the waste of scene resources of the issuing scenes are avoided; and because the resource issuing platform avoids repeatedly issuing the media resource through a plurality of issuing scenes, the processing pressure of the resource issuing platform can be reduced.
In the embodiment of the application, the computer equipment can be a terminal or a server, and when the computer equipment is the terminal, the terminal is used as an execution main body to implement the technical scheme provided by the embodiment of the application; when the computer equipment is a server, the server is used as an execution main body to implement the technical scheme provided by the embodiment of the application; or, the technical scheme provided by the application is implemented through interaction between the terminal and the server, and the embodiment of the application is not limited to the above.
Fig. 11 shows a block diagram of a terminal 1100 according to an exemplary embodiment of the present application.
Generally, the terminal 1100 includes: a processor 1101 and a memory 1102.
The processor 1101 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1101 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor; the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1101 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1102 may include one or more computer-readable storage media, which may be non-transitory. Memory 1102 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1102 is used to store at least one program code for execution by processor 1101 to implement the method of delivering media assets provided by the method embodiments of the present application.
In some embodiments, the terminal 1100 may further optionally include: a peripheral interface 1103 and at least one peripheral. The processor 1101, memory 1102, and peripheral interface 1103 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 1103 by buses, signal lines or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, a display screen 1105, a camera assembly 1106, audio circuitry 1107, and a power supply 1108.
A peripheral interface 1103 may be used to connect I/O (Input/Output) related at least one peripheral device to the processor 1101 and memory 1102. In some embodiments, the processor 1101, memory 1102, and peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or both of the processor 1101, memory 1102, and peripheral interface 1103 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 1104 communicates with a communication network and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1104 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1104 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1104 may also include circuitry related to NFC (Near Field Communication), which is not limited in the present application.
The display screen 1105 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, it also has the ability to collect touch signals at or above its surface. The touch signal may be input to the processor 1101 as a control signal for processing, in which case the display screen 1105 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1105, disposed on the front panel of the terminal 1100; in other embodiments, there may be at least two display screens 1105, respectively disposed on different surfaces of the terminal 1100 or in a folded design; in still other embodiments, the display screen 1105 may be a flexible display screen disposed on a curved or folded surface of the terminal 1100. The display screen 1105 may even be arranged in a non-rectangular irregular pattern, i.e., an irregularly shaped screen. The display screen 1105 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 1106 is used to capture images or video. Optionally, the camera assembly 1106 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize a background blurring function by fusing the main camera and the depth-of-field camera, panoramic shooting and VR (Virtual Reality) shooting by fusing the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 1106 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 1107 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 1101 for processing, or inputting the electric signals to the radio frequency circuit 1104 for voice communication. For purposes of stereo acquisition or noise reduction, a plurality of microphones may be provided at different portions of the terminal 1100, respectively. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 1107 may also include a headphone jack.
A power supply 1108 is used to power the various components in terminal 1100. The power supply 1108 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power source 1108 comprises a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 1100 also includes one or more sensors 1109. The one or more sensors 1109 include, but are not limited to: acceleration sensor 1110, gyroscope sensor 1111, pressure sensor 1112, optical sensor 1113, and proximity sensor 1114.
The acceleration sensor 1110 may detect the magnitudes of acceleration on the three coordinate axes of a coordinate system established with the terminal 1100. For example, the acceleration sensor 1110 may be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 1101 may control the display screen 1105 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 1110. The acceleration sensor 1110 may also be used to collect motion data for games or for the user.
The gyro sensor 1111 may detect the body orientation and rotation angle of the terminal 1100, and may cooperate with the acceleration sensor 1110 to capture the user's 3D actions on the terminal 1100. Based on the data collected by the gyro sensor 1111, the processor 1101 may implement functions such as motion sensing (for example, changing the UI according to a tilting operation by the user), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 1112 may be disposed on a side frame of the terminal 1100 and/or under the display 1105. When the pressure sensor 1112 is disposed on a side frame of the terminal 1100, it can detect the user's grip on the terminal 1100, and the processor 1101 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 1112. When the pressure sensor 1112 is disposed under the display screen 1105, the processor 1101 controls operability controls on the UI according to the user's pressure operations on the display screen 1105. The operability controls include at least one of a button control, a scroll-bar control, an icon control, and a menu control.
The optical sensor 1113 is used to collect the intensity of ambient light. In one embodiment, the processor 1101 may control the display brightness of the display screen 1105 based on the intensity of ambient light collected by the optical sensor 1113: when the ambient light intensity is high, the display brightness of the display screen 1105 is increased; when the ambient light intensity is low, the display brightness of the display screen 1105 is decreased. In another embodiment, the processor 1101 may also dynamically adjust the shooting parameters of the camera assembly 1106 based on the intensity of ambient light collected by the optical sensor 1113.
The proximity sensor 1114, also referred to as a distance sensor, is typically provided on the front panel of the terminal 1100. The proximity sensor 1114 is used to collect the distance between the user and the front of the terminal 1100. In one embodiment, when the proximity sensor 1114 detects that the distance between the user and the front of the terminal 1100 gradually decreases, the processor 1101 controls the display screen 1105 to switch from the screen-on state to the screen-off state; when the proximity sensor 1114 detects that the distance gradually increases, the processor 1101 controls the display screen 1105 to switch from the screen-off state to the screen-on state.
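Purely as an illustration of the sensor-driven display control described above, the brightness and screen-state logic could be sketched as follows in Python. All names, thresholds, and ranges here are hypothetical stand-ins, not details disclosed by the application:

```python
# Illustrative sketch only: hypothetical logic for ambient-light brightness
# control and proximity-based screen switching. Thresholds are assumptions.

def brightness_from_ambient_light(lux: float,
                                  min_level: int = 10,
                                  max_level: int = 255) -> int:
    """Map ambient light intensity (lux) to a display brightness level."""
    # Simple clamped linear mapping: brighter environment -> brighter screen.
    capped = min(max(lux, 0.0), 1000.0)
    return int(min_level + (max_level - min_level) * capped / 1000.0)

def screen_state_from_proximity(distance_cm: float,
                                threshold_cm: float = 5.0) -> str:
    """Turn the screen off when the user is close to the front panel."""
    return "off" if distance_cm < threshold_cm else "on"

print(brightness_from_ambient_light(300.0))  # e.g. 83
print(screen_state_from_proximity(2.0))      # "off"
```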
Those skilled in the art will appreciate that the structure shown in Fig. 11 is not limiting, and that the terminal 1100 may include more or fewer components than shown, combine certain components, or employ a different arrangement of components.
Fig. 12 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1200 may vary considerably depending on its configuration or performance, and may include one or more processors (Central Processing Units, CPU) 1201 and one or more memories 1202, where the memories 1202 store executable program code and the processors 1201 execute the executable program code to implement the method for issuing media resources provided by the foregoing method embodiments. Of course, the server may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing device functions, which are not described here.
An embodiment of the present application also provides a computer-readable storage medium storing at least one section of program, the at least one section of program being loaded and executed by a processor to implement the method for issuing media resources in any of the above implementations.
An embodiment of the present application also provides a computer program product comprising at least one section of program stored in a computer-readable storage medium; a processor of a computer device reads the at least one section of program from the computer-readable storage medium and executes it, causing the computer device to perform the method for issuing media resources in any of the above implementations.
In some embodiments, a computer program product according to embodiments of the present application may be deployed and executed on one computer device, on multiple computer devices at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network; such distributed, interconnected computer devices may constitute a blockchain system.
The foregoing is illustrative of the present application and is not to be construed as limiting thereof, but rather as various modifications, equivalent arrangements, improvements, etc., which fall within the spirit and principles of the present application.

Claims (14)

1. A method for issuing media resources, the method comprising:
acquiring a plurality of candidate resource contents, a plurality of candidate issuing scenes, and a plurality of candidate issuing times based on object information and behavior information of an object, wherein the behavior information is used for indicating interaction behaviors of the object with historically issued media resources;
determining target resource content, a target issuing scene, and a target issuing time from the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times through a multi-scene reinforcement learning model, based on the object information, the behavior information, the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times, wherein the multi-scene reinforcement learning model is obtained through reinforcement learning based on object information and behavior information of a plurality of sample objects in a plurality of sample issuing scenes, a plurality of sample resource contents, a plurality of sample issuing times, and the plurality of sample issuing scenes;
and issuing the target resource content to the object at the target issuing time and in the target issuing scene.
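Purely as an illustration, the three-step flow of claim 1 might be sketched as follows in Python. The data, the recall sets, and the trivial selection rule are hypothetical stand-ins; in the claimed method the selection is made by the multi-scene reinforcement learning model, which is not reproduced here:

```python
# A minimal, runnable sketch of the recall -> select -> issue flow.
from dataclasses import dataclass

@dataclass
class Selection:
    content: str
    scene: str
    time: str

def recall_candidates(object_info: dict, behavior_info: dict):
    # Stand-in for recall from content/scene/time libraries (cf. claim 8).
    contents = ["coupon_a", "video_b", "article_c"]
    scenes = ["push_notification", "in_app_popup", "feed_card"]
    times = ["09:00", "12:30", "20:00"]
    return contents, scenes, times

def select_targets(object_info, behavior_info, contents, scenes, times):
    # Toy rule: take the first candidate of each kind; the real model scores
    # contents, scenes, and times jointly against the object's information.
    return Selection(contents[0], scenes[0], times[0])

def issue(sel: Selection) -> None:
    print(f"issue {sel.content} via {sel.scene} at {sel.time}")

obj, beh = {"age": 30}, {"clicks": 5}
issue(select_targets(obj, beh, *recall_candidates(obj, beh)))
```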
2. The method of claim 1, wherein the determining, through the multi-scene reinforcement learning model and based on the object information, the behavior information, the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times, the target resource content, the target issuing scene, and the target issuing time from the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times comprises:
inputting the object information, the behavior information, the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times into the multi-scene reinforcement learning model;
determining a first number of resource contents, a second number of issuing scenes, and a third number of issuing times from the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times through the multi-scene reinforcement learning model;
and determining the target resource content, the target issuing scene, and the target issuing time from the first number of resource contents, the second number of issuing scenes, and the third number of issuing times.
3. The method of claim 2, wherein the determining, through the multi-scene reinforcement learning model, a first number of resource contents, a second number of issuing scenes, and a third number of issuing times from the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times comprises:
determining, through the multi-scene reinforcement learning model, a first number of resource contents from the plurality of candidate resource contents based on the object information and the behavior information, the first number of resource contents being the top first number of candidate resource contents ranked by degree of matching with the object information and the behavior information;
determining, through the multi-scene reinforcement learning model, a second number of issuing scenes from the plurality of candidate issuing scenes based on the object information, the behavior information, and the first number of resource contents, the second number of issuing scenes being the top second number of candidate issuing scenes ranked by degree of matching with the object information, the behavior information, and the first number of resource contents;
and determining, through the multi-scene reinforcement learning model, a third number of issuing times from the plurality of candidate issuing times based on the object information, the behavior information, the first number of resource contents, and the second number of issuing scenes, the third number of issuing times being the top third number of candidate issuing times ranked by degree of matching with the object information, the behavior information, the first number of resource contents, and the second number of issuing scenes.
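Purely as an illustration, the sequential narrowing of claim 3 (contents first, then scenes conditioned on the chosen contents, then times conditioned on both) could be sketched as below. The match() function is an assumed stand-in for the model's matching degree; a real system would use learned scoring:

```python
# Hypothetical sketch of the sequential top-k narrowing in claim 3.

def match(object_info: dict, behavior_info: dict, *items) -> float:
    # Toy matching degree based on string lengths; illustration only.
    return (sum(len(str(i)) for i in items) % 7) / 7.0

def top_k(candidates: list, k: int, score) -> list:
    return sorted(candidates, key=score, reverse=True)[:k]

def narrow(obj, beh, contents, scenes, times, k1=3, k2=2, k3=2):
    cs = top_k(contents, k1, lambda c: match(obj, beh, c))
    ss = top_k(scenes, k2, lambda s: match(obj, beh, *cs, s))
    ts = top_k(times, k3, lambda t: match(obj, beh, *cs, *ss, t))
    return cs, ss, ts

print(narrow({}, {}, ["a", "bb", "ccc", "dddd"], ["push", "popup"], ["09:00", "20:00"]))
```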
4. The method of claim 1, wherein the training process of the multi-scene reinforcement learning model comprises:
acquiring a plurality of pieces of state information of a plurality of sample objects in a plurality of sample issuing scenes, wherein the state information comprises object information and behavior information of the sample objects;
for each piece of state information, inputting the state information into the multi-scene reinforcement learning model to obtain action information, and obtaining return information based on the state information and the action information, wherein the action information comprises a sample resource content, a sample issuing scene, and a sample issuing time, and is used for instructing the resource issuing platform to issue the sample resource content to the sample object at the sample issuing time and in the sample issuing scene, and the return information is used for indicating the return generated for the platform to which the resource belongs by issuing the sample resource content to the sample object based on the action information;
and adjusting model parameters of the multi-scene reinforcement learning model based on the return information.
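Purely as an illustration, the state-action-return loop of claim 4 might look as follows in a heavily simplified form. The tiny action space, the random return signal, and the tabular update stand in for whatever reinforcement learning algorithm the application actually uses:

```python
# Hypothetical sketch: propose an action per state, observe return
# information, and nudge parameters toward high-return actions.
import random

ACTIONS = [("coupon_a", "push", "09:00"), ("video_b", "popup", "20:00")]

def act(q: dict, state: str, eps: float = 0.1) -> tuple:
    # Epsilon-greedy choice over a tiny discrete action space.
    if random.random() < eps or state not in q:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state].get(a, 0.0))

def observe_return(state: str, action: tuple) -> float:
    return random.random()  # stand-in for click/conversion feedback

def train(states: list, epochs: int = 20, lr: float = 0.5) -> dict:
    q: dict = {}
    for _ in range(epochs):
        for state in states:
            action = act(q, state)
            reward = observe_return(state, action)      # return information
            values = q.setdefault(state, {})
            old = values.get(action, 0.0)
            values[action] = old + lr * (reward - old)  # adjust parameters
    return q

print(train(["young_sports_fan", "evening_reader"]))
```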
5. The method of claim 1, further comprising at least one of:
if the object information and the behavior information of the object indicate that the object has reached a target behavior state, performing the step of acquiring a plurality of candidate resource contents, a plurality of candidate issuing scenes, and a plurality of candidate issuing times based on the object information and the behavior information of the object;
if a media resource issuing request of the object for a preset issuing scene is received, performing the step of acquiring a plurality of candidate resource contents, a plurality of candidate issuing scenes, and a plurality of candidate issuing times based on the object information and the behavior information of the object;
and if a target event occurs, performing the step of acquiring a plurality of candidate resource contents, a plurality of candidate issuing scenes, and a plurality of candidate issuing times based on the object information and the behavior information of the object.
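Purely as an illustration, the three alternative triggers of claim 5 reduce to a single guard; the predicate names below are hypothetical stand-ins for the platform's actual signals:

```python
# Illustration only: any one of the three conditions starts the
# candidate-acquisition step of claim 1.

def should_start_acquisition(reached_target_state: bool,
                             issuing_request_received: bool,
                             target_event_occurred: bool) -> bool:
    return reached_target_state or issuing_request_received or target_event_occurred

print(should_start_acquisition(False, True, False))  # True
```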
6. The method according to claim 1, wherein the method further comprises:
acquiring any object from a candidate object set, wherein the object information and behavior information of each object in the candidate object set match at least one of reference object information and reference behavior information.
7. The method of claim 6, wherein the method further comprises:
if a target event occurs, determining, based on an event type of the target event and an object library, a plurality of complementary objects corresponding to the target event, and adding the plurality of complementary objects to the candidate object set, wherein the object library is used for storing a plurality of event types and the complementary objects corresponding to each event type, and each complementary object corresponding to an event type is a candidate object for issuing media resources of that event type.
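Purely as an illustration, the object library of claim 7 is essentially a mapping from event type to the complementary objects eligible for that event's media resources. The event types and identifiers below are invented for illustration:

```python
# Hypothetical sketch of the object library and candidate-set supplementing.

object_library: dict[str, set[str]] = {
    "holiday_sale": {"user_17", "user_42"},
    "new_feature_launch": {"user_42", "user_99"},
}

def supplement_candidates(candidate_set: set[str], event_type: str) -> set[str]:
    # Add the complementary objects for this event type to the candidate set.
    return candidate_set | object_library.get(event_type, set())

print(supplement_candidates({"user_01"}, "holiday_sale"))
```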
8. The method of claim 1, wherein the acquiring a plurality of candidate resource contents, a plurality of candidate issuing scenes, and a plurality of candidate issuing times based on the object information and the behavior information of the object comprises:
performing resource content recall based on the object information, the behavior information, and a content library of media resources to obtain the plurality of candidate resource contents;
performing issuing scene recall based on the object information, the behavior information, and a scene library of media resources to obtain the plurality of candidate issuing scenes;
and performing issuing time recall based on the object information, the behavior information, and a time library of media resources to obtain the plurality of candidate issuing times.
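Purely as an illustration, the three parallel recalls of claim 8 might be sketched as below. The libraries, the tag-overlap relevance rule, and the interest tags are all assumptions made for illustration; the claim does not specify the recall algorithm:

```python
# A toy sketch of content/scene/time recall against three libraries.

content_library = {"coupon_a": {"sports"}, "video_b": {"music"}, "article_c": {"sports"}}
scene_library = {"push_notification": {"morning"}, "in_app_popup": {"evening"}}
time_library = {"09:00": {"morning"}, "20:00": {"evening"}}

def recall(library: dict, tags: set) -> list:
    # Keep entries whose tags overlap the tags derived from the object's
    # information and behavior information.
    return [key for key, entry_tags in library.items() if entry_tags & tags]

tags = {"sports", "evening"}  # derived from object + behavior information
candidate_contents = recall(content_library, tags)  # ['coupon_a', 'article_c']
candidate_scenes = recall(scene_library, tags)      # ['in_app_popup']
candidate_times = recall(time_library, tags)        # ['20:00']
```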
9. The method according to claim 1, wherein the issuing the target resource content to the object at the target issuing time and in the target issuing scene comprises:
determining, based on the object information, the behavior information, and the target resource content, a target display style of the target resource content from a plurality of candidate display styles through a style model corresponding to the target issuing scene, wherein the style model is used for determining display styles of media resources;
and issuing the target resource content in the target display style to the object at the target issuing time and in the target issuing scene.
10. The method of claim 9, wherein the training process of the style model comprises:
acquiring a plurality of display style samples of media resources, wherein each display style sample indicates object information and behavior information of a sample object, a sample resource content, and a display style of the sample resource content;
masking some of the features in a first display style sample to obtain a second display style sample, and forming a positive sample pair based on the first display style sample and the second display style sample, wherein the first display style sample is any one of the plurality of display style samples;
forming a negative sample pair based on a third display style sample and a fourth display style sample, the third display style sample and the fourth display style sample being different samples of the plurality of display style samples;
determining a first loss based on the positive sample pair, the negative sample pair, and the style model;
inputting the first display style sample into the style model to obtain a predicted click rate, and determining a second loss based on the predicted click rate and a reference click rate;
and adjusting model parameters of the style model based on the first loss and the second loss.
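Purely as an illustration, the two losses of claim 10 could be sketched in a PyTorch style as below. The encoder, feature width, masking rate, and exact loss forms are assumptions; the claim fixes only the overall structure of masked positive pairs, unrelated negative pairs, and a click-rate loss:

```python
# Hypothetical sketch: contrastive first loss + click-rate second loss.
import torch
import torch.nn.functional as F

encoder = torch.nn.Linear(16, 8)   # stand-in for the style model's encoder
ctr_head = torch.nn.Linear(8, 1)   # stand-in for its click-rate head

def mask_features(x: torch.Tensor, rate: float = 0.3) -> torch.Tensor:
    # Zero out a random subset of features to form the second sample of a pair.
    return x * (torch.rand_like(x) > rate)

first = torch.randn(4, 16)         # batch of first display style samples
second = mask_features(first)      # positive counterparts (masked copies)
negative = torch.randn(4, 16)      # different samples form negative pairs

z1, z2, zn = encoder(first), encoder(second), encoder(negative)
pos_sim = F.cosine_similarity(z1, z2)   # pull positive pairs together
neg_sim = F.cosine_similarity(z1, zn)   # push negative pairs apart
first_loss = (1 - pos_sim).mean() + F.relu(neg_sim).mean()

pred_ctr = torch.sigmoid(ctr_head(z1)).squeeze(-1)
reference_ctr = torch.rand(4)           # stand-in for observed click rates
second_loss = F.binary_cross_entropy(pred_ctr, reference_ctr)

(first_loss + second_loss).backward()   # adjust parameters on both losses
```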
11. A device for issuing media resources, the device comprising:
an acquisition module, configured to acquire a plurality of candidate resource contents, a plurality of candidate issuing scenes, and a plurality of candidate issuing times based on object information and behavior information of an object, wherein the behavior information is used for indicating interaction behaviors of the object with historically issued media resources;
a determining module, configured to determine, through a multi-scene reinforcement learning model and based on the object information, the behavior information, the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times, target resource content, a target issuing scene, and a target issuing time from the plurality of candidate resource contents, the plurality of candidate issuing scenes, and the plurality of candidate issuing times, wherein the multi-scene reinforcement learning model is obtained through reinforcement learning based on object information and behavior information of a plurality of sample objects in a plurality of sample issuing scenes, a plurality of sample resource contents, a plurality of sample issuing times, and the plurality of sample issuing scenes;
and an issuing module, configured to issue the target resource content to the object at the target issuing time and in the target issuing scene.
12. A computer device, characterized in that the computer device comprises a processor and a memory, the memory being configured to store at least one section of program, and the at least one section of program being loaded and executed by the processor to implement the method for issuing media resources according to any one of claims 1 to 10.
13. A computer-readable storage medium, storing at least one section of program, the at least one section of program being loaded and executed by a processor to implement the method for issuing media resources according to any one of claims 1 to 10.
14. A computer program product, characterized in that the computer program product comprises at least one section of program stored in a computer-readable storage medium, a processor of a computer device reading the at least one section of program from the computer-readable storage medium and executing it, so that the computer device performs the method for issuing media resources according to any one of claims 1 to 10.
CN202311279265.5A 2023-09-27 2023-09-27 Method, device, equipment and storage medium for issuing media resources Pending CN117217839A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311279265.5A CN117217839A (en) 2023-09-27 2023-09-27 Method, device, equipment and storage medium for issuing media resources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311279265.5A CN117217839A (en) 2023-09-27 2023-09-27 Method, device, equipment and storage medium for issuing media resources

Publications (1)

Publication Number Publication Date
CN117217839A (en) 2023-12-12

Family

ID=89044298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311279265.5A Pending CN117217839A (en) 2023-09-27 2023-09-27 Method, device, equipment and storage medium for issuing media resources

Country Status (1)

Country Link
CN (1) CN117217839A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117540040A (en) * 2024-01-10 2024-02-09 Chengdu Sobey Digital Technology Co., Ltd. Media resource content exchange device and method

Similar Documents

Publication Publication Date Title
WO2021180062A1 (en) Intention identification method and electronic device
CN110585726B (en) User recall method, device, server and computer readable storage medium
CN110458360B (en) Method, device, equipment and storage medium for predicting hot resources
CN111552888A (en) Content recommendation method, device, equipment and storage medium
CN113515942A (en) Text processing method and device, computer equipment and storage medium
CN111291200B (en) Multimedia resource display method and device, computer equipment and storage medium
CN113254684B (en) Content aging determination method, related device, equipment and storage medium
CN111831917A (en) Content recommendation method, device, equipment and medium
CN117033799B (en) Resource recommendation method, device, computer equipment and storage medium
CN117217839A (en) Method, device, equipment and storage medium for issuing media resources
CN116775915A (en) Resource recommendation method, recommendation prediction model training method, device and equipment
CN114281936A (en) Classification method and device, computer equipment and storage medium
CN114282587A (en) Data processing method and device, computer equipment and storage medium
CN112287070A (en) Method and device for determining upper and lower position relation of words, computer equipment and medium
CN112269943A (en) Information recommendation system and method
CN112749329A (en) Content search method, content search device, computer equipment and storage medium
CN116957678A (en) Data processing method and related device
CN113486260B (en) Method and device for generating interactive information, computer equipment and storage medium
CN113762585B (en) Data processing method, account type identification method and device
CN113763932B (en) Speech processing method, device, computer equipment and storage medium
CN111259252B (en) User identification recognition method and device, computer equipment and storage medium
CN111414496B (en) Artificial intelligence-based multimedia file detection method and device
CN115809362A (en) Content recommendation method and electronic equipment
CN114328815A (en) Text mapping model processing method and device, computer equipment and storage medium
CN114969493A (en) Content recommendation method and related device

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code: ref country code: HK; ref legal event code: DE; ref document number: 40099405; country of ref document: HK