CN117391359B - Method, device, electronic equipment and storage medium for resource scheduling - Google Patents


Info

Publication number
CN117391359B
CN117391359B (application CN202311359557.XA)
Authority
CN
China
Prior art keywords
service
resource scheduling
scheduling policy
network
supply
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311359557.XA
Other languages
Chinese (zh)
Other versions
CN117391359A (en)
Inventor
於喆
曹绍升
周霖
陈政企
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202311359557.XA priority Critical patent/CN117391359B/en
Publication of CN117391359A publication Critical patent/CN117391359A/en
Application granted granted Critical
Publication of CN117391359B publication Critical patent/CN117391359B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06315Needs-based resource requirements planning or analysis

Abstract

The present disclosure relates to a method, an apparatus, an electronic device, and a storage medium for resource scheduling. In one method, first supply-demand data associated with a service at a first point in time is acquired. Based on the first supply-demand data, a first resource scheduling policy associated with the first supply-demand data is determined with a first network, the first resource scheduling policy indicating scheduling credentials provided to the service, the scheduling credentials including at least any one of: a demander scheduling credential for a demander of the service and a provider scheduling credential for a provider of the service. The first resource scheduling policy is applied so as to obtain a first influence of the first resource scheduling policy on a service target of the service, and an association relationship between the first resource scheduling policy and the first influence satisfies a predetermined condition. In this way, the efficiency of resource scheduling may be improved, and the impact of a resource scheduling policy may be matched to the scheduling credentials in that policy.

Description

Method, device, electronic equipment and storage medium for resource scheduling
Technical Field
Implementations of the present disclosure relate to the field of computers, and more particularly, to a method, apparatus, electronic device, and storage medium for resource scheduling.
Background
With the development of internet technology, more and more platforms provide a wide variety of services through the internet. For example, a user may obtain services such as ride-hailing, delivery, or designated driving via an internet platform. In an internet service platform, the balance between the demand side and the supply side of a service greatly affects the quality of the service provided. For example, in some cases, a surge in the number of demanders may leave some demanders unable to obtain service for a period of time. In other cases, a reduced number of demanders may leave a large number of providers idle. It is therefore desirable to provide a more efficient resource scheduling mechanism in the internet service platform, so as to improve the service objective of the service platform.
Disclosure of Invention
According to a first aspect of the present disclosure, a method of performing resource scheduling in a service is provided. In the method, first supply-demand data associated with a service at a first point in time is acquired. Based on the first supply-demand data, determining, with the first network, a first resource scheduling policy associated with the first supply-demand data, the first resource scheduling policy indicating scheduling credentials provided to the service, the scheduling credentials including at least any one of: a demander schedule credential for a demander of the service and a provider schedule credential for a provider of the service. The first resource scheduling policy is applied so as to obtain a first influence of the first resource scheduling policy on a service target of the service, and an association relationship between the first resource scheduling policy and the first influence satisfies a predetermined condition.
According to a second aspect of the present disclosure, an apparatus for performing resource scheduling in a service is provided. The device comprises: an acquisition module configured to acquire first supply-demand data associated with a service at a first point in time; a determination module configured to determine, based on the first supply-demand data, a first resource scheduling policy associated with the first supply-demand data with the first network, the first resource scheduling policy indicating scheduling credentials provided to the service, the scheduling credentials including at least any one of: a demander schedule credential for a demander of the service and a provider schedule credential for a provider of the service; and an application module configured to apply a first resource scheduling policy so as to obtain a first impact of the first resource scheduling policy on a service objective of the service, an association relationship between the first resource scheduling policy and the first impact satisfying a predetermined condition.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a memory and a processor; wherein the memory is for storing one or more computer instructions, wherein the one or more computer instructions are executable by the processor to implement a method according to the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure there is provided a computer readable storage medium having stored thereon one or more computer instructions, wherein the one or more computer instructions are executed by a processor to implement a method according to the first aspect of the present disclosure.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program/instruction, wherein the computer program/instruction, when executed by a processor, implements the method according to the first aspect of the present disclosure.
Drawings
Features, advantages, and other aspects of various implementations of the disclosure will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example, and not by way of limitation, several implementations of the disclosure. In the drawings:
FIG. 1 schematically illustrates a block diagram of interactions between a requestor and a provider of a service according to an example implementation of the present disclosure;
FIG. 2 schematically illustrates a block diagram for a resource scheduling process, according to an example implementation of the present disclosure;
FIG. 3 schematically illustrates a block diagram of an application environment in accordance with an exemplary implementation of the present disclosure;
FIG. 4 schematically illustrates a block diagram of supply-demand collaboration in accordance with an exemplary implementation of the present disclosure;
FIG. 5 schematically illustrates a block diagram of a resource scheduling policy according to an example implementation of the present disclosure;
FIG. 6 schematically illustrates a block diagram of a data structure of a supply and demand state according to an exemplary implementation of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a structure of a first network according to an exemplary implementation of the present disclosure;
FIG. 8 schematically illustrates a block diagram of a structure of a second network according to an exemplary implementation of the present disclosure;
FIG. 9 schematically illustrates a flow chart of a method for resource scheduling according to an example implementation of the present disclosure;
FIG. 10 schematically illustrates a block diagram of an apparatus for resource scheduling in accordance with an example implementation of the present disclosure;
FIG. 11 schematically illustrates a block diagram of a computing device/server for resource scheduling in accordance with an exemplary implementation of the present disclosure.
Detailed Description
Preferred implementations of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred implementations of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the implementations set forth herein. Rather, these implementations are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The term "comprising" and variations thereof as used herein mean open-ended inclusion, i.e., "including but not limited to". The term "or" means "and/or" unless specifically stated otherwise. The term "based on" means "based at least in part on". The terms "one example implementation" and "one implementation" mean "at least one example implementation". The term "another implementation" means "at least one additional implementation". The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may also appear below.
In the present description and its implementations, where the processing of personal information is involved, such processing is performed on the premise of a valid legal basis (for example, the consent of the personal-information subject, or necessity for the performance of a contract), and only within the prescribed or agreed scope. If a user refuses the processing of personal information beyond what is necessary for basic functions, the user's use of those basic functions is not affected.
Example scenario
As discussed above, with the development of internet technology, internet platforms can provide various types of services to users, facilitating many aspects of daily life. In general, the participating parties of a service include a demander and a provider. The demander of a service is a user who needs the service, for example, a user who needs a ride in a travel service, a user who needs items delivered in an express service, or a user who needs a designated driver in a designated-driving service. The provider of a service is a person or organization that can provide the service, such as a shared-vehicle provider in a travel service, a courier or courier organization in an express service, or a designated driver in a designated-driving service.
For ease of description, more details regarding resource scheduling will be described below with the taxi service as a specific example. Typically, the demander and the provider of a service need to complete a match before a particular service process can begin. Fig. 1 schematically illustrates an interaction process 100 between a demander and a provider of a service according to an exemplary implementation of the present disclosure. As shown in fig. 1, the demander 110 may initially be in a service-willingness state 115, i.e., the demander has an initial willingness to acquire the service. Such a demander may also be referred to as a potential demander. The potential demander may perform various query 125 processes, for example through an application, to obtain preliminary information related to acquiring the service. Taking the taxi service as an example, such potential demanders may include, for example, users who have entered a starting point and a destination and viewed a quote, but have not yet placed an order.
After obtaining the preliminary information, some users may enter a state 120 of determining the service requirements, i.e., the users may formally submit service requests 135. Such a demand party may also be referred to as a real demand party, i.e. having submitted an order. In some scenarios, such a demander may have completed prepayment, for example.
Similarly, the provider 160 of the service may also be in one of two states. First, the provider 160 may be in an idle state 165, in which it has not yet accepted any order submitted by a demander 110. Alternatively, the provider 160 may accept a service request 170. Accordingly, the service may proceed to the start-service 145 process, and after a period of time the service ends 150.
In some cases, the balance of supply and demand for a service may become unreasonable. For example, too many demanders 110 may submit service requests while there are not enough providers 160 to provide the service in a timely manner. As another example, only a few demanders 110 may submit service requests, leaving a large number of providers 160 idle. Such unbalanced supply-demand states greatly reduce service efficiency, possibly resulting in wasted capacity, or in users being unable to obtain service in time, harming the user experience.
Some schemes attempt to improve the efficiency of services by scheduling resources to coordinate the supply-demand balance. For example, in the process 130 from the query process 125 to submitting the service request 135, the number of service requests may be increased or decreased by adjusting the scheduling credentials provided to the demander 110. Alternatively and/or additionally, between submitting the service request 135 and starting the service 145, the probability of the provider 160 accepting an order may be increased or reduced by adjusting the scheduling credentials provided to the provider 160. Here, a scheduling credential refers to an electronic credential in the internet service platform that facilitates achievement of a service target. However, the supply-demand state changes significantly over time during service, so the resource scheduling process requires a significant computational effort.
It will be appreciated that when the driver side obtains scheduling credentials, the supply near the destination may be affected after the passenger has been delivered to the end point of the order, whereas the above-described methods consider the problem only from an individual's point of view, under the strong assumption that individuals are independent of one another. Further, when order context is considered, the storage and computational requirements of a high-dimensional state become enormous. It is therefore desirable to perform resource scheduling in a simpler and more efficient manner, thereby facilitating achievement of the service objective in a more rational way.
Summary of resource scheduling
To at least partially solve the above-described problems, according to one example implementation of the present disclosure, a method of performing resource scheduling in a service is presented. In general terms, a resource scheduling policy to be executed may be determined based on supply and demand data at a certain point in time and using a machine learning network model. The determined resource scheduling policy may be applied to achieve the desired service objective. It should be understood that the network model herein may describe an association relationship between supply and demand data and a resource scheduling policy, and that a predetermined condition may be satisfied between the resource scheduling policy at this time and an influence generated after the application of the resource scheduling policy.
An overview according to one example implementation of the present disclosure is described with reference to fig. 2, which fig. 2 schematically illustrates a resource scheduling process 200 according to an example implementation of the present disclosure. As shown in fig. 2, there may be a first time point T1 and a second time point T2. First supply and demand data 210 associated with the service at a first point in time may be determined. It should be appreciated that the first supply-demand data 210 herein may describe various data associated with the provider and the demander, including current data as well as historical service data, and so forth.
The first network 250 may be utilized to determine a first resource scheduling policy 230 associated with the first supply-demand data 210 based on the first supply-demand data 210. Here, the first resource scheduling policy 230 may indicate scheduling credentials provided to the service. The scheduling credential may include at least any one of: a demander schedule credential for a demander of the service and a provider schedule credential for a provider of the service. Further, a first resource scheduling policy 230 may be applied. After the first resource scheduling policy 230 has been applied, the supply and demand data of the service is converted to the second supply and demand data 220. At this point, a first impact 240 may be obtained on the service objective of the service by applying the first resource scheduling policy 230. The association between the first resource scheduling policy 230 and the first influence 240 satisfies a predetermined condition.
Here, the first network 250 is obtained under a predetermined constraint condition. For example, the first network 250 may be trained using an actor-critic architecture. That is, a first network 250 playing the actor role, and a second network 252 and a third network 254 playing critic roles, may be generated during the training process. The second network 252 and the third network 254 may be trained separately with supervision data and are then used to assist in obtaining a first network 250 that satisfies the predetermined constraints. Further, the corresponding resource scheduling policy may be acquired again at the second point in time. In this way, resource scheduling policies can be obtained continuously and in real time based on the latest supply-demand data, thereby improving the service objective within a predetermined investment of scheduling credentials.
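As a rough sketch (not the patent's actual networks), the actor-critic arrangement described above can be illustrated with linear stand-ins: the actor maps a supply-demand state to the two scheduling credentials, and a cost critic checks the policy against a budget constraint. All weights, shapes, and the feasibility rule here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


def linear(weights: List[float], state: List[float]) -> float:
    # Toy stand-in for a learned network: a single linear layer.
    return sum(w * s for w, s in zip(weights, state))


@dataclass
class ConstrainedActorCritic:
    actor_w: List[List[float]]    # actor: maps state -> (Cc, Cd) credentials
    reward_critic_w: List[float]  # critic 1: estimates service-objective impact
    cost_critic_w: List[float]    # critic 2: estimates credential spend
    budget: float                 # predetermined constraint on total spend

    def act(self, state: List[float]) -> Tuple[float, float]:
        # Credentials are clipped at zero: a policy never takes credentials away.
        cc = max(0.0, linear(self.actor_w[0], state))
        cd = max(0.0, linear(self.actor_w[1], state))
        return cc, cd

    def feasible(self, state: List[float]) -> bool:
        # The predetermined condition is met only if the cost critic's
        # spend estimate stays within the budget.
        return linear(self.cost_critic_w, state) <= self.budget


agent = ConstrainedActorCritic(
    actor_w=[[0.5, -0.2], [-0.3, 0.4]],
    reward_critic_w=[1.0, 1.0],
    cost_critic_w=[0.2, 0.3],
    budget=1.0,
)
# State here is a toy 2-d vector, e.g. scaled (unmatched orders, idle drivers).
cc, cd = agent.act([2.0, 1.0])
```

The constrained training itself (how the two critics shape the actor's updates) is not shown; this only illustrates the division of roles among the three networks.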
With example implementations of the present disclosure, a resource scheduling policy may be determined in consideration of a current supply and demand state in the course of resource scheduling. For example, the respective scheduling credentials may be provided to the provider and/or the demander of the service, respectively, and enable the provided scheduling credentials to correspondingly promote the service objective. In this way, resource scheduling can be performed in a more efficient manner, so that the supply and demand states of the service can be more reasonable and the overall service objective of the service can be improved.
Detailed procedure for resource scheduling
Having described an overview of one example implementation according to the present disclosure, hereinafter, detailed procedures regarding resource scheduling will be provided. Fig. 3 schematically illustrates an architecture 300 of an application environment according to an exemplary implementation of the present disclosure. As shown in fig. 3, the application environment may include a computing device 310. As shown in fig. 3, computing device 310 may determine resource scheduling policy 320 based on the status of the service's demander 110 and the service's provider 160.
According to one example implementation of the present disclosure, the service may include, for example, a taxi service, in which case the demander 110 may include a passenger and the provider 160 may include a driver of a vehicle. For example, a passenger may use an internet ride-hailing application to enter a starting location and a destination and place a ride-hailing order, and a driver can accept the order and provide the taxi service. As another example, the service may include an item delivery service. Taking same-city freight as an example, the demander 110 may include a shipper who wants goods delivered to a particular location. Accordingly, the provider 160 may be a delivery party capable of providing the delivery service, for example a freight driver or a freight organization. As yet another example, the service may include a ride-sharing service, in which case the demander may include a user who needs a shared ride and the provider may include a provider offering the shared ride.
For ease of description, more details of resource scheduling will be described in the context of the present disclosure with the taxi service as an example. For a taxi service, the demander 110 and the provider 160 are the main factors describing the supply-demand state of the service. Thus, the supply-demand state of a service can be expressed as transitions between different states. Fig. 4 schematically illustrates a process 400 of supply-demand collaboration according to an exemplary implementation of the present disclosure. As shown in fig. 4, for a demander 110, when a new order 410 is submitted, the order may enter one of two states. The first is the unmatched 420 state, i.e., no provider has accepted the new order 410. The second is the matched 470 state, i.e., a provider not currently matched with any order has accepted the new order 410.
Similarly, for the provider 160, there may be two states when a new provider 450 appears. The first is the unmatched 460 state, i.e., the new provider 450 has not accepted an existing order. The second is the matched 440 state, i.e., the new provider 450 has accepted an existing order. After the matched state, a completed-order state may be entered, for example as shown by the completion states 430 and 480 in FIG. 4.
According to one example implementation of the present disclosure, a resource scheduling policy may indicate scheduling credentials provided to a service to cause a match to occur between more passengers and drivers. Specifically, the scheduling credential may include at least any one of: a demander schedule credential for a demander of the service and a provider schedule credential for a provider of the service. More details regarding resource scheduling policies are described with reference to fig. 5, which schematically illustrates an architecture 500 of a resource scheduling policy according to an exemplary implementation of the present disclosure.
As shown in fig. 5, the resource scheduling policy 320 may include a demander scheduling credential 510 for the demander 110. In some implementations, the demander scheduling credential 510 may be, for example, a scheduling credential that the demander 110 is able to obtain when submitting a service request; it may include any suitable type of scheduling credential, such as a material scheduling credential, a non-material scheduling credential, or a scheduling credential of another kind, in order to encourage the demander 110 to submit an order. The resource scheduling policy 320 may also include a provider scheduling credential 520 for the provider 160. In some implementations, the provider scheduling credential 520 may be a scheduling credential that the provider 160 can obtain upon accepting a service request; it may likewise include any suitable type of scheduling credential, in order to encourage the provider 160 to accept an order. With example implementations of the present disclosure, the enthusiasm with which a demander 110 and/or a provider 160 performs tasks may be increased via different approaches, thereby improving the service objective.
According to one example implementation of the present disclosure, a resource scheduling policy may adjust the service within a predetermined range over a predetermined period of time. For example, the predetermined period of time may be set to 1 day, 1 hour, and/or another period, and the predetermined range may be set to, for example, an entire city and/or a neighborhood within a city. The service objective, i.e., the objective optimized by the resource scheduling policy, can be set according to the platform's own needs.
According to one example implementation of the present disclosure, a service objective may be associated with a number of tasks of a service. In the specific example of a taxi service, the number of tasks may be expressed in terms of the number of orders, where the resource scheduling policy is applied with the aim of increasing the number of orders, i.e. the service objective may be to increase the number of taxi orders. As another example, a service goal may be associated with a task value of a service. At this point, the goal of applying the resource scheduling policy is to increase the value sum of the orders. In the case of a taxi service, the service objective may be to increase the total value of the individual taxi orders.
According to one example implementation of the present disclosure, supply-demand data may be acquired, and a corresponding resource scheduling policy may be determined based on the supply-demand data. Describing first the format of the supply-demand data: according to one example implementation of the present disclosure, the supply-demand data may be represented with the following parameters: (State, Action, Transition, Reward, Cost, Gamma).
According to one example implementation of the present disclosure, specifically, State may represent a supply-demand state, i.e., the supply-demand characteristics within a specified area. The state may be described as (Nc, Nd, Gmv_total, Cost_total, Order_price, Order_fc, Order_fd, FavorBucket, PriceBucket, FdBucket). Here, (Nc, Nd) represent the number of unaccepted orders and the number of idle drivers in the taxi service, respectively. (Gmv_total, Cost_total) represent the total value of completed orders and the total amount of credential cost paid in the taxi service. (Order_price, Order_fc, Order_fd) correspond, respectively, to the price of the current order in the taxi service, the features related to order placement, and the features related to order taking. (FavorBucket, PriceBucket, FdBucket) may represent historical information in the taxi service for unaccepted orders, e.g., resource allocation statistics, price statistics, and feature statistics, respectively.
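For concreteness, the State tuple might be carried as a simple record like the following; the field names follow the document, while the types and bucket sizes are assumptions:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SupplyDemandState:
    nc: int                # Nc: number of unaccepted orders
    nd: int                # Nd: number of idle drivers
    gmv_total: float       # total value of completed orders
    cost_total: float      # total scheduling credentials spent
    order_price: float     # price of the current order
    order_fc: List[float]  # order-placement features
    order_fd: List[float]  # order-taking features
    # Sparse historical buckets (bucket count of 8 is an assumption):
    favor_bucket: List[float] = field(default_factory=lambda: [0.0] * 8)
    price_bucket: List[float] = field(default_factory=lambda: [0.0] * 8)
    fd_bucket: List[float] = field(default_factory=lambda: [0.0] * 8)


s = SupplyDemandState(nc=12, nd=5, gmv_total=3400.0, cost_total=210.0,
                      order_price=18.5, order_fc=[0.2], order_fd=[0.7])
```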
According to one example implementation of the present disclosure, an Action may be defined as a two-dimensional vector (Cc, Cd). The action Cc may represent the passenger-side resource schedule and may include various scheduling credentials, such as those granted via marketing campaigns, advertisements, etc., before the passenger places an order. When there is a surplus of idle drivers relative to passenger demand, the amount of such scheduling credentials may be increased to stimulate consumption, thereby promoting supply-demand balance. The action Cd may represent the driver-side resource schedule and may include various scheduling credentials provided before the driver accepts an order. When the order volume in the taxi service exceeds the number of idle drivers in a given area, higher driver-side scheduling credentials may be provided, including driver online rewards, driver dispatch rewards, etc., to encourage drivers to accept as many orders as possible.
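A toy illustration (not taken from the patent) of how the two action components could respond to a supply-demand imbalance — passenger-side credentials when drivers are in surplus, driver-side credentials when orders are in surplus:

```python
from typing import Tuple


def schedule_credentials(nc: int, nd: int, base: float = 1.0) -> Tuple[float, float]:
    """Toy heuristic: tilt credentials toward whichever side is scarce.

    nc: number of unaccepted orders; nd: number of idle drivers.
    Returns the action vector (Cc, Cd).
    """
    if nd > nc:
        # Surplus of idle drivers: stimulate passenger demand (Cc > 0).
        return base * (nd - nc) / max(nd, 1), 0.0
    # Demand meets or exceeds supply: reward drivers for taking orders (Cd >= 0).
    return 0.0, base * (nc - nd) / max(nc, 1)


cc, cd = schedule_credentials(nc=4, nd=10)  # many idle drivers -> passenger-side
```

A real policy would of course be the learned first network rather than a hand-tuned rule; the point is only the direction in which each component acts.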
According to one example implementation of the present disclosure, Transition may represent the transitions between the various states of the service, as described above with reference to fig. 4. Passengers and drivers arrive at and leave the taxi service system at Poisson-distributed rates, and a passenger may decide whether to place an order according to Cc. If an order is placed, it enters a matching queue. The passenger's order may be taken by a currently idle driver or by a driver who has just entered the taxi service. The driver may decide whether to accept the order according to Cd.
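The Poisson arrival dynamics can be mimicked with a small simulator; the sampler below uses Knuth's algorithm, and the greedy matching rule and arrival rates are illustrative assumptions:

```python
import math
import random
from typing import Tuple


def poisson_sample(lam: float, rng: random.Random) -> int:
    # Knuth's algorithm for sampling a Poisson-distributed count (small lambda).
    l, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1


def step(nc: int, nd: int, lam_pass: float, lam_drv: float,
         rng: random.Random) -> Tuple[int, int, int]:
    """One transition: Poisson arrivals on both sides, then greedy matching
    of queued orders to idle drivers.  Returns (nc', nd', matched)."""
    nc += poisson_sample(lam_pass, rng)  # new orders join the matching queue
    nd += poisson_sample(lam_drv, rng)   # new drivers become idle
    matched = min(nc, nd)
    return nc - matched, nd - matched, matched


rng = random.Random(0)
nc, nd, matched = step(nc=3, nd=1, lam_pass=2.0, lam_drv=2.0, rng=rng)
```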
According to one example implementation of the present disclosure, Reward may represent the optimization objective when determining a resource scheduling policy. In particular, the reward may be defined as the immediate gain of the current taxi service, given the scheduling credentials, before transitioning to the next state. There may be two cases: when an order is accepted by a driver, the reward is set to a positive value, e.g., related to the order count or order value; when the order is not accepted by a driver, the reward is set to zero.
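A minimal sketch of this two-case reward, assuming the positive value is the order's value (one of the possible service objectives named above):

```python
from typing import Iterable, Tuple


def reward(order_accepted: bool, order_value: float) -> float:
    """Reward for one transition: the order's value if a driver accepts it,
    zero otherwise."""
    return order_value if order_accepted else 0.0


def episode_reward(steps: Iterable[Tuple[bool, float]]) -> float:
    # Undiscounted sum over an episode of (accepted, value) transitions.
    return sum(reward(a, v) for a, v in steps)


total = episode_reward([(True, 18.5), (False, 22.0), (True, 9.0)])
```

For the order-count objective, the positive branch would simply return 1.0 instead of the order value.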
According to one example implementation of the present disclosure, Cost may represent a constraint of the resource scheduling policy. For example, the cost may represent the total amount of scheduling credentials required within a time window to transition from the current state to the next state. Within this time window, if the driver accepts the order, the cost may be set to the sum of the passenger-side and driver-side scheduling credentials associated with the order; otherwise, the cost is set to zero. In this way, the cost ensures that the platform's resource scheduling does not exceed the allocated budget.
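Correspondingly, the per-transition cost and the budget check over a time window might look like the following sketch (the budget value is illustrative):

```python
from typing import Iterable, Tuple


def step_cost(order_accepted: bool, cc: float, cd: float) -> float:
    """Credential spend for one transition: passenger-side plus driver-side
    scheduling credentials when the order is accepted, zero otherwise."""
    return cc + cd if order_accepted else 0.0


def within_budget(steps: Iterable[Tuple[bool, float, float]],
                  budget: float) -> bool:
    # steps: (order_accepted, Cc, Cd) transitions within one time window.
    return sum(step_cost(a, cc, cd) for a, cc, cd in steps) <= budget


# Unaccepted orders contribute no spend, however large their credentials were.
ok = within_budget([(True, 1.0, 2.0), (False, 5.0, 5.0), (True, 0.5, 0.5)],
                   budget=4.0)
```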
According to one example implementation of the present disclosure, Gamma may represent a discount rate that controls the degree to which the resource scheduling policy attends to the future. Selecting an appropriate discount rate is critical: the larger the discount rate, the more the long-term service target is weighted, and the larger the variance of the service-target function over a long horizon; the smaller the discount rate, the more attention is focused on near-term targets.
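The effect of the discount rate can be seen in a short worked example: the same reward stream contributes far less under a small gamma than under a large one.

```python
from typing import Iterable


def discounted_return(rewards: Iterable[float], gamma: float) -> float:
    """Sum of gamma**t * r_t.  A larger gamma weighs the long run more
    heavily; a smaller gamma focuses on near-term service targets."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


stream = [1.0, 1.0, 1.0]                       # identical per-step rewards
short_sighted = discounted_return(stream, 0.1)  # 1 + 0.1 + 0.01
far_sighted = discounted_return(stream, 0.9)    # 1 + 0.9 + 0.81
```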
According to one example implementation of the present disclosure, various parameters in supply and demand data may be represented based on different formats. For example, different parameters may be represented in a low-dimensional format and a high-dimensional format, respectively, based on the memory space required for the different parameters. Here, the high-dimensional format may be further subdivided into a sparse high-dimensional format (i.e., a sparse matrix that includes only a small amount of non-zero data) and a dense high-dimensional format (i.e., a dense matrix that includes a large amount of non-zero data).
A specific data structure of the supply-demand state is described with reference to fig. 6, which fig. 6 schematically illustrates a data structure 600 of a supply-demand state 610 according to an exemplary implementation of the present disclosure. As shown in fig. 6, the supply-demand state 610 may include: at least one low-dimensional portion 620 represented in a low-dimensional format, at least one sparse portion 622 represented in a sparse high-dimensional format, and at least one dense portion 624 represented in a dense high-dimensional format. With the example implementations of the present disclosure, the respective storage modes may be set according to the requirements of different parameters for storage space and potential information loss in subsequent processing.
In fig. 6, the low-dimensional portion 620 may include at least any one of the following: the value of the task associated with the service, the data of completed tasks associated with the service, the data of provided scheduling credentials associated with the service, the number of demand parties for incomplete tasks associated with the service, and the number of service parties for incomplete tasks associated with the service, etc. Specifically, the low-dimensional portion may include order_price, gmv_total, cost_total, nc, nd, and the like in the State data described above. Such parameters are closely related to the immediate return of the order, the cost constraint, and the supply-demand balance. Because the weakly correlated features have large dimensionality, the information of these strongly correlated features might be masked if they were input into the feature-processing layer together with the weakly correlated features; these features may therefore be coupled directly to the end of the network pipeline.
According to one example implementation of the present disclosure, the sparse portion 622 may represent at least any one of the following: the distribution of scheduling credentials, the distribution of value, and the distribution of features for the demand side of incomplete tasks associated with a service, and so forth. Specifically, the sparse portion 622 may include, for example, the FavorBucket in the State data described above, i.e., the allocation of customer-side resources for historically unprocessed orders. Because the supply-demand state is tied to a specific point in time, only a few specific dimensions are non-zero at any given moment. Thus, an embedding operation may be selected to compress the information, thereby reducing the instability that high-dimensional sparse features might otherwise introduce into the reinforcement learning process.
According to one example implementation of the present disclosure, dense portion 624 may represent at least any one of: scheduling credential allocation data, historical value data, and historical feature distributions for the demand side and the provider side of the current task associated with the service, and so forth. In particular, the dense portion may include contextual features of the current order, a historical price distribution, a distribution of historical order-acceptance features, and so forth. These features may be combined and input into a multi-layer perceptron (MLP) layer for feature extraction to obtain more abstract hidden-layer information.
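The three-part supply-demand state can be sketched as follows (a minimal container with hypothetical field names based on the State data described above; an actual system would store these in whatever serialization the platform uses):

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SupplyDemandState:
    """Mirrors data structure 600: low-dimensional, sparse, and dense portions."""
    low_dim: Dict[str, float]   # e.g. order_price, gmv_total, cost_total, nc, nd
    sparse: Dict[int, float]    # only non-zero entries stored (e.g. FavorBucket)
    dense: List[List[float]]    # e.g. historical price / acceptance distributions

state = SupplyDemandState(
    low_dim={"order_price": 18.5, "gmv_total": 1.2e4, "cost_total": 850.0,
             "nc": 42, "nd": 37},
    sparse={103: 0.6, 880: 0.4},  # few dimensions non-zero at any given moment
    dense=[[0.1, 0.3, 0.6], [0.2, 0.5, 0.3]],
)
```

Storing the sparse portion as an index-to-value mapping rather than a full vector reflects the storage-space consideration discussed above.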
With the example implementations of the present disclosure, each parameter may be stored in a format suited to it, balancing the storage-space requirement against the potential information loss in subsequent steps such as feature extraction, so that the multifaceted information in the supply and demand data is preserved to the greatest extent.
According to one example implementation of the present disclosure, an intelligent resource scheduling framework may be implemented based on the supply and demand data described above, and a resource scheduling policy may be adaptively generated according to the supply-demand imbalance of the current state. The resource scheduling framework may achieve two goals: (1) adjusting supply and demand by scheduling vouchers in consideration of the order context, thereby maximizing the passenger order-completion volume; and (2) meeting a predetermined constraint on the total amount of scheduling credentials.
The two goals described above can be achieved in two core steps, respectively. In the first step, a deep reinforcement learning method can be adopted to address the data explosion problem in prior-art schemes: deep reinforcement learning can be utilized to solve for the optimal resource scheduling strategy in high-dimensional data while taking long-term benefits into account. In the second step, deep reinforcement learning may be combined with constrained reinforcement learning to maximize the passenger order-completion volume while controlling the total amount of scheduling credentials. At this time, the Lagrangian multiplier method combined with deep reinforcement learning can be used to jointly optimize the passenger order volume and the total resource control.
According to one example implementation of the present disclosure, a PPO algorithm based on deep reinforcement learning may be employed to solve the problems described above. At this time, the optimization target of the algorithm is as shown in formula 1.
In the above formula, θ represents the parameters of the machine learning network for solving the resource scheduling policy, θ_k represents the parameters at the k-th iteration, s represents the supply-demand state, a represents the action, max represents the maximization function, and J(θ) represents the impact on the service objective of applying the resource scheduling policy under supply-demand state s. Where the service objective relates to the order quantity, the impact may represent the number of orders that can be brought about using the resource scheduling policy.
Equation 2 represents a constraint: in order to guarantee the stability of the algorithm, the Kullback-Leibler (KL) divergence between the two resource scheduling policies determined at two successive iterations (i.e., the policies parameterized by θ_k and θ_{k+1}) is smaller than a predetermined value δ.
According to one example implementation of the present disclosure, the above formula may be solved based on an actor-critic architecture. Specifically, the actor role may learn the corresponding resource scheduling policy, and the critic role may evaluate the performance of the actor role and guide the actor role's resource scheduling policy at the next stage. In particular, the first network may be implemented based on the actor role (e.g., with parameters θ); the goal of the first network here is to learn a resource scheduling policy.
The resource scheduling policy may be represented in a number of ways. In a simple case, the resource scheduling policy may be expressed in a binary manner, i.e. whether the provider schedule credential is provided. In another case, the resource scheduling policy may be expressed in a multi-element manner, i.e., may represent a distribution of provisioning scheduling credentials at a predetermined probability (e.g., 10%, 20%, …, and 100%). At this time, the first network may output a corresponding resource scheduling policy based on the supply and demand data.
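The multi-element representation can be sketched as a distribution over discrete voucher levels (ten hypothetical levels at 10%, 20%, ..., 100%; the softmax head stands in for the first network's output layer):

```python
import math

def softmax(logits):
    """Normalize raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Discrete voucher levels (assumed granularity of 10%).
levels = [0.1 * (i + 1) for i in range(10)]
logits = [0.0] * 10          # an untrained head yields a uniform distribution
probs = softmax(logits)      # probability of providing each voucher level
```

In the binary case described above, the same head would simply collapse to two outputs (provide / do not provide).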
According to one example implementation of the present disclosure, the supply-demand data 610 represented in the above-described format may be input into the first network (i.e., the actor role), thereby acquiring the corresponding resource scheduling policy. Here, the first network may be pre-trained. In determining the first network, training data may be acquired to perform a training process. In particular, a first reference scheduling policy may be determined using the first network based on first reference supply and demand data for the service. Further, the second network may be implemented based on the critic role and evaluate whether the resource scheduling policy output by the first network is capable of producing the expected impact. That is, the second network may be utilized to determine a first reference impact of applying the first reference scheduling policy on the service objective based on the first reference supply and demand data. Further, the first network may be updated based on the first reference supply-demand data and the first reference impact.
According to one example implementation of the present disclosure, historical supply and demand data during a historical service may be collected and used as training data to determine the first network. In particular, the advantage function may be estimated based on the second network, and the first network may be employed to record the current resource scheduling policy. The data obtained from distributed sampling may then be used iteratively for the policy update of the next round.
According to one example implementation of the present disclosure, the constrained optimization problem may be translated into an unconstrained problem. Specifically, equation 2 may be converted to equation 3 to facilitate solution using a machine learning network.
In the above formula, clip represents a predetermined truncation function that retains only the portion of its argument lying in the range [1 − ε, 1 + ε], where ε represents a predefined small value; argmax represents the argmax function, and min represents the minimization function.
According to one example implementation of the present disclosure, the first network may be updated under the constraint of the predetermined truncation function clip in equation 3 above, such that the difference between the first resource scheduling policy and the previous resource scheduling policy preceding it satisfies a predetermined difference condition. With the example implementation of the present disclosure, a constrained problem that is otherwise difficult to solve may be converted, based on a mathematical transformation, into an unconstrained problem whose solution still satisfies the constraint described above. In this way, it can be ensured that the step size of each iterative update is not too large, thereby reducing potential deviations in the update gradient and direction.
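The per-sample clipped surrogate term of equation 3 can be sketched as follows (a minimal sketch of the standard PPO clipping, with ε = 0.2 as an assumed default):

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped term: min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r is the probability ratio of the new policy to the old one."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

When the ratio drifts outside [1 − ε, 1 + ε], the clipped term caps the objective, so gradient steps that would move the policy too far from its predecessor yield no additional benefit.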
According to an example implementation of the present disclosure, the first network may take the supply-demand state as input, concatenate the strongly correlated features (order value, GMV, cost, number of drivers, number of orders) and input them into the fusion layer at the end, where they are concatenated with the feature-extracted sparse portion and the abstract representations of the other dense features to output a fusion vector. Finally, the MLP layer may be utilized to output the corresponding resource scheduling policy for the fusion vector.
According to one example implementation of the present disclosure, the first network may be implemented based on a variety of network models that are currently known and/or that will be developed in the future. Fig. 7 schematically illustrates a structure 700 of a first network according to an exemplary implementation of the present disclosure. As shown in fig. 7, at least one low-dimensional portion 620, at least one sparse portion 622, and at least one dense portion 624 may be combined to generate a characterization representation of the first supply-demand data.
In particular, for the at least one low-dimensional portion 620, a join operation may be performed using the join unit 710, thereby determining a low-dimensional representation of the at least one low-dimensional portion. For at least one sparse portion 622, a feature extraction operation may be performed with the embedding unit 712 to determine a sparse representation of the sparse portion. For at least one dense portion 624, a join and planarize operation may be performed with the join & planarize unit 714 to determine a dense representation of the dense portion. The low-dimensional representation, sparse representation, and dense representation may be concatenated using a concatenation unit 720 to generate a feature representation.
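The combination of the three portions into one feature representation can be sketched as follows (hypothetical dimensions and randomly initialized projection weights; an actual implementation would use trained embedding parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes for illustration only.
low_dim = np.array([18.5, 1.2e4, 850.0, 42.0, 37.0])      # strongly correlated scalars
sparse = np.zeros(1000)
sparse[[103, 880]] = [0.6, 0.4]                            # sparse high-dimensional part
dense = rng.random((2, 3))                                 # dense high-dimensional part

embed_w = rng.random((1000, 8))     # stand-in for embedding unit 712 (compress to 8 dims)
sparse_repr = sparse @ embed_w      # feature extraction for the sparse portion
dense_repr = dense.reshape(-1)      # join-and-flatten for the dense portion

# Stand-in for concatenation unit 720: the fused feature representation.
fusion = np.concatenate([low_dim, sparse_repr, dense_repr])
```

The fused vector would then feed the MLP heads described below; only the shapes matter for this sketch.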
According to one example implementation of the present disclosure, a first resource scheduling policy may be determined using individual MLP elements in a first network based on a feature representation. As shown in fig. 7, the first network may further include MLP units 730, 732, …, and 734, which correspond to different probabilities, respectively. These units may receive the feature representations and output resource scheduling policies 740, 742, …, and 744, respectively, corresponding to different probabilities.
According to one example implementation of the present disclosure, the second network may have a structure similar to that of the first network, while the two networks may have different parameters. Fig. 8 schematically illustrates a structure 800 of a second network according to an exemplary implementation of the present disclosure. As shown in fig. 8, the MLP unit 810 may output an impact 820 of the resource scheduling policy on the service objective. Where the goal is to drive an increase in the order quantity, the MLP unit 810 may output a predicted value for the order quantity.
According to one example implementation of the present disclosure, the value function may be represented using the second network of the critic role. At this time, the second network may have the same structure as the first network before the fusion feature is obtained, but with different parameters. Finally, the MLP layer may be utilized to extract features from the fusion vector to obtain the value function V_r(state) in the current supply-demand state. This function may represent a predicted value of the GMV obtained during the process from the current state to the end of a predetermined time window (e.g., 1 day). Specifically, the recurrence formula of the advantage function can be expressed as:
In the above formula, A(s, a) represents the advantage function given the supply-demand state s and the resource scheduling policy a, reward represents the reward function, V_r(s') represents the value function in the next state, and V_r(s) represents the value function in the current state.
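The recurrence above can be sketched numerically (a minimal sketch; the discount rate gamma is included on the assumption of the standard discounted form used by the other formulas in this section):

```python
def advantage(reward, gamma, v_next, v_now):
    """A(s, a) = reward + gamma * V_r(s') - V_r(s): how much better acting now is
    than the baseline value of the current state."""
    return reward + gamma * v_next - v_now

a = advantage(reward=1.0, gamma=0.9, v_next=10.0, v_now=9.5)
```

A positive advantage indicates the sampled action outperformed the critic's baseline, pushing the actor toward that action during the update.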
According to one example implementation of the present disclosure, the second network may be updated with a supervised learning approach. A predicted value of a second reference impact of applying a second reference scheduling policy on the service target may be determined using the second network based on the second reference supply and demand data of the service. Further, the second network may be updated based on a difference between the predicted value and a second impact of applying a second reference scheduling policy on the service objective.
In particular, in the supervised learning process, the estimated return V̂ may be used as the label, and the predicted value of the second network may be made more accurate by minimizing the mean square error between the prediction and the label. According to one example implementation of the present disclosure, in order to obtain accurate label data, the sequence data (s_t, a_t, reward_t) may be sampled by means of the Monte Carlo method, thereby obtaining the label V̂_t. Here, s_t, a_t and reward_t respectively represent the state, the resource allocation policy and the reward at time point t. With the example implementations of the present disclosure, the available historical data may be leveraged to train the second network, thereby enabling the second network to accurately learn knowledge about the service objective.
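The Monte Carlo labeling and mean-square-error objective can be sketched as follows (hypothetical helper names; the network itself is omitted):

```python
def monte_carlo_labels(rewards, gamma):
    """Discounted returns V_hat_t = sum_i gamma**i * reward_{t+i}, computed
    backwards over a sampled trajectory and used as supervision labels."""
    labels, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        labels.append(g)
    return labels[::-1]

def mse(preds, labels):
    """Mean square error minimized to fit the critic to the labels."""
    return sum((p - l) ** 2 for p, l in zip(preds, labels)) / len(labels)

labels = monte_carlo_labels([1.0, 2.0, 3.0], gamma=0.5)
```

The same recipe applies to the cost critic described below, with per-step costs in place of rewards.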
According to one example implementation of the present disclosure, the number of scheduling credentials put into the resource configuration policy may be further considered in the solution process. According to one example implementation of the present disclosure, wherein updating the first network further comprises: determining a reference scheduling credential of the first reference scheduling policy using the third network based on the first reference supply and demand data; and updating the first network based on the reference scheduling credential. In this way, the association between the scheduling credential and the impact on the service objective due to the scheduling credential may be considered in updating the first network, and the resource scheduling policy may be determined towards a direction that may help to improve the service objective.
For example, in a trip optimization problem with limited scheduling vouchers, there may be different business scenarios: joint optimization and independent optimization. Joint optimization takes the total amount of scheduling vouchers as a constraint, whereas independent optimization constrains the passenger-side (demand-side) scheduling vouchers and the driver-side (provider-side) scheduling vouchers separately. Unlike existing constrained reinforcement learning methods, in the context of the present disclosure the constraint can be imposed on the ratio of cost to order quantity, without fixing the total amount of scheduling vouchers. To solve these constrained optimization problems, reinforcement learning algorithms that take constraints into account can be employed, for example an adaptive-penalty proximal policy optimization (AP3O) algorithm for joint optimization.
Specifically, to solve the joint optimization problem, a more robust AP3O algorithm may be evolved from the PPO algorithm described above. The algorithm constructs a cost function for computing the sum of the total schedule credentials. At this time, the constrained overall optimization objective is as follows.
In the above formulas, one control function is associated with the cost and the other with the reward. Equation 4 is equivalent to equation 1 in the PPO method and is responsible for maximizing the order quantity. Equation 5 represents a condition that is desirably satisfied: the numerator portion on the left side of equation 5 (i.e., equation 6.1) represents the expected scheduling vouchers within a predetermined time window at the k-th iteration, while the denominator portion on the left side of equation 5 (i.e., equation 6.2) represents the expected order quantity within the predetermined time window at the k-th iteration.
According to one example implementation of the present disclosure, to constrain costs, a cost function V_c(state) may be introduced. This function may represent the total amount of scheduling vouchers provided from the current state to the end of a predetermined time window (e.g., 1 day). The cost function may be represented by another critic role (e.g., the third network). In this way, the optimal solution, and thus the optimal resource scheduling policy, can be obtained while the cost constraint is satisfied.
According to one example implementation of the present disclosure, the third network may be similar in structure to the second network and have different parameters. Similar to the process of determining the second network, the third network may be updated with a supervised learning approach. Specifically, a predicted value of a reference scheduling credential of a second reference scheduling policy is determined using a third network based on second reference supply-demand data of the service. Further, the third network may be updated based on a difference between the predicted value of the scheduling credential and a reference scheduling credential of the second reference scheduling policy.
In the supervised learning process, the estimated cost V̂_c may be used as the label, and the predicted value of the third network may be made more accurate by minimizing the mean square error between the prediction and the label. According to one example implementation of the present disclosure, in order to obtain accurate label data, the sequence data (s_t, a_t, cost_t) may be sampled by means of the Monte Carlo method, thereby obtaining the label V̂_c,t. Here, s_t, a_t and cost_t respectively represent the state, the resource allocation policy and the cost at time point t.
According to one example implementation of the present disclosure, the first supply-demand data further includes control parameters (e.g., the Gamma parameters described above) for controlling long-term impact of the first resource scheduling policy on the service objective. At this time, in determining the first resource scheduling policy, the first resource scheduling policy for maximizing the long-term impact may be determined further based on the control parameter.
In the above equations 5, 6.1 and 6.2, γ represents the control parameter, i.e., the discount rate, used to control the degree to which the resource scheduling policy attends to the future long-term impact on the service objective. The selection of an appropriate discount rate is critical: the larger the discount rate, the larger the variance of the long-term service objective function; the smaller the discount rate, the more attention is paid to the short-term impact of the scheduling vouchers. In this way, γ may be adjusted to specify the degree of concern for long-term effects.
According to one example implementation of the present disclosure, the above formula may be converted to formula 7 when importance sampling is employed for correcting differences in policy distribution.
Previous approaches use Taylor's formula to approximate the problem as a convex optimization problem within a trust region, which incurs increased time complexity from the quadratic optimization problem and additional space complexity from storing the Hessian matrix. In the context of the present disclosure, by contrast, the original problem is converted into a first-order optimization problem using a Lagrangian multiplier, as shown in equation 8. Furthermore, a rectified linear unit (ReLU) operation adapted to the constraint term may be employed, such that the constraint factor only takes effect when the expected resources of the current strategy exceed the maximum budget. Otherwise, the constrained optimization problem is equivalent to the unconstrained optimization problem in equation 7, and the solution method degenerates to the PPO algorithm described above.
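The Lagrangian-with-ReLU construction can be illustrated as follows (a sketch under the assumption of a scalar multiplier λ and a fixed budget; equations 7 and 8 define the full objective):

```python
def relu(x):
    """Rectified linear unit: zero below the threshold, identity above it."""
    return max(0.0, x)

def penalized_objective(reward_term, expected_cost, budget, lam):
    """Unconstrained surrogate: maximize the reward term minus a Lagrangian
    penalty that activates (via ReLU) only when expected cost exceeds budget."""
    return reward_term - lam * relu(expected_cost - budget)
```

Within budget, the penalty vanishes and the objective coincides with the unconstrained PPO objective; over budget, the multiplier λ scales the over-spend into a first-order penalty.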
At the current k-th iteration, the exact future state distribution is not known. At this time, in the subsequent iteration, if the deviation of the policy parameterized by θ_{k+1} from the policy parameterized by θ_k is very small, a valid approximation of the problem can be used, as shown in equation 9.
It should be appreciated that approximation errors between the two strategies before and after an iteration are unavoidable. According to one example implementation of the present disclosure, as long as this error is appropriately minimized such that the policy of the next iteration is updated only within a neighborhood of the current policy with a sufficiently limited radius, a monotonic improvement of the policy during the iterative update can be ensured, thereby ensuring the stability of the learning process. In particular, the clip function may be adopted to ensure that the step size of each policy update is small enough, as shown in equations 10-12 below.
In the above formulas, the first term represents the distance between two iterations determined based on the P3O algorithm; the second term represents the clip function involved in the iterative (training) process, whose specific value can be calculated based on equation 11; and the third term represents the clip function involved in the reasoning (inference) process, whose specific value can be calculated based on equation 12.
It should be appreciated that the various formulas above provide specific examples for solving resource configuration policies by way of example only, and that the formulas may be adjusted based on the needs of a particular environment. The first network may be updated based on the optimization objective of the above formula so that the first network may output resource adjustment policies that satisfy various desired constraints.
According to one example implementation of the present disclosure, in updating the first network based on the reference scheduling credential, it may further be set that the sum of the scheduling vouchers at the reference demand side and the scheduling vouchers at the reference provider side satisfies a predetermined constraint. That is, the scheduling credentials on both the demand side and the provider side due to the resource scheduling policy are considered at this time. The first network may be updated under this constraint. At this time, the updated resource scheduling policy output by the first network may further consider the association between the sum of the scheduling credentials on the two sides and the service target.
According to one example implementation of the present disclosure, in updating the first network based on the reference scheduling credential, a reference demander scheduling credential and a reference provider scheduling credential associated with the first reference scheduling policy may be determined; and updating the first network if the difference between the reference demand side schedule credential and the reference provider schedule credential satisfies a predetermined constraint. For example, it may be specified that scheduling credentials be distributed between a requestor and a provider in a more balanced manner; assigning scheduling credentials between the demander and the provider in a manner that prioritizes the passenger-side experience may be specified; it may be specified to distribute scheduling credentials between the desiring party and the providing party in a manner that prioritizes the driver-side experience. In this way, more flexible constraint relationships can be set, thereby refining the determination of various requirements in the resource scheduling policy.
According to one example implementation of the present disclosure, after the training process has been completed, the trained first network may be utilized to determine resource scheduling policies at various points in time. For example, first supply and demand data associated with a service at a first point in time may be determined. First supply and demand data may be input to the first network and a corresponding first resource scheduling policy is obtained. The first resource scheduling policy may be applied. At this time, the impact on the service objective due to the application of the first resource scheduling policy will be proportional to the scheduling credentials put in the first resource scheduling policy. For example, the higher the number of placed scheduling vouchers, the higher the number of orders that can be obtained.
It should be appreciated that although the resource scheduling process has been described above with the order quantity as an example, alternatively and/or additionally the service objective may relate to other factors. For example, when the service objective relates to the total value of orders, the reward used in the training process may be expressed in terms of the total order value. The higher the placed scheduling vouchers, the higher the total value of the orders obtained.
According to one example implementation of the present disclosure, at a second point in time after the first resource scheduling policy has been applied, the relevant supply and demand data for the second point in time may be collected again and a corresponding second resource scheduling policy is obtained using the first network. Further, a second resource scheduling policy may be applied in order to achieve the corresponding service objective. According to one example implementation of the present disclosure, the resource optimization process may be performed at predetermined time intervals (e.g., daily, hourly). In this way, the corresponding resource scheduling policy can be determined continuously based on the current up-to-date supply and demand data.
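The periodic re-scheduling described above can be sketched as follows (toy stand-ins for the data collector and the trained first network; all names are hypothetical):

```python
def run_schedule(points_in_time, collect_data, policy_network):
    """At each scheduling interval, collect fresh supply-demand data and
    query the trained policy network for a new resource scheduling policy."""
    applied = []
    for t in points_in_time:
        data = collect_data(t)             # latest supply-demand data at time t
        policy = policy_network(data)      # first network: data -> policy
        applied.append((t, policy))        # apply the policy for this interval
    return applied

# Toy stand-ins for demonstration only.
history = run_schedule(
    points_in_time=[0, 1, 2],
    collect_data=lambda t: {"demand": 10 + t},
    policy_network=lambda d: "boost" if d["demand"] > 10 else "hold",
)
```

In production, the interval would be the predetermined period (e.g., hourly or daily), and the policy would be a voucher distribution rather than a label.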
Example procedure
Fig. 9 illustrates a flow chart of a method 900 of performing resource scheduling in a service according to some implementations of the disclosure. At block 910, first supply and demand data associated with a service at a first point in time is acquired. At block 920, based on the first supply-demand data, determining, with the first network, a first resource scheduling policy associated with the first supply-demand data, the first resource scheduling policy indicating scheduling credentials provided to the service, the scheduling credentials including at least any one of: a demander schedule credential for a demander of the service and a provider schedule credential for a provider of the service. At block 930, a first resource scheduling policy is applied to obtain a first impact of the first resource scheduling policy on a service objective of the service, an association between the first resource scheduling policy and the first impact satisfying a predetermined condition.
According to one example implementation of the present disclosure, the method 900 further includes determining the first network based on: determining a first reference scheduling policy with the first network based on the first reference supply and demand data of the service; determining, with the second network, a first reference impact of applying the first reference scheduling policy on the service objective based on the first reference supply and demand data; and updating the first network based on the first reference scheduling policy and the first reference impact.
According to one example implementation of the present disclosure, updating the first network includes: the first network is updated such that a difference between the first resource scheduling policy and a previous resource scheduling policy preceding the first resource scheduling policy satisfies a predetermined difference condition, under the constraint of a predetermined cutoff function.
According to one example implementation of the present disclosure, the method 900 further includes determining the second network based on: determining, based on the second reference supply-demand data of the service, a predicted value of a second reference impact of applying a second reference scheduling policy to the service target using the second network; and updating the second network based on a difference between the predicted value and a second impact of applying a second reference scheduling policy on the service objective.
According to one example implementation of the present disclosure, updating the first network further comprises: determining a reference scheduling credential of the first reference scheduling policy using the third network based on the first reference supply and demand data; and updating the first network based on the reference scheduling credential.
According to one example implementation of the present disclosure, updating the first network based on the reference scheduling credential includes: determining a reference requestor schedule credential and a reference provider schedule credential associated with a first reference schedule policy; and updating the first network if the difference between the reference demand side schedule credential and the reference provider schedule credential satisfies a predetermined constraint.
According to one example implementation of the present disclosure, updating the first network based on the reference scheduling credential includes: the first network is updated if the sum of the reference requester schedule credential and the reference provider schedule credential satisfies a predetermined constraint.
According to one example implementation of the present disclosure, the method 900 further includes determining a third network based on: determining a predicted value of a reference scheduling credential of a second reference scheduling policy using a third network based on the second reference supply-demand data of the service; and updating the third network based on a difference between the predicted value of the reference scheduling credential and the reference scheduling credential of the second reference scheduling policy.
According to one example implementation of the present disclosure, the first supply-demand data further includes control parameters for controlling long-term impact of the first resource scheduling policy on the service objective, and determining the first resource scheduling policy further includes: a first resource scheduling policy for maximizing long term impact is determined based on the control parameters.
According to one example implementation of the present disclosure, the first supply-demand data includes: at least one low-dimensional portion represented in a low-dimensional format, at least one sparse portion represented in a sparse high-dimensional format, and at least one dense portion represented in a dense high-dimensional format.
According to one example implementation of the present disclosure, determining a first resource scheduling policy includes: combining the at least one low-dimensional portion, the at least one sparse portion, and the at least one dense portion to generate a feature representation of the first supply-demand data; and determining a first resource scheduling policy with the first network based on the characteristic representation.
According to one example implementation of the present disclosure, generating a characteristic representation of the first supply-demand data includes: performing a join operation for the at least one low-dimensional portion to determine a low-dimensional representation of the at least one low-dimensional portion; performing a flattening operation on the at least one sparse portion to determine a sparse representation of the sparse portion; performing an extraction operation for at least one dense portion to determine a dense representation of the dense portion; and concatenating the low-dimensional representation, the sparse representation, and the dense representation to generate the feature representation.
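The join/flatten/extract/concatenate pipeline above can be sketched as follows. Mean-pooling stands in for the learned extraction operation, and the function name is an assumption; the disclosure does not prescribe these exact operations.

```python
import numpy as np

def build_feature_representation(low_dim_parts, sparse_parts, dense_parts):
    # Join: concatenate the low-dimensional (scalar or short-vector) features.
    low = np.concatenate([np.atleast_1d(np.asarray(p, dtype=float))
                          for p in low_dim_parts])
    # Flatten: unroll each sparse high-dimensional part into a vector.
    sparse = np.concatenate([np.asarray(p, dtype=float).ravel()
                             for p in sparse_parts])
    # Extract: reduce each dense part; mean-pooling over rows is a simple
    # stand-in for a learned feature extractor.
    dense = np.concatenate([np.asarray(p, dtype=float).mean(axis=0)
                            for p in dense_parts])
    # Concatenate the three representations into the final feature vector.
    return np.concatenate([low, sparse, dense])
```

For example, one scalar plus a length-2 low-dimensional part, one 2x2 sparse part, and one 2x2 dense part combine into a 9-element feature vector (3 joined + 4 flattened + 2 pooled).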
According to one example implementation of the present disclosure, the low-dimensional portion represents at least any one of: the value of a task associated with the service, data of completed tasks associated with the service, data of provided scheduling credentials associated with the service, the number of demanders of incomplete tasks associated with the service, and the number of providers of incomplete tasks associated with the service.
According to one example implementation of the present disclosure, the sparse portion represents at least any one of: the distribution of scheduling credentials, the distribution of value, and the distribution of features of demanders of incomplete tasks associated with the service.
According to one example implementation of the present disclosure, the dense portion represents at least any one of: scheduling credential allocation data, historical value data, and historical feature distributions for the demanders and providers of the current task associated with the service.
According to one example implementation of the present disclosure, the method further comprises: acquiring second supply and demand data associated with the service at a second point in time, the second point in time being after the first resource scheduling policy is applied; determining, based on the second supply-demand data, a second resource scheduling policy associated with the second supply-demand data using the first network; and applying a second resource scheduling policy.
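This per-time-point cycle of observing supply-demand data, deriving a policy with the first network, and applying it can be sketched as a simple loop. The callable names are hypothetical placeholders for the components described above.

```python
def run_scheduling_loop(observe_supply_demand, derive_policy, apply_policy,
                        num_time_points):
    # At each time point: observe the current supply-demand data, derive the
    # associated resource scheduling policy with the first network, apply it,
    # and record the observed impact on the service objective.
    impacts = []
    for time_point in range(num_time_points):
        data = observe_supply_demand(time_point)
        policy = derive_policy(data)
        impacts.append(apply_policy(policy))
    return impacts
```

With stub callables, the loop simply threads each observation through policy derivation and application in order, which mirrors how the first and second supply-demand data are handled at successive points in time.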
According to one example implementation of the present disclosure, a first resource scheduling policy is used to adjust services within a predetermined range over a predetermined period of time, a service objective being associated with at least any one of: the number of tasks of the service, the task value of the service.
According to one example implementation of the present disclosure, the service includes at least any one of: a taxi service, the demander comprising passengers of a vehicle, the provider comprising drivers of the vehicle; an item delivery service, the demander comprising a sender of an item to be delivered, the provider comprising a delivery party responsible for delivering the item; and a designated-driver service, the demander comprising a user requiring a driving service, the provider comprising a driver providing the driving service.
Example Apparatus and Device
Specific details of a method of performing resource scheduling in a service have been described above. According to one exemplary implementation of the present disclosure, an apparatus for performing resource scheduling in a service is provided. Fig. 10 schematically illustrates a block diagram of an apparatus 1000 for resource scheduling according to an exemplary implementation of the present disclosure. The apparatus 1000 comprises: an acquisition module 1010 configured to acquire first supply-demand data associated with a service at a first point in time; a determining module 1020 configured to determine, based on the first supply-demand data, a first resource scheduling policy associated with the first supply-demand data with the first network, the first resource scheduling policy indicating scheduling credentials provided to the service, the scheduling credentials including at least any one of: a demander schedule credential for a demander of the service and a provider schedule credential for a provider of the service; and an application module 1030 configured to apply the first resource scheduling policy so as to obtain a first impact of the first resource scheduling policy on a service objective of the service, an association relationship between the first resource scheduling policy and the first impact satisfying a predetermined condition.
According to one example implementation of the present disclosure, the first network is determined based on: determining a first reference scheduling policy with the first network based on the first reference supply and demand data of the service; determining, with the second network, a first reference impact of applying the first reference scheduling policy on the service objective based on the first reference supply and demand data; and updating the first network based on the first reference scheduling policy and the first reference impact.
According to one example implementation of the present disclosure, the first network is updated based on: the first network is updated such that a difference between the first resource scheduling policy and a previous resource scheduling policy preceding the first resource scheduling policy satisfies a predetermined difference condition, under the constraint of a predetermined cutoff function.
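The predetermined cutoff function that bounds how far the updated policy may move from the previous one resembles the clipped surrogate objective of proximal policy optimization (PPO). The sketch below is written under that assumption; the ratio form, the `epsilon` clipping range, and the advantage term are standard PPO ingredients, not details stated in this disclosure.

```python
def clipped_policy_objective(new_prob, old_prob, advantage, epsilon=0.2):
    # Ratio between the new policy's and the previous policy's probability
    # of the chosen scheduling action.
    ratio = new_prob / old_prob
    # Cutoff function: clip the ratio to [1 - epsilon, 1 + epsilon] so the
    # updated policy cannot move too far from the previous policy.
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    # Pessimistic (minimum) of the unclipped and clipped objectives.
    return min(ratio * advantage, clipped_ratio * advantage)
```

Maximizing this objective keeps the difference between successive policies within the range the cutoff function allows, matching the "predetermined difference condition" described above.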
According to one example implementation of the present disclosure, the second network is determined based on: determining, based on second reference supply and demand data of the service, a predicted value of a second reference impact of applying a second reference scheduling policy on the service objective using the second network; and updating the second network based on a difference between the predicted value and the second reference impact of applying the second reference scheduling policy on the service objective.
According to one example implementation of the present disclosure, the first network is further updated based on: determining a reference scheduling credential of the first reference scheduling policy using the third network based on the first reference supply and demand data; and updating the first network based on the reference scheduling credential.
According to one example implementation of the present disclosure, the first network is further updated based on: determining a reference demander scheduling credential and a reference provider scheduling credential associated with the first reference scheduling policy; and updating the first network if the difference between the reference demander scheduling credential and the reference provider scheduling credential satisfies a predetermined constraint.
According to one example implementation of the present disclosure, the first network is further updated based on: updating the first network if the sum of the reference demander scheduling credential and the reference provider scheduling credential satisfies a predetermined constraint.
According to one example implementation of the present disclosure, the third network is determined based on: determining a predicted value of a reference scheduling credential of a second reference scheduling policy using a third network based on the second reference supply-demand data of the service; and updating the third network based on a difference between the predicted value of the reference scheduling credential and the reference scheduling credential of the second reference scheduling policy.
According to one example implementation of the present disclosure, the first supply-demand data further includes control parameters for controlling long-term impact of the first resource scheduling policy on the service objective, and determining the first resource scheduling policy further includes: a first resource scheduling policy for maximizing long term impact is determined based on the control parameters.
According to one example implementation of the present disclosure, the first supply-demand data includes: at least one low-dimensional portion represented in a low-dimensional format, at least one sparse portion represented in a sparse high-dimensional format, and at least one dense portion represented in a dense high-dimensional format.
According to one example implementation of the present disclosure, the determining module further comprises: a generation module configured to combine the at least one low-dimensional portion, the at least one sparse portion, and the at least one dense portion to generate a feature representation of the first supply-demand data; and a utilization module configured to utilize the first network to determine a first resource scheduling policy based on the feature representation.
According to one example implementation of the present disclosure, the generating module includes: a first operation module configured to perform a join operation for the at least one low-dimensional portion to determine a low-dimensional representation of the at least one low-dimensional portion; a second operation module configured to perform a flattening operation for at least one sparse portion to determine a sparse representation of the sparse portion; a third operation module configured to perform an extraction operation for at least one dense portion to determine a dense representation of the dense portion; and a fourth operation module configured to join the low-dimensional representation, the sparse representation, and the dense representation to generate a feature representation.
According to one example implementation of the present disclosure, the low-dimensional portion represents at least any one of: the value of a task associated with the service, data of completed tasks associated with the service, data of provided scheduling credentials associated with the service, the number of demanders of incomplete tasks associated with the service, and the number of providers of incomplete tasks associated with the service.
According to one example implementation of the present disclosure, the sparse portion represents at least any one of: the distribution of scheduling credentials, the distribution of value, and the distribution of features of demanders of incomplete tasks associated with the service.
According to one example implementation of the present disclosure, the dense portion represents at least any one of: scheduling credential allocation data, historical value data, and historical feature distributions for the demanders and providers of the current task associated with the service.
According to one example implementation of the present disclosure, the acquisition module is further configured to: acquiring second supply and demand data associated with the service at a second point in time, the second point in time being after the first resource scheduling policy is applied; the determination module is further configured to: determining, based on the second supply-demand data, a second resource scheduling policy associated with the second supply-demand data using the first network; and the application module is further configured to apply a second resource scheduling policy.
According to one example implementation of the present disclosure, a first resource scheduling policy is used to adjust services within a predetermined range over a predetermined period of time, a service objective being associated with at least any one of: the number of tasks of the service, the task value of the service.
According to one example implementation of the present disclosure, the service includes at least any one of: a taxi service, the demander comprising passengers of a vehicle, the provider comprising drivers of the vehicle; an item delivery service, the demander comprising a sender of an item to be delivered, the provider comprising a delivery party responsible for delivering the item; and a designated-driver service, the demander comprising a user requiring a driving service, the provider comprising a driver providing the driving service.
Fig. 11 schematically illustrates a block diagram of a computing device/server for resource scheduling in accordance with an exemplary implementation of the present disclosure. It should be appreciated that the computing device/server 1100 illustrated in fig. 11 is merely exemplary and should not be construed as limiting the functionality and scope of the implementations described herein.
As shown in fig. 11, computing device/server 1100 is in the form of a general purpose computing device. Components of computing device/server 1100 may include, but are not limited to, one or more processors or processing units 1110, memory 1120, storage 1130, one or more communication units 1140, one or more input devices 1150, and one or more output devices 1160. The processing unit 1110 may be an actual or virtual processor and is capable of performing various processes according to programs stored in the memory 1120. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capabilities of computing device/server 1100.
Computing device/server 1100 typically includes a number of computer storage media. Such media can be any available media accessible by computing device/server 1100, including, but not limited to, volatile and non-volatile media, and removable and non-removable media. The memory 1120 may be volatile memory (e.g., registers, cache, Random Access Memory (RAM)), non-volatile memory (e.g., Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory), or some combination thereof. The storage device 1130 may be a removable or non-removable medium, and may include a machine-readable medium, such as a flash drive, a magnetic disk, or any other medium capable of storing information and/or data (e.g., training data) and accessible within the computing device/server 1100.
The computing device/server 1100 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in fig. 11, a magnetic disk drive for reading from or writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data medium interfaces. Memory 1120 may include a computer program product 1125 having one or more program modules configured to perform the various methods or acts of the various implementations of the present disclosure.
The communication unit 1140 enables communication with other computing devices via a communication medium. Additionally, the functionality of the components of computing device/server 1100 may be implemented in a single computing cluster or in multiple computing machines capable of communicating over a communication connection. Accordingly, computing device/server 1100 may operate in a networked environment using logical connections to one or more other servers, a network Personal Computer (PC), or another network node.
The input device 1150 may be one or more input devices such as a mouse, keyboard, trackball, etc. The output device 1160 may be one or more output devices such as a display, speakers, printer, etc. The computing device/server 1100 may also communicate with one or more external devices (not shown), such as storage devices, display devices, etc., as needed, through the communication unit 1140, with one or more devices that enable a user to interact with the computing device/server 1100, or with any device (e.g., network card, modem, etc.) that enables the computing device/server 1100 to communicate with one or more other computing devices. Such communication may be performed via an input/output (I/O) interface (not shown).
According to an exemplary implementation of the present disclosure, a computer-readable storage medium is provided, on which one or more computer instructions are stored, wherein the one or more computer instructions are executed by a processor to implement the method described above.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various implementations of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of implementations of the present disclosure has been provided for illustrative purposes; it is not exhaustive and is not limited to the implementations disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the implementations described. The terminology used herein was chosen to best explain the principles of each implementation, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the implementations disclosed herein.

Claims (18)

1. A method of performing resource scheduling in a service, the method comprising:
acquiring first supply and demand data associated with a service at a first point in time, the first supply and demand data comprising: at least one low-dimensional portion represented in a low-dimensional format, at least one sparse portion represented in a sparse high-dimensional format, and at least one dense portion represented in a dense high-dimensional format, the low-dimensional portion representing at least any one of: the value of a task associated with the service, data of completed tasks associated with the service, data of provided scheduling credentials associated with the service, the number of demanders of incomplete tasks associated with the service, and the number of providers of incomplete tasks associated with the service;
determining, with a first network, a first resource scheduling policy associated with the first supply-demand data based on the first supply-demand data, the first resource scheduling policy indicating scheduling credentials provided to the service, the scheduling credentials including at least any one of: a demander schedule credential for a demander of the service and a provider schedule credential for a provider of the service, and determining the first resource scheduling policy comprises:
combining the at least one low-dimensional portion, the at least one sparse portion, and the at least one dense portion to generate a feature representation of the first supply-demand data; and
determining, with the first network, the first resource scheduling policy based on the characteristic representation; and
applying the first resource scheduling policy so as to obtain a first influence of the first resource scheduling policy on a service target of the service, wherein an association relationship between the first resource scheduling policy and the first influence satisfies a predetermined constraint condition, and the predetermined constraint condition comprises at least any one of the following: constraints on the demand side schedule credentials, constraints on the provider schedule credentials, and constraints on the total amount of schedule credentials.
2. The method of claim 1, further comprising determining the first network based on:
determining a first reference scheduling policy with the first network based on first reference supply and demand data for the service;
determining, with a second network, a first reference impact of applying the first reference scheduling policy on the service objective based on the first reference supply and demand data; and
the first network is updated based on the first reference scheduling policy and the first reference impact.
3. The method of claim 2, wherein updating the first network comprises: updating the first network such that a difference between the first resource scheduling policy and a previous resource scheduling policy preceding the first resource scheduling policy satisfies a predetermined difference condition, under the constraint of a predetermined cutoff function representing a range of ratios between the first resource scheduling policy and the previous resource scheduling policy.
4. The method of claim 2, further comprising determining the second network based on:
determining, with the second network, a predicted value of a second reference impact of applying a second reference scheduling policy on the service objective based on second reference supply and demand data of the service; and
the second network is updated based on a difference between the predicted value and the second reference impact of applying the second reference scheduling policy on the service objective.
5. The method of claim 4, wherein updating the first network further comprises:
determining reference scheduling credentials of the first reference scheduling policy using a third network based on the first reference supply and demand data; and
the first network is updated based on the reference scheduling credential.
6. The method of claim 5, wherein updating the first network based on the reference scheduling credential comprises:
determining a reference demander scheduling credential and a reference provider scheduling credential associated with the first reference scheduling policy; and
the first network is updated if a difference between the reference demander scheduling credential and the reference provider scheduling credential satisfies a predetermined constraint.
7. The method of claim 6, wherein updating the first network based on the reference scheduling credential comprises: the first network is updated if the sum of the reference demander scheduling credential and the reference provider scheduling credential satisfies a predetermined constraint.
8. The method of claim 5, further comprising determining the third network based on:
determining, with the third network, a predicted value of a reference scheduling credential of the second reference scheduling policy based on second reference supply-demand data of the service; and
the third network is updated based on a difference between the predicted value of the reference scheduling credential and a reference scheduling credential of the second reference scheduling policy.
9. The method of claim 1, wherein the first supply and demand data further comprises control parameters for controlling a long term impact of the first resource scheduling policy on the service objective, and wherein determining the first resource scheduling policy further comprises: based on the control parameters, the first resource scheduling policy for maximizing the long term impact is determined.
10. The method of claim 1, wherein generating the characteristic representation of the first supply-demand data comprises:
performing a join operation for the at least one low-dimensional portion to determine a low-dimensional representation of the at least one low-dimensional portion;
performing a flattening operation on the at least one sparse portion to determine a sparse representation of the sparse portion;
performing an extraction operation for the at least one dense portion to determine a dense representation of the dense portion; and
the low-dimensional representation, the sparse representation, and the dense representation are concatenated to generate the feature representation.
11. The method of claim 1, wherein:
the sparse portion represents at least any one of: distribution of scheduling credentials, distribution of value, and distribution of features of a demander of an incomplete task associated with the service; and
the dense portion represents at least any one of: scheduling credential allocation data, historical value data, and historical feature distributions for the demanders and providers of the current task associated with the service.
12. The method according to claim 1, wherein the method further comprises:
acquiring second supply and demand data associated with the service at a second point in time, the second point in time being after application of the first resource scheduling policy;
determining, with the first network, a second resource scheduling policy associated with the second supply-demand data based on the second supply-demand data; and
and applying the second resource scheduling policy.
13. The method of claim 1, wherein the first resource scheduling policy is used to adjust services within a predetermined range over a predetermined period of time, the service objective being associated with at least any one of: the number of tasks of the service, the task value of the service.
14. The method of claim 1, wherein the service comprises at least any one of:
a taxi service, the demander comprising a passenger of a vehicle, the provider comprising a driver of the vehicle;
an item delivery service, the demander comprising a sender of an item to be delivered, the provider comprising a delivery party responsible for delivering the item;
and a designated-driver service, the demander comprising a user requiring a driving service, the provider comprising a driver providing the driving service.
15. An apparatus for performing resource scheduling in a service, the apparatus comprising:
an acquisition module configured to acquire first supply-demand data associated with a service at a first point in time, the first supply-demand data including: at least one low-dimensional portion represented in a low-dimensional format, at least one sparse portion represented in a sparse high-dimensional format, and at least one dense portion represented in a dense high-dimensional format, the low-dimensional portion representing at least any one of: the value of a task associated with the service, data of completed tasks associated with the service, data of provided scheduling credentials associated with the service, the number of demanders of incomplete tasks associated with the service, and the number of providers of incomplete tasks associated with the service;
a determining module configured to determine, based on the first supply-demand data, a first resource scheduling policy associated with the first supply-demand data with a first network, the first resource scheduling policy indicating scheduling credentials provided to the service, the scheduling credentials including at least any one of: a demander schedule credential for a demander of the service and a provider schedule credential for a provider of the service, and the determination module comprises:
a generation module configured to combine the at least one low-dimensional portion, the at least one sparse portion, and the at least one dense portion to generate a feature representation of the first supply-demand data; and
a utilization module configured to determine the first resource scheduling policy with the first network based on the characteristic representation; and
an application module configured to apply the first resource scheduling policy so as to obtain a first impact of the first resource scheduling policy on a service objective of the service, an association between the first resource scheduling policy and the first impact satisfying a predetermined constraint, the predetermined constraint comprising at least any one of: constraints on the demand side schedule credentials, constraints on the provider schedule credentials, and constraints on the total amount of schedule credentials.
16. An electronic device, the electronic device comprising:
a memory and a processor;
wherein the memory is for storing one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of any one of claims 1 to 14.
17. A computer readable storage medium storing one or more computer instructions, wherein the one or more computer instructions are executable by a processor to implement the method of any one of claims 1 to 14.
18. A computer program product, characterized in that the computer program product comprises a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the method of any one of claims 1 to 14.
CN202311359557.XA 2023-10-19 2023-10-19 Method, device, electronic equipment and storage medium for resource scheduling Active CN117391359B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311359557.XA CN117391359B (en) 2023-10-19 2023-10-19 Method, device, electronic equipment and storage medium for resource scheduling


Publications (2)

Publication Number Publication Date
CN117391359A CN117391359A (en) 2024-01-12
CN117391359B true CN117391359B (en) 2024-04-16

Family

ID=89466162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311359557.XA Active CN117391359B (en) 2023-10-19 2023-10-19 Method, device, electronic equipment and storage medium for resource scheduling

Country Status (1)

Country Link
CN (1) CN117391359B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101655990A (en) * 2009-06-25 2010-02-24 浙江大学 Method for synthesizing three-dimensional human body movement based on nonlinear manifold learning
CN111260180A (en) * 2019-12-17 2020-06-09 武汉空心科技有限公司 Work platform task allocation device based on predefined scheduling strategy
CN111310956A (en) * 2018-12-11 2020-06-19 北京嘀嘀无限科技发展有限公司 Method and device for determining scheduling strategy and electronic equipment
CN111832870A (en) * 2019-08-19 2020-10-27 北京嘀嘀无限科技发展有限公司 Designated driving resource coordination method, designated driving resource coordination device and readable storage medium
CN112347344A (en) * 2020-09-27 2021-02-09 北京淇瑀信息科技有限公司 Management method and device for multi-period additional resource certificate and electronic equipment
CN112346628A (en) * 2020-09-27 2021-02-09 北京淇瑀信息科技有限公司 Special shared resource certificate management method, system and electronic equipment
US11070594B1 (en) * 2020-10-16 2021-07-20 Tempered Networks, Inc. Applying overlay network policy based on users
CN115061786A (en) * 2022-05-16 2022-09-16 北京嘀嘀无限科技发展有限公司 Method, apparatus, electronic device, medium, and program product for resource scheduling
CN116133127A (en) * 2023-01-13 2023-05-16 西安电子科技大学 Knowledge-driven resource scheduling method for the Internet of Vehicles in a space-air-ground integrated scenario
CN116193516A (en) * 2022-11-22 2023-05-30 重庆邮电大学 Cost optimization method for efficient federated learning in Internet of Things scenarios
CN116739593A (en) * 2023-08-15 2023-09-12 腾讯科技(深圳)有限公司 Transaction resource processing method, device, equipment and medium


Similar Documents

Publication Publication Date Title
Egan et al. Market mechanism design for profitable on-demand transport services
WO2021083276A1 (en) Method, device, and apparatus for combining horizontal federation and vertical federation, and medium
US7996287B2 (en) Allocating carbon offsets for printing tasks
Vytelingum The structure and behaviour of the continuous double auction
US11334941B2 (en) Systems and computer-implemented processes for model-based underwriting
Cheng Reverse auction with buyer–supplier negotiation using bi-level distributed programming
US20220188851A1 (en) Multi-objective distributional reinforcement learning for large-scale order dispatching
CN109377291A (en) Task price expectation method, apparatus, electronic equipment and computer storage medium
US20220067791A1 (en) Method and apparatus for forecast shaped pacing in electronic advertising
TW200532518A (en) Methods and apparatus for managing computing resources based on yield management framework
CN108615165A (en) Server unit, method and the equipment of dynamic pricing for multiplying altogether
Li et al. A price-incentive resource auction mechanism balancing the interests between users and cloud service provider
CN116485430A (en) Federated learning forgetting mechanism and method for data circulation
Legrain et al. A stochastic algorithm for online bipartite resource allocation problems
CN117391359B (en) Method, device, electronic equipment and storage medium for resource scheduling
US8577757B2 (en) Inventory management system in a print-production environment
US20220036411A1 (en) Method and system for joint optimization of pricing and coupons in ride-hailing platforms
Shen et al. Integrated ad delivery planning for targeted display advertising
KR102176108B1 (en) Differential fee payment system through professional experts
CN115061786A (en) Method, apparatus, electronic device, medium, and program product for resource scheduling
CN111429237A (en) Order price determining method and device, server and storage medium
Egan et al. Hybrid mechanisms for on-demand transport
JP6869313B2 (en) Service dispatch system and method based on user behavior
Zhe et al. A Deep-Learning-Based Optimal Auction for Vehicular Edge Computing Resource Allocation
Qiu Dynamic pricing in shared mobility on demand service and its social impacts

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant