CN111814987A

CN111814987A - Dynamic feedback method, model training method, device, equipment and storage medium

Info

Publication number: CN111814987A
Application number: CN202010647910.4A
Authority: CN
Inventors: 肜博辉; 杨秀君
Original assignee: Beijing Didi Infinity Technology and Development Co Ltd
Current assignee: Beijing Didi Infinity Technology and Development Co Ltd
Priority date: 2020-07-07
Filing date: 2020-07-07
Publication date: 2020-10-23

Abstract

The application provides a dynamic feedback method, a model training method, an apparatus, a device and a storage medium. The method comprises: receiving a service request sent by a requesting client; if no service provider matching the service request exists, acquiring state information of the user corresponding to the service request; processing the state information with a preset action feedback model to determine a target feedback action; and feeding back the target feedback action to the requesting client corresponding to the service request. This solves the prior-art problem that the feedback sent to the client is poorly suited to the user and leads to user churn, and it reduces the user's negative emotions.

Description

Dynamic feedback method, model training method, device, equipment and storage medium

Technical Field

The present application relates to the field of model training technologies, and in particular, to a dynamic feedback method, a model training method, an apparatus, a device, and a storage medium.

Background

With the spread and growing popularity of service platforms in daily life, such as food delivery, ride-hailing, various websites and restaurants, more and more people consume services, or book them in advance, by placing orders on a service platform.

Because demand for services is unevenly distributed, users in certain situations, for example at mealtime peaks, during the morning and evening commute peaks, or in severe weather, must wait a long time before a service provider responds. While waiting, users often experience negative emotions such as anxiety, resentment and anger. For the user, these emotions seriously affect their mood and can even strain the relationship between the service platform and the user; for the service platform, long waits give users a poor product experience, which in turn leads to negative outcomes such as complaints and user churn.

In the prior art, the negative emotions of waiting users are usually relieved in a simple, uniform way; for example, waiting users are sent soothing messages that the service operator has configured uniformly in advance. This approach ignores each user's scenario and characteristics: every user receives the same messages, users quickly grow tired of them, and negative outcomes such as complaints and user churn still cannot be avoided.

Disclosure of Invention

In view of this, an object of the present application is to provide a dynamic feedback method, a model training method, an apparatus, a device and a storage medium that solve the prior-art problem that a single, uniform feedback mode tires users and cannot prevent negative outcomes such as complaints and user churn, and that thereby relieve the negative emotions of users while they wait.

In a first aspect, the present application provides a dynamic feedback method, the method comprising:

receiving a service request sent by a requesting client;

if no service provider matching the service request exists, acquiring state information of the user corresponding to the service request;

processing the state information with a preset action feedback model to determine a target feedback action;

and feeding back the target feedback action to the requesting client corresponding to the service request.
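The four steps above can be pictured as a single request handler. The following Python sketch is illustrative only: `ServiceRequest`, `handle_request` and the injected callables (`find_provider`, `get_state`, `feedback_model`, `send_to_client`) are hypothetical names that do not come from the application, and the action feedback model is treated as an opaque function from state to action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

State = Dict[str, str]


@dataclass
class ServiceRequest:
    request_id: str
    user_id: str


def handle_request(
    request: ServiceRequest,
    find_provider: Callable[[ServiceRequest], Optional[str]],
    get_state: Callable[[str], State],
    feedback_model: Callable[[State], str],
    send_to_client: Callable[[str, str], None],
) -> Optional[str]:
    """Hypothetical handler for a received request (S201); returns the action sent, if any."""
    if find_provider(request) is not None:
        # A matching provider exists: normal dispatch, no waiting feedback is needed.
        return None
    # S202: no provider matched, so collect the user's current state information.
    state = get_state(request.user_id)
    # S203: the preset action feedback model maps the state to a target feedback action.
    action = feedback_model(state)
    # S204: feed the target feedback action back to the requesting client.
    send_to_client(request.request_id, action)
    return action
```

Injecting the provider lookup, state source and model as callables keeps the sketch independent of any particular dispatch system or model type.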

Optionally, the method further comprises:

after the target feedback action has been fed back, acquiring state change information of the service request;

and updating the action feedback model according to the state change information and the target feedback action.

Optionally, updating the action feedback model according to the state change information and the target feedback action includes:

updating the action feedback model according to the state change information, a feedback effect score corresponding to the state change information, and the target feedback action.
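One way to read the feedback effect score is as a numeric reward attached to the observed state change. The sketch below assumes three hypothetical state changes with hypothetical scores (answered = 1.0, still waiting = 0.2, cancelled = -1.0) and a simple value table keyed by (state, action). The application specifies neither the scores nor the update rule, so the learning-rate update here is an illustrative choice.

```python
from typing import Dict, Tuple

# Hypothetical feedback effect scores for each observed state change.
EFFECT_SCORES: Dict[str, float] = {
    "answered": 1.0,       # a provider accepted the request
    "still_waiting": 0.2,  # the user kept waiting instead of leaving
    "cancelled": -1.0,     # the user cancelled the request
}


def feedback_effect_score(state_change: str) -> float:
    return EFFECT_SCORES[state_change]


def update_model(
    table: Dict[Tuple[str, str], float],
    state_key: str,
    action: str,
    state_change: str,
    lr: float = 0.1,
) -> float:
    """Move the stored value of (state, action) towards the observed score."""
    score = feedback_effect_score(state_change)
    old = table.get((state_key, action), 0.0)
    new = old + lr * (score - old)
    table[(state_key, action)] = new
    return new
```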

Optionally, the state information includes at least one of the following: behavioral state information, personal information and service scenario information.

In a second aspect, the present application provides a method for training an action feedback model, the method comprising:

acquiring a training data set formed by a plurality of groups of historical data, wherein each group of historical data comprises: historical state information and feedback actions corresponding to the historical state information;

and performing model training by adopting a preset reinforcement learning algorithm according to the training data set to obtain the preset action feedback model.

Optionally, performing model training with the preset reinforcement learning algorithm according to the training data set to obtain the preset action feedback model includes:

clustering the multiple groups of historical data in the training data set;

and performing model training with the reinforcement learning algorithm according to the clustered training data set to obtain the preset action feedback model.

Optionally, the model training may instead include: performing model training with the reinforcement learning algorithm according to the training data set and the service scenario corresponding to the training data set, to obtain a preset action feedback model corresponding to that service scenario.
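The application does not name a clustering algorithm. As one possibility, the historical state information could be encoded as numeric feature vectors and grouped with k-means, after which a separate model is trained per cluster (or, in the per-scenario variant, per service scenario). The plain-Python sketch below implements Lloyd's k-means; the feature encoding and the `kmeans` helper are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> Vector:
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Return a cluster label for each historical state vector (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: _dist2(p, centroids[j])) for p in points]
        # Update step: recompute centroids; keep the old one if a cluster is empty.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = _mean(members)
    return labels
```

Each resulting label identifies a subset of the training data set on which the reinforcement learning step could then be run separately.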

Optionally, the historical state information includes at least one of the following: behavioral state information, personal information and service scenario information.

In a third aspect, the present application further provides a dynamic feedback apparatus, comprising a receiving module, an obtaining module, a determining module and a feedback module, wherein:

the receiving module is configured to receive a service request sent by a requesting client;

the obtaining module is configured to obtain state information of the user corresponding to the service request if no service provider matching the service request exists;

the determining module is configured to process the state information with a preset action feedback model to determine a target feedback action;

and the feedback module is configured to feed back the target feedback action to the requesting client corresponding to the service request.

Optionally, the apparatus further comprises an updating module, wherein:

the obtaining module is further configured to obtain state change information of the service request after the target feedback action has been fed back;

and the updating module is configured to update the action feedback model according to the state change information and the target feedback action.

Optionally, the updating module is specifically configured to update the action feedback model according to the state change information, a feedback effect score corresponding to the state change information, and the target feedback action.

In a fourth aspect, the present application further provides a training apparatus for an action feedback model, comprising an obtaining module and a training module, wherein:

the obtaining module is configured to obtain a training data set formed by multiple groups of historical data, where each group of historical data includes historical state information and a feedback action corresponding to the historical state information;

and the training module is configured to perform model training with a preset reinforcement learning algorithm according to the training data set to obtain the preset action feedback model.

Optionally, the apparatus further comprises a clustering module, wherein:

the clustering module is configured to cluster the multiple groups of historical data in the training data set;

the training module is specifically configured to perform model training by using the reinforcement learning algorithm according to the clustered training data set to obtain the preset action feedback model.

Optionally, the training module is specifically configured to perform model training by using the reinforcement learning algorithm according to the training data set and a service scenario corresponding to the training data set, so as to obtain the preset action feedback model corresponding to the service scenario.

In a fifth aspect, the present application provides a dynamic feedback device comprising a processor, a storage medium and a bus. The storage medium stores machine-readable instructions executable by the processor; when the dynamic feedback device runs, the processor communicates with the storage medium over the bus, and the processor executes the machine-readable instructions to perform the steps of any method of the first aspect.

In a sixth aspect, the present application provides a training device for an action feedback model, comprising a processor, a storage medium and a bus. The storage medium stores machine-readable instructions executable by the processor; when the training device runs, the processor communicates with the storage medium over the bus, and the processor executes the machine-readable instructions to perform the steps of any method of the second aspect.

In a seventh aspect, the present application further provides a storage medium storing a computer program which, when executed by a processor, performs the steps of any method of the first or second aspect.

With any of the above aspects, the dynamic feedback method of the present application acquires the user's current state information when no service provider matching the service request exists, processes that state information with a preset action feedback model to determine a target feedback action, and feeds the target feedback action back to the requesting client corresponding to the service request. Because each target feedback action is determined from the current state information of the individual user, the feedback is personalized: different users may receive different target feedback actions, and even the same user may receive different actions in different states. This solves the prior-art problem that a single feedback mode tires users and cannot prevent negative outcomes such as complaints and user churn, and relieves users' negative emotions while they wait.

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present application and should not be regarded as limiting its scope; those skilled in the art can derive other related drawings from these drawings without inventive effort.

FIG. 1 is a schematic structural diagram of a dynamic feedback system according to an embodiment of the present application;

FIG. 2 is a flowchart of a dynamic feedback method according to an embodiment of the present application;

FIG. 3 is a flowchart of a method for training an action feedback model according to an embodiment of the present application;

FIG. 4 is a flowchart of a method for training an action feedback model according to another embodiment of the present application;

FIG. 5 is a flowchart of a dynamic feedback method according to another embodiment of the present application;

FIG. 6 is a schematic structural diagram of a dynamic feedback apparatus according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a dynamic feedback apparatus according to another embodiment of the present application;

FIG. 8 is a schematic structural diagram of a training apparatus for an action feedback model according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a training apparatus for an action feedback model according to another embodiment of the present application;

FIG. 10 is a schematic structural diagram of a dynamic feedback device according to an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a training device for an action feedback model according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. It should be understood that the drawings are for illustration and description only and are not intended to limit the scope of protection of the present application, and that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments. The operations of the flowcharts may be performed out of order, and steps that have no logical dependency on each other may be performed in reverse order or simultaneously. Guided by this application, those skilled in the art may add one or more other operations to the flowcharts or remove one or more operations from them.

In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

To enable those skilled in the art to use the present disclosure, the following embodiments are given for dynamic feedback in a specific application scenario: a user of a ride-hailing platform waiting for a driver to accept the order. Those skilled in the art will appreciate that the general principles defined here may be applied to other embodiments and applications without departing from the spirit and scope of the application. Although the application is described mainly in the context of dynamic feedback while a ride-hailing user waits for the order to be accepted, this is only an exemplary embodiment; the application may be applied in any scenario that requires dynamic feedback, for example: waiting for a meal on a food delivery platform, waiting for a driver to accept an order on a designated-driving platform, waiting in a restaurant queue, waiting for a website download, and so on.

It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.

One aspect of the present application relates to a dynamic feedback system. After receiving a service request sent by a client, if no service provider currently matches the service request, the system acquires the state information of the user of that client under the current service request, processes the state information with a preset action feedback model, and determines a target feedback action and sends it to the client. The service provider may be, for example, a ride-hailing driver, a food delivery courier, a designated driver, a restaurant or a website; the application is not limited in this respect.

It should be noted that the prior art generally uses a single, simple and direct message to tell the user the status of the current service request. For example: in a queue-waiting scenario, a restaurant tells the user "There are currently 23 tables ahead of you"; in a resource-download scenario, a website tells the user "Current download speed 2M, 30% downloaded, expected to complete in 13 minutes"; and in a wait-for-acceptance scenario, a ride-hailing platform tells the user "32 people are waiting ahead of you; your order is expected to be accepted in 10 minutes". Such feedback ignores each user's situation and characteristics: every user receives the same feedback, users quickly tire of it, and it does nothing to soothe the negative emotions that build up during a long wait. This single, repetitive feedback mode can therefore worsen the user's mood and lead to complaints or user churn.

With the dynamic feedback method of the present application, when no service provider matching the service request exists, that is, when the user is waiting for a response, the user's current state information is acquired and processed with a preset action feedback model to determine a target feedback action, which is then fed back to the requesting client corresponding to the service request.

FIG. 1 is a schematic architecture diagram of a dynamic feedback system 100 according to an embodiment of the present application. The dynamic feedback system 100 may serve, for example, the scenario in which a ride-hailing user waits for the order to be accepted, or any other platform or scenario involving dynamic feedback. As shown in FIG. 1, the dynamic feedback system 100 may include one or more of a server 110, a network 120, a service terminal 130 and a database 140.

In some embodiments, the server 110 may include a processor. The processor may process information and/or data related to the service request to perform one or more of the functions described herein. For example, the processor may determine the current state information of the user who sent a service request, based on the service request obtained from the service terminal 130. In some embodiments, the processor may include one or more processing cores (e.g., a single-core processor (S) or a multi-core processor (M)). Merely by way of example, the processor may include a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), an Application Specific Instruction-set Processor (ASIP), a Graphics Processing Unit (GPU), a Physics Processing Unit (PPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.

In some embodiments, the service terminal 130 may be a mobile device, such as a wearable device, a smart mobile device, a tablet computer or a laptop computer. Taking the ride-hailing wait-for-acceptance scenario as an example, the service terminal 130 may be the user's mobile phone: the user initiates a service request through a ride-hailing application installed on the phone, and the back end of the application returns corresponding service response information or target feedback information based on the current user information and the available ride-hailing information (for example, location information).

In some embodiments, a database 140 may be connected to the network 120 to communicate with one or more components (e.g., the server 110, the service terminal 130, the service provider, etc.) in the dynamic feedback system 100. One or more components in the dynamic feedback system 100 may access data or instructions stored in the database 140 via the network 120. In some embodiments, the database 140 may be directly connected to one or more components in the dynamic feedback system 100, or the database 140 may be part of the server 110.

The dynamic feedback method provided by the embodiments of the present application is described in detail below with reference to the dynamic feedback system 100 shown in FIG. 1. The method is applied to that system and may be executed by the server or by the user's terminal device. The following embodiments all use the scenario in which a ride-hailing user waits for the order to be accepted, and take the user's terminal device as the executing entity; a service application corresponding to the user's service request is installed on that terminal device. The specific preset scenario may be designed and adjusted according to user needs and may be any scenario or platform involving dynamic feedback; the present application is not limited in this respect.

The dynamic feedback method is explained below with reference to several specific application examples. FIG. 2 is a schematic flowchart of a dynamic feedback method according to an embodiment of the present application. As shown in FIG. 2, the method includes:

S201: Receive a service request sent by a requesting client.

Optionally, the requesting client may be any smart terminal device on which a service application corresponding to the service request is installed, such as a wearable device, a smart mobile device, a tablet computer or a laptop computer. The available service request types depend on what the service platform offers. For example, on a ride-hailing platform the service request types may include carpool, premium car, hitch ride or scheduled ride; on a restaurant platform they may include queueing or reservation; and on a food delivery platform they may include immediate ordering, scheduled delivery or errand purchasing. The specific service request types are those provided by the service provider and are not limited to the above examples.

For example, the user may determine the service request on the client in either of two ways: by selecting a target service type from the service types offered in the service application and initiating a service request for it; or by taking the default service type set by the service application as the target service type and initiating a service request for it. A service request may include one service type or a combination of several. For example, a service request on a ride-hailing platform may use a combined calling mode that gives the user several choices at once: the user calls several types of ride-hailing cars simultaneously and finally takes whichever car responds first, and once the order status changes to "in progress", the calls to the other car types are stopped, which reduces the user's waiting time. The number of service types included in a specific service request may be determined by the user's selection or by a default setting of the service provider, and is not limited here.
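The combined calling mode is essentially a race among concurrent dispatch attempts. The asyncio sketch below is only an illustration: `call_car_type` and `combined_call` are hypothetical names, and the per-type delay merely simulates how long each car type takes to get a driver response. The first attempt to finish wins and the others are cancelled, mirroring the step of stopping the other calls once the order is in progress.

```python
import asyncio
from typing import Dict


async def call_car_type(car_type: str, delay: float) -> str:
    """Hypothetical dispatch of one car type; resolves when a driver responds."""
    await asyncio.sleep(delay)
    return car_type


async def combined_call(delays: Dict[str, float]) -> str:
    """Call several car types at once, keep the first responder, cancel the rest."""
    tasks = [asyncio.create_task(call_car_type(t, d)) for t, d in delays.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # The order is now "in progress": stop calling the other car types.
    for task in pending:
        task.cancel()
    # Wait for the cancellations to settle so no task is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()
```

For example, `asyncio.run(combined_call({"express": 0.05, "premium": 0.01, "carpool": 0.2}))` returns `"premium"`, the type with the shortest simulated response time.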

S202: If no service provider matching the service request exists, acquire the state information of the user corresponding to the service request.

That no service provider matches the service request means that the service request sent by the current user through the client has not yet been answered by any service provider. This may be because too many people near the current user have initiated service requests for providers to respond in time, so the user's request can only be answered after the requests of other users queued ahead have been answered; or because there is no service provider near the user, so the user must wait for a provider to be dispatched. Whatever the cause, an unanswered request means the user has entered a waiting state until their request is answered. At this point the user's state information is acquired so that it can be analyzed in the subsequent steps.

Optionally, in an embodiment of the present application, the status information may include at least one of the following information: behavioral state information, personal information, service scenario information.

For example, in a ride-hailing scenario, the behavioral state information may include the user's map operations (such as zooming the map in or out) and switching operations (such as switching to view other types of ride-hailing service); the personal information may include the user's historical order requests, historical cancellations, age, frequently used departure points, frequently used destinations, historical complaints and the like; and the service scenario information may include scene information such as whether the current period is a ride-hailing rush hour, the type of the current pick-up point, the current weather, whether the current period is a holiday, the distribution of nearby drivers, and the order demand of nearby users queued ahead.
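The three categories of state information map naturally onto a small record type. The dataclasses below are a hypothetical encoding of a few of the examples listed above; the field names and the discretisation thresholds in `to_key` (three or more cancellations, fewer than five nearby drivers) are assumptions, not values from the application.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BehaviorState:
    map_zoom_events: int = 0         # map zoom-in / zoom-out operations while waiting
    switched_car_type: bool = False  # user switched to view other ride-hailing types


@dataclass
class PersonalInfo:
    age: int = 0
    historical_cancellations: int = 0
    historical_complaints: int = 0


@dataclass
class ScenarioInfo:
    is_peak_hour: bool = False
    is_holiday: bool = False
    weather: str = "clear"
    nearby_drivers: int = 0


@dataclass
class UserState:
    behavior: BehaviorState = field(default_factory=BehaviorState)
    personal: PersonalInfo = field(default_factory=PersonalInfo)
    scenario: ScenarioInfo = field(default_factory=ScenarioInfo)

    def to_key(self) -> Dict[str, str]:
        """Discretise the state into categorical fields (thresholds are hypothetical)."""
        def yes_no(flag: bool) -> str:
            return "yes" if flag else "no"

        return {
            "zooming": yes_no(self.behavior.map_zoom_events > 0),
            "switched": yes_no(self.behavior.switched_car_type),
            "frequent_canceller": yes_no(self.personal.historical_cancellations >= 3),
            "complained_before": yes_no(self.personal.historical_complaints > 0),
            "peak": yes_no(self.scenario.is_peak_hour),
            "holiday": yes_no(self.scenario.is_holiday),
            "weather": self.scenario.weather,
            "driver_supply": "low" if self.scenario.nearby_drivers < 5 else "normal",
        }
```

A discretised key of this kind is what a tabular model, such as the Q-value table described later, can index on.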

S203: Process the state information with a preset action feedback model to determine a target feedback action.

The state information changes with the user's real-time situation. Because the target feedback action is determined by processing this state information, different actions can be fed back to different users, which can reduce the probability that a user cancels the service request before it is answered and improve the user experience.

Illustratively, the target feedback action may include: feeding back emotionally framed copy (for example, a soothing message); feeding back the reason why the user still needs to wait; feeding back informational content; feeding back the user's waiting time; feeding back encouragement; and so on. The content of a specific target feedback action can be adjusted flexibly according to user needs and is not limited to the above examples.

S204: Feed back the target feedback action to the requesting client corresponding to the service request.

Optionally, the target feedback action may be fed back as follows: if the application corresponding to the service request is open on the client, the target feedback action is shown as a popup in the application interface; if the application is closed, no target feedback action is sent until the application is opened again, at which point the target feedback action is determined from the user's state information at that time and then fed back. The specific feedback mode can be adjusted flexibly according to user needs and is not limited to the above examples.
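The open/closed delivery rule can be modelled as a small state machine. In this sketch `FeedbackDispatcher`, `decide_action` and the `shown` list are hypothetical stand-ins for the client application, the action feedback model and the popup UI. The key detail it illustrates is that a deferred feedback is re-decided from the state at the moment the app opens rather than replaying a stale action.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, str]


class FeedbackDispatcher:
    """Pop up feedback now if the app is open; otherwise defer until it opens."""

    def __init__(self, decide_action: Callable[[State], str]) -> None:
        self.decide_action = decide_action
        self.app_open = False
        self.pending = False        # a feedback is owed but has not been shown yet
        self.shown: List[str] = []  # popups displayed (stands in for the app UI)

    def feedback(self, state: State) -> Optional[str]:
        if self.app_open:
            action = self.decide_action(state)
            self.shown.append(action)  # show as a popup in the application interface
            return action
        self.pending = True  # app closed: send nothing for now
        return None

    def on_app_opened(self, current_state: State) -> Optional[str]:
        self.app_open = True
        if self.pending:
            self.pending = False
            # Decide again from the state at open time, not the state at deferral time.
            return self.feedback(current_state)
        return None
```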

With this dynamic feedback method, when no service provider matching the service request exists, the user's current state information is acquired and processed with a preset action feedback model to determine a target feedback action, which is then fed back to the requesting client corresponding to the service request. Because each target feedback action is determined from the individual user's current state information, the feedback is personalized: different users may receive different target feedback actions, and even the same user may receive different actions in different states. This avoids the user fatigue caused by a single feedback mode in the prior art, together with the resulting complaints and user churn, and relieves users' negative emotions while they wait.

Optionally, on the basis of the above embodiment, the embodiments of the present application further describe how the preset action feedback model used in the above method is obtained, with reference to the drawings. FIG. 3 is a schematic flowchart of a method for training an action feedback model according to an embodiment of the present application. As shown in FIG. 3, the method includes:

S205: Acquire a training data set formed by multiple groups of historical data.

Each group of historical data includes historical state information and the feedback action corresponding to that historical state information; the historical state information is the state information of the user corresponding to a historical service request.

Optionally, in an embodiment of the present application, the historical state information may include at least one of the following: behavioral state information, personal information and service scenario information.

For each service scenario to be trained, the training data set for that scenario is selected, the user's feedback to the feedback action under each group of historical state information is computed, and the model is trained according to this feedback information. The user's feedback information may be, for example, cancelling the order, continuing to wait, or the order being answered by a service provider. The specific content of the feedback information can be adjusted flexibly as needed and is not limited to the examples given in the above embodiment.

Optionally, in an embodiment of the present application, the model may be trained with the Q-learning method, in which case the trained model is a Q-value table. After the model is deployed online, the target feedback action for given state information can be determined simply from the feedback actions recorded for that state information in the Q-value table, and the target feedback action is then pushed to the client.
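A minimal sketch of tabular Q-learning over logged transitions follows, assuming each transition is `(state, action, reward, next_state, done)` and that the action set is small and fixed. The action names, `alpha`, `gamma`, and the number of passes are illustrative assumptions, not values from the application. The resulting dictionary plays the role of the Q-value table, and choosing the highest-valued action (`greedy_action`) is the conventional Q-learning choice rather than something the text specifies.

```python
from collections import defaultdict

# Hypothetical feedback actions; the application does not enumerate them.
ACTIONS = ("comfort_message", "coupon", "eta_update")


def train_q_table(transitions, alpha=0.1, gamma=0.9, epochs=50):
    """Tabular Q-learning over logged (state, action, reward, next_state, done) tuples."""
    q = defaultdict(float)  # (state, action) -> Q value, 0.0 if never seen
    for _ in range(epochs):
        for state, action, reward, next_state, done in transitions:
            target = reward
            if not done:
                # Bootstrap from the best action available in the next state.
                target += gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
    return dict(q)


def greedy_action(q_table, state):
    """Pick the action with the highest Q value for a state."""
    return max(ACTIONS, key=lambda a: q_table.get((state, a), 0.0))


logs = [
    ("long_wait", "coupon", 100, None, True),             # order answered
    ("long_wait", "comfort_message", -100, None, True),   # user cancelled
    ("short_wait", "eta_update", 1, "long_wait", False),  # user kept waiting
]
q_table = train_q_table(logs)
print(greedy_action(q_table, "long_wait"))  # coupon
```

Because training replays a fixed log, this is an offline (batch) variant of Q-learning; the same update rule can also be applied one transition at a time once the model is online.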

Because the state information contains several pieces of information, the target feedback action may be determined, for example, as follows. If the Q-value table contains state information identical to the user's current state information, the feedback action corresponding to that state information is directly taken as the target feedback action. Otherwise, the state information in the Q-value table with the highest similarity to the user's current state information, or with the largest number of matching pieces of information, is selected as the target state information, and the feedback action corresponding to the target state information is taken as the target feedback action. It should be understood that the specific way of determining the target feedback action can be adjusted flexibly as needed and is not limited to the manner provided in the foregoing embodiment.
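The exact-match-then-closest-match lookup could look like the sketch below. It assumes the Q-value table is a dictionary from a state tuple to `{action: q_value}` and uses the "largest number of matching pieces of information" criterion from the text. The field layout and values are hypothetical, and another similarity measure could be substituted in the `key` function.

```python
def lookup_target_action(q_table, state):
    """Return the feedback action for `state` from a {state_tuple: {action: q}} table."""
    if state in q_table:
        target_state = state  # every piece of state information matches
    else:
        # Fallback: the stored state sharing the most field values with the query.
        target_state = max(
            q_table,
            key=lambda stored: sum(a == b for a, b in zip(stored, state)),
        )
    actions = q_table[target_state]
    return max(actions, key=actions.get)  # highest-valued action for that state


# Hypothetical (wait_bucket, order_count, scenario) states.
table = {
    ("long", 3, "rain"): {"coupon": 5.0, "comfort_message": 1.0},
    ("short", 12, "sunny"): {"coupon": 0.5, "comfort_message": 2.0},
}
print(lookup_target_action(table, ("short", 12, "sunny")))  # comfort_message (exact match)
print(lookup_target_action(table, ("long", 12, "rain")))    # coupon (2 of 3 fields match)
```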

For example, in one embodiment of the present application, taking user feedback information that consists of cancelling the order, continuing to wait, and the order being answered, the reward may be determined as follows. If the user cancels the service request order after receiving the feedback action, the reward for that state is -100 and the order state corresponding to the service request is terminal, i.e., the order round corresponding to the service request ends. If the user chooses to continue waiting after receiving the feedback action, the reward for that state is +1 and the order state is non-terminal, i.e., the order round continues. If the order is successfully answered after the user receives the feedback action, the reward for that state is +100 and the order state is terminal, i.e., the order round ends. It should be understood that the method for training the model and the corresponding way of determining the reward can be adjusted flexibly as needed and are not limited to the embodiments described above.
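The reward scheme above can be written as a lookup from the user's response to a `(reward, done)` pair. The outcome labels are hypothetical. The values -100, +1 and +100 and the terminal/non-terminal split come from the example in the text. `round_return` is an illustrative helper that sums undiscounted rewards over one order round.

```python
# outcome -> (reward, does the order round end?)
REWARDS = {
    "cancel": (-100, True),    # user cancels after the feedback action: round ends
    "wait": (1, False),        # user keeps waiting: round continues
    "answered": (100, True),   # order is answered: round ends
}


def reward_for(outcome):
    """Map the user's response to a feedback action onto (reward, done)."""
    if outcome not in REWARDS:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return REWARDS[outcome]


def round_return(outcomes):
    """Sum rewards for one order round, stopping at the first terminal outcome."""
    total = 0
    for outcome in outcomes:
        reward, done = reward_for(outcome)
        total += reward
        if done:
            break
    return total


print(round_return(["wait", "wait", "answered"]))  # 102
print(round_return(["wait", "cancel"]))            # -99
```

The small +1 for waiting rewards actions that keep the user in the queue, while the large terminal values dominate the return once the round ends.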

S206: Perform model training on the training data set using a preset reinforcement learning algorithm to obtain the preset action feedback model.

In an embodiment of the present application, a reinforcement learning algorithm may be used to train the model according to the training data set and the service scenario corresponding to the training data set, so as to obtain a preset action feedback model for that service scenario.
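Training one model per service scenario amounts to partitioning the training data set by scenario and running the same training routine on each partition. In this sketch the `(scenario, transition)` record layout and the `train` callable are assumptions. Any routine that turns a list of transitions into a model, for instance a tabular Q-learning function, could be passed in.

```python
from collections import defaultdict


def train_per_scenario(records, train):
    """records: iterable of (scenario, transition); train: list of transitions -> model.

    Returns {scenario: model}, one preset action feedback model per scenario.
    """
    by_scenario = defaultdict(list)
    for scenario, transition in records:
        by_scenario[scenario].append(transition)
    return {scenario: train(transitions) for scenario, transitions in by_scenario.items()}


# Usage with a stand-in trainer that only counts transitions.
data = [
    ("rain", ("long_wait", "coupon", 100, None, True)),
    ("rain", ("short_wait", "eta_update", 1, "long_wait", False)),
    ("evening_peak", ("long_wait", "comfort_message", -100, None, True)),
]
models = train_per_scenario(data, train=len)
print(models)  # {'rain': 2, 'evening_peak': 1}
```

At serving time, the model keyed by the request's scenario would be the one consulted.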

With the dynamic feedback method provided by the present application, dynamic feedback models applicable to different service scenarios can be obtained by training on different training data sets. This broadens the range of application of the method, so that any scenario involving dynamic feedback can use it, increasing users' stickiness to the service provider in that scenario and improving the user experience.

Optionally, on the basis of the above embodiment, an embodiment of the present application may further provide a dynamic feedback method. Another process of obtaining the preset action feedback model used in the above method is described below with reference to the drawings. Fig. 4 is a schematic flowchart of a method for training an action feedback model according to another embodiment of the present application. As shown in Fig. 4, S206 may include:

S207: Cluster the groups of historical data in the training data set.

In the application scenario of each service provider, the user base is huge while the historical data is sparse, so low-frequency users need to be clustered. This mitigates the poor generalization ability of the model (i.e., its adaptability to new samples) that data sparsity would otherwise cause. The clustering rule can be set and adjusted flexibly as needed. In one embodiment of the present application, the rule may be, for example: for users who initiated between 1 and 5 service request orders in the past year, the order count in the historical data is uniformly set to 3. The specific clustering rule is not limited to the one given in the above embodiment, and the present application is not limited thereto.
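The example clustering rule can be expressed as a small normalization step applied before training. The field name `order_count` and the dictionary record format are hypothetical; only the mapping of 1 to 5 orders onto 3 comes from the text.

```python
def cluster_order_count(order_count_last_year):
    """Example rule: users with 1-5 orders in the past year are treated as having 3."""
    if 1 <= order_count_last_year <= 5:
        return 3
    return order_count_last_year


def cluster_records(records):
    """Return copies of the records with the clustered order count."""
    return [{**r, "order_count": cluster_order_count(r["order_count"])} for r in records]


history = [{"user": u, "order_count": n} for u, n in [("a", 1), ("b", 4), ("c", 5), ("d", 40)]]
clustered = cluster_records(history)
print(sorted({r["order_count"] for r in clustered}))  # [3, 40]
```

Collapsing several sparse order counts into one value means the corresponding states share a single table entry, so each entry is trained on more samples.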

S208: Perform model training on the clustered training data set using a reinforcement learning algorithm to obtain the preset action feedback model.

After low-frequency users are clustered according to the preset clustering rule, the clustered training data set is used to train the model and obtain the preset action feedback model. Training in this way improves the generalization ability of the model.

Optionally, on the basis of the above embodiment, an embodiment of the present application may further provide a dynamic feedback method. The process of updating the preset action feedback model used in the above method is described below with reference to the drawings. Fig. 5 is a schematic flowchart of a dynamic feedback method according to another embodiment of the present application. As shown in Fig. 5, the method further includes:

S209: After the target feedback action has been fed back, acquire the state change information of the service request.

After the preset action feedback model is deployed online, it feeds back the corresponding target feedback action according to the user's state information, and the state change information of the order corresponding to the service request after the target feedback action is then acquired.

S210: Update the action feedback model according to the state change information and the target feedback action.

After the state change information of the order corresponding to the service request is acquired, the user's state information, the target feedback action, and the order's state change information are added to the training data set as new historical data, and the model is updated using all the data in the training data set, so that the system keeps learning automatically.

Optionally, in an embodiment of the present application, the user's state information, the target feedback action, and the order's state change information may be added to the training data set as new historical data immediately after the state change information is acquired. Alternatively, a preset update interval may be set: when each interval elapses, the state information of all users, the target feedback actions, and the order state change information from the preceding interval are collected and added to the training data set together. The specific way of collecting this information can be adjusted flexibly as needed and is not limited to the embodiments described above.
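Both collection strategies can be captured by one small class. With `interval_s=None` every new record is added and the model retrained at once; otherwise records are buffered and flushed once the preset interval has elapsed. The class, its parameters, and the injectable `clock` (used so the behavior can be demonstrated without waiting) are hypothetical. `train_fn` stands for whatever training routine builds the action feedback model from the full training data set.

```python
import time


class OnlineUpdater:
    """Appends (state, action, state_change) records to the training set and retrains."""

    def __init__(self, train_fn, interval_s=None, clock=time.monotonic):
        self.train_fn = train_fn      # builds a model from the whole training set
        self.interval_s = interval_s  # None -> update immediately on each record
        self.clock = clock
        self.training_set = []
        self.buffer = []
        self.last_flush = clock()
        self.model = None

    def record(self, state, action, state_change):
        self.buffer.append((state, action, state_change))
        due = self.interval_s is None or self.clock() - self.last_flush >= self.interval_s
        if due:
            self.flush()

    def flush(self):
        self.training_set.extend(self.buffer)
        self.buffer.clear()
        self.last_flush = self.clock()
        self.model = self.train_fn(self.training_set)  # retrain on all data so far


# Usage with a fake clock and a stand-in trainer that counts records.
now = [0.0]
updater = OnlineUpdater(train_fn=len, interval_s=60, clock=lambda: now[0])
updater.record("s1", "coupon", "still_waiting")
print(updater.model)  # None (interval not yet elapsed)
now[0] = 61.0
updater.record("s2", "eta_update", "answered")
print(updater.model)  # 2
```

In this sketch the interval is only checked when a record arrives; a deployed system would more likely flush on a timer.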

Optionally, in an embodiment of the present application, the action feedback model may be updated according to the state change information, the feedback effect score corresponding to the state change information, and the target feedback action.
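A hedged sketch of an incremental update that uses a feedback effect score follows. The order's state change is mapped to a score (here, hypothetically, reusing the -100/+1/+100 values from the reward example in the text) and a single Q-learning step is applied to the affected entry of a `{state: {action: q}}` table. The state-change labels, `alpha`, and `gamma` are assumptions.

```python
# Hypothetical mapping from the order's state change to a feedback effect score.
EFFECT_SCORE = {"cancelled": -100.0, "still_waiting": 1.0, "answered": 100.0}
TERMINAL = {"cancelled", "answered"}


def update_after_feedback(q_table, state, action, state_change, next_state,
                          alpha=0.1, gamma=0.9):
    """Apply one Q-learning step for the pushed action and return the new Q value."""
    target = EFFECT_SCORE[state_change]
    if state_change not in TERMINAL and q_table.get(next_state):
        # Order still open: add the discounted value of the best next action.
        target += gamma * max(q_table[next_state].values())
    row = q_table.setdefault(state, {})
    old = row.get(action, 0.0)
    row[action] = old + alpha * (target - old)
    return row[action]


q = {"waiting_20min": {"coupon": 50.0}}
new_value = update_after_feedback(q, "waiting_10min", "eta_update", "still_waiting", "waiting_20min")
print(round(new_value, 6))  # 4.6, i.e. 0.1 * (1 + 0.9 * 50)
```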

With the dynamic feedback method provided by the present application, a specific target feedback action is given according to the user's current state information, so that different users, or the same user in different states, can receive different target feedback actions. This prevents users from growing tired of the target feedback actions, effectively lowers the rate at which users cancel a service request order before it is answered, and improves the user experience.

Based on the same inventive concept, an embodiment of the present application further provides a dynamic feedback apparatus corresponding to the dynamic feedback method. Since the apparatus solves the problem on a principle similar to that of the dynamic feedback method in the embodiments of the present application, its implementation can refer to the implementation of the method, and the beneficial effects are not described again.

Fig. 6 is a schematic structural diagram of a dynamic feedback apparatus according to an embodiment of the present application. As shown in Fig. 6, the apparatus includes a receiving module 301, an obtaining module 302, a determining module 303, and a feedback module 304, where:

the receiving module 301 is configured to receive a service request sent by a requesting client.

The obtaining module 302 is configured to obtain the state information of the user corresponding to the service request if no service provider matches the service request.

The determining module 303 is configured to process the state information according to a preset action feedback model to determine a target feedback action.

The feedback module 304 is configured to feed back the target feedback action to the requesting client corresponding to the service request.
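How modules 301 to 304 fit together can be sketched as a single request handler. Every collaborator here (`match_provider`, `get_state`, `model`, `send`) is a hypothetical stand-in injected as a plain callable; the application does not specify these interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ServiceRequest:
    request_id: str
    user_id: str


@dataclass
class DynamicFeedbackApparatus:
    match_provider: Callable[[ServiceRequest], Optional[str]]  # matched provider id or None
    get_state: Callable[[str], tuple]                           # state lookup by user id
    model: Callable[[tuple], str]                               # preset action feedback model
    send: Callable[[str, str], None]                            # transport to the client

    def handle(self, request: ServiceRequest) -> Optional[str]:
        """Receiving module 301: process one incoming service request."""
        if self.match_provider(request) is not None:
            return None  # a provider matched, so no waiting feedback is needed
        state = self.get_state(request.user_id)  # obtaining module 302
        action = self.model(state)               # determining module 303
        self.send(request.request_id, action)    # feedback module 304
        return action


sent = []
apparatus = DynamicFeedbackApparatus(
    match_provider=lambda req: None,
    get_state=lambda user_id: ("long", 3, "rain"),
    model=lambda state: "coupon",
    send=lambda request_id, action: sent.append((request_id, action)),
)
print(apparatus.handle(ServiceRequest("r1", "u1")), sent)  # coupon [('r1', 'coupon')]
```

Injecting the collaborators keeps the modules independently replaceable, which matches the statement elsewhere in the description that the division into modules is only a logical one.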

Fig. 7 is a schematic structural diagram of a dynamic feedback apparatus according to an embodiment of the present application. As shown in Fig. 7, the apparatus further includes an updating module 305, where:

the obtaining module 302 is specifically configured to obtain the state change information of the service request after the target feedback action has been fed back.

The updating module 305 is configured to update the action feedback model according to the state change information and the target feedback action.

Optionally, the updating module 305 is specifically configured to update the action feedback model according to the state change information, the feedback effect score corresponding to the state change information, and the target feedback action.

Fig. 8 is a schematic structural diagram of a training apparatus for an action feedback model according to an embodiment of the present application. As shown in Fig. 8, the apparatus includes an acquisition module 401 and a training module 402, where:

the acquisition module 401 is specifically configured to acquire a training data set formed by multiple groups of historical data, where each group of historical data includes historical state information and the feedback action corresponding to the historical state information.

The training module 402 is configured to perform model training by using a preset reinforcement learning algorithm according to the training data set to obtain a preset action feedback model.

Fig. 9 is a schematic structural diagram of a training apparatus for an action feedback model according to another embodiment of the present application. As shown in Fig. 9, the apparatus further includes a clustering module 403, where:

the clustering module 403 is configured to cluster the multiple groups of historical data in the training data set.

The training module 402 is specifically configured to perform model training by using a reinforcement learning algorithm according to the clustered training data set, so as to obtain a preset action feedback model.

Optionally, the training module 402 is specifically configured to perform model training by using a reinforcement learning algorithm according to the training data set and a service scenario corresponding to the training data set, so as to obtain a preset action feedback model corresponding to the service scenario.

Based on the same inventive concept, an embodiment of the present application further provides a dynamic feedback device corresponding to the dynamic feedback method. Since the device solves the problem on a principle similar to that of any of the methods shown in Fig. 1 to Fig. 5, its implementation can refer to the implementation of the method, and the beneficial effects are not described again.

Fig. 10 is a schematic structural diagram of a dynamic feedback device according to an embodiment of the present application. As shown in Fig. 10, the dynamic feedback device includes a processor 601, a memory 602, and a bus 603. The memory 602 stores machine-readable instructions executable by the processor 601. When the dynamic feedback device is running, the processor 601 communicates with the memory 602 through the bus 603 and executes the machine-readable instructions to perform the steps of the dynamic feedback method provided by the foregoing method embodiments.

Specifically, the machine-readable instructions stored in the memory 602 implement the steps of the dynamic feedback method described in the foregoing embodiments, and the processor 601 executes the dynamic feedback method to process the user's state information. The dynamic feedback device therefore also has all the advantages described in the foregoing method embodiments, which are not repeated here.

It should be noted that the dynamic feedback device may be a general-purpose computer, a special-purpose computer, or another server for processing data, any of which can be used to implement the dynamic feedback method of the present application. Although, for convenience, the dynamic feedback method is described only with reference to a computer and a server, the functions described herein may be implemented in a distributed manner on multiple similar platforms to balance the processing load.

For example, the dynamic feedback device may include one or more processors for executing program instructions, a communication bus, and different forms of storage media, such as a disk, ROM, or RAM, or any combination thereof. Illustratively, the computer platform may also include program instructions stored in ROM, RAM, or other types of non-transitory storage media, or any combination thereof. The method of the present application may be implemented in accordance with these program instructions.

For ease of illustration, only one processor is described in the above embodiments. However, it should be noted that the dynamic feedback device in the present application may also comprise a plurality of processors, and thus the steps performed by one processor described in the present application may also be performed by a plurality of processors in combination or individually.

Fig. 11 is a schematic structural diagram of a training apparatus for an action feedback model according to an embodiment of the present application. As shown in Fig. 11, the training apparatus includes a processor 701, a memory 702, and a bus 703. The memory 702 stores machine-readable instructions executable by the processor 701. When the training apparatus for the action feedback model is running, the processor 701 communicates with the memory 702 via the bus 703 and executes the machine-readable instructions to perform the steps of the method for training the action feedback model provided in the foregoing method embodiments.

Specifically, the machine-readable instructions stored in the memory 702 implement the steps of the method for training the action feedback model described in the foregoing embodiments, and the processor 701 executes this training method to process the users' state information. The training apparatus therefore also has all the advantages described in the foregoing method embodiments, which are not repeated here.

The embodiment of the present application further provides a storage medium, where a computer program is stored on the storage medium, and the computer program is executed by a processor to perform the steps of the dynamic feedback method.

Specifically, the storage medium may be a general storage medium such as a removable disk or a hard disk. When the computer program on the storage medium is executed, the dynamic feedback method is performed, which solves the prior-art problem of inappropriate feedback actions being sent to the client and users being lost, and reduces users' negative emotions.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, or the part of it that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.

The above description covers only specific embodiments of the present application, and the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A dynamic feedback method, the method comprising:

receiving a service request sent by a request client;

2. The method of claim 1, wherein the method further comprises:

3. The method of claim 2, wherein said updating the action feedback model based on the state change information and the target feedback action comprises:

4. A method according to any one of claims 1-3, wherein the state information comprises at least one of: behavioral state information, personal information, service scenario information.

5. A method for training an action feedback model, the method comprising:

acquiring a training data set formed by a plurality of groups of historical data, wherein each group of historical data comprises: historical state information and feedback actions corresponding to the historical state information; the historical state information is historical state information of a user corresponding to the historical service request;

6. The method of claim 5, wherein the performing model training using a preset reinforcement learning algorithm according to the training data set to obtain the preset action feedback model comprises:

clustering the multiple sets of historical data in the training data set;

7. The method of claim 5, wherein the performing model training using a preset reinforcement learning algorithm according to the training data set to obtain the preset action feedback model comprises:

8. The method of claim 5, wherein the historical state information comprises at least one of: behavioral state information, personal information, service scenario information.

9. A dynamic feedback apparatus, the apparatus comprising a receiving module, an obtaining module, a determining module and a feedback module, wherein:

10. The apparatus of claim 9, wherein the apparatus further comprises: an update module, wherein:

11. An apparatus for training an action feedback model, the apparatus comprising an acquisition module and a training module, wherein:

the acquisition module is configured to acquire a training data set formed by multiple groups of historical data, where each group of historical data includes: historical state information and feedback actions corresponding to the historical state information; the historical state information is historical state information of a user corresponding to the historical service request;

12. A dynamic feedback device, characterized in that the device comprises: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the dynamic feedback device is operating, the processor executing the machine-readable instructions to perform the method of any one of claims 1-4.

13. An apparatus for training an action feedback model, the apparatus comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the apparatus for training the action feedback model is running, the processor executing the machine-readable instructions to perform the method of any one of claims 5-8.

14. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1-8.