CN115705359A

CN115705359A - Data processing method and device, electronic equipment and computer readable medium

Info

Publication number: CN115705359A
Application number: CN202211261184.8A
Authority: CN
Inventors: 张智慧; 邹波; 宋双永
Original assignee: Jingdong Technology Information Technology Co Ltd
Current assignee: Jingdong Technology Information Technology Co Ltd
Priority date: 2022-10-14
Filing date: 2022-10-14
Publication date: 2023-02-17

Abstract

The application discloses a data processing method and device, an electronic device, and a computer readable medium, relating to the technical field of computers. One specific embodiment comprises: receiving a data processing request and acquiring the corresponding session data and role identifier; extracting target role session data from the session data according to the role identifier; predicting the probability that the target role session data belongs to each preset category, the probability that it contains a positive factor, and the probability that it contains a negative factor; and determining and outputting target session data from the target role session data according to these three probabilities. Because the target session data is the result of joint screening by several deep learning models, it can be mined quickly and accurately for new staff to learn from.

Description

Data processing method and device, electronic equipment and computer readable medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a computer-readable medium.

Background

In the internet era, every company needs customer service staff who are responsible for solving customers' problems. For customer service personnel, improving their working capability affects not only their personal growth and performance but also the company's image: if customer service is unprofessional, customers' problems often go unresolved. Newly hired customer service staff can read the detailed transcripts of sessions handled by experienced colleagues to learn response skills.

In the process of implementing the present application, the inventors found that at least the following technical problems exist in the prior art:

Excellent response cases mined by rules are of limited value. For example, screening by service duration or by the total number of words during the service can only filter out sessions with very few turns or meaningless service cases; it cannot extract the genuinely good service cases, so newly hired customer service staff learn poorly from the results.

Disclosure of Invention

In view of this, embodiments of the present application provide a data processing method and apparatus, an electronic device, and a computer readable medium, which can solve the problem that existing rule-mined excellent response cases are of limited value: screening by service duration or by the total number of words during the service can only filter out sessions with very few turns or meaningless service cases, and may fail to extract genuinely good service cases, so that newly hired customer service staff learn poorly.

To achieve the above object, according to an aspect of an embodiment of the present application, there is provided a data processing method including:

receiving a data processing request, and acquiring corresponding session data and role identification;

extracting target role session data from the session data according to the role identification;

predicting the probability that the target role session data belongs to each preset category, the probability that it contains a positive factor, and the probability that it contains a negative factor;

and determining and outputting target session data from the target role session data according to the probability of belonging to each preset category, the probability of containing a positive factor, and the probability of containing a negative factor.

Optionally, determining and outputting target session data from the target role session data according to the probability of belonging to the preset category, the probability of containing a positive factor, and the probability of containing a negative factor, including:

for each piece of target role session data, determining a corresponding score according to its probabilities of belonging to the preset categories and the scores assigned to those preset categories;

and, in response to the probability of containing a positive factor being greater than a preset threshold and the probability of containing a negative factor being smaller than the preset threshold, determining and outputting target session data from the target role session data according to the scores.

Optionally, determining and outputting target session data in the target role session data according to each score, including:

in response to one or more of the scores being greater than a preset score threshold, determining the target role session data whose scores are greater than the preset score threshold as the target session data, and outputting it.

Optionally, the data processing method further includes:

in response to the probabilities of containing a positive factor all being smaller than the preset threshold, or the probabilities of containing a negative factor all being greater than the preset threshold, determining that no target session data exists in the target role session data, and ending the data processing.

Optionally, predicting the probability that the target role session data belongs to each preset category, the probability that it contains a positive factor, and the probability that it contains a negative factor includes:

calling a percentile model, inputting the target role session data into the percentile model, and outputting the probability that the target role session data belongs to each preset category;

calling a positive factor model, inputting the target role session data into the positive factor model, and outputting the probability that the target role session data contains a positive factor;

and calling a negative factor model, inputting the target role session data into the negative factor model, and outputting the probability that the target role session data contains a negative factor.

Optionally, extracting target role session data from the session data according to the role identifier includes:

determining the target role according to the role identifier, and then extracting the target role session data corresponding to the target role from the session data.

Optionally, before predicting the probability that the target role session data belongs to the preset category, the probability of containing a positive factor and the probability of containing a negative factor, the method further comprises:

determining time corresponding to the target role session data, and splicing the target role session data according to the time to generate spliced session data;

and updating the target role session data with the spliced session data.

In addition, the present application also provides a data processing apparatus, including:

a receiving unit configured to receive a data processing request and acquire the corresponding session data and role identifier;

a data extraction unit configured to extract target role session data from the session data according to the role identifier;

a probability prediction unit configured to predict the probability that the target role session data belongs to each preset category, the probability that it contains a positive factor, and the probability that it contains a negative factor;

and an output unit configured to determine and output target session data from the target role session data according to the probability of belonging to each preset category, the probability of containing a positive factor, and the probability of containing a negative factor.

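Optionally, the output unit is further configured to:

for each piece of target role session data, determine a corresponding score according to its probabilities of belonging to the preset categories and the scores of the preset categories; and, in response to the probability of containing a positive factor being greater than a preset threshold and the probability of containing a negative factor being smaller than the preset threshold, determine and output target session data from the target role session data according to the scores.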

Optionally, the probability prediction unit is further configured to:

calling a percentile model, inputting the target role session data into the percentile model, and outputting the probability that the target role session data belongs to each preset category;

calling a positive factor model, inputting the target role session data into the positive factor model, and outputting the probability that the target role session data contains a positive factor;

and calling a negative factor model, inputting the target role session data into the negative factor model, and outputting the probability that the target role session data contains a negative factor.

Optionally, the data extraction unit is further configured to:

determining the target role according to the role identifier, and then extracting the target role session data corresponding to the target role from the session data.

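Optionally, the data processing apparatus further comprises a data splicing unit configured to:

determine the time corresponding to the target role session data, splice the target role session data according to the time to generate spliced session data, and update the target role session data with the spliced session data.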

In addition, the present application also provides an electronic device for data processing, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data processing method described above.

In addition, the present application also provides a computer readable medium, on which a computer program is stored, which when executed by a processor implements the data processing method as described above.

One embodiment of the above invention has the following advantages or benefits: a data processing request is received, and the corresponding session data and role identifier are acquired; target role session data is extracted from the session data according to the role identifier; the probability that the target role session data belongs to each preset category, the probability that it contains a positive factor, and the probability that it contains a negative factor are predicted; and target session data is determined from the target role session data according to these probabilities and output. Because the target session data is the result of joint screening by several deep learning models, it can be mined quickly and accurately for new staff to learn from.

Further effects of the above optional implementations will be described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the application and are not to be construed as limiting the application. In the drawings:

Fig. 1 is a schematic diagram of the main flow of a data processing method according to a first embodiment of the present application;

Fig. 2 is a schematic diagram of the main flow of a data processing method according to a second embodiment of the present application;

Fig. 3 is a schematic diagram of an application scenario of a data processing method according to a third embodiment of the present application;

Fig. 4 is a schematic diagram of the execution logic of the percentile model in a data processing method according to an embodiment of the present application;

Fig. 5 is a schematic diagram of the execution logic of the positive/negative factor models in a data processing method according to an embodiment of the present application;

Fig. 6 is a schematic diagram of the main units of a data processing apparatus according to an embodiment of the present application;

Fig. 7 is an exemplary system architecture diagram to which embodiments of the present application may be applied;

Fig. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness. In the technical solution of the present application, the acquisition, storage, use, and processing of data comply with the relevant national laws and regulations.

Fig. 1 is a schematic diagram of the main flow of a data processing method according to a first embodiment of the present application. As shown in Fig. 1, the data processing method includes:

Step S101, receiving a data processing request, and acquiring the corresponding session data and role identifier.

In this embodiment, the execution subject of the data processing method (for example, a server) may receive the data processing request through a wired or wireless connection. Specifically, the data processing request may be a request to screen out excellent session cases from the session data between customer service staff and customers. The data processing request carries the session data to be processed, for example:

Customer / customer service session content:

Customer: Transfer me to a human agent.

Customer service: Hello, I'm here.

Customer: Hello, I'd like to apply to return this item.

Customer service: Is your pickup address the same as your delivery address, and have the contact person and phone number changed?

Customer: No.

Customer: Can you submit the return for me?

Customer service: Requests submitted before 18:00 are reviewed before 22:00 the same day; requests submitted after 18:00 are reviewed before 12:00 the next day.

Customer service: Yes.

Customer service: I will submit it first, and then it will wait for review.

Customer service: Please wait patiently. You can follow the processing progress in XXAPP under My > Refund/After-sale Application.

Customer service: The application has been submitted.

Customer: What did you apply for, and what happens now? Do I just wait for XX to pick up the goods without doing anything else?

Customer service: After the review, please wait for a phone call first.

Customer: Roughly when will they contact me?

Customer service: It will be reviewed before 22:00 today, and you can expect to be contacted tomorrow morning. Please keep your phone reachable.

Customer: OK, thank you.

Customer service: You're welcome.

The role identifier carried in the data processing request indicates whose session data is to be extracted for processing. For example, a role identifier of KF or 01 may indicate that the customer service staff's session data is to be processed.

Step S102, extracting target role session data from the session data according to the role identifier.

Specifically, extracting target role session data from the session data according to the role identifier includes: determining the target role according to the role identifier, and then extracting the target role session data corresponding to the target role from the session data.

For example, the target role may be "customer service". When the role identifier is KF, the execution subject may extract the customer service utterances from the session data according to the role identifier. An example of part of the customer service session data (i.e., the target role session data) is as follows:

Customer service: Hello, I'm here.

Customer service: Is your pickup address the same as your delivery address, and have the contact person and phone number changed?

Customer service: Requests submitted before 18:00 are reviewed before 22:00 the same day; requests submitted after 18:00 are reviewed before 12:00 the next day.

Customer service: Yes.

Customer service: I will submit it first, and then it will wait for review.

Customer service: The application has been submitted.

Customer service: You're welcome.
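A minimal Python sketch of the role-based extraction in step S102, under assumed representations: each utterance is a record holding a role, a timestamp, and text, and the identifiers KF and 01 both resolve to the customer service role. The names `Utterance`, `ROLE_IDS`, and `extract_target_role` are hypothetical and not part of the application.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical lookup from role identifier to role name. The text only gives
# KF and 01 as example identifiers that denote customer service.
ROLE_IDS = {"KF": "customer_service", "01": "customer_service"}


@dataclass
class Utterance:
    role: str    # speaker role, e.g. "customer" or "customer_service"
    time: float  # timestamp in seconds, kept for the later splicing step
    text: str


def extract_target_role(session: List[Utterance], role_id: str) -> List[Utterance]:
    """Resolve the role identifier to a role, then keep only that role's utterances."""
    target_role = ROLE_IDS.get(role_id)
    if target_role is None:
        raise ValueError(f"unknown role identifier: {role_id!r}")
    return [u for u in session if u.role == target_role]


session = [
    Utterance("customer", 0.0, "Transfer me to a human agent."),
    Utterance("customer_service", 5.0, "Hello, I'm here."),
    Utterance("customer", 9.0, "I'd like to return this item."),
    Utterance("customer_service", 15.0, "Has the contact number changed?"),
]
print([u.text for u in extract_target_role(session, "KF")])
```

Raising on an unknown identifier is a design choice of the sketch; a real service might instead return an empty result.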

Step S103, predicting the probability that the target role session data belongs to each preset category, the probability that it contains a positive factor, and the probability that it contains a negative factor.

The preset categories may include very satisfied, satisfied, general, unsatisfied, and very unsatisfied. The execution subject may predict the probability that the target role session data belongs to each preset category as follows. First, the target role session data is segmented into words to obtain word segmentation data. For example, the target role session data "It will be reviewed before 22:00 today, and you can expect to be contacted tomorrow morning. Please keep your phone reachable." is segmented into "reviewed before 22:00", "tomorrow morning", "contact you", and "please keep your phone reachable". After obtaining the word segmentation data, the execution subject may query a preset database storing segment-category key-value pairs to obtain the category of each segment, and then determine the probability of belonging to each preset category from the categories obtained.

Assume the categories of the segments are: "reviewed before 22:00" - "very satisfied", "tomorrow morning" - "very satisfied", "contact you" - "very satisfied", and "please keep your phone reachable" - "satisfied". The execution subject then obtains the following probabilities for this target role session data: 75% for "very satisfied" (3 of the 4 segments), 25% for "satisfied" (1 of the 4 segments), and 0 for "general", "unsatisfied", and "very unsatisfied".
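The lookup-based estimate above amounts to counting categories over the segments. A minimal sketch follows; word segmentation is assumed to have been done already, the dictionary `SEGMENT_CATEGORY` is a hypothetical stand-in for the preset segment-category database (its keys are illustrative English renderings of the example segments), and segments missing from it are simply skipped, which is an assumption since the text does not say how they are handled.

```python
from collections import Counter
from typing import Dict, List

CATEGORIES = ["very satisfied", "satisfied", "general", "unsatisfied", "very unsatisfied"]

# Hypothetical stand-in for the preset database of segment -> category pairs.
SEGMENT_CATEGORY = {
    "reviewed before 22:00": "very satisfied",
    "tomorrow morning": "very satisfied",
    "contact you": "very satisfied",
    "please keep your phone reachable": "satisfied",
}


def category_probabilities(segments: List[str]) -> Dict[str, float]:
    """Share of looked-up segments that fall into each preset category."""
    found = [SEGMENT_CATEGORY[s] for s in segments if s in SEGMENT_CATEGORY]
    counts = Counter(found)
    total = len(found)
    return {c: counts[c] / total if total else 0.0 for c in CATEGORIES}


segments = ["reviewed before 22:00", "tomorrow morning", "contact you",
            "please keep your phone reachable"]
print(category_probabilities(segments))
```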

A positive factor may be a character, word, or phrase with a positive meaning, and a negative factor may be a character, word, or phrase with a negative meaning. Examples are as follows:

Positive factors: characters, words, or phrases expressing polite and pleasant language, empathy, escalation of the feedback plan, soothing of or apology for the user's emotions, a clear timeline for the solution, or taking responsibility.

Negative factors: characters, words, or phrases expressing answers that miss the question, improper communication, shirking of responsibility, failure to soothe the customer's emotions, unfulfilled promises, or failure to respond to the customer's compensation request.

The execution subject may segment the target role session data into words to obtain word segmentation data, for example: "reviewed before 22:00", "tomorrow morning", "contact you", "please keep your phone reachable", "you can ask XX customer service, I don't know much about this", "this is not within my responsibilities", "if you want it that way, there's nothing I can do", "it's not my mistake", "this compensation cannot be given to you now", and "sorry, I'm off duty now".

In this example, "reviewed before 22:00", "tomorrow morning", "contact you", and "please keep your phone reachable" are positive factors. They make up 4 of the 10 segments, i.e., 40%, so the probability that the target role session data contains a positive factor is 40%. The remaining six segments ("you can ask XX customer service, I don't know much about this", "this is not within my responsibilities", "if you want it that way, there's nothing I can do", "it's not my mistake", "this compensation cannot be given to you now", and "sorry, I'm off duty now") are not positive factors and make up 60%.

Likewise, "reviewed before 22:00", "tomorrow morning", "contact you", and "please keep your phone reachable" are not negative factors and make up 40% of the segments, while the other six segments are negative factors, so the probability that the target role session data contains a negative factor is 60%.
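In this illustrative embodiment the factor probabilities are simply the fraction of segments found in a positive or a negative lexicon. The sketch below reproduces the 40%/60% example; the lexicon contents are hypothetical English renderings of the example segments, and the function name is an assumption. The later paragraphs describe trained fastText classifiers for the same purpose, which this counting does not attempt to reproduce.

```python
from typing import List, Tuple

# Hypothetical lexicons built from the example segments above.
POSITIVE_SEGMENTS = {
    "reviewed before 22:00", "tomorrow morning", "contact you",
    "please keep your phone reachable",
}
NEGATIVE_SEGMENTS = {
    "you can ask XX customer service, I don't know much about this",
    "this is not within my responsibilities",
    "if you want it that way, there's nothing I can do",
    "it's not my mistake",
    "this compensation cannot be given to you now",
    "sorry, I'm off duty now",
}


def factor_probabilities(segments: List[str]) -> Tuple[float, float]:
    """Return (share of positive-factor segments, share of negative-factor segments)."""
    if not segments:
        return 0.0, 0.0
    n = len(segments)
    p_pos = sum(s in POSITIVE_SEGMENTS for s in segments) / n
    p_neg = sum(s in NEGATIVE_SEGMENTS for s in segments) / n
    return p_pos, p_neg


segments = sorted(POSITIVE_SEGMENTS) + sorted(NEGATIVE_SEGMENTS)
print(factor_probabilities(segments))  # (0.4, 0.6)
```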

Step S104, determining and outputting target session data from the target role session data according to the probability of belonging to each preset category, the probability of containing a positive factor, and the probability of containing a negative factor.

Specifically, before predicting the probability that the target role session data belongs to each preset category, the probability that it contains a positive factor, and the probability that it contains a negative factor, the method further includes: determining the time corresponding to the target role session data, splicing the target role session data in time order to generate spliced session data, and updating the target role session data with the spliced session data.

For example, the splicing may be done by string concatenation with a space inserted between utterances.

For example, before splicing:

Customer service: Hello, I'm here.

Customer service: May I ask how I can help you?

After splicing: Hello, I'm here. May I ask how I can help you?

Splicing the target role session data makes it more complete and its contextual semantics clearer, which improves the accuracy of each of the probability predictions.
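A minimal sketch of the splicing step, assuming each utterance is a (timestamp, text) pair: the utterances are ordered by time and joined with a single space, following the string-concatenation example above. The function name `splice_by_time` is hypothetical.

```python
from typing import Iterable, Tuple


def splice_by_time(utterances: Iterable[Tuple[float, str]]) -> str:
    """Order (timestamp, text) pairs by time and join the texts with one space."""
    ordered = sorted(utterances, key=lambda u: u[0])  # stable: ties keep input order
    return " ".join(text.strip() for _, text in ordered if text.strip())


turns = [(12.0, "May I ask how I can help you?"), (10.0, "Hello, I'm here.")]
print(splice_by_time(turns))  # Hello, I'm here. May I ask how I can help you?
```

Empty utterances are dropped so that they do not introduce double spaces into the model input.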

Specifically, predicting the probability that the target role session data belongs to each preset category, the probability that it contains a positive factor, and the probability that it contains a negative factor includes: calling a percentile model, inputting the target role session data into it, and outputting the probability that the target role session data belongs to each preset category; calling a positive factor model, inputting the target role session data into it, and outputting the probability that the target role session data contains a positive factor; and calling a negative factor model, inputting the target role session data into it, and outputting the probability that the target role session data contains a negative factor.

Within one session, all customer service utterances are extracted and spliced to form the input of the percentile model. Model structure: classification is performed with a fastText model, which may be replaced by another classification model. The execution logic of the percentile model is shown in Fig. 4: the customer service utterances are spliced, category prediction is performed by the fastText model to obtain the probabilities of the 5 categories, and each probability is multiplied by the corresponding category score and the products are summed to obtain the final score, for example:

the probability of "very satisfied" is 0.9, "satisfied" 0.05, "general" 0.01, "unsatisfied" 0.02, and "very unsatisfied" 0.02. The final score is:

100*0.9 + 80*0.05 + 60*0.01 + 40*0.02 + 20*0.02 = 95.8
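The final score is the probability-weighted sum of category scores, score = sum over categories c of p(c) * s(c), with s = 100, 80, 60, 40, 20 for the five categories as read off the worked example. A short sketch (the name `hundred_point_score` is hypothetical); an unknown category name raises `KeyError`, and a category absent from the input simply contributes nothing.

```python
from typing import Dict

# Category scores taken from the worked example (100/80/60/40/20).
CATEGORY_SCORES = {
    "very satisfied": 100, "satisfied": 80, "general": 60,
    "unsatisfied": 40, "very unsatisfied": 20,
}


def hundred_point_score(probs: Dict[str, float]) -> float:
    """Probability-weighted sum of category scores."""
    return sum(CATEGORY_SCORES[c] * p for c, p in probs.items())


probs = {"very satisfied": 0.9, "satisfied": 0.05, "general": 0.01,
         "unsatisfied": 0.02, "very unsatisfied": 0.02}
print(round(hundred_point_score(probs), 2))  # 95.8
```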

The training corpus of the percentile model is as follows: 10,000 manual customer service sessions are selected for labeling, and each session is labeled with one category according to satisfaction. The label is one of five categories: very satisfied, satisfied, general, unsatisfied, or very unsatisfied. The labeling rule is that the customer service responses in each of the 10,000 sessions are manually classified into one of these five categories.
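Since the percentile model is described as a fastText classifier, its labeled corpus would typically be written in fastText's supervised text format: one example per line, with the label prefixed by `__label__`. The sketch below only prepares such lines; the label spellings and function names are assumptions, and Chinese utterances would need to be word-segmented first because fastText splits tokens on whitespace. A file of these lines could then be passed to `fasttext.train_supervised(input=...)` from the fastText Python package.

```python
from typing import Iterable, List, Tuple

LABELS = ("very satisfied", "satisfied", "general", "unsatisfied", "very unsatisfied")


def to_fasttext_line(label: str, text: str) -> str:
    """Format one labeled session as a fastText supervised-training line."""
    if label not in LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    tag = "__label__" + label.replace(" ", "_")  # labels must not contain spaces
    body = " ".join(text.split())                 # one example per line: collapse newlines
    return f"{tag} {body}"


def build_corpus(examples: Iterable[Tuple[str, str]]) -> List[str]:
    """Convert (label, spliced customer service text) pairs into training lines."""
    return [to_fasttext_line(label, text) for label, text in examples]


print(build_corpus([("very satisfied", "Hello, I'm here.\nMay I ask how I can help you?")]))
```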

Positive and negative factor models: the two models use the same processing logic, so they are described together here, taking the positive factor model as an example:

Model input: the customer service parts of the session between customer service and the customer are extracted and spliced as the model input. This logic is the same as for the percentile model.

Model output: two categories are output, the first being the probability of containing a positive factor and the second the probability of not containing a positive factor. If the first probability is greater than the second, the customer service response in this session contains a positive factor.

Model training corpus: 2,000 manual customer service sessions are selected for labeling, and each session is labeled according to whether it contains a positive factor.

The model structure of the positive factor model is shown in Fig. 5. As shown in Fig. 5, the execution logic of the model is: first splice the customer service utterances, then perform category prediction with a fastText model, and finally determine from the category probabilities whether a positive factor is contained.
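The decision rule for a binary factor model can be sketched as follows. In the fastText Python package, `model.predict(text, k=-1)` returns a tuple of `__label__...` strings together with an array of probabilities; the helper below turns such a pair into a dictionary and compares the two classes. The label names `contains` and `not_contains` and both function names are hypothetical. With two classes whose probabilities sum to 1, "first probability greater than the second" is equivalent to "first probability greater than 0.5", which matches the 50% thresholds in the screening rules of this description.

```python
from typing import Dict, Sequence


def to_prob_dict(labels: Sequence[str], probs: Sequence[float]) -> Dict[str, float]:
    """Map fastText-style labels ("__label__x") to plain class names."""
    return {lab.removeprefix("__label__"): float(p) for lab, p in zip(labels, probs)}


def contains_factor(probs: Dict[str, float]) -> bool:
    """True if the 'contains' class is more probable than the 'not_contains' class."""
    return probs.get("contains", 0.0) > probs.get("not_contains", 0.0)


# Shaped like the output of model.predict(text, k=-1) for a two-class model.
labels, scores = ("__label__contains", "__label__not_contains"), (0.82, 0.18)
print(contains_factor(to_prob_dict(labels, scores)))  # True
```

The negative factor model would use the same decision with its own labels; only the training corpus differs.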

The negative factor model uses the same steps and model structure; the only difference is the training corpus: one is labeled as positive versus non-positive, and the other as negative versus non-negative.

Target session data is then determined and output from the target role session data according to the probability of belonging to each preset category, the probability of containing a positive factor, and the probability of containing a negative factor. The target session data is the screened set of excellent response cases, mined jointly by the percentile model, the positive factor model, and the negative factor model. The specific rules are: (1) the percentile model score is greater than 80; (2) the positive factor model result is that a positive factor is contained (the probability of containing a positive factor is greater than 50%); (3) the negative factor model result is that no negative factor is contained (the probability of containing a negative factor is 50% or less).
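The three screening rules can be written as a single predicate. The default thresholds are the values stated above (score above 80, positive-factor probability above 50%, negative-factor probability at most 50%); the function name is hypothetical, and the score threshold is a parameter because the text notes it can be adjusted.

```python
def passes_screening(score: float, p_positive: float, p_negative: float,
                     score_threshold: float = 80.0,
                     factor_threshold: float = 0.5) -> bool:
    """Rules (1)-(3): high score, positive factor present, negative factor absent."""
    return (score > score_threshold              # (1) percentile model score > 80
            and p_positive > factor_threshold    # (2) contains a positive factor
            and p_negative <= factor_threshold)  # (3) no negative factor (<= 50%)


print(passes_screening(90, 0.8, 0.1))  # True
print(passes_screening(70, 0.8, 0.1))  # False: score must exceed 80
```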

The screening condition on the percentile model (a score greater than 80) can be adjusted at any time according to the final mining results. Because excellent customer service cases are screened jointly by several deep learning models, the screened cases (i.e., the target session data) are more accurate, which helps new staff improve their working capability quickly.

In this embodiment, a data processing request is received, and the corresponding session data and role identifier are acquired; target role session data is extracted from the session data according to the role identifier; the probability that the target role session data belongs to each preset category, the probability that it contains a positive factor, and the probability that it contains a negative factor are predicted; and target session data is determined from the target role session data according to these probabilities and output. Because the target session data is the result of joint screening by several deep learning models, it can be mined quickly and accurately for new staff to learn from.

Fig. 2 is a schematic main flow diagram of a data processing method according to a second embodiment of the present application, and as shown in fig. 2, the data processing method includes:

step S201, receiving a data processing request, and acquiring corresponding session data and a role identifier.

The session data corresponding to the data processing request may be session data from the same day, or from several different days, for example Monday, Wednesday, and Friday. The session data may also cover discrete time periods, such as Monday morning from 10:01 to 10:… and the afternoon from 14:…. The embodiments of the present application do not specifically limit the time corresponding to the session data.

Step S202, extracting target role session data from the session data according to the role identification.

Step S203, predicting the probability that the target role session data belongs to the preset category, the probability containing the positive factors and the probability containing the negative factors.

Step S204, for each piece of target role session data, determining a corresponding score according to the probability belonging to the preset category and the preset category score.

The execution subject may multiply the probability of belonging to each preset category by the corresponding preset category score and sum the products to obtain the final score of each piece of target role session data (for example, the session data of customer service agent 1, agent 2, ..., agent N).

The following are exemplified: the probability of being assigned to the very satisfied category is 0.9, the probability of being assigned to the satisfied category is 0.05, the probability of being assigned to the general category is 0.01, the probability of being assigned to the unsatisfied category is 0.02, and the probability of being assigned to the very unsatisfied category is 0.02. The final score is:

100*0.9 + 80*0.05 + 60*0.01 + 40*0.02 + 20*0.02 = 95.8

Step S205, in response to the probability of containing a positive factor being greater than a preset threshold and the probability of containing a negative factor being smaller than the preset threshold, determining and outputting target session data from the target role session data according to the scores.

The execution subject may determine, as the target session data (i.e., excellent customer service cases), the target role session data whose percentile model score is greater than 80 points, whose positive factor model result indicates a positive factor, and whose negative factor model result indicates no negative factor. It may then push the target session data to the terminals of new staff by email, SMS, or other means for them to learn from.

For example, if a session scores 90 points with the percentile model, contains a positive factor (expressing responsibility), and contains no negative factor, it passes the screening. If a session scores 70 points, it fails the screening because the score must be greater than 80; likewise, if the session contains a negative factor such as an answer that misses the question, with a probability of occurrence greater than 50%, it also fails.

Specifically, determining and outputting target session data in the target role session data according to each score includes:

in response to one or more of the scores being greater than a preset score threshold, determining the target role session data whose scores are greater than the preset score threshold as the target session data, and outputting it.

In other words, this step applies only when the probability of containing a positive factor is greater than the preset threshold and the probability of containing a negative factor is less than the preset threshold. Under that condition, the scores output by the percentile model decide which target role session data become target session data. Target role session data whose score exceeds the preset score threshold (for example, 80) is determined as target session data and output.
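This two-stage selection (factor gates first, then the score threshold) can be sketched over a batch of candidates. The `Candidate` record and its `session_id` field are hypothetical. Negative probabilities of exactly 50% are treated as "no negative factor", following the Fig. 3 rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    session_id: str     # hypothetical identifier of one piece of target role session data
    score: float        # probability-weighted 0-100 score
    p_positive: float   # probability of containing a positive factor
    p_negative: float   # probability of containing a negative factor


def select_by_score(candidates: List[Candidate], score_threshold: float = 80.0,
                    factor_threshold: float = 0.5) -> List[str]:
    """Keep candidates passing both factor gates, then those scoring above the threshold."""
    gated = [c for c in candidates
             if c.p_positive > factor_threshold and c.p_negative <= factor_threshold]
    return [c.session_id for c in gated if c.score > score_threshold]


batch = [
    Candidate("a", 92.0, 0.9, 0.1),  # passes everything
    Candidate("b", 75.0, 0.9, 0.1),  # fails the score threshold
    Candidate("c", 95.0, 0.3, 0.1),  # fails the positive-factor gate
]
print(select_by_score(batch))  # ['a']
```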

Specifically, the method further comprises:

in response to the probabilities of containing a positive factor all being less than the preset threshold, or the probabilities of containing a negative factor all being greater than the preset threshold, determining that no target session data exists in the target role session data, and ending the data processing.

Target role session data may contain no positive factor (i.e., each probability of containing a positive factor is less than the preset threshold) or may contain a negative factor (i.e., each probability of containing a negative factor is greater than the preset threshold). In either case, it does not meet the requirements of an excellent customer service case and cannot be output as target session data. The process of determining target session data is then terminated.
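One possible reading of this termination branch is shown below. When no session passes the factor checks, the sketch returns `None` to signal that processing ends, rather than continuing to the score stage. The tuple layout and the `factor_gate` name are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Each entry: (session_id, p_positive, p_negative); the layout is hypothetical.
Probs = Tuple[str, float, float]


def factor_gate(sessions: Sequence[Probs], threshold: float = 0.5) -> Optional[List[str]]:
    """Return ids passing both factor checks, or None to signal that processing ends."""
    passed = [sid for sid, p_pos, p_neg in sessions
              if p_pos > threshold and p_neg <= threshold]
    if not passed:
        # No session has a positive factor without a negative one:
        # no target session data exists, so the data processing ends here.
        return None
    return passed


print(factor_gate([("x", 0.2, 0.1), ("y", 0.9, 0.7)]))  # None
print(factor_gate([("x", 0.8, 0.1), ("y", 0.9, 0.7)]))  # ['x']
```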

Fig. 3 is a schematic view of an application scenario of a data processing method according to a third embodiment of the present application. Here the data processing method is used to select excellent customer service cases from a large amount of customer service session data.

As shown in Fig. 3, after receiving an entire session, the execution body extracts the customer service utterances. These are the customer service session data, i.e., the target role session data in the present application. The execution body splices the extracted utterances and inputs the spliced customer service session into the percentile model, the positive factor model, and the negative factor model. The three models output the score, the probability of containing a positive factor, and the probability of containing a negative factor, respectively.

The execution body then screens the input session based on these three outputs, using the following rules:
(1) the percentile model result is greater than 80;
(2) the positive factor model result indicates that a positive factor is present (the probability of containing a positive factor is greater than 50%);
(3) the negative factor model result indicates that no negative factor is present (the probability of containing a negative factor is less than or equal to 50%).

Finally, the excellent response cases (i.e., target session data) that satisfy the screening rules are obtained and output.
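The Fig. 3 flow (splice, score with three models, screen) can be sketched with the models passed in as plain callables. The three trained models are not specified in the text, so the callables, their signatures, and the newline separator used for splicing are assumptions. The lambdas in the usage lines are toy stand-ins.

```python
from typing import Callable, List, Optional

ScoreModel = Callable[[str], float]    # hypothetical: spliced text -> 0-100 score
FactorModel = Callable[[str], float]   # hypothetical: spliced text -> probability in [0, 1]


def screen_conversation(cs_utterances: List[str], percentile_model: ScoreModel,
                        positive_model: FactorModel,
                        negative_model: FactorModel) -> Optional[str]:
    """Return the spliced customer service text if it is an excellent case, else None."""
    spliced = "\n".join(cs_utterances)  # splice the customer service turns
    score = percentile_model(spliced)
    p_pos = positive_model(spliced)
    p_neg = negative_model(spliced)
    # Fig. 3 rules: score > 80, positive probability > 50%, negative probability <= 50%.
    if score > 80 and p_pos > 0.5 and p_neg <= 0.5:
        return spliced
    return None


utterances = ["Hello, how can I help?", "I have refunded the order for you."]
result = screen_conversation(utterances,
                             percentile_model=lambda text: 90.0,
                             positive_model=lambda text: 0.8,
                             negative_model=lambda text: 0.1)
print(result is not None)  # True
```

Keeping the models behind callables lets the screening logic be tested independently of whichever deep learning models are actually deployed.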

The embodiment of the present application maps customer service performance to a score from 0 to 100. This gives finer granularity and makes statistical analysis and quantification straightforward. Positive and negative factors are customized, and corresponding models are trained to serve as screening rules for excellent response cases. In this way, excellent customer service cases can be mined quickly and accurately for new staff to learn from.

Fig. 6 is a schematic diagram of main units of a data processing apparatus according to an embodiment of the present application. As shown in fig. 6, the data processing apparatus 600 includes a receiving unit 601, a data extracting unit 602, a probability predicting unit 603, and an output unit 604.

a receiving unit 601, configured to receive a data processing request and obtain the corresponding session data and role identifier;

a data extraction unit 602, configured to extract target role session data from the session data according to the role identifier;

a probability prediction unit 603, configured to predict the probability that the target role session data belongs to a preset category, the probability that it contains a positive factor, and the probability that it contains a negative factor;

and an output unit 604, configured to determine and output target session data from the target role session data according to the probability of belonging to the preset category, the probability of containing a positive factor, and the probability of containing a negative factor.

In some embodiments, the output unit 604 is further configured to: for each piece of target role session data, determine a corresponding score according to the probabilities of belonging to the preset categories and the scores of those categories; and, in response to the probability of containing a positive factor being greater than a preset threshold and the probability of containing a negative factor being less than the preset threshold, determine and output target session data from the target role session data according to each score.

In some embodiments, the output unit 604 is further configured to: in response to one or more of the scores being greater than a preset score threshold, determine the target role session data corresponding to those scores as the target session data and output it.

In some embodiments, the output unit 604 is further configured to: in response to the probabilities of containing a positive factor all being less than a preset threshold, or the probabilities of containing a negative factor all being greater than the preset threshold, determine that no target session data exists in the target role session data, and end the data processing.

In some embodiments, the probability prediction unit 603 is further configured to perform three steps. First, call a percentile model, input the target role session data into it, and output the probability that the target role session data belongs to a preset category. Second, call a positive factor model, input the target role session data into it, and output the probability that the target role session data contains a positive factor. Third, call a negative factor model, input the target role session data into it, and output the probability that the target role session data contains a negative factor.

In some embodiments, the data extraction unit 602 is further configured to: determine a target role according to the role identifier, and extract the target role session data corresponding to the target role from the session data.
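Role-based extraction amounts to filtering a session's utterances by the target role. In this sketch, the `Utterance` record and the role labels "customer" and "customer_service" are hypothetical. The text does not specify how role identifiers are encoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    role_id: str      # hypothetical role label, e.g. "customer" or "customer_service"
    timestamp: float  # send time; the unit (e.g. seconds) is an assumption
    text: str


def extract_role_utterances(session: List[Utterance], role_id: str) -> List[Utterance]:
    """Keep only the utterances sent by the target role, preserving their order."""
    return [u for u in session if u.role_id == role_id]


session = [
    Utterance("customer", 1.0, "Where is my parcel?"),
    Utterance("customer_service", 2.0, "Let me check that for you."),
]
cs = extract_role_utterances(session, "customer_service")
print([u.text for u in cs])  # ['Let me check that for you.']
```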

In some embodiments, the data processing apparatus further comprises a data splicing unit (not shown in Fig. 6) configured to: determine the time corresponding to the target role session data, splice the target role session data according to that time to generate spliced session data, and update the target role session data with the spliced session data.
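Time-based splicing can be sketched as sorting the target role's turns by timestamp and concatenating them. The `(timestamp, text)` tuple layout and the single-space separator are assumptions, since the text does not specify how spliced utterances are joined.

```python
from typing import List, Tuple

# (timestamp, text) pairs for the target role; the layout is hypothetical.
Turn = Tuple[float, str]


def splice_by_time(turns: List[Turn], sep: str = " ") -> str:
    """Order the target role turns by time and concatenate them into one text."""
    ordered = sorted(turns, key=lambda t: t[0])  # earliest turn first
    return sep.join(text for _, text in ordered)


turns = [(5.0, "Your refund is on its way."), (2.0, "Let me check that for you.")]
print(splice_by_time(turns))
# Let me check that for you. Your refund is on its way.
```

The spliced string then replaces the individual turns as the model input, matching the "update the target role session data with the spliced session data" step.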

It should be noted that the data processing method and the data processing apparatus of the present application correspond to each other in their specific implementation, so repeated content is not described again.

Fig. 7 shows an exemplary system architecture 700 to which the data processing method or data processing apparatus of the embodiments of the present application may be applied.

As shown in Fig. 7, the system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 provides the medium for communication links between the terminal devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

A user may use the terminal devices 701, 702, 703 to interact with the server 705 over the network 704 to receive or send messages and the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (by way of example only), may be installed on the terminal devices 701, 702, 703.

The terminal devices 701, 702, 703 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.

The server 705 may be a server providing various services, such as a background management server (for example only) that supports data processing requests submitted by users through the terminal devices 701, 702, 703. The background management server can receive a data processing request and acquire the corresponding session data and role identification; extract target role session data from the session data according to the role identification; predict the probability that the target role session data belongs to a preset category, the probability of containing a positive factor, and the probability of containing a negative factor; and determine and output target session data from the target role session data according to these three probabilities. Because the target session data is the result of joint screening by a plurality of deep learning methods, it can be mined quickly and accurately for new staff to learn from.

It should be noted that the data processing method provided in the embodiment of the present application is generally executed by the server 705, and accordingly, the data processing apparatus is generally disposed in the server 705.

It should be understood that the number of terminal devices, networks, and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.

Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use in implementing a terminal device of an embodiment of the present application. The terminal device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in Fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801. The CPU 801 can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the computer system 800. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

The following components are connected to the I/O interface 805:
- an input section 806 including a keyboard, a mouse, and the like;
- an output section 807 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker;
- a storage section 808 including a hard disk and the like;
- and a communication section 809 including a network interface card such as a LAN card or a modem.

The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.

In particular, according to embodiments disclosed herein, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments disclosed herein include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present application when executed by the Central Processing Unit (CPU) 801.

It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including a receiving unit, a data extraction unit, a probability prediction unit, and an output unit. The names of these units do not, in some cases, constitute a limitation on the units themselves.

As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments or may exist separately without being assembled into that apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: receive a data processing request and obtain corresponding session data and a role identifier; extract target role session data from the session data according to the role identifier; predict the probability that the target role session data belongs to a preset category, the probability of containing a positive factor, and the probability of containing a negative factor; and determine and output target session data from the target role session data according to these three probabilities.

According to the technical scheme of the embodiment of the application, the target session data are jointly screened results based on a plurality of deep learning methods, and the target session data can be rapidly and accurately mined for new people to learn.

The above-described embodiments are not intended to limit the scope of the present disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A data processing method, comprising:

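receiving a data processing request, and acquiring corresponding session data and a role identification;

extracting target role session data from the session data according to the role identification;

predicting the probability that the target role session data belongs to a preset category, the probability of containing a positive factor, and the probability of containing a negative factor;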

and determining and outputting target session data from the target role session data according to the probability belonging to the preset category, the probability containing the positive factor and the probability containing the negative factor.

2. The method according to claim 1, wherein the determining and outputting of target session data from the target role session data according to the probability of belonging to the preset category, the probability of containing a positive factor, and the probability of containing a negative factor comprises:

for each piece of target role session data, determining a corresponding score according to the probability of belonging to the preset category and the score of the preset category;

and in response to the probability of containing a positive factor being greater than a preset threshold and the probability of containing a negative factor being less than the preset threshold, determining target session data from the target role session data according to each score, and outputting the target session data.

3. The method according to claim 2, wherein the determining and outputting target session data in the target role session data according to each score comprises:

in response to one or more of the scores being greater than a preset score threshold, determining the target role session data corresponding to those scores as the target session data, and outputting it.

4. The method of claim 2, further comprising:

in response to each probability of containing a positive factor being less than a preset threshold, or each probability of containing a negative factor being greater than the preset threshold, determining that no target session data exists in the target role session data, and ending the data processing.

5. The method of claim 1, wherein predicting the probability that the target role session data belongs to a preset category, the probability of containing a positive factor, and the probability of containing a negative factor comprises:

calling a percentile model, inputting the target role session data into the percentile model, and outputting the probability that the target role session data belongs to a preset category;

calling a positive factor model, inputting the target role session data into the positive factor model, and outputting the probability that the target role session data contains a positive factor;

and calling a negative factor model, inputting the target role session data into the negative factor model, and outputting the probability that the target role session data contains a negative factor.

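6. The method of claim 1, wherein extracting target role session data from the session data according to the role identification comprises:

determining a target role according to the role identification, and extracting the target role session data corresponding to the target role from the session data.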

7. The method of claim 1, wherein, prior to predicting the probability that the target role session data belongs to a preset category, the probability of containing a positive factor, and the probability of containing a negative factor, the method further comprises:

determining the time corresponding to the target role session data, and splicing the target role session data according to the time to generate spliced session data;

and updating the target role session data with the spliced session data.

8. A data processing apparatus, characterized by comprising:

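a receiving unit configured to receive a data processing request and acquire corresponding session data and a role identification;

a data extraction unit configured to extract target role session data from the session data according to the role identification;

a probability prediction unit configured to predict the probability that the target role session data belongs to a preset category, the probability of containing a positive factor, and the probability of containing a negative factor;

and an output unit configured to determine and output target session data from the target role session data according to the probability of belonging to the preset category, the probability of containing a positive factor, and the probability of containing a negative factor.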

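9. The apparatus of claim 8, wherein the output unit is further configured to:

for each piece of target role session data, determine a corresponding score according to the probability of belonging to the preset category and the score of the preset category; and, in response to the probability of containing a positive factor being greater than a preset threshold and the probability of containing a negative factor being less than the preset threshold, determine and output target session data from the target role session data according to each score.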

10. The apparatus of claim 9, wherein the output unit is further configured to:

11. An electronic device for data processing, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.