CN113452853A - Voice interaction method and device, electronic equipment and storage medium

Voice interaction method and device, electronic equipment and storage medium

Info

Publication number
CN113452853A
Authority
CN
China
Prior art keywords
voice interaction
voice
outbound
task
interaction task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110760477.XA
Other languages
Chinese (zh)
Other versions
CN113452853B (en)
Inventor
张琳
刘俭
贺嘉
夏泽保
吴文韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202110760477.XA
Publication of CN113452853A
Application granted
Publication of CN113452853B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M3/00: Automatic or semi-automatic exchanges
    • H04M3/42: Systems providing special services or facilities to subscribers
    • H04M3/50: Centralised arrangements for answering calls; centralised arrangements for recording messages for absent or busy subscribers
    • H04M3/527: Centralised call answering arrangements not requiring operator intervention
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225: Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The disclosure provides a voice interaction method and apparatus, an electronic device, and a storage medium, and relates to the field of computer technology. The voice interaction method comprises the following steps: collecting voice interaction tasks from different data sources, each voice interaction task containing dynamic personalized parameters; setting the priority of each voice interaction task based on the outbound scenario corresponding to that task; performing a voice outbound operation on the target object corresponding to a voice interaction task according to its priority, and generating the interactive voice data corresponding to the task by combining the dynamic personalized parameters with the outbound scenario; and conducting voice interaction with the target object through the interactive voice data, thereby achieving concurrent outbound of voice interaction tasks in multiple outbound scenarios. The technical solution of the embodiments of the disclosure can effectively improve voice outbound efficiency and reduce voice outbound cost.

Description

Voice interaction method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a voice interaction method, a voice interaction apparatus, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of communication networks, the number of users of communication network services has grown substantially, and accordingly many network services need to be delivered to users through voice outbound calls.
Currently, voice outbound calls are made either manually or through Interactive Voice Response (IVR) technology based on speech recognition, semantic understanding, and speech synthesis. Manual outbound calling, however, is inefficient and incurs huge labor costs when the user base is large, and may also be affected by the emotional state of service staff; IVR-based outbound calling, for its part, gives relatively fixed responses, which can produce answers that miss the point of the question, an unsatisfactory interaction effect, and low voice interaction efficiency.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a voice interaction method, a voice interaction apparatus, an electronic device, and a computer-readable storage medium, so as to overcome, at least to some extent, the problems of low voice outbound efficiency and unsatisfactory voice outbound effect in related solutions.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the embodiments of the present disclosure, there is provided a voice interaction method, including:
collecting voice interaction tasks from different data sources; the voice interaction task comprises dynamic personalized parameters;
setting the priority of each voice interaction task based on the outbound scene corresponding to each voice interaction task;
performing voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene;
and performing voice interaction with the target object through the interactive voice data so as to realize concurrent outbound of voice interaction tasks under a plurality of outbound scenes.
In some example embodiments of the present disclosure, based on the foregoing solution, the collecting voice interaction tasks from different data sources includes:
acquiring voice interaction tasks imported in batches from a management system based on a preset outbound task import template; and/or
acquiring voice interaction tasks from a third-party system based on an open standard interface; and/or
capturing voice interaction tasks from a third-party system with a data acquisition tool.
In some example embodiments of the present disclosure, based on the foregoing, the method further includes:
converting the collected voice interaction tasks into voice interaction tasks in a standard format according to a preconfigured field conversion mapping; a standard-format voice interaction task comprises fixed static fields and dynamic personalized parameters.
In some example embodiments of the present disclosure, based on the foregoing scheme, setting a priority of each voice interaction task based on an outbound scenario corresponding to each voice interaction task includes:
constructing a plurality of number pools based on the outbound scenes corresponding to the voice interaction tasks, wherein the number pools correspond to different priorities;
and acquiring the priority attribute of the voice interaction task, and placing the voice interaction task into the number pool according to the priority attribute so as to finish setting the priority of each voice interaction task.
In some example embodiments of the present disclosure, based on the foregoing, the method further includes:
and distributing the voice relay lines corresponding to different outbound scenes.
In some example embodiments of the present disclosure, based on the foregoing, the method further includes:
filtering the voice interaction tasks under different outbound scenes according to preset filtering conditions;
the preset filtering conditions comprise black and white list filtering conditions, number segment filtering conditions, calling time period filtering conditions, multi-scene cross filtering conditions, repeated calling filtering conditions and calling scene directional filtering conditions.
In some example embodiments of the present disclosure, based on the foregoing solution, the generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalization parameter and the outbound scenario includes:
acquiring a machine script template corresponding to the outbound scenario; the machine script template comprises a plurality of voice interaction nodes;
determining a target voice interaction node according to the input information of the target object, and acquiring reply voice data corresponding to the target voice interaction node;
and assembling the dynamic personalized parameters into the reply voice data to generate interactive voice data corresponding to the voice interaction task.
In some example embodiments of the present disclosure, based on the foregoing, the method further includes:
if the voice interaction task is detected to be completed, acquiring a voice interaction record corresponding to the voice interaction task;
and if the voice interaction task is from a third-party system, returning the voice interaction record to the third-party system so that the third-party system can perform data association and other subsequent logic processing.
According to a second aspect of the embodiments of the present disclosure, there is provided a voice interaction apparatus, including:
the voice interaction task acquisition module is used for acquiring voice interaction tasks from different data sources; the voice interaction task comprises dynamic personalized parameters;
the priority determining module is used for setting the priority of each voice interaction task based on the outbound scene corresponding to each voice interaction task;
the interactive voice data generation module is used for carrying out voice outbound operation on a target object corresponding to the voice interaction task according to the priority and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene;
and the voice interaction module is used for carrying out voice interaction with the target object through the interactive voice data so as to realize the concurrent outbound of the voice interaction tasks under a plurality of outbound scenes.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; and a memory having computer readable instructions stored thereon, the computer readable instructions, when executed by the processor, implementing the voice interaction method of any one of the above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a voice interaction method according to any one of the above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
the voice interaction method in the disclosed example embodiment collects voice interaction tasks from different data sources, the voice interaction tasks comprise dynamic personalized parameters, and then sets the priority of each voice interaction task based on an outbound scene corresponding to each voice interaction task; performing voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and an outbound scene; and performing voice interaction with the target object through the interactive voice data so as to realize the concurrent outbound of the voice interaction task under a plurality of outbound scenes. On one hand, interactive voice data corresponding to the voice interaction task are generated through the dynamic personalized parameters contained in the collected voice interaction task, so that the interactive voice data can change along with the change of the dynamic personalized parameters under different outbound scenes, the interactive voice data can better conform to interactive scenes, the flexibility and the accuracy of the interactive voice data are improved, and the user experience is improved; on the other hand, the voice interaction tasks are collected from different data sources, and the voice outbound operation is carried out on the target object corresponding to the voice interaction tasks according to the priority, so that the repeated outbound of the same target is avoided, the multi-scene concurrent voice outbound is possible, the voice outbound efficiency is effectively improved, and the labor cost is reduced.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 schematically illustrates a flow diagram of a voice interaction method, in accordance with some embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow diagram for setting priorities of voice interaction tasks, in accordance with some embodiments of the present disclosure;
FIG. 3 schematically illustrates a flow diagram for constructing number pools by priority, in accordance with some embodiments of the present disclosure;
FIG. 4 schematically illustrates a flow diagram for filtering voice interaction tasks, in accordance with some embodiments of the present disclosure;
FIG. 5 schematically illustrates a flow diagram for assembling interactive voice data, in accordance with some embodiments of the present disclosure;
FIG. 6 schematically illustrates a flow diagram for implementing an outbound operation with assembled interactive voice data, in accordance with some embodiments of the present disclosure;
FIG. 7 schematically illustrates a flow diagram for returning the interaction record corresponding to a voice interaction task, in accordance with some embodiments of the present disclosure;
FIG. 8 schematically illustrates a flow diagram for enabling voice interaction, in accordance with some embodiments of the present disclosure;
FIG. 9 schematically illustrates an application scenario diagram of a voice interaction method, in accordance with some embodiments of the present disclosure;
FIG. 10 schematically illustrates a schematic diagram of a voice interaction device, in accordance with some embodiments of the present disclosure;
FIG. 11 schematically illustrates a structural schematic of a computer system of an electronic device, in accordance with some embodiments of the present disclosure;
fig. 12 schematically illustrates a schematic diagram of a computer-readable storage medium, according to some embodiments of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
Furthermore, the drawings are merely schematic illustrations and are not necessarily drawn to scale. The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
In this exemplary embodiment, a voice interaction method is first provided. The method may be applied to a server or to a terminal device, which is not particularly limited in this exemplary embodiment. In the following, the method is described taking execution by a server as an example. FIG. 1 schematically illustrates the flow of the voice interaction method according to some embodiments of the present disclosure; referring to fig. 1, the voice interaction method may include the following steps:
step S110, collecting voice interaction tasks from different data sources; the voice interaction task comprises dynamic personalized parameters;
step S120, setting the priority of each voice interaction task based on the outbound scene corresponding to each voice interaction task;
step S130, carrying out voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene;
and step S140, performing voice interaction with the target object through the interactive voice data to realize concurrent outbound of voice interaction tasks under a plurality of outbound scenes.
According to the voice interaction method in this embodiment, on the one hand, the interactive voice data corresponding to a voice interaction task is generated from the dynamic personalized parameters carried by the collected task, so that the interactive voice data changes with those parameters across different outbound scenarios, fits the interaction context better, gains in flexibility and accuracy, and improves the user experience. On the other hand, collecting voice interaction tasks from different data sources and calling out the corresponding target objects according to priority avoids repeated calls to the same target, makes concurrent multi-scenario voice outbound possible, effectively improves voice outbound efficiency, and reduces labor cost.
Next, the voice interaction method in the present exemplary embodiment will be further explained.
In step S110, voice interaction tasks are collected from different data sources; the voice interaction task includes dynamic personalization parameters.
In an example embodiment of the present disclosure, a voice interaction task is an outbound task that provides an interactive voice response to a user based on a specific requirement. For example, it may be an outbound task for a data query service, such as prompting the user by voice to press different number keys to query telephone charges, remaining traffic, and similar data; it may also be an outbound task for a recommendation service such as calling-package or traffic-package recommendation, for example introducing the contents of the package by voice and quickly activating or cancelling it according to the user's answers.
Based on this, the present exemplary embodiment provides a scheme for acquiring voice interaction tasks from different data sources, so as to extend a source range of the voice interaction tasks, extend an application scenario of the voice interaction tasks, and improve flexibility.
Specifically, the different data sources may include a management system, a third-party system, and the like, and the voice interaction tasks that are imported in batches may be acquired from the management system based on a preset outbound task import template; and/or acquiring a voice interaction task from a third-party system based on an open standard interface; and/or capturing voice interaction tasks from a third-party system based on the data collection tool.
The preset outbound task import template is a template generated by the management system for triggering voice interaction tasks. When an outbound scenario is created, a machine script can be selected, and the management system automatically generates the outbound task import template from the script configuration and the personalized variable information. An operator can download the template and trigger outbound voice interaction tasks by filling in the relevant task data in the template's format.
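As a minimal sketch of such a template, assuming a CSV layout whose column names are illustrative rather than taken from the patent, the management system would concatenate the fixed static fields with the script's personalized variables to produce the header an operator fills in:

```python
import csv

# Hypothetical fixed fields and script variables; the names below are
# assumptions for illustration, not the patent's actual schema.
FIXED_FIELDS = ["phone_number", "priority", "call_time_window"]
script_variables = ["customer_name", "arrears_amount"]  # from the script config

# The management system generates the import template header by appending
# the scenario's personalized variables to the fixed static fields.
with open("outbound_task_template.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(FIXED_FIELDS + script_variables)
    # An operator downloads the template and fills in one row per task:
    writer.writerow(["13800000000", "1", "09:00-18:00", "Mr. Wang", "35.50"])
```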
The open standard interface may include, but is not limited to, an HTTP POST JSON interface, and can support single-task, batch, and streaming submission of voice interaction tasks from a third-party system. This meets the needs of third-party systems in most application scenarios, extends the application scenarios that voice interaction tasks can support, and improves deployment flexibility in each of them.
In this example, besides passively receiving voice interaction tasks actively submitted by other data sources, tasks can also be fetched actively; for example, a preset data acquisition tool can capture voice interaction tasks from databases, FTP (File Transfer Protocol) servers, message queues, and other services of a third-party system.
Of course, the above is merely an illustrative example, and the voice interaction task may be collected from different data sources in various ways in the present exemplary embodiment, which is not limited in this exemplary embodiment.
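To make the open standard interface concrete, the sketch below submits a single task over HTTP POST JSON; the endpoint URL and field names are assumptions, since the patent only states that such an interface exists and what the task carries:

```python
import json
import urllib.request

# Hypothetical payload shape: scenario id, target number, priority attribute,
# and the dynamic personalized parameters the scenario's script requires.
task = {
    "scene_id": "arrears-notice",
    "phone_number": "13800000000",
    "priority": 1,
    "dynamic_params": {
        "customer_name": "Mr. Wang",
        "arrears_amount": "35.50",
    },
}
req = urllib.request.Request(
    "https://voice-platform.example.com/api/tasks",  # assumed endpoint
    data=json.dumps(task).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # e.g. 200 on successful submission
```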
In an exemplary embodiment of the present disclosure, to ensure the compatibility of each collected task and allow it to be called out successfully, the voice interaction tasks collected from different data sources can all be passed through a data protocol adaptation layer that performs protocol and data conversion. During conversion, the collected tasks are converted into standard-format voice interaction tasks according to a preconfigured field conversion mapping.
Specifically, the protocol of each collected task is first converted uniformly into the system's internal protocol (such as HTTP POST JSON). Then, according to the configured field conversion rules, the fields of the original data message are converted into the standard fields and format that the system's voice interaction tasks require, with the information the system needs converted uniformly into JSON data. Next, the dynamic personalized variables required by the outbound scenario are compared against the JSON data, and the next logical step is executed only after this comparison passes. After conversion, the standard-format data corresponding to the voice interaction task (fixed static fields plus dynamic JSON data) is stored in the database and the system's internal voice interaction task data is generated, providing a unified processing mechanism for multi-source data fusion and greatly improving the docking efficiency between systems of different data sources.
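A minimal sketch of this adaptation-layer conversion, assuming a simple dict-based field mapping with illustrative field names (the actual mapping configuration format is not specified in the patent):

```python
FIELD_MAPPING = {            # source field -> standard field
    "mobile": "phone_number",
    "cust_name": "customer_name",
    "owe_fee": "arrears_amount",
}
REQUIRED_VARIABLES = {"customer_name", "arrears_amount"}  # required by the scenario

def to_standard_task(raw: dict, scene_id: str) -> dict:
    """Convert a raw source message into a standard-format task:
    fixed static fields plus dynamic personalized parameters in JSON."""
    mapped = {FIELD_MAPPING.get(k, k): v for k, v in raw.items()}
    # Split out the dynamic personalized parameters from the static fields.
    dynamic = {k: mapped.pop(k) for k in list(mapped) if k in REQUIRED_VARIABLES}
    missing = REQUIRED_VARIABLES - dynamic.keys()
    if missing:  # the comparison step: reject tasks lacking required variables
        raise ValueError(f"missing personalized variables: {missing}")
    return {"scene_id": scene_id, **mapped, "dynamic_params": dynamic}

print(to_standard_task(
    {"mobile": "13800000000", "cust_name": "Mr. Wang", "owe_fee": "35.50"},
    "arrears-notice"))
```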
In step S120, a priority of each of the voice interaction tasks is set based on an outbound scenario corresponding to each of the voice interaction tasks.
In an example embodiment of the present disclosure, the outbound scenario is the application scenario to which different voice interaction tasks belong. For example, voice interaction tasks such as telephone fee queries and traffic queries belong to a self-service query outbound scenario, while tasks such as arrears notification belong to an active notification outbound scenario. Of course, outbound scenarios may also be set by the user for different types of voice interaction tasks, which is not particularly limited in this example embodiment.
Specifically, step S120 may include steps S210 and S220, through which the priority of each voice interaction task is set based on the outbound scenario corresponding to it; as shown in fig. 2, the method specifically includes the following steps:
step S210, constructing a plurality of number pools based on the outbound scenes corresponding to the voice interaction tasks, wherein the number pools correspond to different priorities;
step S220, obtaining the priority attribute of the voice interaction task, and placing the voice interaction task into the number pool according to the priority attribute to finish setting the priority of each voice interaction task.
A number pool is a set of outbound numbers configured by the administrator in trunk management, and the priority attribute is the attribute information set for each voice interaction task for judging its priority. For example, the priority attribute may be a priority level assigned to the task, such as level 1 through level 5; it may also be the type of outbound scenario the task corresponds to; and it may of course be any other attribute data capable of distinguishing the task's priority level, which is not particularly limited in this example embodiment.
In an example embodiment of the present disclosure, voice trunk lines corresponding to different outbound scenarios may be allocated. A trunk line comprises the lines and associated equipment directly connecting two switching systems; allocating independent voice trunk lines to different outbound scenarios improves the outbound success rate of voice interaction tasks, and thus the outbound efficiency, when multiple scenarios place calls simultaneously.
Fig. 3 schematically illustrates a flow diagram of constructing number pools by priority, according to some embodiments of the present disclosure.
Referring to fig. 3, in an application scenario where multiple outbound scenarios place calls concurrently, the priority function of voice interaction tasks can be implemented by dividing voice trunk lines and managing number pools, specifically through the following steps:
Step S310, when an outbound scenario is created based on a machine script template, the required number of voice trunk lines is allocated and dedicated to that scenario; the number of trunk lines can of course be readjusted in the system according to actual needs;
Step S320, when the scenario has voice interaction tasks that need to be called out, several number pools are constructed by priority level by default, each task is placed into the designated pool according to its priority attribute, and a scheduling thread then extracts tasks from the pools in priority order for rule checking.
It should be noted that if, because of parallel outbound across scenarios, a task's number is already in a call, the task's queuing parameter (for example, a combination of timestamp and sequence number) is adjusted so that the task moves to the tail of the number pool queue of its home scenario, effectively avoiding the poor user experience of being called repeatedly in quick succession.
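The pool-and-requeue behaviour described above can be sketched as follows, assuming one queue per priority level and a (timestamp, sequence number) queuing parameter; the class and method names are illustrative, not the patent's:

```python
import heapq
import itertools
import time

class NumberPools:
    """One priority-ordered pool per level; level 1 is drained first."""
    def __init__(self, levels: int = 5):
        self._pools = {lvl: [] for lvl in range(1, levels + 1)}
        self._seq = itertools.count()  # tie-breaker within a timestamp

    def put(self, task: dict, priority: int) -> None:
        # The (timestamp, sequence) queuing parameter orders tasks in a pool.
        heapq.heappush(self._pools[priority], (time.time(), next(self._seq), task))

    def requeue_to_tail(self, task: dict, priority: int) -> None:
        # If the same number is already mid-call in another scenario, re-insert
        # with a fresh (later) timestamp so the task lands at the queue tail.
        self.put(task, priority)

    def next_task(self):
        # The scheduling thread extracts tasks in priority order for rule checks.
        for lvl in sorted(self._pools):
            if self._pools[lvl]:
                return heapq.heappop(self._pools[lvl])[2]
        return None
```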
In an example embodiment of the present disclosure, the voice interaction tasks extracted from the number pools undergo rule checking; specifically, tasks in different outbound scenarios are filtered according to preset filtering conditions, which may include blacklist/whitelist filtering, number-segment filtering, outbound time period filtering, multi-scenario cross filtering, repeated-outbound filtering, and outbound-scenario directional filtering.
FIG. 4 schematically illustrates a flow diagram for filtering voice interaction tasks, according to some embodiments of the present disclosure.
Referring to fig. 4, step S410, blacklist/whitelist check: the system can preset a global blacklist/whitelist and a scenario-level blacklist/whitelist, with the scenario-level list taking precedence over the global one; if the check fails, the voice interaction task is directly updated to the outbound-failure state;
Step S420, number-segment check: whether the task's number segment matches the number segments preconfigured for the outbound scenario is checked; if the number does not fall within the configured home range, the task is updated to the outbound-failure state;
Step S430, outbound time period check: whether the task falls within the time ranges preconfigured for its batch and its outbound scenario is checked. Because the numbers in a pool are ordered by precomputed outbound timestamps, a single task that misses its time window would force the remaining tasks in the whole pool to wait for the batch's or scenario's next outbound period, causing unnecessary delay and reducing outbound efficiency; checking the time period per task therefore effectively improves outbound efficiency;
Step S440, multi-scenario cross check: the cross-scenario configuration rules are fetched and checked cyclically to verify that no more than N calls are made within M days (where M and N are positive integers greater than or equal to 1). The specific multi-scenario cross filtering process is: (1) each outbound scenario stores its outbound records by day, for both in-scenario repeated-outbound checks and cross-scenario rule checks; (2) the cross-scenario rules related to the outbound task are fetched; (3) for each scenario in the rule, the outbound records from M days before the current date up to the current date are traversed to see whether the same number appears; (4) if it does, the occurrences are accumulated, and if the accumulated count exceeds N, the cross check fails and the task is updated to the outbound-failure state;
Step S450, repeated-outbound check: according to the scenario's repeated-outbound configuration, the scenario's outbound records from M days before the current date up to the current date are traversed; if they contain the same number, the task is directly updated to the outbound-failure state;
Step S460, outbound-scenario directional check: a general interface protocol is opened to adapt to different outbound scenarios and perform scenario-specific business logic verification, checking from the business angle whether the call is allowed. For example, for an arrears-notification scenario, an interface can be called before the outbound to check whether the user has already paid; if the interface reports that the call is not allowed, the task is directly updated to the outbound-failure state.
Continuing to refer to fig. 1, in step S130, a voice outbound operation is performed on the target object corresponding to the voice interaction task according to the priority, and interactive voice data corresponding to the voice interaction task is generated by combining the dynamic personalized parameter and the outbound scene.
In an example embodiment of the present disclosure, the interactive voice data is the voice that the voice interaction task plays to the user during the call, and the voice outbound operation is the operation of placing an outbound request to the user phone number contained in the task. The voice outbound operation establishes a call channel with the target user, over which the corresponding interactive voice data is played in response to the user's triggering operations.
Specifically, step S130 may include steps S510 to S530, through which the interactive voice data corresponding to the voice interaction task is generated by combining the dynamic personalized parameters with the outbound scenario; as shown in fig. 5, the steps specifically include:
step S510, acquiring a machine script template corresponding to the outbound scenario; the machine script template comprises a plurality of voice interaction nodes;
step S520, determining a target voice interaction node according to the input information of the target object, and acquiring reply voice data corresponding to the target voice interaction node;
step S530, assembling the dynamic personalized parameters into the reply voice data to generate interactive voice data corresponding to the voice interaction task.
The machine script template is a template preset for different outbound scenarios and used to generate interactive voice data; for example, a template might read "To query {variable 1}, please press 1; to query {variable 2}, please press 2".
A voice interaction node is an interaction node, defined by the machine script template, at which voice interaction with the user can take place. For example, for the template "To query your call charge, please press 1; to query your remaining traffic, please press 2", when the user presses 1 or 2 the interactive voice jumps to the part corresponding to "1" or "2"; those parts are the template's voice interaction nodes.
The input information of the target object is the information the user enters to switch the interactive voice data to a voice interaction node. For the interactive voice "To query your {call charge}, please press 1; to query your {remaining traffic}, please press 2", the "1" or "2" entered through the terminal's input control, or spoken by the user, is the target object's input information; of course, the input information may also be entered in other ways to switch voice interaction nodes, which is not particularly limited in this example.
The reply voice data is the information in the machine script template reached after the input information switches the voice interaction node. For example, for the interactive voice "To query your call charge, please press 1; to query your remaining traffic, please press 2", after the user presses "2", the robot's reply "Your remaining traffic is {acquired traffic value}" can be regarded as the reply voice data. This is merely an illustrative example, and the exemplary embodiment is not limited thereto.
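Assembling the reply voice data then reduces to substituting the dynamic personalized parameters into the node's {variable} placeholders before speech synthesis. A minimal sketch, assuming the placeholder syntax shown in the examples above:

```python
import re

def assemble_reply(template_text: str, dynamic_params: dict) -> str:
    """Substitute dynamic personalized parameters into a node's reply text
    before it is synthesized and played to the user."""
    def sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in dynamic_params:
            raise KeyError(f"scenario requires variable {name!r}")
        return str(dynamic_params[name])
    return re.sub(r"\{(\w+)\}", sub, template_text)

node_text = "Your remaining traffic is {remaining_traffic} MB."
print(assemble_reply(node_text, {"remaining_traffic": 1024}))
# -> "Your remaining traffic is 1024 MB."
```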
FIG. 6 schematically illustrates a flow diagram for implementing an outbound operation with assembled interactive voice data, in accordance with some embodiments of the present disclosure.
Referring to fig. 6, during the outbound process the required personalized variables are matched against the dynamic personalized parameters in the outbound task, the related logical judgments are made, and the voice interaction data is assembled and played to the user, specifically through the following steps:
step S610, before the outbound task subsystem submits the outbound request to the CTI (Computer Telephony Integration) system, the personalized variables required by the outbound scenario are extracted and synchronized to the intelligent dialogue subsystem;
step S620, when the outbound call starts, the intelligent dialogue subsystem routes to the start node of the designated machine script template according to the parameters in the CTI request;
step S630, the reply voice data configuration of the start node is fetched and assembled with the outbound task's dynamic personalized parameters into complete voice interaction data, which is returned to the CTI and played to the user;
step S640, during the call, voice interaction nodes can be routed not only by the user's input information but also by the task's dynamic personalized parameters;
step S650, after routing to the target voice interaction node, the reply voice data is assembled into voice interaction data as in step S630 and sent to the CTI for playing.
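A compact sketch of the node routing in steps S620 to S650, assuming a dict-based script template; the node names and fields are illustrative assumptions, not the patent's data model:

```python
NODES = {
    "start": {
        "reply": "To query your call charge, press 1; for remaining traffic, press 2.",
        "on_input": {"1": "charge_node", "2": "traffic_node"},
    },
    "charge_node": {"reply": "Your call charge is {call_charge} yuan.",
                    "on_input": {}},
    "traffic_node": {"reply": "Your remaining traffic is {remaining_traffic} MB.",
                     "on_input": {}},
}

def route(current: str, user_input: str) -> str:
    """Route by the user's key input; an unrecognized input stays on the
    current node. A fuller system could also branch on dynamic parameters,
    e.g. skipping the menu when only one service applies to the user."""
    return NODES[current]["on_input"].get(user_input, current)

current = route("start", "2")   # -> "traffic_node"
print(NODES[current]["reply"])  # reply text, to be filled in and played
```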
Continuing to refer to fig. 1, in step S140, performing voice interaction with the target object through the interactive voice data to implement concurrent outbound of voice interaction tasks in multiple outbound scenarios.
In an example embodiment of the present disclosure, the target object refers to an object targeted by the voice interaction task, for example, the target object may be a user corresponding to the voice interaction task, or may also be a test script corresponding to the voice interaction task during testing, and of course, the target object may also be a robot capable of implementing voice interaction, which is not limited in this example embodiment.
After the interactive voice data corresponding to the voice interaction task is generated, the voice outbound operation can be triggered according to the outbound priority and over the allocated voice trunk line to establish a call channel with the user, and voice interaction with the target object is conducted over that channel based on the interactive voice data, thereby achieving concurrent outbound of voice interaction tasks in multiple outbound scenarios.
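Concurrent outbound under per-scenario trunk allocation can be approximated by bounding each scenario's simultaneous calls, for instance with one semaphore per scenario; this is an illustrative model under stated assumptions, not the patent's implementation:

```python
import threading

# Each scenario's dedicated trunk lines are modelled as a semaphore whose
# count is the number of lines allocated to it; dial() stands in for the
# real CTI outbound request.
scene_trunks = {"arrears-notice": threading.Semaphore(10),
                "package-offer": threading.Semaphore(5)}

def dial(task: dict) -> None:
    print("calling", task["phone_number"])  # placeholder for the CTI call

def outbound_worker(scene_id: str, task: dict) -> None:
    with scene_trunks[scene_id]:  # blocks while all the scenario's lines are busy
        dial(task)

t = threading.Thread(target=outbound_worker,
                     args=("arrears-notice", {"phone_number": "13800000000"}))
t.start()
t.join()
```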
In an example embodiment of the present disclosure, after the voice interaction with the target object ends and the voice interaction task is completed, the data and audio produced during the call can be sorted and collected, and the source of the outbound task determined; if the task originated from a third-party system, the call data can be assembled and fed back to that system together with the originally collected data.
Specifically, when it is detected that the voice interaction task is completed, a voice interaction record corresponding to the voice interaction task may be obtained, and when it is detected that the voice interaction task originates from a third-party system, the voice interaction record may be returned to the third-party system, so that the third-party system performs data association and other subsequent logic processing.
Fig. 7 schematically illustrates a flow diagram for returning the interaction record corresponding to a voice interaction task, in accordance with some embodiments of the present disclosure.
Referring to fig. 7, step S710, generating the dialog log: after the call ends, a question-answer dialog log can be generated from the execution track of the machine script template during the voice interaction;
step S720, generating call tags: from the same execution track, combined with the tracking-point information configured for the outbound scenario, tag information for the call can be generated, later providing a data basis for building a user portrait;
step S730, call recording: during the call, the CTI generates an original audio file for the user side and the machine side respectively (for example, 8k, 8-bit, mono audio in alaw format); after the call ends, the two files are merged along the timeline into a full-call audio file and uploaded to the storage service. Meanwhile, each piece of the user's voice input during the call produces a segmented audio file, also uploaded to the storage service, which can be used later for log inspection and complaint handling;
step S740, returning the call result: the data source of the voice interaction task is determined, and if the task came from a third-party system, the outbound result, dialog log, audio files, call tags, and originally collected dynamic parameters are packaged and returned together, so that the third-party system can perform data association and subsequent logic processing.
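The packaged return of step S740 might look like the following sketch, where the field names are assumptions and only the listed contents (result, dialog log, tags, audio, original dynamic parameters) come from the patent:

```python
import json

def build_callback(task: dict, result: str, dialog_log: list,
                   tags: list, audio_urls: dict) -> str:
    """Assemble the call-result payload returned to the third-party system."""
    return json.dumps({
        "task_id": task["task_id"],
        "outbound_result": result,        # e.g. "answered" / "failed"
        "dialog_log": dialog_log,         # question-answer execution track
        "call_tags": tags,                # from the scenario's tracking points
        "audio": audio_urls,              # full-call and segmented recordings
        "dynamic_params": task["dynamic_params"],  # originally collected data
    }, ensure_ascii=False)
```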
FIG. 8 schematically illustrates a flow diagram for enabling voice interaction, in accordance with some embodiments of the present disclosure.
Referring to fig. 8, step S810, data reception/acquisition;
step S820, constructing a number pool according to scene and priority;
step S830, checking the outbound rule;
step S840, constructing outbound personalized variables;
step S850, submitting an outbound task to CTI;
step S860, the user and the intelligent voice interaction robot realize intelligent conversation;
step S870, generating a dialog log and tag information;
step S880, processing the whole audio file and the segmented audio file;
and step S890, packaging the outbound result, the dialog log, the tag information, the audio and the original dynamic parameter and feeding back the result, the dialog log, the tag information, the audio and the original dynamic parameter to a third-party system.
Fig. 9 schematically illustrates an application scenario diagram of a voice interaction method according to some embodiments of the present disclosure.
Referring to fig. 9, the voice interaction method in this exemplary embodiment may be applied to a platform composed of several systems, which may include a multi-source fusion outbound task distribution system 901, a CTI system 902, and an intelligent dialog system 903. The voice interaction method may be executed by the multi-source fusion outbound task distribution system 901, or by other systems depending on the specific application scenario, which is not particularly limited in this exemplary embodiment.
The multi-source fusion outbound task distribution system 901 acquires or collects voice interaction tasks from different data sources and submits them to the CTI system 902, which generates a conversation request based on the voice interaction task and sends it to the intelligent dialog system 903. Meanwhile, the multi-source fusion outbound task distribution system 901 also sends the dynamic personalized parameters corresponding to the voice interaction task to the intelligent dialog system 903, which generates voice interaction data from the dynamic personalized parameters and the conversation request, and carries out the voice interaction with the user to complete the outbound operation of the voice interaction task.
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
In addition, in the present exemplary embodiment, a voice interaction apparatus is also provided. Referring to fig. 10, the voice interaction apparatus 1000 includes: the voice interaction task collection module 1010, the priority determination module 1020, the interaction voice data generation module 1030, and the voice interaction module 1040. Wherein:
the voice interaction task collection module 1010 is used for collecting voice interaction tasks from different data sources; the voice interaction task comprises dynamic personalized parameters;
the priority determining module 1020 is configured to set a priority of each voice interaction task based on an outbound scenario corresponding to each voice interaction task;
the interactive voice data generation module 1030 is configured to perform a voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generate interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene;
the voice interaction module 1040 is configured to perform voice interaction with the target object through the interactive voice data, so as to implement concurrent outbound of voice interaction tasks in multiple outbound scenarios.
In an exemplary embodiment of the disclosure, based on the foregoing solution, the voice interaction task collecting module 1010 may be configured to:
acquiring voice interaction tasks imported in batches from a management system based on a preset outbound task import template; and/or
acquiring voice interaction tasks from a third-party system based on an open standard interface; and/or
capturing voice interaction tasks from a third-party system with a data acquisition tool.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the voice interaction apparatus 1000 may further include a voice interaction task conversion module, and the voice interaction task conversion module may be configured to:
converting the collected voice interaction tasks into voice interaction tasks in a standard format according to a preconfigured field conversion mapping; a standard-format voice interaction task comprises fixed static fields and dynamic personalized parameters.
In an exemplary embodiment of the disclosure, based on the foregoing scheme, the priority determining module 1020 may further be configured to:
constructing a plurality of number pools based on the outbound scenes corresponding to the voice interaction tasks, wherein the number pools correspond to different priorities;
and acquiring the priority attribute of the voice interaction task, and placing the voice interaction task into the number pool according to the priority attribute so as to finish setting the priority of each voice interaction task.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the voice interaction apparatus 1000 may further include a voice trunk line allocation module, and the voice trunk line allocation module may be configured to:
and distributing the voice relay lines corresponding to different outbound scenes.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the voice interaction apparatus 1000 may further include a voice interaction task filtering module, and the voice interaction task filtering module may be configured to:
filtering the voice interaction tasks under different outbound scenes according to preset filtering conditions;
the preset filtering conditions comprise black and white list filtering conditions, number segment filtering conditions, calling time period filtering conditions, multi-scene cross filtering conditions, repeated calling filtering conditions and calling scene directional filtering conditions.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the interactive voice data generating module 1030 may further be configured to:
acquiring a machine script template corresponding to the outbound scenario; the machine script template comprises a plurality of voice interaction nodes;
determining a target voice interaction node according to the input information of the target object, and acquiring reply voice data corresponding to the target voice interaction node;
and assembling the dynamic personalized parameters into the reply voice data to generate interactive voice data corresponding to the voice interaction task.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the voice interaction apparatus 1000 may further include a voice interaction recording feedback module, and the voice interaction recording feedback module may be configured to:
if the voice interaction task is detected to be completed, acquiring a voice interaction record corresponding to the voice interaction task;
and if the voice interaction task is from a third-party system, returning the voice interaction record to the third-party system so that the third-party system can perform data association and other subsequent logic processing.
The specific details of each module of the voice interaction apparatus have been described in detail in the corresponding voice interaction method, and therefore are not described herein again.
It should be noted that although in the above detailed description several modules or units of the voice interaction device are mentioned, this division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above voice interaction method is also provided.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method, or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 1100 according to such an embodiment of the disclosure is described below with reference to fig. 11. The electronic device 1100 shown in fig. 11 is only an example and should not bring any limitations to the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 11, electronic device 1100 is embodied in the form of a general purpose computing device. The components of the electronic device 1100 may include, but are not limited to: the at least one processing unit 1110, the at least one memory unit 1120, a bus 1130 connecting different system components (including the memory unit 1120 and the processing unit 1110), and a display unit 1140.
Wherein the storage unit stores program code that is executable by the processing unit 1110 to cause the processing unit 1110 to perform steps according to various exemplary embodiments of the present disclosure as described in the above section "exemplary methods" of the present specification. For example, the processing unit 1110 may execute step S110 shown in fig. 1, collecting voice interaction tasks from different data sources; the voice interaction task comprises dynamic personalized parameters; step S120, setting the priority of each voice interaction task based on the outbound scene corresponding to each voice interaction task; step S130, carrying out voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene; and step S140, performing voice interaction with the target object through the interactive voice data to realize concurrent outbound of voice interaction tasks under a plurality of outbound scenes.
The storage unit 1120 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 1121 and/or a cache memory unit 1122, and may further include a read-only memory unit (ROM) 1123.
The storage unit 1120 may also include a program/utility 1124 having a set (at least one) of program modules 1125, such program modules 1125 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1130 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
The electronic device 1100 may also communicate with one or more external devices 1170 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1100, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 1100 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 1150. Furthermore, the electronic device 1100 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 1160. As shown, the network adapter 1160 communicates with the other modules of the electronic device 1100 over the bus 1130. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1100, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the present disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above of this specification, when the program product is run on the terminal device.
Referring to fig. 12, a program product 1200 for implementing the above-described voice interaction method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer-readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. A method of voice interaction, comprising:
collecting voice interaction tasks from different data sources, wherein each voice interaction task comprises dynamic personalized parameters;
setting the priority of each voice interaction task based on the outbound scene corresponding to each voice interaction task;
performing a voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene;
and performing voice interaction with the target object through the interactive voice data so as to realize concurrent outbound of voice interaction tasks under a plurality of outbound scenes.
2. The voice interaction method of claim 1, wherein the collecting voice interaction tasks from different data sources comprises:
acquiring voice interaction tasks imported in batches from a management system based on a preset outbound task import template; and/or
acquiring a voice interaction task from a third-party system based on an open standard interface; and/or
capturing the voice interaction task from the third-party system based on a data acquisition tool.
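Purely as an illustration of the three collection channels recited in claim 2 above, the following Python sketch defines one hypothetical adapter per channel (batch import template, open standard interface, data acquisition tool); all function names, file formats, and field names are assumptions.

    import csv
    import json

    def tasks_from_import_template(csv_path):
        # batch import from the management system, one task per template row
        with open(csv_path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

    def tasks_from_open_api(payload):
        # tasks pushed by a third-party system over an open standard interface
        return json.loads(payload)

    def tasks_from_capture_tool(records):
        # tasks grabbed from a third-party system by a data acquisition tool
        return [r for r in records if r.get("type") == "voice_task"]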
3. The voice interaction method according to claim 1 or 2, further comprising:
converting the collected voice interaction tasks into voice interaction tasks in a standard format according to a pre-configured field conversion mapping relation, wherein a voice interaction task in the standard format comprises fixed static fields and dynamic personalized parameters.
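A minimal sketch of the field-conversion mapping in claim 3 above, assuming illustrative field names: source-specific keys are renamed to standard keys, fixed static fields stay at the top level, and everything else is treated as a dynamic personalized parameter.

    FIELD_MAP = {"phone": "target_number", "scene": "outbound_scene"}  # source key -> standard key
    STATIC_FIELDS = {"target_number", "outbound_scene"}

    def to_standard_format(raw):
        task = {"dynamic_params": {}}
        for key, value in raw.items():
            std_key = FIELD_MAP.get(key, key)
            if std_key in STATIC_FIELDS:
                task[std_key] = value                    # fixed static field
            else:
                task["dynamic_params"][std_key] = value  # dynamic personalized parameter
        return task

    # e.g. to_standard_format({"phone": "13800000000", "scene": "marketing", "name": "Ms. Li"})
    # keeps the number and scene as static fields and files "name" under dynamic_params.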
4. The voice interaction method according to claim 1, wherein setting the priority of each voice interaction task based on the outbound scene corresponding to each voice interaction task comprises:
constructing a plurality of number pools based on the outbound scenes corresponding to the voice interaction tasks, wherein the number pools correspond to different priorities;
and acquiring a priority attribute of the voice interaction task, and placing the voice interaction task into the corresponding number pool according to the priority attribute, so as to complete the setting of the priority of each voice interaction task.
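One possible reading of the number pools in claim 4 above is a set of priority queues drained in priority order. The sketch below uses Python's heapq with a sequence counter that preserves first-in-first-out order within a priority; the class and method names are illustrative, not taken from the disclosure.

    import heapq
    from itertools import count

    class NumberPools:
        """One logical pool per priority level, drained lowest value first."""

        def __init__(self):
            self._heap = []
            self._seq = count()  # keeps first-in-first-out order within a priority

        def put(self, task, priority):
            heapq.heappush(self._heap, (priority, next(self._seq), task))

        def next_task(self):
            return heapq.heappop(self._heap)[2] if self._heap else None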
5. The voice interaction method of claim 4, further comprising:
and allocating voice trunk lines corresponding to different outbound scenes.
6. The voice interaction method of claim 4, further comprising:
filtering the voice interaction tasks under different outbound scenes according to preset filtering conditions;
the preset filtering conditions comprise a blacklist/whitelist filtering condition, a number segment filtering condition, a calling time period filtering condition, a multi-scene cross filtering condition, a repeated-call filtering condition, and a calling scene targeted filtering condition.
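As a rough illustration of claim 6 above, the sketch below chains three of the recited filters (blacklist, calling time period, repeated call) as simple predicates; the thresholds and field names are assumptions, and the remaining filter types would follow the same pattern.

    from datetime import datetime

    def not_blacklisted(task, blacklist):
        return task["target_number"] not in blacklist       # blacklist/whitelist filter

    def in_allowed_hours(task, start=9, end=20):
        return start <= datetime.now().hour < end           # calling time period filter

    def not_recently_called(task, recent_numbers):
        return task["target_number"] not in recent_numbers  # repeated-call filter

    def apply_filters(tasks, blacklist, recent_numbers):
        return [t for t in tasks
                if not_blacklisted(t, blacklist)
                and in_allowed_hours(t)
                and not_recently_called(t, recent_numbers)]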
7. The voice interaction method according to claim 1, wherein generating the interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene comprises:
acquiring a machine dialogue template corresponding to the outbound scene, wherein the machine dialogue template comprises a plurality of voice interaction nodes;
determining a target voice interaction node according to the input information of the target object, and acquiring reply voice data corresponding to the target voice interaction node;
and assembling the dynamic personalized parameters into the reply voice data to generate interactive voice data corresponding to the voice interaction task.
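The following sketch illustrates the node-selection and parameter-assembly idea of claim 7 above with a toy dialogue template: a target node is chosen from the user's input, and the dynamic personalized parameters are spliced into that node's reply text. The template content and the matching rule are invented for illustration only.

    TEMPLATE = {  # hypothetical machine dialogue template for one outbound scene
        "greeting": "Hello {name}, this is a service reminder.",
        "balance":  "Your current balance is {balance} yuan.",
        "fallback": "Sorry, I did not catch that. Could you repeat it?",
    }

    def reply_for(user_input, params):
        if "balance" in user_input.lower():
            node = "balance"       # target voice interaction node matched from input
        elif not user_input:
            node = "greeting"
        else:
            node = "fallback"
        return TEMPLATE[node].format(**params)  # assemble dynamic personalized parameters

    print(reply_for("what is my balance", {"name": "Ms. Li", "balance": "35.20"}))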
8. The voice interaction method according to claim 1 or 2, further comprising:
if the voice interaction task is detected to be completed, acquiring a voice interaction record corresponding to the voice interaction task;
and if the voice interaction task comes from a third-party system, returning the voice interaction record to the third-party system so that the third-party system can perform data association and other follow-up processing.
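A hedged sketch of the callback in claim 8 above: once the task completes, the interaction record is posted back to the originating third-party system. The callback URL, payload shape, and source flag are all assumptions, not part of the claim.

    import json
    import urllib.request

    def return_record(task, record):
        # skip the callback for tasks imported locally rather than by a third party
        if task.get("source") != "third_party":
            return None
        req = urllib.request.Request(
            task["callback_url"],  # assumed to be carried with the task
            data=json.dumps({"task_id": task["id"], "record": record}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return resp.status     # the third party then performs data association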
9. A voice interaction apparatus, comprising:
a voice interaction task collection module configured to collect voice interaction tasks from different data sources, wherein each voice interaction task comprises dynamic personalized parameters;
a priority determination module configured to set the priority of each voice interaction task based on the outbound scene corresponding to each voice interaction task;
an interactive voice data generation module configured to perform a voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and to generate interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene;
and a voice interaction module configured to perform voice interaction with the target object through the interactive voice data, so as to realize concurrent outbound of voice interaction tasks under a plurality of outbound scenes.
10. An electronic device, comprising:
a processor; and
a memory having computer-readable instructions stored thereon which, when executed by the processor, implement the voice interaction method of any one of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of voice interaction according to any one of claims 1 to 8.
CN202110760477.XA 2021-07-06 2021-07-06 Voice interaction method and device, electronic equipment and storage medium Active CN113452853B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110760477.XA CN113452853B (en) 2021-07-06 2021-07-06 Voice interaction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110760477.XA CN113452853B (en) 2021-07-06 2021-07-06 Voice interaction method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113452853A true CN113452853A (en) 2021-09-28
CN113452853B CN113452853B (en) 2022-11-18

Family ID

77815237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110760477.XA Active CN113452853B (en) 2021-07-06 2021-07-06 Voice interaction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113452853B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106331394A (en) * 2016-10-19 2017-01-11 上海携程商务有限公司 Voice outbound system and outbound method
CN108803879A (en) * 2018-06-19 2018-11-13 驭势(上海)汽车科技有限公司 A kind of preprocess method of man-machine interactive system, equipment and storage medium
US20200065061A1 (en) * 2018-08-22 2020-02-27 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for processing information
CN109669754A (en) * 2018-12-25 2019-04-23 苏州思必驰信息科技有限公司 The dynamic display method of interactive voice window, voice interactive method and device with telescopic interactive window
CN113050910A (en) * 2019-12-26 2021-06-29 阿里巴巴集团控股有限公司 Voice interaction method, device, equipment and storage medium
CN112269864A (en) * 2020-10-15 2021-01-26 北京百度网讯科技有限公司 Method, device and equipment for generating broadcast voice and computer storage medium
CN112492111A (en) * 2020-11-25 2021-03-12 苏宁金融科技(南京)有限公司 Intelligent voice outbound method, device, computer equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114827361A (en) * 2022-04-08 2022-07-29 马上消费金融股份有限公司 Outbound processing method and device

Also Published As

Publication number Publication date
CN113452853B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
US10904384B2 (en) System and method for omnichannel user engagement and response
US10255917B2 (en) Coordinating the execution of a voice command across multiple connected devices
US10530850B2 (en) Dynamic call control
CN109658932B (en) Equipment control method, device, equipment and medium
CN110457256A (en) Date storage method, device, computer equipment and storage medium
US20050033582A1 (en) Spoken language interface
CN110310657B (en) Audio data processing method and device
WO2018067368A1 (en) Hierarchical annotation of dialog acts
CN110428825B (en) Method and system for ignoring trigger words in streaming media content
US11889023B2 (en) System and method for omnichannel user engagement and response
CN105518645A (en) Load-balanced, persistent connection techniques
CN109947387B (en) Audio acquisition method, audio playing method, system, device and storage medium
CN109087639A (en) Method for voice recognition, device, electronic equipment and computer-readable medium
CN110136713A (en) Dialogue method and system of the user in multi-modal interaction
CN113452853B (en) Voice interaction method and device, electronic equipment and storage medium
CN110418181B (en) Service processing method and device for smart television, smart device and storage medium
JP6689953B2 (en) Interpreter service system, interpreter service method, and interpreter service program
CN111612482A (en) Conversation management method, device and equipment
US10897534B1 (en) Optimization for a call that waits in queue
CN103646644A (en) Method and apparatus for obtaining voice recognition service information recognition
CN111311186A (en) Work order creating method, medium, device and computing equipment
CN114969299A (en) Conversation management method and device, computer equipment and storage medium
US20230013916A1 (en) Command services manager for secure sharing of commands to registered agents
US20220343916A1 (en) Assisted Speech Recognition
CN111355853A (en) Call center data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant