CN108833590B

CN108833590B - Voice recognition service proxy server and proxy method

Info

Publication number: CN108833590B
Application number: CN201810758656.8A
Authority: CN
Inventors: 戴俊; 常月; 黄国瑞; 张伟冬; 先永春
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2018-07-11
Filing date: 2018-07-11
Publication date: 2021-10-26
Anticipated expiration: 2038-07-11
Also published as: CN108833590A

Abstract

The invention discloses a voice recognition service proxy server and a proxy method. The server comprises an MRCP proxy module, configured to receive a voice stream processing request sent by a media device, send the voice stream processing request to a service proxy module for processing, receive the processing result from the service proxy module, and send the processing result to the service system application corresponding to the voice service request; and a service proxy module, configured to receive the voice stream processing request from the MRCP proxy module, interact with a voice service server to process the voice stream, and return the processing result to the MRCP proxy module. The invention decouples the media device from the ASR service, which facilitates service expansion. The service proxy service performs authentication, flow control, security, anti-cheating and similar checks on ASR service requests, so the method is well suited to public clouds. Other service modules can also be attached under the service proxy service to extend the ASR service with functions such as search, translation and intent recognition.

Description

Voice recognition service proxy server and proxy method

[ technical field ]

The present invention relates to computer application technologies, and in particular, to a voice recognition service proxy server and a proxy method.

[ background of the invention ]

As ASR (Automatic Speech Recognition) technology matures and is combined with other technologies, value-added services based on ASR keep emerging and developing rapidly.

In the prior art, a media device may connect directly to an ASR service through MRCP (Media Resource Control Protocol). This application mode has two disadvantages:

1. It supports private clouds well but public clouds poorly, because a public cloud requires mechanisms such as authentication, flow control, security and anti-cheating. Existing approaches either forgo these mechanisms or couple them into the ASR service in order to support public clouds;

2. The ASR service can only return recognized text and cannot be extended well; extending it requires coupling the ASR service with other service modules.

In addition, the media device can connect to the ASR service of a public cloud platform through an HTTP interface. This application mode has the following disadvantages:

Media devices typically lack the capability to connect to ASR over HTTP and must be modified to do so, which makes integration inconvenient and the modification cost high. Moreover, the ASR service can still return only recognized text and offers no good way to extend it.

[ summary of the invention ]

Aspects of the present application provide a speech recognition service proxy server, method, device and storage medium, which can decouple the media device from the service system, provide public cloud services such as authentication, flow control, charging and security, and extend the capabilities of the ASR service.

In one aspect of the present application, there is provided a speech recognition service proxy server, the server including:

an MRCP proxy module, configured to receive a voice stream processing request sent by a media device, send the voice stream processing request to a service proxy module for processing, receive a processing result from the service proxy module, and send the processing result to the service system application corresponding to the voice service request; and

a service proxy module, configured to receive the voice stream processing request from the MRCP proxy module, interact with a voice service server to process the voice stream, and return the processing result to the MRCP proxy module.
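The division of labor between the two modules can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the class names, the `VoiceStreamRequest` fields and the `deliver` callback are hypothetical, and the MRCP signalling and network calls to the voice service server are replaced by in-process method calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Tuple


@dataclass
class VoiceStreamRequest:
    """Hypothetical request shape; the field names are illustrative, not from the patent."""
    device_id: str   # media device that sent the request
    service: str     # which voice service server should handle it, e.g. "asr"
    audio: bytes     # the voice stream (held in memory here for simplicity)
    app_id: str      # service system application that should receive the result


class VoiceServiceServer(Protocol):
    """Anything that can process a voice stream request, e.g. an ASR server client."""
    def process(self, request: VoiceStreamRequest) -> str: ...


class ServiceProxyModule:
    """Receives requests from the MRCP proxy module and talks to voice service servers."""

    def __init__(self, servers: Dict[str, VoiceServiceServer]) -> None:
        self.servers = servers

    def handle(self, request: VoiceStreamRequest) -> str:
        # Pick the backend named in the request and return its processing result.
        return self.servers[request.service].process(request)


class MRCPProxyModule:
    """Faces the media device; forwards to the service proxy and delivers the result."""

    def __init__(self, service_proxy: ServiceProxyModule,
                 deliver: Callable[[str, str], None]) -> None:
        self.service_proxy = service_proxy
        self.deliver = deliver  # deliver(app_id, result) reaches the service system application

    def on_request(self, request: VoiceStreamRequest) -> None:
        result = self.service_proxy.handle(request)
        self.deliver(request.app_id, result)


class FakeASRServer:
    """Stand-in for a real ASR backend: it only reports how many audio bytes it saw."""
    def process(self, request: VoiceStreamRequest) -> str:
        return f"recognized {len(request.audio)} bytes"


delivered: List[Tuple[str, str]] = []
mrcp = MRCPProxyModule(ServiceProxyModule({"asr": FakeASRServer()}),
                       deliver=lambda app, result: delivered.append((app, result)))
mrcp.on_request(VoiceStreamRequest("dev-1", "asr", b"\x00\x01\x02\x03", "crm-app"))
print(delivered)  # [('crm-app', 'recognized 4 bytes')]
```

Because the media device only ever talks to `MRCPProxyModule`, a new backend can be added to the `servers` map without touching the device, which is the decoupling the summary describes.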

In the above aspects and any possible implementation, the voice service server is an ASR server and/or an extended service server.

In the above aspects and any possible implementation, the service proxy module includes:

an ASR processing submodule, configured to send an ASR request to the ASR server and receive the speech recognition result returned by the ASR server; and/or

an extended service processing submodule, configured to send an extended service request to the extended service server and receive the extended service processing result returned by the extended service server.
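The "and/or" relationship between the ASR processing submodule and the extended service processing submodule can be sketched as a small registry. Assumptions: each submodule is modeled as a plain function from audio bytes to a result string, the service names `"asr"` and `"translation"` are illustrative, and the lambdas stand in for remote ASR and translation servers.

```python
from typing import Callable, Dict, Iterable

# Hypothetical model: each processing submodule is a function from audio bytes to a
# result string. Real submodules would call the ASR server or an extended service server.
Submodule = Callable[[bytes], str]


class ServiceProxyModule:
    def __init__(self) -> None:
        self._submodules: Dict[str, Submodule] = {}

    def register(self, name: str, submodule: Submodule) -> None:
        """Attach a submodule (e.g. "asr", "translation") without changing the others."""
        self._submodules[name] = submodule

    def process(self, audio: bytes, services: Iterable[str]) -> Dict[str, str]:
        """Run the ASR submodule and/or any requested extended submodules."""
        results: Dict[str, str] = {}
        for name in services:
            if name not in self._submodules:
                raise KeyError(f"no submodule registered for {name!r}")
            results[name] = self._submodules[name](audio)
        return results


proxy = ServiceProxyModule()
proxy.register("asr", lambda audio: "hello world")               # fake ASR server
proxy.register("translation", lambda audio: "bonjour le monde")  # fake translation server

print(proxy.process(b"...", ["asr"]))                 # {'asr': 'hello world'}
print(proxy.process(b"...", ["asr", "translation"]))  # {'asr': 'hello world', 'translation': 'bonjour le monde'}
```

A registry keeps each extension (search, translation, intent recognition) independent of the ASR path, so attaching a new service module means one `register` call rather than a change to the ASR service itself.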

In the above aspects and any possible implementation, the service proxy module is further configured to interact with a control server to control the media device.

In the above aspects and any possible implementation, the control server is an authentication/flow control/charging/security server;

the service proxy module further comprises:

an authentication submodule, configured to interact with the authentication server to authenticate the media device;

a flow control submodule, configured to interact with the flow control server to perform flow control on the media device;

a charging submodule, configured to interact with the charging server to charge the media device; and

a security submodule, configured to interact with the security server to provide security services for the media device.
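One way to picture the control submodules is as a chain of checks that a request must pass before the service proxy module dispatches it. The sketch below is hypothetical: the in-memory account set, device blocklist and per-device request quota stand in for the authentication, security and flow control servers, whose actual policies the patent does not specify. Charging is left out because it records usage rather than admitting or rejecting a request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Request:
    device_id: str
    account: str


# A control check returns None to let the request through, or a rejection reason.
ControlCheck = Callable[[Request], Optional[str]]


def auth_check(known_accounts: Set[str]) -> ControlCheck:
    def check(req: Request) -> Optional[str]:
        return None if req.account in known_accounts else "unknown account"
    return check


def security_check(blocked_devices: Set[str]) -> ControlCheck:
    def check(req: Request) -> Optional[str]:
        return "blocked device" if req.device_id in blocked_devices else None
    return check


def flow_control_check(max_requests: int) -> ControlCheck:
    used: Dict[str, int] = {}  # requests admitted so far, per media device

    def check(req: Request) -> Optional[str]:
        if used.get(req.device_id, 0) >= max_requests:
            return "quota exceeded"
        used[req.device_id] = used.get(req.device_id, 0) + 1
        return None
    return check


def run_controls(req: Request, checks: List[Tuple[str, ControlCheck]]) -> Optional[str]:
    """Apply the checks in order; the first rejection stops the request."""
    for name, check in checks:
        reason = check(req)
        if reason is not None:
            return f"{name}: {reason}"
    return None


controls = [
    ("authentication", auth_check({"alice"})),
    ("security", security_check({"dev-bad"})),
    ("flow control", flow_control_check(max_requests=1)),  # placed last on purpose
]
print(run_controls(Request("dev-1", "alice"), controls))  # None (admitted)
print(run_controls(Request("dev-1", "alice"), controls))  # flow control: quota exceeded
print(run_controls(Request("dev-2", "bob"), controls))    # authentication: unknown account
```

Flow control sits last so that a request already rejected by authentication or security does not consume the device's quota.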

According to another aspect of the present application, there is provided a voice recognition service proxy method based on the above voice recognition service proxy server, the method including:

the proxy server receives a voice stream processing request sent by the media device;

the proxy server interacts with a voice service server to process the voice stream and obtain a processing result;

and the proxy server sends the processing result to the service system application corresponding to the voice service request.

In the above aspect and any possible implementation, the receiving, by the proxy server, of the voice stream processing request sent by the media device comprises:

the MRCP proxy module of the proxy server receives the voice stream processing request sent by the media device and sends it to the service proxy module for processing.

In the above aspect and any possible implementation, the interacting with a voice service server to process the voice stream comprises:

the service proxy module of the proxy server receives the voice stream processing request, interacts with the voice service server to process the voice stream, and returns the processing result to the MRCP proxy module.

In the above aspect and any possible implementation, the sending of the processing result to the service system application corresponding to the voice service request comprises:

the MRCP proxy module sends the processing result to the service system application corresponding to the voice service request.

In the above aspect and any possible implementation, the receiving, by the service proxy module, of the voice stream processing request and the interacting with a voice service server to process the voice stream comprise:

sending an ASR request to the ASR server and receiving the speech recognition result returned by the ASR server; and/or

sending an extended service request to the extended service server and receiving the extended service processing result returned by the extended service server.

In the above aspects and any possible implementation, the method further includes:

the service proxy module interacts with a control server to control the media device.

The interacting, by the service proxy module, with the control server to control the media device comprises:

interacting with the authentication server to authenticate the media device;

interacting with the flow control server to perform flow control on the media device;

interacting with the charging server to charge the media device;

and interacting with the security server to provide security services for the media device.

In another aspect of the present invention, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the program.

In another aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method as set forth above.

In summary, the solution of the invention adds an MRCP proxy service and a service proxy service between the media device and the ASR service. First, this decouples the media device from the ASR service and facilitates service expansion; second, the service proxy service performs authentication, flow control, security, anti-cheating and similar checks on ASR service requests, so the solution is well suited to public clouds; and third, other service modules can be attached under the service proxy service to extend the ASR service (for example with search, translation and intent recognition).

[ description of the drawings ]

FIG. 1 is a schematic diagram of an embodiment of a speech recognition service proxy server according to the present invention;

FIG. 2 is a flow chart of a speech recognition service proxy method of the present invention;

FIG. 3 illustrates a block diagram of an exemplary computer system/server 012 suitable for use in implementing embodiments of the invention.

[ detailed description of embodiments ]

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Fig. 1 is a schematic diagram of an implementation of the speech recognition service proxy server of the present invention. As shown in Fig. 1, it comprises a media device, a speech recognition proxy server, a service system, an ASR server, an extended service server and a control server.

The media device is connected to the speech recognition proxy server, which is in turn connected to the service system application, the ASR server, the extended service server and the control server. Specifically, the speech recognition proxy server includes an MRCP proxy module and a service proxy module: the media device is connected to the MRCP proxy module; the MRCP proxy module is connected to the service proxy module and to the service system; and the service proxy module is connected to the ASR server, the extended service server and the control server.

The speech recognition proxy server is configured to receive a voice stream processing request sent by the media device, interact with a voice service server to process the voice stream and obtain a processing result, and send the processing result to the service system application corresponding to the voice service request.

The speech recognition proxy server includes:

an MRCP proxy module, configured to receive the voice stream processing request sent by the media device, send it to the service proxy module for processing, receive the processing result from the service proxy module, and send the processing result to the service system application corresponding to the voice service request; and

a service proxy module, configured to receive the voice stream processing request from the MRCP proxy module, interact with a voice service server to process the voice stream, and return the processing result to the MRCP proxy module.

Preferably, the MRCP proxy module is connected to the media device, and the media device sends a voice stream processing request, such as an ASR processing request, to the MRCP proxy module.

The MRCP proxy module then sends the voice stream processing request to the service proxy module for processing.

The service proxy module comprises an ASR processing submodule, configured to generate a corresponding voice service request from the voice stream processing request and send it to the corresponding voice service server. For example, from an ASR processing request it generates an ASR request, sends the ASR request to an ASR server for processing, and receives the speech recognition result returned by the ASR server.

In this embodiment, the service proxy module is also connected to an extended service server to provide an extended voice service, such as a translation service. The service proxy module comprises an extended service processing submodule: when the voice stream processing request sent by the media device and forwarded by the MRCP proxy module is a translation processing request, the submodule generates a translation request, sends it to a translation server for processing, and receives the translation result returned by the translation server.

After receiving the processing result from the ASR server and/or the extended service server, the service proxy module sends it to the MRCP proxy module, which sends it to the service system application corresponding to the voice service request.

Alternatively, the MRCP proxy module sends the processing result to the media device, and the media device sends it to the service system application.

Because the proxy server is transparent to the media device, the user perceives no difference from the existing workflow, in which the media device sends the voice stream processing request to the ASR server, receives the speech recognition result from the ASR server, and sends that result to the service system application.

In a preferred implementation of this embodiment, to provide the authentication, flow control, charging, security, anti-cheating and similar mechanisms that a public cloud requires, the service proxy module connects to control servers that provide these functions and interacts with them to control the media device.

Preferably, the voice stream processing request further includes an authentication request, together with the account and password of the user logged in to the media device.

In this embodiment, the service proxy module further includes: an authentication submodule, configured to interact with the authentication server to authenticate the media device; a flow control submodule, configured to interact with the flow control server to perform flow control on the media device; a charging submodule, configured to interact with the charging server to charge the media device; and a security submodule, configured to interact with the security server to provide security services for the media device.

The authentication submodule sends an authentication request, containing the account and password of the user logged in to the media device, to the authentication server. The authentication server authenticates the user against the account and password and, if they are valid, returns an authentication-passed signal to the authentication submodule. After successful authentication, the service proxy module generates a corresponding voice service request from the voice stream processing request and sends it to the corresponding ASR server and/or extended service server. After receiving the processing result from the ASR server and/or extended service server, it sends the result to the MRCP proxy module, which sends it to the service system application corresponding to the voice service request.
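The authentication gate described above can be sketched as follows. The patent only says that the authentication server checks the account and password; storing a salted PBKDF2 hash per account and comparing digests in constant time is an assumption chosen here as common practice, and `AuthenticationServer`, `handle_request` and `fake_asr` are hypothetical names.

```python
import hashlib
import hmac
import os
from typing import Callable, Dict, Optional, Tuple

ITERATIONS = 100_000  # illustrative PBKDF2 work factor


class AuthenticationServer:
    """Hypothetical authentication server storing a salted hash per account, never the password."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[bytes, bytes]] = {}

    def register(self, account: str, password: str) -> None:
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        self._records[account] = (salt, digest)

    def authenticate(self, account: str, password: str) -> bool:
        record = self._records.get(account)
        if record is None:
            return False
        salt, expected = record
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return hmac.compare_digest(digest, expected)  # constant-time comparison


def handle_request(auth: AuthenticationServer, account: str, password: str,
                   audio: bytes, dispatch: Callable[[bytes], str]) -> Optional[str]:
    """Service proxy flow: only an authenticated request reaches the voice service servers."""
    if not auth.authenticate(account, password):
        return None  # rejected: no voice service request is generated
    return dispatch(audio)


def fake_asr(audio: bytes) -> str:
    return "recognized text"  # stand-in for the ASR server


auth = AuthenticationServer()
auth.register("alice", "s3cret")
print(handle_request(auth, "alice", "s3cret", b"...", fake_asr))  # recognized text
print(handle_request(auth, "alice", "wrong", b"...", fake_asr))   # None
```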

Preferably, when the resources of the ASR server and/or the extended service server are ready, that server returns a connection-success message to the service proxy module. On receiving this message, the service proxy module establishes a connection with the ASR server and/or the extended service server; at the same time, the charging submodule sends a charging-start signal to the charging server, and the flow control submodule sends a flow-control-start signal to the flow control server. When the ASR server and/or the extended service server finishes the user's voice service request, it returns a recognition-complete message to the service proxy module. On receiving this message, the service proxy module sends a request to release the resource link to the ASR server and/or the extended service server; at the same time, the charging submodule sends a charging-stop message to the charging server, and the flow control submodule sends a flow-control-stop message to the flow control server. Preferably, the charging server charges according to the duration or the data volume of the voice service request.
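The message sequence above can be modeled as two event handlers on a per-request session object. This is a sketch under assumptions: the in-memory `ChargingServer` and `FlowControlServer` classes, the per-second and per-KiB rates, and the caller-supplied timestamps are all hypothetical; the patent specifies only that charging may follow duration or data volume.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChargingServer:
    """Hypothetical charging server billing by session duration or by audio data volume."""
    mode: str = "duration"   # "duration" (per second) or "data" (per KiB); illustrative
    rate: float = 0.01       # price per second or per KiB; illustrative
    started_at: Dict[str, float] = field(default_factory=dict)

    def start(self, session_id: str, now: float) -> None:
        self.started_at[session_id] = now

    def stop(self, session_id: str, now: float, audio_bytes: int) -> float:
        begun = self.started_at.pop(session_id)
        units = now - begun if self.mode == "duration" else audio_bytes / 1024
        return round(units * self.rate, 4)


@dataclass
class FlowControlServer:
    """Hypothetical flow control server counting the sessions currently in progress."""
    active: int = 0

    def start(self, session_id: str) -> None:
        self.active += 1

    def stop(self, session_id: str) -> None:
        self.active -= 1


@dataclass
class ProxySession:
    """One voice service request as seen by the service proxy module."""
    session_id: str
    charging: ChargingServer
    flow_control: FlowControlServer
    log: List[str] = field(default_factory=list)

    def on_connection_success(self, now: float) -> None:
        self.log.append("connect to ASR/extended service server")
        self.charging.start(self.session_id, now)   # charging submodule: start charging
        self.flow_control.start(self.session_id)    # flow control submodule: start flow control

    def on_recognition_complete(self, now: float, audio_bytes: int) -> float:
        self.log.append("release resource link")
        self.flow_control.stop(self.session_id)     # stop flow control
        return self.charging.stop(self.session_id, now, audio_bytes)  # stop charging, get amount


flow = FlowControlServer()
session = ProxySession("s1", ChargingServer(mode="duration", rate=0.01), flow)
session.on_connection_success(now=100.0)
print(flow.active)                                                    # 1
print(session.on_recognition_complete(now=130.0, audio_bytes=64000))  # 0.3
```

Pairing each "start" with its "stop" inside the same two handlers keeps charging and flow control in step with the resource link, so a session cannot be billed after its link has been released.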

Preferably, in this embodiment, because accessing third-party service servers over a public network introduces security risks (security of session establishment, protection of control sessions, protection of media sessions, indirect content access, and protection of stored media files), the security submodule of the service proxy module is connected to a security server, which provides security services.
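The risk categories listed above mirror the security considerations of the MRCPv2 specification (RFC 6787). One standard measure for protecting a control session with a third-party server is TLS. The sketch below assumes the security submodule opens such connections with Python's standard `ssl` module; the function names are hypothetical, and this covers only the control-channel part of what a security server might provide.

```python
import socket
import ssl


def make_secure_context() -> ssl.SSLContext:
    """TLS client settings for the proxy's connections to third-party service servers."""
    ctx = ssl.create_default_context()              # verifies certificates and host names
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2    # refuse older, weaker protocol versions
    return ctx


def open_secure_channel(host: str, port: int, timeout: float = 5.0) -> ssl.SSLSocket:
    """Open a TLS-protected TCP connection; needs a reachable server, so not run here."""
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return make_secure_context().wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # do not leak the plain socket if the handshake fails
        raise


ctx = make_secure_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```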

Preferably, the authentication, flow control, charging and security servers may also be integrated into the service proxy module, which then provides authentication, flow control, charging and security services directly.

With the proxy server of this embodiment, an MRCP proxy service and a service proxy service are added between the media device and the ASR service. First, this decouples the media device from the ASR service and facilitates service expansion; second, the service proxy service performs authentication, flow control, security, anti-cheating and similar checks on ASR service requests, so the solution is well suited to public clouds; and third, other service modules can be attached under the service proxy service to extend the ASR service (for example with search, translation and intent recognition).

In the embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processor, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

Fig. 2 is a flowchart of a voice recognition service proxy method based on the voice recognition service proxy server of the present invention, as shown in fig. 2, the method includes:

Step S21: the proxy server receives the voice stream processing request sent by the media device;

Step S22: the proxy server interacts with the voice service server to process the voice stream and obtain a processing result;

Step S23: the proxy server sends the processing result to the service system application corresponding to the voice service request.

In a preferred implementation of step S21,

the MRCP proxy module of the proxy server receives the voice stream processing request sent by the media device.

Preferably, the MRCP proxy module is connected to the media device. The media device sends a voice stream processing request, such as an ASR processing request, to the MRCP proxy module, and the MRCP proxy module sends it to the service proxy module of the proxy server for processing.
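MRCPv2 (RFC 6787) is a text protocol: a request begins with a start line of the form `MRCP/2.0 message-length method-name request-id`, followed by header fields such as `Channel-Identifier`, whose part after `@` names the resource type (`speechrecog` for a recognizer). A minimal sketch of how the MRCP proxy module might recognize an ASR processing request is shown below. Assumptions: only request messages are handled, `message-length` is not validated, and `ROUTES` is a hypothetical mapping to service proxy submodules. Session setup (SIP/SDP) and the RTP audio path are out of scope.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class MrcpRequest:
    version: str
    message_length: int
    method: str
    request_id: int
    headers: Dict[str, str]  # header names lower-cased for case-insensitive lookup
    body: str

    @property
    def resource(self) -> str:
        # Channel-Identifier has the form "<channel-id>@<resource-type>".
        return self.headers["channel-identifier"].split("@", 1)[1]


def parse_mrcp_request(raw: str) -> MrcpRequest:
    """Parse an MRCPv2 request (start line, headers, optional body); message-length is not checked."""
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    parts = start_line.split(" ")
    if len(parts) != 4 or not parts[0].startswith("MRCP/"):
        raise ValueError(f"not an MRCP request line: {start_line!r}")
    version, length, method, request_id = parts
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return MrcpRequest(version, int(length), method, int(request_id), headers, body)


# Hypothetical routing table: (resource type, method) -> service proxy submodule.
ROUTES: Dict[Tuple[str, str], str] = {("speechrecog", "RECOGNIZE"): "asr"}


def route(req: MrcpRequest) -> str:
    return ROUTES[(req.resource, req.method)]


raw = (
    "MRCP/2.0 0 RECOGNIZE 543257\r\n"  # message-length left as 0: this sketch ignores it
    "Channel-Identifier:32AECB23433801@speechrecog\r\n"
    "Content-Type:text/uri-list\r\n"
    "\r\n"
    "http://example.com/grammar.grxml"
)
req = parse_mrcp_request(raw)
print(req.method, req.request_id, req.resource, route(req))  # RECOGNIZE 543257 speechrecog asr
```

A response or event start line carries five tokens rather than four, so the length check rejects it instead of misreading a status code as a request ID.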

In a preferred implementation of step S22,

the proxy server interacts with the voice service server to process the voice stream and obtain a processing result.

Preferably, the service proxy module of the proxy server receives the voice stream processing request sent by the MRCP proxy module and interacts with the voice service server to process the voice stream.

Preferably, the service proxy module includes a processing submodule, configured to generate a corresponding voice service request from the voice stream processing request and send it to the corresponding voice service server. For example, the ASR processing submodule generates an ASR request from the ASR processing request, sends it to an ASR server for processing, and receives the speech recognition result returned by the ASR server.

In this embodiment, the service proxy module is also connected to an extended service server to provide an extended voice service, such as a translation service. The service proxy module comprises an extended service processing submodule: when the voice stream processing request sent by the media device and forwarded by the MRCP proxy module is a translation processing request, the submodule generates a translation request, sends it to a translation server for processing, and receives the translation result returned by the translation server.

In a preferred implementation of step S23,

preferably, the MRCP proxy module sends the processing result received from the service proxy module to the service system application.

Alternatively, the MRCP proxy module sends the processing result received from the service proxy module to the media device, and the media device sends it to the service system application.
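The two delivery paths of step S23 can be sketched as a single dispatch on a delivery mode. `DeliveryMode`, `MediaDevice` and `deliver_result` are hypothetical names; the callables stand in for whatever transport actually reaches the service system application.

```python
from enum import Enum
from typing import Callable, List


class DeliveryMode(Enum):
    DIRECT = "direct"                # MRCP proxy module -> service system application
    VIA_MEDIA_DEVICE = "via-device"  # MRCP proxy module -> media device -> application


class MediaDevice:
    """Hypothetical media device that relays results it receives to the application."""

    def __init__(self, forward_to_app: Callable[[str], None]) -> None:
        self.forward_to_app = forward_to_app

    def on_result(self, result: str) -> None:
        self.forward_to_app(result)


def deliver_result(result: str, mode: DeliveryMode,
                   app: Callable[[str], None], device: MediaDevice) -> None:
    """Step S23 for the MRCP proxy module: pick one of the two delivery paths."""
    if mode is DeliveryMode.DIRECT:
        app(result)
    else:
        device.on_result(result)


received: List[str] = []
device = MediaDevice(forward_to_app=received.append)
deliver_result("text A", DeliveryMode.DIRECT, received.append, device)
deliver_result("text B", DeliveryMode.VIA_MEDIA_DEVICE, received.append, device)
print(received)  # ['text A', 'text B']
```

From the application's point of view both paths yield the same result; the indirect path keeps the media device in the loop, which is what makes the proxy transparent to existing deployments.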

With this method, the service proxy service performs authentication, flow control, security, anti-cheating and similar checks on ASR service requests, so the method is well suited to public clouds; and other service modules can be attached under the service proxy service to extend the ASR service (for example with search, translation and intent recognition).

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

The above is a description of method embodiments, and the embodiments of the present invention are further described below by way of apparatus embodiments.

Fig. 3 illustrates a block diagram of an exemplary computer system/server 012 suitable for use in implementing embodiments of the invention. The computer system/server 012 shown in fig. 3 is only an example, and should not bring any limitations to the function and the scope of use of the embodiments of the present invention.

As shown in FIG. 3, the computer system/server 012 takes the form of a general-purpose computing device. The components of computer system/server 012 may include, but are not limited to: one or more processors or processing units 016, a system memory 028, and a bus 018 that couples the various system components, including the system memory 028 and the processing unit 016.

Bus 018 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Computer system/server 012 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 012 and includes both volatile and nonvolatile media, removable and non-removable media.

System memory 028 can include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 030 and/or cache memory 032. The computer system/server 012 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 034 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 3, commonly referred to as a "hard drive"). Although not shown in FIG. 3, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be connected to bus 018 via one or more data media interfaces. Memory 028 can include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the present invention.

Program/utility 040 having a set (at least one) of program modules 042 can be stored, for example, in memory 028, such program modules 042 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof might include an implementation of a network environment. Program modules 042 generally perform the functions and/or methodologies of embodiments of the present invention as described herein.

The computer system/server 012 may also communicate with one or more external devices 014 (e.g., a keyboard, a pointing device, a display 024, etc.). In the present invention, the computer system/server 012 communicates with an external radar device, and may also communicate with one or more devices that enable a user to interact with the computer system/server 012, and/or with any device (e.g., a network card, a modem, etc.) that enables the computer system/server 012 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 022. Also, the computer system/server 012 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 020. As shown in FIG. 3, the network adapter 020 communicates with the other modules of the computer system/server 012 via bus 018. It should be appreciated that, although not shown in FIG. 3, other hardware and/or software modules may be used in conjunction with the computer system/server 012, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

The processor 016 executes programs stored in the system memory 028 to perform the functions and/or methods of the described embodiments of the present invention.

The computer program described above may be provided in a computer storage medium encoded with a computer program that, when executed by one or more computers, causes the one or more computers to perform the method flows and/or apparatus operations shown in the above-described embodiments of the invention.

As time and technology progress, the meaning of "media" has broadened: computer programs are no longer distributed only on tangible media and may also be downloaded directly from a network. Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in an actual implementation. For instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A speech recognition service proxy server, the server comprising:

an MRCP agent module, configured to receive a voice stream processing request sent by a media device; send the voice stream processing request to a service agent module for processing; receive a processing result from the service agent module; and send the processing result to a service system application corresponding to a voice service request, wherein the voice service request is generated by the service agent module according to the voice stream processing request;

a service agent module, configured to receive the voice stream processing request from the MRCP agent module; interact with a voice service server to process the voice stream; and return the processing result to the MRCP agent module;

wherein the media device is connected to the MRCP agent module; the MRCP agent module is connected to the service agent module and to the service system; and the service agent module is connected to each of the ASR server, the extended service server and the control server.
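The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patentee's implementation: the class names, the `app_id` routing key, the dictionary result format, and the stub callables standing in for the ASR server and extended service server are all hypothetical. It only shows the MRCP agent module forwarding a media-device request to the service agent module, the service agent module generating a voice service request and calling ASR plus optional extended services, and the result being delivered to the matching service system application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class VoiceStreamRequest:
    """Hypothetical voice stream processing request sent by a media device."""
    app_id: str   # identifies the target service system application (assumption)
    audio: bytes  # raw audio payload of the voice stream


@dataclass
class VoiceServiceRequest:
    """Hypothetical voice service request generated by the service agent module."""
    app_id: str
    audio: bytes
    services: List[str] = field(default_factory=lambda: ["asr"])


class ServiceAgentModule:
    """Interacts with the voice service servers (ASR and/or extended services)."""

    def __init__(self, asr: Callable[[bytes], str],
                 extended: Optional[Dict[str, Callable[[str], str]]] = None):
        self.asr = asr                  # stub for the ASR server: audio -> text
        self.extended = extended or {}  # stubs for extended services, e.g. translation

    def handle(self, req: VoiceStreamRequest, services: List[str]) -> Dict[str, str]:
        # Generate the voice service request from the voice stream request.
        svc_req = VoiceServiceRequest(req.app_id, req.audio, services)
        text = self.asr(svc_req.audio)
        result = {"app_id": svc_req.app_id, "asr": text}
        # Chain any requested extended services on the recognized text.
        for name in svc_req.services:
            if name != "asr" and name in self.extended:
                result[name] = self.extended[name](text)
        return result


class MrcpAgentModule:
    """Receives media-device requests and routes results to service system apps."""

    def __init__(self, agent: ServiceAgentModule,
                 apps: Dict[str, Callable[[Dict[str, str]], None]]):
        self.agent = agent
        self.apps = apps  # app_id -> callback of the service system application

    def on_request(self, req: VoiceStreamRequest, services: List[str]) -> Dict[str, str]:
        result = self.agent.handle(req, services)
        self.apps[result["app_id"]](result)  # deliver to the corresponding application
        return result
```

Keeping the ASR and extended services behind plain callables mirrors the decoupling the abstract describes: the MRCP side never talks to a speech engine directly, so another service can be added by registering one more entry in `extended`.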

2. The proxy server of claim 1, wherein the voice service server is an ASR server and/or an extended service server.

3. The proxy server of claim 2, wherein the service proxy module comprises:

4. The proxy server of claim 1, wherein the service proxy module is further configured to interact with a control server to implement control of the media device.
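The kind of control described in claims 4 and 5 can be sketched, under stated assumptions, as a check that the service agent module performs against a control server before forwarding audio to ASR. Everything here is hypothetical and for illustration only: the shared-key authentication, the fixed-window request limit used for flow control, and the per-application request counter used for charging. The patent does not specify these mechanisms.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Optional


class ControlServer:
    """Hypothetical authentication / flow-control / charging checks."""

    def __init__(self, api_keys: Dict[str, str], max_requests_per_window: int,
                 window_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.api_keys = api_keys  # app_id -> expected key (assumption)
        self.limit = max_requests_per_window
        self.window = window_seconds
        self.clock = clock
        self.window_start: Dict[str, float] = {}
        self.count: Dict[str, int] = defaultdict(int)
        self.billed: Dict[str, int] = defaultdict(int)  # accepted requests per app

    def check(self, app_id: str, key: str) -> Optional[str]:
        """Return None if the request is allowed, otherwise a rejection reason."""
        if self.api_keys.get(app_id) != key:
            return "unauthenticated"
        now = self.clock()
        start = self.window_start.get(app_id)
        if start is None or now - start >= self.window:
            # Open a new fixed window and reset the counter.
            self.window_start[app_id] = now
            self.count[app_id] = 0
        if self.count[app_id] >= self.limit:
            return "rate_limited"
        self.count[app_id] += 1
        self.billed[app_id] += 1  # charge only requests that pass all checks
        return None


def guarded_recognize(control: ControlServer, app_id: str, key: str,
                      audio: bytes, asr: Callable[[bytes], str]) -> str:
    """Service-agent side: consult the control server before calling ASR."""
    reason = control.check(app_id, key)
    if reason is not None:
        raise PermissionError(reason)
    return asr(audio)
```

A fixed window is the simplest flow-control policy to show; a token bucket or sliding window would smooth bursts better, and the injectable `clock` makes the behaviour easy to test without sleeping.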

5. The proxy server of claim 4,

wherein the control server is an authentication/flow control/charging/security server;

the service agent module further comprises:

6. A speech recognition service proxy method based on the speech recognition service proxy server of any one of claims 1-5, the method comprising:

7. The method according to claim 6, wherein the proxy server receiving the voice stream processing request sent by the media device comprises:

8. The method of claim 7, wherein interacting with a voice service server to perform processing of a voice stream comprises:

9. The method of claim 8, wherein sending the processing result to a service system application corresponding to the voice service request comprises:

10. The method according to claim 9, characterized in that the speech service server is an ASR server and/or an extended service server.

11. The method of claim 10, wherein the service agent module receiving the voice stream processing request and interacting with the voice service server to process the voice stream comprises:

12. The method of claim 9, further comprising:

13. The method of claim 12,

wherein the control server is an authentication/flow control/charging/security server;

14. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements a method as claimed in any one of claims 6 to 13.

15. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 6 to 13.