CN113035226A - Voice call method, communication terminal, and computer-readable medium - Google Patents

Voice call method, communication terminal, and computer-readable medium

Info

Publication number
CN113035226A
Authority
CN
China
Prior art keywords
core network
voice
information
terminal
call method
Prior art date
Legal status
Granted
Application number
CN201911348597.8A
Other languages
Chinese (zh)
Other versions
CN113035226B (en)
Inventor
颜蓓
任鹏
Current Assignee
ZTE Corp
Original Assignee
ZTE Corp
Priority date
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201911348597.8A
Publication of CN113035226A
Application granted
Publication of CN113035226B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present disclosure provides a voice call method, including: acquiring voice content information of a first terminal through a first core network; acquiring voice characteristic information of the first terminal through a second core network; and restoring the original audio according to the voice content information and the voice characteristic information. The present disclosure also provides a communication terminal and a computer readable medium.

Description

Voice call method, communication terminal, and computer-readable medium
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a voice call method, a communication terminal, and a computer-readable medium.
Background
At present, the voice call services of most communication terminals are carried over the Circuit Switched (CS) domain, and voice call quality depends on the sampling rate, the transmission rate, and the audio spectral width. The speech coding schemes currently applied in the circuit-switched domain include AMR-NB (spectral width 100 Hz to 4 kHz, maximum transmission rate 12.2 Kbps), AMR-WB (spectral width 100 Hz to 8 kHz, maximum transmission rate 23.65 Kbps), and EVS-SWB (spectral width 100 Hz to 20 kHz, maximum transmission rate 128 Kbps), whereas lossless audio requires a mono transmission bit rate of more than 192 Kbps. Consequently, for voice services, the 128 Kbps ceiling of the circuit-switched domain only suffices for a normal call: it keeps the speaker's voice clear and achieves relative fidelity, but it cannot meet the transmission requirement of lossless audio, nor can it guarantee that environmental sound, background sound, and other special sounds apart from the voice are restored at the far end of the call.
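As a rough back-of-the-envelope illustration of this gap, the short sketch below compares an uncompressed mono PCM stream with the 128 Kbps ceiling mentioned for EVS-SWB. The CD-quality sampling parameters and the 2:1 lossless-compression ratio are assumptions chosen for illustration, not figures taken from this disclosure.

```python
# Illustrative only: assumed CD-quality mono PCM parameters, not values from the disclosure.
SAMPLE_RATE_HZ = 44_100      # samples per second
BIT_DEPTH = 16               # bits per sample
CHANNELS = 1                 # mono

pcm_kbps = SAMPLE_RATE_HZ * BIT_DEPTH * CHANNELS / 1000   # 705.6 Kbps uncompressed
lossless_kbps = pcm_kbps / 2                              # assume roughly 2:1 lossless compression

print(f"Uncompressed mono PCM: {pcm_kbps:.1f} Kbps")
print(f"Typical lossless estimate: {lossless_kbps:.1f} Kbps")
print("EVS-SWB ceiling: 128 Kbps, well below even lossless mono")
```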
Disclosure of Invention
The present disclosure is directed to solving at least one of the technical problems occurring in the prior art, and provides a voice call method, a communication terminal, and a computer-readable medium.
In order to achieve the above object, in a first aspect, an embodiment of the present disclosure provides a voice call method, including:
acquiring voice content information of a first terminal through a first core network;
acquiring voice characteristic information of the first terminal through a second core network;
and restoring original audio according to the voice content information and the voice characteristic information.
In a second aspect, an embodiment of the present disclosure provides another voice call method, including:
acquiring original audio;
extracting voice content information from the original audio;
sending the voice content information to a second terminal through a first core network;
and controlling a second core network to send the voice feature information to the second terminal.
In a third aspect, an embodiment of the present disclosure provides a communication terminal, including:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the voice call method of any one of the above embodiments.
In a fourth aspect, the disclosed embodiments provide a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the steps of the voice call method described in any of the above embodiments.
The present disclosure has the following beneficial effects:
the disclosed embodiments provide a voice call method, a communication terminal, and a computer readable medium, which can realize that voice content information and voice feature information are respectively transmitted through different networks to improve the utilization rate of each network voice service resource, and realize high-quality voice call.
Drawings
Fig. 1 is a flowchart of a voice call method according to an embodiment of the present disclosure;
fig. 2 is a flowchart of another voice call method provided in the embodiment of the present disclosure;
fig. 3 is a flowchart of another voice call method provided in the embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating an embodiment of step S7 according to the present disclosure;
fig. 5 is a flowchart of another voice call method according to an embodiment of the present disclosure;
fig. 6 is a signaling diagram of another voice call method according to an embodiment of the present disclosure;
fig. 7 is a signaling diagram of another voice call method according to an embodiment of the disclosure.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present disclosure, the voice call method, the communication terminal and the computer readable medium provided by the present disclosure are described in detail below with reference to the accompanying drawings.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but they may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. Thus, a first element, component, or module discussed below could be termed a second element, component, or module without departing from the teachings of the present disclosure.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The voice call method provided by the present disclosure transmits the voice content information and the voice characteristic information to the call peer over different networks, improving the utilization of each network's voice service resources and enabling high-quality voice calls.
Fig. 1 is a flowchart of a voice call method according to an embodiment of the present disclosure. As shown in fig. 1, the method includes:
and step S1, acquiring the voice content information of the first terminal through the first core network.
In step S1, after the call is established, the voice content information of the first terminal is obtained, where the voice content information includes an original audio, that is, an audio collected by the opposite end of the call, and after being extracted by voice recognition, the readable voice content does not include the voice feature information.
In some embodiments, the first core network comprises a 2G core network, a 3G core network, or a 4G core network. When the first core network is a 2G core network or a 3G core network, the step S1 of acquiring the voice content information of the first terminal through the first core network specifically includes:
the voice content information is acquired through a circuit switched domain of the first core network.
Specifically, the voice content information transmitted by the first terminal is received through the circuit-switched domain of a 2G or 3G network. The circuit-switched domain carries the voice services in a 2G or 3G network; a user's voice service occupies dedicated channel resources, which makes it highly stable and secure.
Correspondingly, when the first core network is a 4G core network, the step S1 of acquiring the voice content information of the first terminal through the first core network specifically includes:
The voice content information is acquired through the IP Multimedia Subsystem (IMS) of the first core network.
In some embodiments, on a 4G core network the voice call may be carried either through CS Fallback (CSFB) or through the IP Multimedia Subsystem; when CS Fallback is used, the voice content information sent by the first terminal is in effect received through the circuit-switched domain of a 2G or 3G network.
Step S2: acquiring the voice characteristic information of the first terminal through the second core network.
In some embodiments, the voice feature information includes spectral characteristic information. Generally, the spectral characteristic information characterizes the speaker's timbre, corresponding to the timbre information of the original audio.
In some embodiments, the second core network comprises a 5G core network. The step S2 of acquiring the voice feature information of the first terminal through the second core network specifically includes:
The voice feature information is acquired through a Packet Switched (PS) domain of the second core network.
Specifically, the voice feature information sent by the first terminal is received through the 5G core network. Generally speaking, 4G and 5G core networks do not include a circuit-switched domain; their packet-switched domain shares channel resources among multiple users, which yields a high transmission rate and high resource utilization but cannot guarantee that all data safely reaches the communication peer.
In the embodiments of the present disclosure, the information sent through the packet-switched network is not the voice content information, so even if part of it is lost in the packet-switched network (for example, in an area with poor 5G coverage), the transmission and restoration of the information sent over the circuit-switched network are unaffected. Because the voice content information is transmitted entirely over the circuit-switched network, it is safe and stable, and no key information is lost during the conversation.
Step S3: restoring the original audio according to the voice content information and the voice characteristic information.
In some embodiments, the voice content information and the voice feature information are combined by a corresponding speech synthesis algorithm to restore the original audio.
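A minimal structural sketch of this receive side (steps S1 to S3, plus the optional environmental audio stream introduced in the next embodiment) is given below. The class and function names (VoiceContentInfo, synthesize, restore_audio, and so on) are illustrative assumptions, and the synthesis step is a stub standing in for whatever speech-synthesis algorithm an implementation actually uses; this is not the disclosure's own code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VoiceContentInfo:          # received via the first core network (CS domain or IMS)
    text: str                    # readable speech content extracted by voice recognition

@dataclass
class VoiceFeatureInfo:          # received via the second core network (5G PS domain)
    spectral_params: List[float] # e.g. a spectral envelope describing the speaker's timbre

@dataclass
class EnvironmentAudioInfo:      # optional, also received via the second core network
    samples: List[float]         # residual background / ambient audio

def synthesize(content: VoiceContentInfo, features: VoiceFeatureInfo) -> List[float]:
    """Placeholder for the speech-synthesis algorithm that rebuilds the speaker's
    voice from text plus spectral (timbre) features; returns audio samples."""
    return [0.0] * 160  # stub frame; a real implementation would run TTS-style synthesis

def restore_audio(content: VoiceContentInfo,
                  features: VoiceFeatureInfo,
                  environment: Optional[EnvironmentAudioInfo] = None) -> List[float]:
    """Step S3 / S302: restore the original audio from the two (or three) streams."""
    speech = synthesize(content, features)
    if environment is None:
        return speech
    # Mix the synthesized speech with the ambient audio, padding the shorter stream with silence.
    n = max(len(speech), len(environment.samples))
    speech = speech + [0.0] * (n - len(speech))
    ambient = environment.samples + [0.0] * (n - len(environment.samples))
    return [s + e for s, e in zip(speech, ambient)]
```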
Fig. 2 is a flowchart of another voice call method according to an embodiment of the present disclosure. As shown in fig. 2, this method is a further embodied alternative of the method shown in fig. 1. Specifically, in addition to steps S1 and S2, the method includes step S301 and step S302, where step S302 is a specific implementation of step S3. Only step S301 and step S302 are described in detail below.
Step S301: acquiring the environmental audio information of the first terminal through the second core network.
The environmental audio information is the environmental sound, background sound, lossless music, or other special sound that remains after the speech content has been extracted from the original audio by voice recognition.
Correspondingly, the step S3 of restoring the original audio according to the voice content information and the voice feature information specifically includes:
Step S302: restoring the original audio according to the voice content information, the voice feature information, and the environmental audio information.
Fig. 3 is a flowchart of another voice call method according to an embodiment of the present disclosure. As shown in fig. 3, the method includes:
and step S4, acquiring the original audio.
In some embodiments, after the voice call is established, the original audio from the local speaker is obtained by the audio capture device.
Step S5: extracting the voice content information from the original audio.
The voice content information is extracted from the original audio using voice recognition and corresponding analysis techniques.
Step S6: sending the voice content information to the second terminal through the first core network.
In some embodiments, the first core network comprises a 2G core network, a 3G core network, or a 4G core network. When the first core network is a 2G core network or a 3G core network, the step S6 of sending the voice content information to the second terminal through the first core network specifically includes:
the voice content information is transmitted over a circuit switched domain of the first core network.
Correspondingly, when the first core network is a 4G core network, the step S6 of sending the voice content information to the second terminal through the first core network specifically includes:
The voice content information is transmitted through the IP Multimedia Subsystem of the first core network.
Step S7: controlling the second core network to send the voice feature information to the second terminal.
In some embodiments, the step S7 of controlling the second core network to send the voice feature information to the second terminal includes:
sending a control instruction to the second core network to instruct the second core network to acquire the corresponding voice feature information from a pre-stored database and send it to the second terminal.
Fig. 4 is a flowchart illustrating an embodiment of step S7 according to the present disclosure, in which the second core network comprises a 5G core network. As shown in fig. 4, before the step S7 of controlling the second core network to send the voice feature information to the second terminal, the method further includes:
step S701a, extracting speech feature information from the original audio.
Based on voice recognition and corresponding analysis technology, voice feature information is extracted from original audio.
Correspondingly, the step S7 of controlling the second core network to send the voice feature information to the second terminal specifically includes:
Step S702a: sending the voice feature information to the second terminal through the second core network.
That is, the voice feature information extracted from the original audio in real time is sent to the second terminal through the second core network.
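Both variants of step S7, real-time extraction (steps S701a/S702a) and retrieval from a pre-stored database in response to a control instruction, might be modelled on the network side roughly as in the sketch below. The PrestoredFeatureDB class, the terminal identifiers, and the function names are assumptions for illustration only, not an implementation given by the disclosure.

```python
from typing import Dict, List, Optional

class PrestoredFeatureDB:
    """Sketch of the database the second core network consults when it receives a
    control instruction instead of freshly extracted voice feature information."""
    def __init__(self) -> None:
        self._features: Dict[str, List[float]] = {}

    def store(self, terminal_id: str, spectral_params: List[float]) -> None:
        self._features[terminal_id] = spectral_params

    def lookup(self, terminal_id: str) -> Optional[List[float]]:
        return self._features.get(terminal_id)

def deliver_voice_features(terminal_id: str,
                           db: PrestoredFeatureDB,
                           realtime_features: Optional[List[float]] = None) -> List[float]:
    """Step S7: forward voice feature information toward the second terminal.
    If the first terminal extracted features in real time (Fig. 4 path), forward those;
    otherwise obey the control instruction and fetch them from the pre-stored database."""
    if realtime_features is not None:
        return realtime_features                      # S702a: real-time extraction path
    stored = db.lookup(terminal_id)
    if stored is None:
        raise KeyError(f"no pre-stored voice features for terminal {terminal_id}")
    return stored                                     # control-instruction / database path
```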
The embodiments of the present disclosure provide a voice call method in which, during a voice call, the voice content information is sent to the call peer through the circuit-switched network, ensuring that the speech reaches the peer completely, safely, and stably, while the voice feature information is sent to the call peer through the packet-switched network, making effective use of voice service network resources and increasing the transmission rate; as a result, even if part of the information sent through the packet-switched network is lost, the overall call quality is not affected.
Fig. 5 is a flowchart of another voice call method according to an embodiment of the present disclosure. As shown in fig. 5, this method is a further embodied alternative of the method shown in fig. 3. Specifically, in addition to steps S4 to S7, the method includes steps S8 to S10. Only steps S8 to S10 are described in detail below.
Step S8: extracting the environmental audio information from the original audio.
The environmental audio information is the environmental sound, background sound, or other special sound that remains after the information related to the speaker's voice has been extracted from the original audio using voice recognition and corresponding analysis techniques.
Step S9: sending the environmental audio information to the second terminal through the second core network.
Step S10: sending a synchronization instruction to the first core network and the second core network to instruct them to slice and number the voice content information and the environmental audio information, respectively, according to the synchronization instruction.
In step S10, the first core network corresponds to the circuit-switched domain or the IP Multimedia Subsystem, and the second core network corresponds to the packet-switched domain. The synchronization instruction is sent to the first core network and the second core network to instruct them to slice and number the voice content information and the environmental audio information, respectively. The second terminal therefore receives the sliced and numbered voice content information and environmental audio information and can synthesize them synchronously according to the corresponding numbers. Moreover, even if part of the environmental audio information is lost, the audio can be repaired from the slice data and the corresponding sequence numbers.
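A sketch of the slice-and-number mechanism of step S10 is shown below. The fixed slice length and the silence-based concealment of lost slices are assumptions chosen for illustration rather than parameters from the disclosure.

```python
from typing import Dict, List, Tuple

Slice = Tuple[int, List[float]]   # (slice number, payload)

def slice_and_number(stream: List[float], slice_len: int = 160) -> List[Slice]:
    """Cut a stream into fixed-length slices and tag each one with a sequence number."""
    return [(i, stream[start:start + slice_len])
            for i, start in enumerate(range(0, len(stream), slice_len))]

def reassemble(received: List[Slice], total_slices: int, slice_len: int = 160) -> List[float]:
    """Rebuild the stream at the second terminal, concealing lost slices with silence.
    Because every slice carries a number, the stream stays aligned with the voice
    content information received over the circuit-switched path."""
    by_number: Dict[int, List[float]] = dict(received)
    out: List[float] = []
    for i in range(total_slices):
        out.extend(by_number.get(i, [0.0] * slice_len))   # a missing slice is concealed
    return out

# Usage: drop one environmental-audio slice in transit and reassemble at the receiver.
env_audio = [0.1] * 480
slices = slice_and_number(env_audio)
received = [s for s in slices if s[0] != 1]   # slice #1 is lost on the packet-switched path
restored = reassemble(received, total_slices=len(slices))
assert len(restored) == len(env_audio)
```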
The embodiments of the present disclosure provide a voice call method in which the environmental audio information is sent to the call peer through the packet-switched network, guaranteeing the transmission rate while enabling lossless audio transmission and improving voice call quality.
Fig. 6 is a signaling diagram of another voice call method according to an embodiment of the present disclosure. As shown in fig. 6, the method includes:
BZ01: the first terminal acquires a first original audio (not shown in the figure).
BZ02: the first terminal extracts first voice content information, voice feature information, and first environmental audio information from the first original audio (not shown in the figure).
Extracting the first environmental audio information from the first original audio is only an optional implementation in the embodiments of the present disclosure.
BZ101: the first terminal sends the first voice content information to the circuit-switched domain (based on the circuit-switched domain of a 2G or 3G network).
BZ102: the first terminal sends the voice feature information and the first environmental audio information to the packet-switched domain (based on a 5G core network).
BZ2: the first terminal sends a synchronization indication to the circuit-switched domain and the packet-switched domain.
BZ201: the circuit-switched domain slices and numbers the first voice content information according to the synchronization indication.
BZ2021: the packet-switched domain slices and numbers the first environmental audio information according to the synchronization indication.
BZ2022: the packet-switched domain stores the voice feature information in a database.
BZ301: the circuit-switched domain sends the sliced and numbered first voice content information to the second terminal.
BZ302: the packet-switched domain sends the voice feature information and the sliced and numbered first environmental audio information to the second terminal.
BZ4: the second terminal restores the first original audio from the first voice content information, the voice feature information, and the first environmental audio information according to the corresponding numbers.
Fig. 7 is a signaling diagram of another voice call method according to an embodiment of the disclosure. As shown in fig. 7, the method includes:
BZ501: the first terminal acquires a second original audio (not shown in the figure).
BZ502: the first terminal extracts second voice content information from the second original audio (not shown in the figure).
BZ601: the first terminal sends the second voice content information to the circuit-switched domain.
BZ602: the first terminal sends a control instruction to the packet-switched domain.
BZ701: the circuit-switched domain sends the second voice content information to the second terminal.
BZ7021: the packet-switched domain acquires the voice feature information corresponding to the first terminal from a pre-stored database according to the control instruction.
BZ7022: the packet-switched domain sends the voice feature information to the second terminal.
BZ8: the second terminal restores the second original audio from the second voice content information and the voice feature information.
The embodiments of the present disclosure further provide a communication terminal, including: one or more processors; and a storage device for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the voice call methods of the above embodiments.
The disclosed embodiments also provide a computer readable medium, on which a computer program is stored, which when executed by a processor implements the steps in any of the voice call methods in the above embodiments.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods disclosed above, functional modules/units in the apparatus, may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (16)

1. A voice call method, comprising:
acquiring voice content information of a first terminal through a first core network;
acquiring voice characteristic information of the first terminal through a second core network;
and restoring original audio according to the voice content information and the voice characteristic information.
2. The voice call method according to claim 1, wherein the first core network comprises: a 2G core network, a 3G core network, or a 4G core network;
the second core network includes: 5G core network.
3. The voice call method according to claim 2, wherein, when the first core network is a 2G core network or a 3G core network, the step of acquiring the voice content information of the first terminal through the first core network specifically includes:
obtaining the voice content information through a circuit switched domain of the first core network;
when the first core network is a 4G core network, the step of acquiring the voice content information of the first terminal through the first core network specifically includes:
and acquiring the voice content information through an IP multimedia system of the first core network.
4. The voice call method according to claim 2, wherein the step of acquiring the voice feature information of the first terminal through the second core network specifically includes:
the voice feature information is acquired through a packet switched domain of a second core network.
5. The voice call method according to claim 1, wherein the voice feature information includes: spectral characteristic information.
6. The voice call method according to any one of claims 1 to 5, wherein before the step of restoring original audio according to the voice content information and the voice feature information, the method further comprises:
acquiring the environmental audio information of the first terminal through the second core network;
the step of restoring the original audio according to the voice content information and the voice feature information specifically includes:
and restoring the original audio according to the voice content information, the voice characteristic information and the environment audio information.
7. A voice call method, comprising:
acquiring original audio;
extracting voice content information from the original audio;
sending the voice content information to a second terminal through a first core network;
and controlling a second core network to send the voice feature information to the second terminal.
8. The voice call method according to claim 7, wherein:
the first core network includes: a 2G core network, a 3G core network, or a 4G core network;
the second core network includes: 5G core network.
9. The voice call method according to claim 8, wherein, when the first core network is a 2G core network or a 3G core network, the step of sending the voice content information to the second terminal through the first core network specifically includes:
transmitting the voice content information through a circuit switched domain of the first core network;
when the first core network is a 4G core network, the step of sending the voice content information to the second terminal through the first core network specifically includes:
and sending the voice content information through an IP multimedia system of the first core network.
10. The voice call method according to claim 7, wherein before the step of controlling the second core network to transmit the voice feature information to the second terminal, the method further comprises:
extracting voice characteristic information from the original audio;
the step of controlling the second core network to send the voice feature information to the second terminal specifically includes:
and sending the voice characteristic information to a second terminal through a second core network.
11. The voice call method according to claim 7, wherein the step of controlling the second core network to send the voice feature information to the second terminal specifically includes:
and sending a control instruction to the second core network to instruct the second core network to acquire the corresponding voice characteristic information from a pre-stored database and send the voice characteristic information to the second terminal.
12. The voice call method according to claim 7, wherein the voice feature information includes: spectral characteristic information.
13. The voice call method according to any one of claims 7 to 12, further comprising:
extracting environmental audio information from the original audio;
and sending the environmental audio information to the second terminal through the second core network.
14. The voice call method according to claim 13, further comprising:
and sending a synchronization instruction to the first core network and the second core network to instruct the first core network and the second core network to slice and number the voice content information and the environmental audio information respectively according to the synchronization instruction.
15. A communication terminal, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a voice call method as recited in any one of claims 1 to 14.
16. A computer-readable medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the voice call method according to any one of claims 1 to 14.
CN201911348597.8A 2019-12-24 2019-12-24 Voice communication method, communication terminal and computer readable medium Active CN113035226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911348597.8A CN113035226B (en) 2019-12-24 2019-12-24 Voice communication method, communication terminal and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911348597.8A CN113035226B (en) 2019-12-24 2019-12-24 Voice communication method, communication terminal and computer readable medium

Publications (2)

Publication Number Publication Date
CN113035226A (en) 2021-06-25
CN113035226B (en) 2024-04-23

Family

ID=76452088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911348597.8A Active CN113035226B (en) 2019-12-24 2019-12-24 Voice communication method, communication terminal and computer readable medium

Country Status (1)

Country Link
CN (1) CN113035226B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023193506A1 (en) * 2022-04-08 2023-10-12 中兴通讯股份有限公司 Voice transmission method, terminal and computer-readable storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5537509A (en) * 1990-12-06 1996-07-16 Hughes Electronics Comfort noise generation for digital communication systems
CN1323435A (en) * 1998-10-02 2001-11-21 国际商业机器公司 System and method for providing network coordinated conversational services
JP2003046607A (en) * 2001-08-02 2003-02-14 Kyocera Corp Voice communication terminal and voice communication system
EP1649677A1 (en) * 2003-07-11 2006-04-26 France Telecom Methods and devices for evaluating transmission times and for processing a voice signal received in a terminal connected to a packet network
JP2006319598A (en) * 2005-05-12 2006-11-24 Victor Co Of Japan Ltd Voice communication system
CN101060649A (en) * 2007-05-25 2007-10-24 成都索贝数码科技股份有限公司 High code rate data wireless reliable transmission method and its system
CN101316223A (en) * 2007-05-29 2008-12-03 华为技术有限公司 Mobile communication method, system and equipment
KR20140106479A (en) * 2014-07-28 2014-09-03 전소연 Method for producing lecture text data mobile terminal and monbile terminal using the same
CN104159323A (en) * 2014-09-05 2014-11-19 耿直 User terminal data transmission method and multi-channel transmission user terminal
CN105516635A (en) * 2015-10-28 2016-04-20 努比亚技术有限公司 Video call system, device and method
CN106231317A (en) * 2016-09-29 2016-12-14 三星电子(中国)研发中心 Video processing, coding/decoding method and device, VR terminal, audio/video player system
CN107564535A (en) * 2017-08-29 2018-01-09 中国人民解放军理工大学 A kind of distributed low rate speech call method
CN109065065A (en) * 2018-09-27 2018-12-21 南昌努比亚技术有限公司 Call method, mobile terminal and computer readable storage medium
CN109088813A (en) * 2018-07-19 2018-12-25 平安科技(深圳)有限公司 Verbal announcement transmission method and equipment based on biometric information
KR20190024361A (en) * 2017-08-31 2019-03-08 (주)인스파이어모바일 Management method for managing push to talk service and system using the same

Also Published As

Publication number Publication date
CN113035226B (en) 2024-04-23

Similar Documents

Publication Publication Date Title
US11605394B2 (en) Speech signal cascade processing method, terminal, and computer-readable storage medium
US20190198027A1 (en) Audio frame loss recovery method and apparatus
US7450601B2 (en) Method and communication apparatus for controlling a jitter buffer
US10218856B2 (en) Voice signal processing method, related apparatus, and system
US8422406B2 (en) Identifying callers in telecommunications networks
EP2092726B1 (en) Handling announcement media in a communication network environment
RU2434333C2 (en) Apparatus and method of transmitting sequence of data packets and decoder and apparatus for recognising sequence of data packets
US20070107507A1 (en) Mute processing apparatus and method for automatically sending mute frames
TW200917764A (en) System and method for providing AMR-WB DTX synchronization
CN113035226B (en) Voice communication method, communication terminal and computer readable medium
KR100494555B1 (en) Transmission method of wideband speech signals and apparatus
US20070133589A1 (en) Mute processing apparatus and method
US20070129037A1 (en) Mute processing apparatus and method
CN112449208A (en) Voice processing method and device
TWI282547B (en) A method and apparatus to perform speech recognition over a voice channel
EP3649643A1 (en) Normalization of high band signals in network telephony communications
CA2922654C (en) Methods and apparatus for conducting internet protocol telephony communications
JP2003023499A (en) Conference server device and conference system
CN113206773B (en) Improved method and apparatus relating to speech quality estimation
US8917639B2 (en) Eliminating false audio associated with VoIP communications
KR100913818B1 (en) Recording Apparatus in IP-TELEPHONE SERVICE SYSTEM and method for voice Recoring thereof
US8849654B2 (en) Method, device and system for voice encoding/decoding
US8588079B2 (en) Method of notifying a transmission defect of an audio signal
JP2005073057A (en) Digital speech apparatus
US20080208573A1 (en) Speech Signal Coding

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant