CN112750443A

CN112750443A - Call voice output method and device, storage medium and electronic equipment

Info

Publication number: CN112750443A
Application number: CN201911046857.6A
Authority: CN
Inventors: 刘高森
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2019-10-30
Filing date: 2019-10-30
Publication date: 2021-05-04

Abstract

The present disclosure relates to a call voice output method, device, storage medium and electronic apparatus. The method comprises: when a voice call is triggered, displaying a tone adjusting unit in a voice call interface of the terminal; when a first voice is received through a sound receiving device of the terminal, converting the first voice into a second voice with a target tone according to voice conversion information, the voice conversion information being a tone parameter which is set by a user through the tone adjusting unit and is used for representing the target tone; and outputting the second voice as the call voice of the voice call. By displaying a control for changing the tone in the voice call interface, the tone of the call voice can be converted in real time during the voice call, which keeps the whole call fluent, improves the real-time performance of tone conversion, and enriches the functions of the voice call.

Description

Call voice output method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of application development, and in particular, to a method and an apparatus for outputting a call voice, a storage medium, and an electronic device.

Background

With the continuous development of intelligent terminals, users increasingly demand richer voice call functions. When a user wants to make a voice call in a voice whose tone differs from the caller's own, for example during a game, during ordinary voice communication, or during a call that imitates a character from a children's story, the tone of the voice needs to be converted during the call. In the related art, converting the tone of the call voice usually requires installing a third-party voice conversion application on the call terminal. Before the call starts, the user opens the third-party application and sets the required tone as the tone of the voice output during the call, and then triggers the voice call function so that the call is carried out in the voice with the set tone.

Disclosure of Invention

To overcome the problems in the related art, it is an object of the present disclosure to provide a call voice output method, apparatus, storage medium, and electronic device.

In order to achieve the above object, according to a first aspect of embodiments of the present disclosure, there is provided a call voice output method, including:

when the voice call is triggered, displaying a tone adjusting unit in a voice call interface of the terminal;

when a first voice is received through a sound receiving device of the terminal, converting the first voice into a second voice with a target tone according to voice conversion information, wherein the voice conversion information comprises voice parameters which are set through the tone adjusting unit and used for representing the target tone;

and outputting the second voice as the call voice of the voice call.

Optionally, before the converting the first voice into the second voice with the target tone according to the voice conversion information, the method further includes:

generating the voice conversion information according to operation information of the user for the tone adjusting unit; wherein

the tone adjusting unit includes a sound intensity adjusting unit and a tone selecting unit, and the generating of the voice conversion information according to the operation information of the user for the tone adjusting unit comprises:

generating, according to first operation information of the user for the sound intensity adjusting unit, a voice parameter which corresponds to the first operation information and is used for representing the sound intensities of different frequency points of the voice, as the voice conversion information; and/or

generating, according to second operation information of the user for the tone selecting unit, a voice parameter which corresponds to the second operation information and is used for representing a preset tone, as the voice conversion information.

Optionally, the converting the first voice into a second voice with a target tone according to voice conversion information when the first voice is received through a sound receiving device of the terminal includes:

acquiring the sound received by the sound receiving device at intervals of a preset duration;

if a target sound acquired within a target preset duration contains a human voice, taking the target sound as the first voice, wherein the target preset duration is any one of a plurality of preset durations in the voice call process;

determining the voice conversion information at the starting time point of the target preset duration;

and converting the first voice into the second voice according to the voice conversion information.

Optionally, the method further includes:

acquiring one or more preset timbres input by a user;

and, in response to a user operation, binding the tone selecting unit to one or more of the preset timbres.

Optionally, the obtaining one or more preset timbres input by the user includes:

acquiring a first voice parameter input by a user, and taking the tone corresponding to the first voice parameter as the preset tone; and/or

parsing audio data input by the user into a second voice parameter, and taking the tone corresponding to the second voice parameter as the preset tone.

According to a second aspect of the embodiments of the present disclosure, there is provided a call voice conversion apparatus, the apparatus including:

the unit display module is configured to display the tone adjusting unit in a voice call interface of the terminal when the voice call is triggered;

the voice conversion module is configured to convert a first voice into a second voice with a target tone according to voice conversion information when the first voice is received through a sound receiving device of the terminal, wherein the voice conversion information comprises voice parameters which are set by a user through the tone adjusting unit and used for representing the target tone;

a voice output module configured to output the second voice as a call voice of a voice call.

Optionally, the apparatus further comprises:

an information generation module configured to generate the voice conversion information according to operation information of a user for the tone adjusting unit; wherein

the tone adjusting unit includes: a sound intensity adjusting unit and a tone selecting unit, the information generating module being configured to:

Optionally, the voice conversion module is configured to:

acquiring the sound received by the sound receiving device at intervals of a preset duration;

Optionally, the apparatus further comprises:

a tone acquisition module configured to acquire one or more preset tones input by a user;

and the tone binding module is configured to bind the tone selecting unit with one or more preset tones in response to user operation.

Optionally, the tone obtaining module is configured to:

According to a third aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the call voice output method provided by the first aspect of the embodiments of the present disclosure.

According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:

a memory having a computer program stored thereon;

a processor, configured to execute the computer program in the memory, so as to implement the steps of the call voice output method provided in the first aspect of the embodiments of the present disclosure.

Through the above technical solution, the tone adjusting unit can be displayed in the voice call interface of the terminal when the voice call is triggered; when a first voice is received through a sound receiving device of the terminal, the first voice is converted into a second voice with a target tone according to voice conversion information, the voice conversion information being a tone parameter which is set by a user through the tone adjusting unit and is used for representing the target tone; and the second voice is output as the call voice of the voice call. By displaying a control for changing the tone in the voice call interface, the tone of the call voice can be converted in real time during the voice call, which keeps the whole call fluent, improves the real-time performance of tone conversion, and enriches the functions of the voice call.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:

FIG. 1 is a flow chart illustrating a method of call voice output according to an exemplary embodiment;

FIG. 2 is a flowchart illustrating another call voice output method according to the embodiment shown in FIG. 1;

FIG. 3 is a flow chart of a method for tone conversion of call voice according to the embodiment shown in FIG. 2;

FIG. 4 is a flowchart illustrating still another call voice output method according to the embodiment shown in FIG. 2;

FIG. 5 is a block diagram illustrating a call voice output apparatus according to an exemplary embodiment;

FIG. 6 is a block diagram showing another call voice output apparatus according to the embodiment shown in FIG. 5;

FIG. 7 is a block diagram showing still another call voice output apparatus according to the embodiment shown in FIG. 6;

FIG. 8 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

Fig. 1 is a flowchart illustrating a call voice output method according to an exemplary embodiment, as shown in fig. 1, the method including:

step 101, when the voice call is triggered, displaying a tone color adjusting unit in a voice call interface of the terminal.

For example, in most mobile terminals that support voice calls, or terminals capable of running a voice platform, a call interface corresponding to the voice call appears when the user enters a call target (its phone number or voice account number). The call interface usually displays a plurality of call control buttons such as a hang-up button, a speaker button, a recording button, and a dial pad button. In this step 101, after the user triggers a voice call, the control buttons for changing the tone of the call voice (i.e., the tone adjusting unit described above) may be displayed together with the above call control buttons.

Optionally, the tone adjusting unit includes: a sound intensity adjusting unit which is preset according to the spectral characteristics of voice timbre and is used for adjusting the sound intensity of different frequency points in the voice, and/or a tone selecting unit corresponding to a preset tone.

Specifically, in the embodiment of the present disclosure, the tone adjusting unit may take two forms, namely the sound intensity adjusting unit and the tone selecting unit described above. It can be understood that a human voice corresponds to a sound intensity/spectrum curve: voices with different timbres have different sound intensities at different frequency points, and voices of people of different ages and genders have inherent sound intensity/spectrum characteristics. Within the acceptable sound intensity range of a frequency point, the finer the segmentation granularity, the more accurate the timbre adjustment. Therefore, the sound intensity adjusting unit can be implemented as one or more progress bars, whose two ends correspond respectively to the maximum and minimum acceptable sound intensity of the timbre-related frequency points in the voice. Each progress bar is divided into a plurality of segments, and when the user moves the progress bar, the sound intensity of the frequency points is adjusted segment by segment between the maximum and the minimum, thereby controlling the voice timbre. For example, the sound intensity of the frequency points corresponding to a male voice (or an infant voice) can be set as the left end of the progress bar, and that corresponding to a female voice (or an elderly voice) as the right end, so that the converted voice leans toward the male voice (infant voice) when the adjusting button slides left and toward the female voice (elderly voice) when it slides right. The labels "male voice", "female voice", "elderly voice" and "infant voice" can be displayed as text in the call interface together with the progress bars, so that the user can easily recognize the function of each progress bar.
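
The segmented progress bar described above can be sketched as a linear interpolation between two per-frequency-point gain profiles. This is only an illustrative sketch: the class name `IntensitySlider`, the three-band profiles, and the use of linear interpolation are assumptions and do not come from the disclosure, which does not specify how a slider position maps to sound intensities.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class IntensitySlider:
    """One segmented progress bar (hypothetical model, not from the patent)."""
    left_profile: Tuple[float, ...]   # per-band gain at the left end, e.g. "male voice"
    right_profile: Tuple[float, ...]  # per-band gain at the right end, e.g. "female voice"
    segments: int = 10                # number of discrete steps on the bar

    def gains_at(self, step: int) -> List[float]:
        """Return the per-band gains for a slider step in [0, segments]."""
        step = max(0, min(self.segments, step))  # clamp to the ends of the bar
        t = step / self.segments                 # 0.0 = left end, 1.0 = right end
        return [
            (1.0 - t) * lo + t * hi
            for lo, hi in zip(self.left_profile, self.right_profile)
        ]


# Assumed example: three frequency bands and a four-segment bar.
slider = IntensitySlider(
    left_profile=(1.5, 1.0, 0.5),
    right_profile=(0.5, 1.0, 1.5),
    segments=4,
)
```

A larger `segments` value gives finer steps between the two ends, which corresponds to the statement that finer segmentation allows more accurate timbre adjustment.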

Step 102, when a first voice is received through a sound receiving device of the terminal, the first voice is converted into a second voice with a target tone according to the voice conversion information.

The voice conversion information is the tone color parameters which are set by the user through the tone color adjusting unit and are used for representing the target tone color.

Step 103, outputting the second voice as a call voice of the voice call.

For example, during a voice call, a sound receiving device of the terminal (e.g., a microphone of the mobile terminal) receives the sounds made by the user, and these sounds are output to the other party of the voice call in the form of analog signals and are also output through the earpiece of the terminal. When the user adjusts the tone to be output through the tone adjusting unit during the call, the current tone of the voice can be directly adjusted to the target tone set by the user before the subsequent call voice is output. Because the conversion takes very little time, outputting the converted second voice does not introduce a noticeable delay in sending and receiving the voice. Meanwhile, since the sound received by the sound receiving device is also output through the earpiece of the terminal, the user can judge from what is heard whether the tone conversion has taken effect.

In summary, the present disclosure can display the tone adjusting unit in the voice call interface of the terminal when the voice call is triggered; when a first voice is received through a sound receiving device of the terminal, the first voice is converted into a second voice with a target tone according to voice conversion information, the voice conversion information being a tone parameter which is set by a user through the tone adjusting unit and is used for representing the target tone; and the second voice is output as the call voice of the voice call. By displaying a control for changing the tone in the voice call interface, the tone of the call voice can be converted in real time during the voice call, which keeps the whole call fluent, improves the real-time performance of tone conversion, and enriches the functions of the voice call.

Fig. 2 is a flowchart illustrating another call voice output method according to the embodiment shown in fig. 1, and as shown in fig. 2, the method further includes:

step 104, generating the voice conversion information according to the operation information of the user for the tone adjusting unit.

Illustratively, the tone adjusting unit includes the sound intensity adjusting unit and the tone selecting unit, and this step 104 may comprise: generating, according to first operation information of the user for the sound intensity adjusting unit, a voice parameter which corresponds to the first operation information and is used for representing the sound intensities of different frequency points of the voice, as the voice conversion information; and/or generating, according to second operation information of the user for the tone selecting unit, a voice parameter which corresponds to the second operation information and is used for representing a preset tone, as the voice conversion information.
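
A minimal sketch of step 104 follows, assuming the first operation information arrives as a list of per-frequency-point gains and the second as the name of a preset tone. The type `VoiceConversionInfo`, the function `generate_conversion_info`, and both input representations are hypothetical; the disclosure does not define a concrete data format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class VoiceConversionInfo:
    """Hypothetical container for the voice conversion information."""
    band_gains: Optional[Tuple[float, ...]] = None  # from the sound intensity adjusting unit
    preset_name: Optional[str] = None               # from the tone selecting unit


def generate_conversion_info(
    first_op: Optional[Sequence[float]] = None,
    second_op: Optional[str] = None,
) -> VoiceConversionInfo:
    """Build conversion info from either operation or both (the "and/or" case)."""
    if first_op is None and second_op is None:
        raise ValueError("at least one kind of operation information is required")
    gains = tuple(float(g) for g in first_op) if first_op is not None else None
    return VoiceConversionInfo(band_gains=gains, preset_name=second_op)
```

Keeping both fields optional mirrors the "and/or" wording: a caller may supply slider gains, a preset name, or both.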

Fig. 3 is a flowchart of a method for tone conversion of call voice according to the embodiment shown in fig. 2, where, as shown in fig. 3, the step 102 includes:

step 1021, acquiring the sound received by the sound receiving device at intervals of a preset duration.

For example, during a voice call, the time a caller takes to speak each word and the total duration of the call are unpredictable. Therefore, the whole call can be divided into a plurality of segments by time period, and the sound received in each segment is acquired for subsequent processing. The duration of each segment is the preset duration.

In step 1022, if the target sound acquired within the target preset duration contains a human voice, the target sound is used as the first voice.

The target preset duration is any preset duration in a plurality of preset durations in the voice call process.

Illustratively, the target sound includes all sounds received by the sound receiving device within the target preset duration, including human voice and various kinds of ambient noise. Tone adjustment changes the whole frequency spectrum of all sounds within a period of time. If ambient noise occurs during the call and the tone adjustment is applied to it, the output may be an uncontrollable sound that degrades the call or even damages the user's eardrum. To avoid this, before the voice conversion, the voice uttered by a person, namely the first voice, can be identified from the sounds collected within the preset duration by a preset voice recognition algorithm.

In step 1023, the voice conversion information is determined at the starting time point of the target preset time duration.

Step 1024, converting the first voice into the second voice according to the target tone corresponding to the voice conversion information.

Specifically, when a call starts, the tone of the voice output within the first preset duration may be determined according to the initial position of the tone conversion button or the initial operation instruction (corresponding to the initial voice conversion information). When the first preset duration ends and the second preset duration begins (i.e., at the starting time point of the target preset duration), the position or operation instruction of the tone conversion button is obtained again. It should be noted that, regardless of whether the position or operation instruction has changed since the starting time point of the first preset duration, it must be obtained again at this time point so as to obtain the voice conversion information for this time point; the voice is then output within the second preset duration with the target tone corresponding to that voice conversion information. For example, within the first preset duration the first voice is converted into a voice with timbre A (determined according to the initial voice conversion information) and then output. At the starting time point of the second preset duration, new voice conversion information is obtained in any case, and within the second preset duration the first voice is converted into a voice with timbre B (determined according to the new voice conversion information), i.e., the second voice, and output. The process continues in the same way until the whole voice call is completed. It will be appreciated that the preset duration may be set small enough, for example 1 second, according to the processing performance of the terminal, so that the tone change does not lag noticeably behind the user's adjustment of the tone conversion button.
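
Steps 1021 to 1024 can be sketched as a loop over fixed-length segments that re-reads the current setting at the start of every segment. The sketch below rests on several assumptions that are not in the disclosure: segments are lists of float samples, the "preset voice recognition algorithm" is replaced by a crude mean-energy gate (`contains_voice`), and segments without voice are passed through unchanged.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

Setting = TypeVar("Setting")


def contains_voice(segment: Sequence[float], threshold: float = 0.01) -> bool:
    """Stand-in voice check (assumption): mean signal energy above a threshold."""
    if not segment:
        return False
    energy = sum(s * s for s in segment) / len(segment)
    return energy > threshold


def process_call(
    segments: Iterable[Sequence[float]],
    read_setting: Callable[[], Setting],
    convert: Callable[[Sequence[float], Setting], List[float]],
) -> List[List[float]]:
    """Convert each preset-duration segment with the setting read at its start."""
    output: List[List[float]] = []
    for segment in segments:
        # Step 1023: always re-read the setting at the segment's starting time
        # point, whether or not the user has changed it since the last segment.
        setting = read_setting()
        if contains_voice(segment):
            # Steps 1022 and 1024: the segment is the first voice; convert it.
            output.append(convert(segment, setting))
        else:
            # Assumption: non-voice segments are forwarded without conversion.
            output.append(list(segment))
    return output
```

For instance, with `convert` defined as a plain gain (`lambda seg, g: [s * g for s in seg]`) and `read_setting` returning the current slider value, a change made by the user during one segment takes effect from the next segment onward, which is why a short preset duration keeps the adjustment responsive.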

Fig. 4 is a flowchart illustrating still another call voice output method according to the embodiment shown in fig. 2, and as shown in fig. 4, the method further includes:

step 105, acquiring one or more preset timbres input by a user.

Illustratively, this step 105 may include: in a first mode, acquiring a first voice parameter input by a user, and taking the tone corresponding to the first voice parameter as the preset tone; and/or, in a second mode, parsing audio data input by the user into a second voice parameter, and taking the tone corresponding to the second voice parameter as the preset tone.

For example, the first voice parameter may be a sound intensity parameter or a spectrum curve parameter that the user adjusts or inputs through a button or knob displayed in the terminal interface, and the tone required by the user is then determined according to the first voice parameter. Alternatively, the user may record a piece of audio data through the recording function of the terminal device, and the terminal device may parse the voice in the audio data to obtain the voice parameter related to the tone of that voice (the second voice parameter), and then determine (or simulate) the tone required by the user according to the second voice parameter. It can be understood that most terminal users are ordinary users without theoretical knowledge of timbre, so it is difficult for them to know the voice parameters of a specific target tone and therefore difficult to set the target tone manually. With the second mode, the terminal can analyze and simulate the tone required by the user.
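
One way to picture the second mode is to measure how the energy of a recorded clip is distributed over a few frequency points and use the normalized result as the second voice parameter. The following is a rough sketch under that assumption only: the disclosure does not say how audio is parsed, and the function names, the choice of frequency points, and the single-bin DFT measurement are all illustrative.

```python
import cmath
import math
from typing import List, Sequence


def band_power(samples: Sequence[float], freq_hz: float, sample_rate: float) -> float:
    """Power of `samples` at one frequency, computed as a single-bin DFT."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    acc = sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(samples))
    return abs(acc) ** 2


def timbre_profile(
    samples: Sequence[float], sample_rate: float, freqs_hz: Sequence[float]
) -> List[float]:
    """Relative sound intensity at each chosen frequency point (sums to 1)."""
    powers = [band_power(samples, f, sample_rate) for f in freqs_hz]
    total = sum(powers)
    if total == 0.0:
        return [0.0] * len(powers)  # silent clip: no usable profile
    return [p / total for p in powers]


# Assumed example: a 200 Hz test tone sampled at 8 kHz for 400 samples.
sample_rate = 8000.0
tone = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(400)]
profile = timbre_profile(tone, sample_rate, [200.0, 1000.0, 3000.0])
```

For this test tone, nearly all of the measured energy falls on the 200 Hz point. A real implementation would need many more frequency points and a proper voice model; the sketch only shows the idea of deriving a parameter from recorded audio instead of asking the user to enter it.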

Step 106, in response to a user operation, binding the tone selecting unit to one or more of the preset tones.

For example, as described above, a plurality of tone selecting units corresponding to different tones may also be displayed in the call interface. It can be understood that the tone corresponding to each tone selecting unit needs to be set in advance. Specifically, before step 101, in response to a user operation, tone selecting unit A may be bound to the tone of star A and tone selecting unit B may be bound to the tone of animated character B. In this way, after the call starts, the call voice can be switched directly to the tone of star A by clicking tone selecting unit A. When the voice needs to be converted to the tone of animated character B, tone selecting unit B can be clicked directly, after which the output call voice no longer has the tone of star A but that of animated character B. Alternatively, in another application scenario, tone selecting unit C may be bound, in response to a user operation, to the tone of a parent (simulated by parsing a recording of the parent's voice entered in advance). Then, when a child is at home alone and receives an incoming call, the child can switch the call voice to the parent's tone by clicking tone selecting unit C and talk to the caller in that tone, so that a caller familiar with the family does not learn that the parent is away and take advantage of it to commit unlawful acts.
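
The binding in steps 105 and 106 can be sketched as two small lookup tables, one for the stored preset tones and one mapping each tone selecting unit to a preset. The class `ToneBindings`, its method names, and the one-preset-per-unit simplification are assumptions made for illustration; the parameter tuples stand in for whatever voice parameters an implementation would store.

```python
from typing import Dict, Tuple


class ToneBindings:
    """Hypothetical registry of preset tones and the units bound to them."""

    def __init__(self) -> None:
        self._presets: Dict[str, Tuple[float, ...]] = {}
        self._bindings: Dict[str, str] = {}

    def add_preset(self, name: str, params: Tuple[float, ...]) -> None:
        """Step 105: store a preset tone under a user-chosen name."""
        self._presets[name] = params

    def bind(self, unit_id: str, preset_name: str) -> None:
        """Step 106: bind a tone selecting unit to a stored preset."""
        if preset_name not in self._presets:
            raise KeyError(f"unknown preset: {preset_name}")
        self._bindings[unit_id] = preset_name

    def select(self, unit_id: str) -> Tuple[float, ...]:
        """Return the voice parameters for the unit the user clicked."""
        return self._presets[self._bindings[unit_id]]


bindings = ToneBindings()
bindings.add_preset("star_a", (1.2, 0.9, 1.1))
bindings.add_preset("character_b", (0.7, 1.3, 1.0))
bindings.bind("A", "star_a")
bindings.bind("B", "character_b")
```

Clicking unit B during a call would then correspond to `bindings.select("B")`, whose result becomes the voice conversion information read at the start of the next preset duration.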

Fig. 5 is a block diagram illustrating a call voice output apparatus according to an exemplary embodiment, and as shown in fig. 5, the apparatus 500 includes:

a unit display module 510 configured to display a tone adjusting unit in a voice call interface of the terminal when the voice call is triggered;

a voice conversion module 520 configured to, when a first voice is received through a sound receiving device of the terminal, convert the first voice into a second voice having a target tone according to voice conversion information, where the voice conversion information is a tone parameter set by a user through the tone adjusting unit and used for representing the target tone;

a voice output module 530 configured to output the second voice as a call voice of a voice call.

Optionally, the tone adjusting unit includes: a sound intensity adjusting unit which is preset according to the spectral characteristics of voice timbre and is used for adjusting the sound intensities of different frequency points in the voice, and/or one or more tone selecting units corresponding to preset tones.

Fig. 6 is a block diagram illustrating another call voice output apparatus according to the embodiment shown in fig. 5, and as shown in fig. 6, the apparatus 500 further includes:

an information generating module 540 configured to generate the voice conversion information according to the operation information of the user for the tone adjusting unit; wherein

the tone adjusting unit includes: a sound intensity adjusting unit and a tone selecting unit, the information generating module 540 is configured to:

and generating a voice parameter which is corresponding to the second operation information and used for representing a preset tone as the voice conversion information according to the second operation information of the user for the tone selecting unit.

Optionally, the voice conversion module 520 is configured to:

acquiring the sound received by the sound receiving device at intervals of a preset duration;

if the target sound acquired within the target preset time length comprises a voice, taking the target sound as the first voice, wherein the target preset time length is any preset time length within a plurality of preset time lengths in the voice call process;

determining the sound conversion information corresponding to the tone adjusting unit at the starting time point of the target preset duration;

and converting the first voice into the second voice according to the target tone corresponding to the voice conversion information.

Fig. 7 is a block diagram of another call voice output apparatus according to the embodiment shown in fig. 6, and as shown in fig. 7, the apparatus 500 further includes:

a tone acquiring module 550 configured to acquire one or more preset tones input by a user;

and a tone binding module 560 configured to bind the tone selecting unit to one or more of the preset tones in response to a user operation.

Optionally, the tone color obtaining module 550 is configured to:

acquiring a first voice parameter input by a user, and taking the tone corresponding to the first voice parameter as the preset tone; and/or

Fig. 8 is a block diagram illustrating an electronic device 800 in accordance with an example embodiment. As shown in fig. 8, the electronic device 800 may include: a processor 801, a memory 802, a multimedia component 803, an input/output (I/O) interface 804, and a communications component 805.

The processor 801 is configured to control the overall operation of the electronic device 800, so as to complete all or part of the steps in the above-mentioned call voice output method. The memory 802 is used to store various types of data to support operation of the electronic device 800, such as instructions for any application or method operating on the electronic device 800 and application-related data, such as contact data, transmitted and received messages, pictures, audio, video, and so forth. The memory 802 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 802 or transmitted through the communication component 805. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 805 is used for wired or wireless communication between the electronic device 800 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them, so the communication component 805 may correspondingly include a Wi-Fi module, a Bluetooth module, and an NFC module.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described call voice output method.

In another exemplary embodiment, a computer-readable storage medium comprising program instructions is also provided, for example the memory 802 comprising program instructions, and the program instructions are executable by the processor 801 of the electronic device 800 to perform the above-described call voice output method.

Preferred embodiments of the present disclosure are described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. After considering the description and practicing the present disclosure, those skilled in the art may readily conceive of other embodiments within the technical spirit of the present disclosure, and all of these fall within the protection scope of the present disclosure. It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. Likewise, the various embodiments of the present disclosure may be combined in any way, and such combinations should also be regarded as content disclosed herein as long as they do not depart from the idea of the present disclosure. The present disclosure is not limited to the precise structures that have been described above, and the scope of the present disclosure is limited only by the appended claims.

Claims

1. A call voice output method, comprising:

2. The method of claim 1, wherein before the converting the first speech into the second speech having the target timbre according to the voice conversion information, the method further comprises:

3. The method of claim 1, wherein converting the first voice into a second voice with a target tone according to voice conversion information when the first voice is received through a sound receiving device of the terminal comprises:

acquiring the sound received by the sound receiving device at intervals of a preset duration;

4. The method of claim 2, further comprising:

acquiring one or more preset timbres input by a user;

5. The method of claim 4, wherein the obtaining one or more preset timbres input by the user comprises:

6. A call voice output apparatus, characterized in that the apparatus comprises:

7. The apparatus of claim 6, further comprising:

8. The apparatus of claim 6, wherein the speech conversion module is configured to:

acquiring the sound received by the sound receiving device at intervals of a preset duration;

9. The apparatus of claim 7, further comprising:

10. The apparatus of claim 9, wherein the tone acquisition module is configured to:

11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.

12. An electronic device, comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 5.