CN105321526B - Audio processing method and electronic equipment - Google Patents

Audio processing method and electronic equipment

Info

Publication number
CN105321526B
Authority
CN
China
Prior art keywords
audio data
audio
frequency
processing
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510612358.4A
Other languages
Chinese (zh)
Other versions
CN105321526A (en)
Inventor
王少敏
陈文辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN201510612358.4A priority Critical patent/CN105321526B/en
Publication of CN105321526A publication Critical patent/CN105321526A/en
Application granted granted Critical
Publication of CN105321526B publication Critical patent/CN105321526B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 - Changing voice quality, e.g. pitch or formants
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Telephone Function (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application discloses an audio processing method and an electronic device using the same. The audio processing method comprises the following steps: executing first processing on audio data to be processed to obtain first audio data in a first frequency band and second audio data outside the first frequency band; performing second processing on the first audio data to obtain third audio data; and synthesizing the third audio data and the second audio data to generate processed audio data.

Description

Audio processing method and electronic equipment
Technical Field
The present invention relates to the field of audio processing, and more particularly, to an audio processing method and an electronic device using the same.
Background
In audio processing in current use, such as voice-changing processing, pitch conversion is generally performed on the entire audio data. Such audio processing lacks precise pitch modification and processes the human voice signal and the background sound signal contained in the audio data without distinction.
Accordingly, it is desirable to provide an audio processing method and an electronic device using the same, which are capable of performing desired audio processing for a target audio signal (such as a human voice signal) contained in audio data while leaving other non-target audio signals unchanged, thereby achieving accurate audio processing for the target audio signal.
Disclosure of Invention
In view of the above, the present invention provides an audio processing method and an electronic device using the same.
According to an embodiment of the present disclosure, there is provided an audio processing method including: executing first processing on audio data to be processed to obtain first audio data in a first frequency band and second audio data outside the first frequency band; performing second processing on the first audio data to obtain third audio data; and synthesizing the third audio data and the second audio data to generate processed audio data.
Further, an audio processing method according to an embodiment of the present disclosure, wherein the first frequency band is a frequency band within a specific frequency range, the specific frequency range corresponding to the frequency range of the human voice.
Further, according to an audio processing method of an embodiment of the present disclosure, wherein the performing of the first processing on the audio data to be processed and the performing of the second processing on the first audio data obtain the first audio data whose frequency is changed and the second audio data which remains unchanged.
Further, an audio processing method according to an embodiment of the present disclosure, wherein the performing of the first processing on the audio data to be processed includes: performing a first filtering process on the audio data to be processed to obtain the first audio data within the first frequency band, and performing a second filtering process on the audio data to be processed to obtain the second audio data outside the first frequency band.
Further, an audio processing method according to an embodiment of the present disclosure, wherein the performing of the second processing on the first audio data includes: changing a frequency of the first audio data.
Further, an audio processing method according to an embodiment of the present disclosure, wherein the changing the frequency of the first audio data includes: performing a first transformation on the first audio data to obtain first audio frequency data corresponding to the first audio data; changing a frequency value of the first audio frequency data to obtain third audio frequency data; and performing a second transformation on the third audio frequency data to obtain the third audio data, wherein the second transformation is an inverse transformation of the first transformation.
Further, an audio processing method according to an embodiment of the present disclosure, wherein the performing the second processing on the first audio data further includes: the second processing is performed on the first audio data corresponding to a predetermined period of time.
Further, an audio processing method according to an embodiment of the present disclosure, wherein the performing the second processing on the first audio data further includes: identifying first sub audio data having a first characteristic and second sub audio data having a second characteristic in the first audio data based on the characteristic of the first audio data; and performing a first sub-process on the first sub-audio data and performing a second sub-process on the second sub-audio data.
Furthermore, an audio processing method according to an embodiment of the present disclosure, wherein the features include a voiceprint feature, a timbre feature and/or a tonal feature.
Further, according to an embodiment of the present disclosure, the audio processing method, wherein the synthesizing the third audio data and the second audio data and the generating the processed audio data includes: extracting timestamp marks in the third audio data; determining a start time point and an end time point of the third audio data relative to the second audio data based on the timestamp marks; and aligning and combining the third audio data and the second audio data based on the start time point and the end time point to generate the processed audio data.
According to another embodiment of the present invention, there is provided an electronic apparatus including: a filtering unit configured to perform first processing on audio data to be processed to obtain first audio data in a first frequency band and second audio data outside the first frequency band; a transposition unit configured to perform second processing on the first audio data to obtain third audio data; and a synthesizing unit configured to synthesize the third audio data and the second audio data and generate processed audio data.
Furthermore, according to another embodiment of the present invention, the electronic device, wherein the first frequency band is a frequency band within a specific frequency range, the specific frequency range corresponding to the frequency range of the human voice.
Further, according to an electronic apparatus of another embodiment of the present invention, wherein the synthesizing unit synthesizes the first audio data whose frequency is changed and the second audio data which remains unchanged.
Furthermore, according to another embodiment of the present invention, the filtering unit includes a first filtering subunit configured to perform a first filtering process on the audio data to be processed to obtain the first audio data within the first frequency band, and perform a second filtering process on the audio data to be processed to obtain the second audio data outside the first frequency band.
Further, an electronic apparatus according to another embodiment of the present invention, wherein the transposition unit changes a frequency of the first audio data.
Further, an electronic apparatus according to another embodiment of the present invention, wherein the transposition unit performs a first transformation on the first audio data to obtain first audio frequency data corresponding to the first audio data; changes a frequency value of the first audio frequency data to obtain third audio frequency data; and performs a second transformation on the third audio frequency data to obtain the third audio data, wherein the second transformation is an inverse transformation of the first transformation.
Further, an electronic apparatus according to another embodiment of the present invention, wherein the transposition unit performs the second processing on the first audio data corresponding to a predetermined period of time.
Further, an electronic apparatus according to another embodiment of the present invention, wherein the transposition unit identifies first sub-audio data having a first characteristic and second sub-audio data having a second characteristic in the first audio data, based on a characteristic of the first audio data; and performing a first sub-process on the first sub-audio data and performing a second sub-process on the second sub-audio data.
Furthermore, an electronic device according to another embodiment of the present invention, wherein the feature includes a voiceprint feature, a timbre feature and/or a tonal feature.
Further, an electronic apparatus according to another embodiment of the present invention, wherein the synthesizing unit extracts timestamp marks in the third audio data; determines a start time point and an end time point of the third audio data relative to the second audio data based on the timestamp marks; and aligns and combines the third audio data and the second audio data based on the start time point and the end time point to generate the processed audio data.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and are intended to provide further explanation of the claimed technology.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail embodiments of the present invention with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 is a flow chart summarizing an audio processing method according to an embodiment of the present invention;
Fig. 2 is a block diagram illustrating the structure of an electronic apparatus according to an embodiment of the present invention;
Fig. 3 is a flowchart illustrating a first example of an audio processing method according to an embodiment of the present invention;
FIG. 4 is a block diagram illustrating an audio processing procedure according to an embodiment of the present invention;
Fig. 5 is a flowchart illustrating a second example of an audio processing method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating audio processing according to an embodiment of the present invention; and
Fig. 7 is a flowchart illustrating a third example of an audio processing method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of embodiments of the invention and not all embodiments of the invention, with the understanding that the invention is not limited to the example embodiments described herein. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the invention described in the present disclosure without inventive step, shall fall within the scope of protection of the invention.
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Fig. 1 is a flowchart outlining an audio processing method according to an embodiment of the present invention. An audio processing method according to an embodiment of the present invention includes the following steps.
In step S101, a first process is performed on the audio data to be processed, and first audio data within a first frequency band and second audio data outside the first frequency band are obtained. As will be described in detail below, the audio data to be processed is audio data collected by the electronic device according to an embodiment of the present invention. In one embodiment of the present invention, the audio data to be processed includes audio signals in different frequency bands. Generally, sound is generated by the vibration of an object, and the basic characteristic elements of sound include pitch, intensity, and timbre. In particular, different vibration frequencies produce different pitches. In audio data acquired by collecting sound, the human voice and the background sound have different characteristics and lie in different frequency bands. In a preferred embodiment of the present invention, the first frequency band is a frequency band within a specific frequency range corresponding to the frequency range of the human voice, for example, 64 Hz-523 Hz. Hereinafter, how to obtain the first audio data within the first frequency band and the second audio data outside the first frequency band by performing the first process will be described in further detail with reference to the drawings. Thereafter, the process proceeds to step S102.
In step S102, second processing is performed on the first audio data, and third audio data is obtained. As will be described in detail below, the second processing is performed on the first audio data to obtain first audio data with its frequency changed. That is, the second processing is a process for changing the first audio data. As described above with reference to step S101, the first audio data is audio data within the first frequency band (e.g., corresponding to the frequency range of the human voice), so the frequency of the person's voice in the first audio data will be changed by the second processing.
For example, 12-tone equal temperament is typically used, i.e., the scale is divided into the pitch levels C, D, E, F, G, A and B. The interval between a given pitch level and the pitch level an octave above it is called a "pure octave". Twelve-tone equal temperament divides a pure octave into 12 equal semitones, the physical vibration frequencies of adjacent semitones differing by a factor of 2^(1/12). That is, the frequencies of successive semitones form a geometric progression.
In one embodiment of the present invention, if the signal frequency of the first audio data is f, then the signal frequency f′ of the third audio data obtained by the second processing is

f′ = f × 2^(d/12), d = ±1, ±2, ±3, … (1)

When d > 0, the second processing raises the frequency of the first audio data, i.e., raises the pitch of the first audio data; when d < 0, the second processing lowers the frequency of the first audio data, i.e., lowers the pitch of the first audio data.
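As a concrete illustration of equation (1), the following minimal Python sketch computes the shifted frequency for a given semitone offset d; the function name and the sample values are illustrative, not part of the patent.

    def shift_semitones(f: float, d: int) -> float:
        """Equation (1): f' = f * 2**(d/12); d > 0 raises pitch, d < 0 lowers it."""
        return f * 2.0 ** (d / 12.0)

    print(shift_semitones(440.0, 12))   # one octave up: 880.0 Hz
    print(shift_semitones(440.0, -12))  # one octave down: 220.0 Hz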
After obtaining the first audio data (i.e., the third audio data) whose frequency is changed and the second audio data which remains unchanged, the process proceeds to step S103.
In step S103, the third audio data and the second audio data are synthesized, and processed audio data is generated. Specifically, the first audio data with changed frequency is synthesized with the second audio data with the frequency kept unchanged, and processed audio data is obtained. Hereinafter, audio data synthesis in the audio processing method according to the embodiment of the present invention will be described in further detail with reference to the accompanying drawings.
As described above, compared with the audio data to be processed that is input at the start of the audio processing method according to the embodiment of the present invention, in the processed audio data the first audio data corresponding to a specific frequency range (e.g., 64 Hz-523 Hz, corresponding to the human voice) undergoes a change in frequency (e.g., pitched up or down, so that a female voice is converted into a male voice, or an elderly voice into a child's voice), while the second audio data in the non-specific frequency range keeps its frequency unchanged. Thus, in the synthesized processed audio data, only the human voice is subjected to precise pitch-change processing while the background sound remains unchanged.
Fig. 2 is a block diagram illustrating the structure of an electronic apparatus according to an embodiment of the present invention. The electronic device 10 as shown in fig. 2 is used to perform the audio processing method according to the embodiment of the present invention described with reference to fig. 1.
The electronic device 10 is preferably, for example, an electronic device with audio processing capabilities, including but not limited to portable electronic devices (such as smart phones, personal digital assistants, tablets), personal computers, home theater systems, commercial karaoke entertainment systems, and the like.
As shown in fig. 2, the electronic device 10 according to an embodiment of the present invention includes a processing module 100 and an audio input output module 200. It is easily understood that, for simplicity of description, only the components closely related to the present invention are shown in fig. 2; the electronic device 10 according to an embodiment of the present invention may of course further include other components such as a display module, a memory module, and the like.
The processing module 100 is configured to perform audio processing according to an embodiment of the present invention. In one embodiment of the invention, the processing module 100 may be configured by a Central Processing Unit (CPU) of the electronic device 10. Alternatively, the processing module 100 may be configured by a dedicated Audio Processing Unit (APU) of the electronic device 10.
The audio input/output module 200 is configured to acquire audio data to be processed and output processed audio data after audio processing is performed by the processing module 100. In one embodiment of the present invention, the audio input/output module 200 may collect the audio data to be processed by using an audio collecting unit such as a microphone. Alternatively, the audio input output module 200 may retrieve and extract the audio data to be processed from a storage module (not shown) in the electronic device 10 or receive the audio data to be processed from another electronic device via a wired or wireless communication channel. After performing audio processing via the processing module 100, the audio input output module 200 may output the processed audio data via an audio output unit such as a speaker. Alternatively, the audio input output module 200 may store the processed audio data in a storage module in the electronic device 10 or transmit the processed audio data to another electronic device via a wired or wireless communication channel.
More specifically, as shown in fig. 2, the processing module 100 includes a filtering unit 110, a transposition unit 120, and a synthesizing unit 130. The filtering unit 110 is configured to perform first processing on the audio data to be processed and obtain first audio data in a first frequency band and second audio data outside the first frequency band. The transposition unit 120 is configured to perform second processing on the first audio data to obtain third audio data. The synthesizing unit 130 is configured to synthesize the third audio data and the second audio data and generate processed audio data. Hereinafter, the audio processing method according to an embodiment of the present invention performed by the processing module 100 including the filtering unit 110, the transposition unit 120, and the synthesizing unit 130 will be described in further detail with reference to the accompanying drawings.
A first example of an audio processing method according to an embodiment of the present invention is described with reference to fig. 3 and 4. Fig. 3 is a flowchart illustrating a first example of an audio processing method according to an embodiment of the present invention. Fig. 4 is a block diagram illustrating an audio processing procedure according to an embodiment of the present invention.
As shown in fig. 3, a first example of an audio processing method according to an embodiment of the present invention includes the following steps.
In step S301, first filtering processing is performed on audio data to be processed to obtain first audio data within a first frequency band.
After step S301, or simultaneously with step S301, in step S302, second filtering processing is performed on the audio data to be processed to obtain second audio data outside the first frequency band.
As shown in FIG. 4, to obtain the first audio data and the second audio data, the audio data to be processed A0 is input to the first filtering subunit 111 and the second filtering subunit 112 in the filtering unit 110, respectively. Specifically, the first filtering subunit 111 performs a first filtering process (e.g., a first band-pass filtering process) to obtain and output first audio data A1 within the first frequency band (e.g., 64 Hz-523 Hz, corresponding to the human voice). The second filtering subunit 112 performs a second filtering process (e.g., a second band-pass filtering process) to obtain second audio data A2 outside the first frequency band.
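To make the filtering step concrete, here is a minimal sketch of the band split performed by the two filtering subunits, assuming SciPy Butterworth filters. The filter order, the use of a band-stop filter for the second subunit, and the function names are illustrative assumptions, not details given by the patent.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def split_voice_band(a0: np.ndarray, fs: int, lo: float = 64.0, hi: float = 523.0):
        """Split A0 into voice-band A1 (band-pass) and residual A2 (band-stop)."""
        sos_bp = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        sos_bs = butter(4, [lo, hi], btype="bandstop", fs=fs, output="sos")
        a1 = sosfiltfilt(sos_bp, a0)  # first audio data: inside 64-523 Hz
        a2 = sosfiltfilt(sos_bs, a0)  # second audio data: outside the band
        return a1, a2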
Referring back to fig. 3, after step S302, the process proceeds to step S303. In step S303, a first transformation is performed on the first audio data to obtain first audio frequency data corresponding to the first audio data.
As shown in fig. 4, the first audio data A1 output from the first filtering subunit 111 enters the first transformation subunit 121 in the transposition unit 120. The first transformation subunit 121 performs a first transformation on the first audio data. In an embodiment of the invention, the first transformation is a fast Fourier transform, which converts the first audio data A1 into first audio frequency data A1f in the frequency domain.
Referring back to fig. 3, after step S303, the process proceeds to step S304. In step S304, the frequency value of the first audio frequency data is changed to obtain third audio frequency data.
As shown in fig. 4, the first audio frequency data A1f output from the first transformation subunit 121 enters the frequency conversion subunit 122 in the transposition unit 120. The frequency conversion subunit 122 changes the frequency value of the first audio frequency data A1f to obtain third audio frequency data A3f. For example, if the frequency value of the first audio frequency data A1f is raised, i.e., pitch-up processing is performed, an effect of changing a male voice into a female voice, or an elderly voice into a child's voice, can be obtained; if the frequency value of the first audio frequency data A1f is lowered, i.e., pitch-down processing is performed, an effect of changing a female voice into a male voice, or a child's voice into an elderly voice, can be obtained.
Specifically, for pitch-up processing, the spectral lines of the first audio frequency data A1f are expanded toward high frequencies; for example, down-sampling (and hence pitch-up) is achieved by decimating the data points of A1f at regular intervals. Similarly, for pitch-down processing, the spectral lines of the first audio frequency data A1f are contracted toward low frequencies; for example, up-sampling (and hence pitch-down) is achieved by interpolating new data points between adjacent data points of A1f.
For example, assume that the pitch factor is M/L, where M and L are positive integers; any rational multiple of the pitch frequency can then be expressed in terms of the frame length N, the rounding operation [·], and the modulo operation mod (the exact expression appears only as an image formula in the original). When M > L, pitch-up is achieved, and when M < L, pitch-down is achieved.
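The rational-factor scaling described above can be sketched with polyphase resampling: resampling a frame by L/M and playing it back at the original rate scales all frequencies by M/L. This is only an illustration of the decimation/interpolation idea under that assumption; it does not reproduce the patent's exact image formula.

    from scipy.signal import resample_poly

    def pitch_scale(x, M: int, L: int):
        """Scale pitch by M/L: resample by L/M, then play back at the original rate.
        M > L shortens the data (pitch up); M < L lengthens it (pitch down)."""
        return resample_poly(x, up=L, down=M)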
It is important to note that, after transposition, appropriate processing needs to be performed on the transposed data in order to keep the duration of the overall audio data constant. For example, in the case of pitch-up, since pitch-up shortens the original data, the pitch-shifted data may be divided into frames and the data of the last part of each frame may be appended after the frame as compensation. In one embodiment of the invention, the specific length of the compensation data may be a fixed multiple of the frame length (the factor appears only as an image formula in the original).
Referring back to fig. 3, after step S304, the process proceeds to step S305. In step S305, a second transform is performed on the third audio frequency data to obtain third audio data.
As shown in fig. 4, the third audio frequency data A3f output from the frequency conversion subunit 122 enters the second transformation subunit 123 in the transposition unit 120. The second transformation subunit 123 performs a second transformation on the third audio frequency data A3f. In an embodiment of the invention, the second transformation is an inverse fast Fourier transform, which converts the third audio frequency data A3f into third audio data A3 in the time domain.
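The transform pipeline of FIG. 4 (first transform, frequency change, second transform) can be sketched as below. Remapping whole FFT bins is a deliberately crude stand-in for the spectral-line expansion/contraction described above; a practical implementation would more likely use a phase vocoder. All names are illustrative.

    import numpy as np

    def transform_pipeline(a1: np.ndarray, factor: float) -> np.ndarray:
        """FFT (first transform) -> scale spectral lines -> inverse FFT (second transform)."""
        a1f = np.fft.rfft(a1)                      # first audio frequency data A1f
        n = len(a1f)
        a3f = np.zeros_like(a1f)                   # third audio frequency data A3f
        idx = (np.arange(n) * factor).astype(int)  # move bin k to k*factor
        keep = idx < n
        a3f[idx[keep]] = a1f[keep]
        return np.fft.irfft(a3f, n=len(a1))        # third audio data A3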
Referring back to fig. 3, after step S305, the process proceeds to step S306. In step S306, the third audio data and the second audio data are synthesized, and processed audio data is generated.
As shown in fig. 4, the third audio data A3 output from the second transformation subunit 123 in the transposition unit 120 and the second audio data A2 output from the second filtering subunit 112 enter the synthesizing unit 130. The synthesizing unit 130 synthesizes them to generate the processed audio data Ap.
In the first example of the audio processing method according to the embodiment of the present invention described with reference to fig. 3 and 4, the transposition process of changing the frequency is performed on the first audio data within a specific frequency range (for example, 64 Hz-523 Hz, corresponding to the human voice), while the overall length of the processed audio data is kept constant to avoid distortion. As a result, only the human voice is precisely transposed in the synthesized processed audio data while the background sound remains unchanged.
A second example of an audio processing method according to an embodiment of the present invention is described with reference to fig. 5 and 6. Fig. 5 is a flowchart illustrating a second example of an audio processing method according to an embodiment of the present invention. Fig. 6 is a schematic diagram illustrating audio processing according to an embodiment of the present invention.
As shown in fig. 5, a second example of an audio processing method according to an embodiment of the present invention includes the following steps.
In step S501, first processing is performed on audio data to be processed, and first audio data within a first frequency band and second audio data outside the first frequency band are obtained. The processing in step S501 is equivalent to that of steps S301 and S302 described with reference to fig. 3, performed by the first filtering subunit 111 and the second filtering subunit 112 described with reference to fig. 4. Thereafter, the process proceeds to step S502.
In step S502, the second processing is performed on the first audio data corresponding to a predetermined period of time, and third audio data is obtained. Unlike the first example of the audio processing method according to the embodiment of the present invention described with reference to fig. 3, in the second example the second processing is executed only for the portion of the first audio data A1 corresponding to the predetermined period of time. In one embodiment of the invention, the predetermined period may be specified by the user, so that transposition is performed only on the voice within the predetermined period, while the human voice in other periods of the first audio data A1 is left unmodified. In this case, a single human voice is transposed for the predetermined period (for example, a male voice becomes a female voice), so that a male-and-female duet effect is obtained by combining the transposed voice within the predetermined period and the unmodified voice outside it. Thereafter, the process proceeds to step S503.
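A minimal sketch of this step: apply the second processing only to the samples of A1 that fall inside the user-specified period. The helper name process() stands for any duration-preserving pitch shifter and is an assumption, as are the window bounds.

    import numpy as np

    def transpose_window(a1: np.ndarray, fs: int, t_start: float, t_end: float, process):
        """Apply process() (a duration-preserving pitch shifter) only inside the period."""
        out = a1.copy()
        i0, i1 = int(t_start * fs), int(t_end * fs)
        out[i0:i1] = process(a1[i0:i1])  # voice outside the window stays unmodified
        return out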
Steps S503 to S505 correspond to the synthesis process described with reference to step S306 of fig. 3. The synthesis process will be described in further detail herein by means of three steps S503 to S505 and the schematic diagram of fig. 6.
In step S503, the timestamp marks in the third audio data are extracted.
As shown in fig. 6, the synthesizing unit 130 extracts the timestamp marks T1 and T2 of the third audio data A3 output via the transposition unit 120, and the timestamp marks T0 and TT of the second audio data A2 output via the filtering unit 110.
Thereafter, the process proceeds to step S504. In step S504, a start time point and an end time point of the third audio data with respect to the second audio data are determined based on the timestamp marks.
As shown in fig. 6, based on the timestamp marks T1 and T2 of the third audio data A3 and the timestamp marks T0 and TT of the second audio data A2 extracted in step S503, the synthesizing unit 130 determines the start time point and the end time point of the third audio data A3 relative to the second audio data A2.
Thereafter, the process proceeds to step S505. In step S505, the third audio data and the second audio data are aligned and combined based on the start time point and the end time point to generate processed audio data.
As shown in fig. 6, the synthesizing unit 130 aligns and combines the third audio data A3 and the second audio data A2 based on the start time point and the end time point acquired in step S504 to generate the processed audio data Ap.
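Steps S503 to S505 can be sketched as follows, assuming the timestamp marks are in seconds and that "combining" means additive mixing; both are assumptions about how the alignment is realized, not details fixed by the patent.

    import numpy as np

    def synthesize(a2: np.ndarray, a3: np.ndarray, fs: int, t0: float, t1: float) -> np.ndarray:
        """Overlay A3 onto A2, placing it at its start timestamp T1 relative to A2's T0."""
        ap = a2.copy()
        start = int((t1 - t0) * fs)          # start time point from the timestamp marks
        end = min(start + len(a3), len(ap))  # clip in case A3 overruns A2
        ap[start:end] += a3[:end - start]    # aligned combination
        return ap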
In the second example of the audio processing method according to the embodiment of the present invention described with reference to fig. 5 and 6, the transposition process of changing the frequency is performed on the first audio data corresponding to the predetermined period of time, achieving a combination that includes both transposed and non-transposed human voice. In addition, during synthesis the respective audio data are accurately aligned based on their timestamp marks, avoiding unnecessary noise due to misalignment.
A third example of an audio processing method according to an embodiment of the present invention is described with reference to fig. 7. As shown in fig. 7, a third example of an audio processing method according to an embodiment of the present invention includes the following steps.
In step S701, a first process is performed on the audio data to be processed, and first audio data within a first frequency band and second audio data outside the first frequency band are obtained. The processing in step S701 is equivalent to that of steps S301 and S302 described with reference to fig. 3 and step S501 described with reference to fig. 5, performed by the first filtering subunit 111 and the second filtering subunit 112 described with reference to fig. 4. Thereafter, the process proceeds to step S702.
Step S702 and step S703 are executed by the transposition unit 120. Unlike steps S303 to S305 described with reference to fig. 3 and step S502 described with reference to fig. 5, in the third example of the audio processing method according to the embodiment of the present invention, in step S702, first sub audio data having a first feature and second sub audio data having a second feature in the first audio data are identified based on the feature of the first audio data. In one embodiment of the invention, the features include voiceprint features, timbre features and/or tonal features.
For example, the transposition unit 120 identifies different users in the first audio data based on the voiceprint characteristics of the first audio data. When the first sub-audio data satisfying the first feature (i.e., having the specific first voiceprint) and/or the second sub-audio data satisfying the second feature (i.e., having the specific second voiceprint) is identified, the process proceeds to step S703.
In step S703, the transposition unit 120 performs first sub-processing on the first sub-audio data and second sub-processing on the second sub-audio data, obtaining third audio data. That is, the transposition unit 120 performs different first and second sub-processing on first sub-audio data and second sub-audio data having different voiceprint features. In other words, a specific transposition process is performed for a specific user having a specific voiceprint feature.
Similarly, the transposition unit 120 identifies audio data at different pitches (i.e., frequency bands) in the first audio data based on the pitch characteristics of the first audio data. When the first sub audio data satisfying the first characteristic (i.e., having the specific first frequency band) and/or the second sub audio data satisfying the second characteristic (i.e., having the specific second frequency band) is identified, the process proceeds to step S703.
In step S703, the transposition unit 120 performs first sub-processing on the first sub-audio data and second sub-processing on the second sub-audio data, obtaining third audio data. That is, the transposition unit 120 performs different first and second sub-processing on first sub-audio data and second sub-audio data having different pitch characteristics. For example, pitch-up processing is performed on the first sub-audio data in a specific first frequency band, and pitch-down processing is performed on the second sub-audio data in a specific second frequency band.
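As an illustration of routing by pitch feature, the sketch below classifies each frame of A1 by its dominant frequency and applies a different sub-process to each class. The frame size, the 250 Hz split point, the ±2-semitone shifts, and the crude bin-remapping shifter are all assumptions made for the example.

    import numpy as np

    def crude_shift(seg: np.ndarray, d: int) -> np.ndarray:
        """Shift all spectral lines by 2**(d/12) via FFT-bin remapping (sketch only)."""
        spec = np.fft.rfft(seg)
        idx = (np.arange(len(spec)) * 2.0 ** (d / 12.0)).astype(int)
        out = np.zeros_like(spec)
        keep = idx < len(spec)
        out[idx[keep]] = spec[keep]
        return np.fft.irfft(out, n=len(seg))

    def sub_process(a1: np.ndarray, fs: int, frame: int = 2048) -> np.ndarray:
        """First sub-process (pitch up) for low-pitched frames, second (pitch down) for high."""
        out = np.zeros_like(a1, dtype=float)
        for i in range(0, len(a1) - frame + 1, frame):
            seg = a1[i:i + frame]
            f_dom = np.argmax(np.abs(np.fft.rfft(seg))) * fs / frame  # dominant frequency
            d = 2 if f_dom < 250.0 else -2
            out[i:i + frame] = crude_shift(seg, d)
        return out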
Thereafter, the process proceeds to step S704. The combining processing of steps S704 to S706 is the same as steps S503 to S505 described with reference to fig. 5, respectively, and a repetitive description thereof will be omitted here.
In the third example of the audio processing method according to the embodiment of the present invention described with reference to fig. 7, different sub-processes are performed on sub-audio data satisfying different characteristics based on the characteristics of the first audio data, thereby realizing that the corresponding audio sub-processes are performed based on specific characteristics of the acquired audio signal.
The audio processing method and the electronic device using the audio processing method according to the embodiments of the present invention are described above with reference to fig. 1 to 7, and are capable of performing desired audio processing on a target audio signal (such as a human voice signal) contained in audio data while leaving other non-target audio signals unchanged, thereby achieving accurate audio processing on the target audio signal.
It should be noted that, in the present specification, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Finally, it should be noted that the series of processes described above includes not only processes performed in time series in the order described herein, but also processes performed in parallel or individually, rather than in time series.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus a necessary hardware platform, and may also be implemented by hardware entirely. With this understanding in mind, all or part of the technical solutions of the present invention that contribute to the background can be embodied in the form of a software product, which can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes instructions for causing a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments or some parts of the embodiments of the present invention.
The present invention has been described in detail, and the principle and embodiments of the present invention are explained herein by using specific examples, which are only used to help understand the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (16)

1. An audio processing method, comprising:
Executing first processing on audio data to be processed to obtain first audio data in a first frequency band and second audio data outside the first frequency band;
Performing second processing on the first audio data to obtain third audio data; and
Synthesizing the third audio data and the second audio data to generate processed audio data,
Wherein the first audio data and the third audio data have the same time duration,
Wherein the synthesizing the third audio data and the second audio data, and the generating the processed audio data comprises:
Extracting timestamp marks in the third audio data;
Determining a start time point and an end time point of the third audio data relative to the second audio data based on the timestamp marks; and
Aligning and combining the third audio data and the second audio data based on the start time point and the end time point to generate processed audio data,
Wherein the performing of the second processing on the first audio data comprises:
In a case where the second processing is pitch-up processing, dividing the first audio data subjected to the pitch-up processing into frames, and compensating the data of the last part of each frame after the frame, to obtain third audio data,
Wherein the performing second processing on the first audio data further comprises:
Identifying first sub audio data having a first characteristic and second sub audio data having a second characteristic in the first audio data based on the characteristic of the first audio data; and
Performing a first sub-process on the first sub-audio data, and performing a second sub-process on the second sub-audio data.
2. The audio processing method of claim 1, wherein the first frequency band is a frequency band within a specific frequency range, the specific frequency range corresponding to a frequency range of human voice.
3. The audio processing method according to claim 1 or 2, wherein said performing a first process on the audio data to be processed and said performing a second process on the first audio data obtain the first audio data with a frequency changed and the second audio data with a frequency maintained unchanged.
4. The audio processing method of claim 1 or 2, wherein the performing of the first processing on the audio data to be processed comprises:
Performing a first filtering process on the audio data to be processed to obtain the first audio data within the first frequency band, and performing a second filtering process on the audio data to be processed to obtain the second audio data outside the first frequency band.
5. The audio processing method of claim 1 or 2, wherein the performing second processing on the first audio data comprises:
Changing a frequency of the first audio data.
6. The audio processing method of claim 5, wherein the changing the frequency of the first audio data comprises:
Performing a first transformation on the first audio data to obtain first audio frequency data corresponding to the first audio data;
Changing a frequency value of the first audio frequency data to obtain third audio frequency data; and
Performing a second transformation on the third audio frequency data to obtain the third audio data,
Wherein the second transform is an inverse of the first transform.
7. The audio processing method of claim 1 or 2, wherein the performing second processing on the first audio data further comprises:
The second processing is performed on the first audio data corresponding to a predetermined period of time.
8. The audio processing method of claim 1, wherein the features of the first audio data, the first features, and the second features comprise voiceprint features, timbre features, and/or tonal features.
9. An electronic device, comprising:
A filtering unit configured to perform first processing on audio data to be processed to obtain first audio data in a first frequency band and second audio data outside the first frequency band;
A transposition unit configured to perform second processing on the first audio data to obtain third audio data; and
A synthesizing unit for synthesizing the third audio data and the second audio data to generate processed audio data,
Wherein the first audio data and the third audio data have the same time duration,
Wherein the synthesizing unit extracts timestamp marks in the third audio data;
Determining a start time point and an end time point of the third audio data relative to the second audio data based on the timestamp marks; and
Aligning and combining the third audio data and the second audio data based on the start time point and the end time point to generate processed audio data,
Wherein, in a case where the second processing is pitch-up processing, the transposition unit divides the first audio data subjected to the pitch-up processing into frames, compensates the data of the last part of each frame after the frame, and obtains the third audio data,
Wherein the transposition unit identifies first sub audio data having a first characteristic and second sub audio data having a second characteristic in the first audio data based on a characteristic of the first audio data; and
Performing a first sub-process on the first sub-audio data, and performing a second sub-process on the second sub-audio data.
10. The electronic device of claim 9, wherein the first frequency band is a frequency band within a particular frequency range, the particular frequency range corresponding to the frequency range of the human voice.
11. The electronic device according to claim 9 or 10, wherein the synthesizing unit synthesizes the first audio data whose frequency is changed and the second audio data which remains unchanged.
12. The electronic device of claim 9 or 10, wherein the filtering unit comprises a first filtering subunit for performing a first filtering process on the audio data to be processed to obtain the first audio data within the first frequency band and performing a second filtering process on the audio data to be processed to obtain the second audio data outside the first frequency band.
13. The electronic device of claim 9 or 10, wherein the transposition unit changes a frequency of the first audio data.
14. The electronic device of claim 13, wherein the transposition unit performs a first transformation on the first audio data to obtain first audio frequency data corresponding to the first audio data;
Changing a frequency value of the first audio frequency data to obtain third audio frequency data; and
Performing a second transformation on the third audio frequency data to obtain the third audio data.
15. The electronic device according to claim 9 or 10, wherein the transposition unit performs the second processing on the first audio data corresponding to a predetermined period of time.
16. The electronic device of claim 9, wherein the features of the first audio data, the first features, and the second features comprise voiceprint features, timbre features, and/or tonal features.
CN201510612358.4A 2015-09-23 2015-09-23 Audio processing method and electronic equipment Active CN105321526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510612358.4A CN105321526B (en) 2015-09-23 2015-09-23 Audio processing method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510612358.4A CN105321526B (en) 2015-09-23 2015-09-23 Audio processing method and electronic equipment

Publications (2)

Publication Number Publication Date
CN105321526A CN105321526A (en) 2016-02-10
CN105321526B true CN105321526B (en) 2020-07-24

Family

ID=55248774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510612358.4A Active CN105321526B (en) 2015-09-23 2015-09-23 Audio processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN105321526B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869621B (en) * 2016-05-20 2019-10-25 广州华多网络科技有限公司 Audio synthesizer and its audio synthetic method
CN106128474A (en) * 2016-07-04 2016-11-16 广东小天才科技有限公司 A kind of audio-frequency processing method and device
CN107707974A (en) * 2017-09-18 2018-02-16 广东九联科技股份有限公司 A kind of realization method and system of special efficacy voice function
CN108965757B (en) * 2018-08-02 2021-04-06 广州酷狗计算机科技有限公司 Video recording method, device, terminal and storage medium
CN111210833A (en) * 2019-12-30 2020-05-29 联想(北京)有限公司 Audio processing method, electronic device, and medium
CN113409801B (en) * 2021-08-05 2024-03-19 云从科技集团股份有限公司 Noise processing method, system, medium and device for real-time audio stream playing

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101354889A (en) * 2008-09-18 2009-01-28 北京中星微电子有限公司 Method and apparatus for tonal modification of voice
CN102592607A (en) * 2012-03-30 2012-07-18 北京交通大学 Voice converting system and method using blind voice separation
CN102982809A (en) * 2012-12-11 2013-03-20 中国科学技术大学 Conversion method for sound of speaker
CN103280215A (en) * 2013-05-28 2013-09-04 北京百度网讯科技有限公司 Audio frequency feature library establishing method and device
CN103310796A (en) * 2013-06-28 2013-09-18 姜鸿彦 Voice signal extraction method
CN104078051A (en) * 2013-03-29 2014-10-01 中兴通讯股份有限公司 Voice extracting method and system and voice audio playing method and device
CN104704558A (en) * 2012-09-14 2015-06-10 杜比实验室特许公司 Multi-channel audio content analysis based upmix detection
CN104916288A (en) * 2014-03-14 2015-09-16 深圳Tcl新技术有限公司 Human voice highlighting processing method and device in audio

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001306100A (en) * 2000-04-25 2001-11-02 Matsushita Electric Works Ltd Voice conversion system
CN1967657B (en) * 2005-11-18 2011-06-08 成都索贝数码科技股份有限公司 Automatic tracking and tonal modification system of speaker in program execution and method thereof

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101354889A (en) * 2008-09-18 2009-01-28 北京中星微电子有限公司 Method and apparatus for tonal modification of voice
CN102592607A (en) * 2012-03-30 2012-07-18 北京交通大学 Voice converting system and method using blind voice separation
CN104704558A (en) * 2012-09-14 2015-06-10 杜比实验室特许公司 Multi-channel audio content analysis based upmix detection
CN102982809A (en) * 2012-12-11 2013-03-20 中国科学技术大学 Conversion method for sound of speaker
CN104078051A (en) * 2013-03-29 2014-10-01 中兴通讯股份有限公司 Voice extracting method and system and voice audio playing method and device
CN103280215A (en) * 2013-05-28 2013-09-04 北京百度网讯科技有限公司 Audio frequency feature library establishing method and device
CN103310796A (en) * 2013-06-28 2013-09-18 姜鸿彦 Voice signal extraction method
CN104916288A (en) * 2014-03-14 2015-09-16 深圳Tcl新技术有限公司 Human voice highlighting processing method and device in audio

Also Published As

Publication number Publication date
CN105321526A (en) 2016-02-10

Similar Documents

Publication Publication Date Title
CN105321526B (en) Audio processing method and electronic equipment
CN106898340B (en) Song synthesis method and terminal
US8706496B2 (en) Audio signal transforming by utilizing a computational cost function
US8280738B2 (en) Voice quality conversion apparatus, pitch conversion apparatus, and voice quality conversion method
CN105957515B (en) Speech synthesizing method, speech synthesizing device and the medium for storing sound synthesis programs
WO2018084305A1 (en) Voice synthesis method
CN106373580A (en) Singing synthesis method based on artificial intelligence and device
JP2014215461A (en) Speech processing device, method, and program
WO2009003347A1 (en) A karaoke apparatus
JP2006215204A (en) Voice synthesizer and program
CN110675886A (en) Audio signal processing method, audio signal processing device, electronic equipment and storage medium
JP2006251375A (en) Voice processor and program
CN111667803B (en) Audio processing method and related products
JP2018004870A (en) Speech synthesis device and speech synthesis method
JP2018077283A (en) Speech synthesis method
JP2011118220A (en) Acoustic processing device
TW201027514A (en) Singing synthesis systems and related synthesis methods
KR20210155520A (en) Method and Apparatus for Synthesizing/Modulating Singing Voice of Multiple Singers
CN100508025C (en) Method for synthesizing speech
Siki et al. Time-frequency analysis on gong timor music using short-time fourier transform and continuous wavelet transform
CN112164387A (en) Audio synthesis method and device, electronic equipment and computer-readable storage medium
US11495200B2 (en) Real-time speech to singing conversion
WO2017164216A1 (en) Acoustic processing method and acoustic processing device
CN112750422B (en) Singing voice synthesis method, device and equipment
JP2009244790A (en) Karaoke system with singing teaching function

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant