CN108495234B

CN108495234B - Multi-channel audio processing method, apparatus and computer-readable storage medium

Info

Publication number: CN108495234B
Application number: CN201810356546.9A
Authority: CN
Inventors: 黄传增
Original assignee: Beijing Microlive Vision Technology Co Ltd
Current assignee: Beijing Microlive Vision Technology Co Ltd
Priority date: 2018-04-19
Filing date: 2018-04-19
Publication date: 2020-01-07
Anticipated expiration: 2038-04-19
Also published as: CN108495234A; WO2019200996A1

Abstract

The present invention relates to a multi-channel audio processing method, an apparatus, and a computer-readable storage medium. The multi-channel audio processing method comprises: receiving multi-channel audio to be processed; detecting the audio characteristics of each channel audio in the multi-channel audio to be processed; and processing the multi-channel audio to be processed according to the audio characteristics of each channel audio. With this technical solution, the embodiments of the invention process the multi-channel audio according to the audio characteristics of each of its channels, thereby addressing the technical problem of how to obtain a good user experience.

Description

Multi-channel audio processing method, apparatus and computer-readable storage medium

Technical Field

The present invention relates to the field of audio technologies, and in particular, to a method and an apparatus for processing multi-channel audio, and a computer-readable storage medium.

Background

With the popularity of audio interaction, audio is increasingly used as a carrier of information dissemination for such interactions. In order to obtain a good interactive experience, users are increasingly paying attention to the experience of audio.

Currently, the prior art generally processes single-channel (monaural) audio. Such single-channel processing methods do not consider the characteristics of each channel audio in multi-channel audio; therefore, when an existing single-channel processing method is applied to multi-channel audio, a good user experience cannot be obtained.

In view of the above, it is an urgent technical problem to provide a multi-channel audio processing method capable of obtaining a good user experience.

Disclosure of Invention

The technical problem to be solved by the invention is to provide various multi-channel audio processing methods to at least partially solve the technical problem of how to obtain good user experience effect; furthermore, a multi-channel audio processing apparatus, a multi-channel audio processing hardware apparatus, and a computer-readable storage medium are provided.

In order to achieve the above object, according to one aspect of the present invention, the following technical solutions are provided:

a multi-channel audio processing method, comprising:

receiving multi-channel audio to be processed;

detecting the audio characteristics of each channel audio in the multi-channel audio to be processed;

and processing the multi-channel audio to be processed according to the audio characteristics of each channel audio.

Preferably, the step of detecting the audio characteristics of each channel of the multi-channel audio to be processed includes:

if the multi-channel audio to be processed is off-line audio, detecting the overall audio characteristics of each channel audio in the multi-channel audio to be processed;

and if the multi-channel audio to be processed is online audio, detecting local audio characteristics of each channel of the multi-channel audio to be processed.

Preferably, if the multi-channel audio to be processed is an off-line audio, the step of processing the multi-channel audio to be processed according to the audio characteristics of the audio of each channel includes:

determining a first audio processing parameter according to the overall audio characteristic;

and processing the multi-channel audio to be processed based on the first audio processing parameter.

Preferably, the overall audio characteristics include a pitch characteristic, a sound formant characteristic, and a transient sound pulse characteristic;

the step of determining a first audio processing parameter according to the overall audio characteristic specifically includes:

determining the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, and the transient sound pulse characteristic; wherein the first audio processing parameters comprise a fundamental frequency amplitude, a formant amplitude, and a transient pulse;

the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes:

adjusting the fundamental frequency amplitude, smoothing the formant amplitude, and clipping the transient pulse.

Preferably, the overall audio characteristic comprises a pitch characteristic and a sound formant characteristic;

the step of determining a first audio processing parameter based on the overall audio characteristic comprises:

determining the first audio processing parameter from the pitch characteristic and the sound formant characteristic; wherein the first audio processing parameters comprise a fundamental frequency amplitude and a formant amplitude;

the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes:

adjusting the fundamental frequency amplitude and smoothing the formant amplitude.

Preferably, the overall audio characteristic comprises a pitch characteristic and a transient sound pulse characteristic;

determining the first audio processing parameter according to the pitch characteristic and the transient sound pulse characteristic; wherein the first audio processing parameters comprise a fundamental frequency amplitude and a transient pulse;

and adjusting the fundamental frequency amplitude and clipping the transient pulse.

Preferably, the overall audio characteristics include pitch characteristics, sound formant characteristics, transient sound pulse characteristics, and audio phase characteristics;

determining the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, the transient sound pulse characteristic, and the audio phase characteristic; wherein the first audio processing parameters comprise a fundamental frequency amplitude, a formant amplitude, a transient pulse, and an audio phase;

adjusting the fundamental frequency amplitude, smoothing the formant amplitude, clipping the transient pulse, and adjusting the audio phase.

Preferably, the overall audio characteristics comprise a multi-channel audio downmix characteristic and a main side channel characteristic;

determining the first audio processing parameter according to the multi-channel audio downmix characteristic and the main side channel characteristic; wherein the first audio processing parameters comprise strong audio correlation, a fundamental frequency amplitude, and a formant amplitude;

and performing joint processing on all channel audios in the multi-channel audio to be processed, adjusting the fundamental frequency amplitude and smoothing the formant amplitude.

Preferably, if the multi-channel audio to be processed is online audio, the step of processing the multi-channel audio to be processed according to the audio characteristics of each channel audio specifically includes:

determining a second audio processing parameter according to the local audio characteristic;

and processing the multi-channel audio to be processed based on the second audio processing parameter.

In order to achieve the above object, according to another aspect of the present invention, the following technical solutions are also provided:

a multi-channel audio processing apparatus comprising:

the receiving module is used for receiving multi-channel audio to be processed;

the detection module is used for detecting the audio characteristics of each channel audio in the multi-channel audio to be processed;

and the processing module is used for processing the multi-channel audio to be processed according to the audio characteristics of the audio of each channel.

Preferably, the detection module comprises:

the first detection unit is used for detecting the overall audio characteristics of each channel audio in the multi-channel audio to be processed under the condition that the multi-channel audio to be processed is offline audio;

and the second detection unit is used for detecting the local audio characteristics of each channel audio in the multi-channel audio to be processed under the condition that the multi-channel audio to be processed is online audio.

Preferably, if the multi-channel audio to be processed is offline audio, the processing module includes:

a first determining unit, configured to determine a first audio processing parameter according to the overall audio characteristic;

and the first processing unit is used for processing the multi-channel audio to be processed based on the first audio processing parameter.

the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, and the transient sound pulse characteristic; wherein the first audio processing parameters comprise a fundamental frequency amplitude, a formant amplitude, and a transient pulse;

the first processing unit is specifically configured to adjust the fundamental frequency amplitude, smooth the formant amplitude, and perform clipping processing on the transient pulse.

the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic and the sound formant characteristic; wherein the first audio processing parameters comprise a fundamental frequency amplitude and a formant amplitude;

the first processing unit is specifically configured to adjust the fundamental frequency amplitude and smooth the formant amplitude.

the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic and the transient sound pulse characteristic; wherein the first audio processing parameters comprise a fundamental frequency amplitude and a transient pulse;

the first processing unit is specifically configured to adjust the fundamental frequency amplitude and perform clipping processing on the transient pulse.

the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, the transient sound pulse characteristic, and the audio phase characteristic; wherein the first audio processing parameters include a fundamental frequency amplitude, a formant amplitude, a transient pulse, and an audio phase;

the first processing unit is specifically configured to adjust the fundamental frequency amplitude, smooth the formant amplitude, perform clipping processing on the transient pulse, and adjust the audio phase.

the first determining unit is specifically configured to determine the first audio processing parameter according to the multi-channel audio downmix characteristic and the primary side channel characteristic; wherein the first audio processing parameter comprises: strong audio correlation, fundamental frequency amplitude and formant amplitude;

the first processing unit is specifically configured to perform joint processing on all channel audios in the multi-channel audio to be processed, and adjust the fundamental frequency amplitude and smooth the formant amplitude.

Preferably, if the multi-channel audio to be processed is online audio, the processing module further includes:

a second determining unit for determining a second audio processing parameter according to the local audio characteristic;

and the second processing unit is used for processing the multi-channel audio to be processed based on the second audio processing parameter.

In order to achieve the above object, according to another aspect of the present invention, the following technical solutions are further provided:

a multi-channel audio processing hardware apparatus, comprising:

a memory for storing non-transitory computer readable instructions; and

a processor for executing the computer-readable instructions such that, when the instructions are executed, the processor implements the above multi-channel audio processing method.

a computer-readable storage medium storing non-transitory computer-readable instructions which, when executed by a computer, cause the computer to perform the multi-channel audio processing method described above.

The embodiments of the invention provide a multi-channel audio processing method, a multi-channel audio processing apparatus, and a computer-readable storage medium. The multi-channel audio processing method comprises: receiving multi-channel audio to be processed; detecting the audio characteristics of each channel audio in the multi-channel audio to be processed; and processing the multi-channel audio to be processed according to the audio characteristics of each channel audio. With this technical solution, the embodiments of the invention process the multi-channel audio according to the audio characteristics of each of its channels, thereby obtaining a good user experience.

The foregoing is only an overview of the technical solutions of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the content of the description, and so that the above and other objects, features, and advantages of the present invention are more readily apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.

Drawings

FIG. 1 is a flow diagram of a multi-channel audio processing method according to one embodiment of the invention;

FIG. 2 is a schematic flow chart illustrating detection for offline audio and online audio, respectively, according to an embodiment of the present invention;

FIG. 3 is a flow diagram illustrating processing for offline audio according to one embodiment of the present invention;

FIG. 4 is a flow diagram of a multi-channel audio processing method according to one embodiment of the invention;

FIG. 5 is a flow diagram illustrating processing for online audio according to one embodiment of the invention;

FIG. 6 is a flow chart illustrating a multi-channel audio processing method according to an embodiment of the present invention;

FIG. 7 is a block diagram of a multi-channel audio processing apparatus according to an embodiment of the present invention;

FIG. 8 is a block diagram of a processing module according to one embodiment of the invention;

FIG. 9 is a schematic diagram of a processing module according to another embodiment of the present invention;

FIG. 10 is a block diagram of a multi-channel audio processing hardware device according to an embodiment of the invention;

FIG. 11 is a schematic structural diagram of a computer-readable storage medium according to one embodiment of the present invention;

FIG. 12 is a schematic structural diagram of a multi-channel audio processing terminal according to an embodiment of the present invention;

FIG. 13 is a schematic structural diagram of a multi-channel audio processing terminal according to another embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the drawings only show the components related to the present invention rather than the number, shape and size of the components in practical implementation, and the type, quantity and proportion of the components in practical implementation can be changed freely, and the layout of the components can be more complicated.

In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.

In order to solve the technical problem of how to obtain a good user experience, an embodiment of the present invention provides a multi-channel audio processing method. As shown in FIG. 1, the method may include the following steps S1 to S3.

Step S1: receiving multi-channel audio to be processed.

The to-be-processed multi-channel audio may be off-line to-be-processed multi-channel audio or on-line to-be-processed multi-channel audio, which is not limited in the present invention. The multi-channel audio includes, but is not limited to, 3.1 channel audio, 5.1 channel audio, 7.1 channel audio, etc.

Step S2: detecting the audio characteristics of each channel audio in the multi-channel audio to be processed.

The audio characteristics include, but are not limited to: pitch characteristics, sound formant characteristics, transient sound pulse characteristics, audio phase characteristics, multi-channel audio downmix characteristics, main side channel characteristics, etc.

In this step, one or several audio characteristics may be detected.

Step S3: processing the multi-channel audio to be processed according to the detection result.

In this step, the multi-channel audio to be processed is processed according to the one or more detected audio characteristics of each channel audio.

In this step, the way to process the multi-channel audio to be processed includes, but is not limited to: joint processing, separation processing, smoothing processing, audio phase processing, fundamental frequency processing, zero setting processing, spectrum expansion processing, amplitude limiting processing and the like.

For the sake of understanding, the following describes each of the above processing modes in detail:

Joint processing refers to processing the audio of all channels together.

Separation processing refers to processing the audio of each channel separately.

Smoothing processing refers to filtering out frequency-domain data points with abrupt changes, that is, smoothing the peak data of spectral peaks in the frequency spectrum. In a specific implementation, a neighborhood averaging method, a Gaussian smoothing method, a parabolic smoothing method, or another method may be used. The neighborhood averaging method, for example, is based on the convolution principle and smooths the amplitudes of the frequency signal in the spectrum with a sliding window; the Gaussian smoothing method computes weights in the form of a Gaussian distribution function and performs linear smoothing with those weights. The smoothing may be applied to the full band of the audio or to only part of the band. Smoothing the formants of the audio can realize a tone-changing effect.

Fundamental frequency processing refers to adjusting the fundamental frequency of the audio, thereby realizing a tone-modification effect.

Audio phase processing refers to adjusting the phase of the audio; specifically, the phase can be adjusted according to the audio phase corresponding to a preset sound effect.

Zeroing (zero-setting) processing refers to eliminating the spectrum corresponding to a transient pulse across the whole frequency band of the audio.

Spectrum expansion processing refers to expanding the spectrum by interpolating or decimating the audio spectrum; this processing can achieve a speed-change effect.

Amplitude limiting (clipping) processing refers to limiting the amplitude of the transient pulse.
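By way of non-limiting illustration only, the neighborhood averaging, Gaussian smoothing, and clipping operations described above might be sketched as follows. The function names, the default window size, the Gaussian `sigma` and `radius`, and the edge handling (the window is simply shortened at the ends of the spectrum) are assumptions introduced for this sketch and are not specified in the disclosure.

```python
import math

def neighborhood_average(magnitudes, window=3):
    """Smooth spectral magnitudes with a sliding window (shortened at the edges)."""
    half = window // 2
    out = []
    for i in range(len(magnitudes)):
        lo = max(0, i - half)
        hi = min(len(magnitudes), i + half + 1)
        out.append(sum(magnitudes[lo:hi]) / (hi - lo))
    return out

def gaussian_smooth(magnitudes, sigma=1.0, radius=2):
    """Smooth with Gaussian-shaped weights, renormalised where the window is cut off."""
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    out = []
    for i in range(len(magnitudes)):
        acc = 0.0
        norm = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(magnitudes):
                acc += w * magnitudes[j]
                norm += w
        out.append(acc / norm)
    return out

def clip_amplitude(samples, limit):
    """Limit each sample to the range [-limit, +limit]."""
    return [max(-limit, min(limit, s)) for s in samples]
```

In this sketch an isolated spectral peak is spread over its neighbours by either smoothing function, while a flat spectrum is left unchanged.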

Embodiments of the present invention may apply one or more of the above processing modes to the one or more detected audio characteristics. In practical applications, combining one or more processing modes makes it possible to realize a speed mode (processing speed takes priority), a quality mode (high sound quality takes priority), and a balance mode (a compromise between processing speed and sound quality), and to achieve effects such as changing speed without changing tone and changing both speed and tone.

By adopting the technical scheme, the embodiment of the invention performs corresponding processing according to the audio characteristics of each channel audio in the multi-channel audio to be processed, thereby obtaining good user experience effect.
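As a minimal sketch of how steps S1 to S3 fit together, the following example detects one characteristic per channel and then chooses the processing for each channel from the detection result. The peak amplitude is used here only as a stand-in characteristic, and the function names, the dictionary layout, and the `limit` value are hypothetical; the disclosure does not prescribe any of them.

```python
def detect_characteristics(channels):
    """Step S2: build one characteristic record per channel (peak amplitude only)."""
    return [{"peak": max(abs(s) for s in ch)} for ch in channels]

def process(channels, characteristics, limit=1.0):
    """Step S3: clip only those channels whose detected peak exceeds the limit."""
    out = []
    for ch, info in zip(channels, characteristics):
        if info["peak"] > limit:
            out.append([max(-limit, min(limit, s)) for s in ch])
        else:
            out.append(list(ch))
    return out

def run_pipeline(channels):
    """Step S1 is the caller handing over the received channels (non-empty lists)."""
    characteristics = detect_characteristics(channels)  # step S2
    return process(channels, characteristics)           # step S3
```

The point of the sketch is only the ordering: detection runs per channel first, and the processing decision is made from what was detected rather than being fixed in advance.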

In order to perform adaptive processing on multiple sound sources such as an online sound source and an offline sound source, in an alternative embodiment, as shown in fig. 2, step S2 may specifically include:

Step S21: if the multi-channel audio to be processed is offline audio, detecting the overall audio characteristics of each channel audio in the multi-channel audio to be processed;

Step S22: if the multi-channel audio to be processed is online audio, detecting the local audio characteristics of each channel audio in the multi-channel audio to be processed.

In this embodiment, online audio is streaming media, so the audio received at any time is only a segment of the whole; the characteristics detected for online audio are therefore local audio characteristics. Offline audio is complete audio that has been encoded in advance, so the characteristics detected for offline audio are overall audio characteristics, which helps ensure a good user experience after processing.

The overall audio characteristics include, but are not limited to: pitch characteristics, sound formant characteristics, transient sound pulse characteristics, audio phase characteristics, multi-channel audio downmix characteristics, main side channel characteristics, etc.

The local audio characteristics include all or part of the overall audio characteristics, which are not described herein again.

By adopting the technical scheme, the embodiment of the invention respectively detects the local audio characteristics and the overall audio characteristics of the obtained online sound source and the offline sound source, thereby realizing the self-adaptive audio characteristic detection, facilitating the self-adaptive processing aiming at different sound sources and improving the user experience effect.
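The difference between overall and local detection can be illustrated with a short sketch. It assumes, purely for illustration, that the characteristic is a per-channel peak amplitude, that offline audio is available as complete per-channel sample lists, and that online audio arrives as an iterable of segments, each holding one short sample list per channel. None of these names or data layouts come from the disclosure.

```python
def peak(samples):
    """Stand-in characteristic: largest absolute sample value."""
    return max(abs(s) for s in samples)

def detect_overall(channels):
    """Offline case: one value per channel, computed over the complete channel audio."""
    return [peak(ch) for ch in channels]

def detect_local(segment_stream):
    """Online case: yield one list of per-channel values for each received segment."""
    for segment in segment_stream:
        yield [peak(ch) for ch in segment]
```

Because `detect_local` is a generator, it can produce a result for each segment as it arrives, without waiting for the rest of the stream.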

It should be noted that it can be known in advance whether the multi-channel audio to be processed is offline audio or online audio. Of course, it may also be unknown in advance whether the multi-channel audio to be processed is offline audio or online audio.

In the latter case, preferably, after step S1, the multi-channel audio processing method may further include:

determining whether the multi-channel audio to be processed is offline audio or online audio.

In this embodiment, the determination may be made according to the respective characteristics of offline audio and online audio: for example, offline audio is complete audio, whereas online audio may be a segment, or a packet of a segment, transmitted through a real-time messaging protocol. Alternatively, an identification marker may be added in advance to indicate whether the multi-channel audio to be processed is offline audio or online audio. The invention is not limited in this regard.
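One possible way to combine the two determination approaches is sketched below. The `AudioSource` record, its `marker` and `total_frames` fields, and the rule that a known total length indicates complete (offline) audio are all hypothetical choices made for this sketch; the disclosure leaves the concrete mechanism open.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AudioSource:
    marker: Optional[str] = None        # hypothetical pre-added identification marker
    total_frames: Optional[int] = None  # assumed known only for complete audio

def classify(source):
    """Return "offline" or "online", preferring an explicit marker when present."""
    if source.marker in ("offline", "online"):
        return source.marker
    # Fallback assumption: complete audio has a known total length, a stream does not.
    return "offline" if source.total_frames is not None else "online"
```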

The embodiment of the invention respectively carries out corresponding processing on the off-line audio and the on-line audio, thereby being capable of adapting to different multi-channel audio application scenes and further obtaining better user experience effect.

In an alternative embodiment, on the basis of the above embodiment of processing different sound sources, if the multi-channel audio to be processed is offline audio, then, as shown in FIG. 3, step S3 specifically includes:

Step S31: determining a first audio processing parameter according to the overall audio characteristics;

Step S32: processing the multi-channel audio to be processed according to the first audio processing parameter.

The first audio processing parameter includes, but is not limited to, the strength of the audio correlation between channels, the fundamental frequency amplitude, the formant amplitude, the transient pulse, the audio envelope, and the like.

For example, if the audio correlation is strong, the channel audios in the multi-channel audio to be processed are processed jointly; if the audio correlation is weak, each channel audio in the multi-channel audio to be processed is processed separately (i.e., separation processing).
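The choice between joint and separate processing can be illustrated with a simple correlation measure. The sketch below uses the Pearson correlation coefficient between two equal-length channels and a threshold of 0.8; both the measure and the threshold are assumptions made for illustration, since the disclosure does not state how correlation strength is quantified.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sample lists (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def choose_mode(left, right, threshold=0.8):
    """Pick "joint" for strongly correlated channels, otherwise "separate"."""
    return "joint" if abs(pearson(left, right)) >= threshold else "separate"
```

For more than two channels, the same measure could be applied pairwise, although how the pairwise results are combined would be a further design decision.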

In a preferred embodiment, the overall audio characteristics include a pitch characteristic, a sound formant characteristic, and a transient sound pulse characteristic. In this case, the step of determining the first audio processing parameter according to the overall audio characteristics specifically includes: determining the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, and the transient sound pulse characteristic, wherein the first audio processing parameters comprise a fundamental frequency amplitude, a formant amplitude, and a transient pulse. The step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes: adjusting the fundamental frequency amplitude, smoothing the formant amplitude, and clipping the transient pulse.

In a preferred embodiment, the overall audio characteristics include a pitch characteristic and a sound formant characteristic. In this case, the step of determining the first audio processing parameter according to the overall audio characteristics specifically includes: determining the first audio processing parameter according to the pitch characteristic and the sound formant characteristic, wherein the first audio processing parameters comprise a fundamental frequency amplitude and a formant amplitude. The step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes: adjusting the fundamental frequency amplitude and smoothing the formant amplitude.

In a preferred embodiment, the overall audio characteristics include a pitch characteristic and a transient sound pulse characteristic. In this case, the step of determining the first audio processing parameter according to the overall audio characteristics specifically includes: determining the first audio processing parameter according to the pitch characteristic and the transient sound pulse characteristic, wherein the first audio processing parameters comprise a fundamental frequency amplitude and a transient pulse. The step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes: adjusting the fundamental frequency amplitude and clipping the transient pulse.

In a preferred embodiment, the overall audio characteristics include a pitch characteristic, a sound formant characteristic, a transient sound pulse characteristic, and an audio phase characteristic. In this case, the step of determining the first audio processing parameter according to the overall audio characteristics may specifically include: determining the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, the transient sound pulse characteristic, and the audio phase characteristic, wherein the first audio processing parameters comprise a fundamental frequency amplitude, a formant amplitude, a transient pulse, and an audio phase. The step of processing the multi-channel audio to be processed based on the first audio processing parameter may specifically include: adjusting the fundamental frequency amplitude, smoothing the formant amplitude, clipping the transient pulse, and adjusting the audio phase.

In a preferred embodiment, the overall audio characteristics include a multi-channel audio downmix characteristic and a main side channel characteristic. In this case, the step of determining the first audio processing parameter according to the overall audio characteristics specifically includes: determining the first audio processing parameter according to the multi-channel audio downmix characteristic and the main side channel characteristic, wherein the first audio processing parameters comprise strong audio correlation, a fundamental frequency amplitude, and a formant amplitude. The step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes: performing joint processing on all channel audios in the multi-channel audio to be processed, adjusting the fundamental frequency amplitude, and smoothing the formant amplitude.

Whether the audio correlation is strong can be determined according to factors including, but by no means limited to, the spectral characteristics of each channel audio, the timbre of each channel audio source, and the acquisition mode of each channel audio. Specifically, if the audio sources of the channels are acquired jointly, the joint processing mode can be adopted when processing the multi-channel audio to be processed; if each channel audio is collected by an independent microphone, the separation processing mode can be adopted. If the spectral characteristics of each channel audio are good, the joint processing mode can be adopted; if they are poor, the separation processing mode can be adopted. In addition, if the formant amplitude is larger than a formant threshold, the formants contained in the multi-channel audio to be processed are smoothed; and if the audio envelope is shifted, the amplitudes of the fundamental frequency and the formants in the frequency domain of the multi-channel audio to be processed are adjusted.
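The decision rules in the preceding paragraph can be summarised as a small rule table. In the sketch below, the function name, the argument names, and the string labels for each operation are hypothetical, and only the acquisition-mode rule is used to choose between joint and separate processing; the spectral-quality rule would be handled analogously.

```python
def plan_processing(jointly_captured, formant_amplitude, formant_threshold,
                    envelope_shifted):
    """Return the list of operations suggested by the rules, in order."""
    plan = ["joint" if jointly_captured else "separate"]
    if formant_amplitude > formant_threshold:
        plan.append("smooth_formants")
    if envelope_shifted:
        plan.append("adjust_fundamental_and_formant_amplitudes")
    return plan
```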

Therefore, by adopting this technical scheme, the embodiment determines the corresponding first audio processing parameter according to the overall audio characteristics of the offline multi-channel audio to be processed, and then performs adaptive processing according to the determined first audio processing parameter, so that different audio effects can be obtained. For example, the effect of tonal modification of the sound can be achieved by adjusting the amplitude of the fundamental frequency, by smoothing the amplitude of the formants, or by performing offset processing on the audio envelope, thereby adaptively modifying the audio. The embodiment of the invention can therefore obtain a good user experience effect.

The present invention is further described in detail with reference to fig. 4 in a specific embodiment.

Step Sa 1: receiving multi-channel audio to be processed;

Step Sa 2: if the multi-channel audio to be processed is offline audio, detecting the overall audio characteristics of each channel audio in the multi-channel audio to be processed;

Step Sa 3: determining a strong-correlation audio processing parameter according to the overall audio characteristics;

Step Sa 4: performing joint processing on the multi-channel audio to be processed according to the strong-correlation audio processing parameter.

In this embodiment, the offline multi-channel audio to be processed is received and its overall audio characteristics are detected; a strong-correlation audio processing parameter is then determined as the processing parameter for the multi-channel audio to be processed; finally, the joint processing corresponding to that parameter is performed, thereby realizing adaptive processing and obtaining a good user experience effect.

In an alternative embodiment, on the basis of the above embodiment of processing different sound sources, if the multi-channel audio to be processed is an online audio, as shown in fig. 5, the step S3 specifically includes:

step S33: determining a second audio processing parameter according to the local audio characteristics;

step S34: and processing the multi-channel audio to be processed according to the second audio processing parameter.

Wherein the second audio processing parameter may be part or all of the first audio processing parameter.

For the description of the present embodiment, reference may be made to the corresponding description in the embodiment shown in fig. 3, and details are not repeated here.

By adopting this technical scheme, the embodiment of the invention determines the corresponding second audio processing parameter according to the local audio characteristics of the online multi-channel audio to be processed, and then performs adaptive processing according to the determined second audio processing parameter, so that different audio effects can be obtained. For example, the effect of tonal modification of the sound can be achieved by adjusting the amplitude of the fundamental frequency, by smoothing the amplitude of the formants, or by performing offset processing on the audio envelope, thereby adaptively modifying the audio. The embodiment of the invention can therefore obtain a good user experience effect.

For obvious modifications of, or equivalent alternatives to, the embodiment for processing online audio, reference may also be made to the foregoing embodiment for processing offline audio; details are not described here again.

In order to facilitate a better understanding of the invention, the invention is described in detail below in a specific embodiment with reference to fig. 6.

As shown in fig. 6, an embodiment of the present invention provides a multi-channel audio processing method, including:

Step Sb 1: receiving multi-channel audio to be processed;

Step Sb 2: determining whether the multi-channel audio to be processed is offline audio or online audio; if the multi-channel audio to be processed is offline audio, go to step Sb 3; if the multi-channel audio to be processed is online audio, go to step Sb 4;

Step Sb 3: detecting the overall audio characteristics of each channel audio in the multi-channel audio to be processed, then go to step Sb 5;

Step Sb 4: detecting the local audio characteristics of each channel audio in the multi-channel audio to be processed, then go to step Sb 7;

Step Sb 5: determining a first audio processing parameter according to the overall audio characteristics, then go to step Sb 6;

Step Sb 6: processing the multi-channel audio to be processed according to the first audio processing parameter;

Step Sb 7: determining a second audio processing parameter according to the local audio characteristics, then go to step Sb 8;

Step Sb 8: processing the multi-channel audio to be processed according to the second audio processing parameter.

By adopting this technical scheme, the embodiment of the invention determines the corresponding audio processing parameters from the overall audio characteristics for an offline audio source and from the local audio characteristics for an online audio source, and processes the audio accordingly, thereby realizing adaptive audio processing and obtaining a good user experience effect.
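The branching between offline and online audio in steps Sb 1 to Sb 8 can be sketched as follows. This is a minimal Python sketch under stated assumptions: the RMS level stands in for the unspecified audio characteristics, level normalization stands in for the unspecified processing, and the function names, `frame_len`, and `target` are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level, used here as a placeholder audio characteristic."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def detect_overall(channels):
    # Step Sb 3: offline audio is fully available, so analyse every sample.
    return [rms(ch) for ch in channels]


def detect_local(channels, frame_len):
    # Step Sb 4: online audio is analysed on the most recent frame only.
    return [rms(ch[-frame_len:]) for ch in channels]


def determine_parameters(channels, is_offline, frame_len=2):
    """Steps Sb 2, Sb 5 and Sb 7: branch on the source type and derive parameters."""
    if is_offline:
        return {"kind": "first", "levels": detect_overall(channels)}
    return {"kind": "second", "levels": detect_local(channels, frame_len)}


def process(channels, params, target=1.0):
    # Steps Sb 6 and Sb 8: placeholder processing that scales each channel
    # so that its measured level matches the target level.
    gains = [target / lvl if lvl > 0 else 1.0 for lvl in params["levels"]]
    return [[s * g for s in ch] for ch, g in zip(channels, gains)]


channels = [[1.0, -1.0, 1.0, -1.0], [0.0, 0.0, 2.0, -2.0]]
online = determine_parameters(channels, is_offline=False)
print(online["kind"], process(channels, online))
```

The only structural point the sketch is meant to show is that the offline branch sees the whole signal while the online branch sees a recent frame.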

Although the steps in the embodiment of the multi-channel audio processing method are described in the above order, it should be clear to those skilled in the art that the steps in the embodiments of the present invention need not be performed in that order and may also be performed in other orders, such as reversed, in parallel, or interleaved. Moreover, on the basis of the above steps, those skilled in the art may add other steps or delete some of the above steps. Such obvious modifications or equivalents also fall within the protection scope of the present invention and are not described here again.

For convenience of description, only the parts relevant to the embodiments of the present invention are shown below; for specific technical details that are not disclosed, please refer to the method embodiments of the present invention.

Based on the same technical concept as the method embodiment, the embodiment of the invention also provides a multi-channel audio processing device. As shown in fig. 7, the apparatus includes: a receiving module 71, a detecting module 72 and a processing module 73. The receiving module 71 is configured to receive multi-channel audio to be processed. The detection module 72 is used to detect audio characteristics of each channel of the multi-channel audio to be processed. The processing module 73 is configured to process the multi-channel audio to be processed according to the audio characteristics of each channel of audio.

By adopting the technical scheme, the processing module 73 performs corresponding processing according to the audio characteristics of each channel audio in the multi-channel audio to be processed, which is detected by the detection module 72, so that a good user experience effect is obtained.
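The three-module structure of fig. 7 can be sketched as a small pipeline. This Python sketch is an assumption-laden illustration: the class name `MultiChannelAudioApparatus`, the peak-amplitude characteristic, and the peak normalization are hypothetical stand-ins for the unspecified detection and processing.

```python
from dataclasses import dataclass
from typing import Callable, List

Channels = List[List[float]]


@dataclass
class MultiChannelAudioApparatus:
    receive: Callable[[], Channels]                       # receiving module 71
    detect: Callable[[Channels], List[float]]             # detection module 72
    process: Callable[[Channels, List[float]], Channels]  # processing module 73

    def run(self) -> Channels:
        channels = self.receive()
        characteristics = self.detect(channels)
        return self.process(channels, characteristics)


def peak_per_channel(channels):
    # Placeholder characteristic: largest absolute sample value in each channel.
    return [max((abs(s) for s in ch), default=0.0) for ch in channels]


def normalise_to_peak(channels, peaks):
    # Placeholder processing: scale each channel so that its peak becomes 1.0.
    return [[s / p if p else s for s in ch] for ch, p in zip(channels, peaks)]


apparatus = MultiChannelAudioApparatus(
    receive=lambda: [[0.5, -0.25], [2.0, 1.0]],
    detect=peak_per_channel,
    process=normalise_to_peak,
)
result = apparatus.run()
print(result)
```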

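In an optional embodiment, the detection module may specifically include a first detection unit and a second detection unit. The first detection unit is used for detecting the overall audio characteristics of each channel audio in the multi-channel audio to be processed if the multi-channel audio to be processed is offline audio; the second detection unit is used for detecting the local audio characteristics of each channel audio in the multi-channel audio to be processed if the multi-channel audio to be processed is online audio.

In this embodiment, the first detection unit and the second detection unit detect the overall audio characteristics or the local audio characteristics according to whether the multi-channel audio to be processed is offline audio or online audio, so that adaptive processing of multiple sound sources is realized and the user can obtain a good experience.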

In an alternative embodiment, as shown in fig. 8, if the multi-channel audio to be processed is offline audio, the processing module specifically includes a first determining unit 81 and a first processing unit 82. Wherein the first determining unit 81 is configured to determine the first audio processing parameter based on the overall audio characteristic. The first processing unit 82 is configured to process the multi-channel audio to be processed based on the first audio processing parameter.

For example, if the audio correlation is strong, the audio of each channel in the multi-channel audio to be processed is processed jointly; if the audio correlation is weak, the audio of each channel in the multi-channel audio to be processed is separately processed (i.e., separated).
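One way to decide between joint and separate processing is to measure the correlation between channels. The Python sketch below uses the Pearson correlation coefficient of the sample sequences; this measure, the `threshold` of 0.5, and the rule of taking the weakest pair are assumptions, as the description does not specify how correlation strength is computed.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two sample sequences (0.0 if undefined)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def processing_mode(channels, threshold=0.5):
    """Return "joint" if every pair of channels is strongly correlated."""
    pairs = list(combinations(channels, 2))
    if not pairs:
        return "separate"  # a single channel has nothing to be joined with
    weakest = min(abs(pearson(a, b)) for a, b in pairs)
    return "joint" if weakest >= threshold else "separate"


print(processing_mode([[0.0, 1.0, 0.0, -1.0], [0.0, 2.0, 0.0, -2.0]]))
```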

In an alternative embodiment, the overall audio characteristics include pitch characteristics, acoustic formant characteristics, and transient acoustic impulse characteristics; the first determining unit 81 is specifically configured to determine a first audio processing parameter according to a pitch characteristic, an acoustic formant characteristic, and a transient acoustic pulse characteristic; wherein the first audio processing parameters comprise a fundamental frequency amplitude, a formant amplitude and a transient pulse; the first processing unit 82 is specifically configured to adjust the amplitude of the fundamental frequency, smooth the amplitude of the formants, and perform clipping processing on the transient pulse.
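The three operations named above can be sketched on simple lists. This Python sketch assumes hard limiting for the clipping of transient pulses, a centered moving average over a magnitude spectrum for the formant smoothing, and a per-bin gain for the fundamental frequency adjustment; the function names and parameters are hypothetical, and the description does not fix any of these choices.

```python
def clip_transients(samples, limit):
    """Hard-limit time-domain samples to the range [-limit, limit]."""
    return [max(-limit, min(limit, s)) for s in samples]


def smooth_envelope(magnitudes, width=3):
    """Centered moving average; windows are shortened at the edges."""
    half = width // 2
    smoothed = []
    for i in range(len(magnitudes)):
        window = magnitudes[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def scale_bin(magnitudes, index, gain):
    """Scale one spectral bin, e.g. the bin holding the fundamental frequency."""
    scaled = list(magnitudes)
    scaled[index] *= gain
    return scaled


print(clip_transients([0.1, 3.0, -2.5, 0.2], limit=1.0))
print(smooth_envelope([0.0, 3.0, 0.0]))
print(scale_bin([1.0, 2.0, 3.0], index=1, gain=0.5))
```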

In an alternative embodiment, the overall audio characteristics include pitch characteristics and sound formant characteristics; the first determining unit 81 may be further specifically configured to determine a first audio processing parameter according to a pitch characteristic and a sound formant characteristic; wherein the first audio processing parameters comprise a fundamental frequency amplitude and a formant amplitude; the first processing unit 82 may also be specifically configured to adjust the amplitude of the fundamental frequency and smooth the amplitude of the formants.

In an alternative embodiment, the overall audio characteristics include a pitch characteristic and a transient sound pulse characteristic; the first determining unit 81 may be further specifically configured to determine a first audio processing parameter according to the pitch characteristic and the transient sound pulse characteristic; wherein the first audio processing parameters comprise a fundamental frequency amplitude and a transient impulse; the first processing unit 82 may also be specifically configured to adjust the amplitude of the fundamental frequency and perform a clipping process on the transient pulse.

In an alternative embodiment, the overall audio characteristics include pitch characteristics, acoustic formant characteristics, transient acoustic impulse characteristics, and audio phase characteristics; the first determining unit 81 may be further specifically configured to determine a first audio processing parameter according to a pitch characteristic, a sound formant characteristic, a transient sound pulse characteristic, and an audio phase characteristic; wherein the first audio processing parameters comprise a fundamental frequency amplitude, a formant amplitude, a transient pulse and an audio phase; the first processing unit 82 may be further specifically configured to adjust the amplitude of the fundamental frequency, smooth the amplitude of the formants, perform clipping processing on the transient pulse, and adjust the audio phase.
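The description does not say what adjusting the audio phase involves. One simple interpretation, offered here purely as an assumption, is polarity alignment: if a channel is inverted relative to a reference channel, flip its sign. The Python sketch below uses hypothetical function names.

```python
def dot(a, b):
    """Inner product over the overlapping part of two sample sequences."""
    return sum(x * y for x, y in zip(a, b))


def align_polarity(reference, channel):
    """Invert the channel if it is negatively correlated with the reference."""
    if dot(reference, channel) < 0:
        return [-s for s in channel]
    return list(channel)


left = [1.0, -1.0, 0.5]
right_inverted = [-1.0, 1.0, -0.5]
print(align_polarity(left, right_inverted))
```

A frequency-dependent phase correction would require a spectral transform and is outside the scope of this sketch.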

In an alternative embodiment, the overall audio characteristics comprise a multi-channel audio downmix characteristic and a main side channel characteristic; the first determining unit 81 may be further specifically configured to determine a first audio processing parameter according to the multi-channel audio downmix characteristic and the primary side channel characteristic; wherein the first audio processing parameter comprises: strong audio correlation, fundamental frequency amplitude and formant amplitude; the first processing unit 82 may be further specifically configured to jointly process all channel audio of the multi-channel audio to be processed, and adjust the amplitude of the fundamental frequency and smooth the amplitude of the formants.
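A downmix and a mid/side decomposition of a stereo pair can be computed as below. This Python sketch assumes that the "main side channel" characteristic refers to the conventional mid/side representation, mid = (L + R) / 2 and side = (L - R) / 2; the `side_ratio` indicator is a hypothetical measure added for illustration, not something the description defines.

```python
def downmix(channels):
    """Average all channels sample by sample into one mono signal."""
    n = min(len(ch) for ch in channels)
    return [sum(ch[i] for ch in channels) / len(channels) for i in range(n)]


def mid_side(left, right):
    """Conventional mid/side decomposition of a stereo pair."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def energy(samples):
    return sum(s * s for s in samples)


def side_ratio(left, right):
    """Share of energy in the side signal; a low value suggests similar channels."""
    mid, side = mid_side(left, right)
    total = energy(mid) + energy(side)
    return energy(side) / total if total else 0.0


left, right = [1.0, 0.5], [1.0, -0.5]
print(downmix([left, right]), mid_side(left, right), side_ratio(left, right))
```

Identical channels give a side ratio of zero, which is consistent with the strong-correlation case in which joint processing is applied.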

In an alternative embodiment, as shown in fig. 9, if the multi-channel audio to be processed is online audio, the processing module further includes a second determining unit 91 and a second processing unit 92. Wherein the second determining unit 91 is configured to determine the second audio processing parameter based on the local audio characteristic. A second processing unit 92, configured to process the multi-channel audio to be processed based on the second audio processing parameter.

For the description of the present embodiment, reference may be made to the corresponding description in the foregoing embodiments, which is not repeated herein.

Based on the same technical concept as the above-mentioned multi-channel audio processing method embodiment, the embodiment of the present invention further provides a multi-channel audio processing hardware apparatus. Fig. 10 shows a schematic structural diagram of a multi-channel audio processing hardware apparatus according to an embodiment of the present disclosure. As shown in fig. 10, the multi-channel audio processing hardware apparatus 10 includes a memory 101 and a processor 102. The memory 101 is used for storing non-transitory computer-readable instructions; the processor 102 is configured to execute the computer-readable instructions, so that the processor, when executing them, implements the above-mentioned multi-channel audio processing method embodiments.

The memory 101 is used to store, among other things, non-transitory computer readable instructions. In particular, memory 101 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc.

The processor 102 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the multi-channel audio processing hardware device 10 to perform desired functions. In an embodiment of the present disclosure, the processor 102 is configured to execute the computer readable instructions stored in the memory 101, so that the multi-channel audio processing hardware device 10 performs all or part of the aforementioned steps of the multi-channel audio processing method according to the embodiments of the present disclosure.

Those skilled in the art should understand that, in order to solve the technical problem of how to obtain a good user experience, the present embodiment may also include well-known structures such as a communication bus, an interface, and the like, and these well-known structures should also be included in the protection scope of the present invention.

For the detailed description of the present embodiment, reference may be made to the corresponding descriptions in the foregoing embodiments, which are not repeated herein.

Based on the same technical concept as the above-described multi-channel audio processing method embodiment, an embodiment of the present invention also provides a computer-readable storage medium. As shown in fig. 11, the computer-readable storage medium 11 is used for storing non-transitory computer-readable instructions 111, which when executed by a computer, cause the computer to perform the steps described in the above-mentioned multi-channel audio processing method embodiment.

The computer-readable storage medium 11 includes, but is not limited to: optical storage media (e.g., CD-ROMs and DVDs), magneto-optical storage media (e.g., MOs), magnetic storage media (e.g., magnetic tapes or removable disks), media with built-in rewritable non-volatile memory (e.g., memory cards), and media with built-in ROMs (e.g., ROM cartridges).

Based on the same technical concept as the above-mentioned multi-channel audio processing method embodiment, the embodiment of the present invention further provides a multi-channel audio processing terminal. Fig. 12 exemplarily shows a structural diagram of a multi-channel audio processing terminal. As shown in fig. 12, the multi-channel audio processing terminal 12 includes the above-described multi-channel audio processing apparatus 121.

The above-described terminal 12 may be implemented in various forms, and the terminal in the present disclosure may include, but is not limited to, mobile terminal devices such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a navigation device, a vehicle-mounted terminal device, a vehicle-mounted display terminal, a vehicle-mounted electronic rear view mirror, and the like, and fixed terminal devices such as a digital TV, a desktop computer, and the like.

As an equivalent alternative, the multi-channel audio processing terminal may also comprise other components. As shown in fig. 13, the multi-channel audio processing terminal 13 may include a power supply unit 131, a wireless communication unit 132, an A/V (audio/video) input unit 133, a user input unit 134, a sensing unit 135, an interface unit 136, a controller 137, an output unit 138, a memory 139, and the like. Fig. 13 shows a terminal having various components, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components can be implemented instead.

The wireless communication unit 132 allows, among other things, radio communication between the terminal 13 and a wireless communication system or network. The A/V input unit 133 serves to receive an audio or video signal. The user input unit 134 may generate key input data to control various operations of the terminal device according to a command input by a user. The sensing unit 135 detects a current state of the terminal 13, a position of the terminal 13, presence or absence of a touch input of the user to the terminal 13, an orientation of the terminal 13, acceleration or deceleration movement and direction of the terminal 13, and the like, and generates a command or signal for controlling an operation of the terminal 13. The interface unit 136 serves as an interface through which at least one external device is connected to the terminal 13. The output unit 138 is configured to provide output signals in a visual, audio, and/or tactile manner. The memory 139 may store software programs or the like for processing and controlling operations performed by the controller 137, or may temporarily store data that has been output or is to be output. The memory 139 may include at least one type of storage medium. Also, the terminal 13 may cooperate with a network storage device that performs a storage function of the memory 139 through a network connection. The controller 137 generally controls the overall operation of the terminal device. In addition, the controller 137 may include a multimedia module for reproducing or playing back multimedia data. The controller 137 may perform a pattern recognition process to recognize a handwriting input or a picture drawing input performed on the touch screen as a character or an image. The power supply unit 131 receives external power or internal power and supplies appropriate power required to operate the respective elements and components under the control of the controller 137.

Various embodiments of the multi-channel audio processing method presented in this disclosure may be implemented using a computer-readable medium, for example with computer software, hardware, or any combination thereof. For a hardware implementation, various embodiments of the multi-channel audio processing method presented in the present disclosure may be implemented by using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, and an electronic unit designed to perform the functions described herein; in some cases, various embodiments of the multi-channel audio processing method presented in the present disclosure may be implemented in the controller 137. For a software implementation, various embodiments of the multi-channel audio processing method presented in the present disclosure may be implemented with a separate software module that allows at least one function or operation to be performed. The software code may be implemented by a software application (or program) written in any suitable programming language, which may be stored in the memory 139 and executed by the controller 137.

The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.

The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, or configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, or configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the phrase "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".

Also, as used herein, "or" as used in a list of items beginning with "at least one of" indicates a disjunctive list, such that, for example, a list of "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (i.e., A and B and C). Furthermore, the word "exemplary" does not mean that the described example is preferred or better than other examples.

It is also noted that in the systems and methods of the present disclosure, components or steps may be decomposed and/or re-combined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.

Various changes, substitutions and alterations to the techniques described herein may be made without departing from the techniques of the teachings as defined by the appended claims. Moreover, the scope of the claims of the present disclosure is not limited to the particular aspects of the process, machine, manufacture, composition of matter, means, methods and acts described above. Processes, machines, manufacture, compositions of matter, means, methods, or acts, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or acts.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims

1. A multi-channel audio processing method, comprising:

receiving multi-channel audio to be processed;

processing the multi-channel audio to be processed according to the audio characteristics of the audio of each channel;

the step of detecting the audio characteristics of each channel audio in the multi-channel audio to be processed comprises:

if the multi-channel audio to be processed is online audio, detecting local audio characteristics of each channel audio in the multi-channel audio to be processed;

if the multi-channel audio to be processed is off-line audio, the step of processing the multi-channel audio to be processed according to the audio characteristics of the audio of each channel comprises:

processing the multi-channel audio to be processed based on the first audio processing parameter;

if the overall audio characteristic comprises a pitch characteristic, an acoustic formant characteristic, and a transient acoustic impulse characteristic;

2. The method of claim 1, wherein if the overall audio characteristic comprises a pitch characteristic and a sound formant characteristic;

3. The method of claim 1, wherein if the overall audio characteristic comprises a pitch characteristic and a transient sound pulse characteristic;

4. The method of claim 1, wherein if the overall audio characteristic comprises a pitch characteristic, a sound formant characteristic, a transient sound pulse characteristic, and an audio phase characteristic;

5. The method of claim 1, wherein if the overall audio characteristics comprise multi-channel audio downmix characteristics and main side channel characteristics;

6. The method according to claim 1, wherein if the multi-channel audio to be processed is online audio, the step of processing the multi-channel audio to be processed according to the audio characteristics of the channels of audio specifically comprises:

7. A multi-channel audio processing apparatus, comprising:

the receiving module is used for receiving multi-channel audio to be processed;

the processing module is used for processing the multi-channel audio to be processed according to the audio characteristics of the audio of each channel;

the detection module comprises:

the second detection unit is used for detecting the local audio characteristics of each channel audio in the multi-channel audio to be processed under the condition that the multi-channel audio to be processed is online audio;

if the multichannel audio to be processed is offline audio, the processing module comprises:

the first processing unit is used for processing the multi-channel audio to be processed based on the first audio processing parameter;

8. The apparatus of claim 7, wherein if the overall audio characteristic comprises a pitch characteristic and a sound formant characteristic;

9. The apparatus of claim 7, wherein if the overall audio characteristic comprises a pitch characteristic and a transient sound pulse characteristic;

10. The apparatus of claim 7, wherein if the overall audio characteristic comprises a pitch characteristic, a sound formant characteristic, a transient sound pulse characteristic, and an audio phase characteristic;

11. The apparatus of claim 7, wherein if the overall audio characteristics comprise multi-channel audio downmix characteristics and main side channel characteristics;

12. The apparatus of claim 7, wherein if the multi-channel audio to be processed is online audio, the processing module further comprises:

13. A multi-channel audio processing hardware apparatus, comprising:

a memory for storing non-transitory computer readable instructions; and

a processor for executing the computer readable instructions such that the processor when executing implements a multi-channel audio processing method according to any of claims 1 to 6.

14. A computer-readable storage medium storing non-transitory computer-readable instructions that, when executed by a computer, cause the computer to perform the multi-channel audio processing method of any of claims 1 to 6.

15. A multi-channel audio processing terminal comprising the multi-channel audio processing apparatus of any of claims 7 to 12.