CN118301535A - Audio upmixing algorithm processing method and device, electronic equipment and storage medium - Google Patents

Audio upmixing algorithm processing method and device, electronic equipment and storage medium

Info

Publication number
CN118301535A
CN118301535A (application number CN202410246957.8A)
Authority
CN
China
Prior art keywords
audio
audio signal
channels
time domain
algorithm
Prior art date
Legal status: Pending (the legal status is an assumption based on Google Patents data and is not a legal conclusion)
Application number
CN202410246957.8A
Other languages
Chinese (zh)
Inventor
焦其金
张洋
Current Assignee (the listed assignee may be inaccurate)
Shenzhen Ruili Intelligent Innovation Technology Co ltd
Original Assignee
Shenzhen Ruili Intelligent Innovation Technology Co ltd
Application filed by Shenzhen Ruili Intelligent Innovation Technology Co ltd filed Critical Shenzhen Ruili Intelligent Innovation Technology Co ltd
Priority to CN202410246957.8A
Publication of CN118301535A

Landscapes

  • Stereophonic System (AREA)

Abstract

The application relates to the technical field of audio processing and provides an audio upmixing algorithm processing method and device, an electronic device, and a storage medium. The audio upmixing algorithm processing method comprises the following steps: acquiring a first audio signal to be processed, wherein the first audio signal comprises at least two channels; performing time-domain processing on the first audio signal to obtain a second audio signal; and mixing the second audio signal according to a preset mixing rule to obtain a third audio signal, wherein the third audio signal contains more channels than the first audio signal. The embodiments of the application can improve the quality of the audio signal.

Description

Audio upmixing algorithm processing method and device, electronic equipment and storage medium
Technical Field
The application belongs to the technical field of audio processing, and in particular relates to an audio upmixing algorithm processing method and device, an electronic device, and a storage medium.
Background
The rise of digital audio technology has made the processing and transmission of audio signals more flexible and efficient. With its rapid development, users' requirements for audio quality and listening experience continue to rise; especially in fields such as high-end sound systems, music production, and film and television production, users expect a clearer and more vivid listening experience. There is therefore a need for an audio processing technique that improves the quality of the audio signal to better meet users' listening needs.
Disclosure of Invention
The embodiments of the application provide an audio upmixing algorithm processing method and device, an electronic device, and a storage medium, which can improve the quality of audio signals and meet users' listening needs.
A first aspect of the embodiments of the present application provides an audio upmixing algorithm processing method, including: acquiring a first audio signal to be processed, wherein the first audio signal includes at least two channels; performing time-domain processing on the first audio signal to obtain a second audio signal; and mixing the second audio signal according to a preset mixing rule to obtain a third audio signal, wherein the third audio signal contains more channels than the first audio signal.
In some implementations of the first aspect, the performing time-domain processing on the first audio signal to obtain a second audio signal includes: acquiring scene information, and determining a target time domain processing algorithm from a plurality of preset time domain processing algorithms according to the scene information, wherein the plurality of preset time domain processing algorithms at least comprise a time domain transformation algorithm, a time domain filtering algorithm, a time domain compression algorithm, a time domain modification effect algorithm and a time domain synchronization algorithm; and performing time domain processing on the first audio signal according to the target time domain processing algorithm to obtain the second audio signal.
In some implementations of the first aspect, the mixing the second audio signal according to a preset mixing rule to obtain a third audio signal includes: adjusting the phase of the second audio signal so that the phases of the signals it contains are consistent; and mixing the phase-adjusted second audio signal according to the preset mixing rule to obtain the third audio signal.
In some implementations of the first aspect, the mixing the second audio signal according to a preset mixing rule to obtain a third audio signal includes: and respectively mixing the second audio signals on a plurality of channels according to the preset mixing rule to obtain the third audio signals.
In some implementations of the first aspect, before the time-domain processing of the first audio signal, the method further includes: detecting sound characteristic information and environment characteristic information of the first audio signal; and adjusting the dynamic range and the volume of the first audio signal according to the sound characteristic information and the environment characteristic information.
In some implementations of the first aspect, before the time-domain processing of the first audio signal, the method further includes: and detecting noise in the first audio signal, and denoising the first audio signal.
In some implementations of the first aspect, after the obtaining the third audio signal, the method further includes: acquiring feedback adjustment data, wherein the feedback adjustment data comprises user behavior data, environment data and characteristic data of the third audio signal; and adjusting the preset mixing rule according to the feedback adjustment data, and mixing the second audio signal according to the adjusted mixing rule to obtain the third audio signal.
A second aspect of the present application provides an audio upmixing algorithm processing apparatus, including: an acquisition unit, configured to acquire a first audio signal to be processed, where the first audio signal includes at least two channels; a time-domain processing unit, configured to perform time-domain processing on the first audio signal to obtain a second audio signal; and a mixing unit, configured to mix the second audio signal according to a preset mixing rule to obtain a third audio signal, where the third audio signal contains more channels than the first audio signal.
A third aspect of the embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above-mentioned audio upmixing algorithm processing method when executing the computer program.
A fourth aspect of the embodiments of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described audio upmix algorithm processing method.
A fifth aspect of the embodiments of the present application provides a computer program product for causing an electronic device to execute the steps of the audio upmixing algorithm processing method described above when the computer program product is run on the electronic device.
In the embodiments of the application, a first audio signal to be processed that contains at least two channels is acquired; time-domain processing is performed on the first audio signal to obtain a second audio signal; the second audio signal is then mixed according to a preset mixing rule to obtain a third audio signal, thereby realizing audio upmixing. Because the third audio signal contains more channels than the first audio signal, more channels of sound output are provided, and the third audio signal can represent the mixing result of audio signals from multiple audio sources. This improves the quality of the audio signal and can provide users with a better listening experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic implementation flow diagram of a processing method of an audio upmixing algorithm according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an audio upmixing algorithm processing device according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by a person skilled in the art without any inventive effort, are intended to be protected by the present application based on the embodiments of the present application.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In the description of the present specification and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
In order to illustrate the technical scheme of the application, the following description is made by specific examples.
Fig. 1 is a schematic implementation flow diagram of an audio upmixing algorithm processing method according to an embodiment of the present application, where the method may be applied to an electronic device. The electronic device may be an intelligent device such as a computer or a smartphone; in some scenarios, the electronic device may also be a device for audio processing or audio output, and the application is not limited in this respect.
Specifically, the above audio upmixing algorithm processing method may include the following steps S101 to S103.
Step S101, a first audio signal to be processed is acquired.
In an embodiment of the application, the first audio signal may be audio content from different sources such as music, speech, or film. It will be appreciated that the first audio signal may be collected by a sound sensor, input by a user, or downloaded with authorization; the application is not limited in this respect. In an embodiment of the present application, the first audio signal may comprise at least two channels; for example, the first audio signal may be an audio signal comprising left and right channels.
Step S102, performing time domain processing on the first audio signal to obtain a second audio signal.
In the embodiment of the application, time-domain processing refers to processing the first audio signal in the time domain; when the first audio signal contains different audio sources, this ensures that the timing of the different sources remains consistent, yielding the second audio signal.
Step S103, mixing the second audio signals according to a preset mixing rule to obtain third audio signals.
In the embodiment of the present application, after the time-domain processing is completed, the signals of different sources contained in the second audio signal may be mixed according to a preset mixing rule. The mixing rule determines how the different audio signals combine to produce the final third audio signal. The third audio signal may be output via an output device such as a loudspeaker.
Also, the third audio signal may contain more channels than the first audio signal. Typically, the first audio signal may be a conventional 2.0 stereo signal, which comprises left and right channels. Through mixing, the third audio signal can be given more channels; for example, the third audio signal may be a 3.0, 3.1, 3.1.2, 5.0, 5.1, 5.1.2, 5.1.4, 7.0, 7.1, 7.1.2, 7.1.4, 7.1.6, 9.0, 9.1, 9.1.2, 9.1.4, 9.1.6, 9.2.2, 9.2.4, 9.2.6, 9.4.4, or 9.4.6 channel signal. Of course, the third audio signal may also be a signal other than the above examples; the present application does not limit the specific number of channels of the third audio signal (i.e., the type of the third audio signal).
The meaning and application of these channel formats is explained in detail below. In a channel signal written as "A.B.C", A represents the number of main speaker channels, B represents the number of subwoofer channels, and C represents the number of top speaker channels, also called sky or height channels. Specifically, the 3.0-channel signal includes a front left channel, a front right channel, and a center channel, and is suitable for basic stereo audio scenes such as music appreciation and voice playback. The 3.1-channel signal adds a subwoofer channel to the 3.0 channels; the subwoofer channel enhances the low-frequency effect and provides a richer audio experience. The 3.1.2-channel signal adds two top speaker channels to the 3.1 channels. The 5.0-channel signal includes a front left channel, a front right channel, a center channel, a rear left channel, and a rear right channel. The 5.1-channel signal comprises a front left channel, a front right channel, a center channel, a rear left channel, a rear right channel, and a subwoofer channel; it is a common home cinema audio format that provides immersive surround sound and is suitable for movie viewing and the like. The 5.1.2-channel signal adds two top speaker channels to the 5.1 channels. The 5.1.4-channel signal adds four top speaker channels to the 5.1 channels for a more stereoscopic and immersive audio experience, and is suitable for devices that support audio height information. The 7.0-channel signal adds two side rear channels to the 5.0 channels. The 7.1-channel signal adds two side rear channels to the 5.1 channels, further enhancing the surround sound effect and providing a more realistic and dynamic audio experience.
The 7.1.2-channel signal adds two top speaker channels to the 7.1 channels. The 7.1.4-channel signal adds four top speaker channels to the 7.1 channels, further improving the stereo effect and the sense of audio surround. The 7.1.6-channel signal adds six top speaker channels to the 7.1 channels, further improving the stereo effect and the sense of audio surround. The 9.0-channel signal contains nine front channels, adding two side surround output channels to the 7.0 channels. The 9.1-channel signal, comprising nine front channels and a subwoofer, is commonly used in home cinema surround sound systems. The 9.1.2-, 9.1.4-, and 9.1.6-channel signals add two, four, or six top speaker channels, respectively, to the 9.1 channels. The 9.2.2-, 9.2.4-, and 9.2.6-channel signals add a subwoofer channel to the 9.1.2, 9.1.4, and 9.1.6 channels, respectively. The 9.4.4- and 9.4.6-channel signals comprise nine front channels, four subwoofer channels, and four or six top speaker channels, respectively, for an extremely realistic and immersive audio experience, and are suitable for fields such as professional audio production.
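As an illustration of the "A.B.C" naming convention described above, the following Python sketch parses a layout string into its main, subwoofer, and top speaker counts. The function and key names are our own, not from the patent:

```python
def parse_channel_layout(layout: str) -> dict:
    """Parse an "A.B.C" channel layout string.

    A = main (ear-level) speaker channels, B = subwoofer channels,
    C = top/height speaker channels; C may be omitted (e.g. "5.1").
    Illustrative helper only -- not part of the patent text.
    """
    parts = [int(p) for p in layout.split(".")]
    if len(parts) == 2:          # e.g. "7.1" has no top-speaker field
        parts.append(0)
    main, sub, top = parts
    return {"main": main, "sub": sub, "top": top, "total": main + sub + top}
```

For example, "7.1.4" yields 7 main, 1 subwoofer, and 4 top speaker channels, 12 speakers in total.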
Therefore, in practical applications, the number of channels can be designed as required and a third audio signal in the corresponding format can be output. In this way the audio signal is upmixed to more channels, and because the third audio signal has more channels than the first audio signal, it can provide a richer, more realistic, and more immersive audio experience, meeting the requirements of different users and different scenes.
In the embodiment of the application, a first audio signal to be processed that contains at least two channels is acquired; time-domain processing is performed on the first audio signal to obtain a second audio signal; and the second audio signal is mixed according to a preset mixing rule to obtain a third audio signal, thereby realizing audio upmixing. Because the third audio signal contains more channels than the first audio signal, more channels of sound output are provided, the third audio signal can represent the mixing result of audio signals from multiple audio sources, the quality of the audio signal is improved, and a better listening experience can be provided for users.
In some embodiments of the present application, the first audio signal may be preprocessed after the first audio signal is acquired. Subsequently, step S102 and step S103 may be performed on the preprocessed audio signal.
In particular, in some embodiments of the application, the preprocessing may include sample-rate increase processing. Increasing the sampling rate refers to processing the first audio signal with interpolation and filtering techniques so as to raise its sampling rate, thereby increasing the detail and precision of the first audio signal.
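A minimal NumPy sketch of the interpolation stage of sample-rate increase might look as follows. This is an assumption about one possible realization; a production resampler would also apply the anti-imaging low-pass filter that the filtering step implies:

```python
import numpy as np

def upsample_linear(signal: np.ndarray, factor: int) -> np.ndarray:
    """Raise the sampling rate by an integer factor via linear interpolation.

    Shows only the interpolation stage; a complete resampler would follow
    this with a low-pass (anti-imaging) filter.
    """
    n = len(signal)
    new_t = np.linspace(0, n - 1, (n - 1) * factor + 1)
    return np.interp(new_t, np.arange(n), signal)
```

For a 2x factor, each pair of adjacent samples gains one interpolated sample between them.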
In other embodiments of the application, the preprocessing may include frequency-domain processing and audio signal spectrum reconstruction. This means performing frequency-domain processing on the first audio signal, using techniques such as the Fourier transform to enhance its spectral information, thereby improving the resolution and clarity of the first audio signal.
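The Fourier-transform step mentioned above can be sketched with NumPy's real FFT; the helper name is illustrative:

```python
import numpy as np

def magnitude_spectrum(signal: np.ndarray, sample_rate: float):
    """Transform a real time-domain signal to the frequency domain.

    Returns (frequencies_hz, magnitudes) for the positive half of the
    spectrum, which is where spectral enhancement would operate.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, np.abs(spectrum)
```

A 100 Hz sine sampled at 1 kHz, for instance, produces its largest magnitude at the 100 Hz bin.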
In other embodiments of the present application, the preprocessing may further include a dynamic range and volume adjustment process. Specifically, in some embodiments of the present application, the electronic device may detect sound characteristic information and environment characteristic information of the first audio signal before performing time domain processing on the first audio signal, and then adjust the dynamic range and volume of the first audio signal according to the sound characteristic information and the environment characteristic information.
In the embodiment of the application, the acoustic properties of the audio signals and the conditions of the surrounding environment can be known by introducing the elements of the acoustic model and the environment analysis, so that the electronic equipment can more intelligently adjust the first audio signals, the output audio signals are more in line with the perception of human ears, and the adaptability of the audio signals in various environments is improved.
More specifically, one or more of the following sound characteristic information may be obtained by acoustic model analysis: 1. frequency response characteristics: for reflecting the relative intensities and distribution of the different frequency components in the signal. 2. Audio frequency spectrum: including the energy distribution of the spectrum, the number and characteristics of the frequency components, etc. 3. Time domain characteristics: such as amplitude, waveform, duration, etc., for reflecting the time distribution law of the audio signal. 4. Dynamic range: i.e. the range of variation of the signal in amplitude. 5. Distortion characteristics: the type and extent of distortion present in the signal, such as harmonic distortion, intermodulation distortion, etc., is used to reflect the distortion repair process that needs to be performed.
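As one concrete example of the dynamic-range characteristic listed above, a simple estimate in decibels can be computed as the peak-to-RMS ratio. This crest-factor convention is our assumption; the text does not fix a specific formula:

```python
import numpy as np

def dynamic_range_db(signal: np.ndarray, eps: float = 1e-12) -> float:
    """Estimate a signal's dynamic range in dB as peak amplitude over RMS.

    One common convention (the crest factor); `eps` guards against log(0).
    """
    peak = np.max(np.abs(signal))
    rms = np.sqrt(np.mean(signal ** 2))
    return 20.0 * np.log10((peak + eps) / (rms + eps))
```

A pure sine wave has a crest factor of sqrt(2), i.e. about 3.01 dB by this measure.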
One or more of the following environmental characteristic information may be obtained by environmental analysis: 1. noise environment: various types of noise may be present in the environment in which the audio signal is located, such as background noise, electromagnetic interference, and the like. 2. Reverberation characteristics: including reverberation time, reverberation strength, reverberation tail, etc., for reflecting the degree of influence of the reverberation to which the signal is subjected. 3. Room model: such as room size, shape, reflective characteristics, etc. of the room in which the audio is recorded, for reflecting the influence of the audio room on the audio signal. 4. Ambient noise spectrum: for reflecting the noise energy distribution in different frequency ranges. 5. Environmental change: for reflecting the influence of variations in factors such as temperature, humidity, air flow, etc. on sound propagation and signal characteristics.
According to the sound characteristic information and the environment characteristic information, the dynamic range and the volume of the first audio signal can be adjusted.
In some embodiments of the present application, the dynamic range of the audio signal may be adjusted by introducing dynamic compression techniques to ensure that the audio output remains consistent in different scenarios. The dynamic range, i.e., the dynamic intensity range of a signal, refers to the range of amplitude variations in an audio signal, typically expressed in decibels (dB). A larger dynamic range means that the audio signal contains more volume changes and thus has a richer audio rendering.
The dynamic range needs to be adjusted for different environmental characteristics depending on the noise level of the environment, the degree of reverberation and the desired audio effect. For example: for high noise environments, the dynamic range may be extended to ensure that the signal is clearly audible in the noise, i.e., the dynamic range of the audio signal is increased so that the peak portion of the signal is more prominent to overcome the interference of the ambient noise. For low noise environments, the dynamic range may be reduced to avoid excessive signal salience, resulting in audible discomfort in a quiet environment. For high reverberation environments, the dynamic range may be increased to ensure that weaker portions of the signal are clearly discernible in the reverberation, thereby reducing the shadowing effect of the reverberation on the signal. For low reverberation environments, the dynamic range may be reduced so that the signal's intensity changes more gradually and the audio sounds softer to avoid the signal appearing too hard or unnatural in the low reverberation environment.
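The expand/reduce behaviour described above can be sketched as a static gain curve: samples above a threshold are scaled by a ratio, with ratio > 1 reducing the dynamic range (compression) and ratio < 1 extending it (expansion). The parameter names and the instantaneous (no attack/release smoothing) curve are illustrative assumptions:

```python
import numpy as np

def adjust_dynamic_range(signal: np.ndarray,
                         threshold: float = 0.5,
                         ratio: float = 2.0) -> np.ndarray:
    """Apply a static compression/expansion curve above `threshold`.

    ratio > 1 compresses (peaks softened); ratio < 1 expands (peaks
    emphasised, as suggested for high-noise environments). Real
    compressors additionally smooth the gain with attack/release times.
    """
    mag = np.abs(signal)
    over = mag > threshold
    new_mag = np.where(over, threshold + (mag - threshold) / ratio, mag)
    return np.sign(signal) * new_mag
```

With threshold 0.5 and ratio 2, a sample at 0.9 is reduced to 0.7 while samples below the threshold pass unchanged.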
Meanwhile, volume adjustment can be performed so that the audio remains clear and balanced at various volume levels. For example, the volume may be adjusted in one or more of the following ways: 1. Gain adjustment: the volume level is adjusted by increasing or decreasing the gain of the overall audio signal; increasing the gain raises the overall volume, and decreasing the gain lowers it. 2. Compression adjustment: the dynamic range of the audio signal is adjusted using compression techniques so that the stronger-amplitude portions become softer, reducing the difference between audio at different volume levels. 3. Equalization adjustment: the volume of different frequency ranges is adjusted through an equalizer so that sounds of different frequencies are balanced at various volume levels, and bass, midrange, and treble can all be heard clearly in the adjusted audio. 4. Limiting adjustment: a limiter is used to cap the maximum amplitude of the audio signal, preventing distortion or over-compression at excessive volumes and thereby maintaining the clarity and balance of the audio.
In other embodiments of the present application, the preprocessing may further include noise and distortion removal. Specifically, before performing time-domain processing on the first audio signal, the method further includes: detecting noise in the first audio signal and denoising it, which improves the purity of the audio and reduces the influence of distortion on auditory perception.
Next, in step S102, a time domain process may be performed on the preprocessed first audio signal.
Specifically, the time-domain processing may be implemented using a preset time-domain processing algorithm. The preset time-domain processing algorithm may include one or more of a time-domain transformation algorithm, a time-domain filtering algorithm, a time-domain compression algorithm, a time-domain modification effect algorithm, and a time-domain synchronization algorithm.
Here, the time-domain transformation algorithm transforms the audio signal from the time domain to the frequency domain using a transform technique such as the fast Fourier transform (FFT) or the wavelet transform, which facilitates more flexible and accurate processing of the signal in the frequency domain. The time-domain filtering algorithm applies a time-domain filter to adjust the frequency response of the audio signal; it can be used to emphasize or de-emphasize certain frequency components, thereby affecting the audio characteristics after mixing. The time-domain compression algorithm performs compression in the time domain (e.g., using a compressor or limiter) to adjust the dynamic range of the audio signal, thereby balancing the strengths of the different sources and preventing some sources from being too prominent or submerged when mixed. The time-domain modification effect algorithm uses time-domain effects such as chorus and tremolo to enrich the mixing result; these effects can adjust the duration, delay, and amplitude of the audio in the time domain, increasing its expressiveness. The time-domain synchronization algorithm accounts for the time-domain synchronization between different audio sources, ensuring that they remain synchronized when mixed and avoiding a poor mixing result caused by time offsets between sources.
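As one concrete instance of the time-domain filtering algorithm described above, a moving-average filter attenuates high-frequency components directly in the time domain. This particular filter is our illustrative choice, not one named by the patent:

```python
import numpy as np

def moving_average_filter(signal: np.ndarray, width: int = 5) -> np.ndarray:
    """A simple time-domain filter: a moving average.

    Averaging each sample with its neighbours attenuates high-frequency
    components, i.e. it de-emphasises certain frequency content as the
    time-domain filtering step intends.
    """
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")
```

Applied to a rapidly alternating signal, the filter visibly reduces its amplitude while leaving slowly varying content largely intact.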
In some embodiments of the present application, the plurality of preset time domain processing algorithms may include at least a time domain transformation algorithm, a time domain filtering algorithm, a time domain compression algorithm, a time domain modification effect algorithm, and a time domain synchronization algorithm. At this time, the electronic device may acquire the scene information, determine a target time-domain processing algorithm from a plurality of preset time-domain processing algorithms according to the scene information, and then perform time-domain processing on the first audio signal according to the target time-domain processing algorithm to obtain the second audio signal.
For example, if the scene information indicates that a chorus effect needs to be added to the audio mix, the time-domain modification effect algorithm needs to be used; if the scene information indicates that the influence of a specific frequency component needs to be reduced, the time-domain filtering algorithm needs to be used. For another example, if the scene information indicates that computing resources are limited, the first N time-domain processing algorithms may be selected from the multiple time-domain processing algorithms above according to a preset priority order, where N is greater than or equal to 1.
In other words, all or part of the time domain processing techniques may be selected to be performed to achieve the best audio mixing effect, depending on the specific scenario.
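The scene-driven selection above can be sketched as a simple dispatch. The scene keys, algorithm names, and priority order here are all illustrative assumptions; the text only requires that selection be driven by scene information:

```python
def select_time_domain_algorithms(scene: dict) -> list:
    """Select time-domain processing algorithms from scene information.

    Illustrative sketch: keys like "chorus_effect" and the priority order
    are assumptions, not definitions from the patent.
    """
    priority = ["synchronization", "transformation", "filtering",
                "compression", "modification_effect"]
    if scene.get("low_resources"):
        # Constrained resources: keep only the first N algorithms by priority.
        n = max(1, scene.get("algorithm_budget", 2))
        return priority[:n]
    chosen = set()
    if scene.get("chorus_effect"):
        chosen.add("modification_effect")
    if scene.get("suppress_frequency"):
        chosen.add("filtering")
    # Preserve priority order; fall back to the full chain if nothing matched.
    return [a for a in priority if a in chosen] or priority
```

A scene requesting a chorus effect thus selects only the modification effect algorithm, while a low-resource scene truncates the chain to its highest-priority entries.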
After the time-domain processing is completed, in step S103, the second audio signal may be mixed according to a preset mixing rule, so as to obtain a third audio signal.
The preset mixing rule may be weighted mixing, dynamic mixing, additive mixing, multiplicative mixing, etc.
Weighted mixing is a method of achieving mixing by assigning weights to the signals of the different audio sources contained within the second audio signal and adding them. These weights may be determined based on the importance of the signals of the different audio sources or the volume level that needs to be adjusted. By adjusting the weights, the relative contributions of the signals of the different audio sources in the mixing can be controlled.
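The weighted mixing just described reduces to a weighted sum of the source signals. A minimal NumPy sketch, with normalisation of the weights added as our own assumption to keep the mix level controlled:

```python
import numpy as np

def weighted_mix(sources, weights) -> np.ndarray:
    """Mix equal-length source signals by weighted summation.

    Weights would in practice come from source importance or required
    volume levels; they are normalised here so they sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stacked = np.asarray(sources, dtype=float)   # shape (n_sources, n_samples)
    return np.tensordot(w, stacked, axes=1)
```

Raising one source's weight raises its relative contribution in the mix, exactly as the text describes.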
Dynamic mixing involves dynamically adjusting mixing parameters, such as mixing ratios or mixing rules, during the mixing process. This may be done according to the characteristics of the audio signal or the needs of the user to accommodate different scenes or changing environmental conditions. Dynamic mixing can be adjusted based on real-time feedback or an automatic control system to optimize the mixing effect.
Specifically, in some embodiments of the present application, the phase of the second audio signal may be adjusted so that phases of signals of different audio sources included in the second audio signal are consistent, and then the phase-adjusted second audio signal is mixed according to a preset mixing rule to obtain a third audio signal. In this way, the waveform superposition problem caused by the phase difference can be avoided.
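One way the phase alignment described above could be realized, shown here as a sketch, is to estimate an integer-sample lag between two source signals by brute-force cross-correlation and compensate it before mixing; a real system might instead handle sub-sample delays or work in the frequency domain:

```python
def align_phase(reference, signal, max_lag=32):
    """Align `signal` to `reference` by compensating an integer-sample lag.

    The lag is estimated by maximizing cross-correlation within +/-max_lag
    samples; the signal is then shifted (zero-padded) so that summing it
    with the reference no longer suffers destructive interference.
    """
    n = min(len(reference), len(signal))

    def corr(lag):
        # Correlation of reference[i] with signal[i + lag], within bounds.
        return sum(reference[i] * signal[i + lag]
                   for i in range(n)
                   if 0 <= i + lag < n)

    best = max(range(-max_lag, max_lag + 1), key=corr)
    if best >= 0:
        aligned = signal[best:] + [0.0] * best       # advance the signal
    else:
        aligned = [0.0] * (-best) + signal[:best]    # delay the signal
    return aligned, best
```

After alignment, the waveform superposition problem caused by the phase difference is avoided when the signals are summed.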
In other embodiments of the present application, the second audio signal may be mixed on a plurality of channels respectively according to a preset mixing rule to obtain the third audio signal. By comprehensively considering and processing the interrelationships of the multiple channels, a high-quality multi-channel audio mix can be achieved, so that the mixed audio is evenly distributed throughout the sound field and sounds naturally stereophonic. Each processing channel may correspond to one output audio channel.
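As one illustration of producing more output channels than input channels, a classic passive mid/side-style upmix derives a center channel from the in-phase component of a stereo pair and a surround channel from the out-of-phase component. This textbook derivation is only an example, not necessarily the mixing rule used by the method:

```python
def passive_upmix(left, right):
    """Derive additional channels from a stereo pair.

    Returns a dict of channels: the original pair, a center channel from
    the in-phase (mid) component, and a surround channel from the
    out-of-phase (side) component, so the output has more channels than
    the two-channel input.
    """
    center = [(l + r) * 0.5 for l, r in zip(left, right)]
    surround = [(l - r) * 0.5 for l, r in zip(left, right)]
    return {"L": left, "R": right, "C": center, "S": surround}
```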
In particular, mixing over multiple channels may be achieved by one or more of the following:
1. Channel balance: ensure that the audio signal of each channel is balanced in level and spectral characteristics during mixing, i.e. the audio signals of different channels should have similar volume levels and spectral characteristics, so that no channel is overly prominent or submerged.
2. Frequency allocation: reasonably distribute the frequency ranges of the different channels by means of frequency equalization or frequency banding, avoiding a poor mixing effect caused by frequency conflict or frequency overlap.
3. Time domain synchronization: keep the audio signals of the different channels synchronized in the time domain by means such as delay adjustment, avoiding a poor mixing effect caused by time offset.
4. Spatial positioning: for stereo or multi-channel audio mixing, consider the localization and distribution of the audio in space to ensure that the mixed audio is evenly distributed throughout the sound field, making the auditory effect more natural and stereophonic.
5. Dynamic balance: consider the dynamic range and volume variation of the different channels to ensure that the dynamic range of the mixed audio remains consistent and balanced across different audio scenes.
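Item 1 above (channel balance) can be sketched as matching the RMS level of every channel before mixing, so that no channel dominates or is submerged. The target RMS value is an arbitrary illustrative constant:

```python
import math

def balance_channels(channels, target_rms=0.1):
    """Scale each channel so all channels share a similar RMS level.

    A silent channel (RMS of zero) is left at zero rather than divided by
    zero. Spectral balancing, mentioned alongside level balancing above,
    is out of scope for this sketch.
    """
    balanced = []
    for ch in channels:
        rms = math.sqrt(sum(x * x for x in ch) / len(ch))
        gain = target_rms / rms if rms > 0 else 0.0
        balanced.append([x * gain for x in ch])
    return balanced
```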
In addition, for the third audio signal, feedback adjustment may also be performed.
Specifically, in some embodiments of the present application, after the third audio signal is obtained, feedback adjustment data may further be acquired, the preset mixing rule may be adjusted according to the feedback adjustment data, and the second audio signal may be mixed according to the adjusted mixing rule to obtain the third audio signal.
The feedback adjustment data includes user behavior data, environment data, and characteristic data of the third audio signal. Specifically, the electronic device may collect user behavior data, such as the user's play preferences and volume adjustment behavior, and then automatically adjust the mixing parameters according to this data to satisfy the user's preferences and demands. The electronic device may also monitor changes in environmental conditions, such as the noise level and echo conditions, and automatically adjust the mixing parameters based on these changes so as to adapt to different environments and ensure the stability and consistency of the mixing effect. Furthermore, the electronic device may evaluate the mixing result against predefined audio quality indexes, which may include signal-to-noise ratio, distortion, spectral balance, and the like, and automatically adjust the mixing parameters as these indexes change so as to improve the overall quality of the audio. It may also analyze characteristics of the mixed audio, such as its spectral distribution and dynamic range, and automatically adjust the mixing parameters as these characteristics change so as to optimize the listening effect.
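A hypothetical sketch of this feedback loop: the mixing weights are nudged in the direction suggested by feedback data and then re-normalized. The simple +1/-1 encoding below stands in for the richer user-behavior, environment, and quality data described above:

```python
def feedback_adjust(weights, feedback, step=0.1):
    """Nudge mixing weights according to feedback and re-normalize.

    `feedback` maps a source index to a desired change direction
    (+1: turn this source up, -1: turn it down). Weights are clamped at
    zero and re-normalized so they still sum to 1.
    """
    adjusted = [max(0.0, w + step * feedback.get(i, 0))
                for i, w in enumerate(weights)]
    total = sum(adjusted)
    return [w / total for w in adjusted]
```

Repeated application of such a rule lets the mixing parameters track changing user preferences or environmental conditions.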
Meanwhile, when finally generating the third audio signal, the electronic device can also improve the resolution and quality of the third audio signal, ensuring that the user perceives a more realistic, clear, and dynamic audio effect.
According to the embodiments of the present application, preprocessing operations such as raising the sampling rate and frequency domain processing enable high-resolution restoration of the audio signal, presenting a more realistic and detailed audio effect; the user can perceive more audio detail, improving the fidelity of auditory perception. Meanwhile, applying an acoustic model together with dynamic compression and volume adjustment makes the audio processing more personalized and adaptable to different listening scenes, raising the intelligence level of the system, ensuring a consistent audio dynamic range across scenes, and preventing distortion or excessive compression of the audio output at different volume levels. Processing by the mixing algorithm ensures the temporal consistency of the audio signals and fully considers the interaction between audio sources, so that the mixed sound is more natural and smooth. In addition, a feedback mechanism is introduced so that the parameters of the algorithm can be adjusted in real time to adapt to changes in audio content and environment.
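The sampling-rate-raising preprocessing mentioned above can be illustrated with a minimal linear-interpolation upsampler; a production resampler would instead use a band-limited (e.g. polyphase) filter to avoid imaging artifacts:

```python
def upsample_linear(samples, factor):
    """Raise the sample rate by an integer factor via linear interpolation.

    Between each pair of adjacent input samples, `factor - 1` intermediate
    samples are interpolated; the final input sample is appended as-is.
    A sketch only: real high-resolution restoration needs proper filtering.
    """
    out = []
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        for k in range(factor):
            t = k / factor  # fractional position between a and b
            out.append(a + (b - a) * t)
    out.append(samples[-1])
    return out
```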
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of acts; however, those skilled in the art will understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders in accordance with the application.
Fig. 2 is a schematic structural diagram of an audio upmix algorithm processing device 200 according to an embodiment of the present application, where the audio upmix algorithm processing device 200 is configured on an electronic device.
Specifically, the audio upmixing algorithm processing apparatus 200 may include:
an acquisition unit 201, configured to acquire a first audio signal to be processed, where the first audio signal includes at least two channels;
a time domain processing unit 202, configured to perform time domain processing on the first audio signal to obtain a second audio signal;
and a mixing unit 203, configured to mix the second audio signal according to a preset mixing rule to obtain a third audio signal, where the number of channels contained in the third audio signal is greater than the number of channels contained in the first audio signal.
In some embodiments of the present application, the time domain processing unit 202 may be specifically configured to: acquire scene information and determine a target time domain processing algorithm from a plurality of preset time domain processing algorithms according to the scene information, where the plurality of preset time domain processing algorithms include at least a time domain transformation algorithm, a time domain filtering algorithm, a time domain compression algorithm, a time domain modification effect algorithm, and a time domain synchronization algorithm; and perform time domain processing on the first audio signal according to the target time domain processing algorithm to obtain the second audio signal.
In some embodiments of the application, the mixing unit 203 may be specifically configured to: adjust the phase of the second audio signal so that the phases of the signals of different audio sources contained in the second audio signal are consistent; and mix the phase-adjusted second audio signal according to the preset mixing rule to obtain the third audio signal.
In some embodiments of the application, the mixing unit 203 may be specifically configured to: mix the second audio signal on a plurality of channels respectively according to the preset mixing rule to obtain the third audio signal.
In some embodiments of the present application, the audio upmixing algorithm processing apparatus 200 may further include a preprocessing unit, specifically configured to: detect sound characteristic information and environment characteristic information of the first audio signal; and adjust the dynamic range and the volume of the first audio signal according to the sound characteristic information and the environment characteristic information.
In some embodiments of the application, the preprocessing unit may be specifically configured to: detect noise in the first audio signal and denoise the first audio signal.
In some embodiments of the present application, the audio upmixing algorithm processing apparatus 200 may further include a feedback adjustment unit, specifically configured to: acquire feedback adjustment data, where the feedback adjustment data includes user behavior data, environment data, and characteristic data of the third audio signal; and adjust the preset mixing rule according to the feedback adjustment data, and mix the second audio signal according to the adjusted mixing rule to obtain the third audio signal.
It should be noted that, for convenience and brevity of description, the specific working process of the audio upmix algorithm processing apparatus 200 may refer to the corresponding process of the method described in fig. 1, and will not be described herein again.
Fig. 3 is a schematic diagram of an electronic device according to an embodiment of the present application. Specifically, the electronic device 3 may include: a processor 30, a memory 31, and a computer program 32, such as an audio upmix algorithm processing program, stored in the memory 31 and executable on the processor 30. The processor 30, when executing the computer program 32, implements the steps of the above-described embodiments of the audio upmix algorithm processing method, such as steps S101 to S103 shown in fig. 1. Alternatively, the processor 30, when executing the computer program 32, implements the functions of the modules/units in the above-described device embodiments, such as the functions of the acquisition unit 201, the time domain processing unit 202, and the mixing unit 203 shown in fig. 2.
The computer program may be divided into one or more modules/units which are stored in the memory 31 and executed by the processor 30 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments are used for describing the execution of the computer program in the electronic device.
For example, the computer program may be split into: an acquisition unit, a time domain processing unit, and a mixing unit. The specific functions of each unit are as follows: the acquisition unit is configured to acquire a first audio signal to be processed, the first audio signal including at least two channels; the time domain processing unit is configured to perform time domain processing on the first audio signal to obtain a second audio signal; and the mixing unit is configured to mix the second audio signal according to a preset mixing rule to obtain a third audio signal, where the number of channels contained in the third audio signal is greater than the number of channels contained in the first audio signal.
The electronic device may include, but is not limited to, a processor 30, a memory 31. It will be appreciated by those skilled in the art that fig. 3 is merely an example of an electronic device and is not meant to be limiting, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the electronic device may further include an input-output device, a network access device, a bus, etc.
The processor 30 may be a central processing unit (Central Processing Unit, CPU), another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 31 may be an internal storage unit of the electronic device, such as a hard disk or a memory of the electronic device. The memory 31 may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card provided on the electronic device. Further, the memory 31 may include both an internal storage unit and an external storage device of the electronic device. The memory 31 is used for storing the computer program and other programs and data required by the electronic device, and may also be used for temporarily storing data that has been output or is to be output.
It should be noted that, for convenience and brevity of description, the structure of the electronic device may refer to a specific description of the structure in the method embodiment, which is not repeated herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other manners. For example, the apparatus/electronic device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately adjusted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method for processing an audio upmix algorithm, comprising:
Acquiring a first audio signal to be processed, wherein the first audio signal comprises at least two channels;
Performing time domain processing on the first audio signal to obtain a second audio signal;
and mixing the second audio signal according to a preset mixing rule to obtain a third audio signal, wherein the number of channels contained in the third audio signal is greater than the number of channels contained in the first audio signal.
2. The method of processing an audio upmix algorithm of claim 1, wherein the performing time-domain processing on the first audio signal to obtain a second audio signal comprises:
Acquiring scene information, and determining a target time domain processing algorithm from a plurality of preset time domain processing algorithms according to the scene information, wherein the plurality of preset time domain processing algorithms at least comprise a time domain transformation algorithm, a time domain filtering algorithm, a time domain compression algorithm, a time domain modification effect algorithm and a time domain synchronization algorithm;
and performing time domain processing on the first audio signal according to the target time domain processing algorithm to obtain the second audio signal.
3. The method for processing the audio upmixing algorithm as claimed in claim 1, wherein said mixing the second audio signal according to a preset mixing rule to obtain a third audio signal comprises:
adjusting the phase of the second audio signal so that the phases of the second audio signal are consistent;
and mixing the second audio signal after phase adjustment according to the preset mixing rule to obtain the third audio signal.
4. The method for processing the audio upmixing algorithm as claimed in claim 1, wherein said mixing the second audio signal according to a preset mixing rule to obtain a third audio signal comprises:
mixing the second audio signal on a plurality of channels respectively according to the preset mixing rule to obtain the third audio signal.
5. The audio upmixing algorithm processing method of claim 1, further comprising, prior to said time domain processing of said first audio signal:
Detecting sound characteristic information and environment characteristic information of the first audio signal;
And adjusting the dynamic range and the volume of the first audio signal according to the sound characteristic information and the environment characteristic information.
6. The audio upmixing algorithm processing method of claim 1, further comprising, prior to said time domain processing of said first audio signal:
and detecting noise in the first audio signal, and denoising the first audio signal.
7. The audio upmixing algorithm processing method according to any one of claims 1 to 6, further comprising, after the obtaining of the third audio signal:
Acquiring feedback adjustment data, wherein the feedback adjustment data comprises user behavior data, environment data and characteristic data of the third audio signal;
And adjusting the preset mixing rule according to the feedback adjustment data, and mixing the second audio signal according to the adjusted mixing rule to obtain the third audio signal.
8. An audio upmixing algorithm processing apparatus, comprising:
an acquisition unit, configured to acquire a first audio signal to be processed, where the first audio signal includes at least two channels;
a time domain processing unit, configured to perform time domain processing on the first audio signal to obtain a second audio signal;
and a mixing unit, configured to mix the second audio signal according to a preset mixing rule to obtain a third audio signal, wherein the number of channels contained in the third audio signal is greater than the number of channels contained in the first audio signal.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the audio upmixing algorithm processing method according to any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the audio upmixing algorithm processing method according to any of claims 1 to 7.
CN202410246957.8A 2024-03-05 2024-03-05 Audio upmixing algorithm processing method and device, electronic equipment and storage medium Pending CN118301535A (en)

Publications (1)

Publication Number Publication Date
CN118301535A true CN118301535A (en) 2024-07-05

Family

ID=91683085



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination