CN116193350A - Audio signal processing method, device, equipment and storage medium - Google Patents

Audio signal processing method, device, equipment and storage medium

Info

Publication number
CN116193350A
CN116193350A
Authority
CN
China
Prior art keywords
audio
signal
component
output signal
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111432455.7A
Other languages
Chinese (zh)
Inventor
江建亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd, Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd filed Critical Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN202111432455.7A priority Critical patent/CN116193350A/en
Publication of CN116193350A publication Critical patent/CN116193350A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S1/00: Two-channel systems
    • H04S1/002: Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

The audio signal processing method includes the following steps: acquiring an audio signal to be processed, and performing component separation and component identification on the audio signal through a preset separation algorithm to obtain a plurality of component audio components; obtaining, based on a preset target scene, rendering information of the component audio components in the corresponding target scene, and performing binaural rendering on each component audio component through a preset head behavior function to obtain a first left signal and a first right signal corresponding to that component; and superposing the first left signals and the first right signals corresponding to the component audio components to obtain a left output signal and a right output signal, which are sent to a left playing device and a right playing device respectively, thereby improving the three-dimensional spatial effect produced by audio playback.

Description

Audio signal processing method, device, equipment and storage medium
Technical Field
The present invention relates to the technical field of audio devices, and for example, to a method, an apparatus, a device, and a storage medium for processing an audio signal.
Background
A large number of single-channel, two-channel stereo, and multi-channel surround-sound songs and programs exist; when they are played over headphones, the channels are mostly fed directly to the left and right earpieces. To enhance the listener's sense of space during headphone playback, virtual auditory techniques may be used to perform binaural rendering of single-channel, two-channel stereo, or multi-channel surround-sound signals.
In the prior art, virtual auditory techniques virtualize the single, dual, or multiple loudspeaker channels of an actual playback scene to perform virtual playback of the signals, which gives the listener a certain sense of space. However, because music signals contain diverse components such as musical instruments and human voices, such virtual playback methods cannot faithfully reproduce the three-dimensional spatial impression of the individual components, such as the various instruments, in single-channel, two-channel stereo, and multi-channel surround-sound songs and programs.
Disclosure of Invention
The purpose of the present application is: provided are a method, apparatus, device, and storage medium for processing an audio signal, which can improve a three-dimensional spatial effect generated by audio playback.
In order to achieve the above purpose, the present application adopts the following technical scheme:
provided herein is a processing method of an audio signal, including:
acquiring an audio signal to be processed, and carrying out component separation and identification on the audio signal through a preset separation algorithm to obtain a plurality of component audio components;
based on a preset target scene, rendering information of the component audio components in the corresponding target scene is obtained;
respectively carrying out binaural rendering on the component audio components through a preset head behavior function based on the rendering information to obtain a first left signal and a first right signal corresponding to the component audio components;
and respectively superposing the first left signal and the first right signal corresponding to each component audio component to obtain a left output signal and a right output signal, and respectively transmitting the left output signal and the right output signal to a left playing device and a right playing device.
The application also provides a processing device of the audio signal, comprising:
the component separation unit is used for acquiring an audio signal to be processed, and carrying out component separation on the audio signal through a preset separation algorithm to obtain a plurality of component audio components;
the information acquisition unit is used for acquiring rendering information of the component audio components based on a preset target scene;
the binaural rendering unit is used for respectively performing binaural rendering on the component audio components through a preset head behavior function based on the rendering information to obtain a first left signal and a first right signal corresponding to the component audio components;
and the signal superposition unit is used for respectively superposing the first left signal and the first right signal corresponding to each component audio component to obtain a left output signal and a right output signal, and respectively transmitting the left output signal and the right output signal to a left playing device and a right playing device.
The application also provides an audio processing device comprising a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the audio signal processing method.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a method of processing an audio signal as described in any of the above.
According to the audio signal processing method provided herein, the audio signal is divided into a plurality of component audio components by the audio signal component separation method, achieving effective separation of the audio components so that audio signals of different components can be rendered separately; rendering information corresponding to the target scene is generated, so that the separated component audio components are rendered based on the rendering information and the collected head behavior data, improving the signal rendering effect and obtaining a first left signal corresponding to the left ear canal and a first right signal corresponding to the right ear canal; and the first left signal and first right signal corresponding to each independently rendered component audio component are respectively superposed to obtain the corresponding left output signal and right output signal, improving the rendering effect of the overall output signal; after the left and right output signals are sent to the left and right playing devices respectively, a three-dimensional playback effect with a better sense of space is formed.
Drawings
Fig. 1 is a flow chart illustrating a processing method of an audio signal according to an embodiment;
fig. 2 is a schematic structural diagram of a processing device for audio signals according to an embodiment;
fig. 3 is a block diagram schematically illustrating the structure of an audio processing apparatus according to an embodiment.
The realization, functional characteristics and advantages of the present application will be further described with reference to the embodiments, referring to the attached drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, modules, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, modules, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Referring to fig. 1, a flow chart of the audio signal processing method disclosed in this embodiment is shown, where the method includes:
s1: acquiring an audio signal to be processed, and carrying out component separation and identification on the audio signal through a preset separation algorithm to obtain a plurality of component audio components;
s2: based on a preset target scene, rendering information of the component audio components in the corresponding target scene is obtained;
s3: respectively carrying out binaural rendering on the component audio components through a preset head behavior function based on the rendering information to obtain a first left signal and a first right signal corresponding to the component audio components;
s4, respectively superposing the first left signal and the first right signal corresponding to each component audio component to obtain a left output signal and a right output signal, and respectively sending the left output signal and the right output signal to a left playing device and a right playing device.
In one embodiment, the audio signal processing method disclosed herein is applied to an audio device. The audio device generally includes a left ear-canal earphone and a right ear-canal earphone, which may be separate earbuds or a head-mounted unit. In normal use, the portion worn on the user's head is generally provided with a three-degree-of-freedom or six-degree-of-freedom head motion tracker for acquiring the head pose; specifically, this may be a gyroscope, an image detection unit, an electromagnetic tracking unit, or the like, so as to acquire the user's head position information in real time.
As described in step S1, after the audio signal whose sense of space is to be improved is obtained, a machine-learning-based audio component separation algorithm can be obtained through deep learning with a neural network model. Pure voice, pure music, and voice-music aliased signals are separated from the audio signal by this algorithm: characteristic parameters related to pure voice and pure music are obtained through signal analysis and modeling, and component separation is then performed on the audio signal using these characteristic parameters.
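The separator described above is a trained neural model, which cannot be reproduced here. As a loose, self-contained stand-in, the sketch below splits a mixture into two "components" with a hard FFT band mask; the cutoff frequency and the test tones are invented for illustration only.

```python
import numpy as np

def toy_component_split(x, sr, cutoff_hz=1000.0):
    """Crude stand-in for a learned component separator: split a mono
    signal into a low-band and a high-band component with a hard FFT
    mask. A real system would use a trained neural network instead."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    low_mask = freqs < cutoff_hz
    low = np.fft.irfft(spec * low_mask, n=len(x))
    high = np.fft.irfft(spec * ~low_mask, n=len(x))
    return low, high

sr = 8000
t = np.arange(sr) / sr
# Mixture of a 220 Hz "instrument" and a 3000 Hz "voice" (toy signals).
mix = np.sin(2 * np.pi * 220 * t) + np.sin(2 * np.pi * 3000 * t)
low, high = toy_component_split(mix, sr)
```

Because the two masks partition the spectrum, the separated components sum back exactly to the mixture, mirroring the lossless-separation goal of the patent.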
As described in step S2, the target scene corresponding to the audio signal is obtained. The target scene may be generated according to certain rules; for example, a concert hall may be used as the target scene, metadata for the music components of the corresponding instruments are generated according to the spatial arrangement of the various instruments in a stage symphony, and sound field description parameters for rendering each component of the audio signal are generated from the metadata.
As described in step S3, the head behavior function may be a BRIR (Binaural Room Impulse Response). A room impulse response reaching the user's position is determined according to rendering information that includes the physical and geometric characteristics of the target scene; the corresponding BRIR is selected according to the arrival direction of each component audio component relative to the center of the listener's head, and frequency-domain multiplication or time-domain convolution is performed with each component audio component to obtain the rendered binaural signals, i.e., the first left signal and the first right signal.
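A minimal sketch of this per-component rendering step, using `numpy.convolve` for the time-domain convolution; the BRIR taps below are placeholders, not measured responses.

```python
import numpy as np

def render_component(component, brir_left, brir_right):
    """Binaural-render one audio component by convolving it with the
    left- and right-ear impulse responses selected for its spatial
    direction, producing the first left and first right signals."""
    first_left = np.convolve(component, brir_left)
    first_right = np.convolve(component, brir_right)
    return first_left, first_right

component = np.array([1.0, 0.5, 0.25])
brir_l = np.array([1.0, 0.0, 0.3])   # placeholder impulse responses,
brir_r = np.array([0.8, 0.1, 0.0])   # not measured data
left, right = render_component(component, brir_l, brir_r)
```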
As described in step S4, after a first left signal and a first right signal are obtained for each component audio component, they are superposed separately for the left ear and the right ear to obtain the final binaural output signals, which are fed directly to the left and right playing devices for playback; the left and right playing devices may be the left and right ear-canal earphones.
In summary, the audio signal is divided into a plurality of component audio components by the audio signal component separation method, achieving effective separation of the audio components so that audio signals of different components can be rendered separately; rendering information corresponding to the target scene is generated, so that the separated component audio components are rendered based on the rendering information and the collected head behavior data, improving the signal rendering effect and obtaining a first left signal corresponding to the left ear canal and a first right signal corresponding to the right ear canal; and the first left signal and first right signal corresponding to each independently rendered component audio component are respectively superposed to obtain the corresponding left output signal and right output signal, improving the rendering effect of the overall output signal; after the left and right output signals are sent to the left and right playing devices respectively, a three-dimensional playback effect with a better sense of space is formed.
In one embodiment, before the performing component separation and identification on the audio signal by using a preset separation algorithm to obtain a plurality of component audio components, the method further includes:
performing environment separation on the audio signal to obtain a direct sound signal and an environment sound signal;
as described above, the direct sound signal and the ambient sound signal are generally separated from each other by analyzing the time-frequency domain characteristics of the signals and the relationships between the path signals, and thus, the components of the different sound source signals, directions, and the ambient sound signals, and specifically, the covariance matrix between the multi-path signals may be analyzed to obtain the direction signals and the ambient sound signals.
In one embodiment, the performing component separation and identification on the audio signal by using a preset separation algorithm to obtain a plurality of audio components includes:
performing component separation on the direct sound signal to obtain the component audio component;
and performing decorrelation processing on the ambient sound signals to obtain a left ambient audio component and a right ambient audio component.
As described above, the ambient sound signal may be used as one path, and the sign of the ambient sound signal may be inverted to be used as another path, so as to obtain decorrelated ambient signals of surround sound, that is, the left ambient audio component and the right ambient audio component, thereby improving the editability of the audio signal.
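The sign-inversion scheme in this embodiment reduces to two lines of code; the sample values are arbitrary.

```python
import numpy as np

def split_ambient(ambient):
    """Per the embodiment: use the ambient signal as one path and its
    sign-inverted copy as the other path, yielding the left and right
    ambient audio components."""
    left_amb = ambient
    right_amb = -ambient
    return left_amb, right_amb

amb = np.array([0.2, -0.1, 0.4])
l_amb, r_amb = split_ambient(amb)
```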
In one embodiment, the overlapping the first left signal and the first right signal corresponding to each of the constituent audio components to obtain a left output signal and a right output signal includes:
superposing the first left signals corresponding to each component audio component to obtain a second left signal, and superposing the first right signals corresponding to each component audio component to obtain a second right signal;
adding the left environmental audio component to the second left signal to obtain the left output signal;
and adding the right environmental audio component to the second right signal to obtain the right output signal.
As described above, in order to improve the surround effect and the stereoscopic impression of the binaural audio, after the separate left and right environmental audio components and the component audio components are obtained, the first left signals and the first right signals of the component audio components are superposed separately, yielding at this point a stereo signal of the direct components only (e.g., a pure-voice version). When the user needs the complete audio output, the left environmental audio component is superimposed onto the second left signal and the right environmental audio component onto the second right signal, thereby improving the environmental spaciousness of the final output audio.
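The superposition just described can be sketched as follows; all signals are toy two-sample arrays.

```python
import numpy as np

def mix_outputs(first_lefts, first_rights, left_amb, right_amb):
    """Sum the per-component binaural signals into the second left and
    second right signals, then add the ambient components to obtain
    the left and right output signals."""
    second_left = np.sum(first_lefts, axis=0)
    second_right = np.sum(first_rights, axis=0)
    return second_left + left_amb, second_right + right_amb

lefts = np.array([[1.0, 0.0], [0.5, 0.5]])    # two rendered components
rights = np.array([[0.2, 0.2], [0.1, 0.3]])
out_l, out_r = mix_outputs(lefts, rights,
                           np.array([0.1, 0.1]),     # left ambient
                           np.array([-0.1, -0.1]))   # right ambient
```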
In one embodiment, before the binaural rendering of the constituent audio components by the preset head behavior functions, respectively, the method further includes:
acquiring head motion information collected by a head motion tracking device at a preset position-refresh frequency of the tracking device;
based on spatial interpolation, the head behavior function is updated according to the head motion information.
As described above, while actually wearing an audio playing device such as earphones, the user may walk, turn, or shake the head, and the audio that would be heard in three-dimensional space changes as the head pose changes. This embodiment therefore acquires the head motion information collected by the head motion tracking device at a preset frequency, where the head motion tracking device may be a gyroscope, an image detection unit, an electromagnetic tracking unit, or the like. When the preset position-refresh frequency of the tracking device is high, the head motion information can be acquired in near real time, improving the responsiveness of the spatial impression to head movement.
As described above, when the head behavior function is not available at full spatial resolution, the head behavior function for the target direction (the position of the component audio component after updating with the head motion information) can be obtained from a limited set of discrete head behavior functions by spatial interpolation. One possible implementation: after the head motion information is obtained, the head behavior functions at known discrete points are expanded in spherical harmonics, and the head behavior function for the target direction is then derived, improving the naturalness of the spatial change. Specifically, the spatial interpolation may be a spherical-harmonic-based method, in which a matrix of spherical harmonic expansion coefficients is obtained from the head behavior functions in discrete directions, and the head behavior function of the target position is obtained by combining it with the spherical harmonic values of the target position.
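The spherical-harmonic interpolation above needs measured directional data to demonstrate. As a simplified one-dimensional stand-in, the sketch below linearly blends the two nearest measured responses along azimuth; the measured set is hypothetical and the target is assumed to lie within its range.

```python
import numpy as np

def interpolate_hbf(az_deg, measured):
    """Simplified 1-D stand-in for spherical-harmonic interpolation:
    linearly blend the two measured head behavior functions that
    bracket the target azimuth. `measured` maps azimuth in degrees to
    an impulse response; all values here are hypothetical."""
    azs = sorted(measured)
    lo = max(a for a in azs if a <= az_deg)   # nearest measured below
    hi = min(a for a in azs if a >= az_deg)   # nearest measured above
    if lo == hi:
        return measured[lo]
    w = (az_deg - lo) / (hi - lo)
    return (1 - w) * measured[lo] + w * measured[hi]

measured = {0: np.array([1.0, 0.0]), 30: np.array([0.0, 1.0])}
h15 = interpolate_hbf(15, measured)   # target midway between 0 and 30 degrees
```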
In one embodiment, the binaural rendering of the constituent audio components by a preset head behavior function respectively includes:
when the head behavior function is an HRTF, performing time-domain convolution or frequency-domain multiplication on the component audio components through the HRTF to complete binaural rendering;
and when the head behavior function is a BRIR, performing time-domain convolution or frequency-domain multiplication on the component audio components through the BRIR to complete binaural rendering.
As described above, when the head behavior function is an HRTF (Head-Related Transfer Function), the arrival directions of the direct sound and of each reflected sound relative to the center of the listener's head are determined from rendering information that includes the physical and geometric characteristics of the target scene; the HRTFs for the direct-sound and reflected-sound directions of each component audio component are selected or synthesized and multiplied in the frequency domain or convolved in the time domain with that component, obtaining the rendered binaural signal and producing the corresponding three-dimensional spatial sound effect.
When the head behavior function is a BRIR, the room impulse response reaching the user's position is determined from the rendering information containing the physical and geometric characteristics of the target scene; the corresponding BRIR is selected or synthesized according to the arrival direction of each component audio component relative to the center of the listener's head, and frequency-domain multiplication or time-domain convolution is performed to obtain the rendered binaural signal and produce the corresponding three-dimensional spatial sound effect.
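The two rendering paths named in both paragraphs, time-domain convolution and frequency-domain multiplication, give the same result when the FFT is zero-padded to the full convolution length, as this sketch with placeholder signals shows.

```python
import numpy as np

x = np.array([1.0, -0.5, 0.25, 0.1])   # component audio (toy values)
h = np.array([0.9, 0.3, -0.2])         # head behavior function (placeholder)

# Path 1: time-domain convolution.
time_domain = np.convolve(x, h)

# Path 2: frequency-domain multiplication, zero-padded to the full
# linear-convolution length so no circular wrap-around occurs.
n = len(x) + len(h) - 1
freq_domain = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)
```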
In one embodiment, after the performing the environmental separation on the audio signal to obtain the direct sound signal and the environmental sound signal, the method further includes:
acquiring a first maximum length sequence and a second maximum length sequence, wherein the first maximum length sequence and the second maximum length sequence are uncorrelated;
and respectively carrying out convolution calculation on the environmental sound signal and the first maximum length sequence and the second maximum length sequence to obtain the decorrelated environmental sound signal.
As described above, the correlation control processing is performed on the separated ambient sound signals by using a pair of uncorrelated maximum length sequences (Maximum Length Sequence, MLS) to obtain decorrelated ambient sound signals; specifically, an MLS sequence may be generated, and an inverse sequence of the sequence may be generated, such that the MLS sequence and the inverse sequence are decorrelated, and the ambient sound signal and the MLS sequence and the inverse sequence are respectively convolved, so as to perform a decorrelation calculation on the ambient sound signal.
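A maximum length sequence can be generated with a Fibonacci linear feedback shift register. The sketch below uses the primitive polynomial x^4 + x^3 + 1 (order and taps chosen for brevity, not taken from the patent) and convolves a toy ambient signal with the sequence as one decorrelated path.

```python
import numpy as np

def mls(order=4, taps=(4, 3)):
    """Generate a maximum length sequence with a Fibonacci LFSR.
    x^4 + x^3 + 1 is primitive over GF(2), so the period is
    2**order - 1 = 15. Output is mapped from {0,1} to {-1,+1}."""
    reg = [1] * order           # all-ones seed (any nonzero state works)
    seq = []
    for _ in range(2 ** order - 1):
        seq.append(reg[-1])
        fb = reg[taps[0] - 1] ^ reg[taps[1] - 1]
        reg = [fb] + reg[:-1]
    return np.array([2 * b - 1 for b in seq])

s = mls()
ambient = np.array([0.5, -0.2, 0.1])      # toy ambient signal
decorrelated = np.convolve(ambient, s)    # one decorrelated ambient path
```

An m-sequence of period 15 contains eight +1s and seven -1s, the balance property that makes MLS useful for decorrelation.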
In one embodiment, the step of obtaining the rendering information of the component audio component in the corresponding target scene based on the preset target scene includes:
acquiring target scenes corresponding to the component audio components, and acquiring a room acoustic model corresponding to the target scenes;
and identifying acoustic parameters of the room acoustic model, and obtaining the rendering information according to the acoustic parameters.
As described above, the room acoustic model is derived from acoustic parameters such as the shape, sound absorption coefficients, and reverberation time of the room, and a sound field description for at least one reference position in the room acoustic model can be generated by a sound field generator. The acoustic parameters include geometric parameters such as the distance and angle to the sound field, so that each signal component of the music signal is given a virtual source in a virtual three-dimensional space with the preset target scene as reference, improving the user's spatial perception of the audio signal.
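One concrete acoustic parameter of such a room model is the reverberation time. Sabine's formula, RT60 = 0.161 V / A, is a standard approximation (not taken from the patent) relating room volume V and total absorption A to the reverberation time; the hall dimensions and absorption coefficients below are hypothetical.

```python
def sabine_rt60(volume_m3, surface_absorption):
    """Estimate RT60 via Sabine's formula.
    surface_absorption: list of (area_m2, absorption_coefficient)."""
    total_absorption = sum(area * alpha for area, alpha in surface_absorption)
    return 0.161 * volume_m3 / total_absorption

# Hypothetical shoebox concert hall: 30 m x 20 m x 12 m.
volume = 30 * 20 * 12
surfaces = [(30 * 20 * 2, 0.3),   # floor + ceiling
            (30 * 12 * 2, 0.2),   # long walls
            (20 * 12 * 2, 0.2)]   # short walls
rt60 = sabine_rt60(volume, surfaces)
```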
In summary, in the audio signal processing method provided in the embodiments of the present application, an audio signal is divided into a plurality of component audio components by the method of audio signal component separation, achieving effective separation of the audio components so that audio signals of different components can be rendered separately; rendering information corresponding to the target scene is generated, so that the separated component audio components are rendered based on the rendering information and the collected head behavior data, improving the signal rendering effect and obtaining a first left signal corresponding to the left ear canal and a first right signal corresponding to the right ear canal; and the first left signal and first right signal corresponding to each independently rendered component audio component are respectively superposed to obtain the corresponding left output signal and right output signal, improving the rendering effect of the overall output signal; after the left and right output signals are sent to the left and right playing devices respectively, a three-dimensional playback effect with a better sense of space is formed.
Referring to fig. 2, a block diagram of an audio signal processing apparatus according to the present disclosure includes:
the component separation unit 100 is configured to obtain an audio signal to be processed, and perform component separation and identification on the audio signal through a preset separation algorithm to obtain a plurality of component audio components;
an information obtaining unit 200, configured to obtain rendering information of the component audio component in a corresponding target scene based on a preset target scene;
a binaural rendering unit 300, configured to perform binaural rendering on the constituent audio components through a preset head behavior function based on the rendering information, to obtain a first left signal and a first right signal corresponding to the constituent audio components;
and the signal superimposing unit 400 is configured to superimpose the first left signal and the first right signal corresponding to each of the constituent audio components, obtain a left output signal and a right output signal, and send the left output signal and the right output signal to a left playing device and a right playing device, respectively.
In one embodiment, the component separation unit 100 is further configured to:
and carrying out environment separation on the audio signals to obtain direct sound signals and environment sound signals.
In one embodiment, the component separation unit 100 is further configured to:
performing component separation on the direct sound signal to obtain the component audio component;
and performing decorrelation processing on the ambient sound signals to obtain a left ambient audio component and a right ambient audio component.
In an embodiment, the binaural rendering unit 300 is further adapted to:
superposing the first left signals corresponding to each component audio component to obtain a second left signal, and superposing the first right signals corresponding to each component audio component to obtain a second right signal;
adding the left environmental audio component to the second left signal to obtain the left output signal;
and adding the right environmental audio component to the second right signal to obtain the right output signal.
In an embodiment, the binaural rendering unit 300 is further adapted to:
acquiring head motion information collected by the head motion tracking device at a preset position-refresh frequency of the tracking device;
based on spatial interpolation, the head behavior function is updated according to the head motion information.
In an embodiment, the binaural rendering unit 300 is further adapted to:
when the head behavior function is an HRTF, performing time-domain convolution or frequency-domain multiplication on the component audio components through the HRTF to complete binaural rendering;
and when the head behavior function is a BRIR, performing time-domain convolution or frequency-domain multiplication on the component audio components through the BRIR to complete binaural rendering.
In one embodiment, the component separation unit 100 is further configured to:
acquiring a first maximum length sequence and a second maximum length sequence, wherein the first maximum length sequence and the second maximum length sequence are uncorrelated;
and respectively carrying out convolution calculation on the environmental sound signal and the first maximum length sequence and the second maximum length sequence to obtain the decorrelated environmental sound signal.
In an embodiment, the binaural rendering unit 300 is further adapted to:
acquiring target scenes corresponding to the component audio components, and acquiring a room acoustic model corresponding to the target scenes;
and identifying acoustic parameters of the room acoustic model, and obtaining the rendering information according to the acoustic parameters.
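As an illustration of deriving rendering information from acoustic parameters: the patent does not specify the room acoustic model, so Sabine's reverberation formula is used here as a stand-in, and the helper name is hypothetical:

```python
def rendering_info_from_room(volume_m3, surface_area_m2, mean_absorption):
    """Map illustrative room parameters to rendering information using
    Sabine's formula RT60 = 0.161 * V / A, where A is the total
    absorption area in sabins."""
    absorption_area = surface_area_m2 * mean_absorption
    rt60 = 0.161 * volume_m3 / absorption_area
    return {"rt60_s": rt60, "absorption_area_m2": absorption_area}
```

The resulting reverberation time (and any other identified parameters, such as early-reflection pattern or direct-to-reverberant ratio) would then parameterize the binaural rendering of each component for the target scene.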
Referring to fig. 3, an audio processing device is further provided in an embodiment of the present application; the audio processing device may be a server, and its internal structure may be as shown in fig. 3. The audio processing device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor is configured to provide computing and control capabilities. The memory of the audio processing device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the audio processing device is used for storing data involved in the audio signal processing method. The network interface of the audio processing device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the method of processing an audio signal.
The processing method of the audio signal comprises the following steps: acquiring an audio signal to be processed, and carrying out component separation and identification on the audio signal through a preset separation algorithm to obtain a plurality of component audio components; based on a preset target scene, rendering information of the component audio components in the corresponding target scene is obtained; respectively carrying out binaural rendering on the component audio components through a preset head behavior function based on the rendering information to obtain a first left signal and a first right signal corresponding to the component audio components; and respectively superposing the first left signal and the first right signal corresponding to each component audio component to obtain a left output signal and a right output signal, and respectively transmitting the left output signal and the right output signal to a left playing device and a right playing device.
An embodiment of the present application further provides a computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements a method for processing an audio signal, including the steps of: acquiring an audio signal to be processed, and carrying out component separation and identification on the audio signal through a preset separation algorithm to obtain a plurality of component audio components; based on a preset target scene, rendering information of the component audio components in the corresponding target scene is obtained; respectively carrying out binaural rendering on the component audio components through a preset head behavior function based on the rendering information to obtain a first left signal and a first right signal corresponding to the component audio components; and respectively superposing the first left signal and the first right signal corresponding to each component audio component to obtain a left output signal and a right output signal, and respectively transmitting the left output signal and the right output signal to a left playing device and a right playing device.
In summary, in the audio signal processing method, system, device, and storage medium provided in the embodiments of the present application, the audio signal is divided into a plurality of component audio components by an audio signal component separation method, achieving effective separation of the audio components so that audio signals of different components can be rendered separately. Rendering information corresponding to the target scene is generated, and the separated component audio components are rendered based on the rendering information and the collected head behavior data, improving the signal rendering effect and yielding a first left signal corresponding to the left ear canal and a first right signal corresponding to the right ear canal. The first left signals and first right signals corresponding to the independently rendered component audio components are then superimposed to obtain the corresponding left output signal and right output signal, improving the rendering effect of the overall output signal; after the left output signal and the right output signal are sent to the left playing device and the right playing device respectively, a three-dimensional spatial playback effect with a better sense of space is formed.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-volatile computer readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in embodiments may include non-volatile and/or volatile memory. The non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, apparatus, article, or method that comprises the element.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the claims, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application, or direct or indirect application in other related technical fields are included in the scope of the claims of the present application.

Claims (11)

1. A method of processing an audio signal, wherein the method comprises:
acquiring an audio signal to be processed, and carrying out component separation and identification on the audio signal through a preset separation algorithm to obtain a plurality of component audio components;
based on a preset target scene, rendering information of the component audio components in the corresponding target scene is obtained;
respectively carrying out binaural rendering on the component audio components through a preset head behavior function based on the rendering information to obtain a first left signal and a first right signal corresponding to the component audio components;
and respectively superposing the first left signal and the first right signal corresponding to each component audio component to obtain a left output signal and a right output signal, and respectively transmitting the left output signal and the right output signal to a left playing device and a right playing device.
2. The method for processing an audio signal according to claim 1, wherein before the audio signal is subjected to component separation and identification by a preset separation algorithm to obtain a plurality of component audio components, the method further comprises:
and carrying out environment separation on the audio signals to obtain direct sound signals and environment sound signals.
3. The method for processing an audio signal according to claim 2, wherein the performing component separation and identification on the audio signal by a preset separation algorithm to obtain a plurality of component audio components includes:
performing component separation on the direct sound signal to obtain the component audio components;
and performing decorrelation processing on the ambient sound signals to obtain a left ambient audio component and a right ambient audio component.
4. A method of processing an audio signal according to claim 3, wherein the superimposing the first left signal and the first right signal corresponding to each of the constituent audio components to obtain a left output signal and a right output signal includes:
superimposing the first left signals corresponding to the component audio components to obtain a second left signal, and superimposing the first right signals corresponding to the component audio components to obtain a second right signal;
adding the left ambient audio component to the second left signal to obtain the left output signal;
and adding the right ambient audio component to the second right signal to obtain the right output signal.
5. The method for processing an audio signal according to claim 1, wherein before the binaural rendering of the component audio components by the preset head behavior function, the method further comprises:
acquiring head motion information acquired by a head motion tracking device according to a preset frequency;
based on spatial interpolation, the head behavior function is updated according to the head motion information.
6. The method for processing an audio signal according to claim 5, wherein the binaural rendering of the component audio components by the preset head behavior function comprises:
when the head behavior function is an HRTF function, performing time domain convolution or frequency domain product calculation on the component audio components through the HRTF function to complete binaural rendering;
and when the head behavior function is a BRIR function, performing time domain convolution or frequency domain product calculation on the component audio components through the BRIR function to complete binaural rendering.
7. The method for processing an audio signal according to claim 2, wherein after performing the environmental separation on the audio signal to obtain the direct sound signal and the environmental sound signal, the method further comprises:
acquiring a first maximum length sequence and a second maximum length sequence, wherein the first maximum length sequence and the second maximum length sequence are uncorrelated;
and convolving the ambient sound signal with the first maximum length sequence and with the second maximum length sequence, respectively, to obtain the decorrelated ambient sound signals.
8. The method for processing an audio signal according to claim 1, wherein the step of acquiring rendering information in the corresponding target scene of the component audio component based on the preset target scene comprises:
acquiring target scenes corresponding to the component audio components, and acquiring a room acoustic model corresponding to the target scenes;
and identifying acoustic parameters of the room acoustic model, and obtaining the rendering information according to the acoustic parameters.
9. An apparatus for processing an audio signal, wherein the apparatus comprises:
the component separation unit is used for acquiring an audio signal to be processed, and carrying out component separation and identification on the audio signal through a preset separation algorithm to obtain a plurality of component audio components;
the information acquisition unit is used for acquiring rendering information of the component audio components based on a preset target scene;
the binaural rendering unit is used for respectively performing binaural rendering on the component audio components through a preset head behavior function based on the rendering information to obtain a first left signal and a first right signal corresponding to the component audio components;
and the signal superposition unit is used for respectively superposing the first left signal and the first right signal corresponding to each component audio component to obtain a left output signal and a right output signal, and respectively transmitting the left output signal and the right output signal to a left playing device and a right playing device.
10. An audio processing device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, carries out the steps of the method of processing an audio signal according to any one of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of processing an audio signal according to any one of claims 1 to 8.
CN202111432455.7A 2021-11-29 2021-11-29 Audio signal processing method, device, equipment and storage medium Pending CN116193350A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111432455.7A CN116193350A (en) 2021-11-29 2021-11-29 Audio signal processing method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116193350A true CN116193350A (en) 2023-05-30

Family

ID=86437016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111432455.7A Pending CN116193350A (en) 2021-11-29 2021-11-29 Audio signal processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116193350A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118061906A (en) * 2024-04-17 2024-05-24 深圳唯创知音电子有限公司 Audio playing method, device, equipment and medium for automobile
CN118061906B (en) * 2024-04-17 2024-07-16 深圳唯创知音电子有限公司 Audio playing method, device, equipment and medium for automobile

Similar Documents

Publication Publication Date Title
US11770671B2 (en) Spatial audio for interactive audio environments
AU2018298874C1 (en) Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description
CN108370487B (en) Sound processing apparatus, method, and program
US10779103B2 (en) Methods and systems for audio signal filtering
CN113889125B (en) Audio generation method and device, computer equipment and storage medium
WO2007031906A2 (en) A method of and a device for generating 3d sound
WO2022108494A1 (en) Improved modeling and/or determination of binaural room impulse responses for audio applications
CN116193350A (en) Audio signal processing method, device, equipment and storage medium
CN108038291B (en) Personalized head-related transfer function generation system and method based on human body parameter adaptation algorithm
El-Mohandes et al. DeepBSL: 3-D Personalized Deep Binaural Sound Localization on Earable Devices
Pelzer et al. Continuous and exchangeable directivity patterns in room acoustic simulation
JP2023122230A (en) Acoustic signal processing device and program
CN118301536A (en) Audio virtual surrounding processing method and device, electronic equipment and storage medium
JP2022131067A (en) Audio signal processing device, stereophonic sound system and audio signal processing method
CN117793609A (en) Sound field rendering method and device
Salvador et al. Enhancing the binaural synthesis from spherical microphone array recordings by using virtual microphones
Zotkin et al. Signal processing for Audio HCI

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination