CN116055983B - Audio signal processing method and electronic equipment - Google Patents

Audio signal processing method and electronic equipment

Info

Publication number
CN116055983B
Authority
CN
China
Prior art keywords
virtual source
sound
virtual
rendering result
electronic device
Prior art date
Legal status
Active
Application number
CN202211048513.0A
Other languages
Chinese (zh)
Other versions
CN116055983A (en)
Inventor
魏彤
曾青林
张海宏
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202211048513.0A
Publication of CN116055983A
Application granted
Publication of CN116055983B
Legal status: Active


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30 Control circuits for electronic adaptation of the sound field
    • H04S 7/305 Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

The application discloses an audio signal processing method and an electronic device, relates to the field of audio, and can realize rapid and accurate reflected sound rendering while effectively reducing computational overhead. The specific scheme is as follows: at least two secondary virtual sources are provided in the virtual space, the at least two secondary virtual sources being used to identify at least one virtual source included in the virtual space. A first early reflected sound rendering result of a first virtual source and a first late reverberant sound rendering result of the first virtual source are determined through the at least two secondary virtual sources. The first virtual source is included in the at least one virtual source. A first reflected sound rendering result corresponding to the first virtual source is acquired according to the first early reflected sound rendering result and the first late reverberant sound rendering result.

Description

Audio signal processing method and electronic equipment
Technical Field
The present application relates to the field of audio, and in particular, to an audio signal processing method and an electronic device.
Background
When an electronic device provides a spatial audio playback function for a user, it needs to render direct sound and reflected sound separately. Rendering reflected sound requires collecting a large amount of data and places high demands on computing power. As a result, the spatial audio playback function is limited on electronic devices with constrained computing power.
Disclosure of Invention
The application provides an audio signal processing method and an electronic device, which can realize rapid and accurate reflected sound rendering while effectively reducing computational overhead.
In order to achieve the above purpose, the application adopts the following technical scheme:
In a first aspect, an audio signal processing method is provided. The method is applied to an electronic device having a function of simulating spatial audio in a virtual space, and the virtual space includes at least one virtual source. The method includes: setting at least two secondary virtual sources in the virtual space, the at least two secondary virtual sources being used to identify the at least one virtual source included in the virtual space; determining, through the at least two secondary virtual sources, a first early reflected sound rendering result of a first virtual source and a first late reverberant sound rendering result of the first virtual source, the first virtual source being included in the at least one virtual source; and acquiring a first reflected sound rendering result corresponding to the first virtual source according to the first early reflected sound rendering result and the first late reverberant sound rendering result.
In this way, the electronic device can describe the virtual source by setting secondary virtual sources in the virtual space. Rendering of the reflected sound of the virtual source can therefore be realized without determining the BRIR of the virtual source, which improves rendering efficiency and reduces computational overhead.
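The patent leaves the number and layout of the secondary virtual sources open (Fig. 6 only illustrates one possible arrangement). A minimal Python sketch, assuming eight secondary virtual sources fixed at the corners of a cube around the listener; the count, positions, and variable names below are illustrative only:

```python
import numpy as np

# Hypothetical layout: 8 secondary virtual sources at the vertices of a cube
# centred on the listener. Any fixed, known set of directions would do.
SECONDARY_SOURCE_POSITIONS = np.array([
    [ 1,  1,  1], [ 1,  1, -1], [ 1, -1,  1], [ 1, -1, -1],
    [-1,  1,  1], [-1,  1, -1], [-1, -1,  1], [-1, -1, -1],
], dtype=float)

# Normalise to unit vectors so that only the direction of each secondary
# virtual source matters for the later spherical-harmonic processing.
SECONDARY_SOURCE_DIRS = SECONDARY_SOURCE_POSITIONS / np.linalg.norm(
    SECONDARY_SOURCE_POSITIONS, axis=1, keepdims=True)
```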
Optionally, the method further comprises: a first head related transfer function HRTF of the first virtual source is measured. And acquiring a first direct sound rendering result corresponding to the first virtual source according to the first HRTF. In this way, the electronic device can also realize the rendering of the direct sound of the virtual source through the scheme.
Optionally, the method further comprises: and determining a spatial audio rendering result of the first virtual source according to the first reflected sound rendering result and the first direct sound rendering result. Therefore, the rendering result of the direct sound and the reflected sound is combined, and the complete spatial audio rendering result corresponding to one virtual source can be obtained.
Optionally, before the determining, through the at least two secondary virtual sources, of the first early reflected sound rendering result of the first virtual source and the first late reverberant sound rendering result of the first virtual source, the method further comprises: performing a first spherical harmonic transformation on the first virtual source to obtain a first spherical harmonic coordinate corresponding to the first virtual source; and performing a second spherical harmonic transformation on the at least two secondary virtual sources to obtain a secondary virtual source spherical harmonic coordinate matrix corresponding to the at least two secondary virtual sources. In this way, the coordinates of the virtual source and the secondary virtual sources are linked in the spherical harmonic space through the spherical harmonic transformations, which achieves the effect of describing the virtual source by the secondary virtual sources. It will be appreciated that when more secondary virtual sources and virtual sources are provided in the space, the matrices corresponding to the first and second spherical harmonic transformations contain more elements, but an increase in the number of secondary virtual sources or virtual sources does not increase the number of spherical harmonic transformation calculations. Efficient data processing is thus possible even in complex environments. Furthermore, based on the spherical harmonic transformation, the coordinates of both the virtual source and the secondary virtual sources are normalized to the four dimensions of the spherical harmonic space, which makes the subsequent rendering operations easier to perform.
Optionally, the first spherical harmonic coordinate includes coordinate data of the four dimensions x, y, z and w corresponding to the first virtual source. The secondary virtual source spherical harmonic coordinate matrix includes the x, y, z and w four-dimensional coordinate data corresponding to the at least two secondary virtual sources. Here, x, y and z each correspond to one direction in the spherical harmonic space, and w identifies the overall, direction-independent component of the coordinate in the space.
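A minimal sketch of such a spherical harmonic transformation, assuming a first-order, ambisonics-style encoding in which x, y, z carry the directional terms and w the direction-independent term; the exact normalization is not specified by the patent, so the constants here are assumptions:

```python
import numpy as np

def sh_encode(direction):
    """First-order spherical-harmonic coordinates (x, y, z, w) of a direction.

    `direction` points from the listener to a (secondary) virtual source.
    x, y, z carry the directional information; w is direction-independent.
    The normalisation (w = 1) is an assumption, not taken from the patent.
    """
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    x, y, z = d
    w = 1.0
    return np.array([x, y, z, w])

def sh_matrix(directions):
    """Secondary virtual source spherical-harmonic coordinate matrix, (K, 4)."""
    return np.stack([sh_encode(d) for d in directions])
```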
Optionally, before determining the first early reflected sound rendering result of the first virtual source and the first late reverberant sound rendering result of the first virtual source through the at least two secondary virtual sources, the method further includes: determining an early reflected sound matrix and a late reverberant sound matrix of the at least two secondary virtual sources; determining the first early reflected sound rendering result according to the first virtual source and the early reflected sound matrix; and determining the first late reverberant sound rendering result according to the first virtual source and the late reverberant sound matrix. By distinguishing the early reflected sound from the late reverberant sound, the more directional reflections and the more diffuse reverberation can each be processed in a more targeted way, so that a better rendering effect can be obtained.
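Continuing the sketch above, one plausible way to obtain the first early reflected sound rendering result from the first virtual source and the early reflected sound matrix is to derive per-secondary-source gains from the spherical harmonic coordinates (here by least squares, which the patent does not prescribe) and convolve the dry signal with the gain-weighted early impulse responses; it reuses the sh_encode and sh_matrix helpers defined earlier:

```python
import numpy as np

def early_reflection_render(dry, source_dir, secondary_dirs, early_irs):
    """Hypothetical early-reflection rendering via the secondary virtual sources.

    dry:            mono dry signal of the first virtual source, shape (N,)
    source_dir:     direction of the first virtual source (listener-relative)
    secondary_dirs: directions of the K secondary virtual sources
    early_irs:      early reflected sound matrix, i.e. the pre-reverberation-time
                    binaural impulse responses of the secondary virtual sources,
                    shape (K, 2, L)
    """
    b = sh_encode(source_dir)       # (4,) spherical-harmonic coords of the source
    A = sh_matrix(secondary_dirs)   # (K, 4) secondary-source coordinate matrix
    # One way to "describe" the virtual source by the secondary sources:
    # least-squares gains such that gains @ A ~= b. The patent does not spell
    # out this mapping, so it is an illustrative choice.
    gains, *_ = np.linalg.lstsq(A.T, b, rcond=None)          # (K,)
    out = np.zeros((2, len(dry) + early_irs.shape[-1] - 1))
    for gain, ir in zip(gains, early_irs):
        for ear in range(2):
            out[ear] += gain * np.convolve(dry, ir[ear])
    return out
```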
Optionally, before the determining of the early reflected sound matrix and the late reverberant sound matrix of the at least two secondary virtual sources, the method includes: determining a reverberation time of the at least two secondary virtual sources; determining the early reflected sound matrix from the sound signals of the secondary virtual sources before the reverberation time; and determining the late reverberant sound matrix from the sound signals of the secondary virtual sources after the reverberation time. Thus, the early reflected sound and the late reverberant sound can be divided by the reverberation time. The reverberation time may be determined by signal processing means, for example from the sound pressure and time of the sound signal, and the processing method may be preset in the electronic device.
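As a hedged illustration of this split, the sketch below estimates a reverberation time from the energy decay of the secondary virtual sources' impulse responses (Schroeder backward integration with an assumed -20 dB threshold) and cuts each response into an early and a late part at that point:

```python
import numpy as np

def split_by_reverberation_time(irs, fs, decay_db=-20.0):
    """Split secondary-virtual-source responses into early and late parts.

    The split point is estimated from the pooled energy decay curve
    (Schroeder backward integration). The patent only says the reverberation
    time may be determined from sound pressure and time, so the -20 dB
    threshold used here is an illustrative assumption.

    irs: impulse responses of the secondary virtual sources, shape (K, 2, L)
    fs:  sample rate in Hz (only used to report the split time in seconds)
    """
    energy = np.sum(irs ** 2, axis=(0, 1))              # pooled over sources and ears
    edc = np.cumsum(energy[::-1])[::-1]                  # backward integration
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)
    split = int(np.argmax(edc_db <= decay_db))           # first sample below threshold
    early_matrix = irs[..., :split]
    late_matrix = irs[..., split:]
    return early_matrix, late_matrix, split / fs
```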
Optionally, the determining of the late reverberant sound matrix based on the sound signals of the secondary virtual sources after the reverberation time includes: determining the late reverberant sound matrix according to the set of w elements, obtained by performing the second spherical harmonic transformation on the at least two secondary virtual sources, in the sound signal after the reverberation time, and according to a preset adjustment parameter g. In this way, the non-directional reverberant sound is simplified, for example, to the w dimension, while the adjustment parameter g keeps the amplitude of the late reverberant sound matrix consistent with that before the simplification. For example, g may be set to 4.
Optionally, the determining of the first late reverberant sound rendering result according to the first virtual source and the late reverberant sound matrix includes: determining the first late reverberant sound rendering result according to the w element in the first spherical harmonic coordinate and the late reverberant sound matrix. In this way, only the w dimension, which covers every azimuth in the space, needs to be convolved, so fast rendering of the late reverberant sound can be realized with a small amount of calculation.
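A sketch of this simplification, under the same assumptions as above: the late impulse responses are collapsed onto the w dimension and scaled by g, and the late reverberant rendering then only convolves the virtual source's w element with that single matrix (the exact combination rule is not given by the patent and is an assumption here):

```python
import numpy as np

def build_late_matrix(late_irs, secondary_sh, g=4.0):
    """Collapse the secondary sources' late responses onto the w dimension.

    late_irs:     post-reverberation-time impulse responses, shape (K, 2, L)
    secondary_sh: (K, 4) spherical-harmonic coordinate matrix; column 3 is w
    g:            preset adjustment parameter keeping the amplitude comparable
                  to the un-simplified matrix (the patent gives g = 4 as an example)
    """
    w = secondary_sh[:, 3]                                  # (K,)
    return g * np.tensordot(w, late_irs, axes=(0, 0))       # (2, L)

def late_reverb_render(dry, source_sh, late_matrix):
    """Render late reverberation from the virtual source's w element alone."""
    w = source_sh[3]
    return np.stack([np.convolve(w * dry, late_matrix[ear]) for ear in range(2)])
```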
Optionally, the virtual space includes the at least one virtual source and a second virtual source, and the method further includes: determining a second early reflected sound rendering result of the second virtual source and a second late reverberant sound rendering result of the second virtual source through the at least two secondary virtual sources; obtaining a second reflected sound rendering result corresponding to the second virtual source according to the second early reflected sound rendering result and the second late reverberant sound rendering result; measuring a second head related transfer function (HRTF) of the second virtual source; and acquiring a second direct sound rendering result corresponding to the second virtual source according to the second HRTF. In this way, when a plurality of virtual sources are arranged in the space, fast and accurate reflected sound rendering can be realized with reference to the calculation scheme of the first virtual source. It will be appreciated that the greater the number of virtual sources, the greater the number of matrix elements corresponding to the first spherical harmonic transformation, but no additional calculation passes are introduced.
Optionally, the method further comprises: and synthesizing a first direct sound rendering result of the first virtual source, a first reflected sound rendering result of the first virtual source, a second direct sound rendering result of the second virtual source, and a second reflected sound rendering result of the second virtual source, and obtaining spatial audio rendering results corresponding to the first virtual source and the second virtual source.
In a second aspect, there is provided an audio signal processing apparatus comprising: a direct sound rendering module for performing a rendering operation of direct sound of at least one virtual source in the virtual space according to the method as shown in the first aspect and any one of its possible designs, and a reflected sound rendering module for performing a rendering operation of reflected sound of at least one virtual source in the virtual space according to the method as shown in the first aspect and any one of its possible designs.
Optionally, the reflected sound rendering module includes: and the first spherical harmonic transformation unit is used for performing first spherical harmonic transformation on the at least one virtual source. And the second spherical harmonic transformation unit is used for performing second spherical harmonic transformation on at least one secondary virtual source. And the reflected sound sub-conversion unit is used for determining the reverberation time and acquiring an early reflected sound matrix and a late reverberant sound matrix according to the reverberation time.
In a third aspect, there is provided an electronic device comprising an audio signal processing apparatus as in any of the second aspect and its alternative designs.
In a fourth aspect, there is provided an electronic device comprising a processor and a memory for storing instructions executable by the processor, the processor being configured to, when executed, cause the electronic device to implement a method as shown in the first aspect and any one of its possible designs.
In a fifth aspect, a computer-readable storage medium having computer program instructions stored thereon is provided. The computer program instructions, when executed by an electronic device, cause the electronic device to implement the method as claimed in the first aspect or any one of the possible implementations of the first aspect.
In a sixth aspect, a computer program product is provided, comprising computer readable code which, when run in an electronic device, causes the electronic device to implement the method according to the first aspect or any of the possible implementations of the first aspect.
In a seventh aspect, a chip system is provided, the chip system comprising an interface circuit and a processor; the interface circuit and the processor are interconnected through a line; the interface circuit is used for receiving signals from a memory and sending the signals to the processor, the signals including computer instructions stored in the memory; and when the processor executes the computer instructions, the electronic device in which the chip system is provided performs the method as described in the first aspect and any one of its possible designs.
It should be appreciated that the advantages of the second to seventh aspects may be referred to in the description of the first aspect, and are not described herein.
Drawings
FIG. 1 is a schematic diagram of a distribution of direct sound and reflected sound;
FIG. 2 is a logical schematic diagram of a spatial audio rendering;
fig. 3 is a schematic diagram of an electronic device according to an embodiment of the present application;
fig. 4 is a schematic diagram of a spatial audio rendering device according to an embodiment of the present application;
fig. 5 is a flowchart of an audio signal processing method according to an embodiment of the present application;
fig. 6 is a schematic diagram of setting a secondary virtual source according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a spherical harmonic transformation according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the distribution of direct sound and early reflected sound and late reverberant sound provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of the distribution of direct sound and early reflected sound and late reverberant sound provided by an embodiment of the present application;
fig. 10 is a schematic diagram of a reflected sound rendering module according to an embodiment of the present application;
fig. 11 is a flowchart of an audio signal processing method according to an embodiment of the present application;
FIG. 12 is a schematic diagram illustrating the operation of a reflected sound rendering module according to an embodiment of the present application;
fig. 13 is a schematic diagram of an electronic device according to an embodiment of the present application;
fig. 14 is a schematic diagram of a system-on-chip according to an embodiment of the present application.
Detailed Description
The electronic device may provide the user with a simulation of a virtual environment so that the user can experience visual and auditory sensations close to the real environment in the virtual space. The simulation of the virtual environment may correspond to functions such as augmented reality (Augmented Reality, AR) and virtual reality (Virtual Reality, VR).
Take, as an example, an electronic device that provides a spatial sound simulation function to a user.
In a real environment, as shown in fig. 1, after sound is emitted from the sound source it includes a portion that enters the human ear directly, i.e., the direct sound. In addition, after being emitted by the sound source, the sound also includes a portion that is reflected into the human ear by other objects in the scene, i.e., the reflected sound. The direct sound provides the user with the sense of orientation corresponding to the sound source, while the reflected sound provides the user with the sense of space corresponding to the current environment.
As shown in fig. 2, when the electronic device provides sound signals to a user in a virtual environment, direct sound and reflected sound corresponding to each sound source may be obtained through a spatial audio rendering technology. The electronic device may synthesize the direct sound and the reflected sound, i.e. may obtain a spatial audio signal provided to the user. In this example, the sound sources in the virtual space may be referred to as virtual sources.
In general, the electronic device may obtain the direct sound and the reflected sound via spatial audio rendering based on the head related transfer function (head related transfer function, HRTF) and the binaural room impulse response (binaural room impulse response, BRIR).
The HRTF is a sound localization algorithm that describes the transmission of sound waves from a sound source to the two ears. When sound reaches the listener, the head related transfer function captures the phase and frequency response imposed by the listener's head. By processing a sound image with the HRTF corresponding to a position relative to the head, the user can be made to perceive the sound image at that position.
A room impulse response is the signal sequence, radiated by an impulsive sound source, that is measured at a receiving position in the sound field of a room; it reflects the sound field characteristics of that room. The BRIR describes the acoustic transmission process from the sound source to the listener's two ears at the receiving position, integrates the influence of both the room and the listener on the sound waves, and reflects the sound field characteristics perceived by the listener in the corresponding room.
For example, for the rendering of direct sound, the electronic device can convolve the dry signal generated by a virtual source with the HRTF. For virtual sources at different positions, or virtual sources in motion, the HRTF has a small data volume and is easy to measure, so the electronic device can measure and acquire it in real time for rendering the direct sound. In this example, since the direct sound provides the sense of orientation, the rendering of the direct sound may also be referred to as orientation rendering.
For the rendering of reflected sound, the electronic device can convolve the dry signal generated by the virtual source with the BRIR of the environment. Unlike the HRTF, for virtual sources at different positions, or virtual sources in motion, the BRIR has a large data volume and is difficult to measure, so the electronic device cannot measure and acquire it in real time to support the rendering of reflected sound. In this example, since the reflected sound provides the sense of space, the rendering of reflected sound may also be referred to as spatial rendering.
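A minimal sketch of the two convolution-based renderings described above; the impulse-response shapes are illustrative, and real HRTF/BRIR data would come from measurements:

```python
import numpy as np

def render_binaural(dry, impulse_response):
    """Convolve a mono dry signal with a binaural impulse response.

    With an HRTF pair (short, easy to measure) this yields the direct sound;
    with a BRIR (long, hard to measure in real time) it yields the reflected
    sound of the corresponding room. impulse_response has shape (2, L).
    """
    return np.stack([np.convolve(dry, impulse_response[ear]) for ear in range(2)])
```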
In order to render reflected sound in real time, simulation-based synthesis techniques may be used to render the spatial sense of a sound object. Illustratively, the synthesized signal that conveys the spatial sense of a sound object is primarily a reverberant signal. The simulation of the reverberant signal may include physical simulation methods and perceptual simulation methods.
The physical simulation methods include the image (virtual) source method, the ray-tracing (sound ray) method, and the like. Although these methods can simulate reflected sound with high precision, their computational load is too large for a real-time rendering system.
The perceptual simulation methods include schemes based on a feedback delay network (FDN) or a Schroeder reverberator. Although these methods reduce the demand on computing power, their simulation accuracy is low, and they cannot accurately reproduce the sound reflection conditions of the current virtual environment.
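For reference, a minimal Schroeder-style reverberator of the kind such perceptual methods build on; the delay lengths and gains below are illustrative and not tuned to any particular room:

```python
import numpy as np

def feedback_comb(x, delay, g):
    """y[n] = x[n] + g * y[n - delay]: one comb section of a Schroeder reverberator."""
    y = np.asarray(x, dtype=float).copy()
    for n in range(delay, len(y)):
        y[n] += g * y[n - delay]
    return y

def allpass(x, delay, g):
    """Schroeder allpass section: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def schroeder_reverb(x):
    # Illustrative delay lengths (samples) and gains, not tuned to any real room.
    combs = sum(feedback_comb(x, d, 0.8) for d in (1557, 1617, 1491, 1422)) / 4.0
    out = allpass(combs, 225, 0.7)
    return allpass(out, 556, 0.7)
```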
In order to solve the above problems, the technical solution provided by the embodiments of the present application renders the direct sound and the reflected sound separately and then synthesizes them to obtain the spatial audio rendering result. In the rendering of the reflected sound, the position of each virtual source is described by introducing a plurality of preset secondary virtual sources, which enables fast rendering of the reflected sound. Illustratively, in some implementations, combining the spherical harmonic transformation with different processing mechanisms for early reflections (early reflection) and late reverberation (late reverberation) further simplifies the computation of the reflected sound rendering corresponding to each virtual source through the secondary virtual sources and reduces computational overhead. The electronic device can therefore provide accurate spatial audio rendering results to the user in real time.
It should be noted that, according to the scheme provided by the embodiment of the application, sound field simulation can be performed on a plurality of different virtual scenes, and corresponding spatial audio rendering results can be provided for users. The virtual scene may include a mall, theater, conference room, concert hall, office, kitchen, living room, etc.
In addition, the scheme provided by the embodiment of the application can be applied to electronic equipment or a system capable of providing a spatial sound simulation function for a user. For example, the electronic device provides a spatial sound simulation function to a user through headphones that establish a signal connection with the electronic device.
The electronic device may be a cell phone, tablet, handheld computer, PC, cellular phone, personal digital assistant (personal digital assistant, PDA), wearable device (e.g., smart watch, smart bracelet), smart home device (e.g., television), car set (e.g., car-mounted computer), smart screen, projector, game console, camera, and augmented reality (augmented reality, AR)/Virtual Reality (VR) device, etc. The earphone may be a headphone, an in-ear earphone, an ear-hanging earphone, or the like, according to the wearing mode. According to different connection modes, the earphone in the embodiment of the application can be a Bluetooth earphone, a wired earphone, a true wireless stereo (true wireless stereo, TWS) earphone and the like.
The embodiment of the application does not limit the specific device form of the electronic device or the earphone in particular.
As an example, taking a mobile phone as the electronic device, fig. 3 shows a schematic structural diagram of an electronic device 100 according to an embodiment of the present application.
As shown in fig. 3, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) connector 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, a user identification module (subscriber identification module, SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The processor can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 may be a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to use the instructions or data again, they can be called directly from the memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and thereby improves the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others. The processor 110 may be connected to the touch sensor, the audio module, the wireless communication module, the display, the camera, etc. module through at least one of the above interfaces.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only illustrative, and is not meant to limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also employ different interfacing manners in the above embodiments, or a combination of multiple interfacing manners.
The USB connector 130 is an interface that meets the USB standard, and may be used to connect the electronic device 100 to a peripheral device; specifically, it may be a Mini USB connector, a Micro USB connector, a USB Type C connector, etc. The USB connector 130 may be used to connect to a charger to charge the electronic device 100, or may be used to connect to other electronic devices to transfer data between the electronic device 100 and the other electronic devices. It may also be used to connect headphones, through which audio stored in the electronic device is output. The connector may also be used to connect other electronic devices, such as VR devices, etc. In some embodiments, the standard specification of the universal serial bus may be USB 1.x, USB 2.0, USB 3.x, or USB 4.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor battery capacity, battery cycle number, battery health (leakage, impedance) and other parameters. In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, Wi-Fi) network), bluetooth (BT), bluetooth low energy (bluetooth low energy, BLE), ultra wide band (UWB), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., applied on the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, antenna 1 and mobile communication module 150 of electronic device 100 are coupled, and antenna 2 and wireless communication module 160 are coupled, such that electronic device 100 may communicate with networks and other electronic devices through wireless communication techniques. The wireless communication techniques may include the Global System for Mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global satellite positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a beidou satellite navigation system (beidou navigation satellite system, BDS), a quasi zenith satellite system (quasi-zenith satellite system, QZSS) and/or a satellite based augmentation system (satellite based augmentation systems, SBAS).
The electronic device 100 may implement display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like. In some embodiments, the electronic device 100 may include 1 or more display screens 194.
The electronic device 100 may implement camera functions through the camera module 193, the ISP, the video codec, the GPU, the display screen 194, the application processor AP, the neural network processor NPU, and the like.
The camera module 193 may be used to acquire color image data as well as depth data of a subject. The ISP may be used to process color image data acquired by the camera module 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, so that the electrical signal is converted into an image visible to the naked eye. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be disposed in the camera module 193.
In some embodiments, the camera module 193 may be composed of a color camera module and a 3D sensing module.
In some embodiments, the photosensitive element of the camera of the color camera module may be a charge coupled device (charge coupled device, CCD) or a complementary metal oxide semiconductor (complementary metal-oxide-semiconductor, CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format.
In some embodiments, the 3D sensing module may be a time-of-flight (TOF) 3D sensing module or a structured light 3D sensing module. Structured light 3D sensing is an active depth sensing technology, and the basic components of the structured light 3D sensing module may include an infrared (Infrared) emitter, an IR camera module, and the like. The working principle of the structured light 3D sensing module is to emit a light spot (pattern) with a specific pattern toward the photographed object, receive the light coding of the light spot pattern on the surface of the object, compare the difference between the received spot and the originally projected spot, and calculate the three-dimensional coordinates of the object using the triangulation principle. The three-dimensional coordinates include the distance from the electronic device 100 to the subject. TOF 3D sensing is an active depth sensing technology, and the basic components of the TOF 3D sensing module may include an infrared (Infrared) emitter, an IR camera module, and the like. The working principle of the TOF 3D sensing module is to calculate the distance (i.e., the depth) between the TOF 3D sensing module and the photographed object from the round-trip time of the infrared light, so as to obtain a 3D depth map.
The structured light 3D sensing module can also be applied to the fields of face recognition, somatosensory game machines, industrial machine vision detection, and the like. The TOF 3D sensing module can also be applied to the fields of game machines, augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR), and the like.
In other embodiments, camera module 193 may also be comprised of two or more cameras. The two or more cameras may include a color camera that may be used to capture color image data of the object being photographed. The two or more cameras may employ stereoscopic vision (stereo) technology to acquire depth data of the photographed object. The stereoscopic vision technology is based on the principle of parallax of human eyes, and obtains distance information, i.e., depth information, between the electronic device 100 and the object to be photographed by shooting images of the same object from different angles through two or more cameras under a natural light source and performing operations such as triangulation.
In some embodiments, electronic device 100 may include 1 or more camera modules 193. Specifically, the electronic device 100 may include 1 front camera module 193 and 1 rear camera module 193. The front camera module 193 can be used to collect color image data and depth data of a photographer facing the display screen 194, and the rear camera module can be used to collect color image data and depth data of a photographed object (such as a person, a landscape, etc.) facing the photographer.
In some embodiments, the CPU, GPU or NPU in the processor 110 may process the color image data and depth data acquired by the camera module 193. In some embodiments, the NPU may identify the color image data acquired by the camera module 193 (specifically, by the color camera module) through a neural network algorithm on which the skeletal point identification technique is based, such as a convolutional neural network (CNN) algorithm, to determine the skeletal points of the photographed person. The CPU or GPU may also run the neural network algorithm to determine the skeletal points of the photographed person from the color image data. In some embodiments, the CPU, GPU or NPU may further be configured to determine the stature of the photographed person (such as the body proportions and the weight of the body parts between the skeletal points) based on the depth data collected by the camera module 193 (which may be the 3D sensing module) and the identified skeletal points, further determine body beautification parameters for the photographed person, and finally process the captured image of the photographed person according to the body beautification parameters so that the body form of the photographed person in the image is beautified. How the image of the photographed person is beautified based on the color image data and the depth data acquired by the camera module 193 will be described in detail in later embodiments, and is not described here.
The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform a Fourier transform on the frequency bin energy, and the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: moving picture experts group (moving picture experts group, MPEG)-1, MPEG-2, MPEG-3, MPEG-4, etc.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card. Or transfer files such as music, video, etc. from the electronic device to an external memory card.
The internal memory 121 may be used to store computer executable program code that includes instructions. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like. The processor 110 performs various functional methods or data processing of the electronic device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. The electronic device 100 may listen to music through the speaker 170A or output an audio signal for hands-free calling.
A receiver 170B, also referred to as a "earpiece", is used to convert the audio electrical signal into a sound signal. When electronic device 100 is answering a telephone call or voice message, voice may be received by placing receiver 170B in close proximity to the human ear.
Microphone 170C, also referred to as a "microphone" or "microphone", is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can sound near the microphone 170C through the mouth, inputting a sound signal to the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, and may implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four, or more microphones 170C to enable collection of sound signals, noise reduction, identification of sound sources, directional recording functions, etc.
The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (open mobile terminal platform, OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are many types of pressure sensors 180A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may comprise at least two parallel plates with conductive material. The capacitance between the electrodes changes when a force is applied to the pressure sensor 180A. The electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device 100 may also calculate the location of the touch based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch location but with different touch operation intensities may correspond to different operation instructions. For example: when a touch operation whose intensity is smaller than a first pressure threshold acts on the short message application icon, an instruction for viewing the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction for creating a new short message is executed.
The gyro sensor 180B may be used to determine a motion gesture of the electronic device 100. In some embodiments, the angular velocity of electronic device 100 about three axes (i.e., x, y, and z axes) may be determined by gyro sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. For example, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance to be compensated by the lens module according to the angle, and controls the lens to move in the opposite direction to counteract the shake of the electronic device 100, thereby realizing anti-shake. The gyro sensor 180B may also be used for navigating, somatosensory game scenes.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates altitude from the barometric pressure value measured by the barometric pressure sensor 180C, aiding in positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may detect the opening and closing of a flip cover using the magnetic sensor 180D. When the electronic device is a foldable electronic device, the magnetic sensor 180D may be used to detect the folding or unfolding, or the folding angle, of the electronic device. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 may detect the opening and closing of the flip according to the magnetic sensor 180D. Features such as automatic unlocking upon flipping open can then be set according to the detected opening or closing state of the holster or of the flip.
The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically three axes). The magnitude and direction of gravity may be detected when the electronic device 100 is stationary. The electronic equipment gesture recognition method can also be used for recognizing the gesture of the electronic equipment, and is applied to horizontal and vertical screen switching, pedometers and other applications.
A distance sensor 180F for measuring a distance. The electronic device 100 may measure the distance by infrared or laser. In some embodiments, the electronic device 100 may range using the distance sensor 180F to achieve quick focus.
The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 100 emits infrared light outward through the light emitting diode. The electronic device 100 detects infrared reflected light from nearby objects using a photodiode. When the intensity of the detected reflected light is greater than the threshold value, it may be determined that there is an object in the vicinity of the electronic device 100. When the intensity of the detected reflected light is less than the threshold, the electronic device 100 may determine that there is no object in the vicinity of the electronic device 100. The electronic device 100 can detect that the user holds the electronic device 100 close to the ear by using the proximity light sensor 180G, so as to automatically extinguish the screen for the purpose of saving power. The proximity light sensor 180G may also be used in holster mode, pocket mode to automatically unlock and lock the screen.
Ambient light sensor 180L may be used to sense ambient light level. The electronic device 100 may adaptively adjust the brightness of the display 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust white balance when taking a photograph. Ambient light sensor 180L may also cooperate with proximity light sensor 180G to detect whether electronic device 100 is occluded, e.g., the electronic device is in a pocket. When the electronic equipment is detected to be blocked or in the pocket, part of functions (such as touch control functions) can be in a disabled state so as to prevent misoperation.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 may use the collected fingerprint features to implement fingerprint unlocking, application lock access, fingerprint-based photographing, fingerprint-based call answering, and the like.
The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 executes a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature detected by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of the processor in order to reduce the power consumption of the electronic device and implement thermal protection. In other embodiments, the electronic device 100 heats the battery 142 when the temperature detected by the temperature sensor 180J is below another threshold. In still other embodiments, the electronic device 100 may boost the output voltage of the battery 142 when the temperature is below a further threshold.
The touch sensor 180K is also referred to as a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation acting on or near it. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a location different from that of the display screen 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire the vibration signal of the vibrating bone mass of the human vocal part. The bone conduction sensor 180M may also contact the human pulse to receive the blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be provided in a headset to form a bone conduction headset. The audio module 170 may parse the voice signal based on the vibration signal of the vibrating bone mass of the vocal part obtained by the bone conduction sensor 180M, so as to implement the voice function. The application processor may parse heart rate information based on the blood pressure pulsation signal obtained by the bone conduction sensor 180M, so as to implement a heart rate detection function.
The keys 190 may include a power on key, a volume key, etc. The keys 190 may be mechanical keys. Or may be a touch key. The electronic device 100 may receive key inputs, generating key signal inputs related to user settings and function controls of the electronic device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerting as well as for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also correspond to different vibration feedback effects by touching different areas of the display screen 194. Different application scenarios (such as time reminding, receiving information, alarm clock, game, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The indicator 192 may be an indicator light, and may be used to indicate a charging state, a change in battery level, a message, a missed call, a notification, and the like.
The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195 or removed from the SIM card interface 195 to achieve contact with and separation from the electronic device 100. The electronic device 100 may support 1 or more SIM card interfaces. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, and the like. Multiple cards may be inserted into the same SIM card interface 195 simultaneously. The types of the multiple cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 employs an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
It should be noted that the electronic device shown in fig. 3 is only one illustration.
In some embodiments of the present application, the electronic device may further be provided with a spatial audio rendering apparatus as shown in fig. 4. The spatial audio rendering apparatus may be used to support spatial audio rendering functions of an electronic device. In some implementations, the spatial audio rendering device may also be referred to as an audio signal processing device.
As shown in fig. 4, in this example, the spatial audio rendering apparatus may include a direct sound rendering module and a reflected sound rendering module. The direct sound rendering module can be used for performing directional sense rendering corresponding to the direct sound. The reflected sound rendering module can be used for rendering the space sense corresponding to the reflected sound.
In the following description, a scheme provided by an embodiment of the present application will be described in detail by taking an example in which an electronic device is provided with a spatial audio rendering device as shown in fig. 4.
Fig. 5 is a flowchart of a processing method of an audio signal according to an embodiment of the present application. As shown in fig. 5, the scheme may include:
S501, for each virtual source, the direct sound rendering module measures and updates the HRTF in real time.
In connection with the foregoing description, for the rendering process of the direct sound, it may be implemented by convolving the HRTF with the dry signal generated by the virtual source.
It should be understood that in the case where there are multiple virtual sources in space, and the locations of the virtual sources are different, each virtual source may correspond to an HRTF with respect to the user. Based on the characteristic that the HRTF data amount is small and easy to measure, in this example, the direct sound rendering module in the electronic device may measure the HRTF corresponding to each virtual source separately.
Illustratively, the space includes a virtual source A and a virtual source B. Then, the direct sound rendering module of the electronic device may measure the HRTFs corresponding to the respective virtual sources separately. For example, the HRTF corresponding to virtual source A is HRTF_A, the HRTF corresponding to virtual source B is HRTF_B, and so on.
At different times, the sound signal emitted by the virtual source may change. Furthermore, the location of the virtual source may also be varied. Thus, in some embodiments, the direct sound rendering module of the electronic device may measure and update HRTFs corresponding to the virtual source a, the virtual source B, and other virtual sources respectively at different times.
S502, obtaining a direct sound rendering result according to the latest HRTF.
After the HRTFs of the virtual sources at the current moment are obtained, the direct sound rendering module of the electronic equipment can calculate and obtain the direct sound rendering result of each virtual source relative to the user according to the HRTFs of the virtual sources.
As an example, the direct sound rendering module of the electronic device may implement direct sound rendering of one virtual source according to the following formula (1).
Formula (1): S_dirA = S_dry ∗ HRTF_A
wherein S_dirA is the direct sound rendering result corresponding to virtual source A, S_dry is the dry signal of the sound signal emitted by virtual source A, HRTF_A is the head related transfer function of virtual source A with respect to the user, and ∗ denotes convolution.
Similarly, the direct sound rendering module of the electronic device may also perform direct sound rendering of other virtual sources (such as virtual source B) according to the above formula (1). Therefore, direct sound rendering results respectively corresponding to the virtual sources can be obtained.
In some embodiments, the electronic device may perform convolution synthesis calculation on the direct sound rendering results corresponding to the multiple virtual sources, so as to obtain the direct sound rendering results generated by all virtual sources in the current environment, that is, the direct sound portion in the binaural sound effect.
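As a non-limiting illustration of formula (1) together with the synthesis over multiple virtual sources, the following Python sketch convolves each dry signal with a per-ear head related impulse response (the time-domain form of the HRTF) and accumulates the results; the function and parameter names (render_direct, dry_signals, hrirs) are illustrative assumptions and not terms from the embodiment.

import numpy as np

def render_direct(dry_signals, hrirs):
    # Sketch of formula (1): each virtual source's direct sound is its dry signal
    # convolved with its HRTF, here given as one impulse response per ear.
    # dry_signals: {source_id: 1-D mono dry signal}
    # hrirs: {source_id: (left_ir, right_ir)}
    length = max(len(s) + max(len(ir) for ir in hrirs[k]) - 1
                 for k, s in dry_signals.items())
    out = np.zeros((2, length))
    for key, dry in dry_signals.items():
        for ear, ir in enumerate(hrirs[key]):
            res = np.convolve(dry, ir)      # per-source, per-ear direct sound
            out[ear, :len(res)] += res      # accumulate over all virtual sources
    return out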
S503, presetting at least two secondary virtual sources in the space, and determining BRIRs of the secondary virtual sources.
In this example, the electronic device may describe the location of each virtual source through a plurality of secondary virtual sources in the virtual space. The position of the secondary virtual source may be preset.
For example, in some embodiments, the setting of the secondary virtual source may follow the following rules: and building a polyhedron in the space by taking the user as a center according to a preset size. A secondary virtual source may be provided at each vertex of the polyhedron. Alternatively, a secondary virtual source may be provided in the center of each face of the polyhedron. In some embodiments, the polyhedron may be a regular hexahedron. Correspondingly, when the secondary virtual sources are arranged at the vertexes of the regular hexahedron, 8 secondary virtual sources centering on the user can be arranged.
As an example, please refer to fig. 6, which shows a schematic representation of the arrangement of a secondary virtual source corresponding to any two-dimensional plane in space. As shown in fig. 6, in this example, 4 secondary virtual sources, V1, V2, V3, and V4, respectively, may be provided around the user.
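For the placement rule described above, the following Python sketch generates 8 secondary virtual sources at the vertices of a user-centered regular hexahedron; the names (cube_secondary_sources, half_size) are illustrative assumptions rather than terms used in the embodiment.

import itertools
import numpy as np

def cube_secondary_sources(user_pos, half_size):
    # Place one secondary virtual source at each vertex of a cube centered on the user.
    # user_pos: (x, y, z) position of the user; half_size: half of the cube edge length.
    user_pos = np.asarray(user_pos, dtype=float)
    signs = np.array(list(itertools.product((-1.0, 1.0), repeat=3)))  # 8 vertex sign patterns
    return user_pos + half_size * signs  # (8, 3) array of secondary virtual source positions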
It will be appreciated that, since the setting of the secondary virtual sources is predetermined, in the case where the user location is determined, the electronic device may determine the BRIR of each secondary virtual source. For example, in the current environment, the reflected sound rendering module of the electronic device may determine that the BRIR of V1 is VSIR_V1, the BRIR of V2 is VSIR_V2, the BRIR of V3 is VSIR_V3, and the BRIR of V4 is VSIR_V4.
S504, for each virtual source, the BRIRs of the virtual sources are synthesized in real time according to the BRIRs of the secondary virtual sources.
In connection with the example of fig. 6, it will be appreciated that the locations of the respective secondary virtual sources are determined as a result of the space. Thus, the location of any one virtual source may be described by at least two secondary virtual sources in a fixed coordinate system of the respective secondary virtual sources.
Similarly, BRIRs for any one virtual source may also be accurately identified by BRIRs for at least two secondary virtual sources and their respective weights.
By way of example, the following equation (2) shows a specific implementation of BRIR describing virtual source a as shown in fig. 6 by V1-V4.
Formula (2): BRIR_A = w_V1^A × VSIR_V1 + w_V2^A × VSIR_V2 + w_V3^A × VSIR_V3 + w_V4^A × VSIR_V4
wherein BRIR_A is the binaural room impulse response of virtual source A, w_V1^A is the weight of secondary virtual source V1 when describing virtual source A, w_V2^A is the weight of secondary virtual source V2 when describing virtual source A, w_V3^A is the weight of secondary virtual source V3 when describing virtual source A, and w_V4^A is the weight of secondary virtual source V4 when describing virtual source A.
Similarly, the electronic device may determine the BRIR of virtual source B by the following equation (3).
Equation (3): BRIR_B = w_V1^B × VSIR_V1 + w_V2^B × VSIR_V2 + w_V3^B × VSIR_V3 + w_V4^B × VSIR_V4
wherein BRIR_B is the binaural room impulse response of virtual source B, w_V1^B is the weight of secondary virtual source V1 when describing virtual source B, w_V2^B is the weight of secondary virtual source V2 when describing virtual source B, w_V3^B is the weight of secondary virtual source V3 when describing virtual source B, and w_V4^B is the weight of secondary virtual source V4 when describing virtual source B.
By analogy, when multiple virtual sources exist in space, the electronic device can describe BRIR of each virtual source through V1-V4 by a scheme similar to the above formula (2) or formula (3).
It should be noted that, the formula (2) and the formula (3) are only examples of describing virtual sources by 4 secondary virtual sources. In other implementations, the electronic device may implement the description of the virtual source through at least two secondary virtual sources.
For example, the reflected sound rendering module of the electronic device may perform piecewise fitting on any two secondary virtual source combinations according to the position of any one virtual source, thereby determining the combination of two secondary virtual sources with weights being positive numbers in V1-V4.
For example, in determining the BRIR of virtual source A as shown in fig. 6, the electronic device may perform a piecewise fit for virtual source A using the combinations V1 and V2, V2 and V3, V3 and V4, and V4 and V1, respectively. When it is determined that the weights for describing virtual source A by V1 and V2 are both positive numbers, the fitting calculation is stopped. The reflected sound rendering module of the electronic device may then determine that virtual source A is described by V1 and V2 and their respective weights.
As another example, in determining the BRIR of virtual source B as shown in fig. 6, the electronic device may perform a piecewise fit for virtual source B using the combinations V1 and V2, V2 and V3, V3 and V4, and V4 and V1, respectively. When V1 and V2 are used to describe virtual source B, at least one of the weights is a negative number. The electronic device may then continue by attempting to describe virtual source B using V2 and V3, and determine that the corresponding weights are both positive numbers. The fitting calculation is stopped, and the reflected sound rendering module of the electronic device may determine that virtual source B is described by V2 and V3 and their respective weights.
Thus, the electronic device can determine the BRIR of each virtual source through the plurality of preset secondary virtual sources. For example, the BRIR of virtual source A is determined to be BRIR_A; as another example, the BRIR of virtual source B is determined to be BRIR_B, and so on. In this way, the electronic device can accurately obtain the BRIR corresponding to each virtual source without measuring or simulating the reflected sound signal of each virtual source in real time.
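A minimal Python sketch of the piecewise fitting and the weighted BRIR synthesis described above is given below; the helper names (fit_pair_weights, synthesize_brir) and the use of a least-squares fit on direction vectors are illustrative assumptions, not details stated in the embodiment.

import numpy as np

def fit_pair_weights(source_dir, v_dirs, pairs):
    # Try adjacent secondary virtual source pairs until both weights are positive (cf. S504).
    # source_dir: direction of the virtual source relative to the user.
    # v_dirs: {"V1": direction, "V2": direction, ...}
    # pairs: combinations to try, e.g. [("V1", "V2"), ("V2", "V3"), ("V3", "V4"), ("V4", "V1")]
    target = np.asarray(source_dir, dtype=float)
    for a, b in pairs:
        basis = np.column_stack((v_dirs[a], v_dirs[b]))
        weights, *_ = np.linalg.lstsq(basis, target, rcond=None)
        if np.all(weights > 0):                       # stop at the first all-positive fit
            return {a: weights[0], b: weights[1]}
    raise ValueError("no secondary virtual source pair with all-positive weights")

def synthesize_brir(weights, vsirs):
    # Weighted sum of secondary virtual source BRIRs, as in formulas (2) and (3).
    length = max(len(vsirs[k]) for k in weights)
    brir = np.zeros(length)
    for key, w in weights.items():
        ir = np.asarray(vsirs[key], dtype=float)
        brir[:len(ir)] += w * ir
    return brir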
S505, obtaining a reflected sound rendering result according to the BRIRs of the virtual sources.
In connection with the description of S504, the reflected sound rendering module of the electronic device may determine the BRIRs corresponding to the respective virtual sources. For example, the BRIR corresponding to virtual source A is BRIR_A; as another example, the BRIR corresponding to virtual source B is BRIR_B.
In this example, the reflected sound rendering module of the electronic device may also determine, according to BRIR of each virtual source, a reflected sound rendering result of the virtual source, that is, a spatial sense rendering result.
For example, taking the virtual source a as an example, the electronic device may determine the reflected sound rendering result corresponding to the virtual source a according to the following formula (4).
Equation (4): S_refA = S_dry ∗ BRIR_A
wherein S_refA is the reflected sound rendering result corresponding to virtual source A, S_dry is the dry signal of the sound signal emitted by virtual source A, and ∗ denotes convolution.
Similarly, taking the virtual source B as an example, the electronic device may determine the reflected sound rendering result corresponding to the virtual source B according to the following formula (5).
Equation (5): S_refB = S_dry ∗ BRIR_B
wherein S_refB is the reflected sound rendering result corresponding to virtual source B, and S_dry is the dry signal of the sound signal emitted by virtual source B.
Thus, the reflected sound rendering module of the electronic device can determine the reflected sound rendering results corresponding to the virtual sources.
In some embodiments, the reflected sound rendering module of the electronic device may further perform convolution synthesis on the reflected sound rendering results corresponding to the multiple virtual sources, so as to obtain the reflected sound rendering result in the current environment. The reflected sound rendering result in the current environment is also referred to as the reflected sound portion in the binaural sound effect.
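As a non-limiting illustration of equations (4) and (5) together with the synthesis of the per-source results, the Python sketch below convolves each dry signal with the corresponding BRIR pair and accumulates the outputs; the function and parameter names are illustrative assumptions.

import numpy as np

def render_reflections(dry_signals, brirs):
    # Sketch of equations (4)/(5): the reflected sound of each virtual source is its dry
    # signal convolved with its BRIR (one impulse response per ear), accumulated over sources.
    # dry_signals: {source_id: 1-D mono dry signal}; brirs: {source_id: (left_ir, right_ir)}
    length = max(len(s) + max(len(ir) for ir in brirs[k]) - 1
                 for k, s in dry_signals.items())
    out = np.zeros((2, length))
    for key, dry in dry_signals.items():
        for ear, ir in enumerate(brirs[key]):
            res = np.convolve(dry, ir)
            out[ear, :len(res)] += res
    return out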
S506, synthesizing the direct sound rendering result and the reflected sound rendering result to obtain a spatial audio rendering result.
After the direct sound rendering result and the reflected sound rendering result in the current environment are obtained according to the scheme, the electronic equipment can synthesize the direct sound rendering result and the reflected sound rendering result. Thereby obtaining a spatial audio signal comprising direct sound for reflecting the sense of orientation and reflected sound for reflecting the sense of space in the current scene.
It will be appreciated that in the scheme shown in fig. 5, the electronic device may describe each virtual source by a piecewise fit of the secondary virtual source, thereby enabling reflected sound rendering. However, in the case of a large number of secondary virtual sources and/or virtual sources, this solution still places high demands on the computing power of the electronic device.
In order to further reduce the cost in the reflected sound rendering process and improve the calculation efficiency, the embodiment of the application also provides a reflected sound rendering scheme based on spherical harmonic transformation. The scheme can replace S503-S505 in the scheme shown in FIG. 5, and a faster and more accurate reflected sound rendering effect is realized.
As an example, fig. 7 shows a correspondence for a spherical harmonic transformation. After the spherical harmonic transformation, a coordinate position can be identified by elements in four dimensions: x, y, z, and w. Here, x corresponds to the offset of the position in the x direction in the spherical harmonic space, y corresponds to the offset of the position in the y direction in the spherical harmonic space, and z corresponds to the offset of the position in the z direction in the spherical harmonic space, while w can identify the offset of the position in all directions in the spherical harmonic space.
It will be appreciated that in connection with the illustration as in fig. 5, in performing reflected sound rendering, the electronic device may perform a piecewise fit based on the virtual source and the position of the secondary virtual source in space to describe the virtual source by the secondary virtual source.
In this example, unlike the piecewise fitting described above, the electronic device can perform spherical harmonic transformation on the position information of the virtual sources according to a corresponding spherical harmonic transformation matrix (for example, referred to below as the first spherical harmonic transformation matrix), thereby obtaining first-order spherical harmonic coordinates of the virtual sources in the spherical harmonic coordinate system. For example, the first-order spherical harmonic coordinates may be identified as a 1×4 or 4×1 coordinate matrix comprising the four elements x, y, z, and w. The first spherical harmonic transformation matrix can be determined according to the positional relationship of each virtual source and the user in the virtual space.
In this way, even if there are a plurality of virtual sources in the virtual space, the position information of the plurality of virtual sources can constitute a virtual source coordinate set. The virtual source coordinate set can be transformed once by the first spherical harmonic transformation matrix to obtain a virtual source spherical harmonic coordinate matrix corresponding to the virtual source coordinate set.
Illustratively, the virtual space includes N virtual sources. Then, by one spherical harmonic transformation based on the first spherical harmonic transformation matrix, a virtual source spherical harmonic coordinate matrix of N×4 or 4×N can be obtained.
Correspondingly, the electronic device can also perform spherical harmonic transformation on the position information of the secondary virtual sources according to a spherical harmonic transformation matrix (for example, referred to below as the second spherical harmonic transformation matrix), thereby obtaining first-order spherical harmonic coordinates of the secondary virtual sources in the spherical harmonic coordinate system. Similar to the coordinate transformation of the virtual sources, the first-order spherical harmonic coordinates of one secondary virtual source may be identified as a 1×4 or 4×1 coordinate matrix comprising the four elements x, y, z, and w. The second spherical harmonic transformation matrix may be determined based on the positional relationship of the secondary virtual sources and the user in the virtual space. In this way, even if there are a plurality of secondary virtual sources in the virtual space, the position information of the plurality of secondary virtual sources may constitute a secondary virtual source coordinate set. The secondary virtual source coordinate set can be transformed once by the second spherical harmonic transformation matrix to obtain a secondary virtual source spherical harmonic coordinate matrix corresponding to the secondary virtual source coordinate set.
Illustratively, the virtual space includes M secondary virtual sources. Then, by one spherical harmonic transformation based on the second spherical harmonic transformation matrix, a secondary virtual source spherical harmonic coordinate matrix of M×4 or 4×M can be obtained.
It will be appreciated that, in different embodiments, the first spherical harmonic transformation matrix and the second spherical harmonic transformation matrix may be the same or different. However, even if the elements in the two matrices are not exactly the same, they are both established on user-centered spherical harmonic coordinates. Therefore, through the above spherical harmonic transformation, the virtual sources and the secondary virtual sources can be transformed into the same coordinate system. That is, the relative relationship between the virtual sources and the secondary virtual sources may be embodied by the spherical harmonic coordinates.
Therefore, the spherical harmonic coordinates of the virtual source and the secondary virtual source are obtained through spherical harmonic coordinate transformation, and the effect of describing the virtual source through the secondary virtual source can be achieved. Meanwhile, as the object of the spherical harmonic transformation can be a coordinate set formed by virtual sources or secondary virtual sources, even if the virtual space comprises a plurality of virtual sources or secondary virtual sources, the electronic equipment can acquire the spherical harmonic coordinate matrixes of the virtual sources corresponding to all the virtual sources through four spherical harmonic transformations respectively corresponding to four dimensions. Similarly, the electronic device can acquire the secondary virtual source spherical harmonic coordinate matrixes corresponding to all the secondary virtual sources through four spherical harmonic transformations respectively corresponding to four dimensions. Therefore, the problem of significant increase in computation in a complex environment including a plurality of virtual sources and/or secondary virtual sources can be avoided by this scheme.
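To make the transformation concrete, the following Python sketch encodes a set of source positions into first-order spherical harmonic coordinates (w, x, y, z) relative to the user; the normalization used here (w = 1 and unit direction cosines for x, y, z) is one common convention and is an assumption rather than the exact transformation matrix of the embodiment.

import numpy as np

def first_order_sh(positions, user_pos):
    # Encode N source positions into an N x 4 matrix of first-order spherical harmonic
    # coordinates (w, x, y, z) in a user-centered coordinate system.
    rel = np.asarray(positions, dtype=float) - np.asarray(user_pos, dtype=float)
    dirs = rel / np.linalg.norm(rel, axis=1, keepdims=True)   # unit direction of each source
    w = np.ones((len(dirs), 1))                               # omnidirectional component
    return np.hstack((w, dirs))                               # columns: w, x, y, z

# The same routine can be applied to the virtual source coordinate set and to the
# secondary virtual source coordinate set, yielding the N x 4 and M x 4 matrices above.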
Furthermore, in other embodiments of the application, the electronic device may subdivide the reflected sound signal into an early reflected sound signal and a late reverberant sound signal during the rendering of the reflected sound. Different treatments are performed for different characteristics of early reflected sound and late reverberant sound. For example, late reverberant sounds are simply processed. The corresponding computational overhead is thereby reduced without affecting the rendering effect.
For example, as shown in fig. 8, before the sound signal emitted by the virtual source reaches the user, the sound may include, in addition to the direct sound, early reflected sound with a small number of reflections. The early reflected sound has strong directivity. In addition, the reflected sound may include late reverberant sound with a large number of reflections. Because of the multiple reflections, the late reverberant sound has no significant directivity. The overall behavior of the late reverberant sound may approximate the reverberation effect resulting from the linear superposition of the sound signals of the late reverberant sound.
Fig. 9 shows an illustration of the division of the sound emitted by a virtual source from the perspective of sound pressure. As shown in fig. 9, the earliest-arriving sound signal with the largest sound pressure is the direct sound. The sound pressure of the sound signal gradually decreases with time. Before the reverberation time, the reflected sound has larger sound pressure and stronger directivity, corresponding to the early reflected sound. After the reverberation time, the reflected sound has smaller sound pressure and basically no directivity, corresponding to the late reverberant sound.
In this example, the electronic device may perform corresponding rendering processing for early reflected sound and late reverberant sound, respectively. For example, for early reflected sound, the reflected sound rendering module of the electronic device may normally perform convolution calculation on the sound signal to obtain an early reflected sound rendering result capable of reflecting directionality of the early reflected sound. While for late reverberant sounds, the reflected sound rendering module of the electronic device may simplify the convolution operation. For example, the late reverberant sound signal is reduced to a linear superposition of sound signals at some instant. Based on the simplified late reverberant sound signal, performing convolution calculation on the dry signal corresponding to the virtual source to obtain a corresponding late reverberant sound rendering result. Therefore, the calculation force expenditure in the late reverberant sound rendering process can be saved, and the calculation efficiency is improved.
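A minimal Python sketch of this split is shown below: a first-order reflected-sound impulse response is divided at the reverberation time, the early part keeps all four spherical harmonic channels, and the late part is reduced to a scaled w channel. The fixed split index and the default g = 4 follow the later description but remain assumptions of this sketch.

import numpy as np

def split_reflections(sh_ir, sample_rate, reverb_time_s, g=4.0):
    # sh_ir: (4, samples) reflected-sound impulse response in (w, x, y, z) order
    # for one secondary virtual source, with the direct sound already removed.
    split = int(reverb_time_s * sample_rate)
    early = sh_ir[:, :split]        # early reflections: keep directional (x, y, z) detail
    late = g * sh_ir[0, split:]     # late reverberation: only the w channel, scaled by g
    return early, late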
It should be understood that, in different implementations, the technical solution based on the spherical harmonic transformation provided in the foregoing embodiment and the technical solution provided in the foregoing embodiment for the separate rendering operations of the early reflected sound and the late reverberant sound may be selectively applied to one implementation, so that a corresponding effect may be obtained. In other embodiments, the spherical harmonic transformation and the differentiation processing of the reflected sound in different periods can be combined, so that the calculation cost is reduced and the calculation efficiency is improved while the better reflected sound rendering result is obtained.
In the following description, the reflected sound rendering is exemplified by the simultaneous use of spherical harmonic transformation and the differentiation processing of reflected sound at different times.
In this embodiment, the reflected sound rendering module in the electronic device may include a plurality of different units for implementing the above-mentioned spherical harmonic transformation and the differentiated processing of reflected sound in different periods.
As a possible implementation, please refer to fig. 10, which is a schematic diagram of a reflected sound rendering module according to an embodiment of the present application.
As shown in fig. 10, in this example, the reflected sound rendering module may include a first spherical harmonic transformation unit, a second spherical harmonic transformation unit, a reflected sound dividing unit, and a convolution calculating unit.
The first spherical harmonic transformation unit is used for performing spherical harmonic transformation on the coordinates of the plurality of virtual sources. In this example, the first spherical harmonic transformation unit may store a first spherical harmonic transformation matrix corresponding to the positions of the plurality of virtual sources and the user in the virtual space. Through the first spherical harmonic transformation matrix, the first spherical harmonic transformation unit may implement spherical harmonic transformation of a virtual source coordinate set including a plurality of virtual source coordinates, thereby obtaining a virtual source spherical harmonic coordinate matrix of N×4 or 4×N as an output.
The second spherical harmonic transformation unit is used for performing spherical harmonic transformation on the coordinates of the plurality of secondary virtual sources. In this example, the second spherical harmonic transformation unit may store a second spherical harmonic transformation matrix corresponding to the positions of the plurality of secondary virtual sources and the user in the virtual space. Through the second spherical harmonic transformation matrix, the second spherical harmonic transformation unit may implement spherical harmonic transformation of a secondary virtual source coordinate set including a plurality of secondary virtual source coordinates, thereby obtaining a secondary virtual source spherical harmonic coordinate matrix of M×4 or 4×M as an output.
The reflected sound dividing unit is used for dividing early reflected sound and late reverberant sound in the reflected sound. In a specific implementation, the reflected sound dividing unit may determine the corresponding reverberation time according to signal statistics such as a sound pressure variation situation and/or a propagation time of sound based on the sound pressure distribution as shown in fig. 9. Thus, the sound signal before the reverberation time excluding the direct sound is the early reflected sound. Correspondingly, the sound signal after the reverberation time is the late reverberant sound.
In this example, the reflected sound dividing unit may further output an early reflected sound matrix and a late reverberant sound matrix in combination with the reverberation time according to the secondary virtual source spherical harmonic coordinate matrix output by the second spherical harmonic conversion unit.
The early reflected sound matrix can reflect the distribution of each sound signal of the early reflected sound in the time domain. As an example, in the case where each secondary virtual source corresponds to an early reflected sound, the early reflected sound matrix may include M×4 matrix elements. For example, the reflected sound matrix elements corresponding to each secondary virtual source may include w_vr, x_vr, y_vr, and z_vr, i.e., the coordinates of the early reflected sound in the four dimensions of the spherical harmonic domain.
The late reverberant sound matrix can embody the distribution of the late reverberant sound. As an example, the late reverberant sound matrix can be identified by the w parameter, which can embody the spatial offset. For example, the electronic device may extract the w parameter (e.g., denoted as w_vl) of the late reverberant sound of any secondary virtual source, and obtain the late reverberant sound matrix through adjustment by the adjustment parameter g corresponding to the secondary virtual source. That is, the late reverberant sound matrix can be g×w_vl.
Thus, the reflected sound dividing unit can output the early reflected sound matrix and the late reverberant sound matrix according to the input secondary virtual source spherical harmonic coordinate matrix.
The convolution computing unit may include at least one module having convolution computing capability. For example, a plurality of convolution calculation modules may be included in the convolution calculation unit for implementing parallel convolution calculations.
In some embodiments, the convolution calculation unit may be configured to obtain, through convolution calculation, an early reflected sound rendering result according to the virtual source spherical harmonic coordinate matrix output by the first spherical harmonic transformation unit and the early reflected sound matrix. It can be understood that the early reflection sound matrix can include a detailed matrix element with four-dimensional coordinates, so that based on convolution calculation of the early reflection sound matrix, the electronic device can obtain a detailed and accurate early reflection sound rendering result capable of reflecting directivity.
In other embodiments, the convolution calculation unit may be configured to obtain the late reverberant sound rendering result through convolution calculation according to the virtual source spherical harmonic coordinate matrix output by the first spherical harmonic transformation unit and the late reverberant sound matrix. It can be appreciated that the late reverberant sound matrix can include matrix elements of a simplified one-dimensional (w) coordinate, so that based on the convolution calculation of the late reverberant sound matrix, the electronic device can quickly acquire a late reverberant sound rendering result capable of representing spatiality.
In some implementations, the convolution calculation unit may perform convolution calculation according to the spherical harmonic coordinates of the four dimensions x, y, z, and w in the virtual source spherical harmonic coordinate matrix and g×w_vl, so as to obtain the late reverberant sound rendering result.
In other implementations, the convolution calculation unit may perform convolution calculation according to the spherical harmonic coordinates of at least one of the four dimensions x, y, z, and w in the virtual source spherical harmonic coordinate matrix and g×w_vl, so as to obtain the late reverberant sound rendering result.
For example, the convolution calculation unit may perform convolution calculation according to the spherical harmonic coordinates of the w dimension in the virtual source spherical harmonic coordinate matrix and g×w_vl, so as to obtain the late reverberant sound rendering result.
In other embodiments, the convolution calculating unit may be configured to convolutionally synthesize the early reflection sound rendering result and the late reverberant sound rendering result, so as to obtain the reflection sound rendering result in the current environment.
Therefore, the reflected sound rendering module in the electronic equipment can obtain an accurate early reflected sound rendering result capable of reflecting directivity by performing finer convolution calculation on early reflected sound. The reflected sound rendering module in the electronic equipment can also simplify convolution calculation of late reverberant sound, so that a rendering result of the late reverberant sound which can embody space can be rapidly obtained through lower calculation power cost.
In order to enable those skilled in the art to more clearly understand the implementation of the solution provided in the above embodiment, please refer to fig. 11, which is a schematic flow chart of yet another solution provided in the embodiment of the present application. The scheme can be used for realizing reflected sound rendering to obtain a reflected sound rendering result. In some embodiments, S1101-S1106 shown in fig. 11 may replace S503-S505 shown in fig. 5, so as to achieve fast and accurate acquisition of the reflected sound rendering result. As shown in fig. 11, the scheme may include:
S1101, for each virtual source, the first spherical harmonic transformation unit performs spherical harmonic transformation on the virtual source to obtain the first-order spherical harmonic coordinates corresponding to the virtual source.
For example, the first spherical harmonic transformation unit of the electronic device may be provided with a first spherical harmonic transformation matrix corresponding to the virtual source coordinate set.
In connection with the example of fig. 12, the virtual source coordinate set includes a first virtual source. The first spherical harmonic transformation unit can perform spherical harmonic transformation on the first virtual source according to the first spherical harmonic transformation matrix, thereby obtaining the spherical harmonic coordinates (w_1, x_1, y_1, z_1) of the first virtual source.
In some implementations, each dimension value of the spherical harmonic coordinates of the first virtual source may be output onto one output port of the first spherical harmonic transformation unit. Taking the output ports of the first spherical harmonic transformation unit including ports 1-4 as an example, w_1 may be output to port 1, x_1 may be output to port 2, y_1 may be output to port 3, and z_1 may be output to port 4.
Similarly, when other virtual sources are included in the virtual source coordinate set, the first spherical harmonic transformation unit may perform spherical harmonic transformation on those virtual sources according to the first spherical harmonic transformation matrix and output the corresponding spherical harmonic coordinates.
It will be appreciated that when a plurality of virtual sources are included in the virtual source coordinate set, then the values of one dimension corresponding to the spherical harmonic coordinates of each virtual source may be output at ports 1 to 4, respectively.
Thus, the position information of the N virtual sources in the virtual source coordinate set can be normalized into four 1×N matrices in the spherical harmonic space.
S1102, aiming at each secondary virtual source, a second spherical harmonic transformation unit performs spherical harmonic transformation on the secondary virtual source to acquire first-order spherical harmonic coordinates corresponding to the secondary virtual source.
For example, the second spherical harmonic transformation unit of the electronic device may be provided with a second spherical harmonic transformation matrix corresponding to the secondary virtual source coordinate set.
In connection with the example of fig. 12, the secondary virtual source coordinate set is exemplified as including a first secondary virtual source. The second spherical harmonic transformation unit can perform spherical harmonic transformation on the first secondary virtual source according to the second spherical harmonic transformation matrix, thereby obtaining the spherical harmonic coordinates (w_v, x_v, y_v, z_v) of the first secondary virtual source.
In some implementations, similar to the spherical harmonic processing of the virtual source, each dimension value of the spherical harmonic coordinates of the first secondary virtual source may be output onto one output port of the second spherical harmonic transformation unit. Taking the output ports of the second spherical harmonic transformation unit including ports 5-8 as an example, w_v may be output to port 5, x_v may be output to port 6, y_v may be output to port 7, and z_v may be output to port 8.
Similarly, when other secondary virtual sources are included in the secondary virtual source coordinate set, the second spherical harmonic transformation unit may perform spherical harmonic transformation on those secondary virtual sources according to the second spherical harmonic transformation matrix and output the corresponding spherical harmonic coordinates.
It will be appreciated that when a plurality of secondary virtual sources are included in the secondary virtual source coordinate set, then the values of one dimension corresponding to the spherical harmonic coordinates of each secondary virtual source may be output at ports 5 to 8, respectively.
Thus, the position information of the M secondary virtual sources in the secondary virtual source coordinate set can be normalized into four 1×M matrices in the spherical harmonic space.
In this way, through S1101-S1102, the position information of the virtual sources and the secondary virtual sources is converted into the spherical harmonic space through spherical harmonic transformation, so that the correspondence of the virtual sources and the secondary virtual sources under the same coordinate system is realized.
S1103, for each secondary virtual source, the reflected sound dividing unit obtains the early reflected sound coordinates and the late reverberant sound coordinates corresponding to the secondary virtual source according to the spherical harmonic coordinates of the secondary virtual source.
For example, the reflected sound dividing unit may determine the reverberation time according to a preset analysis method according to the signal statistical characteristics.
According to the reverberation time, the reflected sound dividing unit may divide the secondary virtual source spherical harmonic coordinate matrix acquired in S1102, thereby acquiring an early reflected sound coordinate matrix and a late reverberant sound coordinate matrix.
As an example, in connection with fig. 12, for the spherical harmonic coordinates (w_v, x_v, y_v, z_v) of the first secondary virtual source, according to the division by the reverberation time, the reflected sound dividing unit can obtain the corresponding first early reflected sound coordinates as (w_vr, x_vr, y_vr, z_vr). The reflected sound dividing unit can simplify the late reverberant sound of the first secondary virtual source into g×w_vl, where g is an adjustment parameter. For example, g may be set to 4, so that the sound signal intensity corresponding to the first late reverberant sound coordinates adjusted by the adjustment parameter g remains consistent with the 4-channel signal before simplification.
In the above example, a secondary virtual source (e.g., a first secondary virtual source) is illustrated. It should be appreciated that where a plurality of secondary virtual sources are included in the secondary virtual source coordinate set, then the other secondary virtual sources may be handled in a manner similar to that described above for the first secondary virtual source.
Thus, after the M secondary virtual sources are divided, the early reflected sound matrix and the late reverberant sound matrix corresponding to the M secondary virtual sources can be obtained.
The early reflected sound matrix may include the 1×4-dimensional coordinates of the early reflected sound corresponding to each of the M secondary virtual sources. The early reflected sound matrix may then include M×4 elements.
The late reverberant sound matrix may include the 1-dimensional coordinate (e.g., g×w_vl) of the late reverberant sound corresponding to each of the M secondary virtual sources. The late reverberant sound matrix may then include M×1 elements.
S1104, the convolution calculation unit obtains an early reflected sound rendering result according to the first-order spherical harmonic coordinates corresponding to the virtual sources and the early reflected sound coordinates corresponding to each secondary virtual source.
The early reflected sound coordinates corresponding to each secondary virtual source can form the early reflected sound matrix.
In this example, the convolution calculation unit may respectively convolve one dimension of the spherical harmonic coordinates of the virtual sources with the elements of the corresponding dimension in the early reflected sound matrix, so that the early reflected sound rendering result corresponding to each dimension can be calculated and obtained.
Illustratively, in connection with fig. 12, taking the w dimension as an example, the convolution calculation unit may convolve w_1 with w_vr, thereby obtaining the early reflected sound convolution result corresponding to the first virtual source and the first secondary virtual source in the w dimension. Then, for the other virtual sources and secondary virtual sources, the convolution calculation unit may perform similar operations, thereby obtaining a plurality of early reflected sound convolution results in the w dimension. In a specific implementation, the convolution of the multiple virtual sources and secondary virtual sources can be completed by performing one convolution calculation between a 1×N matrix including the N elements corresponding to the N virtual sources in the w dimension and a 1×M matrix including the M elements corresponding to the M secondary virtual sources in the w dimension. The result matrix obtained by the convolution calculation of the two matrices then includes the early reflected sound rendering results of all N virtual sources and M secondary virtual sources in the w dimension.
For other dimensions, such as an x-dimension, a y-dimension, and a z-dimension, the convolution calculation unit may perform a calculation method similar to that in the w-dimension described above, thereby obtaining early reflected sound rendering results corresponding to the x-dimension, the y-dimension, and the z-dimension, respectively.
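To illustrate the per-dimension step, the Python sketch below convolves the w-dimension data of N virtual sources with the w-dimension early reflected sound data of M secondary virtual sources. Treating both operands as time signals and pairing every virtual source with every secondary virtual source are assumptions of this sketch, not the exact matrix operation of the embodiment; the same routine would be repeated for the x, y, and z dimensions.

import numpy as np

def early_dimension_render(src_w_signals, early_w_irs):
    # src_w_signals: N per-virtual-source signals in the w dimension.
    # early_w_irs: M per-secondary-virtual-source early reflected sound responses in the w dimension.
    # Returns an N x M grid of early reflected sound convolution results in the w dimension.
    results = []
    for s in src_w_signals:
        row = [np.convolve(np.asarray(s, dtype=float), np.asarray(ir, dtype=float))
               for ir in early_w_irs]
        results.append(row)
    return results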
S1105, the convolution calculation unit obtains a late reverberant sound rendering result according to the first-order spherical harmonic coordinates corresponding to the virtual sources and the late reverberant sound coordinates corresponding to each secondary virtual source.
The late reverberant sound coordinates corresponding to each secondary virtual source may form the late reverberant sound matrix. In combination with the foregoing description, in the process of dividing the reflected sound, the late reverberant sound can be simplified; for example, the late reverberant sound corresponding to the first secondary virtual source is simplified into g×w_vl. Then, for a virtual environment provided with M secondary virtual sources, the late reverberant sound matrix can include the late reverberant sound coordinate of one w dimension for each of the M secondary virtual sources.
In some embodiments, the convolution calculation unit may perform convolution between the coordinates of the x, y, and z dimensions corresponding to the virtual sources and the late reverberant sound matrix. It will be appreciated that each dimension corresponds to one matrix convolution calculation, so that the complete convolved late reverberant sound matrix can be obtained through 3 convolution calculations.
In other embodiments, the convolution calculation unit may perform convolution between the coordinates of the w dimension corresponding to the virtual sources and the late reverberant sound matrix. It will be appreciated that each dimension corresponds to one matrix convolution calculation, so that the complete convolved late reverberant sound matrix can be obtained through only 1 convolution calculation.
In this example, the convolved late reverberant sound matrix can also be referred to as the late reverberant sound rendering result.
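A simplified Python sketch of this late-stage step is shown below; since only the w dimension is used, a single convolution pass over the source pairs suffices. Treating the operands as time signals and accumulating all pairs into one late reverberant signal are assumptions of this sketch.

import numpy as np

def late_reverb_render(src_w_signals, late_w_irs, g=4.0):
    # src_w_signals: N per-virtual-source signals in the w dimension.
    # late_w_irs: M simplified late reverberant responses (w channel only, before scaling by g).
    length = (max(len(s) for s in src_w_signals)
              + max(len(ir) for ir in late_w_irs) - 1)
    out = np.zeros(length)
    for s in src_w_signals:
        for ir in late_w_irs:
            res = np.convolve(np.asarray(s, dtype=float), g * np.asarray(ir, dtype=float))
            out[:len(res)] += res          # accumulate; no directional detail is kept
    return out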
S1106, obtaining a reflected sound rendering result according to the early reflected sound rendering result and the late reverberant sound rendering result.
Through the above S1104 to S1105, the separate rendering of the early reflected sound and the late reverberant sound can be completed.
For example, as illustrated in S1104, the convolution calculation unit may calculate and acquire four early reflected sound rendering results in the x, y, z, and w dimensions. The convolution calculation unit may also calculate and acquire the late reverberant sound rendering result.
In some embodiments, the convolution calculation unit may perform convolution synthesis on the four early reflected sound rendering results in the x, y, z, and w dimensions, thereby obtaining a complete early reflected sound rendering result. The convolution calculation unit may further perform convolution synthesis on the early reflected sound rendering result and the late reverberant sound rendering result, so as to obtain a complete reflected sound rendering result.
In other embodiments, the convolution calculation unit may directly perform convolution synthesis on the four early reflected sound rendering results in the x, y, z, and w dimensions and the late reverberant sound rendering result, so as to obtain a complete reflected sound rendering result.
It will be appreciated that, when the reflected sound rendering is performed by the scheme shown in fig. 11, the electronic device may perform rendering operations on the early reflected sound and the late reverberant sound respectively. Combined with the coordinate matrices obtained by the spherical harmonic transformation, the reflected sound rendering operation in a complex environment including a plurality of virtual sources and secondary virtual sources can be completed with a very small number of matrix calculations. In addition, since the late reverberant sound is simplified, the rendering overhead of the late reverberant sound can be remarkably saved.
After the reflected sound rendering result is obtained through the scheme shown in fig. 11, in combination with the scheme shown in fig. 5, the electronic device may synthesize the reflected sound rendering result and the direct sound rendering result, so as to obtain a complete spatial audio rendering result.
It should be understood that the division of units or modules (hereinafter referred to as units) in the above apparatus is merely a division of logic functions, and may be fully or partially integrated into one physical entity or may be physically separated. And the units in the device can be all realized in the form of software calls through the processing element; or can be realized in hardware; it is also possible that part of the units are implemented in the form of software, which is called by the processing element, and part of the units are implemented in the form of hardware.
For example, each unit may be a processing element that is set up separately, may be implemented as integrated in a certain chip of the apparatus, or may be stored in a memory in the form of a program, and the functions of the unit may be called and executed by a certain processing element of the apparatus. Furthermore, all or part of these units may be integrated together or may be implemented independently. The processing element described herein, which may also be referred to as a processor, may be an integrated circuit with signal processing capabilities. In implementation, each step of the above method or each unit above may be implemented by an integrated logic circuit of hardware in a processor element or in the form of software called by a processing element.
In one example, the units in the above apparatus may be one or more integrated circuits configured to implement the above method, for example: one or more ASICs, or one or more DSPs, or one or more FPGAs, or a combination of at least two of these integrated circuit forms.
For another example, when the units in the apparatus may be implemented in the form of a scheduler of processing elements, the processing elements may be general-purpose processors, such as CPUs or other processors that may invoke programs. For another example, the units may be integrated together and implemented in the form of a system-on-a-chip (SOC).
In one implementation, the above means for implementing each corresponding step in the above method may be implemented in the form of a processing element scheduler. For example, the apparatus may comprise a processing element and a storage element, the processing element invoking a program stored in the storage element to perform the method described in the above method embodiments. The memory element may be a memory element on the same chip as the processing element, i.e. an on-chip memory element.
In another implementation, the program for performing the above method may be on a memory element on a different chip than the processing element, i.e. an off-chip memory element. At this point, the processing element invokes or loads a program from the off-chip storage element onto the on-chip storage element to invoke and execute the method described in the method embodiments above.
For example, please refer to fig. 13, which is a schematic diagram illustrating a composition of an electronic device according to an embodiment of the present application. As shown in fig. 13, the electronic device 1300 may include: a processor 1301, and a memory 1302. The memory 1302 is used to store computer-executable instructions. For example, in some embodiments, the processor 1301, when executing instructions stored in the memory 1302, may cause the electronic device 1300 to perform the technical solutions provided in the above embodiments.
It should be noted that, all relevant contents of each step related to the above method embodiment may be cited to the functional description of the corresponding functional module, which is not described herein.
In yet another implementation, the unit implementing each step in the above method may be configured as one or more processing elements, where the processing elements may be disposed on the electronic device corresponding to the above, and the processing elements may be integrated circuits, for example: one or more ASICs, or one or more DSPs, or one or more FPGAs, or a combination of these types of integrated circuits. These integrated circuits may be integrated together to form a chip.
Illustratively, fig. 14 shows a schematic diagram of the composition of a chip system 1400. The chip system 1400 may include: a processor 1401 and a communication interface 1402 to support the relevant devices to implement the functions referred to in the above embodiments. In one possible design, the system on a chip also includes memory to hold the necessary program instructions and data for the terminal. The chip system can be composed of chips, and can also comprise chips and other discrete devices. It should be noted that, in some implementations of the present application, the communication interface 1402 may also be referred to as an interface circuit.
It should be noted that, all relevant contents of each step related to the above method embodiment may be cited to the functional description of the corresponding functional module, which is not described herein.
The embodiment of the present application also provides a computer program product, which includes the computer instructions run by the electronic device described above.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be embodied in the form of a software product, such as: and (5) program. The software product is stored in a program product, such as a computer readable storage medium, comprising instructions for causing a device (which may be a single-chip microcomputer, chip or the like) or processor (processor) to perform all or part of the steps of the methods described in the various embodiments of the application. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, etc.
For example, embodiments of the present application may also provide a computer readable storage medium having computer program instructions stored thereon. The computer program instructions, when executed by an electronic device, cause the electronic device to implement the audio signal processing method as described in the foregoing method embodiments.
The foregoing is merely illustrative of specific embodiments of the present application, and the scope of the present application is not limited thereto, but any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. An audio signal processing method, characterized by being applied to an electronic device, the electronic device having a function of simulating spatial audio in a virtual space, the virtual space including at least one virtual source therein, the method comprising:
setting at least two secondary virtual sources in the virtual space, wherein the at least two secondary virtual sources are used for identifying at least one virtual source included in the virtual space;
determining a reverberation time of the at least two secondary virtual sources;
determining an early reflected sound matrix according to the sound signals of the at least two secondary virtual sources before the reverberation time;
determining a late reverberant sound matrix according to the sound signals of the at least two secondary virtual sources after the reverberation time;
determining a first early reflected sound rendering result according to a first virtual source and the early reflected sound matrix, wherein the first virtual source is included in the at least one virtual source;
determining a first late reverberant sound rendering result according to the first virtual source and the late reverberant sound matrix;
and acquiring a first reflected sound rendering result corresponding to the first virtual source according to the first early reflected sound rendering result and the first late reverberant sound rendering result.
2. The method according to claim 1, wherein the method further comprises:
measuring a first head-related transfer function (HRTF) of the first virtual source;
and acquiring a first direct sound rendering result corresponding to the first virtual source according to the first HRTF.
3. The method according to claim 2, wherein the method further comprises:
and determining a spatial audio rendering result of the first virtual source according to the first reflected sound rendering result and the first direct sound rendering result.
4. The method according to any one of claims 1-3, wherein before the determining, by the at least two secondary virtual sources, of a first early reflected sound rendering result of the first virtual source and a first late reverberant sound rendering result of the first virtual source, the method further comprises:
performing a first spherical harmonic transformation on the first virtual source to obtain first spherical harmonic coordinates corresponding to the first virtual source;
and performing a second spherical harmonic transformation on the at least two secondary virtual sources to obtain a secondary virtual source spherical harmonic coordinate matrix corresponding to the at least two secondary virtual sources.
5. The method of claim 4, wherein the first spherical harmonic coordinates comprise coordinate data in the four dimensions x, y, z, and w corresponding to the first virtual source;
and the secondary virtual source spherical harmonic coordinate matrix comprises coordinate data in the four dimensions x, y, z, and w corresponding to the at least two secondary virtual sources.
6. The method of claim 5, wherein determining the late reverberant sound matrix according to the sound signals of the at least two secondary virtual sources after the reverberation time comprises:
determining the late reverberant sound matrix according to the set of w coordinates of the sound signals after the reverberation time, obtained after the second spherical harmonic transformation is performed on the at least two secondary virtual sources, and a preset adjustment parameter g.
7. The method of claim 6, wherein determining the first late reverberant sound rendering result according to the first virtual source and the late reverberant sound matrix comprises:
determining the first late reverberant sound rendering result according to the w coordinate in the first spherical harmonic coordinates and the late reverberant sound matrix.
8. The method according to any one of claims 1-3 or claims 5-6, wherein the at least one virtual source included in the virtual space further comprises a second virtual source, and the method further comprises:
determining, by the at least two secondary virtual sources, a second early reflected sound rendering result of the second virtual source and a second late reverberant sound rendering result of the second virtual source;
acquiring a second reflected sound rendering result corresponding to the second virtual source according to the second early reflected sound rendering result and the second late reverberant sound rendering result;
measuring a second head-related transfer function (HRTF) of the second virtual source;
and acquiring a second direct sound rendering result corresponding to the second virtual source according to the second HRTF.
9. The method of claim 8, wherein the method further comprises:
synthesizing a first direct sound rendering result of the first virtual source, and acquiring a spatial audio rendering result corresponding to the first virtual source according to the first direct sound rendering result and the first reflected sound rendering result;
and synthesizing a second direct sound rendering result of the second virtual source, and acquiring a spatial audio rendering result corresponding to the second virtual source according to the second direct sound rendering result and the second reflected sound rendering result.
10. An audio signal processing apparatus, comprising: a direct sound rendering module for performing a rendering operation of direct sound of at least one virtual source in a virtual space according to the method of any one of claims 2, 3, 8, and 9, and a reflected sound rendering module for performing a rendering operation of reflected sound of the at least one virtual source in the virtual space according to the method of any one of claims 1 to 9.
11. The apparatus of claim 10, wherein the reflected sound rendering module comprises:
a first spherical harmonic transformation unit, used for performing the first spherical harmonic transformation on the at least one virtual source;
a second spherical harmonic transformation unit, used for performing the second spherical harmonic transformation on at least one secondary virtual source;
and a reflected sound sub-conversion unit, used for determining the reverberation time and acquiring an early reflected sound matrix and a late reverberant sound matrix according to the reverberation time.
12. An electronic device, characterized in that the electronic device comprises an audio signal processing apparatus as claimed in claim 10 or 11.
13. An electronic device comprising a processor and a memory for storing instructions executable by the processor, the processor being configured to cause the electronic device to implement the method of any one of claims 1-9 when the instructions are executed.
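To make the claimed pipeline easier to follow, the sketch below shows one way the steps of claims 1-3 could be assembled in code. It is only an illustrative outline under stated assumptions (NumPy, impulse-response-style secondary-source signals, simple mixing gains standing in for how the secondary virtual sources identify a virtual source, and hypothetical function names); it is not the implementation disclosed by the application.

```python
import numpy as np

def split_by_reverberation_time(secondary_signals, reverberation_time, fs):
    """Split the secondary-virtual-source sound signals at the reverberation
    time: samples before it form the early reflected sound matrix, samples
    after it form the late reverberant sound matrix (claim 1)."""
    cut = int(round(reverberation_time * fs))
    early = secondary_signals.copy()
    late = secondary_signals.copy()
    early[:, cut:] = 0.0   # keep only the part before the reverberation time
    late[:, :cut] = 0.0    # keep only the part after the reverberation time
    return early, late

def render_reflected_sound(source_weights, early_matrix, late_matrix, dry):
    """Early and late rendering results for one virtual source and their sum,
    i.e. the reflected sound rendering result (claim 1). 'source_weights' are
    assumed mixing gains relating the virtual source to the secondary
    virtual sources."""
    early_result = np.convolve(dry, source_weights @ early_matrix)
    late_result = np.convolve(dry, source_weights @ late_matrix)
    return early_result + late_result

def render_direct_sound(dry, hrtf_left, hrtf_right):
    """Direct sound rendering result via the measured HRTF pair (claim 2)."""
    return np.stack([np.convolve(dry, hrtf_left),
                     np.convolve(dry, hrtf_right)])

def spatial_audio_result(direct, reflected):
    """Spatial audio rendering result (claim 3): direct plus reflected sound,
    padded to a common length; the mono reflected part feeds both ears here."""
    n = max(direct.shape[-1], reflected.shape[-1])
    direct = np.pad(direct, ((0, 0), (0, n - direct.shape[-1])))
    reflected = np.pad(reflected, (0, n - reflected.shape[-1]))
    return direct + reflected
```

In this sketch a single reverberation time splits every secondary-source signal into its early and late parts, mirroring the early reflected sound matrix and late reverberant sound matrix of claim 1; how the per-source weights are actually obtained is left to the embodiments.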
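Claims 4 and 5 describe spherical harmonic transformations producing x, y, z, w coordinate data, which corresponds to a first-order (B-format style) spherical-harmonic encoding. A minimal sketch of such an encoding follows; the normalisation, the channel ordering within the vector, and the example directions are assumptions rather than details taken from the application.

```python
import numpy as np

def first_order_sh(azimuth, elevation):
    """First-order spherical-harmonic coordinates of one direction, returned
    in the x, y, z, w order used in the claims (angles in radians)."""
    x = np.cos(azimuth) * np.cos(elevation)
    y = np.sin(azimuth) * np.cos(elevation)
    z = np.sin(elevation)
    w = 1.0 / np.sqrt(2.0)        # omnidirectional component
    return np.array([x, y, z, w])

# First spherical harmonic coordinates of the first virtual source (claim 4),
# for an example direction of 30 degrees azimuth, 10 degrees elevation.
source_sh = first_order_sh(np.deg2rad(30.0), np.deg2rad(10.0))

# Secondary virtual source spherical harmonic coordinate matrix (claim 5):
# one row of x, y, z, w data per secondary virtual source.
secondary_azimuths = np.deg2rad([0.0, 90.0, 180.0, 270.0])
secondary_sh = np.stack([first_order_sh(az, 0.0) for az in secondary_azimuths])
```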
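Claims 6 and 7 build the late reverberant sound matrix from the w coordinates of the sound signals after the reverberation time together with a preset adjustment parameter g, and then apply the w coordinate of the first virtual source to the result. The sketch below is one possible reading of those steps, reusing the array shapes from the previous sketches; the specific weighting scheme and function names are assumptions.

```python
import numpy as np

def late_reverberant_response(late_matrix, secondary_sh, g):
    """Collapse the late reverberant sound matrix into one diffuse response:
    each secondary-source signal after the reverberation time is weighted by
    that source's w coordinate and scaled by the preset parameter g (claim 6)."""
    w_column = secondary_sh[:, 3]                     # w coordinates only
    return g * (w_column[:, None] * late_matrix).sum(axis=0)

def first_late_reverberant_result(source_sh, diffuse_response, dry):
    """First late reverberant sound rendering result: the first virtual
    source's own w coordinate scales the shared diffuse response before it
    is applied to the source signal (claim 7)."""
    return np.convolve(dry, source_sh[3] * diffuse_response)
```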
CN202211048513.0A 2022-08-30 2022-08-30 Audio signal processing method and electronic equipment Active CN116055983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211048513.0A CN116055983B (en) 2022-08-30 2022-08-30 Audio signal processing method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211048513.0A CN116055983B (en) 2022-08-30 2022-08-30 Audio signal processing method and electronic equipment

Publications (2)

Publication Number Publication Date
CN116055983A (en) 2023-05-02
CN116055983B (en) 2023-11-07

Family

ID=86118843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211048513.0A Active CN116055983B (en) 2022-08-30 2022-08-30 Audio signal processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN116055983B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104010265A (en) * 2013-02-22 2014-08-27 杜比实验室特许公司 Audio space rendering device and method
CN105263075A (en) * 2015-10-12 2016-01-20 深圳东方酷音信息技术有限公司 Earphone equipped with directional sensor and 3D sound field restoration method thereof
WO2016130834A1 (en) * 2015-02-12 2016-08-18 Dolby Laboratories Licensing Corporation Reverberation generation for headphone virtualization
CN108616789A (en) * 2018-04-11 2018-10-02 北京理工大学 The individualized virtual voice reproducing method measured in real time based on ears
WO2020083088A1 (en) * 2018-10-26 2020-04-30 华为技术有限公司 Method and apparatus for rendering audio
CN112740324A (en) * 2018-09-18 2021-04-30 华为技术有限公司 Apparatus and method for adapting virtual 3D audio to a real room
CN114067810A (en) * 2020-07-31 2022-02-18 华为技术有限公司 Audio signal rendering method and device
CN114307157A (en) * 2021-11-30 2022-04-12 腾讯科技(深圳)有限公司 Sound processing method, device, equipment and storage medium in virtual scene
CN114386296A (en) * 2021-11-29 2022-04-22 哈尔滨工程大学 Numerical calculation method for three-dimensional sound field in reverberation pool

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9319819B2 (en) * 2013-07-25 2016-04-19 Etri Binaural rendering method and apparatus for decoding multi channel audio

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Binaural audio processing technology based on auditory perception characteristics; Li Junfeng; Xu Huaxing; Xia Risheng; Yan Yonghong; Applied Acoustics (05); full text *
Sound rendering technology and its application in virtual environments; Luo Fuyuan, Wang Xingren, Peng Xiaoyuan; *** Simulation Journal (Issue 05); full text *
Externalization method for headphone virtual sound ***; Hu Hongmei; Zhou Lin; Ma Hao; Yang Feiran; Wu Zhenyang; Journal of Southeast University (Natural Science Edition) (Issue 01); full text *
Current status and development of key three-dimensional audio technologies in virtual reality; Zhang Yang; Zhao Junzhe; Wang Jin; Shi Junjie; Wang Jing; Xie Xiang; Audio Engineering (06); full text *

Also Published As

Publication number Publication date
CN116055983A (en) 2023-05-02

Similar Documents

Publication Publication Date Title
JP2022506753A (en) Depth information acquisition method and electronic device
CN112686981B (en) Picture rendering method and device, electronic equipment and storage medium
CN110458902B (en) 3D illumination estimation method and electronic equipment
CN113038362A (en) Ultra-wideband positioning method and system
CN113496708B (en) Pickup method and device and electronic equipment
CN114727212B (en) Audio processing method and electronic equipment
CN111741511B (en) Quick matching method and head-mounted electronic equipment
CN112312366A (en) Method, electronic equipment and system for realizing functions through NFC (near field communication) tag
CN112085647A (en) Face correction method and electronic equipment
CN114610193A (en) Content sharing method, electronic device, and storage medium
CN114727220A (en) Equipment searching method and electronic equipment
CN114880251A (en) Access method and access device of storage unit and terminal equipment
CN113573120B (en) Audio processing method, electronic device, chip system and storage medium
CN113518189B (en) Shooting method, shooting system, electronic equipment and storage medium
CN112099741A (en) Display screen position identification method, electronic device and computer readable storage medium
CN116055983B (en) Audio signal processing method and electronic equipment
CN114120950B (en) Human voice shielding method and electronic equipment
CN111885768B (en) Method, electronic device and system for adjusting light source
CN115412678A (en) Exposure processing method and device and electronic equipment
CN115147492A (en) Image processing method and related equipment
CN113923769A (en) Positioning method and device
CN113436635A (en) Self-calibration method and device of distributed microphone array and electronic equipment
CN116599557A (en) Antenna switching method and device
CN112037157B (en) Data processing method and device, computer readable medium and electronic equipment
CN114584913B (en) FOA signal and binaural signal acquisition method, sound field acquisition device and processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant