CN116709159A - Audio processing method and terminal equipment - Google Patents

Audio processing method and terminal equipment

Info

Publication number
CN116709159A
CN116709159A
Authority
CN
China
Prior art keywords
audio signal
head motion
motion data
target
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211214411.1A
Other languages
Chinese (zh)
Other versions
CN116709159B (en)
Inventor
湛彻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202211214411.1A priority Critical patent/CN116709159B/en
Publication of CN116709159A publication Critical patent/CN116709159A/en
Application granted granted Critical
Publication of CN116709159B publication Critical patent/CN116709159B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S7/00Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30Control circuits for electronic adaptation of the sound field
    • H04S7/302Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303Tracking of listener position or orientation
    • H04S7/304For headphones
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K11/1785Methods, e.g. algorithms; Devices
    • G10K11/17853Methods, e.g. algorithms; Devices of the filter
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/18Methods or devices for transmitting, conducting or directing sound
    • G10K11/20Reflecting arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00Stereophonic arrangements
    • H04R5/033Headphones for stereophonic communication
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S7/00Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30Control circuits for electronic adaptation of the sound field
    • H04S7/307Frequency adjustment, e.g. tone control
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The embodiment of the application provides an audio processing method and a terminal device, which are applied to the technical field of terminals. The method comprises the steps of obtaining the residual electric quantity of the earphone device and the available main frequency of the terminal device; if the available main frequency is smaller than or equal to a first preset frequency and the residual electric quantity is larger than a preset electric quantity, processing the first audio signal by adopting any one of an FIR filter, an IIR filter, or the direct sound part and early reflected sound part of a BRIR filter; and if the available main frequency is greater than the first preset frequency and the residual electric quantity is less than or equal to the preset electric quantity, processing the first audio signal by adopting a BRIR filter corresponding to second head motion data or to predicted first predicted head motion data. Therefore, when the available main frequency of the terminal device is insufficient, the calculated amount of spatial audio rendering can be reduced, and when the residual electric quantity of the earphone device is insufficient, the loss of head motion data can be compensated, so that the audio listening experience of the user is improved.

Description

Audio processing method and terminal equipment
Technical Field
The present application relates to the field of terminal technologies, and in particular, to an audio processing method and a terminal device.
Background
With the rapid development of terminal devices, users have placed higher and higher demands on the audio experience. For example, when the terminal device and the earphone device are in communication connection and the user wears the earphone device to listen to audio, in order to improve the sense of space and the sense of reality of the sound heard by the user, spatial audio rendering may be performed on the first audio signal, and the target audio signal obtained after the spatial audio rendering may be played through the earphone device.
Currently, a terminal device may use a binaural room impulse response (binaural room impulse response, BRIR) filter to perform spatial audio rendering on a first audio signal, so that a target audio signal obtained after rendering may generate a better spatial effect when played through an earphone device.
However, in the process that the user wears the earphone device to listen to the audio, the available main frequency of the terminal device or the residual electric quantity of the earphone device may be insufficient, so that the audio listening experience of the user is affected.
Disclosure of Invention
The embodiment of the application provides an audio processing method and terminal equipment, wherein when the available main frequency of the terminal equipment is insufficient, a target filter is adopted for rendering so as to reduce the calculated amount of spatial audio rendering, and when the residual electric quantity of earphone equipment is insufficient, head motion compensation data are adopted to compensate the loss of head motion data, so that the audio listening experience of a user is improved.
In a first aspect, an embodiment of the present application provides an audio processing method, which is applied to a terminal device, where the terminal device is connected with an earphone device in a communication manner, and the terminal device includes a camera; the method comprises the following steps: the terminal equipment acquires the residual electric quantity of the earphone equipment and the available main frequency of the terminal equipment; if the available main frequency is smaller than or equal to a first preset frequency and the residual electric quantity is larger than the preset electric quantity, the terminal equipment adopts a first audio processing mode to process the first audio signal to obtain a target audio signal; if the available main frequency is larger than the first preset frequency and the residual electric quantity is smaller than or equal to the preset electric quantity, the terminal equipment adopts a second audio processing mode to process the first audio signal to obtain a target audio signal; the terminal device transmits the target audio signal to the headphone device. The first audio processing mode is used for indicating a target filter corresponding to first head motion data issued by the earphone device to render the first audio signal, and the target filter comprises any one of the following components: a finite impulse response (finite impulse response, FIR) filter, an infinite impulse response (infinite impulse response, IIR) filter, or a direct sound portion and an early reflected sound portion in a BRIR filter; the second audio processing mode is used for indicating that when the first head motion data issued by the earphone device does not exist, the first audio signal is rendered by a BRIR filter corresponding to the head motion compensation data, and the head motion compensation data comprises: second head motion data generated based on a user image captured by the camera, or first predicted head motion data predicted based on historical head motion data.
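As an illustrative aid (not part of the patent text), the mode decision described above can be pictured as a simple threshold check. All names and threshold values in the following sketch are assumptions for demonstration only.

```python
# Hypothetical sketch of the mode decision described above; threshold names and
# values are assumptions, not values taken from the patent.
FIRST_PRESET_FREQ_GHZ = 1.2   # assumed "first preset frequency"
PRESET_BATTERY_PCT = 20       # assumed "preset electric quantity"

def select_audio_processing_mode(available_freq_ghz: float, battery_pct: float) -> str:
    low_freq = available_freq_ghz <= FIRST_PRESET_FREQ_GHZ
    low_battery = battery_pct <= PRESET_BATTERY_PCT
    if low_freq and not low_battery:
        return "first mode"    # low-order target filter + headset-issued head motion data
    if not low_freq and low_battery:
        return "second mode"   # BRIR filter + head motion compensation data
    if low_freq and low_battery:
        return "third mode"    # low-order target filter + preset head motion data
    return "fourth mode"       # BRIR filter + predicted head motion data
```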
In this way, when the available main frequency of the terminal equipment is insufficient, the terminal equipment can adopt the target filter to render the first audio signal, so that the calculated amount of the spatial audio rendering is reduced, the calculation speed of the spatial audio rendering is increased, the terminal equipment can render in real time to obtain the target audio signal, the earphone equipment can play the target audio signal in real time, and the audio listening experience of a user is improved; when the residual electric quantity of the earphone device is insufficient, the terminal device can acquire the head motion compensation data to compensate for the deficiency of the first head motion data issued by the earphone device, so that the hysteresis of a target audio signal rendered by the terminal device is reduced, the hysteresis of the earphone device when the target audio signal is played is reduced, and the audio listening experience of a user is improved.
In one possible implementation manner, the terminal device processes the first audio signal in a first audio processing mode to obtain a target audio signal, including: the terminal device acquires first head motion data issued by the earphone device; the terminal device acquires a target filter corresponding to the first head motion data; the terminal device renders the first audio signal by adopting the target filter corresponding to the first head motion data to obtain a second audio signal; and the terminal device performs down-mixing processing on the second audio signal to obtain the target audio signal. In this way, because the orders of the FIR filter and the IIR filter are lower than that of the BRIR filter, and the direct sound part and the early reflected sound part in the BRIR filter are only part of the complete BRIR filter, the complexity of spatial audio rendering can be simplified and the calculated amount of spatial audio rendering can be reduced when the target filter is adopted to render the first audio signal; and in the first audio processing mode, the head motion data capturing module and the head motion data prediction and compensation module can be turned off to further reduce the calculated amount of spatial audio rendering.
In one possible implementation manner, the terminal device acquires a target filter corresponding to the first head motion data, including: the terminal device acquires a BRIR filter corresponding to the first head motion data from a BRIR database, and intercepts the direct sound part and the early reflected sound part in the BRIR filter corresponding to the first head motion data; or the terminal device acquires an FIR filter corresponding to the first head motion data from an FIR database; or the terminal device acquires an IIR filter corresponding to the first head motion data from an IIR database. In this way, ways of obtaining the direct sound part and early reflected sound part of the BRIR filter, the FIR filter, and the IIR filter are provided.
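Because the direct sound and the early reflections occupy the leading taps of a BRIR impulse response, "intercepting" them can be pictured as truncating the filter. A minimal sketch, assuming a sample rate and an early-reflection cutoff that the patent does not specify:

```python
import numpy as np

def truncate_brir(brir: np.ndarray, sample_rate: int = 48000,
                  early_cutoff_ms: float = 80.0) -> np.ndarray:
    """Keep only the direct-sound and early-reflected-sound portion of a BRIR.

    brir: array of shape (2, n_taps), one impulse response per ear.
    The 80 ms cutoff is an assumed illustrative value, not taken from the patent.
    """
    cutoff_taps = int(sample_rate * early_cutoff_ms / 1000.0)
    return brir[:, :cutoff_taps]
```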
In a possible implementation manner, the terminal device renders the first audio signal by using a target filter corresponding to the first head motion data to obtain a second audio signal, including: if the first audio signal comprises at least three channels, the terminal equipment performs down-mixing processing on the first audio signal; the terminal equipment adopts a target filter corresponding to the first head motion data to render the first audio signal after the down-mixing processing to obtain a second audio signal; or if the first audio signal is a mono audio signal or a binaural audio signal, the terminal device directly adopts a target filter corresponding to the first head motion data to render the first audio signal, so as to obtain a second audio signal. In this way, in the first audio processing mode, the number of channels of the first audio signal convolved with the target filter is made smaller, thereby further reducing the calculation amount of spatial audio rendering.
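A hedged sketch of this rendering path: down-mix a multichannel first audio signal to two channels, then convolve each channel with the target filter. The equal-weight down-mix and the function names are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_first_mode(first_audio: np.ndarray, target_filter: np.ndarray) -> np.ndarray:
    """first_audio: (n_channels, n_samples); target_filter: (2, n_taps); returns (2, n_samples)."""
    if first_audio.shape[0] >= 3:
        # Assumed equal-weight down-mix to stereo; real layouts use dedicated coefficients.
        stereo = np.stack([first_audio[0::2].mean(axis=0), first_audio[1::2].mean(axis=0)])
    elif first_audio.shape[0] == 2:
        stereo = first_audio
    else:
        stereo = np.vstack([first_audio, first_audio])   # mono duplicated to both ears
    n = stereo.shape[1]
    # Convolve each ear with its filter taps to obtain the second audio signal.
    return np.stack([fftconvolve(stereo[ch], target_filter[ch])[:n] for ch in range(2)])
```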
In one possible implementation manner, the terminal device processes the first audio signal in the second audio processing mode to obtain a target audio signal, including: the terminal equipment determines whether first head motion data issued by the earphone equipment exist or not; if the first head motion data does not exist, the terminal equipment acquires the head motion compensation data; the terminal equipment acquires a BRIR filter corresponding to the head motion compensation data from a BRIR database; the terminal equipment adopts a BRIR filter corresponding to the head motion compensation data to render the first audio signal to obtain a third audio signal; and the terminal equipment performs down-mixing processing on the third audio signal to obtain a target audio signal. Wherein, in the first audio processing mode, the earphone device transmits first head movement data at a first transmission frequency; in the second audio processing mode, the headset device transmits the first head movement data at a second transmission frequency, and the second transmission frequency is less than the first transmission frequency. In this way, when the available main frequency of the terminal device is sufficient and the remaining power of the earphone device is insufficient, the earphone device actively reduces the capturing frequency of the first head motion data, so that the first head motion data on which the terminal device depends when performing spatial audio rendering may be missing. Therefore, the loss of the first head motion data issued by the earphone device is compensated by acquiring the head motion compensation data, the hysteresis of the target audio signal rendered by the terminal device is reduced, the hysteresis of the earphone device when playing the target audio signal is reduced, and the audio listening experience of a user is improved.
In one possible implementation, if the first head motion data does not exist, the terminal device acquires head motion compensation data, including: if the first head motion data does not exist, the terminal equipment determines whether second head motion data exists or not; if the second head motion data exists, the terminal equipment determines the second head motion data as head motion compensation data; if the second head movement data does not exist, the terminal equipment predicts and obtains first prediction head movement data based on the historical head movement data acquired in the previous N times. Wherein N is a positive integer; the historical head movement data comprises first historical head movement data issued before the earphone device and/or second historical head movement data generated before the terminal device based on user images acquired by the camera. In this way, the second head movement data or the predicted first predicted head movement data may be used as head movement compensation data to compensate for the lack of the first head movement data issued by the headphone apparatus.
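The fallback chain above can be pictured as follows; this is a hypothetical sketch, with the prediction function standing in for any of the schemes described in the next paragraphs.

```python
def get_head_motion_compensation(first_head_motion, second_head_motion,
                                 predict_from_history, history):
    """Hypothetical fallback chain for the second audio processing mode.

    first_head_motion:  data issued by the earphone device this frame, or None if missing.
    second_head_motion: data generated from the camera image, or None if unavailable.
    predict_from_history: any prediction scheme over the previous N acquisitions.
    """
    if first_head_motion is not None:
        return first_head_motion          # no compensation needed
    if second_head_motion is not None:
        return second_head_motion         # camera-based compensation
    return predict_from_history(history)  # first predicted head motion data
```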
In one possible implementation manner, the terminal device predicts, based on the previous N times of acquired historical head motion data, first predicted head motion data, including: the terminal device determines a first time difference between the current time and the time when the historical head motion data was last acquired, wherein the last acquired historical head motion data includes a historical rotation angle and a historical rotation angular velocity; the terminal device sums the product of the historical rotation angular velocity and the first time difference with the historical rotation angle to obtain the first predicted head motion data. Thus, the first predicted head motion data can be predicted from only the most recently acquired historical head motion data, and the prediction mode is simple.
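In other words, the angle is extrapolated with the last known angular velocity over the elapsed time; a sketch under the assumption that angles and angular velocities are scalar values per axis:

```python
def predict_from_last_sample(last_angle, last_angular_velocity, last_time_s, now_s):
    """First predicted head motion data = historical angle + historical angular velocity * dt."""
    first_time_difference = now_s - last_time_s
    return last_angle + last_angular_velocity * first_time_difference
```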
In one possible implementation manner, the terminal device predicts, based on the previous N times of acquired historical head motion data, first predicted head motion data, including: the terminal device determines the difference between twice the first historical head motion data and the second historical head motion data as the first predicted head motion data; the first historical head motion data is the historical head motion data acquired the previous time, and the second historical head motion data is the historical head motion data acquired the time before that. Thus, the first predicted head motion data can be predicted from the historical head motion data acquired in the previous two times, and the prediction mode is simple.
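Under a uniform acquisition interval this is plain linear extrapolation from the last two samples; an illustrative sketch:

```python
def predict_from_last_two_samples(prev_data, prev_prev_data):
    """First predicted head motion data = 2 * previous acquisition - the acquisition before it."""
    return 2.0 * prev_data - prev_prev_data
```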
In one possible implementation manner, the terminal device predicts, based on the previous N times of acquired historical head motion data, first predicted head motion data, including: the terminal equipment performs weighted summation on the historical head motion data acquired for the previous N times to obtain first prediction head motion data; wherein N is a positive integer greater than 1, and the time difference between the time at which the historical head movement data is acquired each time and the current time is inversely proportional to the weight corresponding to the historical head movement data. In this way, since the correlation between the historical head motion data and the first predicted head motion data is smaller as the time difference between the time at which the historical head motion data is acquired and the current time is larger each time, the accuracy of the predicted first predicted head motion data can be improved in such a manner that the time difference between the time at which the historical head motion data is acquired and the current time is inversely proportional to the weight corresponding to the historical head motion data.
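A sketch of the weighted summation; the exact weighting function is an assumption, since the patent only requires the weight to be inversely proportional to the sample's age:

```python
def predict_weighted(history, now_s):
    """history: list of (acquisition_time_s, head_motion_value) for the previous N acquisitions."""
    # Inverse-age weights (assumed form), normalized so they sum to 1.
    weights = [1.0 / max(now_s - t, 1e-6) for t, _ in history]
    total = sum(weights)
    return sum(w * value for w, (_, value) in zip(weights, history)) / total
```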
In one possible implementation manner, the terminal device predicts, based on the previous N times of acquired historical head motion data, first predicted head motion data, including: the terminal equipment adopts an artificial intelligence (artificial intelligence, AI) model to process the historical head motion data acquired for the previous N times, and predicts to obtain first predicted head motion data; the AI model is trained based on a plurality of sample head motion data. Thus, a manner of implementing the first predicted head motion data based on the AI model is provided, which may improve the accuracy of the predicted first predicted head motion data.
In one possible implementation manner, the terminal device renders the first audio signal by using a BRIR filter corresponding to the head motion compensation data to obtain a third audio signal, including: if the first audio signal is a mono audio signal or a binaural audio signal, the terminal device performs upmixing processing on the first audio signal; the terminal equipment adopts a BRIR filter corresponding to the head motion compensation data to render the first audio signal after the upmixing processing to obtain a third audio signal; or if the first audio signal comprises at least three channels, the terminal equipment directly adopts a BRIR filter corresponding to the head motion compensation data to render the first audio signal, so as to obtain a third audio signal. In this way, in the second audio processing mode, the number of channels of the first audio signal convolved with the BRIR filter is made to be larger, so that the spatial audio rendering effect is further improved.
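A hedged sketch of this path: up-mix a mono or stereo first audio signal to the virtual-speaker layout of the BRIR set, convolve each virtual source with its binaural filter pair, and sum per ear. The trivial round-robin up-mix is an assumption; real up-mixers derive the additional channels properly.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_second_mode(first_audio: np.ndarray, brir: np.ndarray) -> np.ndarray:
    """first_audio: (n_channels, n_samples); brir: (n_sources, 2, n_taps); returns (2, n_samples)."""
    n_sources, n_samples = brir.shape[0], first_audio.shape[1]
    if first_audio.shape[0] <= 2:
        # Assumed trivial up-mix: copy the available channels round-robin onto the layout.
        upmixed = np.stack([first_audio[i % first_audio.shape[0]] for i in range(n_sources)])
    else:
        upmixed = first_audio
    out = np.zeros((2, n_samples))
    for src in range(min(n_sources, upmixed.shape[0])):
        for ear in range(2):
            out[ear] += fftconvolve(upmixed[src], brir[src, ear])[:n_samples]
    return out   # binaural result corresponding to the rendered (third) audio signal
```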
In one possible implementation manner, after the terminal device obtains the remaining power of the earphone device and the available main frequency of the terminal device, the method further includes: if the available main frequency is smaller than or equal to the first preset frequency and the residual electric quantity is smaller than or equal to the preset electric quantity, the terminal device processes the first audio signal in a third audio processing mode to obtain a target audio signal; the third audio processing mode is used for indicating that a target filter corresponding to preset head motion data is adopted to render the first audio signal; the target filter includes any one of the following: an FIR filter, an IIR filter, or the direct sound part and early reflected sound part of a BRIR filter. The preset head motion data is not issued by the earphone device, is not captured by the head motion data capturing module, and is not predicted by the head motion data prediction and compensation module. In this way, the first audio signal is rendered based on any one of the FIR filter, the IIR filter, or the direct sound part and early reflected sound part of the BRIR filter corresponding to the preset head motion data, so that the complexity of spatial audio rendering is simplified, the calculated amount of spatial audio rendering is reduced, and a certain spatial audio effect can still be maintained; in addition, in the third audio processing mode, the head motion data capturing module and the head motion data prediction and compensation module can be turned off to further reduce the calculated amount of spatial audio rendering; in addition, the earphone device is not required to capture and issue the first head motion data, so that the power consumption of the earphone device is reduced.
In one possible implementation manner, the terminal device processes the first audio signal in a third audio processing mode to obtain a target audio signal, including: the terminal equipment acquires a target filter corresponding to preset head motion data; the terminal equipment adopts a target filter corresponding to preset head motion data to render the first audio signal to obtain a fourth audio signal; and the terminal equipment performs down-mixing processing on the fourth audio signal to obtain a target audio signal.
In one possible implementation manner, the terminal device renders the first audio signal by using a target filter corresponding to preset head motion data to obtain a fourth audio signal, including: if the first audio signal comprises at least three channels, the terminal equipment performs down-mixing processing on the first audio signal; the terminal equipment adopts a target filter corresponding to preset head motion data to render the first audio signal after the downmixing processing to obtain a fourth audio signal; or if the first audio signal is a mono audio signal or a binaural audio signal, the terminal device directly adopts a target filter corresponding to the preset head motion data to render the first audio signal, so as to obtain a fourth audio signal. In this way, in the third audio processing mode, the number of channels of the first audio signal convolved with the target filter is made smaller, thereby further reducing the calculation amount of spatial audio rendering.
In one possible implementation, if the available dominant frequency is less than or equal to a first preset frequency and greater than a second preset frequency, the target filter is a direct sound portion and an early reflected sound portion in the BRIR filter, the first preset frequency being greater than the second preset frequency; if the available main frequency is smaller than or equal to the second preset frequency and larger than the third preset frequency, the target filter is an FIR filter, and the second preset frequency is larger than the third preset frequency; if the available dominant frequency is less than or equal to the third preset frequency, the target filter is an IIR filter. In this way, the available dominant frequencies of the terminal device may be further subdivided to determine a more suitable target filter.
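This further subdivision can be pictured as a simple ordered threshold check; the numeric thresholds below are placeholders, since the patent only requires the first preset frequency to be greater than the second, and the second greater than the third.

```python
def choose_target_filter_type(available_freq_ghz: float) -> str:
    FIRST, SECOND, THIRD = 1.2, 1.0, 0.8   # GHz, assumed placeholder values
    if SECOND < available_freq_ghz <= FIRST:
        return "BRIR direct sound + early reflected sound"
    if THIRD < available_freq_ghz <= SECOND:
        return "FIR"
    if available_freq_ghz <= THIRD:
        return "IIR"
    raise ValueError("above the first preset frequency, the target-filter path is not used")
```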
In one possible implementation manner, after the terminal device obtains the remaining power of the earphone device and the available main frequency of the terminal device, the method further includes: if the available main frequency is larger than the first preset frequency and the residual electric quantity is larger than the preset electric quantity, the terminal device processes the first audio signal in a fourth audio processing mode to obtain a target audio signal. The fourth audio processing mode is used for indicating that a BRIR filter corresponding to second predicted head motion data is adopted to render the first audio signal; the second predicted head motion data is predicted based on the target head motion data, or the second predicted head motion data is predicted based on the target head motion data and the historical head motion data; the target head motion data includes first head motion data issued by the earphone device and/or second head motion data generated from user images acquired by the camera; the historical head motion data includes first historical head motion data previously issued by the earphone device and/or second historical head motion data previously generated by the terminal device based on user images acquired by the camera. In this way, when the available main frequency of the terminal device and the residual electric quantity of the earphone device are both sufficient, the terminal device can adopt the fourth audio processing mode to perform spatial audio rendering so as to improve the rendering effect of the spatial audio, thereby improving the playing effect of the finally obtained target audio signal when played through the earphone device, and improving the audio listening experience of the user.
In one possible implementation manner, the terminal device processes the first audio signal in a fourth audio processing mode to obtain a target audio signal, including: the terminal equipment predicts and obtains second predicted head motion data according to the target head motion data; or the terminal equipment predicts to obtain second predicted head movement data according to the target head movement data and the historical head movement data; the terminal equipment acquires a BRIR filter corresponding to the second predicted head motion data from a BRIR database; the terminal equipment adopts a BRIR filter corresponding to the second prediction head motion data to render the first audio signal to obtain a fifth audio signal; and the terminal equipment performs down-mixing processing on the fifth audio signal to obtain a target audio signal. In this way, a corresponding BRIR filter may be selected based on the predicted second predicted head motion data to spatially audio render the first audio signal to reduce the lag of head tracking due to spatial audio link delay.
In one possible implementation, the target head motion data includes a target rotation angle and a target rotation angular velocity; the terminal device predicts and obtains second predicted head motion data according to the target head motion data, and the second predicted head motion data comprises: the terminal equipment determines a second time difference between the current time and the time played by the earphone equipment; and the terminal equipment sums the product of the target rotation angular velocity and the second time difference with the target rotation angle to obtain second predicted head movement data. Therefore, the second predicted head motion data can be predicted by only the target head motion data acquired at the current moment, and the prediction mode is simpler.
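Here the extrapolation runs over the spatial audio link delay rather than over the time since the last sample; a sketch with assumed scalar angles per axis:

```python
def predict_for_playback(target_angle, target_angular_velocity, now_s, playback_time_s):
    """Second predicted head motion data = target angle + target angular velocity * link delay."""
    second_time_difference = playback_time_s - now_s   # time until the earphone device plays
    return target_angle + target_angular_velocity * second_time_difference
```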
In one possible implementation, the terminal device predicts the second predicted head motion data according to the target head motion data and the historical head motion data, including: the terminal device determines the difference between twice the target head motion data and the previously acquired historical head motion data as the second predicted head motion data. Thus, the second predicted head motion data can be predicted from the target head motion data acquired at the current moment and the historical head motion data acquired the previous time, and the prediction mode is simple.
In one possible implementation, the terminal device predicts the second predicted head motion data according to the target head motion data and the historical head motion data, including: and the terminal equipment processes the target head movement data and the historical head movement data acquired in the previous M times by adopting a weighted moving average method to obtain second predicted head movement data. Wherein M is a positive integer; and, the weight corresponding to the target head movement data is greater than the weight corresponding to the history head movement data obtained each time, and the time difference between the moment of obtaining the history head movement data each time and the current moment is inversely proportional to the weight corresponding to the history head movement data. In this way, the weight corresponding to the target head motion data acquired at the current moment is set larger, so that the accuracy of the second predicted head motion data obtained by prediction can be improved.
In one possible implementation, the terminal device predicts the second predicted head motion data according to the target head motion data and the historical head motion data, including: the terminal equipment adopts an AI model to process the target head motion data and the historical head motion data acquired in the previous M times to obtain second predicted head motion data; m is a positive integer, and the AI model is trained based on a plurality of sample head motion data. Thus, a manner of implementing the second predicted head motion data based on the AI model is provided, which may improve the accuracy of the predicted second predicted head motion data.
In a possible implementation manner, the terminal device renders the first audio signal by adopting a BRIR filter corresponding to the second predicted head motion data to obtain a fifth audio signal, including: if the first audio signal is a mono audio signal or a binaural audio signal, the terminal device performs upmixing processing on the first audio signal; the terminal equipment adopts a BRIR filter corresponding to the second prediction head motion data to render the first audio signal after upmixing processing to obtain a fifth audio signal; or if the first audio signal comprises at least three channels, the terminal equipment directly adopts a BRIR filter corresponding to the second prediction head motion data to render the first audio signal, so as to obtain a fifth audio signal. In this way, in the fourth audio processing mode, the number of channels of the first audio signal convolved with the BRIR filter is made to be larger, so that the spatial audio rendering effect is further improved.
In a second aspect, an embodiment of the present application provides a terminal device, including a memory and a processor, where the memory is configured to store a computer program, and the processor is configured to invoke the computer program to execute the above-mentioned audio processing method.
In a third aspect, an embodiment of the present application proposes a computer readable storage medium, in which a computer program or instructions are stored, which when executed, implement the above-mentioned audio processing method.
In a fourth aspect, an embodiment of the present application proposes a computer program product comprising a computer program, which when executed, causes a computer to perform the above-mentioned audio processing method.
The effects of each possible implementation manner of the second aspect to the fourth aspect are similar to those of the first aspect and the possible designs of the first aspect, and are not described herein.
Drawings
Fig. 1 is a schematic view of an application scenario of an audio processing method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a hardware system structure of a terminal device according to an embodiment of the present application;
fig. 3 is a schematic diagram of a hardware system of an earphone device according to an embodiment of the present application;
fig. 4 is a schematic software system structure of a terminal device according to an embodiment of the present application;
Fig. 5 is a schematic flow chart of an audio processing method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a decision of a target audio processing mode according to an embodiment of the present application;
fig. 7 is an audio processing flow chart of a first audio signal for mono or bi-channel when the available main frequency of the terminal device and the remaining power of the earphone device are both sufficient in the embodiment of the present application;
fig. 8 is an audio processing flowchart of a first audio signal for at least three channels when the available main frequency of the terminal device and the remaining power of the earphone device are both sufficient in the embodiment of the present application;
FIG. 9 is a schematic diagram of a spatial audio link delay in an embodiment of the present application;
fig. 10 is a flowchart of predicting head movement data when the available main frequency of the terminal device and the remaining power of the earphone device are both sufficient in the embodiment of the present application;
FIG. 11 is a schematic diagram of performing a downmix process on a rendered audio signal according to an embodiment of the present application;
fig. 12 is an audio processing flow chart of a first audio signal for mono or bi-channel when the available main frequency of the terminal device is sufficient and the remaining power of the earphone device is insufficient in the embodiment of the present application;
fig. 13 is an audio processing flow chart of a first audio signal for at least three channels when the available main frequency of the terminal device is sufficient and the remaining power of the earphone device is insufficient in the embodiment of the present application;
Fig. 14 is a flowchart of compensating head movement data when the available main frequency of the terminal device is sufficient and the remaining power of the earphone device is insufficient in the embodiment of the present application;
fig. 15 is an audio processing flow chart of a first audio signal for mono or bi channel when the available main frequency of the terminal device is insufficient and the remaining power of the earphone device is sufficient in the embodiment of the present application;
fig. 16 is an audio processing flow chart of a first audio signal for at least three channels when the available main frequency of the terminal device is insufficient and the remaining power of the earphone device is sufficient in the embodiment of the present application;
FIG. 17 is a graph comparing FIR filters and BRIR filters calculated according to an embodiment of the present application;
FIG. 18 is a graph comparing IIR filter and BRIR filter calculated according to the embodiment of the present application;
fig. 19 is a schematic diagram of further determining a specific type of a target filter according to an available dominant frequency of a terminal device according to an embodiment of the present application;
fig. 20 is an audio processing flow chart of a first audio signal for mono or bi-channel when the available main frequency of the terminal device and the remaining power of the earphone device are both insufficient in the embodiment of the present application;
fig. 21 is an audio processing flowchart of a first audio signal for at least three channels when the available main frequency of the terminal device and the remaining power of the earphone device are both insufficient in the embodiment of the present application;
Fig. 22 is a schematic structural diagram of an audio processing device according to an embodiment of the present application;
fig. 23 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
In order to clearly describe the technical solution of the embodiments of the present application, in the embodiments of the present application, the words "first", "second", etc. are used to distinguish the same item or similar items having substantially the same function and effect. For example, the first chip and the second chip are merely for distinguishing different chips, and the order of the different chips is not limited. It will be appreciated by those of skill in the art that the words "first," "second," and the like do not limit the amount and order of execution, and that the words "first," "second," and the like do not necessarily differ.
It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a alone, a and B together, and B alone, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
Fig. 1 is a schematic view of an application scenario of an audio processing method according to an embodiment of the present application. In the embodiment corresponding to fig. 1, a terminal device is taken as an example for illustrating a mobile phone, and the example does not limit the embodiment of the present application.
As shown in fig. 1, the scene may include the terminal device 100 and the earphone device 200 worn by the user. The earphone device 200 may be a headphone as shown in fig. 1, or the earphone device 200 may also be a true wireless stereo (true wireless stereo, TWS) earphone, a wired earphone, or the like; the specific type of the earphone device 200 is not limited in the embodiment of the present application.
For example, in the case where the terminal device 100 establishes a communication connection with the headphone device 200, the terminal device 100 may perform spatial audio rendering on the first audio signal to obtain a target audio signal, and transmit the target audio signal to the headphone device 200, and play the target audio signal through the headphone device 200.
For example, the terminal device 100 may perform spatial audio rendering on the first audio signal through a BRIR filter, so that the rendered target audio signal may generate a better spatial effect when played through the headphone device 200.
It will be appreciated that BRIR filters consider the effect of ambient reflected sound on a sound source, and BRIR filters can be seen as the impulse response of a system of sound source, room environment, ears (including head, torso, pinna), consisting of direct sound, early reflected sound, late reverberation.
In the process that the user wears the earphone device to listen to the audio, the method is limited by the influence of the hardware configuration of a processor in the terminal device, the current use scene of the terminal device, the current residual capacity of the terminal device and other factors, and the available main frequency of the terminal device may be insufficient.
Spatial audio rendering is a computationally intensive and real-time demanding technique with high demands on the computational effort of the terminal equipment. In the process of performing spatial audio rendering on the first audio signal by the terminal equipment, when the available main frequency of the terminal equipment is insufficient, the computing power of the terminal equipment may not meet the requirement, so that the real-time computing requirement of the spatial audio rendering cannot be met, the terminal equipment cannot render in real time to obtain the target audio signal, and therefore the earphone equipment cannot continuously play the target audio signal. In this case, the target audio signal played by the earphone device may have noise, which affects the audio listening experience of the user.
Alternatively, in the process that the user wears the earphone device to listen to the audio, a situation that the remaining power of the earphone device is insufficient may also occur. When the residual electric quantity of the earphone device is insufficient, the earphone device actively reduces the capturing frequency of the first head motion data, the issuing frequency of the earphone device when issuing the first head motion data to the terminal device is reduced, so that the first head motion data which is relied on by the terminal device when performing spatial audio rendering may be lost, a target audio signal rendered by the terminal device is delayed, and correspondingly, the earphone device has a delay sense when playing the target audio signal, so that audio listening experience of a user is also affected.
Based on the above, the embodiment of the application provides an audio processing method, wherein a terminal device can acquire the residual capacity of an earphone device and the available main frequency of the terminal device; if the available main frequency is smaller than or equal to a first preset frequency and the residual electric quantity is larger than the preset electric quantity, a first audio signal is processed by adopting a first audio processing mode, and a target audio signal is obtained; the first audio processing mode is used for indicating a target filter corresponding to first head motion data issued by the earphone device to render the first audio signal, and the target filter comprises any one of the following components: an FIR filter, an IIR filter, or a direct sound portion and an early reflected sound portion in a BRIR filter; if the available main frequency is larger than the first preset frequency and the residual electric quantity is smaller than or equal to the preset electric quantity, a second audio processing mode is adopted to process the first audio signal, and a target audio signal is obtained; the second audio processing mode is used for indicating that when the first head motion data issued by the earphone device does not exist, the first audio signal is rendered by a BRIR filter corresponding to the head motion compensation data, and the head motion compensation data comprises: second head motion data generated based on a user image acquired by a camera, or first predicted head motion data predicted based on historical head motion data; the terminal device transmits the target audio signal to the headphone device. Therefore, when the available main frequency of the terminal equipment is insufficient, the terminal equipment can adopt the target filter to render the first audio signal, so that the calculated amount of the spatial audio rendering is reduced, the calculation speed of the spatial audio rendering is increased, the terminal equipment can render in real time to obtain the target audio signal, the earphone equipment can play the target audio signal in real time, and the audio listening experience of a user is improved; when the residual electric quantity of the earphone device is insufficient, the terminal device can acquire the head motion compensation data to compensate for the deficiency of the first head motion data issued by the earphone device, so that the hysteresis of a target audio signal rendered by the terminal device is reduced, the hysteresis of the earphone device when the target audio signal is played is reduced, and the audio listening experience of a user is improved.
It will be appreciated that the above-described terminal device may also be referred to as a terminal (terminal), a User Equipment (UE), a Mobile Station (MS), a Mobile Terminal (MT), or the like. The terminal device may be a mobile phone, a smart television, a wearable device, a tablet (Pad), a computer with wireless transceiving function, a Virtual Reality (VR) terminal device, an augmented reality (augmented reality, AR) terminal device, a wireless terminal in industrial control (industrial control), a wireless terminal in unmanned driving (self-driving), a wireless terminal in teleoperation (remote medical surgery), a wireless terminal in smart grid (smart grid), a wireless terminal in transportation safety (transportation safety), a wireless terminal in smart city (smart city), a wireless terminal in smart home (smart home), or the like. The embodiment of the application does not limit the specific technology and the specific equipment form adopted by the terminal equipment.
In order to better understand the embodiments of the present application, the structure of the terminal device of the embodiments of the present application is described below. Fig. 2 is a schematic diagram of a hardware system structure of a terminal device according to an embodiment of the present application.
The terminal device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc.
It is to be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the terminal device 100. In other embodiments of the application, terminal device 100 may include more or less components than illustrated, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it may be called from memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the terminal device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The antennas in the terminal device 100 may be used to cover single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied to the terminal device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, Wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., applied to the terminal device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
The terminal device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used for displaying images, displaying videos, receiving sliding operations, and the like. The display 194 includes a display panel. In some embodiments, the terminal device 100 may include 1 or more display screens 194.
The terminal device 100 may implement a photographing function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, so that the electrical signal is converted into an image visible to naked eyes.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, terminal device 100 may include 1 or more cameras 193.
Video codecs are used to compress or decompress digital video. The terminal device 100 may support one or more video codecs. In this way, the terminal device 100 can play or record video in various encoding formats, for example: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to realize expansion of the memory capability of the terminal device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer-executable program code that includes instructions. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data (such as audio data, phonebook, etc.) created during use of the terminal device 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like. The processor 110 performs various functional applications of the terminal device 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The terminal device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals. The terminal device 100 can play music or carry out a hands-free call through the speaker 170A.
A receiver 170B, also referred to as an "earpiece", is used to convert the audio electrical signal into a sound signal. When the terminal device 100 receives a call or a voice message, the voice can be heard by placing the receiver 170B close to the ear.
Microphone 170C, also referred to as a "mic" or "sound transmitter", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can speak close to the microphone 170C to input a sound signal into the microphone 170C. The terminal device 100 may be provided with at least one microphone 170C. In other embodiments, the terminal device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the terminal device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify the source of sound, implement a directional recording function, etc.
The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The sensor module 180 may include a pressure sensor, a gyroscope sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
The keys 190 include a power-on key, a volume key, etc. The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerting as well as for touch vibration feedback. The indicator 192 may be an indicator light and may be used to indicate a state of charge, a change in charge, a message, a missed call, a notification, etc. The SIM card interface 195 is used to connect a SIM card. The SIM card can be brought into contact with or separated from the terminal device 100 by inserting it into or removing it from the SIM card interface 195.
Fig. 3 is a schematic diagram of a hardware system of an earphone device according to an embodiment of the present application;
As shown in fig. 3, the earphone device 200 includes one or more processors 210, one or more memories 220, a communication interface 230, an audio acquisition circuit, and an audio playback circuit. Wherein the audio acquisition circuit further may comprise at least one microphone 240 and an analog-to-digital converter (ADC) 250. The audio playback circuit may further include a speaker 260 and a digital-to-analog converter (DAC) 270.
The headset may also include one or more sensors 280, for example: inertial measurement units (inertial measurement unit, IMU), proximity sensors, motion sensors, and the like. These hardware components may communicate over one or more communication buses.
In an embodiment of the present application, the IMU may be used to measure the motion pose of the headset device 200, e.g., the IMU may be used to determine first head motion data when the headset device 200 is worn by a user. Wherein, the IMU may be provided with a gyro sensor, an acceleration sensor, and the like.
The processor 210 is a control center of the earphone device 200, and the processor 210 may also be referred to as a control unit, a controller, a microcontroller, or some other suitable terminology. The processor 210 connects the various components of the headset using various interfaces and lines, and in a possible embodiment, the processor 210 may also include one or more processing cores. In a possible embodiment, the processor 210 may have integrated therein a main control unit and a signal processing module. The Main Control Unit (MCU) is configured to receive data collected by the sensor 280 or a monitoring signal from the signal processing module or a control signal from a terminal (e.g. a mobile phone APP), and finally control the earphone device 200 through comprehensive judgment and decision.
Memory 220 may be coupled to processor 210 or may be connected to processor 210 via a bus for storing various software programs and/or sets of instructions and data. The memory 220 may also store a communication program that may be used to communicate with the terminal. In one example, memory 220 may also store data/program instructions, and processor 210 may be used to invoke and execute the data/program instructions in memory 220. Alternatively, the memory 220 may be a memory external to the MCU, or may be a storage unit of the MCU itself.
The communication interface 230 is used for communicating with a terminal, and the communication mode may be wired or wireless. When the communication mode is wired communication, the communication interface 230 may be connected to the terminal through a cable. When the communication mode is wireless communication, the communication interface 230 is configured to receive and transmit radio frequency signals, and the supported wireless communication mode may be at least one of Bluetooth communication, wireless fidelity (Wi-Fi) communication, infrared communication, or cellular (2G/3G/4G/5G) communication.
The microphone 240 may be used to collect sound signals (or audio signals, which are analog signals). The analog-to-digital converter 250 is used to convert the analog signals collected by the microphone 240 into digital signals and send them to the processor 210 for processing; in a specific embodiment, they are sent to the signal processing module for processing. The signal processing module may transmit the processed signal (e.g., the audio signal) to the digital-to-analog converter 270, and the digital-to-analog converter 270 may convert the received signal into an analog signal and transmit it to the speaker 260. The speaker 260 is configured to play according to the analog signal, so that the user can hear the sound.
In an embodiment of the present application, the communication interface 230 may be configured to transmit the first head movement data detected by the IMU to the terminal device 100. And, the communication interface 230 may also be used to receive a target audio signal transmitted by the terminal device 100.
It will be appreciated that the above-described earphone device 200 may also be referred to as an earplug, a headset, a walkman, an audio player, a media player, a headphone, an earpiece device, or some other suitable terminology, to which embodiments of the application are not limited.
The software system of the terminal device 100 may employ a layered architecture, an event driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture, etc. In the embodiment of the application, taking an Android system with a layered architecture as an example, a software structure of the terminal device 100 is illustrated.
Fig. 4 is a software configuration block diagram of the terminal device 100 of the embodiment of the present application. The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, the Android runtime (Android runtime) and system libraries, and a kernel layer.
The application layer may include a series of application packages. As shown in fig. 4, the application package may include applications such as telephone, mailbox, music, video, games, etc.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application layer applications. The application framework layer includes a number of predefined functions.
As shown in fig. 4, the application framework layer may include a head motion data capture module, a decision module, an upmix module, a first downmix module, a head motion data prediction and compensation module, a filter selection and processing module, a spatial audio rendering module, and a second downmix module.
The head motion data capturing module is used for, when enabled, calling the camera through the camera driver to collect user images, and analyzing the user images collected by the camera, for example by adopting a Head Tracking technology or an Eye Tracking technology, so as to capture second head motion data of the user.
The decision module is used for determining the working modes of the head motion data capturing module, the upmixing module, the first downmixing module, the head motion data predicting and compensating module and the filter selecting and processing module when the spatial audio rendering is carried out according to the residual electric quantity of the earphone device and the available main frequency of the terminal device.
The upmixing module is used for upmixing the mono or dual-channel first audio signal when being started, so that the upmixed first audio signal can comprise at least three channels.
The first downmixing module is used for downmixing a first audio signal comprising at least three channels when being started, so that the first audio signal after downmixing can be mono or bi-channel.
The head motion data prediction and compensation module is used for predicting first predicted head motion data based on the historical head motion data acquired in the previous N times when the available main frequency of the terminal device is greater than the first preset frequency and the remaining power of the earphone device is less than or equal to the preset power, so as to compensate for the missing first head motion data on which spatial audio rendering depends. The module may also be used for predicting second predicted head motion data based on the target head motion data acquired at the current moment, or based on the target head motion data and the historical head motion data, when the available main frequency of the terminal device is greater than the first preset frequency and the remaining power of the earphone device is greater than the preset power, so that the terminal device can implement spatial audio rendering based on the predicted second predicted head motion data and reduce the head tracking lag caused by the spatial audio link delay.
The filter selection and processing module is used for selecting different filters according to the available main frequency of the terminal equipment. In addition, the filter selection and processing module may be further configured to gain adjust the selected filter.
The spatial audio rendering module is used for rendering the first audio signal by adopting the selected filter.
The second down-mixing module is used for performing down-mixing processing on the rendered audio signals to obtain target audio signals. The target audio signal may include a left channel target audio signal and a right channel target audio signal.
The Android runtime includes core libraries and a virtual machine. The Android runtime is responsible for scheduling and management of the Android system.
The core libraries consist of two parts: one part is the functions that the Java language needs to call, and the other part is the core libraries of Android.
The application layer and the application framework layer run in virtual machines. The virtual machine executes java files of the application layer and the application framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., openGL ES), two-dimensional graphics engines (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of a variety of commonly used audio and video formats, as well as still image files, etc. The media libraries may support a variety of audio and video encoding formats, such as MPEG2, H.262, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like. The two-dimensional graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer at least comprises a display driver, a Bluetooth driver, a camera driver and the like.
It should be noted that although the embodiment of the present application is described with an Android system, the principles of the audio processing method are equally applicable to terminal devices running iOS, Windows, or other operating systems.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be implemented independently or combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 5 is a schematic flow chart of an audio processing method according to an embodiment of the present application, which may be used in an application scenario corresponding to fig. 1. Referring to fig. 5, the audio processing method may specifically include the steps of:
S501, the terminal device and the earphone device establish a communication connection.
In some embodiments, the terminal device and the earphone device may establish a communication connection through a wired manner; alternatively, the terminal device and the earphone device may also establish a communication connection in a wireless manner. For example, the communication connection between the terminal device and the earphone device may be established through bluetooth, WIFI, or a wireless manner such as connection to the same cloud account.
S502, the terminal equipment acquires the residual capacity of the earphone equipment.
After the communication connection between the terminal device and the earphone device is established, the terminal device can send a power query request to the earphone device through the established communication link so as to periodically acquire the remaining power of the earphone device; alternatively, the earphone device may periodically report its own remaining power to the terminal device through the established communication link.
It is understood that the remaining power of the earphone device may be transmitted in the form of a power percentage, for example, the remaining power of the earphone device may be 30% or 50%, etc.
S503, the terminal equipment acquires the available main frequency of the terminal equipment.
A pseudo file system may be provided in the terminal device, which provides an interface for accessing kernel data in a file-system manner. Thus, the decision module in the terminal device can obtain the available main frequency of the terminal device itself by reading the directories under these file systems and some related files under their sub-directories. The unit of the available main frequency of the terminal device may be MHz.
The main frequency refers to the clock frequency at which the processor works and bears a certain relation to the actual operation speed of the processor. The available main frequency may be used to represent the main frequency currently remaining in the processor that can be consumed by spatial audio rendering.
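For illustration only, the following sketch shows one possible way of estimating the available main frequency by reading cpufreq entries of a Linux-style pseudo file system; the file paths, the kHz unit, and the headroom calculation are assumptions for this sketch, not details given in the embodiment.

```python
# A minimal sketch, assuming a Linux-style sysfs cpufreq layout.
from pathlib import Path

CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")

def read_khz(name: str) -> int:
    # Each cpufreq file holds a single integer value in kHz.
    return int((CPUFREQ / name).read_text().strip())

def available_main_frequency_mhz() -> float:
    max_khz = read_khz("cpuinfo_max_freq")   # peak frequency of the core
    cur_khz = read_khz("scaling_cur_freq")   # frequency currently in use
    # Treat the unused headroom as the main frequency still available
    # for spatial audio rendering (assumption for this sketch).
    return (max_khz - cur_khz) / 1000.0
```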
It will be appreciated that the available primary frequencies may be affected by such factors as the hardware configuration of the processor itself in the terminal device, the current usage scenario of the terminal device, and the current remaining power of the terminal device. For example, when the current remaining power of the terminal device is low, for example, below 20% or 10%, the terminal device may start the power saving mode, and power consumption is reduced by reducing the peak frequency of the processor, so that the available main frequency of the terminal device may be reduced when the terminal device is in the power saving mode.
S504, the terminal equipment determines a target audio processing mode according to the residual capacity of the earphone equipment and the available main frequency of the terminal equipment.
After the decision module in the terminal device obtains the residual capacity of the earphone device and the available main frequency of the terminal device, the decision module can determine a target audio processing mode adopted when spatial audio rendering is performed according to the residual capacity of the earphone device and the available main frequency of the terminal device.
Wherein the target audio processing mode may include: a first audio processing mode, a second audio processing mode, a third audio processing mode, and a fourth audio processing mode.
S505, the terminal equipment adopts a target audio processing mode to process the first audio signal, and a target audio signal is obtained.
After the decision module in the terminal device determines the target audio processing mode adopted in spatial audio rendering, it can control the working modes of the head motion data capturing module, the head motion data prediction and compensation module, and the filter selection and processing module based on the target audio processing mode, so as to determine the filter participating in spatial audio rendering; the decision module in the terminal device may also control the upmixing module and the first downmixing module based on the target audio processing mode, so as to determine the first audio signal participating in spatial audio rendering. Then, the spatial audio rendering module renders the first audio signal participating in spatial audio rendering based on the filter participating in spatial audio rendering; after that, the second downmixing module performs downmixing processing on the rendered audio signal to obtain the target audio signal.
For example, when the target audio processing mode is the first audio processing mode, the terminal device processes the first audio signal in the first audio processing mode to obtain the target audio signal. And when the target audio processing mode is the second audio processing mode, the terminal equipment adopts the second audio processing mode to process the first audio signal so as to obtain a target audio signal. And when the target audio processing mode is the third audio processing mode, the terminal equipment adopts the third audio processing mode to process the first audio signal so as to obtain a target audio signal. And when the target audio processing mode is the fourth audio processing mode, the terminal equipment adopts the fourth audio processing mode to process the first audio signal to obtain a target audio signal.
S506, the terminal device transmits the target audio signal to the headphone device.
S507, the headphone device plays the target audio signal through the speaker.
In some embodiments, after rendering the target audio signal, the terminal device may send the target audio signal to the headset device over the established communication link. And the earphone device receives the target audio signal sent by the terminal device and plays the target audio signal through a loudspeaker.
It should be noted that the target audio signal may include a left channel target audio signal and a right channel target audio signal, so that the headphone device may play the left channel target audio signal through the left speaker and play the right channel target audio signal through the right speaker.
The decision process for the target audio processing mode used in spatial audio rendering is described below with reference to the embodiment corresponding to fig. 6. Fig. 6 is a schematic diagram illustrating the decision on a target audio processing mode according to an embodiment of the present application. Referring to fig. 6, the decision on the target audio processing mode may be made by the decision module in the terminal device, and may specifically include the following steps:
S601, the decision module determines whether the available main frequency of the terminal device is greater than the first preset frequency.
The decision module may compare the available primary frequency with a first preset frequency after obtaining the available primary frequency of the terminal device, and determine whether the available primary frequency of the terminal device is greater than the first preset frequency. When the available main frequency of the terminal device is greater than the first preset frequency, the following S602 is performed; and when the available main frequency of the terminal device is less than or equal to the first preset frequency, the following S603 is performed.
It can be understood that the first preset frequency can be set according to practical situations, and the unit of the first preset frequency can be MHz.
S602, when the available main frequency is larger than a first preset frequency, the decision module determines whether the residual capacity of the earphone device is larger than the preset capacity.
And when the decision module determines that the available main frequency of the terminal equipment is larger than a first preset frequency, the decision module further compares the residual electric quantity of the earphone equipment with the preset electric quantity to determine whether the residual electric quantity of the earphone equipment is larger than the preset electric quantity. When the remaining power of the earphone device is greater than the preset power, the following S604 is performed; and when the remaining power of the earphone device is less than or equal to the preset power, the following S605 is performed.
It can be understood that the preset electric quantity can be set according to actual situations, for example, the preset electric quantity can be 20%, 15% or 10%, etc., and the specific numerical value of the preset electric quantity is not limited in the embodiment of the present application.
And S603, when the available main frequency is smaller than or equal to a first preset frequency, the decision module determines whether the residual electric quantity of the earphone device is larger than the preset electric quantity.
And when the decision module determines that the available main frequency of the terminal equipment is smaller than or equal to a first preset frequency, the decision module further compares the residual electric quantity of the earphone equipment with the preset electric quantity to determine whether the residual electric quantity of the earphone equipment is larger than the preset electric quantity. When the remaining power of the earphone device is greater than the preset power, the following S606 is performed; and when the remaining power of the earphone device is less than or equal to the preset power, the following S607 is performed.
S604, when the available main frequency of the terminal device is greater than the first preset frequency and the remaining power of the earphone device is greater than the preset power, the decision module determines that the target audio processing mode adopted in spatial audio rendering is the fourth audio processing mode.
When the decision module determines that the available main frequency of the terminal device is greater than the first preset frequency and the remaining power of the earphone device is greater than the preset power, i.e., both the available main frequency of the terminal device and the remaining power of the earphone device are sufficient, the decision module determines that the target audio processing mode adopted in spatial audio rendering is the fourth audio processing mode. The fourth audio processing mode may also be referred to as a full rendering link mode.
Wherein, in the spatial audio rendering process, the fourth audio processing mode may include a spatial audio rendering strategy of the following five aspects:
in a first aspect, for the case where the first audio signal (which may be understood as an audio input signal) is a mono audio signal or a binaural audio signal, the upmixing module may be enabled such that the upmixing module may upmix the first audio signal, and the upmixed first audio signal may include at least three channels.
For example, if the first audio signal is a two-channel audio signal, i.e., it includes a left channel and a right channel, the upmixed first audio signal may be a 3.0-channel signal, i.e., it includes three channels, which may be a left channel, a right channel, and a center channel, respectively (an illustrative upmix sketch is given after the description of this mode).
In a second aspect, for the case of a first audio signal comprising at least three channels, the first downmix module may be turned off such that the first audio signal is not subjected to a downmix process by the first downmix module.
For example, if the first audio signal is a 3.0 channel signal, the first downmix module is turned off in the fourth audio processing mode. Alternatively, the first audio signal is a 5.1 channel signal, i.e. the first audio signal comprises six channels, which are a left channel, a right channel, a center channel, a left surround channel, a right surround channel and a low frequency effects (LFE) channel, respectively; for this case, the first downmix module is also turned off in the fourth audio processing mode. Alternatively, the first audio signal is a 7.1 channel signal, i.e. the first audio signal comprises eight channels, which in addition to the six channels of the 5.1 channel signal comprise a left rear surround channel signal and a right rear surround channel signal; for this case, the first downmix module will still be turned off in the fourth audio processing mode.
It will be appreciated that in some embodiments, for a 5.1 channel signal, a 7.1 channel signal, or a first audio signal with more channels, the first downmixing module may be enabled in the fourth audio processing mode, and the first audio signal with more channels may be downmixed into a 3.0 channel signal by the first downmixing module.
In a third aspect, a head motion data capture module may be enabled. The head motion data capturing module can analyze the user image acquired by the camera to obtain second head motion data.
In a fourth aspect, a prediction mode in the head motion data prediction and compensation module may be enabled. The head motion data prediction and compensation module may predict the target head motion data obtained at the current time to obtain second predicted head motion data, or the head motion data prediction and compensation module may predict the target head motion data and the historical head motion data to obtain second predicted head motion data.
The target head motion data comprise first head motion data issued by the earphone device and/or second head motion data generated by a user image acquired by the camera; the historical head movement data comprises first historical head movement data issued before the earphone device and/or second historical head movement data generated before the terminal device based on user images acquired by the camera.
In a fifth aspect, a BRIR filter may be employed for spatial audio rendering. The filter selection and processing module in the terminal device can select a corresponding BRIR filter according to the second predicted head motion data obtained through prediction, so that the spatial audio rendering module can perform spatial audio rendering on the first audio signal based on the BRIR filter corresponding to the second predicted head motion data, and lag feeling of head tracking caused by spatial audio link delay is reduced.
In this way, when the available main frequency of the terminal device and the residual capacity of the earphone device are sufficient, the terminal device can adopt the fourth audio processing mode to perform spatial audio rendering so as to improve the rendering effect of the spatial audio, thereby improving the playing effect of the finally obtained target audio signal when played through the earphone device so as to improve the audio listening experience of the user.
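For illustration only, the following sketch shows one possible upmix of a two-channel first audio signal into a 3.0-channel signal, as mentioned in the first aspect above; the 0.5 center coefficient is an assumption for this sketch and is not specified in the embodiment.

```python
import numpy as np

def upmix_stereo_to_3_0(left: np.ndarray, right: np.ndarray):
    # Derive a center channel from the left and right channels; the 0.5
    # coefficient is an illustrative assumption.
    center = 0.5 * (left + right)
    return left, right, center
```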
S605, when the available main frequency of the terminal device is greater than the first preset frequency and the remaining power of the earphone device is less than or equal to the preset power, the decision module determines that the target audio processing mode adopted in the spatial audio rendering is the second audio processing mode.
When the decision module determines that the available main frequency of the terminal equipment is larger than the first preset frequency and the residual electric quantity of the earphone equipment is smaller than or equal to the preset electric quantity, namely, the available main frequency of the terminal equipment is sufficient and the residual electric quantity of the earphone equipment is insufficient, the decision module determines that a target audio processing mode adopted in space audio rendering is a second audio processing mode. The second audio processing mode may also be referred to as an audio compensation mode.
Wherein, in the spatial audio rendering process, the second audio processing mode includes a spatial audio rendering strategy of the following five aspects:
In the first aspect, for the case that the first audio signal is a mono audio signal or a binaural audio signal, the upmixing module may be enabled, so that the upmixing module may perform upmixing processing on the first audio signal, and the upmixed first audio signal may include at least three channels.
In a second aspect, for the case of a first audio signal comprising at least three channels, the first downmix module may be turned off such that the first audio signal is not subjected to a downmix process by the first downmix module.
In a third aspect, a head motion data capture module may be enabled. The head motion data capturing module can analyze the user image acquired by the camera to obtain second head motion data.
In a fourth aspect, a compensation mode in the head motion data prediction and compensation module may be enabled. When the head motion data prediction and compensation module determines that no first head motion data has been issued by the earphone device, it may take the second head motion data generated based on the user image collected by the camera as head motion compensation data, or predict first predicted head motion data based on the historical head motion data acquired in the previous N times and take the predicted first predicted head motion data as head motion compensation data, so as to compensate for the missing head motion data on which spatial audio rendering depends (an illustrative sketch of this selection is given after the description of this mode).
In a fifth aspect, a BRIR filter may be employed for spatial audio rendering. The filter selection and processing module in the terminal device may select a corresponding BRIR filter according to the head motion compensation data, so that the spatial audio rendering module may perform spatial audio rendering on the first audio signal based on the BRIR filter corresponding to the head motion compensation data.
In this way, when the available main frequency of the terminal device is sufficient and the remaining power of the earphone device is insufficient, the earphone device actively reduces the capturing frequency of the first head motion data, so that the first head motion data on which the terminal device depends when performing spatial audio rendering may be missing. Therefore, in this case, the terminal device of the embodiment of the application can adopt the second audio processing mode to perform spatial audio rendering so as to compensate for the deficiency of the first head motion data issued by the earphone device, and reduce the hysteresis of the target audio signal rendered by the terminal device, thereby reducing the hysteresis of the earphone device when playing the target audio signal and improving the audio listening experience of the user.
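For illustration only, the following sketch shows one possible selection logic for the head motion compensation data described in the fourth aspect of the second audio processing mode; all function and parameter names are illustrative assumptions.

```python
def choose_head_motion_data(first_head_data, second_head_data,
                            history, predict_from_history):
    # Illustrative selection of head motion compensation data in the
    # second audio processing mode; all names are assumptions.
    if first_head_data is not None:
        return first_head_data              # data issued by the earphone device
    if second_head_data is not None:
        return second_head_data             # camera-derived compensation data
    return predict_from_history(history)    # first predicted head motion data
```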
S606, when the available main frequency of the terminal device is smaller than or equal to a first preset frequency and the residual capacity of the earphone device is larger than the preset capacity, the decision module determines that the target audio processing mode adopted in the spatial audio rendering is the first audio processing mode.
When the decision module determines that the available main frequency of the terminal equipment is smaller than or equal to a first preset frequency and the residual capacity of the earphone equipment is larger than the preset capacity, namely, the available main frequency of the terminal equipment is insufficient and the residual capacity of the earphone equipment is sufficient, the decision module determines that a target audio processing mode adopted in space audio rendering is a first audio processing mode. The first audio processing mode may also be referred to as a low-power mode.
Wherein, in the spatial audio rendering process, the first audio processing mode may include a spatial audio rendering strategy of the following five aspects:
in the first aspect, for the case where the first audio signal is a mono audio signal or a binaural audio signal, the upmixing module may be turned off, so that the upmixing module does not perform upmixing processing on the first audio signal.
In a second aspect, for the case of a first audio signal comprising at least three channels, the first downmix module may be enabled such that the first downmix module may perform a downmix process on the first audio signal comprising at least three channels, and the first audio signal after the downmix may be mono or bi-channel.
For example, if the first audio signal is a 3.0-channel signal, the first downmixing module may be enabled in the first audio processing mode to downmix the first audio signal into a two-channel audio signal (an illustrative downmix sketch is given after the description of this mode).
In a third aspect, the head motion data capture module may be turned off, and the head motion data capture module may not acquire the user image captured by the camera any more, nor may analyze the user image captured by the camera to obtain the second head motion data.
In a fourth aspect, the head motion data prediction and compensation module may be turned off, and then the head motion data prediction and compensation module no longer performs the procedure of the fourth aspect of the fourth audio processing mode and the second audio processing mode described above.
In a fifth aspect, spatial audio rendering is performed using a target filter corresponding to first head motion data issued by a headphone apparatus. The filter selection and processing module in the terminal device can select a corresponding target filter according to the first head motion data issued by the earphone device, so that the spatial audio rendering module can perform spatial audio rendering on the first audio signal based on the target filter corresponding to the first head motion data.
Wherein the target filter comprises any one of the following: FIR filters, IIR filters, or direct sound and early reflected sound portions in BRIR filters.
Since the FIR filter and the IIR filter have shorter orders than the BRIR filter, and the direct sound portion and the early reflected sound portion are only part of the complete BRIR filter, when the available main frequency of the terminal device is insufficient and the remaining power of the earphone device is sufficient, the terminal device can perform spatial audio rendering in the first audio processing mode. This simplifies the complexity of spatial audio rendering, reduces the calculation amount of spatial audio rendering, and increases the calculation speed of spatial audio rendering, so that the terminal device can render the target audio signal in real time and the earphone device can play the target audio signal in real time, improving the audio listening experience of the user.
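For illustration only, the following sketch shows one possible downmix of a 3.0-channel first audio signal into a two-channel signal, as mentioned in the second aspect above; the -3 dB (0.707) center gain is a conventional, illustrative assumption and is not specified in the embodiment.

```python
import numpy as np

def downmix_3_0_to_stereo(left, right, center, center_gain=0.707):
    # Mix the center channel into the left and right channels; the -3 dB
    # (0.707) center gain is an illustrative choice.
    left_out = np.asarray(left) + center_gain * np.asarray(center)
    right_out = np.asarray(right) + center_gain * np.asarray(center)
    return left_out, right_out
```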
S607, when the available main frequency of the terminal device is less than or equal to the first preset frequency and the remaining power of the earphone device is less than or equal to the preset power, the decision module determines that the target audio processing mode adopted in the spatial audio rendering is the third audio processing mode.
When the available main frequency of the terminal equipment is smaller than or equal to a first preset frequency and the residual electric quantity of the earphone equipment is smaller than or equal to a preset electric quantity, namely, the available main frequency of the terminal equipment and the residual electric quantity of the earphone equipment are insufficient, the decision module determines that a target audio processing mode adopted in space audio rendering is a third audio processing mode. The third audio processing mode may also be referred to as a minimum power consumption mode.
Wherein, in the spatial audio rendering process, the third audio processing mode may include a spatial audio rendering strategy of the following five aspects:
in the first aspect, for the case where the first audio signal is a mono audio signal or a binaural audio signal, the upmixing module may be turned off, so that the upmixing module does not perform upmixing processing on the first audio signal.
In a second aspect, for the case of a first audio signal comprising at least three channels, the first downmix module may be enabled such that the first downmix module may perform a downmix process on the first audio signal comprising at least three channels, and the first audio signal after the downmix may be mono or bi-channel.
In a third aspect, the head motion data capture module may be turned off, and the head motion data capture module may not acquire the user image captured by the camera any more, nor may analyze the user image captured by the camera to obtain the second head motion data.
In a fourth aspect, the head motion data prediction and compensation module may be turned off, and then the head motion data prediction and compensation module no longer performs the procedure of the fourth aspect of the fourth audio processing mode and the second audio processing mode described above.
In a fifth aspect, spatial audio rendering may be performed using a target filter corresponding to preset head motion data. In this case, the earphone device no longer captures the first head motion data and no longer issues it to the terminal device; the filter selection and processing module in the terminal device may directly select a corresponding target filter according to the preset head motion data, so that the spatial audio rendering module can perform spatial audio rendering on the first audio signal based on that target filter.
In an embodiment of the present application, the first head movement data, the second head movement data, the historical head movement data, the first predicted head movement data, the second predicted head movement data, the preset head movement data, and other head movement data may include a horizontal azimuth angle, an inclination angle, and a pitch angle.
It will be appreciated that the head movement data described above may include translational distances in the front-rear direction, left-right direction, and up-down direction due to body movement of the user, in addition to the horizontal azimuth (Yaw), inclination angle (Roll), and Pitch angle (Pitch).
The horizontal azimuth refers to the rotation angle of the user's head in the horizontal direction. The horizontal azimuth has a value range of [0°, 360°): the azimuth directly in front of the user's face is 0°, the azimuth directly to the right of the user's face is 90°, the azimuth directly behind the user's face is 180°, and the azimuth directly to the left of the user's face is 270°.
The pitch angle refers to the rotation angle of the user's head in the vertical direction. The pitch angle has a value range of [-90°, 90°]: the pitch angle corresponding to the horizontal plane is 0°, the pitch angle when the user's head points straight up is 90°, and the pitch angle when the user's head points straight down is -90°.
The tilt angle may be the angle between the tilted position of the head and the horizontal position when the user tilts the head. The tilt angle has a value range of [-90°, 90°]; the tilt angle may be negative when the user's head tilts to the left and positive when the user's head tilts to the right.
The translation distance in the front-rear direction generated by the body movement of the user can be used as the translation distance in the X-axis direction, the translation distance in the left-right direction generated by the body movement of the user can be used as the translation distance in the Y-axis direction, and the translation distance in the up-down direction generated by the body movement of the user can be used as the translation distance in the Z-axis direction.
Taking the example that the head motion data includes a horizontal azimuth angle, an inclination angle, a pitch angle, a translation distance in the X-axis direction, a translation distance in the Y-axis direction, and a translation distance in the Z-axis direction, the horizontal azimuth angle, the inclination angle, the pitch angle, the translation distance in the X-axis direction, the translation distance in the Y-axis direction, and the translation distance in the Z-axis direction in the preset head motion data may all be 0.
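For illustration only, the following sketch shows one possible representation of head motion data with the six quantities described above, with the preset head motion data set to all zeros; the field names are illustrative assumptions, not identifiers from the embodiment.

```python
from dataclasses import dataclass

@dataclass
class HeadMotionData:
    # Field names are illustrative assumptions, not identifiers from the text.
    yaw: float = 0.0    # horizontal azimuth, [0°, 360°)
    roll: float = 0.0   # tilt angle, [-90°, 90°]
    pitch: float = 0.0  # pitch angle, [-90°, 90°]
    x: float = 0.0      # front-rear translation distance
    y: float = 0.0      # left-right translation distance
    z: float = 0.0      # up-down translation distance

# Preset head motion data used in the third audio processing mode: all zeros.
PRESET_HEAD_MOTION = HeadMotionData()
```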
Wherein the target filter comprises any one of the following: FIR filters, IIR filters, or direct sound and early reflected sound portions in BRIR filters.
In this way, when the available main frequency of the terminal device and the residual capacity of the earphone device are insufficient, the terminal device can close the upmix module, the head motion data capturing module and the head motion data prediction and compensation module, and perform spatial audio rendering by adopting any one of a direct sound part and an early reflected sound part in the FIR filter, the IIR filter and the BRIR filter, so that the complexity of the spatial audio rendering is simplified, and the calculated amount of the spatial audio rendering is reduced; and the terminal equipment can also use the preset head motion data to select a corresponding target filter to perform spatial audio rendering on the first audio signal so as to maintain a certain spatial audio effect.
In summary, the terminal device may adaptively adjust algorithm parameters in the spatial audio rendering process according to the available main frequency of the terminal device and the remaining power of the earphone device, so as to balance the computing power of the terminal device, the power of the earphone device, and the spatial audio rendering effect.
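For illustration only, the following sketch summarizes the decision logic of S601 to S607 as a single function; the threshold values are left as inputs, since the embodiment states that they are set according to the actual situation.

```python
def decide_target_mode(available_main_freq_mhz: float,
                       headset_battery_pct: float,
                       first_preset_freq_mhz: float,
                       preset_battery_pct: float) -> str:
    # Sketch of the decision in S601-S607; threshold values are inputs here
    # because the embodiment sets them according to the actual situation.
    freq_ok = available_main_freq_mhz > first_preset_freq_mhz
    battery_ok = headset_battery_pct > preset_battery_pct
    if freq_ok and battery_ok:
        return "fourth mode"   # full rendering link mode
    if freq_ok:
        return "second mode"   # audio compensation mode
    if battery_ok:
        return "first mode"    # reduced-complexity rendering
    return "third mode"        # minimum power consumption mode
```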
The audio processing method according to the embodiment of the application is described in detail below by using spatial audio rendering processes corresponding to four different scenes. The target audio processing modes adopted by the four scenes in the process of spatial audio rendering are respectively as follows: a first audio processing mode, a second audio processing mode, a third audio processing mode, and a fourth audio processing mode.
In the first scenario, the available main frequency of the terminal device is greater than the first preset frequency, the remaining power of the earphone device is greater than the preset power, and the decision module determines that the target audio processing mode adopted during spatial audio rendering is the fourth audio processing mode.
As shown in fig. 7 and 8, the earphone device may collect first head motion data of the user wearing the earphone device and issue the first head motion data to the terminal device, where the earphone device issues the first head motion data at a first issuing frequency. In addition, the head motion data capturing module in the terminal device can acquire the user image collected by the camera and generate second head motion data based on the user image collected by the camera.
The spatial audio link delay is large, typically greater than 250 ms. As shown in fig. 9, at time t, the terminal device may acquire the first head motion data issued by the earphone device and/or generate the second head motion data based on the head motion data capturing module; however, when spatial audio rendering is performed according to the first head motion data and/or the second head motion data and the rendered target audio signal is sent to the earphone device for playback, the time at which the earphone device plays the target audio signal is time u, so there is a certain lag between the acquisition of the head motion data and the playback of the corresponding target audio signal. Time u is later than time t; for example, the interval between time u and time t is 250 ms.
Therefore, in order to reduce the lag sense of head tracking caused by the delay of the spatial audio link, the terminal device may predict the head motion data corresponding to the u moment by adopting a prediction mode in the head motion data prediction and compensation module, and participate in spatial audio rendering directly based on the predicted second predicted head motion data corresponding to the u moment, so as to reduce the lag sense of head tracking.
Specifically, as shown in fig. 7 and 8, the head motion data prediction and compensation module may obtain first head motion data issued by the earphone device, and second head motion data generated by the head motion data capturing module. The head motion data prediction and compensation module can predict and obtain second predicted head motion data according to the target head motion data; alternatively, the head motion data prediction and compensation module may further predict second predicted head motion data based on the target head motion data and the historical head motion data.
The target head motion data comprise first head motion data issued by the earphone device and/or second head motion data generated based on user images acquired by the camera; the historical head movement data comprises first historical head movement data issued before the earphone device and/or second historical head movement data generated before the terminal device based on user images acquired by the camera.
As shown in fig. 10, the head motion data prediction and compensation module may predict and obtain second predicted head motion data according to first head motion data issued by the earphone device and acquired at time t, second head motion data generated based on user images acquired by the camera at time t, first historical head motion data issued by the earphone device and acquired M times before time t, and second historical head motion data generated based on user images acquired by the camera and acquired M times before time t. Wherein M is a positive integer.
The embodiment of the application can predict and obtain the second predicted head motion data in the following four prediction modes. The target head movement data may be first head movement data or second head movement data, and the historical head movement data may be first historical head movement data or second historical head movement data.
At time t, if the terminal device acquires the first head motion data but does not acquire the second head motion data, the target head motion data is the first head motion data; at time t, if the terminal equipment acquires the second head motion data but does not acquire the first head motion data, the target head motion data is the second head motion data; at time t, if the terminal device acquires the first head motion data and the second head motion data, the target head motion data is the first head motion data. Or in some realizable modes, if the terminal device acquires the first head motion data and the second head motion data at the time t, the terminal device may calculate two kinds of second predicted head motion data by using the first head motion data and the second head motion data as target head motion data respectively, and finally, perform weighted summation on the two kinds of second predicted head motion data to obtain final second predicted head motion data.
And, when the target head movement data is the first head movement data, the historical head movement data may be the first historical head movement data; when the target head movement data is second head movement data, the historical head movement data may be second historical head movement data.
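For illustration only, the following sketch shows one possible way of selecting the target head motion data at time t and, when both kinds of data are available, weighting the two resulting predictions; the weights and function names are illustrative assumptions.

```python
def predict_second_head_motion(first_data, second_data, predict, w1=0.5, w2=0.5):
    # If both kinds of data were acquired at time t, either use the first
    # head motion data as the target data or fuse the two predictions by a
    # weighted sum; w1 and w2 are illustrative weights.
    if first_data is not None and second_data is not None:
        return w1 * predict(first_data) + w2 * predict(second_data)
    target = first_data if first_data is not None else second_data
    return predict(target)
```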
In the first prediction method, prediction is performed according to the target rotational angular velocity in the target head motion data acquired at time t. The target head motion data includes a target rotation angle and a target rotational angular velocity. The terminal device determines a second time difference between the current time (e.g., time t) and the time at which the target audio signal is played through the earphone device (e.g., time u); the terminal device then sums the product of the target rotational angular velocity and the second time difference with the target rotation angle to obtain the second predicted head motion data.
The target rotation angle may include a target horizontal azimuth yaw_t, a target tilt angle roll_t, and a target pitch angle pitch_t; the target rotational angular velocity may include a first target rotational angular velocity ω1 corresponding to the target horizontal azimuth yaw_t, a second target rotational angular velocity ω2 corresponding to the target tilt angle roll_t, and a third target rotational angular velocity ω3 corresponding to the target pitch angle pitch_t.
The second predicted head motion data includes the predicted second horizontal azimuth yaw_u, second tilt angle roll_u, and second pitch angle pitch_u at time u. Denoting the second time difference between time u and time t as Δt, these can be calculated by the following formulas:
yaw_u = yaw_t + ω1 × Δt
roll_u = roll_t + ω2 × Δt
pitch_u = pitch_t + ω3 × Δt
It can be seen that in the first prediction method, the head motion data prediction and compensation module in the terminal device predicts the second predicted head motion data based on the target head motion data only; and, in addition to the target horizontal azimuth, the target tilt angle, and the target pitch angle, the target head motion data also includes the corresponding target rotational angular velocities.
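For illustration only, the following sketch implements the first prediction method described above; the array layout of the angles and angular velocities is an assumption for this sketch.

```python
import numpy as np

def predict_first_method(angles_t: np.ndarray,
                         angular_velocity: np.ndarray,
                         delta_t: float) -> np.ndarray:
    # angles_t = [yaw_t, roll_t, pitch_t], angular_velocity = [ω1, ω2, ω3],
    # delta_t is the second time difference between time u and time t.
    return angles_t + angular_velocity * delta_t
```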
In the second prediction method, prediction is performed according to the target head motion data acquired at time t and the historical head motion data acquired before time t. The terminal device determines twice the target head motion data minus the previously acquired historical head motion data as the second predicted head motion data.
The target head motion data may include a target rotation angle angle_t, the previously acquired historical head motion data may include a historical rotation angle angle_(t-1), and the second predicted head motion data may include a second predicted rotation angle angle_u.
Thus, the second predicted rotation angle angle_u can be calculated by the following formula:
angle_u = 2 × angle_t − angle_(t-1)
It will be appreciated that the target rotation angle angle_t, the historical rotation angle angle_(t-1), and the second predicted rotation angle angle_u in the above formula may each be any one of a horizontal azimuth, a tilt angle, and a pitch angle.
In addition, the target head motion data, the previously acquired historical head motion data, and the second predicted head motion data may also include a translation distance in the X-axis direction, a translation distance in the Y-axis direction, a translation distance in the Z-axis direction, and the like. When calculating a translation distance in the second predicted head motion data, reference may be made to the above calculation formula for the second predicted rotation angle angle_u.
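For illustration only, the following sketch implements the second prediction method for a single head motion component; the scalar signature is an illustrative simplification.

```python
def predict_second_method(angle_t: float, angle_t_minus_1: float) -> float:
    # angle_u = 2 × angle_t - angle_(t-1); applies per component
    # (azimuth, tilt, pitch, or a translation distance).
    return 2.0 * angle_t - angle_t_minus_1
```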
In the third prediction method, prediction is performed according to the target head motion data acquired at time t and the historical head motion data acquired in the previous M times before time t. The terminal device processes the target head motion data and the historical head motion data acquired in the previous M times using a weighted moving average method to obtain the second predicted head motion data.
The weighted moving average method is a method of giving different weights to observed values, obtaining a moving average value according to the different weights, and determining a predicted value based on the final moving average value.
Specifically, first prediction candidate data M_(t+1) is calculated according to the following formula:
M_(t+1) = w_1 × M_t + w_2 × M_(t-1) + … + w_(m+1) × M_(t-m)
where M_t represents the target head motion data, and w_1 represents the weight corresponding to the target head motion data M_t in the first calculation; M_(t-1) represents the historical head motion data acquired the last time before time t, and w_2 represents the weight corresponding to the historical head motion data M_(t-1) in the first calculation; M_(t-m) represents the historical head motion data acquired the M-th time before time t, and w_(m+1) represents the weight corresponding to the historical head motion data M_(t-m) in the first calculation. The weights w_1 to w_(m+1) decrease in turn.
Next, second prediction candidate data M_(t+2) is calculated according to the following formula:
M_(t+2) = k_1 × M_(t+1) + k_2 × M_t + … + k_(m+1) × M_(t-m+1)
where k_1 represents the weight corresponding to the first prediction candidate data M_(t+1) in the second calculation, k_2 represents the weight corresponding to the target head motion data M_t in the second calculation, M_(t-m+1) represents the historical head motion data acquired the (M-1)-th time before time t, and k_(m+1) represents the weight corresponding to the historical head motion data M_(t-m+1) in the second calculation. The weights k_1 to k_(m+1) decrease in turn.
And so on, until the second predicted head motion data M_u is calculated according to the following formula:
M_u = g_1 × M_(u-1) + g_2 × M_(u-2) + … + g_(m+1) × M_(u-m-1)
where M_(u-1) represents the prediction candidate data obtained in the calculation immediately before the second predicted head motion data M_u, and g_1 represents its corresponding weight; M_(u-2) represents the prediction candidate data obtained in the calculation before M_(u-1), and g_2 represents its corresponding weight; M_(u-m-1) represents the prediction candidate data obtained in the (M+1)-th calculation before the second predicted head motion data M_u, and g_(m+1) represents its corresponding weight. The weights g_1 to g_(m+1) decrease in turn.
It will be appreciated that the target head motion data, the historical head motion data, and the second predicted head motion data in the above formula may be any one of a horizontal azimuth angle, a tilt angle, a pitch angle, a translational distance in the X-axis direction, a translational distance in the Y-axis direction, and a translational distance in the Z-axis direction. The above formula may be used to predict the horizontal azimuth angle, the inclination angle, the pitch angle, the translational distance in the X-axis direction, the translational distance in the Y-axis direction, and the translational distance in the Z-axis direction included in the second predicted head motion data, respectively.
In addition, the weight w_1 corresponding to the target head motion data M_t is greater than the weight corresponding to each historical head motion data, and the time difference between the moment at which each historical head motion data is acquired and the current moment is inversely proportional to the weight corresponding to that historical head motion data.
That is, the larger the time difference between the moment at which a piece of historical head motion data was acquired and the current moment, the smaller its correlation with the second predicted head motion data and the smaller its corresponding weight; the smaller this time difference, the larger its correlation with the second predicted head motion data and the larger its corresponding weight.
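A hedged sketch of this weighted moving average iteration; the decreasing weights are illustrative defaults of our own, since the description only fixes their ordering:

    import numpy as np

    def predict_weighted_moving_average(history, steps, weights=None):
        # history: samples ordered oldest -> newest, the last entry being M_t.
        # steps: number of moving-average iterations to run until M_u is reached.
        window = list(map(float, history))
        size = len(window)
        if weights is None:
            raw = np.arange(size, 0, -1, dtype=float)    # newest sample weighted most
            weights = raw / raw.sum()
        for _ in range(steps):
            newest_first = window[::-1]
            prediction = float(np.dot(weights, newest_first))
            window = window[1:] + [prediction]           # slide the window forward
        return window[-1]

    # Example: azimuth samples at the last four capture instants, predicted two steps ahead.
    print(predict_weighted_moving_average([8.0, 10.0, 13.0, 15.0], steps=2))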
In a fourth prediction mode, prediction is performed by an AI model. The terminal device uses the AI model to process the target head motion data and the historical head motion data acquired in the previous M times to obtain the second predicted head motion data; the AI model is trained based on a plurality of sample head motion data.
A plurality of sample head motion data are collected in advance, and are input into a neural network model, and the neural network model outputs a prediction result at each moment. And then, adjusting parameters in the neural network model according to the deviation of the predicted result and the actual result until the deviation of the predicted result and the actual result meets the requirements, so that the AI model is obtained through training.
Therefore, the terminal device may input the target head motion data acquired at the time t and the historical head motion data acquired M times before the time t into the trained AI model, and the AI model may output the second predicted head motion data.
It will be appreciated that the target head motion data, the historical head motion data, and the second predicted head motion data may each include a horizontal azimuth angle, a tilt angle, a pitch angle, a translation distance in the X-axis direction, a translation distance in the Y-axis direction, and a translation distance in the Z-axis direction.
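The following toy sketch only mirrors the data flow of this fourth prediction mode; a linear least-squares predictor stands in for the neural network, and the window length and trajectory are made up for the example:

    import numpy as np

    def build_training_set(samples, window):
        # Slice recorded sample head-motion data into (input window, next value) pairs.
        X = np.array([samples[i:i + window] for i in range(len(samples) - window)])
        y = np.array(samples[window:])
        return X, y

    def train(samples, window=4):
        X, y = build_training_set(samples, window)
        coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)   # fit the stand-in model
        return coeffs

    def predict(coeffs, recent_window):
        # recent_window: the target sample plus the previous M captures, oldest first.
        return float(np.dot(coeffs, recent_window))

    # Example with a synthetic azimuth trajectory.
    trajectory = np.sin(np.linspace(0.0, 3.0, 60)) * 30.0
    model = train(trajectory, window=4)
    print(predict(model, trajectory[-4:]))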
As shown in fig. 7 and 8, the head motion data prediction and compensation module inputs the second predicted head motion data to the filter selection and processing module after predicting the second predicted head motion data. The filter selection and processing module in the terminal device may obtain BRIR filters corresponding to the second predicted head motion data from a BRIR database.
As shown in fig. 7, for the case where the first audio signal is a mono audio signal or a binaural audio signal, an upmix module in the terminal device performs upmix processing on the first audio signal; then, the spatial audio rendering module renders the first audio signal after upmixing processing by adopting a BRIR filter corresponding to the second prediction head motion data to obtain a fifth audio signal; finally, the second down-mixing module performs down-mixing processing on the fifth audio signal to obtain a target audio signal.
Specifically, the spatial audio rendering module convolves the upmixed first audio signal with the BRIR filter corresponding to the second predicted head motion data to obtain the fifth audio signal, which is then down-mixed into the target audio signal.
It should be noted that the parameters input to the filter selecting and processing module may further include the number of channels of the first audio signal. Each channel in the first audio signal corresponds to two BRIR filters. Therefore, in a specific downmix manner of the second downmix module, convolution results of each channel in the first audio signal and its corresponding left BRIR filter are actually added to obtain a left channel target audio signal, and convolution results of each channel in the first audio signal and its corresponding right BRIR filter are added to obtain a right channel target audio signal.
As shown in fig. 11, the upmixed first audio signal may include three channels, which are respectively the left channel first audio signal S1, the right channel first audio signal S2, and the center channel first audio signal S3. The BRIR filters obtained from the BRIR database are respectively: the left BRIR filter h1_l corresponding to the left channel, the right BRIR filter h1_r corresponding to the left channel, the left BRIR filter h2_l corresponding to the right channel, the right BRIR filter h2_r corresponding to the right channel, the left BRIR filter h3_l corresponding to the center channel, and the right BRIR filter h3_r corresponding to the center channel.

Therefore, rendering the upmixed first audio signal with the BRIR filters corresponding to the second predicted head motion data actually means: convolving the left channel first audio signal S1 with the left BRIR filter h1_l corresponding to the left channel, convolving the right channel first audio signal S2 with the left BRIR filter h2_l corresponding to the right channel, convolving the center channel first audio signal S3 with the left BRIR filter h3_l corresponding to the center channel, convolving the left channel first audio signal S1 with the right BRIR filter h1_r corresponding to the left channel, convolving the right channel first audio signal S2 with the right BRIR filter h2_r corresponding to the right channel, and convolving the center channel first audio signal S3 with the right BRIR filter h3_r corresponding to the center channel.
Finally, the target audio signal is obtained by down-mixing according to the following formulas:

L = conv(S1, h1_l) + conv(S2, h2_l) + conv(S3, h3_l)
R = conv(S1, h1_r) + conv(S2, h2_r) + conv(S3, h3_r)

Where L represents the left channel target audio signal among the target audio signals, R represents the right channel target audio signal among the target audio signals, and conv represents convolution processing.
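An illustrative sketch of this render-and-downmix step for the three-channel case; the signal lengths and random data are placeholders standing in for S1, S2, S3 and the six BRIR filters:

    import numpy as np

    def render_and_downmix(channels, brir_left, brir_right):
        # Convolve every channel with its left/right BRIR filter and sum per ear.
        left = sum(np.convolve(s, h) for s, h in zip(channels, brir_left))
        right = sum(np.convolve(s, h) for s, h in zip(channels, brir_right))
        return left, right   # left/right channel target audio signals

    rng = np.random.default_rng(0)
    signals = [rng.standard_normal(480) for _ in range(3)]
    h_left = [rng.standard_normal(256) for _ in range(3)]
    h_right = [rng.standard_normal(256) for _ in range(3)]
    L, R = render_and_downmix(signals, h_left, h_right)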
As shown in fig. 8, for the case that the first audio signal includes at least three channels, the spatial audio rendering module in the terminal device directly adopts a BRIR filter corresponding to the second predicted head motion data to render the first audio signal, so as to obtain a fifth audio signal. Finally, the second down-mixing module performs down-mixing processing on the fifth audio signal to obtain a target audio signal. For a specific manner of the downmix processing, reference may be made to the embodiment corresponding to fig. 11.
In a second scenario, the available main frequency of the terminal device is greater than the first preset frequency, the remaining power of the earphone device is less than or equal to the preset power, and the decision module determines that a target audio processing mode adopted during spatial audio rendering is a second audio processing mode.
As shown in fig. 12 and 13, the headphone device may collect first head movement data of a user wearing the headphone device and issue the first head movement data to the terminal device. The headset device then transmits the first head movement data at the second transmission frequency. And, the head motion data capturing module in the terminal device can acquire the user image acquired by the camera, and generate second head motion data based on the user image acquired by the camera.
When the residual electric quantity is insufficient, the earphone device can actively reduce the capturing and issuing frequency of the first head movement data so as to reduce the power consumption of the earphone device. Thus, in the second audio processing mode, the headset device transmits the first head movement data at a second transmission frequency, and the second transmission frequency is smaller than the first transmission frequency. Therefore, when the first audio signal needs to be rendered, a situation may occur in which the first head motion data is missing.
Therefore, the terminal device can adopt the compensation mode of the head motion data prediction and compensation module to compensate for the missing first head motion data, thereby reducing the hysteresis of the target audio signal rendered by the terminal device.
Specifically, fig. 14 is a flowchart of compensating head motion data when the available main frequency of the terminal device is sufficient and the remaining power of the earphone device is insufficient in the embodiment of the present application. Referring to fig. 14, the head motion data prediction and compensation module in the terminal device may perform the following steps:
s1401, the terminal device determines whether there is first head movement data issued by the headphone device.
When the terminal device is ready to render the first audio signal, the head motion data prediction and compensation module in the terminal device may first determine whether the first head motion data exists currently. If the first head movement data does not exist, the following S1402 is performed. And if the first head motion data exists, directly acquiring a BRIR filter corresponding to the first head motion data from a BRIR database so as to render the first audio signal.
S1402, if there is no first head movement data, the terminal device determines whether there is second head movement data.
If the head motion data prediction and compensation module determines that there is currently no first head motion data, it may further determine whether there is second head motion data. If there is the second head motion data, the following S1403 is performed; if not, the following S1404 is performed.
S1403, if there is second head motion data, the terminal device determines the second head motion data as head motion compensation data.
S1404, if the second head motion data does not exist, the terminal device predicts first predicted head motion data based on the historical head motion data acquired in the previous N times.
S1405, the terminal device determines the first predicted head motion data as head motion compensation data.
Wherein N is a positive integer; the historical head movement data comprises first historical head movement data previously issued by the earphone device and/or second historical head movement data previously generated by the terminal device based on user images acquired by the camera.
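A hedged sketch of the S1401 to S1405 decision flow; the predictor argument is a placeholder for whichever of the four prediction modes described below is used:

    def choose_head_motion_data(first_head_data, second_head_data, history, predictor):
        # Returns the data used to select the BRIR filter for the current frame.
        if first_head_data is not None:       # S1401: earphone data is present
            return first_head_data
        if second_head_data is not None:      # S1402/S1403: fall back to camera data
            return second_head_data
        return predictor(history)             # S1404/S1405: predict from history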
The embodiment of the application can predict and obtain the first predicted head motion data in the following four prediction modes. Wherein the historical head movement data may be first historical head movement data or second historical head movement data.
Alternatively, in some implementations, the head motion data prediction and compensation module may calculate two types of first predicted head motion data using the first historical head motion data and the second historical head motion data as the historical head motion data, respectively, and finally perform weighted summation on the two types of first predicted head motion data to obtain final first predicted head motion data.
In a first prediction mode, the terminal equipment determines a first time difference between a current time and a time when the historical head motion data is acquired last time, wherein the historical head motion data acquired last time comprises a historical rotation angle and a historical rotation angular velocity; the terminal equipment sums the product of the historical rotation angular velocity and the first time difference with the historical rotation angle to obtain first prediction head movement data.
It should be noted that, the current time may be understood as a time when the first audio signal is ready to be rendered, and the first audio signal may be a frame of audio input signal in the audio input stream.
It will be appreciated that, for the specific implementation of the first prediction mode of the first predicted head motion data, reference may be made to the description of the first prediction mode of the second predicted head motion data, which is not described herein.
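A minimal sketch of this first prediction mode, with illustrative values of our own:

    def predict_first_mode(hist_angle, hist_angular_velocity, time_diff_s):
        # predicted angle = historical angle + historical angular velocity * time difference
        return hist_angle + hist_angular_velocity * time_diff_s

    # Example: 12 deg azimuth, rotating at 40 deg/s, 50 ms since the last capture.
    print(predict_first_mode(12.0, 40.0, 0.05))  # 14.0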
In a second prediction mode, the terminal device determines the difference between twice the first historical moment data and the second historical moment data as the first predicted head movement data; the first historical moment data are the historical head movement data acquired in the previous time, and the second historical moment data are the historical head movement data acquired in the time before the first historical moment data.
It will be appreciated that, for the specific implementation of the second prediction mode of the first predicted head motion data, reference may be made to the description of the second prediction mode of the second predicted head motion data, which is not described herein.
In a third prediction mode, the terminal equipment performs weighted summation on the historical head motion data acquired in the previous N times to obtain first prediction head motion data; wherein N is a positive integer greater than 1, and the time difference between the time at which the historical head movement data is acquired each time and the current time is inversely proportional to the weight corresponding to the historical head movement data.
A fourth prediction mode is that the terminal equipment adopts an AI model to process the historical head motion data acquired for the previous N times, and predicts to obtain first predicted head motion data; the AI model is trained based on a plurality of sample head motion data.
It can be appreciated that, for the specific implementation of the fourth prediction mode of the first predicted head motion data, reference may be made to the description of the fourth prediction mode of the second predicted head motion data, which is not described herein.
As shown in fig. 12 and 13, after the head motion data prediction and compensation module acquires the head motion compensation data, the head motion compensation data is input to the filter selection and processing module. The filter selection and processing module in the terminal device may obtain BRIR filters corresponding to the head motion compensation data from a BRIR database.
As shown in fig. 12, for the case where the first audio signal is a mono audio signal or a binaural audio signal, an upmix module in the terminal device performs upmix processing on the first audio signal; then, the spatial audio rendering module adopts a BRIR filter corresponding to the head motion compensation data to render the first audio signal after the upmixing processing to obtain a third audio signal; finally, the second down-mixing module performs down-mixing processing on the third audio signal to obtain a target audio signal.
As shown in fig. 13, for the case that the first audio signal includes at least three channels, the spatial audio rendering module in the terminal device directly adopts a BRIR filter corresponding to the head motion compensation data to render the first audio signal, so as to obtain a third audio signal. Finally, the second down-mixing module performs down-mixing processing on the third audio signal to obtain a target audio signal.
It may be appreciated that the manner in which the second downmix module performs the downmix processing on the third audio signal may refer to the corresponding embodiment of fig. 11.
Thus, it can be seen that the second audio processing mode differs from the fourth audio processing mode in that: in the fourth audio processing mode, the earphone device transmits the first head motion data at a first transmission frequency, while in the second audio processing mode, the earphone device transmits the first head motion data at a second transmission frequency that is smaller than the first transmission frequency; and the head motion data prediction and compensation module is in the prediction mode in the fourth audio processing mode, whereas it is in the compensation mode in the second audio processing mode. The rest of the second audio processing mode is substantially similar to the fourth audio processing mode.
In a third scenario, the available main frequency of the terminal device is smaller than or equal to a first preset frequency, the remaining capacity of the earphone device is larger than the preset capacity, and the decision module determines that a target audio processing mode adopted during spatial audio rendering is a first audio processing mode.
As shown in fig. 15 and 16, the earphone device may collect first head movement data of a user wearing the earphone device and issue the first head movement data to the terminal device; in this case, the earphone device transmits the first head movement data at the first transmission frequency. In addition, the head motion data capturing module in the terminal device is disabled, and the head motion data prediction and compensation module in the terminal device is also disabled.
The filter selection and processing module in the terminal equipment can acquire a corresponding target filter according to the first head motion data issued by the earphone equipment; the target filter includes any one of the following: FIR filters, IIR filters, or direct sound and early reflected sound portions in BRIR filters.
As shown in fig. 15, for the case that the first audio signal is a mono audio signal or a binaural audio signal, the spatial audio rendering module in the terminal device directly adopts a target filter corresponding to the first head motion data to render the first audio signal, so as to obtain a second audio signal; finally, a second down-mixing module in the terminal equipment performs down-mixing processing on the second audio signal to obtain a target audio signal.
As shown in fig. 16, for the case where the first audio signal includes at least three channels, a first downmix module in the terminal device performs a downmix process on the first audio signal, and downmixes the first audio signal into mono or bi channels; then, a spatial audio rendering module in the terminal equipment adopts a target filter corresponding to the first head motion data to render the first audio signal after the down-mixing processing to obtain a second audio signal; finally, a second down-mixing module in the terminal equipment performs down-mixing processing on the second audio signal to obtain a target audio signal.
It may be appreciated that the manner in which the second audio signal is downmixed by the second downmixing module may refer to the corresponding embodiment of fig. 11.
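The first-downmix step in fig. 16 could look roughly like the sketch below for a left/center/right input; the 0.707 center gain follows the common ITU-R BS.775 convention and is our assumption, since the patent does not specify mixing coefficients:

    def downmix_lcr_to_stereo(left, right, center):
        # left, right, center: numpy arrays of equal length.
        # Fold the center channel into both stereo channels with an assumed gain.
        gain_c = 0.707
        return left + gain_c * center, right + gain_c * center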
The complete BRIR filter may include a direct sound portion, an early reflected sound portion, and a late reverberation portion. When the complete BRIR filter is adopted to convolve with the first audio signal, a large amount of calculation force of the terminal equipment is required to be consumed, and when the available main frequency of the terminal equipment is insufficient, the real-time calculation requirement of spatial audio rendering cannot be met.
The direct sound part and the early reflected sound part contain a great deal of spatial information, while the late reverberation part mainly provides a sense of reverberation. Therefore, the terminal device can intercept the direct sound part and the early reflected sound part of the BRIR filter and convolve the first audio signal with only these intercepted parts. Compared with using the complete BRIR filter, some reverberation is sacrificed, but the spatial localization effect is still achieved, the complexity of spatial audio rendering is simplified to a certain extent, and the computation amount of spatial audio rendering is reduced.
Specifically, when the target filter is a direct sound part and an early reflected sound part in the BRIR filter, the filter selection and processing module in the terminal equipment can acquire a BRIR filter corresponding to the first head motion data from the BRIR database; the filter selection and processing module may then intercept the direct sound portion and the early reflected sound portion in the BRIR filter corresponding to the first head motion data.
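An illustrative sketch of intercepting the direct sound and early reflected sound portion; the 80 ms boundary is an assumption used only for the example, since the patent does not fix where the early reflections end:

    import numpy as np

    def truncate_brir(brir, sample_rate, early_ms=80.0):
        # Keep only the first `early_ms` milliseconds of the impulse response.
        cut = int(sample_rate * early_ms / 1000.0)
        return brir[:cut]

    # A 1-second BRIR at 48 kHz keeps only its first 3840 samples.
    brir = np.zeros(48000)
    brir[0] = 1.0
    print(truncate_brir(brir, 48000).shape)  # (3840,)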
In some embodiments, the BRIR filters in the BRIR database may be calculated to obtain an FIR database or an IIR database by solving for the error minima.
Specifically, a first set of coefficients and a second set of coefficients can be solved by minimizing the error, where H(e^jω) represents the BRIR filter and E_ee(e^jω) represents the error between the BRIR filter and the response formed from the two sets of coefficients to be solved. That is, the first set of coefficients and the second set of coefficients are obtained in the case where the error E_ee(e^jω) takes its minimum value.

After the first set of coefficients and the second set of coefficients are solved, their ratio is determined as an FIR filter or an IIR filter.

When one of the two sets of coefficients is fixed, the above ratio represents an FIR filter; when neither set of coefficients is fixed, the above ratio represents an IIR filter.
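As a rough, hedged illustration of this error-minimization idea, the sketch below fits a much shorter FIR filter to a BRIR by frequency-domain least squares; the formulation, frequency grid, and filter order are assumptions of ours, not the patent's exact solver:

    import numpy as np

    def fit_short_fir(brir, order, n_freq=512):
        # Fit an `order`-tap FIR filter whose frequency response approximates the BRIR's.
        w = np.linspace(0.0, np.pi, n_freq)
        n = np.arange(len(brir))
        target = np.array([np.sum(brir * np.exp(-1j * wk * n)) for wk in w])
        A = np.exp(-1j * np.outer(w, np.arange(order)))   # response of each tap per frequency
        b, *_ = np.linalg.lstsq(A, target, rcond=None)    # complex least squares
        return b.real                                     # keep the real-valued taps (simplification)

    rng = np.random.default_rng(1)
    long_brir = rng.standard_normal(4096) * np.exp(-np.arange(4096) / 300.0)
    short_fir = fit_short_fir(long_brir, order=256)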
As shown in fig. 17, a graph of the calculated FIR filter versus BRIR filter is shown in a manner that solves for the error minima. The abscissa is the sampling point and the ordinate is the amplitude of the signal.
It can be seen that the deviation between the FIR filter and the BRIR filter is small in the direct sound part and the early reflected sound part of the BRIR filter, and thus the complete BRIR filter can be approximately simulated with the FIR filter.
As shown in fig. 18, a graph of the IIR filter versus BRIR filter calculated by solving for the error minima is shown. The abscissa is the sampling point and the ordinate is the amplitude of the signal.
It can be seen that the deviation between the IIR filter and the BRIR filter is relatively small in the direct sound part and the early reflected sound part of the BRIR filter, and thus the complete BRIR filter can be approximately modeled using the IIR filter.
After the FIR database and the IIR database are obtained in the mode, a filter selection and processing module in the terminal equipment can acquire an FIR filter corresponding to the first head motion data from the FIR database; alternatively, the filter selection and processing module in the terminal device may acquire an IIR filter corresponding to the first head motion data from the IIR database.
In this way, the complete BRIR filter is approximately simulated by a shorter-order FIR filter or a shorter-order IIR filter, so that the computation amount of spatial audio rendering is greatly reduced and the computing-power requirement on the terminal device is met; at the same time, the spatial localization effect can still be achieved, and the spatial audio rendering effect is not degraded significantly.
In one implementation manner, if the BRIR database, the FIR database and the IIR database are included at the same time, the filter selection and processing module in the terminal device may further determine the target filter according to the available dominant frequency of the terminal device.
As shown in fig. 19, the filter selection and processing module in the terminal device may further determine a specific manner of the target filter according to the available dominant frequency of the terminal device, and may include the following steps:
s1901, determining whether the available dominant frequency of the terminal device is greater than a third preset frequency.
S1902, when the available dominant frequency is less than or equal to a third preset frequency, determining that the target filter is an IIR filter.
S1903, when the available dominant frequency is greater than the third preset frequency, determining whether the available dominant frequency of the terminal device is greater than the second preset frequency.
And S1904, when the available dominant frequency is smaller than or equal to a second preset frequency, determining the target filter as an FIR filter.
S1905, when the available dominant frequency is greater than the second preset frequency, determining whether the available dominant frequency of the terminal device is greater than the first preset frequency.
S1906, when the available dominant frequency is less than or equal to the first preset frequency, determining that the target filter is a direct sound part and an early reflected sound part in the BRIR filter.
And S1907, when the available dominant frequency is larger than the first preset frequency, directly adopting a BRIR filter to conduct spatial audio rendering.
The first preset frequency is larger than the second preset frequency, and the second preset frequency is larger than the third preset frequency. That is, if the available dominant frequency is less than or equal to the first preset frequency and greater than the second preset frequency, the target filter is a direct sound portion and an early reflected sound portion in the BRIR filter; if the available main frequency is smaller than or equal to the second preset frequency and larger than the third preset frequency, the target filter is an FIR filter; if the available dominant frequency is less than or equal to the third preset frequency, the target filter is an IIR filter.
Since the order of the IIR filter is shorter than that of the FIR filter, when the available dominant frequency of the terminal device is very low, spatial audio rendering may be performed using the IIR filter to further reduce the amount of computation of the spatial audio rendering.
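The S1901 to S1907 selection logic can be sketched as follows; the threshold values are placeholders, since the patent only fixes their ordering (first > second > third preset frequency):

    def select_target_filter(available_ghz, first=2.0, second=1.5, third=1.0):
        if available_ghz > first:
            return "full BRIR filter"
        if available_ghz > second:
            return "direct sound + early reflected sound of the BRIR filter"
        if available_ghz > third:
            return "FIR filter"
        return "IIR filter"

    print(select_target_filter(1.2))  # FIR filter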
In a fourth scenario, when the available main frequency of the terminal device is smaller than or equal to a first preset frequency and the remaining capacity of the earphone device is smaller than or equal to a preset capacity, the decision module determines that a target audio processing mode adopted in spatial audio rendering is a third audio processing mode.
As shown in fig. 20 and 21, the earphone device stops collecting the first head movement data of the user wearing the earphone device and stops issuing the first head movement data to the terminal device. In addition, the head motion data capturing module in the terminal device is disabled, and the head motion data prediction and compensation module in the terminal device is also disabled.
A filter selection and processing module in the terminal device directly acquires a target filter corresponding to preset head motion data. The horizontal azimuth angle, the inclination angle, the pitch angle, the translation distance in the X-axis direction, the translation distance in the Y-axis direction, and the translation distance in the Z-axis direction in the preset head motion data may all be 0. The target filter includes any one of the following: an FIR filter, an IIR filter, or the direct sound part and early reflected sound part of a BRIR filter.
It will be appreciated that the filter selection and processing module may further determine which of the direct sound portion and the early reflected sound portion of the FIR filter, the IIR filter, or the BRIR filter the target filter is in with respect to the implementation corresponding to fig. 19.
As shown in fig. 20, for the case that the first audio signal is a mono audio signal or a binaural audio signal, the spatial audio rendering module in the terminal device directly adopts a target filter corresponding to the preset head motion data to render the first audio signal, so as to obtain a fourth audio signal. And finally, a second down-mixing module in the terminal equipment performs down-mixing processing on the fourth audio signal to obtain a target audio signal.
As shown in fig. 21, for the case where the first audio signal includes at least three channels, a first downmix module in the terminal device performs a downmix process on the first audio signal; then, a spatial audio rendering module in the terminal equipment adopts a target filter corresponding to preset head motion data to render the first audio signal after the down-mixing processing to obtain a fourth audio signal; and finally, a second down-mixing module in the terminal equipment performs down-mixing processing on the fourth audio signal to obtain a target audio signal.
It may be appreciated that the manner of the second downmix module to downmix the fourth audio signal may refer to the corresponding embodiment of fig. 11.
In this way, when both the available main frequency of the terminal device and the remaining power of the earphone device are insufficient, the terminal device may disable every module that can be disabled, leaving only the most basic modules: the first downmix module, the filter selection and processing module, the spatial audio rendering module, and the second downmix module.
The audio processing method provided by the embodiment of the present application is described above with reference to fig. 5 to 21; the device for executing the method provided by the embodiment of the present application is described below. Fig. 22 is a schematic structural diagram of an audio processing device according to an embodiment of the present application. The audio processing device may be the terminal device in the embodiments of the present application, or a chip system within the terminal device.
As shown in fig. 22, the audio processing apparatus 2200 includes a processing unit 2201 and a communication unit 2202. Wherein the processing unit 2201 is configured to support the audio processing device 2200 to perform the above-mentioned processing steps; the communication unit 2202 is used to support the audio processing apparatus 2200 to perform the steps of data transmission and data reception described above. The communication unit 2202 may be an input or output interface, a pin, a circuit, or the like.
Specifically, the communication unit 2202 is configured to acquire a remaining power of the earphone device. The processing unit 2201 is configured to obtain an available primary frequency of the terminal device; if the available main frequency is smaller than or equal to a first preset frequency and the residual electric quantity is larger than the preset electric quantity, processing the first audio signal by adopting a first audio processing mode to obtain a target audio signal; if the available main frequency is larger than the first preset frequency and the residual electric quantity is smaller than or equal to the preset electric quantity, the terminal equipment adopts a second audio processing mode to process the first audio signal, and a target audio signal is obtained. The communication unit 2202 is also used to transmit a target audio signal to the headphone device.
In one possible implementation, the audio processing device 2200 also includes a storage unit 2203. The storage unit 2203 and the processing unit 2201 are connected by a line. The storage unit 2203 may include one or more memories, which may be one or more devices or circuits for storing programs or data. The storage unit 2203 may exist separately and be connected to the processing unit 2201 through a communication bus, or may be integrated with the processing unit 2201.
The storage unit 2203 may store computer-executable instructions of the method in the terminal device to cause the processing unit 2201 to perform the method in the above-described embodiment. The storage unit 2203 may be a register, a cache, or a random access memory (random access memory, RAM), etc., and the storage unit 2203 may be integrated with the processing unit 2201. The storage unit 2203 may be a read-only memory (ROM) or other type of static storage device that may store static information and instructions, and the storage unit 2203 may be independent of the processing unit 2201.
Fig. 23 is a schematic structural diagram of a chip according to an embodiment of the present application. As shown in fig. 23, the chip 2300 includes one or more (including two) processors 2301, communication lines 2302, and communication interfaces 2303, and optionally, the chip 2300 further includes a memory 2304.
In some implementations, the memory 2304 stores the following elements: executable modules or data structures, or a subset thereof, or an extended set thereof.
The methods described above for embodiments of the present application may be applied to the processor 2301 or implemented by the processor 2301. The processor 2301 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the methods described above may be performed by integrated logic circuitry in hardware or by instructions in software in the processor 2301. The processor 2301 may be a general purpose processor (e.g., a microprocessor or a conventional processor), a DSP, an application specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and the processor 2301 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the application.
The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium that is mature in the art, such as random access memory, read-only memory, programmable read-only memory, or electrically erasable programmable read-only memory (electrically erasable programmable read only memory, EEPROM). The storage medium is located in the memory 2304, and the processor 2301 reads the information in the memory 2304 and performs the steps of the method described above in combination with its hardware.
The processor 2301, the memory 2304 and the communication interface 2303 can communicate with each other through a communication line 2302.
In the above embodiments, the instructions stored by the memory for execution by the processor may be implemented in the form of a computer program product. The computer program product may be written in the memory in advance, or may be downloaded in the form of software and installed in the memory.
Embodiments of the present application also provide a computer program product comprising one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (digital subscriber line, DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may also be a semiconductor medium (e.g., a solid state disk, SSD), or the like.
The embodiment of the application provides terminal equipment, which comprises a processor and a memory, wherein the memory is used for storing a computer program, and the processor is used for executing the computer program to execute the audio processing method executed by the terminal equipment.
The embodiment of the application also provides a computer readable storage medium. The methods described in the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. Computer readable media can include computer storage media and communication media and can include any medium that can transfer a computer program from one place to another. The storage media may be any target media that is accessible by a computer.
As one possible design, the computer-readable medium may include compact disc read-only memory (CD-ROM), RAM, ROM, EEPROM, or other optical disc storage; the computer readable medium may include disk storage or other disk storage devices. Moreover, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, DVD, floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
Combinations of the above should also be included within the scope of computer-readable media.

The foregoing is merely illustrative of specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or substitution that can readily occur to a person skilled in the art within the technical scope disclosed by the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (25)

1. The audio processing method is characterized by being applied to terminal equipment, wherein the terminal equipment is in communication connection with earphone equipment and comprises a camera; the method comprises the following steps:
the terminal equipment acquires the residual electric quantity of the earphone equipment and the available main frequency of the terminal equipment;
if the available main frequency is smaller than or equal to a first preset frequency and the residual electric quantity is larger than a preset electric quantity, the terminal equipment adopts a first audio processing mode to process a first audio signal to obtain a target audio signal;
if the available main frequency is larger than the first preset frequency and the residual electric quantity is smaller than or equal to the preset electric quantity, the terminal equipment adopts a second audio processing mode to process the first audio signal to obtain a target audio signal;
The terminal equipment sends the target audio signal to the earphone equipment;
the first audio processing mode is used for indicating a target filter corresponding to first head motion data issued by the earphone device to render the first audio signal; the target filter includes any one of the following: an FIR filter, an IIR filter, or a direct sound portion and an early reflected sound portion in a BRIR filter;
the second audio processing mode is used for indicating that when the first head motion data issued by the earphone device does not exist, a BRIR filter corresponding to the head motion compensation data is adopted to render the first audio signal; the head motion compensation data includes: second head motion data generated based on the user image captured by the camera, or first predicted head motion data predicted based on historical head motion data.
2. The method of claim 1, wherein the terminal device processes the first audio signal in the first audio processing mode to obtain the target audio signal, comprising:
the terminal equipment acquires first head motion data issued by the earphone equipment;
The terminal equipment acquires a target filter corresponding to the first head motion data;
the terminal equipment adopts a target filter corresponding to the first head motion data to render the first audio signal to obtain a second audio signal;
and the terminal equipment performs down mixing processing on the second audio signal to obtain the target audio signal.
3. The method according to claim 2, wherein the terminal device obtains a target filter corresponding to the first head movement data, comprising:
the terminal equipment acquires a BRIR filter corresponding to the first head motion data from a BRIR database;
the terminal equipment intercepts a direct sound part and an early reflected sound part in a BRIR filter corresponding to the first head motion data;
or the terminal equipment acquires an FIR filter corresponding to the first head motion data from an FIR database;
or the terminal equipment acquires an IIR filter corresponding to the first head motion data from an IIR database.
4. The method according to claim 2, wherein the terminal device renders the first audio signal with a target filter corresponding to the first head motion data to obtain a second audio signal, comprising:
If the first audio signal comprises at least three channels, the terminal equipment performs down-mixing processing on the first audio signal;
the terminal equipment adopts a target filter corresponding to the first head motion data to render the first audio signal after the down-mixing processing to obtain the second audio signal;
or if the first audio signal is a mono audio signal or a binaural audio signal, the terminal device directly adopts a target filter corresponding to the first head motion data to render the first audio signal, so as to obtain the second audio signal.
5. The method of claim 1, wherein the terminal device processes the first audio signal in the second audio processing mode to obtain the target audio signal, comprising:
the terminal equipment determines whether first head motion data issued by the earphone equipment exist or not;
if the first head motion data does not exist, the terminal equipment acquires head motion compensation data;
the terminal equipment acquires a BRIR filter corresponding to the head motion compensation data from a BRIR database;
the terminal equipment adopts a BRIR filter corresponding to the head motion compensation data to render the first audio signal to obtain a third audio signal;
The terminal equipment performs down-mixing processing on the third audio signal to obtain the target audio signal;
wherein, in the first audio processing mode, the headphone apparatus transmits the first head movement data at a first transmission frequency; in the second audio processing mode, the headset device transmits the first head movement data at a second transmission frequency, and the second transmission frequency is less than the first transmission frequency.
6. The method of claim 5, wherein the terminal device obtaining the head motion compensation data if the first head motion data is not present comprises:
if the first head motion data does not exist, the terminal equipment determines whether the second head motion data exists or not;
if the second head motion data exists, the terminal equipment determines the second head motion data as the head motion compensation data;
if the second head motion data does not exist, the terminal equipment predicts and obtains the first predicted head motion data based on the historical head motion data acquired for the previous N times;
wherein, N is a positive integer; the historical head movement data comprises first historical head movement data previously issued by the earphone device and/or second historical head movement data previously generated by the terminal device based on user images acquired by the camera.
7. The method of claim 6, wherein the predicting by the terminal device the first predicted head motion data based on the previous N acquired historical head motion data comprises:
the terminal equipment determines a first time difference between the current time and the time of acquiring the historical head motion data at the previous time; the historical head movement data acquired in the previous time comprises a historical rotation angle and a historical rotation angular velocity;
and the terminal equipment sums the product of the historical rotation angular velocity and the first time difference with the historical rotation angle to obtain the first predicted head movement data.
8. The method of claim 6, wherein the predicting by the terminal device the first predicted head motion data based on the previous N acquired historical head motion data comprises:
the terminal equipment determines the difference value between the twice first historical moment data and the second historical moment data as the first predicted head movement data;
the first historical moment data are historical head movement data acquired in the previous time, and the second historical moment data are historical head movement data acquired in the previous time of the first historical moment data.
9. The method of claim 6, wherein the predicting by the terminal device the first predicted head motion data based on the previous N acquired historical head motion data comprises:
the terminal equipment performs weighted summation on the historical head motion data acquired for the previous N times to obtain first prediction head motion data;
wherein N is a positive integer greater than 1, and a time difference between a time at which the historical head movement data is acquired each time and a current time is inversely proportional to a weight corresponding to the historical head movement data.
10. The method of claim 6, wherein the predicting by the terminal device the first predicted head motion data based on the previous N acquired historical head motion data comprises:
the terminal device adopts an AI model to process the historical head motion data acquired for the previous N times, and predicts to obtain the first predicted head motion data; the AI model is trained based on a plurality of sample head motion data.
11. The method of claim 5, wherein the terminal device renders the first audio signal using a BRIR filter corresponding to the head motion compensation data to obtain a third audio signal, comprising:
If the first audio signal is a mono audio signal or a binaural audio signal, the terminal device performs upmixing processing on the first audio signal;
the terminal equipment adopts a BRIR filter corresponding to the head motion compensation data to render the first audio signal after upmixing processing to obtain the third audio signal;
or if the first audio signal includes at least three channels, the terminal device directly adopts a BRIR filter corresponding to the head motion compensation data to render the first audio signal, so as to obtain the third audio signal.
12. The method according to claim 1, further comprising, after the terminal device obtains the remaining power of the earphone device and the available dominant frequency of the terminal device:
if the available main frequency is smaller than or equal to the first preset frequency and the residual electric quantity is smaller than or equal to the preset electric quantity, the terminal equipment adopts a third audio processing mode to process the first audio signal to obtain a target audio signal;
the third audio processing mode is used for indicating a target filter corresponding to preset head motion data to render the first audio signal; the target filter includes any one of the following: FIR filters, IIR filters, or direct sound and early reflected sound portions in BRIR filters.
13. The method of claim 12, wherein the terminal device processes the first audio signal in the third audio processing mode to obtain the target audio signal, comprising:
the terminal equipment acquires a target filter corresponding to the preset head motion data;
the terminal equipment adopts a target filter corresponding to the preset head motion data to render the first audio signal to obtain a fourth audio signal;
and the terminal equipment performs down mixing processing on the fourth audio signal to obtain the target audio signal.
14. The method of claim 13, wherein the terminal device renders the first audio signal using a target filter corresponding to the preset head motion data to obtain a fourth audio signal, comprising:
if the first audio signal comprises at least three channels, the terminal equipment performs down-mixing processing on the first audio signal;
the terminal equipment adopts a target filter corresponding to the preset head motion data to render the first audio signal after the down-mixing processing to obtain the fourth audio signal;
Or if the first audio signal is a mono audio signal or a binaural audio signal, the terminal device directly adopts a target filter corresponding to the preset head motion data to render the first audio signal, so as to obtain the fourth audio signal.
15. The method according to claim 1 or 12, wherein the target filter is a direct sound part and an early reflected sound part in the BRIR filter if the available dominant frequency is less than or equal to the first preset frequency and greater than a second preset frequency; the first preset frequency is greater than the second preset frequency;
if the available main frequency is smaller than or equal to the second preset frequency and larger than a third preset frequency, the target filter is the FIR filter; the second preset frequency is greater than the third preset frequency;
and if the available dominant frequency is smaller than or equal to the third preset frequency, the target filter is the IIR filter.
16. The method according to claim 1, further comprising, after the terminal device obtains the remaining power of the earphone device and the available dominant frequency of the terminal device:
If the available main frequency is larger than the first preset frequency and the residual electric quantity is larger than the preset electric quantity, the terminal equipment adopts a fourth audio processing mode to process the first audio signal to obtain a target audio signal;
wherein the fourth audio processing mode is configured to instruct a BRIR filter corresponding to second predicted head motion data to render the first audio signal; the second predicted head motion data is predicted based on target head motion data or the second predicted head motion data is predicted based on the target head motion data and historical head motion data;
the target head motion data comprises first head motion data issued by the earphone device and/or second head motion data generated by the terminal device based on a user image acquired by the camera; the historical head movement data comprises first historical head movement data previously issued by the earphone device and/or second historical head movement data previously generated by the terminal device based on user images acquired by the camera.
17. The method of claim 16, wherein the terminal device processes the first audio signal in a fourth audio processing mode to obtain the target audio signal, comprising:
The terminal equipment predicts and obtains second predicted head motion data according to the target head motion data;
or the terminal equipment predicts the second predicted head motion data according to the target head motion data and the historical head motion data;
the terminal equipment acquires a BRIR filter corresponding to the second predicted head motion data from a BRIR database;
the terminal equipment adopts a BRIR filter corresponding to the second prediction head motion data to render the first audio signal to obtain a fifth audio signal;
and the terminal equipment performs down-mixing processing on the fifth audio signal to obtain the target audio signal.
18. The method of claim 17, wherein the target head motion data comprises a target rotation angle and a target rotation angular velocity; the terminal device predicts and obtains the second predicted head motion data according to the target head motion data, and the method comprises the following steps:
the terminal equipment determines a second time difference between the current time and the time of playing the target audio signal through the earphone equipment;
and the terminal equipment sums the product of the target rotation angular velocity and the second time difference with the target rotation angle to obtain the second predicted head motion data.
19. The method of claim 17, wherein the predicting by the terminal device the second predicted head movement data from the target head movement data and the historical head movement data comprises:
the terminal device determines, as the second predicted head movement data, a difference between twice the target head movement data and the previously acquired historical head movement data.
20. The method of claim 17, wherein the predicting by the terminal device the second predicted head movement data from the target head movement data and the historical head movement data comprises:
the terminal equipment processes the target head motion data and the historical head motion data acquired in the previous M times by adopting a weighted moving average method to obtain second predicted head motion data;
wherein M is a positive integer; and the weight corresponding to the target head movement data is larger than the weight corresponding to the historical head movement data obtained each time, and the time difference between the moment when the historical head movement data is obtained each time and the current moment is inversely proportional to the weight corresponding to the historical head movement data.
21. The method of claim 17, wherein the predicting by the terminal device the second predicted head movement data from the target head movement data and the historical head movement data comprises:
the terminal device adopts an AI model to process the target head motion data and the historical head motion data acquired in the previous M times to obtain second predicted head motion data; and M is a positive integer, and the AI model is trained based on a plurality of sample head motion data.
22. The method of claim 17, wherein the terminal device renders the first audio signal using a BRIR filter corresponding to the second predicted head motion data to obtain a fifth audio signal, comprising:
if the first audio signal is a mono audio signal or a binaural audio signal, the terminal device performs upmixing processing on the first audio signal;
the terminal equipment adopts a BRIR filter corresponding to the second prediction head motion data to render the first audio signal after upmixing processing to obtain the fifth audio signal;
or if the first audio signal includes at least three channels, the terminal device directly adopts a BRIR filter corresponding to the second predicted head motion data to render the first audio signal, so as to obtain the fifth audio signal.
23. A terminal device comprising a memory for storing a computer program and a processor for invoking the computer program to perform the audio processing method of any of claims 1 to 22.
24. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program or instructions which, when executed, implement the audio processing method of any of claims 1 to 22.
25. A computer program product comprising a computer program which, when run, causes a computer to perform the audio processing method of any one of claims 1 to 22.
CN202211214411.1A 2022-09-30 2022-09-30 Audio processing method and terminal equipment Active CN116709159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211214411.1A CN116709159B (en) 2022-09-30 2022-09-30 Audio processing method and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211214411.1A CN116709159B (en) 2022-09-30 2022-09-30 Audio processing method and terminal equipment

Publications (2)

Publication Number Publication Date
CN116709159A true CN116709159A (en) 2023-09-05
CN116709159B CN116709159B (en) 2024-05-14

Family

ID=87826380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211214411.1A Active CN116709159B (en) 2022-09-30 2022-09-30 Audio processing method and terminal equipment

Country Status (1)

Country Link
CN (1) CN116709159B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101714379A (en) * 2008-10-08 2010-05-26 安凯(广州)软件技术有限公司 Audio resampling method
CN103460716A (en) * 2011-04-08 2013-12-18 高通股份有限公司 Integrated psychoacoustic bass enhancement (PBE) for improved audio
US20150030160A1 (en) * 2013-07-25 2015-01-29 Electronics And Telecommunications Research Institute Binaural rendering method and apparatus for decoding multi channel audio
CN106105269A (en) * 2014-03-19 2016-11-09 韦勒斯标准与技术协会公司 Acoustic signal processing method and equipment
CN106165454A (en) * 2014-04-02 2016-11-23 韦勒斯标准与技术协会公司 Acoustic signal processing method and equipment
US20180091920A1 (en) * 2016-09-23 2018-03-29 Apple Inc. Producing Headphone Driver Signals in a Digital Audio Signal Processing Binaural Rendering Environment
CN107797643A (en) * 2017-09-04 2018-03-13 努比亚技术有限公司 Reduce method, terminal and the computer-readable recording medium of terminal operating power consumption
CN111385728A (en) * 2018-12-29 2020-07-07 华为技术有限公司 Audio signal processing method and device
CN114531640A (en) * 2018-12-29 2022-05-24 华为技术有限公司 Audio signal processing method and device
CN111128218A (en) * 2019-12-31 2020-05-08 恒玄科技(上海)股份有限公司 Echo cancellation method and device
CN113709298A (en) * 2020-05-20 2021-11-26 华为技术有限公司 Multi-terminal task allocation method
CN114067810A (en) * 2020-07-31 2022-02-18 华为技术有限公司 Audio signal rendering method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117496938A (en) * 2023-12-22 2024-02-02 浙江恒逸石化有限公司 Noise processing method and device, electronic equipment and storage medium
CN117496938B (en) * 2023-12-22 2024-03-15 浙江恒逸石化有限公司 Noise processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN116709159B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
EP3440538B1 (en) Spatialized audio output based on predicted position data
US10924877B2 (en) Audio signal processing method, terminal and storage medium thereof
US20200029164A1 (en) Interpolating audio streams
WO2019128629A1 (en) Audio signal processing method and apparatus, terminal and storage medium
US11429340B2 (en) Audio capture and rendering for extended reality experiences
US20210160644A1 (en) Priority-based soundfield coding for virtual reality audio
CN108346432B (en) Virtual reality VR audio processing method and corresponding equipment
US11580213B2 (en) Password-based authorization for audio rendering
CN114727212B (en) Audio processing method and electronic equipment
EP3550860A1 (en) Rendering of spatial audio content
WO2021003358A1 (en) Timer-based access for audio streaming and rendering
US11937065B2 (en) Adjustment of parameter settings for extended reality experiences
US12010490B1 (en) Audio renderer based on audiovisual information
CN116709159B (en) Audio processing method and terminal equipment
TW202133625A (en) Selecting audio streams based on motion
CN114422935B (en) Audio processing method, terminal and computer readable storage medium
CN116569255A (en) Vector field interpolation of multiple distributed streams for six degree of freedom applications
CN116095254B (en) Audio processing method and device
CN116347320B (en) Audio playing method and electronic equipment
US11601776B2 (en) Smart hybrid rendering for augmented reality/virtual reality audio
CN117676002A (en) Audio processing method and electronic equipment
CN116781817A (en) Binaural sound pickup method and device
US20230413002A1 (en) Electronic device for applying directionality to audio signal, and method therefor
CN116743913B (en) Audio processing method and device
CN117692845A (en) Sound field calibration method, electronic equipment and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant