CN115802244A

CN115802244A - Virtual bass generation method, medium, and electronic device

Info

Publication number: CN115802244A
Application number: CN202211632895.1A
Authority: CN
Inventors: 黄乘黄; 张宁宁
Original assignee: Shanghai Awinic Technology Co Ltd
Current assignee: Shanghai Awinic Technology Co Ltd
Priority date: 2022-12-19
Filing date: 2022-12-19
Publication date: 2023-03-14

Abstract

The application relates to the technical field of audio processing and discloses a virtual bass generation method, medium, and electronic device that can generate more expressive virtual bass, so that the pitch, loudness, and timbre of the virtual bass are close to the subjective perception of the missing actual bass. The method comprises: acquiring an input signal frame of an audio signal; acquiring the fundamental frequency of the input signal frame; determining harmonic strategy information according to the fundamental frequency of the input signal frame, wherein the harmonic strategy information comprises the harmonic orders and the harmonic amplitude ratios; and generating a harmonic combination signal corresponding to the virtual bass according to the harmonic strategy information. The method is particularly applicable to electronic devices with small-size loudspeakers.

Description

Virtual bass generation method, medium, and electronic device

Technical Field

The present application relates to the field of audio processing technologies, and in particular, to a virtual bass generation method, medium, and electronic device.

Background

In electronic devices with small speakers (i.e., micro-speakers), psychoacoustics-based virtual bass techniques are often used to reproduce missing low-frequency sound. Instead of the original low-frequency signal, virtual bass techniques typically play a combination of higher harmonics of its fundamental, which preserves the perceived pitch without requiring a larger speaker or more speaker power, thereby avoiding damage to the speaker from excessive power.

The key points of virtual bass technology are as follows.

1. The mechanism for generating the higher harmonics. Depending on how the harmonics are generated, virtual bass techniques can be divided into time-domain methods and frequency-domain methods. The frequency-domain method is based on a phase vocoder: it analyzes the frequency, phase, amplitude, and other information of the low-frequency signal in the frequency domain by means of the Fast Fourier Transform (FFT) and constructs each harmonic on that basis. The time-domain method applies a nonlinear device (NLD) function and generates each harmonic directly in the time domain.

2. Harmonic order selection and amplitude adjustment. Whichever harmonic generation mechanism is used, suitable harmonic orders and reasonable amplitude ratios between the harmonics must be determined to achieve the best subjective bass experience after mixing, namely a deep and sufficiently strong bass that sounds neither too muddy nor too hollow.

With the existing frequency-domain method, the frequency, phase, and amplitude of the fundamental signal at its integer multiples can be adjusted precisely in the frequency domain, and each harmonic signal in the time domain is obtained through a frequency-to-time conversion, so in theory the pitch, loudness, and timbre of the bass can be fully restored. However, because the signal is computed and synthesized, the limited precision of the synthesis computation loses part of the sound characteristics, and the phase adjustment introduces further distortion. The resulting harmonic distortion manifests as strong coloration of the sound and an unnatural listening impression. In addition, both the synthesis computation and the round trip from the time domain to the frequency domain and back to the time domain require a large amount of computation and memory.

With the existing time-domain method, the signal is processed entirely in the time domain with no synthesis computation in the frequency domain, so the computation and memory footprint are small; compared with the frequency-domain method, processing is faster and better suited to commercial products. However, existing time-domain methods generate the harmonics with a single fixed nonlinear device (NLD) function, so the harmonic orders and the amplitude ratios between harmonics follow a fixed strategy, and practice shows that such a strategy cannot effectively meet the requirements of restoring all kinds of fundamental signals.

Disclosure of Invention

The embodiments of the application provide a virtual bass generation method, medium, and electronic device, so that the generated virtual bass is more expressive and its pitch, loudness, and timbre are close to the subjective perception of the missing actual bass.

In a first aspect, an embodiment of the present application provides a virtual bass generation method applied to an electronic device, including: acquiring an input signal frame of an audio signal; acquiring the fundamental frequency of the input signal frame; determining harmonic strategy information according to the fundamental frequency of the input signal frame, wherein the harmonic strategy information comprises the harmonic orders and the harmonic amplitude ratios; and generating a harmonic combination signal corresponding to the virtual bass according to the harmonic strategy information. The method combines the advantages of the frequency-domain and time-domain mechanisms while overcoming their respective shortcomings. It is essentially a time-domain virtual bass method: on the one hand, it retains the low computational cost typical of traditional time-domain methods, which facilitates commercialization; on the other hand, it removes the limitation that traditional time-domain methods cannot flexibly adjust the harmonic orders and amplitude ratios, and can freely select suitable harmonic orders and amplitude ratios in the time domain according to the signal characteristics of different audio material, so that the generated virtual bass is more expressive and its pitch, loudness, and timbre are close to the subjective perception of the missing actual bass.

In a possible implementation of the first aspect, the harmonic strategy information differs for input signal frames with different fundamental frequencies. It will be appreciated that the fundamental frequency of an input signal frame can distinguish between different audio signals, i.e., between different audio material, and that the virtual bass for different audio material calls for different harmonic order strategies and harmonic amplitude ratios. That is, audio signals with different fundamental frequencies usually have different harmonic strategy information.

In a possible implementation of the first aspect, generating the harmonic combination signal corresponding to the virtual bass according to the harmonic strategy information includes: generating, according to the harmonic strategy information, the harmonic combination signal corresponding to the virtual bass by using $y = f(x) = h_0 + h_1 x + h_2 x^2 + h_3 x^3 + \cdots$ together with the Schaefer formula relating the polynomial coefficients to the harmonic amplitudes; wherein y represents the harmonic combination signal, x is the original input signal frame, and $h_1, h_2, h_3, \ldots, h_k$ are the polynomial coefficients of the harmonic combination signal y; $c_1, c_2, c_3, \ldots, c_k$ are the harmonic amplitudes of the individual harmonics in the harmonic combination signal y, and k is the order of the highest harmonic; j is a natural number.

In one possible implementation of the first aspect, the fundamental frequency of the input signal frame may be determined by a time-domain method or by a frequency-domain method. The frequency-domain method may analyze the signal in the frequency domain based on the short-time Fourier transform (computed with the FFT) and identify the fundamental frequency of the signal frame. The time-domain method may use a frequency tracking method that iteratively computes the principal frequency component $\omega_0$ of the input signal, i.e., the fundamental frequency.

In a possible implementation of the first aspect, the method further includes: mixing the harmonic combined signal with the audio signal to obtain a mixed signal; and outputting the mixed signal. It is understood that the mixed signal is an audio signal to which virtual bass is added, and the mixed signal is an audio signal with better virtual bass expressiveness.

In a second aspect, an embodiment of the present application provides a virtual bass generation apparatus, including: the audio acquisition module is used for acquiring an input signal frame of an audio signal; the frequency acquisition module is used for acquiring the fundamental frequency of the input signal frame; the determining module is used for determining harmonic strategy information according to the fundamental frequency of the input signal frame, wherein the harmonic strategy information comprises each harmonic order and a harmonic amplitude ratio; and the generating module is used for generating a harmonic combination signal corresponding to the virtual bass according to the harmonic strategy information.

In a possible implementation of the second aspect, the harmonic strategy information is different for input signal frames of different fundamental frequencies.

In a possible implementation of the second aspect, the generating module is specifically configured to generate, according to the harmonic strategy information, the harmonic combination signal corresponding to the virtual bass by using $y = f(x) = h_0 + h_1 x + h_2 x^2 + h_3 x^3 + \cdots$ together with the Schaefer formula relating the polynomial coefficients to the harmonic amplitudes; wherein y represents the harmonic combination signal, x is the original input signal frame, and $h_1, h_2, h_3, \ldots, h_k$ are the polynomial coefficients of the harmonic combination signal y; $c_1, c_2, c_3, \ldots, c_k$ are the harmonic amplitudes of the individual harmonics in the harmonic combination signal y, and k is the order of the highest harmonic; j is a natural number.

In one possible implementation of the second aspect, the fundamental frequency of the input signal frame may be determined by a time-domain method or by a frequency-domain method.

In a possible implementation of the second aspect, the apparatus further includes: a mixing unit, configured to mix the harmonic combined signal with the audio signal to obtain a mixed signal; an output unit for outputting the mixed signal.

In a third aspect, this application provides a readable medium having instructions stored thereon, where the instructions, when executed on an electronic device, cause the electronic device to perform the virtual bass generation method in the first aspect and any possible implementation manner thereof.

In a fourth aspect, this application provides an electronic device, including: a memory for storing instructions for execution by one or more processors of the electronic device; and a processor, which is one of the processors of the electronic device, for performing the virtual bass generation method of the first aspect and any possible implementation thereof.

Drawings

Fig. 1 illustrates a schematic diagram of equal-loudness curves, according to some embodiments of the present application;

Fig. 2 is a schematic flow diagram of a phase-vocoder-based frequency-domain virtual bass generation method, according to some embodiments of the present application;

Fig. 3 illustrates a schematic structural diagram of a virtual bass generation apparatus, according to some embodiments of the present application;

Fig. 4 illustrates a flow diagram of a virtual bass generation method, according to some embodiments of the present application;

Fig. 5 illustrates a schematic diagram of a mobile phone, according to some embodiments of the present application.

Detailed Description

Illustrative embodiments of the present application include, but are not limited to, a virtual bass generation method, medium, and electronic apparatus.

Micro-speakers are widely used in small electronic devices such as mobile phones, tablet computers, and smart home products. Because of the size, structure, and frequency response of a micro-speaker, the fundamental components of low-frequency sounds are missing from the sound field it produces during audio playback, and its bass reproduction capability is poor.

Virtual bass techniques reproduce missing low-frequency sound by replacing the original signal with a combination of higher harmonics of its fundamental. Restoring the missing bass in a sound signal involves three dimensions: pitch, loudness, and timbre; that is, the missing bass is restored through pitch restoration, loudness restoration, and timbre restoration.

In psychoacoustics, an effect known as "virtual pitch" enables pitch restoration when the fundamental component is missing. Specifically, if the fundamental component is absent from a segment of sound, the human ear still perceives the same pitch as the missing fundamental from the superposition of its higher harmonic components. Such a pitch, perceived subjectively from a combination of harmonics, is called a "virtual pitch".

The basis of loudness restoration is the equal-loudness curve. Fig. 1 is a schematic diagram of equal-loudness curves. The ordinate of Fig. 1 is the sound pressure level (SPL) in dB. SPL is a quantity related to the acoustic power, i.e., the acoustic energy emitted by the sound source per unit time, and it increases nonlinearly (logarithmically) with the signal amplitude. All points on one curve in Fig. 1 are perceived as equally loud; the loudness level of each curve is measured in phon. The lowest dotted line in Fig. 1 is the threshold of hearing, i.e., the lower limit of loudness perceivable by the human ear; the loudness levels of the curves generally range from 10 to 110 phon. The equal-loudness curves show that when the fundamental signal is absent, a perceived loudness equal to or even higher than that of the fundamental signal can be obtained by properly configuring the amplitude of each higher harmonic of the fundamental. This is the theoretical basis for loudness control in virtual bass techniques.
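The logarithmic relationship between amplitude and level can be made concrete with the standard definition of sound pressure level, L_p = 20·log10(p / p_ref), where p_ref = 20 µPa in air. The sketch below, with illustrative function names, converts an RMS pressure to dB SPL and a level change back to a linear amplitude ratio, which is the kind of conversion involved when harmonic amplitudes are configured against a loudness target.

```python
import math

P_REF = 20e-6  # reference RMS sound pressure in air: 20 micropascals


def spl_db(p_rms: float) -> float:
    """Sound pressure level in dB SPL for an RMS sound pressure given in pascals."""
    return 20.0 * math.log10(p_rms / P_REF)


def amplitude_ratio_for_db(delta_db: float) -> float:
    """Linear amplitude (pressure) ratio corresponding to a level change in dB."""
    return 10.0 ** (delta_db / 20.0)
```

For example, 1 Pa RMS corresponds to about 94 dB SPL, and raising a harmonic by 6 dB roughly doubles its amplitude, so equal steps in level are multiplicative, not additive, steps in amplitude.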

The basis of timbre restoration is the overtone effect. The fundamental tone is generally the lowest-frequency sound, i.e., the sound represented by the fundamental component mentioned above. In a broad sense, overtones are all frequency components other than the fundamental; in a narrow sense, they are the components whose frequencies are integer multiples of the fundamental frequency, i.e., the harmonics of the fundamental. This specification adopts the narrow definition. Studies have shown that the proportions of the different harmonics shape the timbre of a sound. Therefore, the amplitude ratios of the harmonics can be adjusted until the timbre of the harmonic combination is subjectively closest to that of the original fundamental.

In summary, to restore a sound whose fundamental component is missing, specific higher harmonics of that fundamental must be generated and controlled: the harmonic orders to use are selected, and the amplitude of each selected harmonic is adjusted so that the amplitudes of the different orders stand in a suitable ratio.

On this basis, the "virtual bass" technique can be realized: for a low-frequency signal $f_0$ below the loudspeaker cutoff frequency $f_c$, a harmonic generation mechanism produces its higher harmonics, and once these harmonics are suitably combined and their amplitudes suitably configured, they produce a bass sound that is subjectively similar to the fundamental $f_0$, i.e., "virtual bass".
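Which harmonic orders are even worth generating follows from simple arithmetic: the n-th harmonic lies at n·f0 and only helps if the speaker can reproduce it. The sketch below assumes a hypothetical cutoff value and an illustrative function name; it ignores any upper limit of the speaker's passband and the Nyquist frequency.

```python
from typing import List


def virtual_bass_orders(f0: float, fc: float, max_order: int) -> List[int]:
    """Harmonic orders n >= 2 whose frequency n * f0 is at or above the cutoff fc.

    Returns an empty list when f0 itself is at or above the cutoff, because the
    speaker can then reproduce the fundamental directly and no virtual bass is needed.
    """
    if f0 >= fc:
        return []
    return [n for n in range(2, max_order + 1) if n * f0 >= fc]
```

With an assumed 150 Hz cutoff, a 60 Hz fundamental gives orders 3 to 6 (its 2nd harmonic at 120 Hz is still below the cutoff), while a 100 Hz fundamental can already use orders 2 to 4.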

Fig. 2 is a schematic flow chart of an existing phase-vocoder-based frequency-domain virtual bass generation method. The signal is divided into frames; each input frame undergoes short-time FFT analysis, amplitude and phase modulation, signal synthesis, and an inverse FFT (IFFT), which yields a time-domain combination of harmonic signals, i.e., the virtual bass.
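A heavily simplified, single-frame sketch of this frequency-domain idea (FFT analysis, per-bin amplitude and phase modulation, IFFT synthesis) is shown below. It assumes one frame, a rectangular window, no overlap-add, and no phase tracking across frames; the `orders`, `weights`, and `fc` parameters are hypothetical. A real phase vocoder is considerably more involved, which is part of the cost discussed next.

```python
import numpy as np


def fd_harmonics_frame(frame, fs, fc, orders, weights):
    """Single-frame frequency-domain harmonic synthesis (illustrative only).

    Each FFT bin k below the cutoff fc is copied to bin n*k for every order n,
    scaled by the matching weight, with its phase multiplied by n so that the
    new component stays phase-locked to the frame start.
    """
    n_fft = len(frame)
    spec = np.fft.rfft(frame)
    out = np.zeros_like(spec)
    k_fc = int(np.ceil(fc * n_fft / fs))  # first bin at or above the cutoff
    for k in range(1, min(k_fc, len(spec))):
        mag, phi = np.abs(spec[k]), np.angle(spec[k])
        for n, w in zip(orders, weights):
            if n * k < len(spec):  # drop components beyond Nyquist
                out[n * k] += w * mag * np.exp(1j * n * phi)
    return np.fft.irfft(out, n=n_fft)  # back to the time domain
```

For a bin-aligned 100 Hz cosine with orders (2, 3) and weights (0.5, 0.25), the output is 0.5·cos(2π·200t) + 0.25·cos(2π·300t).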

Although the frequency-domain method can precisely adjust the frequency, phase, and amplitude of the fundamental signal at its integer multiples in the frequency domain and obtain each harmonic signal in the time domain through a frequency-to-time conversion, so that in theory the pitch, loudness, and timbre of the bass can be fully restored, the limited precision of the synthesis computation loses part of the sound characteristics, and the phase adjustment introduces distortion. The resulting harmonic distortion manifests as strong coloration and an unnatural listening impression. Moreover, the modulation and synthesis of the signal and the repeated conversions between the time and frequency domains require substantial computation and memory.

Table 1 below lists common existing time-domain virtual bass schemes, which generate a combination of higher harmonics of the fundamental directly in the time domain based on a specific NLD function. Specifically, Table 1 shows, for each existing time-domain scheme, its implementation idea, the harmonic orders it generates, and its harmonic amplitude ratios.

Table 1:

The time-domain method is described in detail below, taking two of the schemes in Table 1 as examples: the multiplier in row 1 and the rectifier in row 2.

In the time-domain method using a multiplier, harmonic generation is based on the nonlinear operation of multiplication. The method provides a framework in which the input signal is processed cyclically n times to produce the composite signal $x(t) + x^2(t) + \cdots + x^n(t)$, which contains harmonics of orders 2 to n in addition to the fundamental. The amplitude modulation strategy of this mechanism uses a fitted polynomial with fixed coefficients, so the amplitude ratios between the harmonics are fixed.
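The fixed amplitude ratios of the multiplier scheme can be checked numerically. The sketch below, with illustrative function names, builds x + x² + … + xⁿ by repeated multiplication and measures the harmonic amplitudes with an FFT, assuming a unit-amplitude 100 Hz cosine that falls exactly on an FFT bin. For n = 3, trigonometric identities give 0.5 + 1.75·cos θ + 0.5·cos 2θ + 0.25·cos 3θ, so the ratio is dictated by the polynomial rather than chosen per signal.

```python
import numpy as np


def multiplier_composite(x, n):
    """Return x + x**2 + ... + x**n, built by n rounds of multiplication."""
    out = np.zeros_like(x, dtype=float)
    term = np.ones_like(x, dtype=float)
    for _ in range(n):
        term = term * x  # after round i, term == x**i
        out = out + term
    return out


def harmonic_amplitudes(y, f0, fs, max_order):
    """Amplitudes of harmonics 1..max_order, assuming f0 falls exactly on an FFT bin."""
    n = len(y)
    spec = np.fft.rfft(y)
    k0 = int(round(f0 * n / fs))
    return np.array([2 * abs(spec[m * k0]) / n for m in range(1, max_order + 1)])
```

With a half-amplitude input the same polynomial yields 0.59375, 0.125, and 0.03125, so the harmonic balance also shifts with the input level.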

In the time-domain method using a rectifier, harmonic generation is based on a half-wave or full-wave rectification function. A typical half-wave rectification function is $y = \max(x, 0)$, i.e., $y = x$ for $x \ge 0$ and $y = 0$ otherwise.

A typical full-wave rectification function is $y = |x|$. Although rectification is simple, it can generate only even-order harmonics and is therefore not flexible enough, and its harmonic amplitude ratios are fixed.
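The even-harmonic limitation can likewise be checked numerically. The sketch assumes y = max(x, 0) for half-wave and y = |x| for full-wave rectification, and a bin-aligned unit cosine; function names are illustrative. Full-wave rectification of a cosine contains only DC and even harmonics (the 2nd at 4/(3π) ≈ 0.424), while half-wave rectification keeps the fundamental (amplitude 0.5) and adds only even harmonics.

```python
import numpy as np


def half_wave(x):
    """y = x for x >= 0, else 0."""
    return np.maximum(x, 0.0)


def full_wave(x):
    """y = |x|."""
    return np.abs(x)


def harmonic_amplitudes(y, f0, fs, max_order):
    """Amplitudes of harmonics 1..max_order, assuming f0 falls exactly on an FFT bin."""
    n = len(y)
    spec = np.fft.rfft(y)
    k0 = int(round(f0 * n / fs))
    return np.array([2 * abs(spec[m * k0]) / n for m in range(1, max_order + 1)])
```

Because the rectifier has no tunable coefficients, the only lever is the choice of rectifier, which is why its harmonic orders and ratios are described as fixed.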

In summary, existing time-domain implementations process the signal entirely in the time domain without frequency-domain synthesis, so their computation and memory footprint are small; compared with the frequency-domain method, they are faster and better suited to commercialization. However, they generate harmonics with a fixed nonlinear device (NLD) function, so the harmonic orders and amplitude ratios follow a fixed strategy, and practice shows that this cannot effectively meet the requirements of restoring all kinds of fundamental signals.

To solve the above problems of generating virtual bass with existing frequency-domain and time-domain methods, an embodiment of the application provides a virtual bass generation method that identifies the fundamental frequency of an input signal frame and determines a harmonic strategy according to it, specifically strategy information such as the harmonic orders and harmonic amplitude ratios of the virtual bass. Harmonics are then generated on the signal path of the input signal frame based on this strategy information, producing a harmonic combination signal. The method combines the advantages of the frequency-domain and time-domain methods while overcoming their respective shortcomings. It is essentially a time-domain virtual bass method: on the one hand, it retains the low computational cost typical of traditional time-domain methods, which facilitates commercialization; on the other hand, it removes the limitation that traditional time-domain methods cannot flexibly adjust the harmonic orders and amplitude ratios, and can freely select suitable harmonic orders and amplitude ratios in the time domain according to the signal characteristics of different audio material, so that the generated virtual bass is more expressive and its pitch, loudness, and timbre are close to the subjective perception of the missing actual bass.

Fig. 3 is a schematic structural diagram of a virtual bass generation apparatus provided in an embodiment of the present application; the apparatus may be deployed in an electronic device. The apparatus 30 mainly comprises a signal frame input module 31, a fundamental frequency identification module 32, a harmonic strategy determination module 33, a harmonic generation module 34, and a harmonic combination signal output module 35.

The signal frame input module 31 is configured to perform framing processing on the audio signal to obtain an input signal frame.

The fundamental frequency identification module 32 is configured to identify the fundamental frequency of each input signal frame. The fundamental frequency is the lowest and usually the strongest frequency component of a complex sound, and it is generally perceived as the pitch of the sound.

The harmonic strategy determination module 33 includes a harmonic order selection module 331 and a harmonic amplitude ratio determination module 332. The harmonic order selection module 331 is configured to select a higher-harmonic order strategy, such as the highest order and whether to use consecutive orders, odd orders only, or even orders only. The harmonic amplitude ratio determination module 332 is configured to determine the amplitude ratios of the selected harmonics.

The harmonic generation module 34 is configured to generate a harmonic combination signal on the signal path of the input signal frame according to the harmonic orders and harmonic amplitude ratios determined by the harmonic strategy determination module 33.

The harmonic combined signal output module 35 is configured to output a harmonic combined signal, i.e., a virtual bass signal, on the signal channel.

More specifically, Fig. 4 shows the flow of the virtual bass generation method provided in an embodiment of the present application, in conjunction with the virtual bass generation apparatus 30 shown in Fig. 3. The method shown in Fig. 4 is executed by an electronic device in which the apparatus 30 is deployed, and includes the following steps:

S401: The electronic device acquires an audio signal to be processed.

It is understood that the audio signal to be processed is an audio signal requiring generation of a virtual bass, i.e. a user requires recovery of bass in the audio signal.

S402: the electronic equipment carries out framing processing on the audio signal to be processed to obtain an input signal frame.

Wherein, the above S402 may be executed by the signal frame input module 31 of the apparatus 30 in the electronic device.

It is understood that, unlike video data, audio data has no inherent concept of frames; captured audio is simply a continuous stream that is transmitted and stored in segments. To allow a program to process it in batches, the audio is split into segments of a specified length (a time duration or a number of samples) and organized into a data structure the program can operate on, i.e., a frame. An audio signal is non-stationary over long time spans but approximately stationary over short ones: a speech signal, for example, has time-varying characteristics, yet within a short interval (generally 10 to 30 ms) its characteristics remain essentially unchanged, so it can be treated as a quasi-stationary process. The audio signal can therefore be processed in short segments, each called a frame, typically 10 to 30 ms long. As an example, one input signal frame may be a 10 to 30 ms segment of the audio signal.

Furthermore, the frame length in samples depends on the sampling rate of the audio signal. For example, for an audio signal sampled at 48 kHz, a frame of 10 ms to 30 ms contains 480 to 1440 samples. A 48 kHz sampling rate means 48000 samples per second (s), with an interval of 1/48000 s between adjacent samples.
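The framing step can be sketched as follows, assuming non-overlapping frames and zero-padding of the final partial frame; the application does not specify overlap or padding, and many systems use overlapping, windowed frames instead. The function name and the default 20 ms frame length are illustrative.

```python
import numpy as np


def frame_signal(x, fs, frame_ms=20.0):
    """Split a 1-D signal into non-overlapping frames of frame_ms milliseconds.

    The last partial frame is zero-padded so that every frame has the same length.
    Returns an array of shape (n_frames, frame_len).
    """
    frame_len = int(round(fs * frame_ms / 1000.0))  # e.g. 48 kHz * 20 ms = 960 samples
    n_frames = -(-len(x) // frame_len)  # ceiling division
    padded = np.zeros(n_frames * frame_len, dtype=float)
    padded[: len(x)] = x
    return padded.reshape(n_frames, frame_len)
```

One second of 48 kHz audio split into 20 ms frames gives 50 frames of 960 samples each.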

S403: The electronic device takes the current frame of the input signal frames and identifies its fundamental frequency.

The above S403 may be executed by the fundamental frequency identification module 32 of the apparatus 30 in the electronic device.

In some embodiments, the above algorithm for identifying the fundamental frequency includes, but is not limited to:

(1) Frequency-domain analysis: the signal may be analyzed in the frequency domain based on the short-time Fourier transform (computed with the FFT) to identify the fundamental frequency of the signal frame. For example, the frequency-domain method may first Fourier-transform the signal to obtain its spectrum (keeping only the magnitude spectrum and discarding the phase spectrum). The spectrum has peaks at integer multiples of the fundamental frequency, and the basic principle of the frequency-domain method is to find the greatest common divisor of the frequencies of these peaks.
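A minimal sketch of the greatest-common-divisor principle is shown below, assuming the harmonic peaks fall exactly on FFT bins so that their bin indices are integers; real signals need peak interpolation and a tolerance on the divisor. The relative threshold and the function name are illustrative. The example signal contains only 200, 300, and 400 Hz components, yet the estimate is the missing 100 Hz fundamental, mirroring the virtual pitch effect.

```python
import math
from functools import reduce

import numpy as np


def f0_by_spectral_gcd(frame, fs, rel_threshold=0.1):
    """Estimate f0 as the GCD of the spectral peak frequencies (bin-aligned case)."""
    n = len(frame)
    mag = np.abs(np.fft.rfft(frame))  # magnitude spectrum only, phase discarded
    thresh = rel_threshold * mag.max()
    # local maxima above the threshold, ignoring the DC bin
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] >= thresh and mag[k] >= mag[k - 1] and mag[k] > mag[k + 1]]
    if not peaks:
        return 0.0
    k0 = reduce(math.gcd, peaks)  # common divisor of the peak bin indices
    return k0 * fs / n
```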

(2) Time-domain analysis: a frequency tracking method may be used to iteratively compute the principal frequency component $\omega_0$ of the input signal, i.e., the fundamental frequency. It is understood that the time-domain method takes the sound waveform as input, and its basic principle is to find the smallest positive period of the waveform. Because real signals are only approximately periodic, this period can only be estimated.
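The application names a frequency tracking method without specifying it. As a stand-in, the sketch below uses a different, well-known period-finding technique: the difference function and cumulative-mean-normalized difference of the YIN algorithm, which locate the smallest lag at which the waveform repeats. The search range (40 to 400 Hz), the threshold, and the function name are assumptions for illustration.

```python
import numpy as np


def f0_by_yin(frame, fs, f_min=40.0, f_max=400.0, threshold=0.1):
    """Estimate f0 of one frame by finding its smallest repeating period (YIN-style)."""
    x = np.asarray(frame, dtype=float)
    tau_min = int(fs / f_max)
    tau_max = int(fs / f_min)
    w = len(x) - tau_max  # fixed comparison window length
    if w <= 0:
        raise ValueError("frame too short for the requested f_min")
    # difference function d(tau) = sum_i (x[i] - x[i + tau])**2 over a fixed window
    d = np.array([np.sum((x[:w] - x[tau:tau + w]) ** 2) for tau in range(tau_max + 1)])
    # cumulative-mean-normalized difference d'(tau); d'(0) is defined as 1
    cmnd = np.ones_like(d)
    cumsum = np.cumsum(d[1:])
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(cumsum, 1e-12)
    candidates = np.nonzero(cmnd[tau_min:] < threshold)[0]
    if candidates.size == 0:
        tau = tau_min + int(np.argmin(cmnd[tau_min:]))  # no clear period: best guess
    else:
        tau = tau_min + int(candidates[0])
        # walk down to the bottom of this dip
        while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
            tau += 1
    return fs / tau
```

Normalizing by the cumulative mean keeps very short lags from being mistaken for a period, and taking the first dip below the threshold (rather than the global minimum) favors the smallest period over its multiples.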

S404: the electronic equipment determines harmonic strategy information according to the fundamental frequency of the input signal frame, wherein the harmonic strategy information comprises a harmonic order and a harmonic amplitude proportion.

It can be understood that the information in the harmonic strategy information is not limited to the harmonic order and the harmonic amplitude ratio, and may also be other information characterizing the harmonic characteristics, which is not specifically limited in this application embodiment.

The above S404 may be executed by the harmonic strategy determining module 33 of the apparatus 30 in the electronic device, specifically, the harmonic order is determined by the harmonic order selecting module 331, and the harmonic amplitude ratio is determined by the harmonic amplitude ratio determining module 332.

It will be appreciated that the fundamental frequency of the input signal frame may distinguish between different audio signals, i.e. between different audio material. And harmonic order strategies and harmonic amplitude proportions corresponding to virtual bass of different audio materials are different. That is, usually the harmonic strategy information is different for audio signals of different fundamental frequencies.

In some embodiments, the specific harmonic strategy information to be used subsequently, such as the higher-harmonic order strategy (the highest order, and whether consecutive orders, odd orders only, or even orders only are used) and the harmonic amplitude ratios, may be selected according to a preset algorithm rule. For example, it may be determined to use the consecutive 2nd, 3rd, and 4th harmonics, with 4 as the highest order, and to determine the amplitude ratio of the 2nd, 3rd, and 4th harmonics according to the fundamental frequency. The algorithm applied by the preset rule includes, but is not limited to, table lookup, pattern matching, and decision trees, which is not specifically limited in this embodiment of the present application.
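As one illustration of the table-lookup option, the sketch below maps fundamental-frequency bands to harmonic strategies. The band edges, orders, and ratios are hypothetical placeholders chosen for illustration, not values from this application; in practice they would be tuned, for example by listening tests on the target speaker.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HarmonicStrategy:
    orders: Tuple[int, ...]    # harmonic orders to generate, e.g. (2, 3, 4)
    ratios: Tuple[float, ...]  # relative amplitude of each order (same length as orders)


# Hypothetical lookup table: (upper edge of the f0 band in Hz, strategy).
# All numbers are illustrative placeholders, not tuned values.
STRATEGY_TABLE = [
    (60.0, HarmonicStrategy((3, 4, 5), (1.0, 0.7, 0.4))),
    (120.0, HarmonicStrategy((2, 3, 4), (1.0, 0.6, 0.3))),
    (200.0, HarmonicStrategy((2, 4), (1.0, 0.5))),
]


def select_strategy(f0: float) -> Optional[HarmonicStrategy]:
    """Return the strategy of the first band whose upper edge exceeds f0."""
    for upper_edge, strategy in STRATEGY_TABLE:
        if f0 < upper_edge:
            return strategy
    return None  # f0 is above the virtual-bass range: no harmonics needed
```

A 100 Hz frame then gets the consecutive 2nd to 4th harmonics, as in the example above; a decision tree or pattern matcher could replace the loop without changing the interface.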

In some embodiments, the preset algorithm can match different harmonic orders and harmonic amplitude ratios to audio signals of different audio materials, so that the expressive force of subsequently generated virtual bass is better.

S405: The electronic device generates a harmonic combination signal from the input signal frame according to the determined harmonic strategy, such as the harmonic orders and the harmonic amplitude ratios.

Wherein S405 may be executed by the harmonic generation module 34 of the apparatus 30 in the electronic device.

In some embodiments, the harmonic superposition signal y in the time domain may be represented by a fitted polynomial of the original input signal x, as shown in equation (1):

$$y = f(x) = h_0 + h_1 x + h_2 x^2 + h_3 x^3 + \cdots \qquad (1)$$

In some embodiments, the Schaefer formula shown in equation (2) gives the quantitative relationship between the polynomial coefficients $h_1, h_2, h_3, \ldots$ in equation (1) and the harmonic amplitudes $c_1, c_2, c_3, \ldots$, where k is the order of the highest harmonic.
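Assuming equation (2) takes the commonly cited Chebyshev-polynomial form of Schaefer's relation, it can be written as follows. The target harmonic combination is expressed as $f(x) = \sum_{m=0}^{N} c_m T_m(x)$, where $T_m$ are Chebyshev polynomials of the first kind; because $T_m(\cos\theta) = \cos(m\theta)$, a unit-amplitude input $x = \cos\theta$ yields exactly $y = \sum_m c_m \cos(m\theta)$. Expanding the $T_m$ in powers of x gives (here $N$ denotes the highest order, $c_0$ a DC term, and $c_m = 0$ for $m > N$):

$$
\begin{aligned}
h_0 &= \sum_{j \ge 0} (-1)^j c_{2j}, \\
h_k &= \frac{2^{k-1}}{k!} \sum_{j \ge 0} (-1)^j \frac{(k+2j)\,(k+j-1)!}{j!}\, c_{k+2j}, \qquad k \ge 1.
\end{aligned}
$$

For example, for the consecutive 2nd to 4th harmonics ($c_0 = c_1 = 0$), this yields

$$h_0 = c_4 - c_2,\quad h_1 = -3c_3,\quad h_2 = 2c_2 - 8c_4,\quad h_3 = 4c_3,\quad h_4 = 8c_4.$$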

Then, once the higher harmonic orders k and the amplitudes $c_k$ have been determined according to the specific harmonic strategy, equation (2) can be used to calculate the polynomial coefficients h, thereby constructing the harmonic generator best suited to the bass characteristics of the input signal frame, i.e., the harmonic generation module 34. Finally, the original signal frame is fed through the signal path into the harmonic generation module 34, so that the required virtual bass, i.e., the virtual bass represented by the harmonic combination signal, is generated directly in the time domain.
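The coefficient computation and the resulting time-domain generator can be sketched in Python, assuming the commonly cited Chebyshev-based form of Schaefer's relation (the target function is the sum of c_m·T_m(x), expanded into powers of x). The function names are illustrative; NumPy's `cheb2poly` performs the same Chebyshev-to-power-series conversion and serves as a cross-check.

```python
import math

import numpy as np
from numpy.polynomial import polynomial as P


def schaefer_coeffs(c):
    """Map harmonic amplitudes c[0..N] (c[0] = DC, c[m] = amplitude of harmonic m)
    to power-series coefficients h[0..N] of y = h0 + h1*x + ... + hN*x**N."""
    c = [float(v) for v in c]
    n = len(c) - 1
    h = np.zeros(n + 1)
    h[0] = sum((-1) ** j * c[2 * j] for j in range(n // 2 + 1))
    for k in range(1, n + 1):
        acc = 0.0
        j = 0
        while k + 2 * j <= n:  # c[m] is zero beyond the highest order N
            acc += ((-1) ** j * (k + 2 * j) * math.factorial(k + j - 1)
                    / math.factorial(j) * c[k + 2 * j])
            j += 1
        h[k] = 2 ** (k - 1) / math.factorial(k) * acc
    return h


def generate_harmonics(frame, h):
    """Time-domain harmonic generator: evaluate the polynomial on every sample."""
    return P.polyval(np.asarray(frame, dtype=float), h)
```

For amplitudes (0.6, 0.3, 0.1) on harmonics 2 to 4, the coefficients are h = (-0.5, -0.9, 0.4, 1.2, 0.8), and a unit-amplitude cosine input produces exactly those harmonic amplitudes. The exact mapping holds only for a full-scale sinusoidal input: at other input levels the harmonic balance shifts, and the highest order should keep k·f0 below half the sampling rate to avoid aliasing.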

S406: The electronic device outputs the harmonic combination signal as the virtual bass.

S406 may be executed by the harmonic combination signal output module 35 of the apparatus 30 in the electronic device.

S407: The electronic device mixes the harmonic combination signal with the audio signal to be processed to obtain a mixed signal, and outputs the mixed signal.

It is to be understood that the mixed signal is the audio signal with the virtual bass added, and therefore exhibits better virtual bass expressiveness.
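A minimal sketch of the mixing in S407, assuming both signals are sample-aligned frames of equal length normalized to [-1, 1]. The `bass_gain` parameter and the hard clipping are assumptions made for the example; a product implementation would more likely use a limiter or dynamic range control instead of clipping.

```python
from typing import List, Sequence


def mix_virtual_bass(audio: Sequence[float], harmonics: Sequence[float],
                     bass_gain: float = 0.5) -> List[float]:
    """Add the scaled harmonic combination signal to the audio frame and clip to [-1, 1]."""
    if len(audio) != len(harmonics):
        raise ValueError("audio and harmonic frames must have the same length")
    mixed = []
    for a, v in zip(audio, harmonics):
        s = a + bass_gain * v
        mixed.append(max(-1.0, min(1.0, s)))  # simple hard clipping to avoid overflow
    return mixed
```

For example, `mix_virtual_bass([0.2, 0.9, -0.9], [0.4, 0.4, -0.4])` yields approximately `[0.4, 1.0, -1.0]`, where the last two samples have been clipped.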

The virtual bass generation method provided in the embodiments of the present application is a time-domain implementation. On the one hand, it retains the generally low computational cost of traditional time-domain methods, which facilitates commercialization. On the other hand, it can freely choose suitable harmonic orders and harmonic amplitude ratios in the time domain for signals of different audio materials, so that the generated virtual bass is more expressive and its pitch, loudness and timbre approach the subjective perception of the missing real bass; this overcomes the inability of traditional time-domain methods to flexibly adjust harmonic orders and amplitude ratios. In other words, the method combines the advantages of frequency-domain and time-domain methods while avoiding the respective disadvantages of each.

Next, the hardware structure of an electronic device that performs the virtual bass generation method provided in the embodiments of the present application is described, taking a mobile phone as an example of the electronic device. A virtual bass generation apparatus provided in the embodiments of the present application, such as the apparatus 30 shown in fig. 3, may be disposed in the electronic device to enable the electronic device to execute the virtual bass generation method provided in the above embodiments.

As shown in fig. 5, the mobile phone 40 may include a processor 110, a power module 140, a memory 180, a mobile communication module 130, a wireless communication module 120, a sensor module 190, an audio module 150, a camera 170, an interface module 160, keys 101, a display screen 102, and the like.

It is to be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the mobile phone 40. In other embodiments of the present application, the mobile phone 40 may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units, for example processing modules or processing circuits such as a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a micro-programmed control unit (MCU), an artificial intelligence (AI) processor, or a programmable logic device such as a field-programmable gate array (FPGA). The different processing units may be separate devices or may be integrated into one or more processors. A storage unit may be provided in the processor 110 for storing instructions and data. In some embodiments, the storage unit in the processor 110 is the cache memory 180. For example, data such as the harmonic strategy information used in the above virtual bass generation process may be stored in the cache memory 180, and the processor 110 may use this data to generate the virtual bass, represented by the harmonic combination signal, for the audio signal to be processed, thereby implementing the virtual bass generation method of the above embodiments.

The power module 140 may include a power source, power management components, and the like. The power source may be a battery. The power management component manages the charging of the power source and the supply of power from the power source to other modules. In some embodiments, the power management component includes a charging management module and a power management module. The charging management module receives charging input from a charger; the power management module connects the power source, the charging management module and the processor 110. The power management module receives input from the power source and/or the charging management module and supplies power to the processor 110, the display screen 102, the camera 170, and the wireless communication module 120.

The mobile communication module 130 may include, but is not limited to, an antenna, a power amplifier, a filter, a low noise amplifier (LNA), and the like. The mobile communication module 130 may provide wireless communication solutions, including 2G/3G/4G/5G, applied to the mobile phone 40. The mobile communication module 130 may receive electromagnetic waves through the antenna, filter and amplify the received electromagnetic waves, and transmit the result to the modem processor for demodulation. The mobile communication module 130 may also amplify a signal modulated by the modem processor and convert it into electromagnetic waves radiated through the antenna. In some embodiments, at least some functional modules of the mobile communication module 130 may be disposed in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 130 may be disposed in the same device as at least some modules of the processor 110. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), Bluetooth (BT), global navigation satellite system (GNSS), wireless local area network (WLAN), near field communication (NFC), frequency modulation (FM) radio, and/or infrared (IR), etc. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).

The wireless communication module 120 may include an antenna and transmit and receive electromagnetic waves via the antenna. The wireless communication module 120 may provide wireless communication solutions applied to the mobile phone 40, including wireless local area networks (WLAN) (e.g., wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The mobile phone 40 may communicate with networks and other devices using wireless communication technologies.

In some embodiments, the mobile communication module 130 and the wireless communication module 120 of the handset 40 may also be located in the same module.

The display screen 102 is used to display human-computer interaction interfaces, images, videos, and the like. The display screen 102 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.

The sensor module 190 may include a proximity light sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.

The audio module 150 is used to convert digital audio information into an analog audio signal for output, or to convert analog audio input into a digital audio signal. The audio module 150 may also be used to encode and decode audio signals. In some embodiments, the audio module 150 may be disposed in the processor 110, or some functional modules of the audio module 150 may be disposed in the processor 110. In some embodiments, the audio module 150 may include a speaker, an earpiece, a microphone, and a headphone interface. For example, the audio module 150 may receive the audio signal to be processed through the microphone and output, through the speaker or the headphone interface, the signal obtained by mixing that audio signal with the generated virtual bass.

The camera 170 is used to capture still images or video. An optical image of an object is formed through the lens and projected onto the photosensitive element. The photosensitive element converts the optical signal into an electrical signal and transmits it to the image signal processor (ISP), which converts it into a digital image signal. The mobile phone 40 may implement the shooting function through the ISP, the camera 170, the video codec, the graphics processing unit (GPU), the display screen 102, the application processor, and the like.

The interface module 160 includes an external memory interface, a universal serial bus (USB) interface, a subscriber identity module (SIM) card interface, and the like. The external memory interface may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the mobile phone 40. The external memory card communicates with the processor 110 through the external memory interface to implement a data storage function. The USB interface is used for communication between the mobile phone 40 and other electronic devices. The SIM card interface is used to communicate with a SIM card installed in the mobile phone 40, for example, to read a phone number stored in the SIM card or to write a phone number into the SIM card.

In some embodiments, the mobile phone 40 further includes keys 101, a motor, indicators, and the like. The keys 101 may include a volume key, a power key, and the like. The motor makes the mobile phone 40 vibrate, for example, to prompt the user to answer an incoming call. The indicators may include laser indicators, radio frequency indicators, LED indicators, and the like.

Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.

Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.

The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.

In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the Internet by means of electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).

In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.

It should be noted that, in the embodiments of the apparatuses in the present application, each unit/module is a logical unit/module, and physically, one logical unit/module may be one physical unit/module, or may be a part of one physical unit/module, and may also be implemented by a combination of multiple physical units/modules, where the physical implementation manner of the logical unit/module itself is not the most important, and the combination of the functions implemented by the logical unit/module is the key to solve the technical problem provided by the present application. Furthermore, in order to highlight the innovative part of the present application, the above-mentioned device embodiments of the present application do not introduce units/modules which are not so closely related to solve the technical problems presented in the present application, which does not indicate that no other units/modules exist in the above-mentioned device embodiments.

It is noted that, in the examples and descriptions of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprises a" does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element.

While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims

1. A virtual bass generation method applied to an electronic device is characterized by comprising the following steps:

acquiring an input signal frame of an audio signal;

acquiring a fundamental frequency of the input signal frame;

determining harmonic strategy information according to the fundamental frequency of the input signal frame, wherein the harmonic strategy information comprises each harmonic order and a harmonic amplitude proportion;

and generating a harmonic combined signal corresponding to the virtual bass according to the harmonic strategy information.

2. The method of claim 1, wherein the harmonic strategy information is different for input signal frames of different fundamental frequencies.

3. The method of claim 1 or 2, wherein generating a harmonic combination signal corresponding to virtual bass according to the harmonic strategy information comprises:

using $y = f(x) = h_0 + h_1 x + h_2 x^2 + h_3 x^3 + \cdots$ together with the Schaefer formula relating the polynomial coefficients to the harmonic amplitudes,

generating a harmonic combination signal corresponding to the virtual bass according to the harmonic strategy information;

wherein y represents the harmonic combination signal, x is the original input signal frame, and the polynomial coefficients $h_1, h_2, h_3, \ldots, h_k$ are the coefficients of the respective harmonics in the harmonic combination signal y; $c_1, c_2, c_3, \ldots, c_k$ are the harmonic amplitudes of the respective harmonics in the harmonic combination signal y; k is the order of the highest harmonic; and j is a natural number.

4. The method according to claim 1, characterized in that the fundamental frequency of the input signal frame can be determined based on a time-domain method or a frequency-domain method.

5. The method of claim 1, further comprising:

mixing the harmonic combined signal with the audio signal to obtain a mixed signal;

and outputting the mixed signal.

6. A virtual bass generation apparatus, comprising:

the audio acquisition module is used for acquiring an input signal frame of an audio signal;

a frequency obtaining module, configured to obtain a fundamental frequency of the input signal frame;

the determining module is used for determining harmonic strategy information according to the fundamental frequency of the input signal frame, wherein the harmonic strategy information comprises each harmonic order and a harmonic amplitude proportion;

and the generating module is used for generating a harmonic combination signal corresponding to the virtual bass according to the harmonic strategy information.

7. The apparatus of claim 6, wherein the harmonic strategy information is different for input signal frames of different fundamental frequencies.

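8. The apparatus according to claim 6 or 7, wherein the generating module is specifically configured to generate the harmonic combination signal corresponding to the virtual bass according to the harmonic strategy information by using $y = f(x) = h_0 + h_1 x + h_2 x^2 + h_3 x^3 + \cdots$ together with the Schaefer formula relating the polynomial coefficients to the harmonic amplitudes;

wherein y represents the harmonic combination signal, x is the original input signal frame, and the polynomial coefficients $h_1, h_2, h_3, \ldots, h_k$ are the coefficients of the respective harmonics in the harmonic combination signal y; $c_1, c_2, c_3, \ldots, c_k$ are the harmonic amplitudes of the respective harmonics in the harmonic combination signal y; k is the order of the highest harmonic; and j is a natural number.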

9. The apparatus of claim 6, wherein the fundamental frequency of the input signal frame can be determined based on a time-domain method or a frequency-domain method.

10. The apparatus of claim 6, further comprising:

a mixing unit, configured to mix the harmonic combined signal with the audio signal to obtain a mixed signal;

an output unit for outputting the mixed signal.

11. A readable medium having stored thereon instructions that, when executed on an electronic device, cause the electronic device to perform the virtual bass generation method of any of claims 1-5.

12. An electronic device, comprising: a memory for storing instructions to be executed by one or more processors of the electronic device, and a processor, being one of the one or more processors of the electronic device, configured to perform the virtual bass generation method of any one of claims 1 to 5.