CN109587603B - Volume control method, device and storage medium - Google Patents

Volume control method, device and storage medium

Info

Publication number
CN109587603B
Authority
CN
China
Prior art keywords
sound signal
activity
sound
energy matrix
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811506570.2A
Other languages
Chinese (zh)
Other versions
CN109587603A (en)
Inventor
张晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201811506570.2A
Publication of CN109587603A
Application granted
Publication of CN109587603B


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 3/00: Circuits for transducers, loudspeakers or microphones
    • H04R 3/12: Circuits for transducers, loudspeakers or microphones for distributing signals to two or more loudspeakers
    • H04R 2430/00: Signal processing covered by H04R, not provided for in its groups

Landscapes

  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The disclosure relates to a volume control method, a volume control device, and a storage medium, and belongs to the technical field of signal processing. The method comprises the following steps: acquiring a sound signal; acquiring an energy matrix of the sound signal; determining an activity level of a target sound in the sound signal based on the energy matrix of the sound signal; determining an actual gain of the sound signal based on the activity level of the target sound; and adjusting the sound signal according to the actual gain so as to control the volume of the sound signal. Because the activity level of the target sound is determined from the energy matrix of the sound signal and the actual gain is determined from that activity level, background noise in the environment can be prevented from being amplified, the volume control better matches human auditory characteristics, and the volume control effect can be improved.

Description

Volume control method, device and storage medium
Technical Field
The present disclosure relates to the field of signal processing technologies, and in particular, to a volume control method and apparatus, and a storage medium.
Background
With the rise of the internet, more and more social media rely on it, and live webcasting is one of them. Webcasting absorbs and extends the advantages of the internet and is carried out in video form. Because a live broadcast happens in real time, broadcast environments vary, the anchor's own voice varies in volume, and the anchor's distance from the microphone also varies, so the loudness of the live sound differs greatly. To give listeners a more consistent loudness experience and to prevent sudden loudness changes, the loudness of the live broadcast needs to be controlled automatically.
In the related art, the gain value is automatically adjusted according to the amplitude of the input signal and the target amplitude so that the amplitude of the output signal approaches the target amplitude.
However, this method is mainly based on the signal amplitude, and may amplify the background noise in the environment, resulting in poor control effect.
Disclosure of Invention
The present disclosure provides a volume control method, apparatus, and storage medium, which can overcome the problems of the related art.
According to a first aspect of embodiments of the present disclosure, there is provided a volume control method, including:
acquiring a sound signal;
acquiring an energy matrix of the sound signal;
determining an activity level of a target sound in the sound signal based on an energy matrix of the sound signal;
determining an actual gain of the sound signal based on the activity level of the target sound;
and adjusting the sound signal according to the actual gain so as to control the volume of the sound signal.
In a possible embodiment, the acquiring an energy matrix of the sound signal includes:
converting the sound signal into a frequency domain signal by FFT (Fast Fourier transform);
acquiring a frequency domain energy signal of the frequency domain signal;
and combining the frequency domain energy signals with the frequency domain energy signals of the previous reference quantity to obtain an energy matrix of the sound signals.
In a possible embodiment, the determining the activity of the target sound in the sound signal based on the energy matrix of the sound signal includes:
acquiring features of an equivalent diagram of the energy matrix based on the energy matrix of the sound signal, wherein the features of the equivalent diagram of the energy matrix comprise at least one of gray-scale richness and texture complexity;
determining an activity level of a target sound in the sound signal based on features of an equivalent graph of the energy matrix.
In a possible embodiment, the obtaining the feature of the equivalent diagram of the energy matrix based on the energy matrix of the sound signal includes:
acquiring the variance of the energy matrix and acquiring the mean value of the energy matrix;
and obtaining the gray level richness of the equivalent diagram of the energy matrix according to the variance and the mean value of the energy matrix.
In a possible embodiment, the obtaining the feature of the equivalent diagram of the energy matrix based on the energy matrix of the sound signal includes:
dividing the equivalent diagram of the energy matrix into a plurality of sub-blocks, and performing intra-frame prediction in different directions on each sub-block;
acquiring an absolute value error between a predicted value of any sub-block in any direction and actual pixel values of all rows of the sub-block, averaging the absolute value error of any sub-block in any direction, and taking the obtained average absolute value error as a block error of any sub-block in any direction;
averaging the block errors of any sub-block in each direction, and taking the obtained average block error as the distortion value of any sub-block;
and averaging the distortion values of all the sub-blocks, and taking the obtained average distortion value as the texture complexity of the equivalent diagram of the energy matrix.
In a possible embodiment, the determining the activity of the target sound in the sound signal based on the features of the equivalent graph of the energy matrix includes:
determining an initial activity level of a target sound in the sound signal based on features of an equivalent graph of the energy matrix;
and determining the activity degree of the target sound in the sound signal according to the relation between the initial activity degree and the activity degree threshold value.
In a possible embodiment, the determining an initial activity level of a target sound in the sound signal based on the features of the equivalent map of the energy matrix includes:
and carrying out weighted summation on the gray scale richness and the texture complexity of the equivalent diagram of the energy matrix, and taking the obtained weighted summation result as the initial activity of the target sound in the sound signal.
In a possible implementation, the determining the activity level of the target sound in the sound signal according to the relationship between the initial activity level and the activity level threshold includes:
if the initial activity is greater than a first activity threshold, the activity of the target sound is a first reference value;
if the initial activity is smaller than a second activity threshold, the activity of the target sound is a second reference value;
if the initial activity is greater than the second activity threshold and smaller than the first activity threshold, acquiring a first difference between the initial activity and the second activity threshold, acquiring a second difference between the first activity threshold and the second activity threshold, and taking a quotient of the first difference and the second difference as the activity of the target sound; wherein the first activity threshold is greater than the second activity threshold, and the first reference value is greater than the second reference value.
In one possible embodiment, the determining the actual gain of the sound signal based on the activity level of the target sound includes:
obtaining the loudness value of the sound signal;
determining a loudness value to be adjusted based on the loudness value of the sound signal and a target loudness value;
obtaining a target gain according to the loudness value to be adjusted;
determining a gain change step size of the sound signal according to the activity degree of the target sound;
and determining the actual gain of the sound signal according to the gain change step length of the sound signal based on the relation between the target gain and the actual gain of the sound signal of the previous frame.
In a possible implementation, after the adjusting the sound signal according to the actual gain, the method further includes:
and carrying out amplitude limiting processing on the adjusted sound signal.
According to a second aspect of the embodiments of the present disclosure, there is provided a volume control device including:
a first acquisition unit configured to acquire a sound signal;
a second acquisition unit configured to acquire an energy matrix of the sound signal;
a first determination unit configured to determine an activity degree of a target sound in the sound signal based on an energy matrix of the sound signal;
a second determination unit configured to determine an actual gain of the sound signal based on the activity degree of the target sound;
a control unit configured to adjust the sound signal according to the actual gain to control a volume of the sound signal.
In a possible embodiment, the second obtaining unit is configured to convert the sound signal into a frequency domain signal by a fast Fourier transform (FFT); acquire a frequency domain energy signal of the frequency domain signal; and combine the frequency domain energy signal with the frequency domain energy signals of the previous reference quantity to obtain an energy matrix of the sound signal.
In a possible implementation, the first determining unit includes:
an acquisition subunit configured to acquire, based on an energy matrix of the sound signal, a feature of an equivalent map of the energy matrix, the feature of the equivalent map of the energy matrix including at least one of grayscale richness and texture complexity;
a determination subunit configured to determine an activity level of a target sound in the sound signal based on a feature of an equivalent graph of the energy matrix.
In a possible implementation, the obtaining subunit is configured to obtain a variance of the energy matrix and obtain a mean of the energy matrix; and obtaining the gray level richness of the equivalent diagram of the energy matrix according to the variance and the mean value of the energy matrix.
In a possible implementation manner, the obtaining subunit is configured to divide the equivalent graph of the energy matrix into a plurality of sub-blocks, and perform intra-frame prediction in different directions on the sub-blocks; acquiring an absolute value error between a predicted value of any sub-block in any direction and actual pixel values of all rows of the sub-block, averaging the absolute value error of any sub-block in any direction, and taking the obtained average absolute value error as a block error of any sub-block in any direction; averaging the block errors of any sub-block in each direction, and taking the obtained average block error as the distortion value of any sub-block; and averaging the distortion values of all the sub-blocks, and taking the obtained average distortion value as the texture complexity of the equivalent diagram of the energy matrix.
In one possible embodiment, the determining subunit includes:
a first determination module configured to determine an initial activity level of a target sound in the sound signal based on features of an equivalent map of the energy matrix;
a second determination module configured to determine an activity level of a target sound in the sound signal according to a relationship between the initial activity level and an activity level threshold.
In a possible implementation, the first determining module is configured to perform weighted summation on the grayscale richness and the texture complexity of the equivalent diagram of the energy matrix, and the obtained weighted summation result is used as the initial activity of the target sound in the sound signal.
In a possible implementation manner, the second determining module is configured to determine that the activity level of the target sound is a first reference value if the initial activity level is greater than a first activity level threshold; if the initial activity is smaller than a second activity threshold, the activity of the target sound is a second reference value; if the initial activity is greater than the second activity threshold and smaller than the first activity threshold, acquiring a first difference between the initial activity and the second activity threshold, acquiring a second difference between the first activity threshold and the second activity threshold, and taking a quotient of the first difference and the second difference as the activity of the target sound; wherein the first activity threshold is greater than the second activity threshold, and the first reference value is greater than the second reference value.
In a possible implementation, the second determining unit is configured to obtain a loudness value of the sound signal; determining a loudness value to be adjusted based on the loudness value of the sound signal and a target loudness value; obtaining a target gain according to the loudness value to be adjusted; determining a gain change step size of the sound signal according to the activity degree of the target sound; and determining the actual gain of the sound signal according to the gain change step length of the sound signal based on the relation between the target gain and the actual gain of the sound signal of the previous frame.
In a possible implementation, the apparatus further includes:
a clipping unit configured to clip the adjusted sound signal.
According to a third aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having instructions therein, which when executed by a processor of a terminal, enable the terminal to perform a volume control method, the method comprising:
acquiring a sound signal;
acquiring an energy matrix of the sound signal;
determining an activity level of a target sound in the sound signal based on an energy matrix of the sound signal;
determining an actual gain of the sound signal based on the activity level of the target sound;
and adjusting the sound signal according to the actual gain so as to control the volume of the sound signal.
According to a fourth aspect of embodiments of the present disclosure, there is provided an application program product, wherein instructions in the application program product, when executed by a processor of a terminal, enable the terminal to perform the following volume control method:
acquiring a sound signal;
acquiring an energy matrix of the sound signal;
determining an activity level of a target sound in the sound signal based on an energy matrix of the sound signal;
determining an actual gain of the sound signal based on the activity level of the target sound;
and adjusting the sound signal according to the actual gain so as to control the volume of the sound signal.
The technical solutions provided by the embodiments of the present disclosure include at least the following beneficial effects:
The activity level of the target sound in the sound signal is determined from the energy matrix of the sound signal, and the actual gain of the sound signal is determined from that activity level, so that background noise in the environment can be prevented from being amplified, the volume control better matches human auditory characteristics, and the volume control effect can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a flow chart illustrating a volume control method according to an exemplary embodiment.
Fig. 2 is a flow chart illustrating a volume control method according to an example embodiment.
Fig. 3 is a diagram illustrating a loudness curve according to an exemplary embodiment.
Fig. 4 is an overall flowchart illustrating a volume control method according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating a volume control device according to an exemplary embodiment.
Fig. 6 is a block diagram illustrating a first determination unit according to an example embodiment.
FIG. 7 is a block diagram illustrating a determination subunit in accordance with an exemplary embodiment.
Fig. 8 is a block diagram illustrating a volume control device according to an exemplary embodiment.
FIG. 9 is a block diagram illustrating an apparatus in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
With the rise of the internet, more and more social media rely on it, and live webcasting is one of them. Live webcasting is an emerging way of social networking, and webcast platforms have become a brand-new social medium. Webcasting absorbs and extends the advantages of the internet: by broadcasting live in video form, content such as product demonstrations, conferences, background introductions, scheme evaluations, online surveys, interviews, and online training can be published to the internet on site, and the promotional effect of the event venue is enhanced by the internet's intuitiveness, speed, rich forms of expression, rich content, strong interactivity, lack of geographic limits, and segmentable audiences. After the live broadcast ends, it can still be replayed and requested at any time, effectively extending the broadcast in time and space and maximizing the value of the live content.
However, because live broadcast environments differ greatly, the anchor's own voice varies in volume, and the anchor's distance from the microphone also varies, the loudness of the live sound differs greatly. To give listeners a more consistent loudness experience and to prevent sudden loudness changes, the loudness of the live broadcast needs to be controlled automatically. For this reason, the embodiments of the present disclosure provide a volume control method.
Fig. 1 is a flowchart illustrating a volume control method, which is used in a terminal, as shown in fig. 1, according to an exemplary embodiment, and includes the following steps.
In step S11, a sound signal is acquired.
In step S12, an energy matrix of the sound signal is acquired.
In step S13, the activity degree of the target sound in the sound signal is determined based on the energy matrix of the sound signal.
In step S14, the actual gain of the sound signal is determined based on the activity level of the target sound.
In step S15, the sound signal is adjusted according to the actual gain to control the volume of the sound signal.
According to the method provided by the embodiment of the present disclosure, the activity level of the target sound in the sound signal is determined from the energy matrix of the sound signal, and the actual gain of the sound signal is determined from that activity level, so that background noise in the environment can be prevented from being amplified, the volume control better matches human auditory characteristics, and the volume control effect can be further improved.
In one possible implementation, acquiring an energy matrix of a sound signal includes:
converting the sound signal into a frequency domain signal by a fast Fourier transform (FFT);
acquiring a frequency domain energy signal of the frequency domain signal;
and combining the frequency domain energy signals with the frequency domain energy signals of the previous reference quantity to obtain an energy matrix of the sound signals.
In one possible implementation, determining the activity level of the target sound in the sound signal based on an energy matrix of the sound signal includes:
acquiring characteristics of an equivalent diagram of the energy matrix based on the energy matrix of the sound signal, wherein the characteristics of the equivalent diagram of the energy matrix comprise at least one of gray-scale richness and texture complexity;
the activity level of the target sound in the sound signal is determined based on the features of the equivalent graph of the energy matrix.
In one possible implementation, the feature of obtaining an equivalent map of the energy matrix based on the energy matrix of the acoustic signal includes:
acquiring the variance of the energy matrix and acquiring the mean value of the energy matrix;
and obtaining the gray level richness of the equivalent diagram of the energy matrix according to the variance and the mean value of the energy matrix.
In one possible implementation, the feature of obtaining an equivalent map of the energy matrix based on the energy matrix of the acoustic signal includes:
dividing an equivalent diagram of the energy matrix into a plurality of sub-blocks, and performing intra-frame prediction in different directions on each sub-block;
acquiring absolute value errors between a predicted value of any sub-block in any direction and actual pixel values of all rows of any sub-block, averaging the absolute value errors of any sub-block in any direction, and taking the obtained average absolute value error as a block error of any sub-block in any direction;
averaging the block errors of any sub-block in each direction, and taking the obtained average block error as a distortion value of any sub-block;
and averaging the distortion values of all the sub-blocks, and taking the obtained average distortion value as the texture complexity of the equivalent diagram of the energy matrix.
In one possible implementation, determining the activity of the target sound in the sound signal based on the features of the equivalent graph of the energy matrix includes:
determining an initial activity degree of a target sound in the sound signal based on the characteristics of the equivalent diagram of the energy matrix;
and determining the activity degree of the target sound in the sound signal according to the relation between the initial activity degree and the activity degree threshold value.
In one possible implementation, determining an initial activity level of a target sound in a sound signal based on features of an equivalent graph of an energy matrix includes:
and carrying out weighted summation on the gray scale richness and the texture complexity of the equivalent diagram of the energy matrix, and taking the obtained weighted summation result as the initial activity of the target sound in the sound signal.
In one possible implementation, determining the activity level of the target sound in the sound signal according to a relationship between the initial activity level and an activity level threshold includes:
if the initial activity is greater than the first activity threshold, the activity of the target sound is a first reference value;
if the initial activity is smaller than the second activity threshold, the activity of the target sound is a second reference value;
if the initial activity is greater than the second activity threshold and smaller than the first activity threshold, acquiring a first difference between the initial activity and the second activity threshold, acquiring a second difference between the first activity threshold and the second activity threshold, and taking a quotient of the first difference and the second difference as the activity of the target sound; wherein the first activity threshold is greater than the second activity threshold, and the first reference value is greater than the second reference value.
In one possible implementation, determining the actual gain of the sound signal based on the activity level of the target sound comprises:
acquiring a loudness value of a sound signal;
determining a loudness value to be adjusted based on the loudness value of the sound signal and the target loudness value;
obtaining a target gain according to the loudness value required to be adjusted;
determining a gain change step length of the sound signal according to the activity degree of the target sound;
and determining the actual gain of the sound signal according to the gain change step of the sound signal based on the relation between the target gain and the actual gain of the sound signal of the previous frame.
In one possible implementation, after adjusting the sound signal according to the actual gain, the method further includes:
and carrying out amplitude limiting processing on the adjusted sound signal.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
Fig. 2 is a flow chart illustrating a volume control method according to an exemplary embodiment, which may be applied to a live webcast scenario, but may also be applied to other scenarios involving volume adjustment or control. As shown in fig. 2, the method is used in a terminal and includes the following steps.
In step S21, a sound signal is acquired.
Taking a webcast scene as an example, the terminal is provided with a microphone, and the sound signal in the current environment can be collected through the microphone, thereby acquiring the sound signal. Besides real-time collection by the microphone, the sound signal may also be obtained in other ways: for example, the terminal obtains an audio clip from the network, or the terminal receives an audio clip uploaded by a user, thereby acquiring the sound signal.
In short, the method provided by the embodiment of the present disclosure may be applied to a live webcast scene, but is not limited to a live webcast scene, and the method provided by the embodiment of the present disclosure may be applied to a sound signal obtained in any manner to control the volume. Furthermore, it should be understood that the acquired sound signal may include some noise in the environment in addition to the target sound. For example, for a webcast scene, the sound of the anchor is a target sound, and the acquired sound signal includes some noise in the current webcast environment in addition to the sound of the anchor.
In step S22, an energy matrix of the sound signal is acquired.
When a sound wave propagates in a medium, on the one hand the medium particles oscillate about their equilibrium positions, producing kinetic energy; on the other hand the medium undergoes alternating compression and rarefaction, giving it potential energy of deformation. The sum of these two energy components is the acoustic energy the medium obtains from the sound vibration. Based on this acoustic energy, an energy matrix may be used to represent the energy of a sound signal.
Optionally, acquiring the energy matrix of the sound signal includes: converting the sound signal into a frequency domain signal through an FFT; acquiring a frequency domain energy signal of the frequency domain signal; and combining the frequency domain energy signal with the frequency domain energy signals of a previous reference number of frames to obtain the energy matrix of the sound signal.
For example, let the sound signal acquired at the current time t be s(t). The FFT converts the sound signal s(t) into a frequency domain signal S0(k, t) = FFT(s(t)), and the energy signal of the frequency domain signal S0(k, t) is E(k, t) = 10*log10(S0(k, t) * S0(k, t)), where k = 1 to M and M is the number of spectral bands.
Taking the reference number as N for example, the frequency domain energy signal is combined with the frequency domain energy signals of the previous N frames, and the resulting energy matrix of the sound signal is E = [E(t-N+1), …, E(t)], where E(t) = [E(1, t), …, E(M, t)]'; that is, [E(1, t), …, E(M, t)] is a row vector and its transpose E(t) forms one column of the matrix.
It should be understood that the reference number N may be determined according to the volume control scenario, or may be set by the user. Combining the frequency domain energy signal of the sound signal at time t with the frequency domain energy signals of the previous N frames adds a time dimension to the energy of the sound signal, so that volume control based on the energy matrix is more accurate. In addition, the frequency domain energy signals of the previous N frames are obtained in the same manner as the frequency domain energy signal of the sound signal at time t, which is not repeated here.
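For illustration only, a minimal numpy sketch of this step is given below; it is not part of the original disclosure. The window, the number of spectral bands M (num_bands), the reference number N (num_frames), and the helper names frame_energy and update_energy_matrix are placeholder assumptions.

```python
import numpy as np

def frame_energy(frame, num_bands=32, eps=1e-12):
    """Frequency domain log-energy of one frame: E(k, t) = 10*log10(power in band k),
    with the FFT bins grouped into num_bands spectral bands (the M bands above)."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))   # S0(k, t)
    power = np.abs(spectrum) ** 2
    bands = np.array_split(power, num_bands)                 # group bins into M bands
    return 10.0 * np.log10(np.array([b.sum() for b in bands]) + eps)

def update_energy_matrix(history, frame, num_frames=16, num_bands=32):
    """Append the band energies E(t) of the current frame and keep the last N frames,
    giving the N x M energy matrix E = [E(t-N+1), ..., E(t)] (one row per frame)."""
    history.append(frame_energy(frame, num_bands))
    if len(history) > num_frames:
        history.pop(0)
    return np.array(history)
```

In this sketch the energy matrix is stored with one row per frame, matching the N x M orientation used in the description below.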
In step S23, the activity degree of the target sound in the sound signal is determined based on the energy matrix of the sound signal.
Since the energy matrix of the sound signal reflects the energy of the different sounds it contains, the embodiments of the present disclosure determine, for a target sound in the sound signal (for example, a sound other than noise), an activity level that reflects how active the target sound is. In this way, the target sound can be distinguished from noise in the environment, so that the loudness of the anchor's voice in scenes such as webcasts does not vary too widely.
In a possible implementation manner, the two-dimensional energy matrix E of N × M obtained in the above step may be used as a graph, that is, an equivalent graph of the energy matrix, and each element is a pixel, and then the activity of the target sound in the sound signal is determined based on the energy matrix of the sound signal, including: acquiring characteristics of an equivalent diagram of the energy matrix based on the energy matrix of the sound signal, wherein the characteristics of the equivalent diagram of the energy matrix comprise at least one of gray-scale richness and texture complexity; the activity level of the target sound in the sound signal is determined based on the features of the equivalent graph of the energy matrix.
The grayscale richness reflects the overall characteristics of the equivalent diagram and can also be called a global feature. The energy of a target sound in the sound signal generally fluctuates widely, whereas the energy of noise generally fluctuates little, so the target sound can be distinguished from noise by the grayscale richness. The texture complexity is, like the grayscale richness, a feature of the equivalent diagram, but unlike the grayscale richness it reflects a local characteristic of the equivalent diagram. The texture of a target sound in the sound signal is generally more pronounced, whereas the texture of noise is generally less pronounced, so the target sound and the noise can be further distinguished by the texture complexity. For this reason, the disclosed embodiments determine the activity of the target sound in the sound signal from at least one of the grayscale richness and the texture complexity, so that the result better matches human auditory characteristics. The grayscale richness and texture complexity are determined as follows:
determining the gray-scale richness: taking the feature of the equivalent diagram including the gray-scale richness as an example, the feature of the equivalent diagram of the energy matrix obtained based on the energy matrix of the sound signal includes: acquiring the variance of the energy matrix and acquiring the mean value of the energy matrix; and obtaining the gray level richness of the equivalent diagram of the energy matrix according to the variance and the mean value of the energy matrix.
Optionally, obtaining the grayscale richness of the equivalent diagram of the energy matrix from the variance and the mean of the energy matrix includes, but is not limited to, performing a weighted summation of the variance and the mean of the energy matrix and taking the resulting sum as the grayscale richness of the equivalent diagram. For example, the grayscale richness of the equivalent diagram is Brightness:
Brightness=sigma_E+a*mu_E
where sigma_E is the variance of E, mu_E is the mean of E, and a is a weighting factor between 0 and 1. The value of the weighting factor may be determined empirically, or the terminal may provide a setting entry through which the user sets it. For example, in the embodiment of the present disclosure, the weighting factor a may take the value 0.3, and the grayscale richness of the equivalent diagram is Brightness = sigma_E + 0.3*mu_E.
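Continuing the numpy sketch above, the grayscale-richness computation could look as follows; the default weighting factor a = 0.3 is the example value mentioned in this paragraph, and the function name is hypothetical.

```python
import numpy as np

def grayscale_richness(energy_matrix, a=0.3):
    """Brightness = sigma_E + a * mu_E: the variance of the energy matrix plus a
    weighted mean, treating the matrix as a grayscale image."""
    return np.var(energy_matrix) + a * np.mean(energy_matrix)
```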
Texture complexity determination: taking the feature of the equivalent map including the texture complexity as an example, the feature of the equivalent map of the energy matrix obtained based on the energy matrix of the sound signal includes: dividing an equivalent diagram of the energy matrix into a plurality of sub-blocks, and performing intra-frame prediction in different directions on each sub-block; acquiring absolute value errors between a predicted value of any sub-block in any direction and actual pixel values of all rows of any sub-block, averaging the absolute value errors of any sub-block in any direction, and taking the obtained average absolute value error as a block error of any sub-block in any direction; averaging the block errors of any sub-block in each direction, and taking the obtained average block error as a distortion value of any sub-block; and averaging the distortion values of all the sub-blocks, and taking the obtained average distortion value as the texture complexity of the equivalent diagram of the energy matrix.
For example, the equivalent diagram E of the energy matrix is divided into 8 × 8 sub-blocks, and intra prediction in both the vertical and horizontal directions is performed within each sub-block; the pixel values of the row above the sub-block are taken as the predicted pixel values of sub-block (m, n) in the vertical direction. The absolute value error between the actual and predicted pixel values is then calculated, that is, the absolute value error between the pixel values of the row above sub-block (m, n) and the pixel values of each row of the sub-block. The calculated absolute value errors are averaged, and the average absolute value error is used as the block error of sub-block (m, n) in the vertical direction.
The block error of sub-block (m, n) in the horizontal direction is obtained in the same manner as the block error in the vertical direction. The block errors in the vertical and horizontal directions are then averaged, and the resulting average block error is used as the distortion value (best distortion) of sub-block (m, n). The distortion value of every other sub-block is obtained in the same way, the distortion values of all sub-blocks are averaged, and the resulting average distortion value is used as the texture complexity of the equivalent diagram of the energy matrix. In other words, the mean distortion of all sub-blocks measures the texture complexity of the entire equivalent diagram.
For example, the distortion value (Best distortion) of the sub-block (m, n) is Texture (m, n):
Texture(m,n)=(Texture_Vertical(m,n)+Texture_Horizontal(m,n))/2
where Texture_Vertical(m, n) is the block error of sub-block (m, n) in the vertical direction and Texture_Horizontal(m, n) is its block error in the horizontal direction.
Averaging the distortion values of all sub-blocks, the obtained average distortion value can be expressed as Texture:
Texture=average(Texture(m,n))
where m = 1 to M/8 and n = 1 to N/8. Texture is the texture complexity of the equivalent diagram of the energy matrix.
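A sketch of the texture-complexity computation under two stated assumptions: the horizontal prediction uses the column to the left of each sub-block (by analogy with the vertical case described above), and sub-blocks in the first block row or column of the map are skipped because they have no neighboring row or column to predict from. The function name is hypothetical.

```python
import numpy as np

def texture_complexity(energy_matrix, block=8):
    """Average intra-prediction distortion over 8 x 8 sub-blocks of the equivalent map.
    Vertical prediction uses the row above each sub-block (as in the text); horizontal
    prediction is assumed, by analogy, to use the column to the left of the sub-block."""
    E = np.asarray(energy_matrix, dtype=float)
    rows, cols = E.shape
    distortions = []
    # Sub-blocks in the first block row/column have no neighbour to predict from
    # and are skipped here for simplicity.
    for r in range(block, rows - block + 1, block):
        for c in range(block, cols - block + 1, block):
            sub = E[r:r + block, c:c + block]
            pred_v = E[r - 1, c:c + block]                      # row above the sub-block
            pred_h = E[r:r + block, c - 1]                      # column left of the sub-block
            err_v = np.mean(np.abs(sub - pred_v[None, :]))      # block error, vertical
            err_h = np.mean(np.abs(sub - pred_h[:, None]))      # block error, horizontal
            distortions.append((err_v + err_h) / 2.0)           # Texture(m, n)
    return float(np.mean(distortions)) if distortions else 0.0  # Texture
```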
Alternatively, if the features of the equivalent map of the energy matrix include both the grayscale richness and the texture complexity, the grayscale richness and the texture complexity may be obtained separately as described above.
Further, whether the features of the equivalent map of the energy matrix include only grayscale richness, only texture complexity, or both grayscale richness and texture complexity, determining the activity of the target sound in the sound signal based on the features of the equivalent map of the energy matrix includes: determining an initial activity degree of a target sound in the sound signal based on the characteristics of the equivalent diagram of the energy matrix; and determining the activity degree of the target sound in the sound signal according to the relation between the initial activity degree and the activity degree threshold value.
Optionally, determining an initial activity level of the target sound in the sound signal based on the features of the equivalent graph of the energy matrix comprises: and carrying out weighted summation on the gray scale richness and the texture complexity of the equivalent diagram of the energy matrix, and taking the obtained weighted summation result as the initial activity of the target sound in the sound signal.
For example, the initial Activity of the target sound in the sound signal is:
Activity=α*Brightness+β*Texture
where α and β are the weights of the grayscale richness and the texture complexity, respectively. If only the grayscale richness is used as the feature of the equivalent diagram, β can be set to 0 and α to a non-zero value; if only the texture complexity is used as the feature of the equivalent diagram, α can be set to 0 and β to a non-zero value; if both the grayscale richness and the texture complexity are used as features of the equivalent diagram, both α and β can be set to non-zero values.
α and β may be determined empirically, or a setting entry through which a set value is input by a user may be provided through the terminal, which is not limited by the embodiment of the present disclosure.
Optionally, determining the activity level of the target sound in the sound signal according to a relationship between the initial activity level and an activity level threshold includes:
if the initial activity is greater than the first activity threshold, the activity of the target sound is a first reference value;
if the initial activity is smaller than the second activity threshold, the activity of the target sound is a second reference value;
if the initial activity is greater than the second activity threshold and smaller than the first activity threshold, acquiring a first difference between the initial activity and the second activity threshold, acquiring a second difference between the first activity threshold and the second activity threshold, and taking a quotient of the first difference and the second difference as the activity of the target sound; wherein the first activity threshold is greater than the second activity threshold, and the first reference value is greater than the second reference value.
Taking the initial activity as Activity, the first activity threshold as T1, the second activity threshold as T0, the first reference value as 1, and the second reference value as 0 as an example: if Activity > T1, the activity Activity_factor of the target sound is 1; if Activity < T0, Activity_factor is 0; and if T0 < Activity < T1, Activity_factor = (Activity - T0)/(T1 - T0).
Here T1 > T0. The values of T1 and T0 may be determined empirically, or the terminal may provide a setting entry through which the user sets them; the embodiments of the present disclosure do not limit the values of T1 and T0.
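The mapping from features to the activity factor can be sketched as follows; the weights α, β and the thresholds T0, T1 are left as parameters because the disclosure treats them as empirical or user-set values, and the function name is hypothetical.

```python
def activity_factor(brightness, texture, alpha, beta, t0, t1):
    """Initial activity = alpha*Brightness + beta*Texture, then mapped to [0, 1]
    using the two thresholds T1 > T0 as described above."""
    activity = alpha * brightness + beta * texture
    if activity > t1:
        return 1.0                      # first reference value
    if activity < t0:
        return 0.0                      # second reference value
    return (activity - t0) / (t1 - t0)  # transition region
```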
In step S24, the actual gain of the sound signal is determined based on the activity level of the target sound.
Since the activity level reflects how active the target sound is and clearly distinguishes it from noise in the environment, determining the actual gain of the sound signal based on the activity level of the target sound avoids amplifying the background noise in the environment. In one possible implementation, determining the actual gain of the sound signal based on the activity of the target sound includes the following steps S241 to S245.
Step S241, obtaining the loudness value of the sound signal;
When the loudness value of the sound signal is obtained, EQ (equalizer) weighting may be performed on the frequency domain signal S0(k, t) of the sound signal using a loudness curve L(k), so as to obtain a frequency domain signal S2(k, t) = L(k)*S0(k, t) that reflects the loudness experience; the loudness value of the sound signal is then obtained from this frequency domain signal.
For example, the loudness curve may be as shown in fig. 3, where the abscissa is frequency and the ordinate is sound pressure level. Sound pressure is the change in atmospheric pressure caused by a disturbance, that is, the residual pressure over atmospheric pressure, equivalent to the pressure change produced by superimposing a disturbance on the atmosphere. In the figure, the sound pressure levels at different frequencies along each curve differ, but the loudness perceived by the human ear is the same; each curve is labeled with a number whose unit is the phon. The equal loudness curves show that at low loudness the human ear is insensitive to low and high tones, that it becomes progressively more sensitive to them as loudness increases, and that it is most sensitive to sounds between 2000 Hz and 5000 Hz.
On the curves of different loudness in fig. 3, the sound pressure levels in the 2000 Hz to 5000 Hz range lie at the relatively lowest sound pressure levels of each curve, showing that the human ear responds sensitively to mid frequencies. On the low and high frequency sides outside this range, the equal loudness curves tilt upward, indicating a decrease in the ear's sensitivity to low and high frequency sounds. The weakest intensity at which the human ear can hear sound is called the hearing threshold (shown by the dashed line MAF in the figure), and the highest intensity, which produces a sensation of pain, is called the pain threshold. The equal loudness curves formed by the hearing threshold and the pain threshold are the lower and upper limits of the family of equal loudness curves. Loudness is determined mainly by sound intensity, and the loudness level rises as the intensity increases. However, loudness is not determined purely by intensity; it also depends on frequency, and pure tones of different frequencies have different loudness growth rates, with low-frequency pure tones growing in loudness faster than mid-frequency pure tones.
EQ weighting, that is, equal loudness weighting of the sound signal by the loudness curve L(k), simulates the auditory properties of the human ear and removes a portion of the unwanted sound energy. For example, when noise contaminates frequency components to which the human ear is not sensitive, treating all frequency components equally regardless of the ear's frequency response lets these insensitive frequencies heavily contaminate the characteristics of the whole sound, so that the subsequent sound recognition rate is also seriously reduced.
After the frequency domain signal reflecting the loudness experience is obtained, the loudness value of the sound signal is obtained as Loudness(t):
Loudness(t)=10*log10(Sum(S2(k,t)*S2(k,t)))
where Sum () is the summation function.
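A sketch of the loudness computation; the equal-loudness weighting curve L(k) is passed in by the caller, since its exact values are not specified here, and the function name is hypothetical.

```python
import numpy as np

def loudness_db(spectrum, eq_curve, eps=1e-12):
    """Loudness(t) = 10*log10(Sum(|S2(k, t)|^2)) with S2(k, t) = L(k) * S0(k, t)."""
    s2 = eq_curve * spectrum                        # EQ-weighted spectrum S2(k, t)
    return 10.0 * np.log10(np.sum(np.abs(s2) ** 2) + eps)
```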
Step S242, determining a loudness value to be adjusted based on the loudness value of the sound signal and the target loudness value;
Taking the target loudness value as Loudness_target as an example, the loudness value to be adjusted, Loudness_diff, is determined based on the loudness value of the sound signal and the target loudness value:
Loudness_diff=Loudness_target-Loudness(t)
that is, the loudness value that needs to be adjusted is the difference between the target loudness value and the loudness value of the sound signal. The target loudness value may be determined empirically, or may be set by providing a setting entry through the terminal, and the setting entry is set by a user.
Step 243, obtaining a target gain according to the loudness value required to be adjusted;
The target gain TargetGain(t) is obtained from the loudness value to be adjusted:
TargetGain(t)=pow(10,Loudness_diff/10)
where pow(x, y) raises x to the power y; that is, the target gain is 10 raised to the power Loudness_diff/10.
Step 244, determining the gain change step size of the sound signal according to the activity degree of the target sound;
since the target gain varies from frame to frame and changes too quickly with time, the gain is generally adjusted based on the gain of the previous frame according to a gain change step so that the actual gain is moved toward the target gain, i.e., the actual gain is adjusted according to the gain change step. Alternatively, a gain change step of the sound signal is determined by the activity of the target sound, with stepUp as an increasing gain change step and stepDown as a decreasing gain change step, and stepUp and stepDown determined according to the activity of the target sound are as follows:
stepUp=release_factor*Activity_factor;
stepDown=attack_factor*Activity_factor;
the release _ factor and the attack _ factor are weighted values respectively, and can be determined empirically, or a setting entry can be provided through a terminal, and a user sets the setting entry through the setting entry, which is not limited in the embodiment of the present disclosure. For example, release _ factor and attack _ factor may take values between 0 and 1, respectively, with the value of attack _ factor being greater than the value of release _ factor. In addition, the reason why the gain change step size is weighted by the Activity _ factor is: the Activity _ factor characterizes the Activity of the target sound, and for the environmental background, the Activity _ factor is substantially equal to 0, so that the background is prevented from being amplified. For the presence of a target sound, music or other valid sound, Activity _ factor is substantially equal to 1, and thus normal zooming in and out is possible. For a transition region (for example, a region where the target sound and the noise exist at the same time, or a region where it is not easy to distinguish between the target sound and the noise), the Activity _ factor is a decimal between 0 and 1, and the variation of the gain is slowed down to some extent, that is, the amplification of the noise is suppressed.
And step 245, determining the actual gain of the sound signal according to the gain change step length of the sound signal based on the relation between the target gain and the actual gain of the sound signal of the previous frame.
Optionally, determining the actual gain of the sound signal according to the gain change step of the sound signal based on the relationship between the target gain and the actual gain of the sound signal of the previous frame, including:
if TargetGain(t) > Gain(t-1), then Gain(t) = Gain(t-1) + stepUp;
if TargetGain(t) < Gain(t-1), then Gain(t) = Gain(t-1) - stepDown;
if TargetGain(t) = Gain(t-1), then Gain(t) = Gain(t-1).
Here TargetGain(t) is the target gain and Gain(t-1) is the actual gain of the sound signal of the previous frame. Gain(t-1) is obtained in the same manner as the actual gain of the current sound signal in the method provided by the embodiment of the present disclosure, so the principle is not repeated here.
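The gain update of steps S242 to S245 can be sketched as one function. The release_factor and attack_factor defaults are illustrative placeholders chosen only so that attack_factor > release_factor, as suggested above; the function name is hypothetical.

```python
def update_gain(prev_gain, loudness, loudness_target, act_factor,
                release_factor=0.01, attack_factor=0.05):
    """One gain update: TargetGain(t) = pow(10, Loudness_diff / 10), then the actual
    gain Gain(t) moves toward it by a step scaled with the activity factor."""
    loudness_diff = loudness_target - loudness       # Loudness_diff
    target_gain = 10.0 ** (loudness_diff / 10.0)     # TargetGain(t)
    step_up = release_factor * act_factor            # stepUp
    step_down = attack_factor * act_factor           # stepDown
    if target_gain > prev_gain:
        return prev_gain + step_up
    if target_gain < prev_gain:
        return prev_gain - step_down
    return prev_gain                                 # Gain(t)
```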
In step S25, the sound signal is adjusted according to the actual gain to control the volume of the sound signal.
For this step, the sound signal is adjusted according to the actual gain, that is, the amplitude of the sound signal is scaled by the actual gain, thereby realizing volume control of the sound signal.
As described above, when the actual gain is Gain(t) and the sound signal is s(t), adjusting the sound signal according to the actual gain gives the adjusted sound signal s1(t) = s(t) * Gain(t).
Because the actual gain in the present application is determined by the activity of the target sound, the method provided by the embodiments of the present application can control the actual gain of the sound signal according to that activity, thereby adjusting the amplitude of the sound signal and realizing volume control. For example, if the activity of the target sound is high, the sound signal may be perceived as loud, and the actual gain can be controlled to attenuate the sound signal and reduce the volume. Similarly, if the activity of the target sound is low, the sound signal may be perceived as quiet, and the actual gain can be controlled to amplify the sound signal and increase the volume. The method provided by the embodiments of the present application can therefore avoid sudden volume changes, solve the problem of excessive loudness differences, and improve the listening experience. In addition, since the activity level reflects how active the target sound is, the actual gain determined from it suppresses the amplification of noise in the environment (i.e., the background noise), further improving the listening experience.
In step S26, the adjusted sound signal is subjected to clipping processing.
The limiting process refers to an operation of reducing all instantaneous values of a signal having a certain characteristic (e.g., voltage, current, power) exceeding a predetermined threshold value to be close to the threshold value, and preserving all other instantaneous values. The clipping processing manner is not limited in this regard, and may be implemented by a clipping circuit, for example. The adjusted sound signal s1(t) is subjected to amplitude limiting processing to obtain a signal s2(t), wherein s2(t) is the sound signal after the volume is controlled. By performing amplitude limiting processing on the adjusted sound signal, saturation overflow distortion can be avoided.
It should be understood that the step S26 is an optional step, and in the case that the saturation overflow distortion does not occur, the step S26 may not be executed, that is, the adjusted sound signal S1(t) is directly used as the sound signal after the volume is controlled.
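A sketch of steps S25 and S26; hard clipping is used here only as a simple stand-in for the limiter, since the disclosure does not prescribe a particular limiting method, and the function name and limit value are assumptions.

```python
import numpy as np

def apply_gain_and_limit(frame, gain, limit=0.99):
    """s1(t) = s(t) * Gain(t), then simple hard clipping as a stand-in for the limiter."""
    adjusted = frame * gain                  # s1(t)
    return np.clip(adjusted, -limit, limit)  # s2(t), the volume-controlled output
```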
Optionally, the implementation process of each step may be as shown in fig. 4. The embodiment of the present disclosure provides a method for automatically controlling the volume in scenes such as webcasts based on the basic principle of AGC (Automatic Gain Control). Compared with related-art methods that control the volume only by a target amplitude, the method provided by the embodiment of the present disclosure better matches human auditory characteristics and can avoid amplifying the background noise in the environment, so that the loudness of the live sound does not vary too widely and the listening experience is further improved.
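For orientation, the sketches above can be wired together into one per-frame pass corresponding to the overall flow of fig. 4. The state dictionary (frame history, previous gain, EQ curve) and all numeric parameter values are assumptions of this sketch, not values from the disclosure.

```python
import numpy as np

def process_frame(frame, state, loudness_target=-20.0,
                  alpha=0.5, beta=0.5, t0=5.0, t1=20.0):
    """One per-frame pass through the overall flow: energy matrix -> activity -> gain -> output."""
    E = update_energy_matrix(state["history"], frame)             # steps S21 / S22
    act = activity_factor(grayscale_richness(E), texture_complexity(E),
                          alpha, beta, t0, t1)                    # step S23
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))        # S0(k, t)
    loud = loudness_db(spectrum, state["eq_curve"])               # loudness value
    state["gain"] = update_gain(state["gain"], loud,
                                loudness_target, act)             # step S24
    return apply_gain_and_limit(frame, state["gain"])             # steps S25 / S26

# Example initial state (frame_len is the per-frame sample count):
# state = {"history": [], "gain": 1.0, "eq_curve": np.ones(frame_len // 2 + 1)}
```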
According to the method provided by the embodiment of the present disclosure, the activity level of the target sound in the sound signal is determined from the energy matrix of the sound signal, and the actual gain of the sound signal is determined from that activity level, so that background noise in the environment can be prevented from being amplified, the volume control better matches human auditory characteristics, and the volume control effect can be further improved.
Fig. 5 is a block diagram illustrating a volume control device according to an exemplary embodiment. Referring to fig. 5, the apparatus includes a first acquisition unit 51, a second acquisition unit 52, a first determination unit 53, a second determination unit 54, and a control unit 55.
A first acquisition unit 51 configured to acquire a sound signal;
a second acquisition unit 52 configured to acquire an energy matrix of the sound signal;
a first determination unit 53 configured to determine an activity degree of a target sound in the sound signal based on an energy matrix of the sound signal;
a second determining unit 54 configured to determine an actual gain of the sound signal based on the activity degree of the target sound;
a control unit 55 configured to adjust the sound signal according to the actual gain to control the volume of the sound signal.
In one possible implementation, the second obtaining unit 52 is configured to convert the sound signal into a frequency domain signal through a fast Fourier transform (FFT); acquire a frequency domain energy signal of the frequency domain signal; and combine the frequency domain energy signal with a reference number of previous frequency domain energy signals to obtain the energy matrix of the sound signal.
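As an illustration of this implementation, the following sketch builds the energy matrix frame by frame. The reference quantity of 31 previous frames and the function names are assumptions, since the disclosure does not fix these values.

```python
import numpy as np
from collections import deque

REFERENCE_COUNT = 31  # assumed number of previous frames; not specified by the disclosure
_history = deque(maxlen=REFERENCE_COUNT + 1)  # previous frames plus the current one

def update_energy_matrix(frame):
    """frame: one frame of time-domain samples of the sound signal."""
    spectrum = np.fft.rfft(frame)        # FFT: time domain -> frequency domain
    energy = np.abs(spectrum) ** 2       # frequency domain energy signal of this frame
    _history.append(energy)
    # Combine the current energy column with the stored previous columns:
    # rows are frequency bins, columns are frames, i.e. the energy matrix.
    return np.stack(list(_history), axis=1)
```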
In one possible implementation, referring to fig. 6, the first determination unit 53 includes:
an obtaining subunit 511 configured to obtain, based on the energy matrix of the sound signal, a feature of an equivalent diagram of the energy matrix, the feature of the equivalent diagram of the energy matrix including at least one of a grayscale richness and a texture complexity;
a determining subunit 512 configured to determine an activity level of the target sound in the sound signal based on the features of the equivalent map of the energy matrix.
In one possible implementation, the obtaining subunit 511 is configured to obtain the variance of the energy matrix and the mean of the energy matrix, and to obtain the grayscale richness of the equivalent diagram of the energy matrix from the variance and the mean of the energy matrix.
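For example, one way to turn the variance and mean into a single grayscale richness score is a coefficient-of-variation style ratio, as in the sketch below; the exact combination is an assumption, since the disclosure only states that both statistics are used.

```python
import numpy as np

def grayscale_richness(energy_matrix, eps=1e-12):
    """Treat the energy matrix as a grayscale image (its equivalent diagram)
    and score how spread out its values are relative to their mean."""
    mean = float(np.mean(energy_matrix))
    var = float(np.var(energy_matrix))
    # Assumed combination: normalized variance; eps avoids division by zero.
    return var / (mean ** 2 + eps)
```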
In one possible implementation, the obtaining subunit 511 is configured to divide the equivalent diagram of the energy matrix into a plurality of sub-blocks and perform intra-frame prediction in different directions on each sub-block; obtain, for any sub-block in any direction, the absolute errors between the predicted value in that direction and the actual pixel values of all rows of that sub-block, average these absolute errors, and take the resulting average absolute error as the block error of that sub-block in that direction; average the block errors of that sub-block over all directions and take the resulting average block error as the distortion value of that sub-block; and average the distortion values of all sub-blocks and take the resulting average distortion value as the texture complexity of the equivalent diagram of the energy matrix.
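A simplified sketch of this texture complexity measure, using only two prediction directions (vertical and horizontal) and an assumed 8x8 block size, is shown below; the disclosure does not restrict the directions or the block size.

```python
import numpy as np

def texture_complexity(eq_map, block=8):
    """Split the equivalent diagram into block x block sub-blocks, predict each
    sub-block from its top row (vertical) and left column (horizontal), average
    the absolute prediction errors per direction, per sub-block, and over all
    sub-blocks."""
    h, w = eq_map.shape
    distortions = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            sub = eq_map[r:r + block, c:c + block]
            vertical = np.tile(sub[0, :], (block, 1))     # copy the top row downward
            horizontal = np.tile(sub[:, :1], (1, block))  # copy the left column across
            block_errors = [np.mean(np.abs(sub - pred)) for pred in (vertical, horizontal)]
            distortions.append(np.mean(block_errors))     # distortion value of the sub-block
    return float(np.mean(distortions)) if distortions else 0.0
```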
In one possible implementation, referring to fig. 7, the determining subunit 512 includes:
a first determining module 5121 configured to determine an initial activity degree of a target sound in the sound signal based on a feature of an equivalent graph of the energy matrix;
a second determining module 5122 configured to determine the activity level of the target sound in the sound signal according to the relationship between the initial activity level and the activity level threshold.
In one possible implementation, the first determining module 5121 is configured to perform a weighted summation of the grayscale richness and the texture complexity of the equivalent diagram of the energy matrix, and to take the weighted summation result as the initial activity of the target sound in the sound signal.
In one possible implementation, the second determining module 5122 is configured to determine the activity level of the target sound to be the first reference value if the initial activity level is greater than the first activity level threshold; if the initial activity is smaller than the second activity threshold, the activity of the target sound is a second reference value; if the initial activity is greater than the second activity threshold and smaller than the first activity threshold, acquiring a first difference between the initial activity and the second activity threshold, acquiring a second difference between the first activity threshold and the second activity threshold, and taking a quotient of the first difference and the second difference as the activity of the target sound; wherein the first activity threshold is greater than the second activity threshold, and the first reference value is greater than the second reference value.
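Taken together, the weighted summation of the two features and the threshold mapping can be sketched as follows; the weights, the two activity thresholds, and the reference values 1.0 and 0.0 are illustrative assumptions rather than values given by the disclosure.

```python
def target_sound_activity(grayscale_richness, texture_complexity,
                          w_gray=0.5, w_texture=0.5,
                          first_threshold=0.8, second_threshold=0.2):
    """Initial activity = weighted sum of the two features; the final activity is
    the first reference value above the first threshold, the second reference
    value below the second threshold, and a linear interpolation in between."""
    initial = w_gray * grayscale_richness + w_texture * texture_complexity
    if initial > first_threshold:
        return 1.0                                   # first reference value
    if initial < second_threshold:
        return 0.0                                   # second reference value
    return (initial - second_threshold) / (first_threshold - second_threshold)
```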
In one possible implementation, the second determining unit 54 is configured to obtain a loudness value of the sound signal; determining a loudness value to be adjusted based on the loudness value of the sound signal and the target loudness value; obtaining a target gain according to the loudness value required to be adjusted; determining a gain change step length of the sound signal according to the activity degree of the target sound; and determining the actual gain of the sound signal according to the gain change step of the sound signal based on the relation between the target gain and the actual gain of the sound signal of the previous frame.
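As a rough illustration of this gain update, the sketch below works in decibels, estimates loudness from the RMS of the current frame, and lets the activity scale the gain change step. The target loudness, the maximum step, and the RMS-based loudness estimate are all assumptions rather than the specific loudness model of the disclosure.

```python
import numpy as np

def actual_gain_db(frame, previous_gain_db, activity,
                   target_loudness_db=-23.0, max_step_db=1.0, eps=1e-12):
    """Move the previous frame's actual gain toward the target gain by a step
    whose size grows with the activity of the target sound."""
    loudness_db = 20 * np.log10(np.sqrt(np.mean(frame ** 2)) + eps)
    target_gain_db = target_loudness_db - loudness_db   # loudness still to be adjusted
    step_db = max_step_db * activity                     # activity-scaled change step
    if target_gain_db > previous_gain_db:
        return min(previous_gain_db + step_db, target_gain_db)
    return max(previous_gain_db - step_db, target_gain_db)

# adjusted = frame * 10 ** (actual_gain_db(frame, prev_db, act) / 20)  # s1(t)
```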
In one possible implementation, referring to fig. 8, the apparatus further includes:
a clipping unit 56 configured to clip the adjusted sound signal.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 9 is a block diagram illustrating a terminal 900 according to an example embodiment. The terminal 900 may be a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. The terminal 900 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and the like.
In general, terminal 900 includes: a processor 901 and a memory 902.
Processor 901 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 901 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 901 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 901 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 902 is used to store at least one instruction for execution by processor 901 to implement a volume control method provided by method embodiments herein.
In some embodiments, terminal 900 can also optionally include: a peripheral interface 903 and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 904, display screen 905, camera 906, audio circuitry 907, positioning component 908, and power supply 909.
The peripheral interface 903 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 901 and the memory 902. In some embodiments, the processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902 and the peripheral interface 903 may be implemented on a separate chip or circuit board, which is not limited by this embodiment.
The radio frequency circuit 904 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 904 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 904 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 comprises an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 904 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 904 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 905 is a touch display screen, the display screen 905 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 901 as a control signal for processing. At this point, the display screen 905 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 905, provided on the front panel of the terminal 900; in other embodiments, there may be at least two display screens 905, each disposed on a different surface of the terminal 900 or in a foldable design; in still other embodiments, the display screen 905 may be a flexible display disposed on a curved surface or a folded surface of the terminal 900. The display screen 905 may even be arranged as a non-rectangular irregular figure, i.e. a shaped screen. The display screen 905 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and other materials.
The camera assembly 906 is used to capture images or video. Optionally, the camera assembly 906 includes a front camera and a rear camera. Generally, the front camera is disposed at the front panel of the terminal, and the rear camera is disposed at the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, the camera assembly 906 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
Audio circuit 907 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 901 for processing, or inputting the electric signals to the radio frequency circuit 904 for realizing voice communication. For stereo sound acquisition or noise reduction purposes, the microphones may be multiple and disposed at different locations of the terminal 900. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 901 or the radio frequency circuit 904 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuit 907 may also include a headphone jack.
The positioning component 908 is used to locate the current geographic location of the terminal 900 for navigation or LBS (Location Based Service). The positioning component 908 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 909 is used to provide power to the various components in the terminal 900. The power supply 909 may be an alternating current power supply, a direct current power supply, a disposable battery, or a rechargeable battery. When the power supply 909 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 900 can also include one or more sensors 910. The one or more sensors 910 include, but are not limited to: acceleration sensor 911, gyro sensor 912, pressure sensor 913, fingerprint sensor 914, optical sensor 915, and proximity sensor 916.
The acceleration sensor 911 can detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 900. For example, the acceleration sensor 911 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 901 can control the touch display 905 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 911. The acceleration sensor 911 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 912 may detect a body direction and a rotation angle of the terminal 900, and the gyro sensor 912 may cooperate with the acceleration sensor 911 to acquire a 3D motion of the user on the terminal 900. The processor 901 can implement the following functions according to the data collected by the gyro sensor 912: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 913 may be disposed on the side bezel of terminal 900 and/or underneath touch display 905. When the pressure sensor 913 is disposed on the side frame of the terminal 900, the user's holding signal of the terminal 900 may be detected, and the processor 901 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 913. When the pressure sensor 913 is disposed at a lower layer of the touch display 905, the processor 901 controls the operability control on the UI interface according to the pressure operation of the user on the touch display 905. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 914 is used for collecting a fingerprint of the user, and the processor 901 identifies the user according to the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 identifies the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, processor 901 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 914 may be disposed on the front, back, or side of the terminal 900. When a physical key or vendor Logo is provided on the terminal 900, the fingerprint sensor 914 may be integrated with the physical key or vendor Logo.
The optical sensor 915 is used to collect ambient light intensity. In one embodiment, the processor 901 may control the display brightness of the touch display 905 based on the ambient light intensity collected by the optical sensor 915. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 905 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 905 is turned down. In another embodiment, the processor 901 can also dynamically adjust the shooting parameters of the camera assembly 906 according to the ambient light intensity collected by the optical sensor 915.
The proximity sensor 916, also known as a distance sensor, is typically disposed on the front panel of the terminal 900. The proximity sensor 916 is used to collect the distance between the user and the front face of the terminal 900. In one embodiment, when the proximity sensor 916 detects that the distance between the user and the front face of the terminal 900 gradually decreases, the processor 901 controls the touch display 905 to switch from the screen-on state to the screen-off state; when the proximity sensor 916 detects that the distance between the user and the front face of the terminal 900 gradually increases, the processor 901 controls the touch display 905 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 9 does not constitute a limitation of terminal 900, and may include more or fewer components than those shown, or may combine certain components, or may employ a different arrangement of components.
In an exemplary embodiment, there is also provided a non-transitory computer readable storage medium having instructions therein which, when executed by a processor of a terminal, enable the terminal to perform the following volume control method:
acquiring a sound signal;
acquiring an energy matrix of the sound signal;
determining an activity level of a target sound in the sound signal based on an energy matrix of the sound signal;
determining an actual gain of the sound signal based on the activity level of the target sound;
and adjusting the sound signal according to the actual gain to control the volume of the sound signal.
For example, the non-transitory computer readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided an application program product in which instructions, when executed by a processor of a terminal, enable the terminal to perform the following volume control method:
acquiring a sound signal;
acquiring an energy matrix of the sound signal;
determining an activity level of a target sound in the sound signal based on an energy matrix of the sound signal;
determining an actual gain of the sound signal based on the activity level of the target sound;
and adjusting the sound signal according to the actual gain to control the volume of the sound signal.
It should be understood that, for the volume control method performed by the terminal when the instructions in the storage medium are executed by the processor of the terminal, and for the volume control method performed by the terminal when the instructions in the application program product are executed by the processor of the terminal, reference may be made to the method embodiments; details are not repeated here.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (22)

1. A method of volume control, comprising:
acquiring a sound signal;
acquiring an energy matrix of the sound signal;
determining an activity level of a target sound in the sound signal based on an energy matrix of the sound signal, the activity level reflecting an activity level of the target sound in the sound signal;
determining an actual gain of the sound signal based on the activity level of the target sound;
and adjusting the sound signal according to the actual gain so as to control the volume of the sound signal.
2. The volume control method of claim 1, wherein the obtaining an energy matrix of the sound signal comprises:
converting the sound signal into a frequency domain signal by a fast Fourier transform (FFT);
acquiring a frequency domain energy signal of the frequency domain signal;
and combining the frequency domain energy signals with the frequency domain energy signals of the previous reference quantity to obtain an energy matrix of the sound signals.
3. The volume control method of claim 1, wherein the determining the activity of the target sound in the sound signal based on the energy matrix of the sound signal comprises:
acquiring features of an equivalent diagram of the energy matrix based on the energy matrix of the sound signal, wherein the features of the equivalent diagram of the energy matrix comprise at least one of gray-scale richness and texture complexity;
determining an activity level of a target sound in the sound signal based on features of an equivalent graph of the energy matrix.
4. The volume control method according to claim 3, wherein the obtaining the feature of the equivalent diagram of the energy matrix based on the energy matrix of the sound signal comprises:
acquiring the variance of the energy matrix and acquiring the mean value of the energy matrix;
and obtaining the gray level richness of the equivalent diagram of the energy matrix according to the variance and the mean value of the energy matrix.
5. The volume control method according to claim 3, wherein the obtaining the feature of the equivalent diagram of the energy matrix based on the energy matrix of the sound signal comprises:
dividing the equivalent diagram of the energy matrix into a plurality of sub-blocks, and performing intra-frame prediction in different directions on each sub-block;
acquiring an absolute value error between a predicted value of any sub-block in any direction and actual pixel values of all rows of the sub-block, averaging the absolute value error of any sub-block in any direction, and taking the obtained average absolute value error as a block error of any sub-block in any direction;
averaging the block errors of any sub-block in each direction, and taking the obtained average block error as the distortion value of any sub-block;
and averaging the distortion values of all the sub-blocks, and taking the obtained average distortion value as the texture complexity of the equivalent diagram of the energy matrix.
6. The volume control method of claim 3, wherein the determining the activity of the target sound in the sound signal based on the features of the equivalent graph of the energy matrix comprises:
determining an initial activity level of a target sound in the sound signal based on features of an equivalent graph of the energy matrix;
and determining the activity degree of the target sound in the sound signal according to the relation between the initial activity degree and the activity degree threshold value.
7. The volume control method of claim 6, wherein the determining an initial activity level of a target sound in the sound signal based on the features of the energy matrix equivalent map comprises:
and carrying out weighted summation on the gray scale richness and the texture complexity of the equivalent diagram of the energy matrix, and taking the obtained weighted summation result as the initial activity of the target sound in the sound signal.
8. The method of claim 6, wherein the determining the activity level of the target sound in the sound signal according to the relationship between the initial activity level and the activity level threshold comprises:
if the initial activity is greater than a first activity threshold, the activity of the target sound is a first reference value;
if the initial activity is smaller than a second activity threshold, the activity of the target sound is a second reference value;
if the initial activity is greater than the second activity threshold and smaller than the first activity threshold, acquiring a first difference between the initial activity and the second activity threshold, acquiring a second difference between the first activity threshold and the second activity threshold, and taking a quotient of the first difference and the second difference as the activity of the target sound; wherein the first activity threshold is greater than the second activity threshold, and the first reference value is greater than the second reference value.
9. The volume control method of claim 1, wherein the determining the actual gain of the sound signal based on the activity level of the target sound comprises:
obtaining the loudness value of the sound signal;
determining a loudness value to be adjusted based on the loudness value of the sound signal and a target loudness value;
obtaining a target gain according to the loudness value to be adjusted;
determining a gain change step size of the sound signal according to the activity degree of the target sound;
and determining the actual gain of the sound signal according to the gain change step length of the sound signal based on the relation between the target gain and the actual gain of the sound signal of the previous frame.
10. The volume control method according to any one of claims 1 to 9, wherein after adjusting the sound signal according to the actual gain, further comprising:
and carrying out amplitude limiting processing on the adjusted sound signal.
11. A volume control device, comprising:
a first acquisition unit configured to acquire a sound signal;
a second acquisition unit configured to acquire an energy matrix of the sound signal;
a first determination unit configured to determine an activity degree of a target sound in the sound signal based on an energy matrix of the sound signal, the activity degree reflecting an activity degree of the target sound in the sound signal;
a second determination unit configured to determine an actual gain of the sound signal based on the activity degree of the target sound;
a control unit configured to adjust the sound signal according to the actual gain to control a volume of the sound signal.
12. The volume control device according to claim 11, characterized in that the second acquisition unit is configured to convert the sound signal into a frequency domain signal by a fast Fourier transform (FFT); acquiring a frequency domain energy signal of the frequency domain signal; and combining the frequency domain energy signals with the frequency domain energy signals of the previous reference quantity to obtain an energy matrix of the sound signals.
13. The volume control device according to claim 11, wherein the first determination unit includes:
an acquisition subunit configured to acquire, based on an energy matrix of the sound signal, a feature of an equivalent map of the energy matrix, the feature of the equivalent map of the energy matrix including at least one of grayscale richness and texture complexity;
a determination subunit configured to determine an activity level of a target sound in the sound signal based on a feature of an equivalent graph of the energy matrix.
14. The volume control device of claim 13, wherein the obtaining subunit is configured to obtain a variance of the energy matrix and obtain a mean of the energy matrix; and obtaining the gray level richness of the equivalent diagram of the energy matrix according to the variance and the mean value of the energy matrix.
15. The volume control device of claim 13, wherein the obtaining subunit is configured to divide the equivalent graph of the energy matrix into a plurality of sub-blocks, and perform intra-frame prediction in different directions on the sub-blocks; acquiring an absolute value error between a predicted value of any sub-block in any direction and actual pixel values of all rows of the sub-block, averaging the absolute value error of any sub-block in any direction, and taking the obtained average absolute value error as a block error of any sub-block in any direction; averaging the block errors of any sub-block in each direction, and taking the obtained average block error as the distortion value of any sub-block; and averaging the distortion values of all the sub-blocks, and taking the obtained average distortion value as the texture complexity of the equivalent diagram of the energy matrix.
16. The volume control device of claim 13, wherein the determining subunit comprises:
a first determination module configured to determine an initial activity level of a target sound in the sound signal based on features of an equivalent map of the energy matrix;
a second determination module configured to determine an activity level of a target sound in the sound signal according to a relationship between the initial activity level and an activity level threshold.
17. The volume control device of claim 16, wherein the first determining module is configured to perform weighted summation on the gray-scale richness and the texture complexity of the equivalent diagram of the energy matrix, and the obtained weighted summation result is used as the initial activity of the target sound in the sound signal.
18. The volume control device of claim 16, wherein the second determining module is configured to determine the activity level of the target sound to be a first reference value if the initial activity level is greater than a first activity level threshold; if the initial activity is smaller than a second activity threshold, the activity of the target sound is a second reference value; if the initial activity is greater than the second activity threshold and smaller than the first activity threshold, acquiring a first difference between the initial activity and the second activity threshold, acquiring a second difference between the first activity threshold and the second activity threshold, and taking a quotient of the first difference and the second difference as the activity of the target sound; wherein the first activity threshold is greater than the second activity threshold, and the first reference value is greater than the second reference value.
19. The volume control device according to claim 11, characterized in that the second determination unit is configured to acquire a loudness value of the sound signal; determining a loudness value to be adjusted based on the loudness value of the sound signal and a target loudness value; obtaining a target gain according to the loudness value to be adjusted; determining a gain change step size of the sound signal according to the activity degree of the target sound; and determining the actual gain of the sound signal according to the gain change step length of the sound signal based on the relation between the target gain and the actual gain of the sound signal of the previous frame.
20. The volume control device of any one of claims 11-19, further comprising:
a clipping unit configured to clip the adjusted sound signal.
21. A volume control device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the volume control method of any one of claims 1-10.
22. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a terminal, enable the terminal to perform the volume control method of any one of claims 1-10.
CN201811506570.2A 2018-12-10 2018-12-10 Volume control method, device and storage medium Active CN109587603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811506570.2A CN109587603B (en) 2018-12-10 2018-12-10 Volume control method, device and storage medium

Publications (2)

Publication Number Publication Date
CN109587603A CN109587603A (en) 2019-04-05
CN109587603B true CN109587603B (en) 2020-11-10

Family

ID=65929398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811506570.2A Active CN109587603B (en) 2018-12-10 2018-12-10 Volume control method, device and storage medium

Country Status (1)

Country Link
CN (1) CN109587603B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113711624B (en) * 2019-04-23 2024-06-07 株式会社索思未来 Sound processing device
US10674265B1 (en) * 2019-04-26 2020-06-02 Google Llc Background level dependent dynamic volume levels

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103684302B (en) * 2012-09-12 2017-08-08 国基电子(上海)有限公司 Sound volume control device and method
US10536773B2 (en) * 2013-10-30 2020-01-14 Cerence Operating Company Methods and apparatus for selective microphone signal combining
US9953516B2 (en) * 2015-05-20 2018-04-24 Google Llc Systems and methods for self-administering a sound test
CN108573709B (en) * 2017-03-09 2020-10-30 中移(杭州)信息技术有限公司 Automatic gain control method and device
CN107799124A (en) * 2017-10-12 2018-03-13 安徽咪鼠科技有限公司 A kind of VAD detection methods applied to intelligent sound mouse
CN108711435A (en) * 2018-05-30 2018-10-26 中南大学 A kind of high efficiency audio control method towards loudness

Also Published As

Publication number Publication date
CN109587603A (en) 2019-04-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant