CN114428600A - Audio track gain adjustment method and device, intelligent terminal and storage medium - Google Patents


Info

Publication number
CN114428600A
Authority
CN
China
Prior art keywords
track data
data
audio track
user
singer
Prior art date
Legal status
Pending
Application number
CN202011178759.0A
Other languages
Chinese (zh)
Inventor
徐小健
Current Assignee
Shenzhen TCL New Technology Co Ltd
Original Assignee
Shenzhen TCL New Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co Ltd filed Critical Shenzhen TCL New Technology Co Ltd
Priority to CN202011178759.0A priority Critical patent/CN114428600A/en
Publication of CN114428600A publication Critical patent/CN114428600A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/162Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

The invention discloses an audio track gain adjustment method and device, an intelligent terminal, and a storage medium. The method comprises: acquiring audio data to be processed, the audio data comprising singer initial track data and user initial track data; and, when the audio gain value of the user initial track data meets a preset participation rule, performing gain processing on the user initial track data and the singer initial track data to generate user target track data and singer target track data. The method can automatically judge from the user initial track data whether the user is singing, so that the track gains are adjusted during a karaoke session according to whether the user is actually singing along.

Description

Audio track gain adjustment method and device, intelligent terminal and storage medium
Technical Field
The invention relates to the field of computer technology, and in particular to an audio track gain adjustment method and device, an intelligent terminal, and a storage medium.
Background
With the popularization of the internet and the growth of network bandwidth, various terminals such as smartphones and smart televisions now provide audio track gain adjustment. Conventional track gain adjustment has at least two modes: an accompaniment mode and a follow-along mode. In the accompaniment mode, the vocal part of the final output audio is mainly the user's voice; this mode suits users who are familiar with the song, and is realized by weakening the gain of the singer's track while enhancing the gain of the user's track. In the follow-along mode, the vocal part of the final output is the singer's voice; this mode suits users who are less familiar with the song, and is realized by enhancing the gain of the singer's track while weakening the gain of the user's track.
The two modes can be switched freely, but products currently on the market require the user to perform the switch manually, which is inconvenient when the user is busy with something else. For example, if a user suddenly forgets the lyrics halfway through a song and wants to switch to the follow-along mode so that the singer's voice serves as a prompt, the user must stop what they are doing and operate the terminal to switch modes. Current karaoke software and devices therefore cannot automatically apply appropriate gains to the singer track and the user track.
Disclosure of Invention
The invention mainly aims to provide an audio track gain adjustment method and device, an intelligent terminal, and a storage medium, in order to solve the prior-art problem that appropriate gains cannot be applied automatically to the singer track and the user track during karaoke.
To achieve the above object, the present invention provides a method for adjusting a track gain, comprising:
acquiring audio data to be processed, wherein the audio data comprises singer initial audio track data and user initial audio track data;
and when the audio gain value of the user initial audio track data accords with a preset participation rule, performing gain processing on the user initial audio track data and the singer initial audio track data to generate user target audio track data and singer target audio track data.
In order to achieve the above object, the present invention also provides a track gain adjustment device, including:
an acquisition unit, configured to acquire audio data to be processed, the audio data comprising singer initial track data and user initial track data;
and the adjusting unit is used for performing gain processing on the user initial audio track data and the singer initial audio track data when the audio gain value of the user initial audio track data accords with a preset participation rule, and generating user target audio track data and singer target audio track data.
In addition, in order to achieve the above object, the present invention further provides an intelligent terminal, which includes a memory, a processor, and a track gain adjustment program stored in the memory and executable on the processor, wherein the processor implements the steps of the track gain adjustment method as above when executing the track gain adjustment program.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium storing a track gain adjustment program, which when executed by a processor, implements the steps of the track gain adjustment method as above.
In the invention, the audio data to be processed is first acquired, and it is then judged whether the audio gain value of the user initial track data meets the preset participation rule. If it does, the user is participating in the karaoke, so gain processing is performed on the user initial track data and the singer initial track data. The invention can thus judge from the audio gain value whether the user is participating during karaoke, and automatically apply adaptive track gain adjustment to the user initial track data and the singer initial track data, realizing automatic gain control throughout the karaoke session.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the track gain adjustment method of the present invention;
FIG. 2 is a flowchart of step S200 according to a preferred embodiment of the track gain adjustment method of the present invention;
FIG. 3 is a functional block diagram of the track gain adjustment apparatus of the present invention;
fig. 4 is a schematic operating environment diagram of an intelligent terminal according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it.
In the method for adjusting the track gain according to the preferred embodiment of the present invention, as shown in fig. 1, the method for adjusting the track gain comprises the following steps:
step S100, the intelligent terminal obtains audio data to be processed, wherein the audio data comprises singer initial audio track data and user initial audio track data.
Specifically, in this embodiment the intelligent terminal includes smart televisions and smart range hoods. As household appliances become more intelligent, products such as smart speakers that automatically play music and smart air conditioners that adjust the air temperature according to the user's body temperature have appeared, and range hoods are developing in the same direction. A modern smart range hood is feature-rich: it has a large interactive screen for interacting with the user, and it supports intelligent wind-force gears, remote control, delayed ventilation, dimmable lighting, and other functions. Besides interacting with the user, the screen can display weather and time, and can even provide entertainment resources such as music, television, news, and singing. For convenience of description, the following embodiments are described using the smart range hood.
While cooking, the user selects a desired song by voice or key press, which starts the range hood's singing mode. On entering the singing mode, the range hood turns on its microphone, captures the user's voice, and stores it as the user initial track data. At the same time, it obtains the vocal track of the selected song as the singer initial track data. The user initial track data and the singer initial track data are saved as the audio data awaiting gain processing. The track gain adjustment program in the range hood then continuously or periodically reads this locally stored audio data.
The audio data further includes background track data. The data type of the audio data is either independent audio data, in which the singer initial track data, the background track data, and the user initial track data exist as separate tracks, or mixed audio data, in which some or all of these tracks are mixed together.
Specifically, besides the singer's voice, a typical song contains background music such as guitar and piano; in this embodiment the audio data to be processed therefore includes background track data.
In addition, in this embodiment the audio data obtained by the smart range hood is of one of two types: mixed audio data or independent audio data. Mixed audio data is audio in which tracks are mixed together; for example, in many song libraries the vocal part and the background music of a song are stored as one mixed track, and the singer initial track data, the background track data, and the user initial track data may also be mixed, for instance to unify the relative gains of the tracks.
Further, in order to process different types of audio data, after the audio data to be processed is acquired, the method further includes:
and step S110, when the audio data is mixed audio data, the intelligent terminal splits the audio data according to the frequency of the audio data and the gain value of each sampling point to generate singer initial audio track data, background audio track data and user initial audio track data.
Specifically, if the audio data is independent audio data, it contains 3 tracks. If the singer initial track data and the background track data are mixed while the user initial track data is independent, it contains 2 tracks. If the singer initial track data, the background track data, and the user initial track data are all mixed, it contains 1 track. Whether the audio data is mixed audio data can therefore be determined from its track count.
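The track-count rule above can be sketched as a tiny helper; the function name and return labels are illustrative, not from the patent.

```python
def classify_audio_data(num_tracks: int) -> str:
    """Classify audio data by track count, per the embodiment:
    3 separate tracks => independent; 1 or 2 => some tracks are mixed."""
    if num_tracks == 3:
        return "independent"   # singer, background, and user tracks all separate
    if num_tracks in (1, 2):
        return "mixed"         # at least two tracks are mixed into one
    raise ValueError(f"unexpected track count: {num_tracks}")
```

A 2-track file (mixed singer + background, separate user track) is classified the same way as a fully mixed 1-track file, since both require splitting.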
Generally, the user wants the singer's voice weakened while the background music is retained, so to achieve a good singing effect each track must be processed independently. If the audio data is independent audio data, the singer initial track data and the user initial track data can be processed directly. If it is mixed audio data, it must first be split into singer initial track data, background track data, and user initial track data, after which gain processing is applied to the singer track and the user track separately. In this embodiment the low-frequency region of the audio is 40 Hz-80 Hz, the mid-frequency region is 160 Hz-1280 Hz, and the high-frequency region is 2560 Hz-5120 Hz. Human voice generally lies in the low- and mid-frequency regions, while background sounds such as musical instruments lie in the high-frequency region, and because of differences in recording quality, the song's vocal and the user's voice exhibit different gain values in the audio. The mixed audio data can therefore be split according to the frequency of the audio data and the gain value of each sampling point, generating singer initial track data, background track data, and user initial track data.
Further, in order to improve the splitting efficiency while ensuring the quality of the split audio track, the splitting step includes:
step S111, the intelligent terminal calculates a first difference value between the maximum gain value of the middle frequency area and the minimum gain value of the low frequency area of the audio data, and the first difference value is used as the gain value range of the singer initial audio track data.
Specifically, since a song's vocals are generally recorded in a studio, their gain values are generally large. The mid-, low-, and high-frequency regions of the audio data are first divided to separate voice from background sound; the difference between the maximum gain value of the mid-frequency region and the minimum gain value of the low-frequency region is then calculated and taken as the gain value range of the singer initial track data.
In step S112, the intelligent terminal calculates a second difference value between the middle gain value of the mid-frequency region and the minimum gain value of the low-frequency region of the audio data, and uses the second difference value as the gain value range of the user initial track data.
Specifically, when the user sings while cooking, the kitchen's poor sound insulation and the range hood's weaker sound pickup (compared with professional equipment) make the gain values of the user audio collected by the microphone relatively small. The second difference between the middle gain value of the mid-frequency region and the minimum gain value of the low-frequency region is therefore calculated and used as the gain value range of the user initial track data. The middle gain value may be the mean of the sampling-point gains in the mid-frequency region, or their median, as long as singer track data and user track data can be distinguished.
In step S113, the intelligent terminal calculates a third difference value between the maximum gain value and the minimum gain value of the high frequency region of the audio data, and uses the third difference value as a gain value range of the background audio track data.
Specifically, although background sound spans a very wide frequency range, existing from the high-frequency region down to the low-frequency region, the background track is distinguished from the singer track and the user track more cleanly by using only the high-frequency region: the third difference between its maximum and minimum gain values is calculated and taken as the gain value range of the background track data.
Step S114, the intelligent terminal splits the audio data according to the gain value range of the singer initial audio track data, the gain value range of the background audio track data and the gain value range of the user initial audio track data, and generates the singer initial audio track data, the background audio track data and the user initial audio track data respectively.
Specifically, according to the gain value range of the singer initial track data, the gain value range of the user initial track data, the gain value range of the background track data, and the gain value of each sampling point, the audio data is finally split into three parts, each stored separately as one track: the singer initial track data, the background track data, and the user initial track data.
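Steps S111-S113 can be sketched as below. This is a minimal illustration of the three difference computations, assuming per-region gain lists as input; the `band_gains` layout and function name are hypothetical, and the mean is used as the "middle gain value" (the embodiment also permits the median).

```python
def gain_value_ranges(band_gains):
    """Compute the three gain-value ranges of steps S111-S113.
    band_gains: dict with lists of sampling-point gain values for the
    'low', 'mid', and 'high' frequency regions (hypothetical layout)."""
    mid, low, high = band_gains["mid"], band_gains["low"], band_gains["high"]
    # S111: singer range = max gain of mid region - min gain of low region
    singer_span = max(mid) - min(low)
    # S112: user range = middle gain of mid region - min gain of low region
    user_span = sum(mid) / len(mid) - min(low)
    # S113: background range = max - min gain within the high region only
    background_span = max(high) - min(high)
    return singer_span, user_span, background_span
```

With these three spans in hand, step S114 assigns each sampling point to the track whose range it falls into.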
And step S200, when the audio gain value of the user initial audio track data accords with a preset participation rule, the intelligent terminal performs gain processing on the user initial audio track data and the singer initial audio track data to generate user target audio track data and singer target audio track data.
Specifically, a participation rule is preset for judging whether the user is taking part in the karaoke. In a first implementation, the participation rule may include an audio gain threshold: if the audio gain value is below the threshold, the user is judged not to be singing; if it is at or above the threshold, the user is judged to be singing, so the user track data and the singer track data need gain processing. In a second implementation, the participation rule may instead compare the gain value of each sampling point in the user initial track data with that of the corresponding sampling point in the singer initial track data: if the differences are large, the user is judged to be speaking rather than singing, or to be silent and not participating; if the differences are small, the user is judged to be singing.
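The two participation rules can be sketched together as one check. The threshold values and names here are illustrative assumptions; the patent does not specify concrete numbers.

```python
def user_is_singing(user_gains, singer_gains, gain_threshold=5.0, max_diff=3.0):
    """Hypothetical participation check combining both rules:
    1) the user's average gain must reach gain_threshold, and
    2) per-sampling-point gain differences vs. the singer track stay small."""
    avg_user = sum(user_gains) / len(user_gains)
    if avg_user < gain_threshold:
        return False  # rule 1: gain too low, user is not singing
    diffs = [abs(u - s) for u, s in zip(user_gains, singer_gains)]
    return max(diffs) <= max_diff  # rule 2: gain contour close to the singer's
```

When the check fails, the audio is passed through unchanged, as described below.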
If the audio gain value of the user initial track data does not meet the preset participation rule, the user is not participating in singing, so the audio data is output directly without processing.
Based on the second implementation, it can be further determined whether the user is speaking or silent. If speaking, the output volume can be lowered automatically; if silent, the audio can be output directly.
When the audio gain value of the user initial track data meets the participation rule, the user is singing, so gain processing is required on the user initial track data and the singer initial track data. In the first gain mode of this embodiment, a first gain rule is preset for the case where the user is singing. According to this rule, the gain value of each sampling point in the user initial track data is increased and the gain value of each sampling point in the singer initial track data is decreased. The adjustment may add or subtract a fixed value at each sampling point, or scale by a fixed proportion, for example raising the user track gains by 10% and lowering the singer track gains correspondingly; since many adjustment schemes are possible, they are not enumerated here.
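The fixed-value and proportional adjustments of the first gain mode can be sketched as one helper; the function name and parameters are illustrative.

```python
def adjust_gains(gains, amount, proportional=False):
    """First gain mode sketch: raise (positive amount) or lower (negative
    amount) every sampling point's gain, either by a fixed value or by a
    proportion such as 10% (amount=0.10 with proportional=True)."""
    if proportional:
        return [g * (1.0 + amount) for g in gains]
    return [g + amount for g in gains]
```

The same helper serves both tracks: a positive `amount` for the user track and a negative one for the singer track.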
Further, referring to fig. 2, in a second implementation manner provided in this embodiment, step S200 includes:
step S210, the intelligent terminal calculates the user average gain value corresponding to the user initial audio track data and the singer average gain value corresponding to the singer initial audio track data.
Specifically, if the audio gain value of the user initial track data meets the participation rule, the user average gain value of the user initial track data and the singer average gain value of the singer initial track data are calculated. One method is to average the gain values of all sampling points in each track; another is to average only the maximum and minimum gain values in each track. Either result serves as the user average gain value and the singer average gain value respectively.
Further, since the subsequent gain processing depends on the user average gain value and the singer average gain value, and averaging the gain values of every sampling point is computationally inefficient while also diluting the maximum and minimum gains, in this embodiment step S210 includes:
step S211, the intelligent terminal calculates an average value of the maximum gain value and the minimum gain value of the user initial audio track data, and generates a user average gain value corresponding to the user initial audio track data.
Specifically, suppose the maximum gain value of the user initial track data is 10 and the minimum is 2. If the audio gain value of the user initial track data meets the participation rule, the average of the maximum and minimum is calculated, i.e. (10+2)/2 = 6, giving a user average gain value of 6.
Step S212, the intelligent terminal calculates the average value of the maximum gain value and the minimum gain value of the singer initial audio track data, and generates the singer average gain value corresponding to the singer initial audio track data.
Specifically, suppose the maximum gain value of the singer initial track data is 15 and the minimum is 9. If the audio gain value of the user initial track data meets the preset participation rule, the average of the maximum and minimum is calculated, i.e. (15+9)/2 = 12, giving a singer average gain value of 12.
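Steps S211 and S212 reduce to the same tiny computation, sketched below with the worked numbers from the text.

```python
def average_gain(track_gains):
    """Average-gain sketch for steps S211/S212: the mean of only the
    maximum and minimum sampling-point gains, which is cheaper than
    averaging every sampling point and keeps the extremes in view."""
    return (max(track_gains) + min(track_gains)) / 2
```

For a user track with gains between 2 and 10 this yields 6, and for a singer track with gains between 9 and 15 it yields 12, matching the examples above.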
Step S220, when the average gain value of the user is larger than or equal to the average gain value of the singer, the intelligent terminal performs gain reduction on the initial audio track data of the singer to generate target audio track data of the singer; and performing gain enhancement on the user initial audio track data to generate user target audio track data.
Specifically, the user average gain value and the singer average gain value are compared to judge whether the user average gain value is greater than or equal to the singer average gain value. If it is, the user is most likely singing loudly, so the singer initial track data is gain-attenuated and the user initial track data is gain-enhanced: the gain of the user track in the output audio is raised and the gain of the singer track is lowered, letting the user hear their own singing.
Further, step S220 includes:
step S221, according to the average gain value of the user and the preset main track gain value, the intelligent terminal increases the gain value of each sampling point in the user track data frame by frame to generate the user target track data.
Specifically, if the user average gain value is greater than or equal to the singer average gain value, the user is singing loudly. The gain mode adopted in this second implementation applies a preset main-track gain value, i.e. the average gain value that the track carrying the main voice in the final output should have. When the user is singing loudly, the main vocal track in the subsequent output defaults to the user track.
According to the user average gain value of the currently processed user track data and the main-track gain value, the gain value of each sampling point in the user track data is increased frame by frame. One frame-by-frame scheme is to take the difference between the current user average gain value and the main-track gain value as the dividend and a chosen number of audio frames in the user track data as the divisor; the resulting quotient is the amount by which each audio frame's gain is increased over the previous frame, until the gain value of the current audio frame equals the main-track gain value.
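The frame-by-frame ramp can be sketched as follows. This is a minimal illustration under the assumption of one gain value per frame; the names and the clamping choice are mine, not the patent's.

```python
def ramp_gain(frame_gains, current_avg, target_gain, ramp_frames):
    """Frame-by-frame ramp sketch: spread the (target - current average)
    difference evenly over ramp_frames frames, then hold at the target.
    Works for both increases (main track) and decreases (sub track)."""
    step = (target_gain - current_avg) / ramp_frames  # per-frame quotient
    out = []
    for i, g in enumerate(frame_gains):
        delta = step * min(i + 1, ramp_frames)  # clamp once target is reached
        out.append(g + delta)
    return out
```

With a negative difference the same routine performs the frame-by-frame decrease described for the singer track in step S222.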
Step S222, according to the average gain value of the singer and the preset secondary track gain value, the intelligent terminal reduces the gain value of each sampling point in the singer track data frame by frame to generate singer target track data.
Specifically, the sub-track gain value is the average gain value that the track carrying the secondary voice in the final output should have. If the user average gain value is greater than or equal to the singer average gain value, the singer track data becomes the secondary track in the subsequent output. According to the singer average gain value of the currently processed singer track data and the sub-track gain value, the gain value of each sampling point in the singer track data is decreased frame by frame: the difference between the current singer average gain value and the sub-track gain value is the dividend, a chosen number of audio frames in the singer track data is the divisor, and the quotient is the amount by which each audio frame's gain is decreased from the previous frame, until the gain value of the current audio frame equals the sub-track gain value.
Step S230, when the average gain value of the user is smaller than the average gain value of the singer, the intelligent terminal performs gain enhancement on the initial audio track data of the singer to generate target audio track data of the singer; and performing gain reduction on the user initial audio track data to generate user target audio track data.
Specifically, if the user average gain value is smaller than the singer average gain value, the user is not singing loudly and may be merely humming or may have forgotten the lyrics. The main track in the subsequent output is therefore the singer track data, so the singer initial track data must be gain-enhanced and the user initial track data gain-attenuated. In this embodiment the processing is similar to that of the previous case, only with the processing objects exchanged.
Further, step S230 includes:
step S231, according to the user average gain value and the preset main track gain value, the intelligent terminal increases the gain value of each sampling point in the user track data frame by frame, and generates user target track data.
Step S232, according to the average gain value of the singer and the preset secondary track gain value, the intelligent terminal reduces the gain value of each sampling point in the singer track data frame by frame to generate singer target track data.
These steps are similar to steps S221 and S222 above, except that the gain-enhanced object changes from the user track data to the singer track data and the gain-attenuated object changes from the singer track data to the user track data, so they are not repeated here.
Further, step S200 is followed by:
and the intelligent terminal mixes the singer target audio track data, the user target audio track data and the background audio track data to generate output data and output the output data.
Specifically, after the singer target track data and the user target track data are obtained, they are mixed with the background track data according to each track's timing to generate the output data, which is finally transmitted to the sound equipment so that it can play the music.
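The final mixing step can be sketched as a sample-wise sum of the three time-aligned tracks. Straight summation is one simple mixing choice made here for illustration; the patent does not fix the exact mixing formula.

```python
def mix_tracks(singer, user, background):
    """Mixing sketch: sum the three target tracks sample by sample,
    assuming they are already time-aligned and equal in length, to
    produce the output data sent to the sound equipment."""
    assert len(singer) == len(user) == len(background)
    return [s + u + b for s, u, b in zip(singer, user, background)]
```

In a real implementation the sum would also be clipped or normalized to the output sample range.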
Further, as shown in fig. 3, based on the above-mentioned track gain adjustment method, the present invention further provides a track gain adjustment apparatus 100, where the track gain adjustment apparatus 100 includes:
an acquisition unit 110 for acquiring audio data to be processed, the audio data including singer initial track data and user initial track data;
the adjusting unit 120 is configured to perform gain processing on the user initial audio track data and the singer initial audio track data when the audio gain value of the user initial audio track data meets a preset participation rule, and generate user target audio track data and singer target audio track data.
Wherein the audio data further includes background track data, and the data type of the audio data includes mixed audio data and independent audio data, wherein the independent audio data is audio track data in which singer's initial track data, background track data, and user's initial track data exist independently, and the mixed audio data includes audio track data mixed by the singer's initial track data, the background track data, and the user's initial track data.
The audio track gain adjustment apparatus 100 further includes a splitting unit, where the splitting unit is configured to split the audio data according to the frequency of the audio data and the gain value of each sampling point when the audio data is mixed audio data, and generate singer initial audio track data, background audio track data, and user initial audio track data.
Wherein the splitting unit includes:
a first calculating subunit for calculating a first difference value between a maximum gain value of the middle frequency region and a minimum gain value of the low frequency region of the audio data, and taking the first difference value as a singer's initial track data gain value range; and
calculating a second difference value between the middle gain value of the middle frequency area of the audio data and the minimum gain value of the low frequency area, and taking the second difference value as the gain value range of the user initial audio track data; and
calculating a third difference value between the maximum gain value of the high-frequency region of the audio data and the minimum gain value of the high-frequency region, and taking the third difference value as a gain value range of the background audio track data;
and the splitting subunit is used for splitting the audio data according to the gain value range of the singer initial audio track data, the gain value range of the background audio track data and the gain value range of the user initial audio track data to respectively generate the singer initial audio track data, the background audio track data and the user initial audio track data.
Wherein, the adjusting unit 120 includes:
the second calculating subunit is used for calculating a user average gain value corresponding to the user initial audio track data and a singer average gain value corresponding to the singer initial audio track data;
the first adjusting subunit is used for carrying out gain attenuation on the initial audio track data of the singer to generate the target audio track data of the singer when the average gain value of the user is greater than or equal to the average gain value of the singer, and performing gain enhancement on the initial audio track data of the user to generate target audio track data of the user; or,
the second adjusting subunit is used for performing gain enhancement on the initial audio track data of the singer to generate target audio track data of the singer when the average gain value of the user is smaller than the average gain value of the singer; and carrying out gain attenuation on the initial audio track data of the user to generate target audio track data of the user.
Wherein the second calculating subunit is specifically configured to:
calculating the average value of the maximum gain value and the minimum gain value of the user initial audio track data, and generating a user average gain value corresponding to the user initial audio track data; and
and calculating the average value of the maximum gain value and the minimum gain value of the initial audio track data of the singer, and generating the average gain value of the singer corresponding to the initial audio track data of the singer.
Wherein the first adjusting subunit is specifically configured to:
according to the average gain value of the user and the preset main audio track gain value, increasing the gain value of each sampling point in the user audio track data frame by frame to generate user target audio track data; and
and reducing the gain value of each sampling point in the singer audio track data frame by frame according to the average gain value of the singer and a preset secondary audio track gain value to generate singer target audio track data.
Wherein the second adjusting subunit is specifically configured to:
according to the average gain value of the user and the preset secondary track gain value, reducing the gain value of each sampling point in the user track data frame by frame to generate user target track data; and
and increasing the gain value of each sampling point in the singer audio track data frame by frame according to the average gain value of the singer and the preset main audio track gain value to generate singer target audio track data.
Further, as shown in fig. 4, based on the above audio track gain adjustment method, the present invention also provides an intelligent terminal, which includes a processor 10, a memory 20 and a display 30. Fig. 4 shows only some of the components of the smart terminal, but it should be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The memory 20 may be an internal storage unit of the intelligent terminal in some embodiments, such as a hard disk or an internal memory of the intelligent terminal. The memory 20 may also be an external storage device of the intelligent terminal in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) provided on the intelligent terminal. Further, the memory 20 may include both an internal storage unit and an external storage device of the intelligent terminal. The memory 20 is used for storing application software installed on the intelligent terminal and various kinds of data, such as program code installed on the intelligent terminal, and may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores a track gain adjustment program 40, and the track gain adjustment program 40 can be executed by the processor 10 to implement the track gain adjustment method of the present application.
The processor 10 may be, in some embodiments, a Central Processing Unit (CPU), a microprocessor, or another data processing chip for running the program code stored in the memory 20 or processing data, for example executing the track gain adjustment method.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information on the intelligent terminal and for presenting a visual user interface. The components 10-30 of the intelligent terminal communicate with each other via a system bus.
In one embodiment, when the processor 10 executes the track gain adjustment program 40 in the memory 20, the following steps are implemented:
acquiring audio data to be processed, wherein the audio data comprises singer initial audio track data and user initial audio track data;
and when the audio gain value of the user initial audio track data accords with a preset participation rule, performing gain processing on the user initial audio track data and the singer initial audio track data to generate user target audio track data and singer target audio track data.
Wherein the audio data further includes background track data, and the data type of the audio data includes mixed audio data and independent audio data, wherein the independent audio data is audio track data in which singer's initial track data, background track data, and user's initial track data exist independently, and the mixed audio data includes audio track data mixed by the singer's initial track data, the background track data, and the user's initial track data.
After the audio data to be processed is acquired, the method further comprises:
when the audio data is mixed audio data, the audio data is split according to the frequency of the audio data and the gain value of each sampling point, and singer initial audio track data, background audio track data and user initial audio track data are generated.
The method for splitting audio data according to the frequency of the audio data and the gain value of each sampling point to generate singer initial audio track data, background audio track data and user initial audio track data comprises the following steps:
calculating a first difference value between a maximum gain value of a middle frequency region and a minimum gain value of a low frequency region of the audio data, and taking the first difference value as a singer initial audio track data gain value range; and
calculating a second difference value between the middle gain value of the middle frequency area of the audio data and the minimum gain value of the low frequency area, and taking the second difference value as the gain value range of the user initial audio track data; and
calculating a third difference value between the maximum gain value of the high-frequency region of the audio data and the minimum gain value of the high-frequency region, and taking the third difference value as a gain value range of the background audio track data;
according to the gain value range of the singer initial audio track data, the gain value range of the background audio track data and the gain value range of the user initial audio track data, the audio data are split, and the singer initial audio track data, the background audio track data and the user initial audio track data are respectively generated.
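Under the stated assumption that each of the three gain-value ranges has already been derived from the first, second, and third differences above, the split can be sketched as a partition of the mixed samples by range. This is illustrative only — the disclosed method also uses the frequency of the audio data, which this sketch omits, and the function name and data representation are assumptions:

```python
def split_by_gain_range(samples, singer_range, user_range, background_range):
    """Partition mixed samples into three tracks by gain-value range.

    Each *_range argument is an inclusive (low, high) tuple; samples
    falling outside every range are dropped.
    """
    singer, user, background = [], [], []
    for gain in samples:
        if singer_range[0] <= gain <= singer_range[1]:
            singer.append(gain)
        elif user_range[0] <= gain <= user_range[1]:
            user.append(gain)
        elif background_range[0] <= gain <= background_range[1]:
            background.append(gain)
    return singer, user, background
```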
Wherein, gain processing is performed on the user initial audio track data and the singer initial audio track data, and comprises:
calculating a user average gain value corresponding to the user initial audio track data and a singer average gain value corresponding to the singer initial audio track data;
when the average gain value of the user is larger than or equal to the average gain value of the singer, carrying out gain attenuation on the initial audio track data of the singer to generate target audio track data of the singer, and performing gain enhancement on the initial audio track data of the user to generate target audio track data of the user; or,
when the average gain value of the user is smaller than the average gain value of the singer, performing gain enhancement on the initial audio track data of the singer to generate target audio track data of the singer; and carrying out gain attenuation on the initial audio track data of the user to generate target audio track data of the user.
Wherein, calculate the user average gain value that user's initial audio track data corresponds, and singer's average gain value that singer's initial audio track data corresponds, include:
calculating the average value of the maximum gain value and the minimum gain value of the user initial audio track data, and generating a user average gain value corresponding to the user initial audio track data; and
and calculating the average value of the maximum gain value and the minimum gain value of the initial audio track data of the singer, and generating the average gain value of the singer corresponding to the initial audio track data of the singer.
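The average gain value described above — the mean of a track's maximum and minimum per-sample gain values — can be sketched as (function name assumed for illustration):

```python
def average_gain(track):
    """Average of the maximum and minimum per-sample gain values of a track."""
    return (max(track) + min(track)) / 2
```

The same function serves both tracks: applied to the user initial track data it yields the user average gain value, and applied to the singer initial track data it yields the singer average gain value.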
The method comprises the following steps of carrying out gain reduction on initial audio track data of a singer, carrying out gain enhancement on initial audio track data of a user, and respectively generating target audio track data of the singer and target audio track data of the user, wherein the method comprises the following steps:
according to the average gain value of the user and the preset main audio track gain value, increasing the gain value of each sampling point in the user audio track data frame by frame to generate user target audio track data; and
and reducing the gain value of each sampling point in the singer audio track data frame by frame according to the average gain value of the singer and a preset secondary audio track gain value to generate singer target audio track data.
The method comprises the following steps of performing gain enhancement on initial audio track data of a singer, performing gain reduction on initial audio track data of a user, and respectively generating target audio track data of the singer and target audio track data of the user, wherein the method comprises the following steps:
if the average gain value of the user is smaller than the average gain value of the singer, reducing the gain value of each sampling point in the user audio track data frame by frame according to the average gain value of the user and a preset secondary audio track gain value to generate user target audio track data; and
increasing the gain value of each sampling point in the singer audio track data frame by frame according to the singer average gain value and a preset main audio track gain value to generate singer target audio track data.
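The frame-by-frame adjustment toward a preset main or secondary track gain value can be sketched as a per-sample scaling. This is an assumption for illustration — the disclosure does not specify the exact update rule, so a simple linear scale of the preset target over the track's average gain is used here:

```python
def adjust_gain(track, avg_gain, target_gain):
    """Scale every sample so the track's average gain moves toward target_gain.

    A preset target above the average enhances the track; a target below
    the average attenuates it. A zero average leaves the track unchanged.
    """
    if avg_gain == 0:
        return list(track)
    scale = target_gain / avg_gain
    return [sample * scale for sample in track]
```

With this single helper, gain enhancement and gain attenuation differ only in whether the preset main or secondary track gain value is passed as `target_gain`.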
The present invention also provides a computer-readable storage medium storing a track gain adjustment program, which when executed by a processor implements the steps of the track gain adjustment method as above.
Of course, it will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program to instruct relevant hardware (such as a processor, a controller, etc.) to perform the processes, and the processes can be stored in a computer readable storage medium, and when executed, the processes can include the processes of the embodiments of the methods described above. The computer readable storage medium may be a memory, a magnetic disk, an optical disk, etc.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (11)

1. A method for adjusting gain of an audio track, comprising:
acquiring audio data to be processed, wherein the audio data comprises singer initial audio track data and user initial audio track data;
and when the audio gain value of the user initial audio track data accords with a preset participation rule, performing gain processing on the user initial audio track data and the singer initial audio track data to generate user target audio track data and singer target audio track data.
2. The method of claim 1, wherein the audio data further comprises background audio track data, and wherein the data types of the audio data comprise mixed audio data and independent audio data, wherein the independent audio data is audio track data in which the singer initial audio track data, the background audio track data, and the user initial audio track data exist independently, and wherein the mixed audio data comprises audio track data mixed by the singer initial audio track data, the background audio track data, and the user initial audio track data.
3. The method of claim 2, wherein after the obtaining the audio data to be processed, the method further comprises:
when the audio data is the mixed audio data, splitting the audio data according to the frequency of the audio data and the gain value of each sampling point to generate the singer initial audio track data, the background audio track data and the user initial audio track data.
4. The method of claim 3, wherein said splitting the audio data according to the frequency of the audio data and the gain value of each sample point to generate the singer initial track data, the background track data and the user initial track data comprises:
calculating a first difference value between a maximum gain value of a middle frequency region and a minimum gain value of a low frequency region of the audio data, and taking the first difference value as a singer initial track data gain value range; and
calculating a second difference value between the middle gain value of the middle frequency area of the audio data and the minimum gain value of the low frequency area of the audio data, and taking the second difference value as a user initial audio track data gain value range; and
calculating a third difference value between the maximum gain value of the high frequency region of the audio data and the minimum gain value of the high frequency region, and taking the third difference value as a gain value range of the background audio track data;
splitting the audio data according to the gain value range of the singer initial audio track data, the gain value range of the background audio track data and the gain value range of the user initial audio track data to respectively generate the singer initial audio track data, the background audio track data and the user initial audio track data.
5. The method according to any of claims 1-4, wherein said gain processing said user-initiated track data and said singer-initiated track data to generate user-target track data and singer-target track data comprises:
calculating a user average gain value corresponding to the user initial audio track data and a singer average gain value corresponding to the singer initial audio track data;
when the user average gain value is larger than or equal to the singer average gain value, carrying out gain reduction on the singer initial audio track data to generate singer target audio track data, and carrying out gain enhancement on the user initial audio track data to generate user target audio track data; or,
and when the user average gain value is smaller than the singer average gain value, performing gain enhancement on the singer initial audio track data to generate singer target audio track data, and performing gain reduction on the user initial audio track data to generate user target audio track data.
6. The method of claim 5, wherein said calculating a user average gain value corresponding to said user initial audio track data and a singer average gain value corresponding to said singer initial audio track data comprises:
calculating the average value of the maximum gain value and the minimum gain value of the user initial audio track data, and generating a user average gain value corresponding to the user initial audio track data; and
and calculating the average value of the maximum gain value and the minimum gain value of the singer initial audio track data, and generating the singer average gain value corresponding to the singer initial audio track data.
7. The method of claim 5, wherein gain-attenuating the singer's initial soundtrack data to generate singer's target soundtrack data, and gain-enhancing the user's initial soundtrack data to generate user's target soundtrack data, comprises:
according to the user average gain value and a preset main audio track gain value, increasing the gain value of each sampling point in the user audio track data frame by frame to generate the user target audio track data; and
and reducing the gain value of each sampling point in the singer audio track data frame by frame according to the singer average gain value and a preset secondary audio track gain value, and generating the singer target audio track data.
8. The method of claim 5, wherein gain enhancing the singer initial soundtrack data to generate singer target soundtrack data, and gain attenuating the user initial soundtrack data to generate user target soundtrack data, comprises:
reducing the gain value of each sampling point in the user audio track data frame by frame according to the user average gain value and a preset secondary audio track gain value to generate the user target audio track data; and
and increasing the gain value of each sampling point in the singer audio track data frame by frame according to the singer average gain value and a preset main audio track gain value to generate the singer target audio track data.
9. An apparatus for adjusting gain of a track, comprising:
an acquisition unit configured to acquire audio data to be processed, the audio data including singer initial track data and user initial track data;
and the adjusting unit is used for performing gain processing on the user initial audio track data and the singer initial audio track data when the audio gain value of the user initial audio track data accords with a preset participation rule, and generating user target audio track data and singer target audio track data.
10. An intelligent terminal, characterized in that the intelligent terminal comprises a memory, a processor and a track gain adjustment program stored on the memory and executable on the processor, the processor implementing the steps of the track gain adjustment method according to any one of claims 1-8 when executing the track gain adjustment program.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a track gain adjustment program, which when executed by a processor, implements the steps of the track gain adjustment method according to any one of claims 1 to 8.
CN202011178759.0A 2020-10-29 2020-10-29 Audio track gain adjustment method and device, intelligent terminal and storage medium Pending CN114428600A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011178759.0A CN114428600A (en) 2020-10-29 2020-10-29 Audio track gain adjustment method and device, intelligent terminal and storage medium

Publications (1)

Publication Number Publication Date
CN114428600A true CN114428600A (en) 2022-05-03

Family

ID=81309750


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1491480A (en) * 2001-02-14 2004-04-21 汤姆森特许公司 Digital audio processor
CN103578514A (en) * 2012-08-01 2014-02-12 北大方正集团有限公司 Song playing method and device
US20140053711A1 (en) * 2009-06-01 2014-02-27 Music Mastermind, Inc. System and method creating harmonizing tracks for an audio input
CN106293606A (en) * 2016-08-16 2017-01-04 微鲸科技有限公司 Sound-volume control system and method for controlling volume
CN106887233A (en) * 2015-12-15 2017-06-23 广州酷狗计算机科技有限公司 Audio data processing method and system
CN109743461A (en) * 2019-01-29 2019-05-10 广州酷狗计算机科技有限公司 Audio data processing method, device, terminal and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination