CN109256146A

CN109256146A - Audio detection method, device and storage medium

Info

Publication number: CN109256146A
Application number: CN201811278955.8A
Authority: CN
Inventors: 王征韬
Original assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Current assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date: 2018-10-30
Filing date: 2018-10-30
Publication date: 2019-01-22
Anticipated expiration: 2038-10-30
Also published as: CN109256146B

Abstract

The invention discloses an audio detection method, device and storage medium. The method includes: performing audio signal separation on the audio under test to obtain its harmonic signal and percussive signal; obtaining the Mel spectrum of the percussive signal; calculating the onset envelope of the percussive signal from the Mel spectrum; obtaining the autocorrelation tempogram of the onset envelope from the onset envelope of the percussive signal; and determining the rhythm intensity value of the audio under test according to the second-highest peak value in the autocorrelation tempogram. By analyzing the regularity and intensity with which strong percussive hits occur in the audio, embodiments of the invention give a rhythm intensity value for an audio clip that better matches the user's auditory perception.

Description

Audio detection method, device and storage medium

Technical field

Embodiments of the present invention relate to the field of audio processing, and in particular to an audio detection method, device and storage medium.

Background

Sense of rhythm, also called rhythmic feel, is a person's subjective impression of the rhythm of music. Music with a strong sense of rhythm has clear beat points and usually contains rich, regular percussion. Sense of rhythm is a pattern formed by repeated elements of music such as rhythm, tempo, dynamics, melody and sound.

The sense of rhythm of music has wide applications, such as music recommendation and mood classification. However, it is a fairly subjective impression and lacks a reasonable numerical description.

Summary of the invention

Embodiments of the present invention provide an audio detection method, device and storage medium that can measure the rhythm intensity of audio with an objective numerical value.

An embodiment of the present invention provides an audio detection method, the method comprising:

performing audio signal separation on audio under test to obtain a harmonic signal and a percussive signal of the audio under test;

obtaining a Mel spectrum of the percussive signal;

calculating an onset envelope of the percussive signal from the Mel spectrum;

obtaining an autocorrelation tempogram of the onset envelope from the onset envelope of the percussive signal; and

determining a rhythm intensity value of the audio under test according to the second-highest peak value in the autocorrelation tempogram.

An embodiment of the present invention also provides an audio detection device, the device comprising:

a signal separation module, configured to perform audio signal separation on audio under test to obtain a harmonic signal and a percussive signal of the audio under test;

a first obtaining module, configured to obtain a Mel spectrum of the percussive signal;

a computing module, configured to calculate an onset envelope of the percussive signal from the Mel spectrum;

a second obtaining module, configured to obtain an autocorrelation tempogram of the onset envelope from the onset envelope of the percussive signal; and

a determining module, configured to determine a rhythm intensity value of the audio under test according to the second-highest peak value in the autocorrelation tempogram.

An embodiment of the present invention also provides a storage medium storing a plurality of instructions, the instructions being suitable for loading by a processor to execute the steps of any audio detection method provided by the embodiments of the present invention.

Embodiments of the present invention perform audio signal separation on the audio under test to obtain its harmonic signal and percussive signal, obtain the Mel spectrum of the percussive signal, calculate the onset envelope of the percussive signal from the Mel spectrum, obtain the autocorrelation tempogram of the onset envelope from that onset envelope, and then determine the rhythm intensity value of the audio under test according to the second-highest peak value in the autocorrelation tempogram. By analyzing the regularity and intensity with which strong percussive hits occur in the audio, embodiments of the invention give a rhythm intensity value for an audio clip that better matches the user's auditory perception.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow diagram of an audio detection method provided by an embodiment of the present invention.

Fig. 2 is another flow diagram of an audio detection method provided by an embodiment of the present invention.

Fig. 3 is another flow diagram of an audio detection method provided by an embodiment of the present invention.

Fig. 4 is another flow diagram of an audio detection method provided by an embodiment of the present invention.

Fig. 5 is a schematic diagram of a strong sense of rhythm provided by an embodiment of the present invention.

Fig. 6 is a schematic diagram of a weak sense of rhythm provided by an embodiment of the present invention.

Fig. 7 is another flow diagram of an audio detection method provided by an embodiment of the present invention.

Fig. 8 is a structural schematic diagram of an audio detection device provided by an embodiment of the present invention.

Fig. 9 is another structural schematic diagram of an audio detection device provided by an embodiment of the present invention.

Fig. 10 is another structural schematic diagram of an audio detection device provided by an embodiment of the present invention.

Fig. 11 is another structural schematic diagram of an audio detection device provided by an embodiment of the present invention.

Fig. 12 is another structural schematic diagram of an audio detection device provided by an embodiment of the present invention.

Fig. 13 is a structural schematic diagram of a server provided by an embodiment of the present invention.

Fig. 14 is a structural schematic diagram of a terminal provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The terms "first", "second" and the like in the present invention are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or modules is not limited to the listed steps or modules, but optionally also includes steps or modules that are not listed, or other steps or modules inherent to such a process, method, product or device.

Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

Accordingly, embodiments of the present invention provide an audio detection method, device and storage medium that analyze the regularity and intensity with which strong percussive hits occur in the audio and give a rhythm intensity value for an audio clip, so that the value better matches the user's auditory perception.

The audio detection method provided by the embodiments of the present invention can be implemented in an audio detection device. The audio detection device can be integrated in an electronic device or another device with audio and video data processing capability. Electronic devices include, but are not limited to, computers, smart televisions, smart speakers, mobile phones and tablet computers.

Refer to Fig. 1 to Fig. 6, where Fig. 1 to Fig. 4 are flow diagrams of an audio detection method provided by embodiments of the present invention, Fig. 5 is a schematic diagram of a strong sense of rhythm, and Fig. 6 is a schematic diagram of a weak sense of rhythm. The method includes:

Step 101, performing audio signal separation on the audio under test to obtain the harmonic signal and percussive signal of the audio under test.

For example, Harmonic-Percussive Source Separation (HPSS) is a common preprocessing technique that separates the harmonic source and the percussive source in an audio signal. Audio signals such as music typically show two kinds of distribution on a spectrogram: one is continuous and smooth along the time axis, the other is continuous and smooth along the frequency axis. The sound sources with these two distributions are usually called the harmonic source and the percussive source, respectively. Musical instruments can be divided into orchestral instruments and percussion instruments. Sound produced by orchestral instruments is generally gentle, with notes joined continuously, and appears on the spectrogram as a smooth envelope. Sound produced by percussion instruments generally has a strong rhythmic character, with large gaps between notes, and appears on the spectrogram as a vertical envelope. Therefore, the gentle sound produced by orchestral instruments and the like is usually called the harmonic source, and the strongly rhythmic sound produced by percussion instruments and the like is usually called the percussive source.

In the embodiments of the present invention, the harmonic-percussive source separation method can be used to perform audio signal separation on the audio under test and obtain its harmonic signal and percussive signal.

In some embodiments, as shown in Fig. 2, step 101 can be implemented by steps 1011 to 1012, specifically:

Step 1011, performing a short-time Fourier transform on the audio under test according to a preset frame length and a preset step size to obtain the spectrogram of the audio under test;

Step 1012, performing median filtering along the time direction and along the frequency direction of the spectrogram separately to obtain the harmonic signal and percussive signal of the spectrogram, where the harmonic signal is the signal obtained by median filtering along the time direction and the percussive signal is the signal obtained by median filtering along the frequency direction.

Here, the audio under test may first be sampled at a preset frequency, and the short-time Fourier transform is then applied with the preset frame length and preset step size to obtain its spectrogram. For example, the audio under test is first read at a sample rate of 44100 Hz, and a short-time Fourier transform (STFT) is computed with a frame length of 1024 samples and a step size of 441 samples, giving the STFT spectrogram of the audio under test. Median filtering is then applied to the spectrogram along the time direction and along the frequency direction separately, which yields the Harmonic part and the Percussive part of the original signal. Filtering along the time direction gives the harmonic (Harmonic part) signal, which corresponds to the continuous portion of the audio under test. Filtering along the frequency direction gives the percussive (Percussive part) signal, which corresponds to the portion of the audio with a struck or impact character.
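Steps 1011 and 1012 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the Hann window, the edge padding and the median kernel length of 17 are assumptions, and the function names are hypothetical. Only the frame length 1024 and step size 441 come from the text.

```python
import numpy as np

def stft_magnitude(y, frame_length=1024, hop_length=441):
    """Magnitude spectrogram of y, shape (frequency bins, frames)."""
    window = np.hanning(frame_length)  # assumed window; the text names none
    n_frames = 1 + (len(y) - frame_length) // hop_length
    frames = np.stack([
        y[i * hop_length: i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def median_filter_1d(S, size, axis):
    """Sliding median of odd length `size` along one axis, edge-padded."""
    pad = size // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(S, pad_width, mode="edge")
    n = S.shape[axis]
    # Stack `size` shifted copies and take the median across them.
    shifted = [np.take(padded, np.arange(k, k + n), axis=axis)
               for k in range(size)]
    return np.median(np.stack(shifted), axis=0)

def hpss(S, kernel=17):
    """Return (harmonic, percussive) median-filtered spectrograms."""
    harmonic = median_filter_1d(S, kernel, axis=1)    # along time
    percussive = median_filter_1d(S, kernel, axis=0)  # along frequency
    return harmonic, percussive
```

A horizontal line in the spectrogram (a sustained tone) survives the time-direction filter and is removed by the frequency-direction filter; a vertical line (a single hit) behaves the opposite way. Practical HPSS implementations often go one step further and turn the two filtered spectrograms into masks applied to the original spectrogram.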

In some embodiments, as shown in Fig. 3, step 1012 can be implemented by steps 10121 to 10123, specifically:

Step 10121, performing a first median filtering along the time direction and along the frequency direction of the spectrogram separately to obtain the first harmonic signal and first percussive signal of the spectrogram;

Step 10122, removing the first harmonic signal from the spectrogram to obtain a target spectrogram composed of the first percussive signal;

Step 10123, performing a second median filtering along the time direction and along the frequency direction of the target spectrogram separately to obtain the second harmonic signal and second percussive signal of the target spectrogram, where the second harmonic signal and second percussive signal of the target spectrogram constitute the harmonic signal and percussive signal of the audio under test.

Harmonic-Percussive Source Separation (HPSS) can also be written as H-P separation, and the Harmonic part and Percussive part obtained by H-P separation can be written as the H part and the P part, where the H part corresponds to the harmonic signal and the P part corresponds to the percussive signal.

For example, one H-P separation is first applied to the spectrogram of the audio under test: a first median filtering is performed along the time direction and along the frequency direction of the spectrogram separately to obtain the first harmonic signal (H part) and first percussive signal (P part). The H part is then discarded and only the P part is kept, that is, the first harmonic signal (H part) is removed from the spectrogram to obtain the target spectrogram composed of the first percussive signal (P part). A further H-P separation is then applied to the P part and the newly obtained P part is extracted: a second median filtering is performed along the time direction and along the frequency direction of the target spectrogram separately to obtain its second harmonic signal (new H part) and second percussive signal (new P part), which constitute the harmonic signal and percussive signal of the audio under test. After two H-P separations, the new P part contains very little continuous sound; most of its signal has a strong struck or impact character, such as drums, keyboard strikes and gongs, so the harmonic signal and percussive signal are separated effectively.
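The two-pass separation of steps 10121 to 10123 can be sketched as below. This is a hedged illustration: the helper names, the kernel length of 17 and the edge padding are assumptions, and the spectrogram is taken to be a non-negative 2-D array of shape (frequency bins, frames).

```python
import numpy as np

def sliding_median(S, size, axis):
    """Edge-padded sliding median of odd length `size` along one axis."""
    pad = size // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(S, pad_width, mode="edge")
    n = S.shape[axis]
    shifted = [np.take(padded, np.arange(k, k + n), axis=axis)
               for k in range(size)]
    return np.median(np.stack(shifted), axis=0)

def hp_separate(S, kernel=17):
    """One H-P separation: (H part along time, P part along frequency)."""
    return sliding_median(S, kernel, axis=1), sliding_median(S, kernel, axis=0)

def two_pass_hp(S, kernel=17):
    """Discard the first H part, then separate the first P part again."""
    _, p1 = hp_separate(S, kernel)    # first pass: keep only the P part
    h2, p2 = hp_separate(p1, kernel)  # second pass on the target spectrogram
    return h2, p2
```

On a toy spectrogram with one sustained tone and one single-frame hit, the second P part keeps only the hit, which matches the description that little continuous sound remains after two passes.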

Step 102, obtaining the Mel spectrum of the percussive signal.

To better match human auditory perception, the percussive signal obtained in step 101 can be converted to a Mel-scale spectrum. For example, Mel-Frequency Cepstral Coefficient (MFCC) processing can be used to convert the percussive signal to a Mel-scale spectrum and thereby obtain the Mel spectrum of the percussive signal.

The Mel frequency scale used by MFCC is a frequency scale divided according to the characteristics of human hearing. The relationship between Mel frequency and actual frequency can be expressed by the following formula:

Mel(f) = 2595 · log10(1 + f/700), where f is the actual frequency of the percussive signal.

When the frequency is below 1000 Hz, the perception of the human ear increases linearly with sound frequency; when the frequency is above 1000 Hz, the perception of the human ear is logarithmic in sound frequency. Dividing the actual frequency range into bands according to this correspondence gives a sequence of triangular filters called the Mel filter bank. For example, the Mel spectrum of the percussive signal is computed with a maximum frequency of 1000 Hz and 128 Mel bands.
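The Hz-to-Mel mapping and the band layout of a Mel filter bank can be illustrated with a short sketch. It uses the commonly cited constant 2595 with a base-10 logarithm; the function names are hypothetical, and the defaults of 128 bands and a 1000 Hz maximum frequency are taken from the example in the text.

```python
import math

def hz_to_mel(f_hz):
    """Map a frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_mels=128, f_max=1000.0):
    """Edge frequencies in Hz of n_mels triangular filters.

    The n_mels + 2 edges are equally spaced on the Mel scale from 0 to f_max;
    filter i rises from edge i to edge i + 1 and falls to edge i + 2.
    """
    mel_max = hz_to_mel(f_max)
    return [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
```

With this constant, 1000 Hz maps to approximately 1000 Mel, which is the usual anchor point of the scale.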

Step 103, calculating the onset envelope of the percussive signal from the Mel spectrum.

The onset envelope of the percussive signal, that is, the envelope of the onset points, is computed from the Mel spectrum of the percussive signal. An onset is the starting point of an "event" in the audio, and the envelope of the onset points is the line connecting the starting points of the "events" in the audio. For example, an envelope detector (envelope demodulation) can be used to compute the onset envelope of the percussive signal by connecting the peak points of the percussive signal after conversion to the Mel spectrum. Because the percussive signal at this point contains only strong percussive hits, the resulting onset envelope is in effect the line connecting the peaks of a series of strong hits.
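One common way to obtain an onset envelope from a Mel spectrum is spectral flux: the frame-to-frame increase in log energy, averaged over the Mel bands. The sketch below uses that technique as a stand-in; it is not necessarily the envelope detector the text has in mind, and the function name and log compression are assumptions.

```python
import numpy as np

def onset_envelope(mel_spec):
    """Spectral-flux onset envelope.

    mel_spec: non-negative array of shape (n_mels, n_frames).
    Returns an array of length n_frames; large values mark event onsets.
    """
    log_spec = np.log1p(mel_spec)            # compress the dynamic range
    diff = np.diff(log_spec, axis=1)         # change between adjacent frames
    flux = np.maximum(diff, 0.0).mean(axis=0)  # keep only energy increases
    return np.concatenate(([0.0], flux))     # pad so length equals n_frames
```

Only increases in energy contribute, so the envelope peaks where a hit starts and stays at zero while it decays.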

Step 104, obtaining the autocorrelation tempogram of the onset envelope from the onset envelope of the percussive signal.

The autocorrelation tempogram of the onset envelope (onset envelope tempogram) can be obtained by computing the local autocorrelation function of the onset envelope of the percussive signal. Local autocorrelation is chosen because the sense of rhythm of a song may vary considerably over the whole track: a globally computed autocorrelation function cannot describe the sense of rhythm of the music correctly, whereas a locally computed autocorrelation function describes it more accurately.

In some embodiments, as shown in Fig. 4, step 104 can be implemented by steps 1041 to 1042, specifically:

Step 1041, framing the onset envelope of the percussive signal according to a preset duration to obtain multiple local segments, and dividing each local segment into multiple frames according to a preset step size;

Step 1042, inputting the frames of each local segment into the local autocorrelation function for calculation to obtain the autocorrelation tempogram of the onset envelope.

For example, the duration of a local segment can be 8.9 s, and the framing step size can be 0.01 s.

The result of framing over time and computing the local autocorrelation function is a two-dimensional matrix called a tempogram, which represents the autocorrelation tempogram of the onset envelope.
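A local autocorrelation tempogram can be sketched as follows. This is a minimal illustration with assumed details: the onset envelope is taken to have one sample per 0.01 s, so an 8.9 s segment spans 890 samples; the function names, the hop between segments and the normalization by the lag-0 value are assumptions.

```python
import numpy as np

def local_autocorrelation(segment):
    """Autocorrelation of one segment for lags 0..len-1, lag 0 scaled to 1."""
    full = np.correlate(segment, segment, mode="full")[len(segment) - 1:]
    return full / full[0] if full[0] > 0 else full

def autocorr_tempogram(onset_env, win_length=890, hop=1):
    """2-D matrix of shape (lags, local segments)."""
    starts = range(0, len(onset_env) - win_length + 1, hop)
    columns = [local_autocorrelation(onset_env[s:s + win_length])
               for s in starts]
    return np.stack(columns, axis=1)
```

For an envelope with a hit every 10 samples, each column has peaks at lags 10, 20, 30 and so on, and values of zero at the lags in between.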

Step 105, determining the rhythm intensity value of the audio under test according to the second-highest peak value in the autocorrelation tempogram.

The so-called sense of rhythm, as expressed by percussion instruments, is the combined result of how regularly the percussion sounds and how strongly it sounds. For example, percussion with a clear rhythm and a strong response gives a piece of music a strong sense of rhythm, whereas percussion with an indistinct rhythm and a weak response gives a weak one. The embodiments of the present invention convert the calculation of sense of rhythm into a calculation of the regularity and intensity of the percussive hits.

In some embodiments, the autocorrelation mean over the multiple local segments in the autocorrelation tempogram can be obtained, and the second-highest peak value of that autocorrelation mean is extracted and taken as the rhythm intensity value of the audio under test.

For example, the autocorrelation values of the local segments in the tempogram are averaged, and the second-highest peak value of the result is taken as the rhythm intensity value of the audio. The rhythm intensity value is a normalized peak value: in theory it ranges from 0 to 1, and in practice it usually does not exceed 0.8. When the value is above 0.2, a listener will generally perceive the audio as having a fairly strong sense of rhythm.
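The peak extraction can be sketched as below, under one reading of "second-highest peak": the highest peak of a normalized autocorrelation is always at lag 0, so the second-highest peak is taken to be the largest local maximum at a non-zero lag. That reading, the function name and the input layout (lags by segments) are assumptions.

```python
import numpy as np

def rhythm_intensity(tempogram):
    """Second-highest peak of the mean autocorrelation, normalized to lag 0.

    tempogram: array of shape (lags, local segments).
    """
    mean_ac = tempogram.mean(axis=1)   # average over the local segments
    if mean_ac[0] <= 0:
        return 0.0                     # silent input: no rhythm detected
    mean_ac = mean_ac / mean_ac[0]     # lag 0 becomes the highest peak, 1.0
    mid = mean_ac[1:-1]
    # A local maximum is larger than its left neighbor and not smaller
    # than its right neighbor.
    is_peak = (mid > mean_ac[:-2]) & (mid >= mean_ac[2:])
    return float(mid[is_peak].max()) if is_peak.any() else 0.0
```

A value returned by this sketch can be compared directly with the 0.2 guideline in the text, since it is normalized to the 0 to 1 range.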

For example, Fig. 5 shows the autocorrelation tempogram of a typical strong sense of rhythm, and Fig. 6 shows that of a typical weak sense of rhythm. In the autocorrelation tempogram, the abscissa is the amount by which the signal is shifted to the right (the lag), and the ordinate is the correlation between the original signal and the shifted signal. Autocorrelation is computed by shifting the signal to the right by a given amount and then calculating the correlation between the shifted signal and the original signal.

All of the above technical solutions can be combined in any way to form optional embodiments of the present invention, which are not described one by one here.

Embodiments of the present invention perform audio signal separation on the audio under test to obtain its harmonic signal and percussive signal, obtain the Mel spectrum of the percussive signal, calculate the onset envelope of the percussive signal from the Mel spectrum, obtain the autocorrelation tempogram of the onset envelope from that onset envelope, and then determine the rhythm intensity value of the audio under test according to the second-highest peak value in the autocorrelation tempogram. By analyzing the regularity and intensity with which strong percussive hits occur in the audio, embodiments of the invention give a rhythm intensity value for an audio clip, so that the rhythm intensity of audio can be measured with an objective numerical value that better matches the user's auditory perception.

Refer to Fig. 7, which is another flow diagram of an audio detection method provided by an embodiment of the present invention. The method includes:

Step 201, performing audio signal separation on the audio under test to obtain the harmonic signal and percussive signal of the audio under test.

In some embodiments, performing audio signal separation on the audio under test to obtain the harmonic signal and percussive signal of the audio under test includes:

performing a short-time Fourier transform on the audio under test according to a preset frame length and a preset step size to obtain the spectrogram of the audio under test;

performing median filtering along the time direction and along the frequency direction of the spectrogram separately to obtain the harmonic signal and percussive signal of the spectrogram, where the harmonic signal is the signal obtained by median filtering along the time direction and the percussive signal is the signal obtained by median filtering along the frequency direction.

In some embodiments, performing median filtering along the time direction and along the frequency direction of the spectrogram separately to obtain the harmonic signal and percussive signal of the spectrogram includes:

performing a first median filtering along the time direction and along the frequency direction of the spectrogram separately to obtain the first harmonic signal and first percussive signal of the spectrogram;

removing the first harmonic signal from the spectrogram to obtain a target spectrogram composed of the first percussive signal;

performing a second median filtering along the time direction and along the frequency direction of the target spectrogram separately to obtain the second harmonic signal and second percussive signal of the target spectrogram, where the second harmonic signal and second percussive signal of the target spectrogram constitute the harmonic signal and percussive signal of the audio under test.

Step 202, obtaining the Mel spectrum of the percussive signal.

To better match human auditory perception, the percussive signal obtained in step 201 can be converted to a Mel-scale spectrum. For example, Mel-Frequency Cepstral Coefficient (MFCC) processing can be used to convert the percussive signal to a Mel-scale spectrum and thereby obtain the Mel spectrum of the percussive signal.

Step 203, calculating the onset envelope of the percussive signal from the Mel spectrum.

Step 204, filtering the onset envelope of the percussive signal to remove signal points in the onset envelope whose value is below a threshold.

The onset envelope obtained in step 203 still contains some weak response points that can be ignored. Although these weak response points are not the main factor affecting the sense of rhythm of the music, they can affect subsequent calculations, so the weak response points in the onset envelope of the percussive signal can be filtered out according to a threshold. For example, 0.2 of the highest value of the signal is chosen as the threshold, and the weak response points in the onset envelope whose value is below the threshold are eliminated.
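The thresholding of step 204 can be sketched in a few lines. The sketch assumes the threshold is 0.2 times the maximum of the onset envelope and that filtered points are set to zero rather than removed, so the time axis is preserved; both are interpretations, and the function name is hypothetical.

```python
def suppress_weak_onsets(onset_env, ratio=0.2):
    """Zero out onset values below ratio * max(onset_env)."""
    threshold = ratio * max(onset_env)
    # Keep strong responses unchanged; weak ones become 0.0 so that the
    # positions of the remaining hits on the time axis do not shift.
    return [v if v >= threshold else 0.0 for v in onset_env]
```

Keeping the length unchanged matters for the next step, because the autocorrelation lags are measured in envelope samples.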

Step 205, obtaining the tempogram of the onset envelope from the filtered onset envelope of the percussive signal.

After the weak response points are filtered out in step 204, the remaining points of the onset envelope are all rhythm points with a fairly strong response. In this step, the autocorrelation tempogram of the onset envelope (onset envelope tempogram) can be obtained by computing the local autocorrelation function of the onset envelope of the percussive signal. Local autocorrelation is chosen because the sense of rhythm of a song may vary considerably over the whole track: a globally computed autocorrelation function cannot describe the sense of rhythm of the music correctly, whereas a locally computed autocorrelation function describes it more accurately.

In some embodiments, obtaining the tempogram of the onset envelope from the filtered onset envelope of the percussive signal includes:

framing the filtered onset envelope of the percussive signal according to a preset duration to obtain multiple local segments, and dividing each local segment into multiple frames according to a preset step size;

inputting the frames of each local segment into the local autocorrelation function for calculation to obtain the autocorrelation tempogram of the onset envelope.

Step 206, determining the rhythm intensity value of the audio under test according to the second-highest peak value in the autocorrelation tempogram.

In some embodiments, the local autocorrelation results may also be combined in other ways, for example by taking the maximum value of the onset envelope, taking its minimum value, or using another voting strategy. The rhythm intensity value may likewise be obtained from the resulting signal in other ways, for example by taking the mean of the top N peaks, or by normalizing by the highest peak and then taking the second-highest peak value; the value obtained is then used as the rhythm intensity value of the audio under test. In addition, when analyzing the regularity and intensity with which strong percussive hits occur in the audio, the parameters of the algorithm, such as window length, step size, number of Mel filters and cutoff frequency, can be fine-tuned to give a more accurate rhythm intensity value for the audio clip.
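Two of the alternatives mentioned above can be sketched as follows, under assumed details: the local results are combined per lag with a selectable reducer (mean, maximum or minimum over the local segments), and the intensity is read as the mean of the top N peaks at non-zero lags. The function names and the peak definition are assumptions.

```python
import numpy as np

def aggregate_segments(tempogram, how="mean"):
    """Combine a (lags, segments) tempogram into one curve per lag."""
    reducers = {"mean": np.mean, "max": np.max, "min": np.min}
    return reducers[how](tempogram, axis=1)

def mean_of_top_peaks(curve, n=3):
    """Mean of the n largest local maxima at non-zero lags (0.0 if none)."""
    mid = curve[1:-1]
    is_peak = (mid > curve[:-2]) & (mid >= curve[2:])
    peaks = np.sort(mid[is_peak])[::-1][:n]   # largest peaks first
    return float(peaks.mean()) if peaks.size else 0.0
```

Taking the minimum over segments rewards rhythm that is present throughout the track, while the maximum rewards a track that has at least one strongly rhythmic passage.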

Step 207, classifying the audio under test according to its rhythm intensity value.

For example, music can be divided into multiple types according to rhythm intensity value, such as light music and DJ music, or strolling music, walking music, jogging music and fast-running music. In addition to the music type label, the beat points of each piece of music can also be recorded.
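A classification by rhythm intensity value can be as simple as binning. In the sketch below the four labels come from the text, while the bin edges are purely hypothetical values chosen for illustration; a real system would tune them on labeled music.

```python
import bisect

# Hypothetical bin edges on the 0..1 rhythm intensity scale.
EDGES = [0.1, 0.2, 0.4]
LABELS = ["strolling", "walking", "jogging", "fast running"]

def classify_by_rhythm(intensity):
    """Return the label of the bin that contains the intensity value."""
    return LABELS[bisect.bisect_right(EDGES, intensity)]
```

For instance, `classify_by_rhythm(0.3)` falls between the second and third edges and is labeled "jogging".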

Step 208, generating an audio recommendation list according to the audio classification results of multiple audio items under test and the current audio application scenario.

For example, when a mobile terminal detects that the current audio application scenario is running, it can sense the user's step frequency through a motion sensor and then select, from the audio classification results of the multiple audio items under test, the first few songs whose beat points are closest to the user's step frequency as a music recommendation list for the user.

Embodiments of the present invention perform audio signal separation on the audio under test to obtain its harmonic signal and percussive signal, obtain the Mel spectrum of the percussive signal, calculate the onset envelope of the percussive signal from the Mel spectrum, and filter the onset envelope to remove signal points whose value is below a threshold. The autocorrelation tempogram of the onset envelope is then obtained from the filtered onset envelope, and the rhythm intensity value of the audio under test is determined according to the second-highest peak value in the autocorrelation tempogram. The audio under test is classified according to its rhythm intensity value, and an audio recommendation list is generated according to the audio classification results of multiple audio items under test and the current audio application scenario. By analyzing the regularity and intensity with which strong percussive hits occur in the audio, embodiments of the invention give a rhythm intensity value for an audio clip, so that the rhythm intensity of audio can be measured with an objective numerical value that better matches the user's auditory perception. The rhythm intensity index can also serve as an important feature for music recommendation in a variety of music applications, such as a running radio station.

The embodiment of the present invention also provides an audio detection device. Figs. 8 to 11 are structural schematic diagrams of an audio detection device provided by an embodiment of the present invention. As shown in Figs. 8 to 11, the audio detection device 30 may include a signal separation module 31, a first obtaining module 32, a computing module 33, a second obtaining module 35 and a determining module 36.

The signal separation module 31 is configured to perform audio signal separation on the audio under test, to obtain the harmonic signal and percussive signal of the audio under test;

The first obtaining module 32 is configured to obtain the Mel spectrum of the percussive signal;

The computing module 33 is configured to compute the onset envelope of the percussive signal according to the Mel spectrum;

The second obtaining module 35 is configured to obtain the autocorrelation tempogram of the onset envelope according to the onset envelope of the percussive signal;

The determining module 36 is configured to determine the rhythm strength value of the audio under test according to the second-highest peak value in the autocorrelation tempogram.
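
As an illustration of what the computing module 33 might do, the sketch below derives an onset envelope from a Mel spectrum using half-wave rectified spectral flux, a common technique. This section does not state the formula actually used, so the method, the function name `onset_envelope` and the input layout are all assumptions.

```python
def onset_envelope(mel_frames):
    """Spectral-flux onset envelope (assumed technique, not confirmed by the text).

    `mel_frames` is a list of frames; each frame is a list of Mel-band
    energies. For every frame, only the energy increases relative to the
    previous frame are kept and averaged over the bands.
    """
    if not mel_frames:
        return []
    envelope = [0.0]  # the first frame has no predecessor
    for previous, current in zip(mel_frames, mel_frames[1:]):
        increases = [max(0.0, c - p) for p, c in zip(previous, current)]
        envelope.append(sum(increases) / len(increases))
    return envelope


# Two Mel bands, four frames: energy jumps at frame 1, then settles.
envelope = onset_envelope([[0.0, 0.0], [2.0, 4.0], [1.0, 5.0], [1.0, 5.0]])
```

Discarding decreases means that only the attack of a percussive hit contributes to the envelope, not its decay.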

In some embodiments, as shown in Fig. 9, the signal separation module 31 includes:

A transformation submodule 311, configured to perform a short-time Fourier transform on the audio under test according to a preset frame length and a preset step length, to obtain the spectrogram of the audio under test;

A filtering submodule 312, configured to perform median filtering along the time direction and along the frequency direction of the spectrogram respectively, to obtain the harmonic signal and percussive signal of the spectrogram, wherein the harmonic signal is the signal obtained by median filtering along the time direction, and the percussive signal is the signal obtained by median filtering along the frequency direction.
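
The median filtering performed by the filtering submodule 312 can be sketched as follows on a tiny magnitude spectrogram. This is a simplified illustration under stated assumptions: the window width of 3, the truncated windows at the edges and the helper names are hypothetical, and practical separators usually go on to build soft masks from the two filtered spectrograms.

```python
from statistics import median


def median_filter_1d(values, width):
    """Sliding median with an odd window; edges use a truncated window."""
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def hpss_median(spec, width=3):
    """Split spec[f][t] into harmonic and percussive parts by median filtering."""
    n_freq, n_time = len(spec), len(spec[0])
    # Harmonic part: filter each frequency row along the time direction.
    harmonic = [median_filter_1d(row, width) for row in spec]
    # Percussive part: filter each time column along the frequency direction.
    columns = [[spec[f][t] for f in range(n_freq)] for t in range(n_time)]
    filtered = [median_filter_1d(column, width) for column in columns]
    percussive = [[filtered[t][f] for t in range(n_time)] for f in range(n_freq)]
    return harmonic, percussive


# A sustained tone (row 2) crossed by a single percussive hit (column 2).
spec = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
harmonic, percussive = hpss_median(spec)
```

In this toy input the time-direction filter keeps only the horizontal line (the sustained tone), while the frequency-direction filter keeps only the vertical line (the hit), which is the intuition behind the separation.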

In some embodiments, as shown in Fig. 10, the filtering submodule 312 includes:

A first filter unit 3121, configured to perform a first median filtering along the time direction and along the frequency direction of the spectrogram respectively, to obtain a first harmonic signal and a first percussive signal of the spectrogram;

A removal unit 3122, configured to remove the first harmonic signal from the spectrogram, to obtain a target spectrogram composed of the first percussive signal;

A second filter unit 3123, configured to perform a second median filtering along the time direction and along the frequency direction of the target spectrogram respectively, to obtain a second harmonic signal and a second percussive signal of the target spectrogram, wherein the second harmonic signal and the second percussive signal of the target spectrogram constitute the harmonic signal and percussive signal of the audio under test.

In some embodiments, as shown in Fig. 11, the second obtaining module 35 includes:

A framing submodule 351, configured to frame the onset envelope of the percussive signal according to a preset duration to obtain multiple local segments, and then divide each local segment into multiple frames according to a preset step length;

A computing submodule 352, configured to input the multiple frames corresponding to each local segment into a local autocorrelation function for computation, to obtain the autocorrelation tempogram of the onset envelope.
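
The framing and local autocorrelation steps can be sketched as follows. For brevity the sketch collapses the two-level framing (local segments, then frames) into a single sliding window, which is a simplification and not necessarily what the framing submodule 351 does. The names `autocorrelate` and `autocorrelation_tempogram`, the window length, the hop and the normalisation by frame energy are assumptions.

```python
def autocorrelate(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag-1, normalised by energy."""
    energy = sum(value * value for value in frame) or 1.0  # avoid dividing by zero
    return [
        sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        for lag in range(max_lag)
    ]


def autocorrelation_tempogram(envelope, window, hop):
    """Autocorrelate overlapping windows of the onset envelope.

    Returns one row of lag values per window position.
    """
    starts = range(0, len(envelope) - window + 1, hop)
    return [autocorrelate(envelope[s:s + window], window) for s in starts]


# An onset every 4 samples, analysed with 8-sample windows and a hop of 4.
envelope = [1, 0, 0, 0] * 4
tempogram = autocorrelation_tempogram(envelope, window=8, hop=4)
```

With this perfectly periodic input, every row has value 1.0 at lag 0 and 0.5 at lag 4, the onset period; the lag-4 value is below 1 only because fewer sample pairs overlap at larger lags.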

In some embodiments, the determining module 36 is further configured to determine, as the rhythm strength value of the audio under test, the second-highest peak value of the mean autocorrelation of the local segments in the autocorrelation tempogram.
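
One plausible reading of this step is sketched below. It assumes that the highest peak of the mean autocorrelation is the trivial one at lag 0, so that the second-highest peak reflects the strongest genuine periodicity; this interpretation, the peak-picking rule and the function name `rhythm_strength` are assumptions rather than details given in the text.

```python
def rhythm_strength(tempogram):
    """Second-highest peak of the autocorrelation averaged over all rows."""
    n_lags = len(tempogram[0])
    mean_ac = [
        sum(row[lag] for row in tempogram) / len(tempogram)
        for lag in range(n_lags)
    ]
    # Lag 0 counts as a peak; other peaks are interior local maxima.
    peaks = [mean_ac[0]] + [
        mean_ac[i]
        for i in range(1, n_lags - 1)
        if mean_ac[i] > mean_ac[i - 1] and mean_ac[i] >= mean_ac[i + 1]
    ]
    peaks.sort(reverse=True)
    return peaks[1] if len(peaks) > 1 else 0.0


# Two hypothetical tempogram rows with peaks at lags 0, 2 and 4.
strength = rhythm_strength([
    [1.0, 0.1, 0.6, 0.1, 0.3, 0.0],
    [1.0, 0.3, 0.8, 0.1, 0.5, 0.0],
])
```

Here the averaged peaks are roughly 1.0, 0.7 and 0.4, so the sketch returns a value close to 0.7. A signal with no periodicity beyond lag 0 yields 0.0.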

In the audio detection device 30 provided by this embodiment of the present invention, the signal separation module 31 performs audio signal separation on the audio under test to obtain its harmonic signal and percussive signal; the first obtaining module 32 obtains the Mel spectrum of the percussive signal; the computing module 33 computes the onset envelope of the percussive signal according to the Mel spectrum; the second obtaining module 35 obtains the autocorrelation tempogram of the onset envelope according to the onset envelope of the percussive signal; and the determining module 36 determines the rhythm strength value of the audio under test according to the second-highest peak value in the autocorrelation tempogram. By analyzing the regularity and intensity with which strong or heavy percussive hits occur in the audio, the audio detection device 30 gives a rhythm strength value for an audio clip, so that rhythm strength can be measured with an objective value that better matches the user's auditory perception.

In some embodiments, Fig. 12 is another structural schematic diagram of an audio detection device provided by an embodiment of the present invention. As shown in Fig. 12, the audio detection device 30 may include a signal separation module 31, a first obtaining module 32, a computing module 33, a filtering module 34, a second obtaining module 35, a determining module 36, a classification module 37 and a generation module 38.

The filtering module 34 is configured to filter the onset envelope of the percussive signal, to remove signal points in the onset envelope whose values are below a threshold;

The second obtaining module 35 is configured to obtain the autocorrelation tempogram of the onset envelope according to the filtered onset envelope of the percussive signal;

The determining module 36 is configured to determine the rhythm strength value of the audio under test according to the second-highest peak value in the autocorrelation tempogram;

The classification module 37 is configured to perform audio classification on the audio under test according to the rhythm strength value of the audio under test;

The generation module 38 is configured to generate an audio recommendation list according to the audio classification results of multiple audios under test and the current audio application scenario.
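
The classification module 37 and generation module 38 can be illustrated with the sketch below. The text does not give class boundaries or a mapping from scenarios to classes, so the thresholds 0.6 and 0.3, the class labels, the `SCENE_TO_CLASS` table and the function names are all hypothetical.

```python
def classify_by_rhythm(strength, strong=0.6, weak=0.3):
    """Map a rhythm strength value to a coarse class (thresholds are assumed)."""
    if strength >= strong:
        return "strong"
    if strength >= weak:
        return "medium"
    return "weak"


# Hypothetical mapping from application scenario to the desired class.
SCENE_TO_CLASS = {"running": "strong", "sleep": "weak"}


def build_recommendation(strengths, scene):
    """Return the titles whose rhythm class matches the current scenario."""
    wanted = SCENE_TO_CLASS.get(scene, "medium")
    return [title for title, value in strengths.items()
            if classify_by_rhythm(value) == wanted]


strengths = {"a": 0.8, "b": 0.2, "c": 0.45, "d": 0.65}
running_list = build_recommendation(strengths, "running")
```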

In the audio detection device 30 of this embodiment of the present invention, the signal separation module 31 performs audio signal separation on the audio under test to obtain its harmonic signal and percussive signal; the first obtaining module 32 obtains the Mel spectrum of the percussive signal; the computing module 33 computes the onset envelope of the percussive signal according to the Mel spectrum; the filtering module 34 filters the onset envelope of the percussive signal to remove signal points whose values are below a threshold; the second obtaining module 35 obtains the autocorrelation tempogram of the onset envelope according to the filtered onset envelope of the percussive signal; the determining module 36 determines the rhythm strength value of the audio under test according to the second-highest peak value in the autocorrelation tempogram; the classification module 37 classifies the audio under test according to its rhythm strength value; and the generation module 38 generates an audio recommendation list according to the audio classification results of multiple audios under test and the current audio application scenario. By analyzing the regularity and intensity with which strong or heavy percussive hits occur in the audio, the audio detection device 30 gives a rhythm strength value for an audio clip, so that rhythm strength can be measured with an objective value that better matches the user's auditory perception. The rhythm strength index can also serve as an important feature for music recommendation in various music applications, such as a running radio station.

The embodiment of the present invention also provides a server. Fig. 13 shows a structural schematic diagram of the server involved in the embodiment of the present invention. Specifically:

The server may include components such as a processor 401 with one or more processing cores, a memory 402 comprising one or more computer-readable storage media, a power supply 403 and an input unit 404. Those skilled in the art will understand that the server structure shown in Fig. 13 does not limit the server, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components. Wherein:

The processor 401 is the control center of the server. It connects the various parts of the entire server through various interfaces and lines, and performs the server's functions and processes data by running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the server as a whole. Optionally, the processor 401 may include one or more processing cores. Preferably, the processor 401 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 401.

The memory 402 can be used to store software programs and modules, and the processor 401 performs various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, where the program storage area can store the operating system, the application programs required by at least one function (such as a sound playback function or an image playback function) and the like, and the data storage area can store data created through use of the server and the like. In addition, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Correspondingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.

The server further includes a power supply 403 that powers the components. Preferably, the power supply 403 can be logically connected to the processor 401 through a power management system, so that functions such as charging management, discharging management and power consumption management are implemented through the power management system. The power supply 403 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.

The server may also include an input unit 404, which can be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.

Although not shown, the server may also include a display unit and the like, which are not described here. Specifically, in this embodiment, the processor 401 in the server loads the executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and runs the application programs stored in the memory 402, thereby implementing various functions, as follows:

Perform audio signal separation on the audio under test to obtain the harmonic signal and percussive signal of the audio under test; obtain the Mel spectrum of the percussive signal; compute the onset envelope of the percussive signal according to the Mel spectrum; obtain the autocorrelation tempogram of the onset envelope according to the onset envelope of the percussive signal; and determine the rhythm strength value of the audio under test according to the second-highest peak value in the autocorrelation tempogram.

For details of the above operations, refer to the preceding embodiments; they are not repeated here.

As can be seen from the above, the server provided by this embodiment performs audio signal separation on the audio under test to obtain its harmonic signal and percussive signal, obtains the Mel spectrum of the percussive signal, computes the onset envelope of the percussive signal according to the Mel spectrum, obtains the autocorrelation tempogram of the onset envelope according to the onset envelope of the percussive signal, and determines the rhythm strength value of the audio under test according to the second-highest peak value in the autocorrelation tempogram. By analyzing the regularity and intensity with which strong or heavy percussive hits occur in the audio, the embodiment gives a rhythm strength value for an audio clip that better matches the user's auditory perception.

Correspondingly, the embodiment of the present invention also provides a terminal. As shown in Fig. 14, the terminal may include components such as a radio frequency (RF) circuit 501, a memory 502 comprising one or more computer-readable storage media, an input unit 503, a display unit 504, a sensor 505, an audio circuit 506, a Wireless Fidelity (WiFi) module 507, a processor 508 with one or more processing cores, and a power supply 509. Those skilled in the art will understand that the terminal structure shown in Fig. 14 does not limit the terminal, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components. Wherein:

The RF circuit 501 can be used to receive and send signals while sending and receiving messages or during a call. In particular, after receiving downlink information from a base station, it hands the information to one or more processors 508 for processing; in addition, it sends uplink data to the base station. In general, the RF circuit 501 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer and the like. In addition, the RF circuit 501 can also communicate with networks and other devices through wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS) and the like.

The memory 502 can be used to store software programs and modules, and the processor 508 performs various functional applications and data processing by running the software programs and modules stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area, where the program storage area can store the operating system, the application programs required by at least one function (such as a sound playback function or an image playback function) and the like, and the data storage area can store data created through use of the terminal (such as audio data and a phone book) and the like. In addition, the memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Correspondingly, the memory 502 may also include a memory controller to provide the processor 508 and the input unit 503 with access to the memory 502.

The input unit 503 can be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. Specifically, in one embodiment, the input unit 503 may include a touch-sensitive surface and other input devices. The touch-sensitive surface, also called a touch display screen or a trackpad, collects the user's touch operations on or near it (such as operations performed by the user on or near the touch-sensitive surface with a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connected device according to a preset program. Optionally, the touch-sensitive surface may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and sends the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 508, and can receive and execute commands sent by the processor 508. Furthermore, the touch-sensitive surface can be implemented in various types, such as resistive, capacitive, infrared and surface acoustic wave. Besides the touch-sensitive surface, the input unit 503 may also include other input devices. Specifically, the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, a joystick and the like.

The display unit 504 can be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the terminal, which can be composed of graphics, text, icons, video and any combination thereof. The display unit 504 may include a display panel, which can optionally be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display or the like. Further, the touch-sensitive surface can cover the display panel. After detecting a touch operation on or near it, the touch-sensitive surface sends the operation to the processor 508 to determine the type of the touch event, and the processor 508 then provides the corresponding visual output on the display panel according to the type of the touch event. Although in Fig. 14 the touch-sensitive surface and the display panel implement the input and output functions as two independent components, in some embodiments the touch-sensitive surface and the display panel can be integrated to implement the input and output functions.

The terminal may also include at least one sensor 505, such as a light sensor, a motion sensor and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor can adjust the brightness of the display panel according to the ambient light level, and the proximity sensor can turn off the display panel and/or the backlight when the terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that recognize the phone's posture (such as switching between landscape and portrait, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer and tap detection). The terminal can also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, which are not described here.

The audio circuit 506, a loudspeaker and a microphone can provide an audio interface between the user and the terminal. The audio circuit 506 can convert received audio data into an electrical signal and transmit it to the loudspeaker, which converts it into a sound signal for output. Conversely, the microphone converts a collected sound signal into an electrical signal, which the audio circuit 506 receives and converts into audio data; after the audio data is processed by the processor 508, it is sent through the RF circuit 501 to, for example, another terminal, or output to the memory 502 for further processing. The audio circuit 506 may also include an earphone jack to provide communication between an external earphone and the terminal.

WiFi is a short-range wireless transmission technology. Through the WiFi module 507, the terminal can help the user send and receive e-mail, browse web pages, access streaming media and so on, providing the user with wireless broadband Internet access. Although Fig. 14 shows the WiFi module 507, it is understood that the module is not an essential part of the terminal and can be omitted as needed without changing the essence of the invention.

The processor 508 is the control center of the terminal. It connects the various parts of the entire phone through various interfaces and lines, and performs the terminal's functions and processes data by running or executing the software programs and/or modules stored in the memory 502 and calling the data stored in the memory 502, thereby monitoring the phone as a whole. Optionally, the processor 508 may include one or more processing cores. Preferably, the processor 508 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 508.

The terminal further includes a power supply 509 (such as a battery) that powers the components. Preferably, the power supply can be logically connected to the processor 508 through a power management system, so that functions such as charging management, discharging management and power consumption management are implemented through the power management system. The power supply 509 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.

Although not shown, the terminal may also include a camera, a Bluetooth module and the like, which are not described here. Specifically, in this embodiment, the processor 508 in the terminal loads the executable files corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and runs the application programs stored in the memory 502, thereby implementing various functions:

Perform audio signal separation on the audio under test to obtain the harmonic signal and percussive signal of the audio under test; obtain the Mel spectrum of the percussive signal; compute the onset envelope of the percussive signal according to the Mel spectrum; obtain the autocorrelation tempogram of the onset envelope according to the onset envelope of the percussive signal; and determine the rhythm strength value of the audio under test according to the second-highest peak value in the autocorrelation tempogram.

As can be seen from the above, the terminal provided by this embodiment performs audio signal separation on the audio under test to obtain its harmonic signal and percussive signal, obtains the Mel spectrum of the percussive signal, computes the onset envelope of the percussive signal according to the Mel spectrum, obtains the autocorrelation tempogram of the onset envelope according to the onset envelope of the percussive signal, and determines the rhythm strength value of the audio under test according to the second-highest peak value in the autocorrelation tempogram. By analyzing the regularity and intensity with which strong or heavy percussive hits occur in the audio, the embodiment gives a rhythm strength value for an audio clip that better matches the user's auditory perception.

Those of ordinary skill in the art will understand that all or some of the steps in the various methods of the above embodiments can be completed by instructions, or by instructions controlling the relevant hardware. The instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.

To this end, the embodiment of the present invention provides a storage medium storing a plurality of instructions that can be loaded by a processor to execute the steps of any audio detection method provided by the embodiments of the present invention. For example, the instructions can execute the following steps:

Perform audio signal separation on the audio under test to obtain the harmonic signal and percussive signal of the audio under test; obtain the Mel spectrum of the percussive signal; compute the onset envelope of the percussive signal according to the Mel spectrum; obtain the autocorrelation tempogram of the onset envelope according to the onset envelope of the percussive signal; and determine the rhythm strength value of the audio under test according to the second-highest peak value in the autocorrelation tempogram.

For the specific implementation of each of the above operations, refer to the preceding embodiments; details are not repeated here.

The storage medium may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc or the like.

Because the instructions stored in the storage medium can execute the steps of any audio detection method provided by the embodiments of the present invention, they can achieve the beneficial effects achievable by any such method. See the preceding embodiments for details, which are not repeated here.

The audio detection method, device and storage medium provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. Meanwhile, for those skilled in the art, the specific implementation and the scope of application may change according to the idea of the invention. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. An audio detection method, characterized in that the method comprises:

Obtaining the Mel spectrum of the percussive signal;

2. The audio detection method according to claim 1, characterized in that, before obtaining the autocorrelation tempogram of the onset envelope according to the onset envelope of the percussive signal, the method further comprises:

Filtering the onset envelope of the percussive signal, to remove signal points in the onset envelope whose values are below a threshold;

Wherein obtaining the autocorrelation tempogram of the onset envelope according to the onset envelope of the percussive signal comprises:

Obtaining the autocorrelation tempogram of the onset envelope according to the filtered onset envelope of the percussive signal.

3. The audio detection method according to claim 1, characterized in that performing audio signal separation on the audio under test to obtain the harmonic signal and percussive signal of the audio under test comprises:

Performing a short-time Fourier transform on the audio under test according to a preset frame length and a preset step length, to obtain the spectrogram of the audio under test;

Performing median filtering along the time direction and along the frequency direction of the spectrogram respectively, to obtain the harmonic signal and percussive signal of the spectrogram, wherein the harmonic signal is the signal obtained by median filtering along the time direction, and the percussive signal is the signal obtained by median filtering along the frequency direction.

4. The audio detection method according to claim 3, characterized in that performing median filtering along the time direction and along the frequency direction of the spectrogram respectively, to obtain the harmonic signal and percussive signal of the spectrogram, comprises:

Performing a first median filtering along the time direction and along the frequency direction of the spectrogram respectively, to obtain a first harmonic signal and a first percussive signal of the spectrogram;

Removing the first harmonic signal from the spectrogram, to obtain a target spectrogram composed of the first percussive signal;

Performing a second median filtering along the time direction and along the frequency direction of the target spectrogram respectively, to obtain a second harmonic signal and a second percussive signal of the target spectrogram, wherein the second harmonic signal and the second percussive signal of the target spectrogram constitute the harmonic signal and percussive signal of the audio under test.

5. The audio detection method according to claim 1, characterized in that obtaining the autocorrelation tempogram of the onset envelope according to the onset envelope of the percussive signal comprises:

Framing the onset envelope of the percussive signal according to a preset duration to obtain multiple local segments, and then dividing each local segment into multiple frames according to a preset step length;

Inputting the multiple frames corresponding to each local segment into a local autocorrelation function for computation, to obtain the autocorrelation tempogram of the onset envelope.

6. The audio detection method according to claim 5, characterized in that determining the rhythm strength value of the audio under test according to the second-highest peak value in the autocorrelation tempogram comprises:

Determining, as the rhythm strength value of the audio under test, the second-highest peak value of the mean autocorrelation of the local segments in the autocorrelation tempogram.

7. The audio detection method according to claim 1, characterized in that the method further comprises:

Performing audio classification on the audio under test according to the rhythm strength value of the audio under test;

Generating an audio recommendation list according to the audio classification results of multiple audios under test and the current audio application scenario.

8. An audio detection device, characterized in that the device comprises:

A signal separation module, configured to perform audio signal separation on the audio under test, to obtain the harmonic signal and percussive signal of the audio under test;

A second obtaining module, configured to obtain the autocorrelation tempogram of the onset envelope according to the onset envelope of the percussive signal;

A determining module, configured to determine the rhythm strength value of the audio under test according to the second-highest peak value in the autocorrelation tempogram.

9. The audio detection apparatus according to claim 8, wherein the apparatus further comprises:

a filtering module, configured to filter the onset envelope of the percussive signal so as to filter out signal points in the onset envelope whose values are less than a threshold;

the second obtaining module is further configured to obtain the tempogram of the onset envelope according to the onset envelope of the percussive signal after the filtering.
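
The threshold filtering performed by the filtering module might look like the sketch below. The claims only say that points below the threshold are filtered out; setting them to zero rather than deleting them, so that the time positions of the remaining onsets are preserved, is an assumption of this example, as is the function name.

```python
def filter_onset_envelope(envelope, threshold):
    """Zero out envelope points whose value is below the threshold (assumed behaviour)."""
    return [value if value >= threshold else 0.0 for value in envelope]
```

Keeping the envelope length unchanged means the lag axis of any tempogram computed afterwards still corresponds to the original time scale.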

10. The audio detection apparatus according to claim 8, wherein the signal separation module comprises:

a transform submodule, configured to perform a short-time Fourier transform on the audio under test according to a preset frame length and a preset step size, to obtain a spectrogram of the audio under test;

a filtering submodule, configured to perform median filtering along the time direction and along the frequency direction of the spectrogram, respectively, to obtain the harmonic signal and the percussive signal of the spectrogram, wherein the harmonic signal is the signal obtained by median filtering along the time direction, and the percussive signal is the signal obtained by median filtering along the frequency direction.
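
The median filtering carried out by the filtering submodule can be illustrated with the sketch below, which assumes the magnitude spectrogram is a list of rows (frequency bins) by columns (time frames). The function names, the kernel size of 3, and the clipping of the window at the borders are assumptions; the short-time Fourier transform step is omitted.

```python
from statistics import median


def median_filter_time(spec, kernel):
    """Median along time within each frequency bin; keeps sustained (harmonic) energy."""
    half = kernel // 2
    n_frames = len(spec[0])
    return [
        [median(row[max(0, t - half):min(n_frames, t + half + 1)]) for t in range(n_frames)]
        for row in spec
    ]


def median_filter_freq(spec, kernel):
    """Median along frequency within each time frame; keeps broadband (percussive) energy."""
    half = kernel // 2
    n_bins, n_frames = len(spec), len(spec[0])
    return [
        [
            median([spec[g][t] for g in range(max(0, f - half), min(n_bins, f + half + 1))])
            for t in range(n_frames)
        ]
        for f in range(n_bins)
    ]


def separate(spec, kernel=3):
    """Return (harmonic, percussive) estimates of the spectrogram."""
    return median_filter_time(spec, kernel), median_filter_freq(spec, kernel)
```

A steady tone appears as a horizontal line in the spectrogram and survives the time-direction median, while a drum hit appears as a vertical line and survives the frequency-direction median; each filter suppresses the other kind of line.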

11. The audio detection apparatus according to claim 10, wherein the filtering submodule comprises:

a first filtering unit, configured to perform a first median filtering along the time direction and along the frequency direction of the spectrogram, respectively, to obtain a first harmonic signal and a first percussive signal of the spectrogram;

a removal unit, configured to remove the first harmonic signal from the spectrogram, to obtain a target spectrogram composed of the first percussive signal;

a second filtering unit, configured to perform a second median filtering along the time direction and along the frequency direction of the target spectrogram, respectively, to obtain a second harmonic signal and a second percussive signal of the target spectrogram, wherein the second harmonic signal and the second percussive signal of the target spectrogram constitute the harmonic signal and the percussive signal of the audio under test.

12. The audio detection apparatus according to claim 8, wherein the second obtaining module comprises:

a framing submodule, configured to perform framing on the onset envelope of the percussive signal according to a preset duration to obtain a plurality of local segments, and then to divide each local segment into a plurality of frames according to a preset step size;

a computing submodule, configured to input the plurality of frames corresponding to each local segment into a local autocorrelation function for calculation, to obtain the autocorrelation tempogram of the onset envelope.

13. The audio detection apparatus according to claim 12, wherein the determining module is configured to determine the second-highest peak value of the autocorrelation mean of each local segment in the autocorrelation tempogram as the rhythmic-sense intensity value of the audio under test.

14. The audio detection apparatus according to claim 8, wherein the apparatus further comprises:

a classification module, configured to perform audio classification on the audio under test according to the rhythmic-sense intensity value of the audio under test;

a generation module, configured to generate an audio recommendation list according to the audio classification results of a plurality of audios under test and a current audio application scenario.

15. A storage medium, wherein the storage medium stores a plurality of instructions, the instructions being suitable for loading by a processor to perform the steps of the audio detection method according to any one of claims 1 to 7.