CN109256146A - Audio detection method, device and storage medium - Google Patents
Audio detection method, device and storage medium
- Publication number: CN109256146A
- Application number: CN201811278955.8A
- Authority: CN (China)
- Prior art keywords: audio, signal, impact signal, sonograph, measured
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G10L25/24: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters, the extracted parameters being the cepstrum
- G10L25/06: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters, the extracted parameters being correlation coefficients
- G10L25/21: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters, the extracted parameters being power information
- G10H2210/041: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal, based on MFCC [mel-frequency spectral coefficients]
- G10H2210/056: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal, for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; identification or separation of instrumental parts by their characteristic voices or timbres
- G10H2210/071: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal, for rhythm pattern analysis or rhythm style recognition
- G10H2210/076: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal, for extraction of timing, tempo; beat detection
Abstract
The invention discloses an audio detection method, device and storage medium. The method includes: performing audio signal separation on the audio under test to obtain its harmonic signal and percussive (impact) signal; obtaining the Mel spectrum of the percussive signal; calculating the onset envelope of the percussive signal from the Mel spectrum; obtaining the autocorrelation tempogram of the onset envelope from that envelope; and determining the groove intensity value of the audio under test from the second-highest peak in the autocorrelation tempogram. By analyzing the regularity and intensity with which strong strike points or strong impact points occur in the audio, the embodiments of the present invention produce a groove intensity value for an audio clip that better matches the listener's auditory perception.
Description
Technical field
The embodiments of the present invention relate to the field of audio processing, and in particular to an audio detection method, device and storage medium.
Background
The sense of groove, also called the sense of rhythm, is a listener's subjective impression of musical rhythm. Music with a strong groove has clear beat points and usually contains rich, regular percussive content. Groove is a pattern formed by repeating elements of music such as rhythm, tempo, dynamics, melody and sound.
Groove has a wide range of applications, such as music recommendation and mood classification, but it is a fairly subjective impression that lacks a reasonable numerical description.
Summary of the invention
The embodiments of the present invention provide an audio detection method, device and storage medium that can measure the groove intensity of audio with an objective numerical value.
An embodiment of the present invention provides an audio detection method, the method comprising:
Performing audio signal separation on the audio under test to obtain a harmonic signal and a percussive signal of the audio under test;
Obtaining the Mel spectrum of the percussive signal;
Calculating the onset envelope of the percussive signal from the Mel spectrum;
Obtaining the autocorrelation tempogram of the onset envelope from the onset envelope of the percussive signal;
Determining the groove intensity value of the audio under test from the second-highest peak in the autocorrelation tempogram.
An embodiment of the present invention also provides an audio detection device, the device comprising:
A signal separation module, configured to perform audio signal separation on the audio under test to obtain a harmonic signal and a percussive signal of the audio under test;
A first obtaining module, configured to obtain the Mel spectrum of the percussive signal;
A calculation module, configured to calculate the onset envelope of the percussive signal from the Mel spectrum;
A second obtaining module, configured to obtain the autocorrelation tempogram of the onset envelope from the onset envelope of the percussive signal;
A determining module, configured to determine the groove intensity value of the audio under test from the second-highest peak in the autocorrelation tempogram.
An embodiment of the present invention also provides a storage medium that stores a plurality of instructions. The instructions are suitable for being loaded by a processor to execute the steps of any audio detection method provided by the embodiments of the present invention.
The embodiments of the present invention perform audio signal separation on the audio under test to obtain its harmonic signal and percussive signal, obtain the Mel spectrum of the percussive signal, calculate the onset envelope of the percussive signal from the Mel spectrum, obtain the autocorrelation tempogram of the onset envelope from that envelope, and then determine the groove intensity value of the audio under test from the second-highest peak in the autocorrelation tempogram. By analyzing the regularity and intensity with which strong strike points or strong impact points occur in the audio, the embodiments produce a groove intensity value for an audio clip that better matches the listener's auditory perception.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of an audio detection method provided by an embodiment of the present invention.
Fig. 2 is another flow diagram of an audio detection method provided by an embodiment of the present invention.
Fig. 3 is another flow diagram of an audio detection method provided by an embodiment of the present invention.
Fig. 4 is another flow diagram of an audio detection method provided by an embodiment of the present invention.
Fig. 5 is a schematic diagram of a strong groove provided by an embodiment of the present invention.
Fig. 6 is a schematic diagram of a weak groove provided by an embodiment of the present invention.
Fig. 7 is another flow diagram of an audio detection method provided by an embodiment of the present invention.
Fig. 8 is a structural diagram of an audio detection device provided by an embodiment of the present invention.
Fig. 9 is another structural diagram of an audio detection device provided by an embodiment of the present invention.
Fig. 10 is another structural diagram of an audio detection device provided by an embodiment of the present invention.
Fig. 11 is another structural diagram of an audio detection device provided by an embodiment of the present invention.
Fig. 12 is another structural diagram of an audio detection device provided by an embodiment of the present invention.
Fig. 13 is a structural diagram of a server provided by an embodiment of the present invention.
Fig. 14 is a structural diagram of a terminal provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms "first", "second" and so on in the present invention are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or modules is not limited to the listed steps or modules, but optionally also includes steps or modules that are not listed, or other steps or modules inherent to such a process, method, product or device.
Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present invention. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
The sense of groove, also called the sense of rhythm, is a listener's subjective impression of musical rhythm. Music with a strong groove has clear beat points and usually contains rich, regular percussive content. Groove is a pattern formed by repeating elements of music such as rhythm, tempo, dynamics, melody and sound.
Groove has a wide range of applications, such as music recommendation and mood classification, but it is a fairly subjective impression that lacks a reasonable numerical description.
The embodiments of the present invention therefore provide an audio detection method, device and storage medium that analyze the regularity and intensity with which strong strike points or strong impact points occur in the audio and produce a groove intensity value for an audio clip, so that the value better matches the listener's auditory perception.
The audio detection method provided by the embodiments of the present invention can be implemented in an audio detection device. The device can be integrated into an electronic device or any other device with audio and video data processing capability. Electronic devices include, but are not limited to, computers, smart televisions, smart speakers, mobile phones and tablet computers.
Please refer to Fig. 1 to Fig. 6. Fig. 1 to Fig. 4 are flow diagrams of an audio detection method provided by an embodiment of the present invention, Fig. 5 is a schematic diagram of a strong groove, and Fig. 6 is a schematic diagram of a weak groove. The method includes:
Step 101, perform audio signal separation on the audio under test to obtain the harmonic signal and the percussive (impact) signal of the audio under test.
For example, harmonic-percussive source separation (HPSS) is a common preprocessing technique that separates the harmonic sources and percussive sources in an audio signal. Audio signals such as music typically show two kinds of distribution on a spectrogram: one is continuous and smooth along the time axis, the other is continuous and smooth along the frequency axis. Sound sources with these two distributions are usually called harmonic sources and percussive sources, respectively. Musical instruments can be divided into orchestral (string and wind) instruments and percussion instruments. Sources produced by orchestral instruments are generally gentle, with notes connected continuously, and appear as smooth envelopes on the spectrogram. Sources produced by percussion instruments generally have a strong rhythmic feel, with large gaps between notes, and appear as vertical envelopes on the spectrogram. On the spectrogram, therefore, the gentle sources produced by orchestral instruments and the like are usually called harmonic sources, and the strongly rhythmic sources produced by percussion instruments and the like are usually called percussive sources.
In the embodiments of the present invention, a harmonic-percussive source separation method can be used to perform audio signal separation on the audio under test and obtain its harmonic signal and percussive signal.
In some embodiments, as shown in Fig. 2, step 101 can be implemented by steps 1011 and 1012, specifically:
Step 1011, perform a short-time Fourier transform on the audio under test with a preset frame length and a preset step size to obtain the spectrogram of the audio under test;
Step 1012, apply median filtering separately along the time direction and the frequency direction of the spectrogram to obtain the harmonic signal and the percussive signal of the spectrogram, where the harmonic signal is the signal obtained by median filtering along the time direction and the percussive signal is the signal obtained by median filtering along the frequency direction.
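Step 1011 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `stft_magnitude`, the periodic Hann window and the direct (non-FFT) DFT are assumptions chosen to keep the example dependency-free. The result is indexed as `spec[k][t]`, frequency bin first.

```python
import cmath
import math


def stft_magnitude(samples, frame_length, hop):
    """Return spec[k][t]: magnitude of frequency bin k in frame t.

    Uses a periodic Hann window and a direct DFT. This is fine for small
    demos; real audio would need an FFT library.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_length)
              for n in range(frame_length)]
    n_bins = frame_length // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_length + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_length)]
        frames.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_length)
                    for n in range(frame_length)))
            for k in range(n_bins)
        ])
    # Transpose so rows are frequency bins and columns are time frames.
    return [list(row) for row in zip(*frames)]
```

With a frame length of 1024 and a step of 441 samples at 44100 Hz, consecutive frames start 10 ms apart.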
The audio under test may first be sampled at a preset frequency and then transformed with a short-time Fourier transform (STFT), using the preset frame length and preset step size, to obtain its spectrogram. For example, the audio under test is read at a sample rate of 44100 Hz, and an STFT is performed with a frame length of 1024 and a step size of 441 to obtain the STFT spectrogram of the audio under test. Median filtering is then applied separately along the time and frequency directions of the spectrogram, which yields the harmonic part and the percussive part of the original signal of the audio under test. Filtering along the time direction yields the harmonic signal, which corresponds to the continuous parts of the audio under test. Filtering along the frequency direction yields the percussive signal, which corresponds to the parts of the audio under test that sound struck or percussive.
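The median-filtering separation of step 1012 can be sketched as below, assuming the magnitude spectrogram (sonograph) is already available as `spec[f][t]`. The kernel size of 5 and the binary assignment rule (each bin goes to whichever filtered value is larger) are assumptions; the text does not specify them, and all function names are hypothetical.

```python
from statistics import median


def median_filter_time(spec, kernel=5):
    """Median-filter each frequency row along the time axis (harmonic-enhanced)."""
    half = kernel // 2
    out = []
    for row in spec:
        n = len(row)
        out.append([median(row[max(0, t - half):min(n, t + half + 1)])
                    for t in range(n)])
    return out


def median_filter_freq(spec, kernel=5):
    """Median-filter each time column along the frequency axis (percussive-enhanced)."""
    # Transpose, filter the rows, transpose back.
    cols = [list(c) for c in zip(*spec)]
    filtered = median_filter_time(cols, kernel)
    return [list(r) for r in zip(*filtered)]


def hpss(spec, kernel=5):
    """Split a magnitude spectrogram into harmonic and percussive parts.

    Each bin goes to whichever median-filtered version is larger there
    (a binary mask; this assignment rule is an assumption).
    """
    h_enh = median_filter_time(spec, kernel)
    p_enh = median_filter_freq(spec, kernel)
    harmonic = [[0.0] * len(row) for row in spec]
    percussive = [[0.0] * len(row) for row in spec]
    for f, row in enumerate(spec):
        for t, value in enumerate(row):
            if h_enh[f][t] >= p_enh[f][t]:
                harmonic[f][t] = value
            else:
                percussive[f][t] = value
    return harmonic, percussive
```

On a toy spectrogram with one horizontal line (a sustained tone) and one vertical line (a drum hit), the horizontal line lands in the harmonic part and the vertical line in the percussive part.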
In some embodiments, as shown in Fig. 3, step 1012 can be implemented by steps 10121 to 10123, specifically:
Step 10121, apply a first median filtering separately along the time direction and the frequency direction of the spectrogram to obtain the first harmonic signal and the first percussive signal of the spectrogram;
Step 10122, remove the first harmonic signal from the spectrogram to obtain a target spectrogram composed of the first percussive signal;
Step 10123, apply a second median filtering separately along the time direction and the frequency direction of the target spectrogram to obtain the second harmonic signal and the second percussive signal of the target spectrogram, where the second harmonic signal and the second percussive signal of the target spectrogram constitute the harmonic signal and the percussive signal of the audio under test.
Harmonic-percussive source separation (HPSS) can also be written as H-P separation. The harmonic part and the percussive part obtained by H-P separation can be written as the H part and the P part, where the H part corresponds to the harmonic signal and the P part corresponds to the percussive signal.
For example, one H-P separation is first applied to the spectrogram of the audio under test: a first median filtering is performed separately along the time direction and the frequency direction of the spectrogram to obtain the first harmonic signal (H part) and the first percussive signal (P part). The H part is then discarded and only the P part is kept; that is, the first harmonic signal is removed from the spectrogram to obtain a target spectrogram composed of the first percussive signal. Another H-P separation is then applied to the P part and the newly obtained P part is extracted; that is, a second median filtering is performed separately along the time direction and the frequency direction of the target spectrogram to obtain the second harmonic signal (the new H part) and the second percussive signal (the new P part). The second harmonic signal and the second percussive signal of the target spectrogram constitute the harmonic signal and the percussive signal of the audio under test. After two H-P separations, the new P part contains very little continuous sound. Most of what remains is strongly struck or strongly percussive sound, such as drums, keyboard strikes and gongs, so the harmonic signal and the percussive signal are separated effectively.
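The two-pass procedure can be expressed independently of how a single H-P separation is implemented. In the sketch below, `separate` is a hypothetical callable that maps a spectrogram to a `(harmonic, percussive)` pair, for example a median-filtering separator; the function name `two_pass_separation` is also an assumption.

```python
def two_pass_separation(spec, separate):
    """Apply H-P separation twice, keeping only the P part in between.

    `separate` is any function mapping a spectrogram to a
    (harmonic, percussive) pair.
    """
    _first_h, first_p = separate(spec)      # first pass; the H part is discarded
    second_h, second_p = separate(first_p)  # second pass runs on the P part only
    return second_h, second_p
```

Passing the separator as an argument keeps the control flow of steps 10121 to 10123 visible without committing to a particular filter.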
Step 102, obtain the Mel spectrum of the percussive signal.
To better match human auditory perception, the percussive signal obtained in step 101 can be converted to a Mel-scale spectrum. For example, the Mel scale used by Mel-frequency cepstral coefficients (MFCC) can be used to convert the percussive signal to a Mel-scale spectrum, which gives the Mel spectrum of the percussive signal.
The Mel scale is a frequency scale divided according to the characteristics of human hearing. The relationship between Mel frequency and actual frequency can be expressed by the following formula:
Mel(f) = 2595 * log10(1 + f/700), where f is the actual frequency of the percussive signal in Hz.
Below 1000 Hz, the human ear's perception is approximately linear in sound frequency; above 1000 Hz, it is approximately logarithmic. Dividing the actual frequency axis into bands according to this relationship gives a sequence of triangular filters, called the Mel filter bank. For example, the Mel spectrum of the percussive signal is calculated with a maximum frequency of 1000 Hz and 128 Mel bands.
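The Hz-to-Mel mapping and the band layout of a Mel filter bank can be sketched as follows. The constant 2595 with a base-10 logarithm is the standard form of this formula; the helper names and the choice of `n_bands + 2` equally spaced edge points (one triangular filter per three consecutive edges) are assumptions for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_max_hz, f_min_hz=0.0):
    """Return n_bands + 2 edge frequencies (Hz), equally spaced on the Mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]
```

With this formula, 1000 Hz maps to approximately 1000 Mel, which is the anchor point of the scale.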
Step 103, calculate the onset envelope of the percussive signal from the Mel spectrum.
The onset envelope of the percussive signal, i.e. the envelope of its onset points, is calculated from the Mel spectrum of the percussive signal. An onset is the starting point of an "event" in the audio, and the onset envelope is the line connecting the starting points of the "events" in the audio. For example, an envelope detector (envelope demodulation) can be used to calculate the onset envelope of the percussive signal by connecting the peak points of the percussive signal after it has been converted to the Mel spectrum. Because the percussive signal at this point contains only strong strike points, the resulting onset envelope is in effect a line connecting the peaks of a series of strong strike points.
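One common way to obtain an onset (starting) envelope from a Mel spectrum is spectral flux: for each frame, sum the increases in energy across all bands. The sketch below uses that technique as a stand-in; it is an assumption and not necessarily the envelope detector the text refers to. The function name `onset_envelope` and the peak normalization are also hypothetical.

```python
def onset_envelope(mel_spec):
    """Spectral-flux onset strength for mel_spec[band][frame]; one value per frame."""
    n_frames = len(mel_spec[0])
    env = [0.0] * n_frames
    for t in range(1, n_frames):
        # Sum only the energy increases across bands (half-wave rectification).
        env[t] = sum(max(0.0, band[t] - band[t - 1]) for band in mel_spec)
    peak = max(env)
    # Normalize to a maximum of 1 unless the input is silent.
    return [v / peak for v in env] if peak > 0 else env
```

Each strike produces a sudden rise in energy, so the envelope shows one peak per strike, with a height that follows the strike's strength.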
Step 104, obtain the autocorrelation tempogram of the onset envelope from the onset envelope of the percussive signal.
The autocorrelation tempogram of the onset envelope (onset envelope tempogram) can be obtained by calculating the local autocorrelation function of the onset envelope of the percussive signal. Local autocorrelation is chosen because the groove of a song may vary considerably over the whole track: a globally computed autocorrelation function cannot characterize the groove correctly, whereas a locally computed one characterizes it more accurately.
In some embodiments, as shown in Fig. 4, step 104 can be implemented by steps 1041 and 1042, specifically:
Step 1041, split the onset envelope of the percussive signal by a preset duration to obtain multiple local segments, and then divide each local segment into multiple frames by a preset step size;
Step 1042, feed the frames of each local segment into the local autocorrelation function for calculation to obtain the autocorrelation tempogram of the onset envelope.
For example, the duration of a local segment may be 8.9 s, and the frame step size may be 0.01 s.
Framing by time and computing the local autocorrelation function yields a two-dimensional matrix called the tempogram, which represents the autocorrelation tempogram of the onset envelope.
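A local autocorrelation over sliding windows can be sketched as below. The window length is given in envelope frames (for example, 8.9 s at a 0.01 s step is about 890 frames), and each window's autocorrelation is normalized by its lag-0 value so that entries lie in [0, 1] for a non-negative envelope. The function name and this normalization are assumptions.

```python
def local_autocorrelation(envelope, win_length, hop=1):
    """Return tempogram[lag][window]: normalized autocorrelation of each local window."""
    columns = []
    for start in range(0, len(envelope) - win_length + 1, hop):
        seg = envelope[start:start + win_length]
        energy = sum(v * v for v in seg)  # autocorrelation at lag 0
        col = []
        for lag in range(win_length):
            acf = sum(seg[i] * seg[i + lag] for i in range(win_length - lag))
            col.append(acf / energy if energy > 0 else 0.0)
        columns.append(col)
    # Transpose so rows are lags and columns are window positions.
    return [list(row) for row in zip(*columns)]
```

This direct form costs on the order of `win_length ** 2` operations per window, so it is only suitable for short demonstrations; a practical implementation would compute the autocorrelation through an FFT.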
Step 105, determine the groove intensity value of the audio under test from the second-highest peak in the autocorrelation tempogram.
Groove, as expressed by percussion instruments, is the combined result of how regularly the instruments sound and how strongly they sound. For example, percussion with a clear rhythm and a strong response gives a piece a strong groove, whereas percussion with an indistinct rhythm and a weak response gives a weak one. The embodiments of the present invention therefore reduce the calculation of groove to a calculation of the regularity and intensity of strike points.
In some embodiments, the autocorrelation mean corresponding to the multiple local frames in the autocorrelation tempogram can be obtained, and the second-highest peak value of that autocorrelation mean is extracted and taken as the rhythm intensity value of the audio to be measured.
For example, the autocorrelation mean over the local segments in the tempogram is taken, and its second-highest peak value is used as the rhythm intensity value of the audio. The rhythm intensity value is a normalized peak value: in theory it ranges from 0 to 1, but in practice it usually does not exceed 0.8. When the value is above 0.2, listeners can generally perceive a relatively strong sense of rhythm in the audio.
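A minimal sketch of this peak extraction is given below. It assumes that the highest peak is the lag-0 value (always 1 after normalization), so that the second-highest peak is the largest local maximum at a non-zero lag; that reading, the function name `rhythm_intensity`, and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def rhythm_intensity(tempogram):
    """Second-highest peak of the mean autocorrelation over all local segments.

    `tempogram` has shape (n_lags, n_frames). Lag 0 is treated as the highest
    peak, so the result is the largest local maximum at a non-zero lag.
    """
    mean_ac = np.asarray(tempogram, dtype=float).mean(axis=1)
    if mean_ac[0] > 0:
        mean_ac = mean_ac / mean_ac[0]  # normalize so the lag-0 peak equals 1
    interior = mean_ac[1:-1]
    # A point is a peak if it exceeds its left neighbour and is not below its right one.
    is_peak = (interior > mean_ac[:-2]) & (interior >= mean_ac[2:])
    peaks = interior[is_peak]
    return float(peaks.max()) if peaks.size else 0.0

# Toy tempogram (assumption): 6 lags, 2 local segments.
toy = np.array([[1.0, 1.0],
                [0.2, 0.0],
                [0.6, 0.4],
                [0.1, 0.1],
                [0.3, 0.1],
                [0.0, 0.0]])
value = rhythm_intensity(toy)
```

Under the guideline quoted above, a returned value higher than 0.2 would indicate audio that listeners tend to perceive as strongly rhythmic.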
For example, Fig. 5 shows the autocorrelation tempogram of audio with a typical strong sense of rhythm, and Fig. 6 shows that of audio with a typical weak sense of rhythm. In an autocorrelation tempogram, the abscissa is the amount by which the signal is shifted to the right (the lag), and the ordinate is the correlation value between the original signal and the shifted signal. That is, the autocorrelation is calculated by shifting the signal to the right by a certain amount and then computing the correlation between the shifted signal and the original signal.
All of the above technical solutions can be combined in any manner to form alternative embodiments of the present invention, which are not described one by one here.
The embodiment of the present invention performs audio signal separation on the audio to be measured to obtain its harmonic signal and impact signal, obtains the Mel spectrum of the impact signal, calculates the onset envelope of the impact signal according to the Mel spectrum, obtains the autocorrelation tempogram of the onset envelope according to the onset envelope of the impact signal, and then determines the rhythm intensity value of the audio to be measured according to the second-highest peak value in the autocorrelation tempogram. By analyzing the regularity and intensity with which strong striking points or heavy beats occur in the audio, the embodiment provides a rhythm intensity value for an audio fragment, so that the rhythm intensity of the audio can be measured with an objective value that better matches the user's auditory perception.
Referring to Fig. 7, Fig. 7 is another flow diagram of an audio detection method provided in an embodiment of the present invention. The method includes:
Step 201: perform audio signal separation on the audio to be measured, to obtain the harmonic signal and the impact signal of the audio to be measured.
For example, Harmonic-Percussive Source Separation (HPSS) is a common preprocessing technique that can be used to separate the harmonic source and the impact (percussive) source in an audio signal. Audio signals such as music typically show two kinds of distribution on a spectrogram: one is continuous and smooth along the time axis, and the other is continuous and smooth along the frequency axis. The sound sources with these two distributions are usually called the harmonic source and the impact source, respectively. Musical instruments can be divided into orchestral instruments and percussion instruments. The sound produced by orchestral instruments is generally gentle, with continuous transitions between notes, and appears as a smooth envelope on the spectrogram. The sound produced by percussion instruments generally has a strong sense of rhythm, with large gaps between sounds, and appears as a vertical envelope on the spectrogram. Therefore, on the spectrogram, the gentle sound produced by orchestral instruments and the like is usually called the harmonic source, and the strongly rhythmic sound produced by percussion instruments and the like is usually called the impact source.
In the embodiment of the present invention, the harmonic-impact source separation method can be used to perform audio signal separation on the audio to be measured, to obtain the harmonic signal and the impact signal of the audio to be measured.
In some embodiments, performing audio signal separation on the audio to be measured, to obtain the harmonic signal and the impact signal of the audio to be measured, comprises:
performing a short-time Fourier transform on the audio to be measured according to a preset frame length and a preset step length, to obtain the sonograph of the audio to be measured;
performing median filtering along the time direction and along the frequency direction of the sonograph, respectively, to obtain the harmonic signal and the impact signal of the sonograph, wherein the harmonic signal is the signal obtained by median filtering along the time direction, and the impact signal is the signal obtained by median filtering along the frequency direction.
The audio to be measured may first be sampled at a preset frequency and then transformed by a short-time Fourier transform according to the preset frame length and the preset step length, to obtain the sonograph of the audio to be measured. For example, the audio to be measured is first read at a sample rate of 44100 Hz, and a short-time Fourier transform (STFT) is performed with a frame length of 1024 and a step length of 441, giving the STFT sonograph of the audio to be measured. Median filtering is then performed along the time direction and along the frequency direction of the sonograph, respectively, which yields the Harmonic part and the Percussive part of the original signal of the audio to be measured. Filtering along the time direction gives the harmonic (Harmonic part) signal, which corresponds to the continuous part of the audio to be measured. Filtering along the frequency direction gives the impact (Percussive part) signal, which corresponds to the part of the audio to be measured that has a striking or impact quality.
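The STFT and the two median filters can be sketched with NumPy alone, as below. This is an illustrative sketch, not the patented implementation: the function names, the Hann window, the filter length of 17, reflect padding at the edges, and the synthetic test signal are assumptions; only the frame length 1024, the step 441 and the 44100 Hz sample rate come from the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def stft_magnitude(y, n_fft=1024, hop=441):
    """Magnitude STFT with shape (n_fft // 2 + 1, n_frames)."""
    frames = sliding_window_view(np.asarray(y, dtype=float), n_fft)[::hop]
    return np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)).T

def median_filter(mag, size, axis):
    """Median filter of odd length `size` along one axis, reflect-padded."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (size // 2, size // 2)
    windows = sliding_window_view(np.pad(mag, pad, mode="reflect"), size, axis=axis)
    return np.median(windows, axis=-1)

def hp_separate(mag, size=17):
    """Return (harmonic, impact) parts of a (freq, time) magnitude sonograph."""
    harmonic = median_filter(mag, size, axis=1)  # filter along the time direction
    impact = median_filter(mag, size, axis=0)    # filter along the frequency direction
    return harmonic, impact

# Synthetic signal (assumption): one second of a 440 Hz tone plus a click at 0.5 s.
sr = 44100
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)
y[sr // 2] += 10.0
harmonic, impact = hp_separate(stft_magnitude(y))
```

The sustained tone is a horizontal ridge in the sonograph and survives the time-direction filter, while the click is a vertical ridge and survives the frequency-direction filter.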
In some embodiments, performing median filtering along the time direction and along the frequency direction of the sonograph, respectively, to obtain the harmonic signal and the impact signal of the sonograph, comprises:
performing a first median filtering along the time direction and along the frequency direction of the sonograph, respectively, to obtain a first harmonic signal and a first impact signal of the sonograph;
removing the first harmonic signal from the sonograph, to obtain a target sonograph composed of the first impact signal;
performing a second median filtering along the time direction and along the frequency direction of the target sonograph, respectively, to obtain a second harmonic signal and a second impact signal of the target sonograph, wherein the second harmonic signal and the second impact signal of the target sonograph constitute the harmonic signal and the impact signal of the audio to be measured.
Harmonic-Percussive Source Separation (HPSS) can also be written as H-P separation, and the Harmonic part and the Percussive part obtained by H-P separation can be written as the H part and the P part, respectively. The H part corresponds to the harmonic signal, and the P part corresponds to the impact signal.
For example, one H-P separation is first performed on the sonograph of the audio to be measured; that is, a first median filtering is performed along the time direction and along the frequency direction of the sonograph, respectively, to obtain the first harmonic signal (H part) and the first impact signal (P part) of the sonograph. The H part is then discarded and only the P part is kept; that is, the first harmonic signal (H part) is removed from the sonograph to obtain the target sonograph composed of the first impact signal (P part). Another H-P separation is then performed on the P part and the newly obtained P part is extracted; that is, a second median filtering is performed along the time direction and along the frequency direction of the target sonograph, respectively, to obtain the second harmonic signal (the newly obtained H part) and the second impact signal (the newly obtained P part) of the target sonograph, which constitute the harmonic signal and the impact signal of the audio to be measured. After two H-P separations, the newly obtained P part contains very little continuous sound: most of it consists of signals with a strong striking or heavy-beat quality, such as drum sounds, keyboard strokes and gong sounds, so the harmonic signal and the impact signal are separated effectively.
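The two-pass H-P separation can be sketched as follows. This is a hedged illustration that takes the median-filter outputs directly as the H and P parts, as the text describes; the helper names, the filter length of 9 and the toy sonograph are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _median(mag, size, axis):
    """Median filter of odd length `size` along one axis, reflect-padded."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (size // 2, size // 2)
    windows = sliding_window_view(np.pad(mag, pad, mode="reflect"), size, axis=axis)
    return np.median(windows, axis=-1)

def two_pass_hp(mag, size=9):
    """Run H-P separation twice on a (freq, time) magnitude sonograph.

    Pass 1 keeps only the P part (the first H part is discarded); pass 2
    separates that target sonograph again and returns (second_h, second_p).
    """
    target = _median(mag, size, axis=0)       # first P part = target sonograph
    second_h = _median(target, size, axis=1)  # second H part (time direction)
    second_p = _median(target, size, axis=0)  # second P part (frequency direction)
    return second_h, second_p

# Toy sonograph (assumption): one sustained tone (row 10) and one click (column 30).
toy = np.zeros((40, 60))
toy[10, :] = 1.0
toy[:, 30] = 1.0
second_h, second_p = two_pass_hp(toy)
```

In this toy case the sustained tone is already gone after the first pass, and the second pass leaves only the click column in the P part.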
Step 202: obtain the Mel spectrum of the impact signal.
To better match human auditory perception, the impact signal obtained in step 201 can be converted to a Mel-scale spectrum. For example, Mel-Frequency Cepstral Coefficients (MFCC) can be used to convert the impact signal to a Mel-scale spectrum, thereby obtaining the Mel spectrum of the impact signal.
The Mel scale on which the Mel-frequency cepstral coefficients are based is a frequency scale divided according to the characteristics of human hearing. The relationship between Mel frequency and actual frequency can be expressed by the following formula:
Mel(f) = 2595 · log10(1 + f/700), where f is the actual frequency of the impact signal in Hz.
When the frequency is below 1000 Hz, the perception of the human ear grows approximately linearly with sound frequency; when the frequency is above 1000 Hz, the perception of the human ear is approximately logarithmic in sound frequency. Dividing the actual frequency range into bands according to this relationship gives a series of triangular filters, called the Mel filter bank. For example, the Mel spectrum of the impact signal is calculated with a maximum frequency of 1000 Hz and 128 Mel bands.
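As a worked illustration of the Hz-to-Mel mapping, the sketch below uses the commonly cited form 2595 · log10(1 + f/700) and derives band edges that are evenly spaced on the Mel scale, which is the usual way the triangular filters of a Mel filter bank are positioned. The function names are hypothetical; the 128 bands and the 1000 Hz maximum frequency follow the example in the text.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to Mel."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_band_edges(n_mels=128, f_max=1000.0):
    """Edge frequencies (Hz) of n_mels triangular filters, evenly spaced in Mel.

    Filter k rises from edges[k] to edges[k + 1] and falls back to zero
    at edges[k + 2], so n_mels filters need n_mels + 2 edges.
    """
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(f_max), n_mels + 2)
    return mel_to_hz(mels)

edges = mel_band_edges()
```

With this formula 1000 Hz maps to approximately 1000 Mel, and the band edges grow wider in Hz as frequency increases.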
Step 203: calculate the onset envelope of the impact signal according to the Mel spectrum.
The onset envelope of the impact signal, i.e. the envelope of the onset points, is calculated from the Mel spectrum of the impact signal. Here an onset is the starting point of an "event" in the audio, and the envelope of the onset points is the line connecting the starting points of those events. For example, an envelope detector (envelope demodulation) can be used to calculate the onset envelope of the impact signal by connecting the peak points of the impact signal after conversion to the Mel spectrum. Because the impact signal at this point contains only strong striking points, the resulting onset envelope is in effect a line connecting the peaks of a series of strong striking points.
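The text does not specify the envelope detector in detail. As one possible stand-in, the sketch below uses spectral flux, a widely used onset strength measure: the positive frame-to-frame increase of the log-compressed Mel spectrum, summed over bands. This is a swapped-in technique chosen for illustration, not necessarily the detector the embodiment uses; the function name and the toy Mel spectrum are assumptions.

```python
import numpy as np

def onset_envelope(mel_spec):
    """Spectral-flux onset envelope of a (n_mels, n_frames) Mel spectrum.

    Each output sample is the sum over bands of the positive increase in
    log-compressed energy relative to the previous frame; frame 0 is 0.
    """
    log_mel = np.log1p(np.asarray(mel_spec, dtype=float))
    increase = np.maximum(np.diff(log_mel, axis=1), 0.0)
    return np.concatenate(([0.0], increase.sum(axis=0)))

# Toy Mel spectrum (assumption): 4 bands, 10 frames, strikes at frames 3 and 7.
mel = np.zeros((4, 10))
mel[:, 3] = 1.0
mel[:, 7] = 1.0
env = onset_envelope(mel)
```

The envelope is non-zero only at the frames where energy rises, so each strike produces one peak at its starting point.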
Step 204: filter the onset envelope of the impact signal, to remove the signal points in the onset envelope whose value is less than a threshold.
The onset envelope obtained in step 203 still contains some weak response points that can be ignored. Although these weak response points are not the main factor affecting the sense of rhythm of the music, they can affect subsequent calculations, so the weak response points in the onset envelope of the impact signal can be filtered out according to a threshold. For example, 0.2 of the signal's maximum value is chosen as the threshold, and the weak response points whose value in the onset envelope is less than the threshold are eliminated.
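The thresholding step can be sketched in a few lines. The reading of the threshold as 0.2 times the maximum of the envelope is an interpretation of the example above, and the function name `suppress_weak_onsets` and the choice to set removed points to zero are assumptions.

```python
import numpy as np

def suppress_weak_onsets(onset_env, ratio=0.2):
    """Zero out onset-envelope samples below ratio * max(onset_env)."""
    onset_env = np.asarray(onset_env, dtype=float)
    threshold = ratio * onset_env.max()
    # Values smaller than the threshold are treated as weak responses.
    return np.where(onset_env >= threshold, onset_env, 0.0)

filtered = suppress_weak_onsets([0.1, 1.0, 0.15, 0.5, 0.25])
```

Setting weak points to zero rather than deleting them keeps the time axis intact, which the later framing and autocorrelation steps rely on.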
Step 205: obtain the tempogram of the onset envelope according to the onset envelope of the impact signal after the filtering.
After the weak response points are filtered out in step 204, the remaining points of the onset envelope are all rhythm points with relatively strong responses. In this step, the autocorrelation tempogram of the onset envelope (onset envelope tempogram) can be obtained by computing the local autocorrelation function of the onset envelope of the impact signal. Local autocorrelation is chosen because the sense of rhythm of a song may vary considerably over its full length: a globally computed autocorrelation function cannot characterize it correctly, whereas a locally computed one describes it more accurately.
In some embodiments, obtaining the tempogram of the onset envelope according to the onset envelope of the impact signal after the filtering comprises:
dividing the onset envelope of the impact signal after the filtering into multiple local segments according to a preset duration, and then dividing each local segment into multiple frames according to a preset step length;
inputting the frames corresponding to each local segment into the local autocorrelation function for calculation, to obtain the autocorrelation tempogram of the onset envelope.
For example, the duration of a local segment may be 8.9 s and the frame step length may be 0.01 s.
Framing over time and computing the local autocorrelation function yields a two-dimensional matrix called the tempogram, which represents the autocorrelation tempogram of the onset envelope.
Step 206: determine the rhythm intensity value of the audio to be measured according to the second-highest peak value in the autocorrelation tempogram.
For percussion instruments, the sense of rhythm is the combined result of how regularly the instrument sounds and how strong each response is. A percussion part with a clear rhythm and strong responses gives the music a strong sense of rhythm, whereas one with an indistinct rhythm and weak responses gives a weak one. The embodiment of the present invention therefore converts the calculation of the sense of rhythm into a calculation of the regularity and intensity of the striking points.
In some embodiments, the autocorrelation mean corresponding to the multiple local frames in the autocorrelation tempogram can be obtained, and the second-highest peak value of that autocorrelation mean is extracted and taken as the rhythm intensity value of the audio to be measured.
For example, the autocorrelation mean over the local segments in the tempogram is taken, and its second-highest peak value is used as the rhythm intensity value of the audio. The rhythm intensity value is a normalized peak value: in theory it ranges from 0 to 1, but in practice it usually does not exceed 0.8. When the value is above 0.2, listeners can generally perceive a relatively strong sense of rhythm in the audio.
In some embodiments, the local autocorrelation results can also be combined in other ways, for example by taking the maximum value in the onset envelope, by taking the minimum value in the onset envelope, or by another voting strategy. The rhythm intensity value can then be obtained from the resulting signal in other ways, for example by taking the mean of the top N peaks, or by normalizing by the highest peak and then finding the second-highest peak value, and the value obtained is used as the rhythm intensity value of the audio to be measured. In addition, when analyzing the regularity and intensity with which strong striking points or heavy beats occur in the audio, the parameters of the algorithm can be fine-tuned, for example the window length, the step length, the number of Mel filters and the cutoff frequency, so that the rhythm intensity value of the audio fragment is given more accurately.
Step 207: perform audio classification on the audio to be measured according to the rhythm intensity value of the audio to be measured.
For example, music can be divided into multiple music types according to different rhythm intensity values, such as light music and DJ music, or strolling music, walking music, jogging music and fast-running music. In addition to being labeled with its music type, each piece of music can also have its beat points recorded.
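A minimal sketch of such a classification is a lookup against intensity boundaries. The category names follow the example above, but the boundary values are purely hypothetical: the text only states that values above 0.2 are generally perceived as strongly rhythmic, and gives no class thresholds.

```python
from bisect import bisect_right

# Hypothetical boundaries between classes; only 0.2 is mentioned in the text,
# and there only as a perceptual guideline, not as a class boundary.
BOUNDS = (0.2, 0.4, 0.6)
LABELS = ("strolling", "walking", "jogging", "fast running")

def classify(intensity):
    """Map a rhythm intensity value in [0, 1] to a music type label."""
    return LABELS[bisect_right(BOUNDS, intensity)]

label = classify(0.45)
```

In practice the boundaries would be tuned on labeled music rather than fixed by hand.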
Step 208: generate an audio recommendation list according to the audio classification results of multiple audios to be measured and the current audio application scenario.
For example, when the mobile terminal detects that the current audio application scenario is running, it can sense the user's step frequency through a motion sensor, and then select, from the audio classification results of the multiple audios to be measured, the songs whose beat points are closest to the user's step frequency as the music recommendation list to recommend to the user.
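The selection of songs closest to the user's step frequency can be sketched as a nearest-value ranking. The track records, the `bpm` field (beats per minute derived from the recorded beat points) and the function name `recommend` are hypothetical.

```python
def recommend(tracks, steps_per_minute, top_k=3):
    """Return the top_k tracks whose beat rate is closest to the step frequency."""
    ranked = sorted(tracks, key=lambda track: abs(track["bpm"] - steps_per_minute))
    return ranked[:top_k]

# Hypothetical library of already classified tracks.
library = [
    {"title": "A", "bpm": 90},
    {"title": "B", "bpm": 160},
    {"title": "C", "bpm": 172},
    {"title": "D", "bpm": 120},
]
playlist = recommend(library, steps_per_minute=165, top_k=2)
```

A fuller version might first restrict the candidates to the music type that matches the detected scenario before ranking by beat rate.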
All of the above technical solutions can be combined in any manner to form alternative embodiments of the present invention, which are not described one by one here.
The embodiment of the present invention performs audio signal separation on the audio to be measured to obtain its harmonic signal and impact signal, obtains the Mel spectrum of the impact signal, calculates the onset envelope of the impact signal according to the Mel spectrum, and filters the onset envelope of the impact signal to remove the signal points whose value is less than a threshold. It then obtains the autocorrelation tempogram of the onset envelope according to the filtered onset envelope of the impact signal, determines the rhythm intensity value of the audio to be measured according to the second-highest peak value in the autocorrelation tempogram, performs audio classification on the audio to be measured according to its rhythm intensity value, and generates an audio recommendation list according to the audio classification results of multiple audios to be measured and the current audio application scenario. By analyzing the regularity and intensity with which strong striking points or heavy beats occur in the audio, the embodiment provides a rhythm intensity value for an audio fragment, so that the rhythm intensity of the audio can be measured with an objective value that better matches the user's auditory perception. The rhythm intensity index can also serve as an important feature for music recommendation in a variety of music applications, such as running radio stations.
The embodiment of the present invention also provides an audio detection device, as shown in Figs. 8 to 11, which are structural schematic diagrams of an audio detection device provided in an embodiment of the present invention. The audio detection device 30 may include a signal separation module 31, a first obtaining module 32, a computing module 33, a second obtaining module 35 and a determining module 36.
The signal separation module 31 is configured to perform audio signal separation on the audio to be measured, to obtain the harmonic signal and the impact signal of the audio to be measured;
the first obtaining module 32 is configured to obtain the Mel spectrum of the impact signal;
the computing module 33 is configured to calculate the onset envelope of the impact signal according to the Mel spectrum;
the second obtaining module 35 is configured to obtain the autocorrelation tempogram of the onset envelope according to the onset envelope of the impact signal;
the determining module 36 is configured to determine the rhythm intensity value of the audio to be measured according to the second-highest peak value in the autocorrelation tempogram.
In some embodiments, as shown in Fig. 9, the signal separation module 31 includes:
a transformation submodule 311, configured to perform a short-time Fourier transform on the audio to be measured according to a preset frame length and a preset step length, to obtain the sonograph of the audio to be measured;
a filtering submodule 312, configured to perform median filtering along the time direction and along the frequency direction of the sonograph, respectively, to obtain the harmonic signal and the impact signal of the sonograph, wherein the harmonic signal is the signal obtained by median filtering along the time direction, and the impact signal is the signal obtained by median filtering along the frequency direction.
In some embodiments, as shown in Fig. 10, the filtering submodule 312 includes:
a first filtering unit 3121, configured to perform a first median filtering along the time direction and along the frequency direction of the sonograph, respectively, to obtain the first harmonic signal and the first impact signal of the sonograph;
a removal unit 3122, configured to remove the first harmonic signal from the sonograph, to obtain the target sonograph composed of the first impact signal;
a second filtering unit 3123, configured to perform a second median filtering along the time direction and along the frequency direction of the target sonograph, respectively, to obtain the second harmonic signal and the second impact signal of the target sonograph, wherein the second harmonic signal and the second impact signal of the target sonograph constitute the harmonic signal and the impact signal of the audio to be measured.
In some embodiments, as shown in Fig. 11, the second obtaining module 35 includes:
a framing submodule 351, configured to divide the onset envelope of the impact signal into multiple local segments according to a preset duration, and then divide each local segment into multiple frames according to a preset step length;
a computing submodule 352, configured to input the frames corresponding to each local segment into the local autocorrelation function for calculation, to obtain the autocorrelation tempogram of the onset envelope.
In some embodiments, the determining module 36 is further configured to take the second-highest peak value of the autocorrelation mean of the local segments in the autocorrelation tempogram as the rhythm intensity value of the audio to be measured.
In the audio detection device 30 provided in the embodiment of the present invention, the signal separation module 31 performs audio signal separation on the audio to be measured to obtain its harmonic signal and impact signal, the first obtaining module 32 obtains the Mel spectrum of the impact signal, the computing module 33 calculates the onset envelope of the impact signal according to the Mel spectrum, the second obtaining module 35 obtains the autocorrelation tempogram of the onset envelope according to the onset envelope of the impact signal, and the determining module 36 determines the rhythm intensity value of the audio to be measured according to the second-highest peak value in the autocorrelation tempogram. By analyzing the regularity and intensity with which strong striking points or heavy beats occur in the audio, the audio detection device 30 provides a rhythm intensity value for an audio fragment, so that the rhythm intensity of the audio can be measured with an objective value that better matches the user's auditory perception.
In some embodiments, as shown in Fig. 12, which is another structural schematic diagram of an audio detection device provided in an embodiment of the present invention, the audio detection device 30 may include a signal separation module 31, a first obtaining module 32, a computing module 33, a filtering module 34, a second obtaining module 35, a determining module 36, a classification module 37 and a generation module 38.
The signal separation module 31 is configured to perform audio signal separation on the audio to be measured, to obtain the harmonic signal and the impact signal of the audio to be measured;
the first obtaining module 32 is configured to obtain the Mel spectrum of the impact signal;
the computing module 33 is configured to calculate the onset envelope of the impact signal according to the Mel spectrum;
the filtering module 34 is configured to filter the onset envelope of the impact signal, to remove the signal points in the onset envelope whose value is less than a threshold;
the second obtaining module 35 is configured to obtain the tempogram of the onset envelope according to the onset envelope of the impact signal after the filtering;
the determining module 36 is configured to determine the rhythm intensity value of the audio to be measured according to the second-highest peak value in the autocorrelation tempogram;
the classification module 37 is configured to perform audio classification on the audio to be measured according to the rhythm intensity value of the audio to be measured;
the generation module 38 is configured to generate an audio recommendation list according to the audio classification results of multiple audios to be measured and the current audio application scenario.
All of the above technical solutions can be combined in any manner to form alternative embodiments of the present invention, which are not described one by one here.
In the audio detection device 30 of the embodiment of the present invention, the signal separation module 31 performs audio signal separation on the audio to be measured to obtain its harmonic signal and impact signal, the first obtaining module 32 obtains the Mel spectrum of the impact signal, the computing module 33 calculates the onset envelope of the impact signal according to the Mel spectrum, the filtering module 34 filters the onset envelope of the impact signal to remove the signal points whose value is less than a threshold, the second obtaining module 35 obtains the autocorrelation tempogram of the onset envelope according to the filtered onset envelope of the impact signal, the determining module 36 determines the rhythm intensity value of the audio to be measured according to the second-highest peak value in the autocorrelation tempogram, the classification module 37 performs audio classification on the audio to be measured according to its rhythm intensity value, and the generation module 38 generates an audio recommendation list according to the audio classification results of multiple audios to be measured and the current audio application scenario. By analyzing the regularity and intensity with which strong striking points or heavy beats occur in the audio, the audio detection device 30 provides a rhythm intensity value for an audio fragment, so that the rhythm intensity of the audio can be measured with an objective value that better matches the user's auditory perception. The rhythm intensity index can also serve as an important feature for music recommendation in a variety of music applications, such as running radio stations.
The embodiment of the present invention also provides a server. Fig. 13 shows a structural schematic diagram of the server involved in the embodiment of the present invention. Specifically:
The server may include components such as a processor 401 with one or more processing cores, a memory 402 comprising one or more computer-readable storage media, a power supply 403 and an input unit 404. Those skilled in the art will understand that the server structure shown in Fig. 13 does not limit the server, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components. Wherein:
The processor 401 is the control center of the server. It connects all parts of the server using various interfaces and lines, and performs the various functions of the server and processes data by running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the server as a whole. Optionally, the processor 401 may include one or more processing cores. Preferably, the processor 401 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 401.
The memory 402 can be used to store software programs and modules, and the processor 401 performs various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, where the program storage area can store the operating system, the application programs required by at least one function (such as a sound playing function or an image playing function) and the like, and the data storage area can store data created through use of the server and the like. In addition, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Correspondingly, the memory 402 may also include a memory controller, to provide the processor 401 with access to the memory 402.
The server further includes a power supply 403 that powers all the components. Preferably, the power supply 403 can be logically connected to the processor 401 through a power management system, so that functions such as charging, discharging and power consumption management are handled by the power management system. The power supply 403 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
The server may also include an input unit 404, which may be used to receive input numeric or
character information and to generate keyboard, mouse, joystick, optical, or trackball signal input
related to user settings and function control.
Although not shown, the server may also include a display unit and the like, which are not
described here. Specifically, in this embodiment, processor 401 in the server loads the executable
files corresponding to the processes of one or more application programs into memory 402 according
to the following instructions, and processor 401 runs the application programs stored in memory 402
to implement various functions, as follows:
Performing audio signal separation on the audio under test to obtain a harmonic signal and an
impact signal of the audio under test; obtaining the Mel spectrum of the impact signal; calculating
the onset envelope of the impact signal according to the Mel spectrum; obtaining the
autocorrelation tempogram of the onset envelope according to the onset envelope of the impact
signal; and determining the rhythm intensity value of the audio under test according to the
second-highest peak in the autocorrelation tempogram.
For details of the above operations, see the preceding embodiments; they are not repeated here.
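As an illustration of the separation step, the following minimal Python sketch applies median filtering along the time direction and along the frequency direction of a small magnitude spectrogram, which is the approach recited in claim 3. The function names `median_filter_1d` and `hpss_median`, the kernel size of 3, the edge-replication padding, and the toy spectrogram are assumptions made for this example and are not specified by the embodiment. The short-time Fourier transform that would produce the spectrogram is omitted.

```python
from statistics import median

def median_filter_1d(values, kernel=3):
    """Median-filter a sequence with an odd kernel, replicating the edge values."""
    half = kernel // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + kernel]) for i in range(len(values))]

def hpss_median(spec, kernel=3):
    """Split a magnitude spectrogram spec[f][t] into harmonic and impact parts."""
    # Filtering each frequency row along time keeps sustained (horizontal) energy.
    harmonic = [median_filter_1d(row, kernel) for row in spec]
    # Filtering each time column along frequency keeps broadband (vertical) energy.
    columns = [median_filter_1d(list(col), kernel) for col in zip(*spec)]
    impact = [list(row) for row in zip(*columns)]
    return harmonic, impact

# Toy spectrogram (assumed data): a sustained tone in frequency bin 1
# and a broadband hit in time frame 2.
spec = [
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
harmonic, impact = hpss_median(spec)
```

In this toy case the harmonic output retains only the sustained tone (row 1) and the impact output retains only the broadband hit (column 2), which is the behaviour the two filtering directions are meant to produce.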
As can be seen from the above, the server provided in this embodiment performs audio signal
separation on the audio under test to obtain its harmonic signal and impact signal, obtains the Mel
spectrum of the impact signal, calculates the onset envelope of the impact signal from the Mel
spectrum, obtains the autocorrelation tempogram of the onset envelope from the onset envelope of
the impact signal, and then determines the rhythm intensity value of the audio under test from the
second-highest peak in the autocorrelation tempogram. By analyzing the regularity and intensity
with which strong hit points or heavy hit points occur in the audio, the embodiment of the present
invention gives the rhythm intensity value of an audio clip, so that the value given better matches
the user's auditory perception.
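One common way to derive an onset envelope from a Mel spectrum is spectral flux: the frame-to-frame increase in log energy, averaged over the Mel bands. The sketch below assumes that technique; the embodiment does not state here which formula it uses, and the function name `onset_envelope`, the `eps` floor, and the toy data are hypothetical.

```python
import math

def onset_envelope(mel_frames, eps=1e-10):
    """Spectral-flux onset envelope of a Mel spectrogram.

    mel_frames is a list of frames, each a list of non-negative Mel-band energies.
    The result has one value per frame: the mean increase in log10 energy over
    the bands relative to the previous frame (the first frame gets 0.0).
    """
    log_frames = [[math.log10(max(e, eps)) for e in frame] for frame in mel_frames]
    envelope = [0.0]
    for prev, cur in zip(log_frames, log_frames[1:]):
        # Half-wave rectification: only energy increases count towards an onset.
        flux = sum(max(c - p, 0.0) for p, c in zip(prev, cur))
        envelope.append(flux / len(cur))
    return envelope

# Toy Mel spectrogram (assumed data): 6 frames x 3 bands,
# with a strong hit at frame 2 and a weaker one at frame 4.
mel = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [100.0, 100.0, 100.0],
    [1.0, 1.0, 1.0],
    [10.0, 10.0, 10.0],
    [10.0, 10.0, 10.0],
]
env = onset_envelope(mel)
```

The envelope is largest at the frame where the strong hit begins and is zero wherever the energy is steady or falling, so stronger hits produce taller envelope peaks.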
Correspondingly, an embodiment of the present invention also provides a terminal. As shown in Figure
14, the terminal may include a radio frequency (RF, Radio Frequency) circuit 501, a memory 502
including one or more computer-readable storage media, an input unit 503, a display unit 504, a
sensor 505, an audio circuit 506, a Wireless Fidelity (WiFi, Wireless Fidelity) module 507, a
processor 508 including one or more processing cores, a power supply 509, and other components.
Those skilled in the art will understand that the terminal structure shown in Figure 14 does not
limit the terminal, which may include more or fewer components than illustrated, combine certain
components, or use a different arrangement of components. Specifically:
RF circuit 501 may be used to receive and send signals while sending and receiving information or
during a call. In particular, after receiving downlink information from a base station, it hands
the information to one or more processors 508 for processing; in addition, it sends uplink data to
the base station. Typically, RF circuit 501 includes, but is not limited to, an antenna, at least
one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM, Subscriber
Identity Module) card, a transceiver, a coupler, a low-noise amplifier (LNA, Low Noise Amplifier),
a duplexer, and the like. In addition, RF circuit 501 may also communicate with networks and other
devices through wireless communication. The wireless communication may use any communication
standard or protocol, including but not limited to Global System for Mobile Communications (GSM,
Global System of Mobile communication), General Packet Radio Service (GPRS, General Packet Radio
Service), Code Division Multiple Access (CDMA, Code Division Multiple Access), Wideband Code
Division Multiple Access (WCDMA, Wideband Code Division Multiple Access), Long Term Evolution (LTE,
Long Term Evolution), e-mail, Short Messaging Service (SMS, Short Messaging Service), and the like.
Memory 502 may be used to store software programs and modules. Processor 508 executes various
functional applications and data processing by running the software programs and modules stored in
memory 502. Memory 502 may mainly include a program storage area and a data storage area. The
program storage area may store the operating system, the application programs required by at least
one function (such as a sound playback function or an image playback function), and the like; the
data storage area may store data created through use of the terminal (such as audio data and a
phone book), and the like. In addition, memory 502 may include high-speed random access memory, and
may also include non-volatile memory, such as at least one magnetic disk storage device, a flash
memory device, or another volatile solid-state storage device. Correspondingly, memory 502 may also
include a memory controller to provide processor 508 and input unit 503 with access to memory 502.
Input unit 503 may be used to receive input numeric or character information and to generate
keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function
control. Specifically, in one embodiment, input unit 503 may include a touch-sensitive surface and
other input devices. The touch-sensitive surface, also called a touch display screen or touchpad,
collects touch operations by the user on or near it (such as operations performed by the user on or
near the touch-sensitive surface using a finger, a stylus, or any other suitable object or
accessory) and drives the corresponding connected device according to a preset program. Optionally,
the touch-sensitive surface may include two parts: a touch detection device and a touch controller.
The touch detection device detects the user's touch position, detects the signal produced by the
touch operation, and passes the signal to the touch controller. The touch controller receives touch
information from the touch detection device, converts it into contact coordinates, sends them to
processor 508, and can receive and execute commands sent by processor 508. In addition, the
touch-sensitive surface may be implemented using resistive, capacitive, infrared, surface acoustic
wave, and other types. Besides the touch-sensitive surface, input unit 503 may also include other
input devices. Specifically, the other input devices may include, but are not limited to, one or
more of a physical keyboard, function keys (such as volume control keys and switch keys), a
trackball, a mouse, a joystick, and the like.
Display unit 504 may be used to display information entered by the user or information provided to
the user, as well as the terminal's various graphical user interfaces, which may be composed of
graphics, text, icons, video, and any combination thereof. Display unit 504 may include a display
panel; optionally, the display panel may be configured in the form of a liquid crystal display
(LCD, Liquid Crystal Display), an organic light-emitting diode (OLED, Organic Light-Emitting Diode)
display, or the like. Further, the touch-sensitive surface may cover the display panel. When the
touch-sensitive surface detects a touch operation on or near it, it passes the operation to
processor 508 to determine the type of touch event, and processor 508 then provides corresponding
visual output on the display panel according to the type of touch event. Although in Figure 14 the
touch-sensitive surface and the display panel are two separate components that implement the input
and output functions, in some embodiments the touch-sensitive surface and the display panel may be
integrated to implement the input and output functions.
The terminal may also include at least one sensor 505, such as a light sensor, a motion sensor, or
other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity
sensor. The ambient light sensor may adjust the brightness of the display panel according to the
ambient light level, and the proximity sensor may turn off the display panel and/or the backlight
when the terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor
can detect the magnitude of acceleration in all directions (generally three axes) and can detect
the magnitude and direction of gravity when stationary. It can be used in applications that
recognize the phone's posture (such as landscape/portrait switching, related games, and
magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer or
tap detection). Other sensors that may also be configured on the terminal, such as a gyroscope,
barometer, hygrometer, thermometer, and infrared sensor, are not described here.
Audio circuit 506, a loudspeaker, and a microphone may provide an audio interface between the user
and the terminal. Audio circuit 506 may convert received audio data into an electrical signal and
transmit it to the loudspeaker, which converts it into a sound signal for output. Conversely, the
microphone converts a collected sound signal into an electrical signal, which audio circuit 506
receives and converts into audio data. After the audio data is output to processor 508 and
processed, it is sent through RF circuit 501 to, for example, another terminal, or output to memory
502 for further processing. Audio circuit 506 may also include an earphone jack to provide
communication between a peripheral earphone and the terminal.
WiFi is a short-range wireless transmission technology. Through WiFi module 507, the terminal can
help the user send and receive e-mail, browse web pages, access streaming media, and so on; it
provides the user with wireless broadband Internet access. Although Figure 14 shows WiFi module
507, it can be understood that the module is not an essential part of the terminal and may be
omitted as needed without changing the essence of the invention.
Processor 508 is the control center of the terminal. It connects the various parts of the entire
mobile phone through various interfaces and lines, and performs the terminal's functions and
processes data by running or executing the software programs and/or modules stored in memory 502
and calling the data stored in memory 502, thereby monitoring the mobile phone as a whole.
Optionally, processor 508 may include one or more processing cores. Preferably, processor 508 may
integrate an application processor and a modem processor, where the application processor mainly
handles the operating system, user interface, application programs, and the like, and the modem
processor mainly handles wireless communication. It can be understood that the modem processor may
also not be integrated into processor 508.
The terminal further includes a power supply 509 (such as a battery) that powers the components.
Preferably, the power supply may be logically connected to processor 508 through a power management
system, so that functions such as charging management, discharging management, and power
consumption management are implemented through the power management system. Power supply 509 may
also include any components such as one or more DC or AC power sources, a recharging system, a
power failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown, the terminal may also include a camera, a Bluetooth module, and the like, which
are not described here. Specifically, in this embodiment, processor 508 in the terminal loads the
executable files corresponding to the processes of one or more application programs into memory 502
according to the following instructions, and processor 508 runs the application programs stored in
memory 502 to implement various functions:
Performing audio signal separation on the audio under test to obtain a harmonic signal and an
impact signal of the audio under test; obtaining the Mel spectrum of the impact signal; calculating
the onset envelope of the impact signal according to the Mel spectrum; obtaining the
autocorrelation tempogram of the onset envelope according to the onset envelope of the impact
signal; and determining the rhythm intensity value of the audio under test according to the
second-highest peak in the autocorrelation tempogram.
For details of the above operations, see the preceding embodiments; they are not repeated here.
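The autocorrelation tempogram step can be pictured as follows. This sketch, written under the assumption that each local segment is a fixed-length window of the onset envelope taken at a fixed step, computes one normalised autocorrelation curve per segment. The names `autocorrelation` and `autocorrelation_tempogram` and the parameters `segment_len`, `step`, and `max_lag` are illustrative and are not taken from the embodiment.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag, normalised by the lag-0 value."""
    acf = [
        sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        for lag in range(max_lag + 1)
    ]
    if acf[0] == 0:
        # A silent segment has no periodicity to report.
        return [0.0] * (max_lag + 1)
    return [v / acf[0] for v in acf]

def autocorrelation_tempogram(envelope, segment_len, step, max_lag):
    """Return one autocorrelation curve per local segment of the onset envelope."""
    tempogram = []
    for start in range(0, len(envelope) - segment_len + 1, step):
        segment = envelope[start:start + segment_len]
        tempogram.append(autocorrelation(segment, max_lag))
    return tempogram

# Toy onset envelope (assumed data): one onset every 4 frames.
envelope = [1.0, 0.0, 0.0, 0.0] * 6
tempogram = autocorrelation_tempogram(envelope, segment_len=16, step=4, max_lag=8)
```

For this perfectly regular envelope every segment shows autocorrelation peaks at lags 4 and 8, the onset period and its multiple. A less regular envelope would give lower and less consistent peaks at non-zero lags.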
As can be seen from the above, the terminal provided in this embodiment performs audio signal
separation on the audio under test to obtain its harmonic signal and impact signal, obtains the Mel
spectrum of the impact signal, calculates the onset envelope of the impact signal from the Mel
spectrum, obtains the autocorrelation tempogram of the onset envelope from the onset envelope of
the impact signal, and then determines the rhythm intensity value of the audio under test from the
second-highest peak in the autocorrelation tempogram. By analyzing the regularity and intensity
with which strong hit points or heavy hit points occur in the audio, the embodiment of the present
invention gives the rhythm intensity value of an audio clip, so that the value given better matches
the user's auditory perception.
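A possible reading of the final step is sketched below: the per-segment autocorrelation curves are averaged, and the second-highest peak of the averaged curve is taken as the rhythm intensity value. Because a normalised autocorrelation is always largest at lag 0, the second-highest peak is then the strongest periodicity at a non-zero lag. The peak definition used here (lag 0 plus interior local maxima), the function name `second_highest_peak`, and the toy curves are assumptions of this example.

```python
def second_highest_peak(tempogram):
    """Average the per-segment curves and return the second-highest peak value.

    tempogram is a list of autocorrelation curves of equal length.
    Returns 0.0 when the averaged curve has fewer than two peaks.
    """
    n_lags = len(tempogram[0])
    mean_acf = [
        sum(curve[lag] for curve in tempogram) / len(tempogram)
        for lag in range(n_lags)
    ]
    peaks = []
    # Lag 0 counts as a peak when the curve falls away from it.
    if n_lags > 1 and mean_acf[0] > mean_acf[1]:
        peaks.append(mean_acf[0])
    # Interior lags count as peaks when they are local maxima.
    for lag in range(1, n_lags - 1):
        if mean_acf[lag - 1] < mean_acf[lag] >= mean_acf[lag + 1]:
            peaks.append(mean_acf[lag])
    peaks.sort(reverse=True)
    return peaks[1] if len(peaks) > 1 else 0.0

# Two toy autocorrelation curves (assumed data) with a periodicity at lag 4.
tempogram = [
    [1.0, 0.0, 0.0, 0.0, 0.75, 0.0, 0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0, 0.25, 0.0, 0.0, 0.0, 0.5],
]
rhythm_intensity = second_highest_peak(tempogram)  # 0.5
```

Averaging before picking the peak rewards periodicity that persists across segments: a lag that is strong in only one segment contributes less to the result than one that is strong throughout.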
Those of ordinary skill in the art will understand that all or some of the steps in the methods of
the above embodiments may be completed by instructions, or by instructions controlling the relevant
hardware. The instructions may be stored in a computer-readable storage medium and loaded and
executed by a processor.
To this end, an embodiment of the present invention provides a storage medium storing a plurality
of instructions that can be loaded by a processor to perform the steps in any of the audio
detection methods provided by the embodiments of the present invention. For example, the
instructions may perform the following steps:
Performing audio signal separation on the audio under test to obtain a harmonic signal and an
impact signal of the audio under test; obtaining the Mel spectrum of the impact signal; calculating
the onset envelope of the impact signal according to the Mel spectrum; obtaining the
autocorrelation tempogram of the onset envelope according to the onset envelope of the impact
signal; and determining the rhythm intensity value of the audio under test according to the
second-highest peak in the autocorrelation tempogram.
For the specific implementation of each of the above operations, see the preceding embodiments;
it is not repeated here.
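Among the further steps such instructions may perform, claim 2 recites filtering the onset envelope to remove points below a threshold before the tempogram is computed. The following is a minimal sketch, assuming that a filtered-out point is set to zero so that the envelope keeps its length and timing; the threshold value 0.2 and the name `filter_onset_envelope` are hypothetical.

```python
def filter_onset_envelope(envelope, threshold):
    """Zero out envelope points whose value is less than the threshold."""
    return [value if value >= threshold else 0.0 for value in envelope]

# Toy onset envelope (assumed data): two clear hits among low-level fluctuations.
envelope = [0.05, 0.9, 0.1, 0.02, 0.7, 0.15]
filtered = filter_onset_envelope(envelope, threshold=0.2)
# filtered == [0.0, 0.9, 0.0, 0.0, 0.7, 0.0]
```

Suppressing the weak points means that only the stronger hits contribute to the later autocorrelation.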
The storage medium may include read-only memory (ROM, Read Only Memory), random access memory
(RAM, Random Access Memory), a magnetic disk, an optical disc, or the like.
Because the instructions stored in the storage medium can perform the steps in any of the audio
detection methods provided by the embodiments of the present invention, they can achieve the
beneficial effects achievable by any of those methods. See the preceding embodiments for details,
which are not repeated here.
The audio detection method, device, and storage medium provided by the embodiments of the present
invention have been described in detail above. Specific examples are used herein to explain the
principles and implementations of the invention, and the description of the above embodiments is
intended only to help in understanding the method of the invention and its core idea. Meanwhile,
those skilled in the art may, following the idea of the invention, make changes to the specific
implementations and the scope of application. In summary, the content of this specification should
not be construed as limiting the invention.
Claims (15)
1. An audio detection method, characterized in that the method comprises:
performing audio signal separation on audio under test to obtain a harmonic signal and an impact
signal of the audio under test;
obtaining a Mel spectrum of the impact signal;
calculating an onset envelope of the impact signal according to the Mel spectrum;
obtaining an autocorrelation tempogram of the onset envelope according to the onset envelope of the
impact signal;
determining a rhythm intensity value of the audio under test according to the second-highest peak
in the autocorrelation tempogram.
2. The audio detection method of claim 1, characterized in that, before the obtaining of the
tempogram of the onset envelope according to the onset envelope of the impact signal, the method
further comprises:
filtering the onset envelope of the impact signal to filter out signal points in the onset envelope
whose values are less than a threshold;
the obtaining of the autocorrelation tempogram of the onset envelope according to the onset
envelope of the impact signal comprises:
obtaining the tempogram of the onset envelope according to the filtered onset envelope of the
impact signal.
3. The audio detection method of claim 1, characterized in that the performing of audio signal
separation on the audio under test to obtain the harmonic signal and the impact signal of the audio
under test comprises:
performing a short-time Fourier transform on the audio under test according to a preset frame
length and a preset step size to obtain a spectrogram of the audio under test;
performing median filtering along the time direction and along the frequency direction of the
spectrogram, respectively, to obtain the harmonic signal and the impact signal of the spectrogram,
wherein the harmonic signal is the signal obtained by median filtering along the time direction and
the impact signal is the signal obtained by median filtering along the frequency direction.
4. The audio detection method of claim 3, characterized in that the performing of median filtering
along the time direction and along the frequency direction of the spectrogram, respectively, to
obtain the harmonic signal and the impact signal of the spectrogram comprises:
performing a first median filtering along the time direction and along the frequency direction of
the spectrogram, respectively, to obtain a first harmonic signal and a first impact signal of the
spectrogram;
removing the first harmonic signal from the spectrogram to obtain a target spectrogram composed of
the first impact signal;
performing a second median filtering along the time direction and along the frequency direction of
the target spectrogram, respectively, to obtain a second harmonic signal and a second impact signal
of the target spectrogram, wherein the second harmonic signal and the second impact signal of the
target spectrogram constitute the harmonic signal and the impact signal of the audio under test.
5. The audio detection method of claim 1, characterized in that the obtaining of the
autocorrelation tempogram of the onset envelope according to the onset envelope of the impact
signal comprises:
framing the onset envelope of the impact signal according to a preset duration to obtain a
plurality of local segments, and then dividing each local segment into a plurality of frames
according to a preset step size;
inputting the plurality of frames corresponding to each local segment into a local autocorrelation
function for calculation, to obtain the autocorrelation tempogram of the onset envelope.
6. The audio detection method of claim 5, characterized in that the determining of the rhythm
intensity value of the audio under test according to the second-highest peak in the autocorrelation
tempogram comprises:
determining, as the rhythm intensity value of the audio under test, the second-highest peak of the
autocorrelation mean of the local segments in the autocorrelation tempogram.
7. The audio detection method of claim 1, characterized in that the method further comprises:
classifying the audio under test according to the rhythm intensity value of the audio under test;
generating an audio recommendation list according to the audio classification results of a
plurality of audios under test and the current audio application scenario.
8. An audio detection device, characterized in that the device comprises:
a signal separation module, configured to perform audio signal separation on audio under test to
obtain a harmonic signal and an impact signal of the audio under test;
a first obtaining module, configured to obtain a Mel spectrum of the impact signal;
a calculation module, configured to calculate an onset envelope of the impact signal according to
the Mel spectrum;
a second obtaining module, configured to obtain an autocorrelation tempogram of the onset envelope
according to the onset envelope of the impact signal;
a determining module, configured to determine a rhythm intensity value of the audio under test
according to the second-highest peak in the autocorrelation tempogram.
9. The audio detection device of claim 8, characterized in that the device further comprises:
a filtering module, configured to filter the onset envelope of the impact signal to filter out
signal points in the onset envelope whose values are less than a threshold;
the second obtaining module is further configured to obtain the tempogram of the onset envelope
according to the filtered onset envelope of the impact signal.
10. The audio detection device of claim 8, characterized in that the signal separation module
comprises:
a transform submodule, configured to perform a short-time Fourier transform on the audio under test
according to a preset frame length and a preset step size to obtain a spectrogram of the audio
under test;
a filtering submodule, configured to perform median filtering along the time direction and along
the frequency direction of the spectrogram, respectively, to obtain the harmonic signal and the
impact signal of the spectrogram, wherein the harmonic signal is the signal obtained by median
filtering along the time direction and the impact signal is the signal obtained by median filtering
along the frequency direction.
11. The audio detection device of claim 10, characterized in that the filtering submodule
comprises:
a first filtering unit, configured to perform a first median filtering along the time direction and
along the frequency direction of the spectrogram, respectively, to obtain a first harmonic signal
and a first impact signal of the spectrogram;
a removal unit, configured to remove the first harmonic signal from the spectrogram to obtain a
target spectrogram composed of the first impact signal;
a second filtering unit, configured to perform a second median filtering along the time direction
and along the frequency direction of the target spectrogram, respectively, to obtain a second
harmonic signal and a second impact signal of the target spectrogram, wherein the second harmonic
signal and the second impact signal of the target spectrogram constitute the harmonic signal and
the impact signal of the audio under test.
12. The audio detection device of claim 8, characterized in that the second obtaining module
comprises:
a framing submodule, configured to frame the onset envelope of the impact signal according to a
preset duration to obtain a plurality of local segments, and then divide each local segment into a
plurality of frames according to a preset step size;
a calculation submodule, configured to input the plurality of frames corresponding to each local
segment into a local autocorrelation function for calculation, to obtain the autocorrelation
tempogram of the onset envelope.
13. The audio detection device of claim 12, characterized in that the determining module is
configured to determine, as the rhythm intensity value of the audio under test, the second-highest
peak of the autocorrelation mean of the local segments in the autocorrelation tempogram.
14. The audio detection device of claim 8, characterized in that the device further comprises:
a classification module, configured to classify the audio under test according to the rhythm
intensity value of the audio under test;
a generation module, configured to generate an audio recommendation list according to the audio
classification results of a plurality of audios under test and the current audio application
scenario.
15. A storage medium, characterized in that the storage medium stores a plurality of instructions
suitable for being loaded by a processor to perform the steps of the audio detection method of any
one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811278955.8A CN109256146B (en) | 2018-10-30 | 2018-10-30 | Audio detection method, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811278955.8A CN109256146B (en) | 2018-10-30 | 2018-10-30 | Audio detection method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109256146A (en) | 2019-01-22 |
CN109256146B CN109256146B (en) | 2021-07-06 |
Family
ID=65044080
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811278955.8A Active CN109256146B (en) | 2018-10-30 | 2018-10-30 | Audio detection method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109256146B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978034A (en) * | 2019-03-18 | 2019-07-05 | 华南理工大学 | A kind of sound scenery identification method based on data enhancing |
CN110070884A (en) * | 2019-02-28 | 2019-07-30 | 北京字节跳动网络技术有限公司 | Audio originates point detecting method and device |
CN110070856A (en) * | 2019-03-26 | 2019-07-30 | 天津大学 | A kind of audio scene recognition method based on the enhancing of harmonic wave impulse source mask data |
CN110070885A (en) * | 2019-02-28 | 2019-07-30 | 北京字节跳动网络技术有限公司 | Audio originates point detecting method and device |
CN110085214A (en) * | 2019-02-28 | 2019-08-02 | 北京字节跳动网络技术有限公司 | Audio originates point detecting method and device |
CN110278388A (en) * | 2019-06-19 | 2019-09-24 | 北京字节跳动网络技术有限公司 | Show generation method, device, equipment and the storage medium of video |
CN111639225A (en) * | 2020-05-22 | 2020-09-08 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio information detection method and device and storage medium |
WO2020224107A1 (en) * | 2019-05-05 | 2020-11-12 | 平安科技(深圳)有限公司 | Music style classification method and apparatus, computing device and storage medium |
US20200357369A1 (en) * | 2018-01-09 | 2020-11-12 | Guangzhou Baiguoyuan Information Technology Co., Ltd. | Music classification method and beat point detection method, storage device and computer device |
CN112908289A (en) * | 2021-03-10 | 2021-06-04 | 百果园技术(新加坡)有限公司 | Beat determining method, device, equipment and storage medium |
CN113473201A (en) * | 2021-07-29 | 2021-10-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio and video alignment method, device, equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101375327A (en) * | 2006-01-25 | 2009-02-25 | 索尼株式会社 | Beat extraction device and beat extraction method |
US20180005615A1 (en) * | 2012-05-23 | 2018-01-04 | Google Inc. | Music selection and adaptation for exercising |
CN107622774A (en) * | 2017-08-09 | 2018-01-23 | 金陵科技学院 | A kind of music-tempo spectrogram generation method based on match tracing |
CN108335703A (en) * | 2018-03-28 | 2018-07-27 | 腾讯音乐娱乐科技(深圳)有限公司 | The method and apparatus for determining the stress position of audio data |
CN108364660A (en) * | 2018-02-09 | 2018-08-03 | 腾讯音乐娱乐科技(深圳)有限公司 | Accent identification method, device and computer readable storage medium |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200357369A1 (en) * | 2018-01-09 | 2020-11-12 | Guangzhou Baiguoyuan Information Technology Co., Ltd. | Music classification method and beat point detection method, storage device and computer device |
US11715446B2 (en) * | 2018-01-09 | 2023-08-01 | Bigo Technology Pte, Ltd. | Music classification method and beat point detection method, storage device and computer device |
CN110070885B (en) * | 2019-02-28 | 2021-12-24 | 北京字节跳动网络技术有限公司 | Audio starting point detection method and device |
CN110070885A (en) * | 2019-02-28 | 2019-07-30 | 北京字节跳动网络技术有限公司 | Audio originates point detecting method and device |
CN110085214A (en) * | 2019-02-28 | 2019-08-02 | 北京字节跳动网络技术有限公司 | Audio originates point detecting method and device |
CN110070884A (en) * | 2019-02-28 | 2019-07-30 | 北京字节跳动网络技术有限公司 | Audio originates point detecting method and device |
WO2020173488A1 (en) * | 2019-02-28 | 2020-09-03 | 北京字节跳动网络技术有限公司 | Audio starting point detection method and apparatus |
CN110070884B (en) * | 2019-02-28 | 2022-03-15 | 北京字节跳动网络技术有限公司 | Audio starting point detection method and device |
CN109978034A (en) * | 2019-03-18 | 2019-07-05 | 华南理工大学 | A kind of sound scenery identification method based on data enhancing |
CN110070856A (en) * | 2019-03-26 | 2019-07-30 | 天津大学 | A kind of audio scene recognition method based on the enhancing of harmonic wave impulse source mask data |
WO2020224107A1 (en) * | 2019-05-05 | 2020-11-12 | 平安科技(深圳)有限公司 | Music style classification method and apparatus, computing device and storage medium |
CN110278388A (en) * | 2019-06-19 | 2019-09-24 | 北京字节跳动网络技术有限公司 | Show generation method, device, equipment and the storage medium of video |
CN111639225A (en) * | 2020-05-22 | 2020-09-08 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio information detection method and device and storage medium |
CN111639225B (en) * | 2020-05-22 | 2023-09-08 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio information detection method, device and storage medium |
CN112908289A (en) * | 2021-03-10 | 2021-06-04 | 百果园技术(新加坡)有限公司 | Beat determining method, device, equipment and storage medium |
CN112908289B (en) * | 2021-03-10 | 2023-11-07 | 百果园技术(新加坡)有限公司 | Beat determining method, device, equipment and storage medium |
CN113473201A (en) * | 2021-07-29 | 2021-10-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio and video alignment method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109256146B (en) | 2021-07-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109256146A (en) | | Audio-frequency detection, device and storage medium |
CN111210021B (en) | | Audio signal processing method, model training method and related device |
CN103440862B (en) | | A kind of method of voice and music synthesis, device and equipment |
CN109166593A (en) | | Audio data processing method, device and storage medium |
CN106024005B (en) | | A kind of processing method and processing device of audio data |
CN106356070B (en) | | A kind of acoustic signal processing method and device |
CN110853617B (en) | | Model training method, language identification method, device and equipment |
CN107680614B (en) | | Audio signal processing method, apparatus and storage medium |
CN109903773A (en) | | Audio-frequency processing method, device and storage medium |
KR20110090739A (en) | | Method and apparatus for providing user interface using surface acoustic signal, and device with the user interface |
CN103971681A (en) | | Voice recognition method and system |
CN106782600A (en) | | The methods of marking and device of audio file |
CN110097895A (en) | | A kind of absolute music detection method, device and storage medium |
CN109346061A (en) | | Audio-frequency detection, device and storage medium |
CN106384599B (en) | | A kind of method and apparatus of distorsion identification |
CN108470571A (en) | | A kind of audio-frequency detection, device and storage medium |
CN109872710B (en) | | Sound effect modulation method, device and storage medium |
CN108391207A (en) | | Data processing method, device, terminal, earphone and readable storage medium storing program for executing |
CN109243488A (en) | | Audio-frequency detection, device and storage medium |
CN109256147A (en) | | Audio cadence detection method, device and storage medium |
CN106878390A (en) | | Electronic pet interaction control method, device and wearable device |
CN109817241A (en) | | Audio-frequency processing method, device and storage medium |
CN109616135A (en) | | Audio-frequency processing method, device and storage medium |
CN110796918A (en) | | Training method and device and mobile terminal |
CN105550316B (en) | | The method for pushing and device of audio list |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |