CN109087669A - Audio similarity detection method, device, storage medium and computer equipment - Google Patents
- Publication number
- CN109087669A (application CN201811233515.0A)
- Authority
- CN
- China
- Prior art keywords
- audio
- similarity
- detected
- benchmark
- characteristic sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
Abstract
Embodiments of the present invention disclose an audio similarity detection method, apparatus, storage medium and computer device. An embodiment obtains audio to be detected; selects, from the audio to be detected, the audio that meets a preset condition, and obtains a characteristic sequence of the audio to be detected from the selected audio; obtains a reference characteristic sequence of a benchmark audio; obtains a similarity distance between the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio; and determines the similarity between the audio to be detected and the benchmark audio according to that similarity distance. This scheme can filter interference audio out of the audio to be detected and retain the required audio features, reducing the influence of extraneous factors on the similarity detection result and improving the accuracy of audio similarity detection.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to an audio similarity detection method, an audio similarity detection apparatus, a storage medium and a computer device.
Background technique
With the development of science and technology, people's lives have become richer and richer. For example, users can not only enjoy music, films and other audio, but can also imitate that audio for entertainment; in that case the audio imitated by the user needs to be compared with the original audio in order to evaluate the similarity of the imitation.
In the prior art, taking imitated songs as an example, audio similarity is detected by first collecting the audio imitated by the user together with the original singer's audio mixed with accompaniment, and then directly computing the similarity between the user's imitated audio and the original audio. However, because both the original audio and the user's imitated audio are affected by many factors, directly computing the similarity in this way produces a large error, so the resulting similarity is of low accuracy.
Summary of the invention
Embodiments of the present invention provide an audio similarity detection method, apparatus, storage medium and computer device, aiming to improve the accuracy of audio similarity detection.
To solve the above technical problem, embodiments of the present invention provide the following technical solutions:
An audio similarity detection method, comprising:
obtaining audio to be detected;
selecting, from the audio to be detected, the audio that meets a preset condition, and obtaining a characteristic sequence of the audio to be detected from the selected audio;
obtaining a reference characteristic sequence of a benchmark audio;
obtaining a similarity distance between the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio;
determining the similarity between the audio to be detected and the benchmark audio according to the similarity distance.
An audio similarity detection apparatus, comprising:
an audio acquiring unit, configured to obtain audio to be detected;
a screening unit, configured to select, from the audio to be detected, the audio that meets a preset condition, and to obtain a characteristic sequence of the audio to be detected from the selected audio;
a feature acquiring unit, configured to obtain a reference characteristic sequence of a benchmark audio;
a distance acquiring unit, configured to obtain a similarity distance between the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio;
a determination unit, configured to determine the similarity between the audio to be detected and the benchmark audio according to the similarity distance.
Optionally, the screening unit comprises:
a processing subunit, configured to preprocess the audio to be detected to obtain preprocessed audio;
an obtaining subunit, configured to obtain the energy spectrum of the preprocessed audio;
a screening subunit, configured to select, from the preprocessed audio according to the energy spectrum, the audio that meets the preset condition, and to set the frequency sequence corresponding to the selected audio as the characteristic sequence of the audio to be detected.
Optionally, the processing subunit is specifically configured to:
sample the audio to be detected according to a preset sampling strategy to obtain sampled audio;
divide the sampled audio into frames according to a preset framing strategy to obtain framed audio;
apply a window to the framed audio to obtain discrete time-domain preprocessed audio.
Optionally, the obtaining subunit is specifically configured to:
apply an integral transform to the preprocessed audio to obtain the spectrum corresponding to the preprocessed audio;
determine the energy spectrum of the preprocessed audio from that spectrum.
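In practice the integral transform here is typically a discrete Fourier transform applied per frame, with the energy spectrum taken as the squared magnitude of the spectrum. A minimal NumPy sketch of that spectrum-then-energy step (all names are illustrative, not taken from the patent):

```python
import numpy as np

def energy_spectrum(frames: np.ndarray) -> np.ndarray:
    """Per-frame energy spectrum: squared magnitude of the real FFT.

    frames: array of shape (n_frames, frame_len) holding windowed audio.
    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    spectrum = np.fft.rfft(frames, axis=1)  # frequency spectrum of each frame
    return np.abs(spectrum) ** 2            # energy spectrum

# Example: two frames of a 4-sample signal
frames = np.array([[1.0, 0.0, -1.0, 0.0],
                   [1.0, 1.0, 1.0, 1.0]])
es = energy_spectrum(frames)
```

The first frame is a pure alternating tone, so all of its energy lands in the middle bin; the second is constant, so all of its energy lands in the DC bin.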
Optionally, the screening subunit comprises:
an obtaining module, configured to obtain the sound intensity of the audio to be detected from the energy spectrum;
a screening module, configured to select, from the audio to be detected, the audio whose sound intensity is greater than a preset threshold, thereby obtaining the audio whose sound intensity meets the preset condition.
Optionally, the screening module is specifically configured to:
normalize the sound intensity of the audio to be detected into a preset sound intensity range to obtain intensity-normalized audio;
select, from the intensity-normalized audio, the audio whose sound intensity is greater than the preset threshold, thereby obtaining the audio whose sound intensity meets the preset condition.
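The normalize-then-threshold screening described above can be sketched as follows; the range and threshold values are illustrative assumptions, since the patent leaves the preset values open:

```python
import numpy as np

def screen_by_intensity(intensity, low=0.0, high=1.0, threshold=0.5):
    """Normalize per-frame sound intensity into [low, high], then return the
    indices of frames whose normalized intensity exceeds `threshold`.
    All parameter values here are illustrative, not taken from the patent."""
    intensity = np.asarray(intensity, dtype=float)
    span = intensity.max() - intensity.min()
    if span == 0:                      # constant signal: map everything to `low`
        norm = np.full_like(intensity, low)
    else:
        norm = low + (intensity - intensity.min()) * (high - low) / span
    keep = np.flatnonzero(norm > threshold)  # frames meeting the preset condition
    return norm, keep

norm, keep = screen_by_intensity([2.0, 8.0, 5.0, 10.0], threshold=0.5)
```

Here frames 1 and 3 survive the screening, since their normalized intensities (0.75 and 1.0) exceed the threshold.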
Optionally, when the benchmark audio includes a target benchmark audio and an interference audio, the feature acquiring unit comprises:
a mean acquiring subunit, configured to obtain a first root-mean-square average energy value of the target benchmark audio and a second root-mean-square average energy value of the interference audio;
an energy spectrum acquiring subunit, configured to obtain a first energy spectrum of the target benchmark audio and a second energy spectrum of the interference audio;
an optimization subunit, configured to optimize the benchmark audio according to the first energy spectrum, the first root-mean-square average energy value, the second root-mean-square average energy value and the second energy spectrum, to obtain optimized benchmark audio;
a feature acquiring subunit, configured to obtain the reference characteristic sequence of the optimized benchmark audio.
Optionally, the mean acquiring subunit is specifically configured to:
determine a first root-mean-square energy of the target benchmark audio and a second root-mean-square energy of the interference audio;
obtain a first frame count and a first frame length of the target benchmark audio, and a second frame count and a second frame length of the interference audio;
determine the first root-mean-square average energy value of the target benchmark audio according to the first root-mean-square energy, the first frame count and the first frame length, and determine the second root-mean-square average energy value of the interference audio according to the second root-mean-square energy, the second frame count and the second frame length.
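One plausible reading of the root-mean-square average energy value above — per-frame RMS energy, averaged over the frame count — can be sketched as follows (this interpretation, and the helper name, are assumptions):

```python
import numpy as np

def rms_average_energy(frames: np.ndarray) -> float:
    """Average per-frame RMS energy.

    frames: (n_frames, frame_len). For each frame, RMS = sqrt(mean(x^2));
    the result is the mean RMS over all frames. This is one plausible
    reading of the patent's 'root mean square average energy value'."""
    rms_per_frame = np.sqrt(np.mean(frames ** 2, axis=1))
    return float(np.mean(rms_per_frame))

# One loud frame and one silent frame
frames = np.array([[3.0, 4.0], [0.0, 0.0]])
avg = rms_average_energy(frames)
```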
Optionally, the distance acquiring unit comprises:
a coding subunit, configured to encode the characteristic sequence of the audio to be detected according to a preset coding strategy to obtain a first coded characteristic sequence, and to encode the reference characteristic sequence of the benchmark audio according to the same preset coding strategy to obtain a second coded characteristic sequence;
a first determination subunit, configured to determine the similarity distance between the first coded characteristic sequence and the second coded characteristic sequence.
Optionally, the coding subunit is specifically configured to:
compare, according to the preset coding strategy, each pair of adjacent characteristic values in the characteristic sequence of the audio to be detected;
when the former of two adjacent characteristic values is less than the latter, encode that position of the characteristic sequence of the audio to be detected as a first code value;
when the former of two adjacent characteristic values is equal to the latter, encode that position as a second code value; and
when the former of two adjacent characteristic values is greater than the latter, encode that position as a third code value;
generate the first coded characteristic sequence from the first, second and/or third code values.
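The three-valued coding rule above (each adjacent pair encoded as rising, flat or falling) can be sketched as follows; the concrete code values are placeholders, since the patent does not fix them:

```python
def trend_encode(seq, rising=1, flat=0, falling=-1):
    """Encode a characteristic sequence by comparing each adjacent pair:
    rising (prev < next), flat (prev == next), falling (prev > next).
    The three code values are illustrative placeholders."""
    return [rising if a < b else flat if a == b else falling
            for a, b in zip(seq, seq[1:])]

# e.g. a short pitch sequence in Hz:
# pairs (220, 247) -> rising, (247, 247) -> flat, (247, 196) -> falling
codes = trend_encode([220.0, 247.0, 247.0, 196.0])
```

Because only the direction of change survives, the code is insensitive to the absolute pitch level, which is one reason trend coding suits comparing an imitation against an original.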
Optionally, the similarity distance includes at least an edit distance, a Euclidean distance and a Hamming distance, and the first determination subunit is specifically configured to:
determine at least the edit distance, Euclidean distance and Hamming distance between the first coded characteristic sequence and the second coded characteristic sequence;
normalize the edit distance, the Euclidean distance and the Hamming distance respectively to obtain the similarity distance.
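A sketch of computing and normalizing the three distances between two equal-length coded sequences; the normalizers shown (sequence length) are one simple choice, not necessarily the patent's:

```python
import math

def edit_distance(a, b):
    """Levenshtein distance via a single-row dynamic program."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[n]

def similarity_distances(x, y):
    """Edit, Euclidean and Hamming distances between two equal-length
    code sequences, each divided by the sequence length as a crude
    normalization (the patent leaves the normalization scheme open)."""
    n = len(x)
    ed = edit_distance(x, y) / n
    eu = math.dist(x, y) / n
    hm = sum(a != b for a, b in zip(x, y)) / n
    return ed, eu, hm

ed, eu, hm = similarity_distances([1, 0, -1, 1], [1, 1, -1, 0])
```

The two example sequences differ at two of four positions, so both the normalized edit distance and Hamming distance come out to 0.5.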
Optionally, the determination unit comprises:
a construction subunit, configured to construct, for each of the edit distance, the Euclidean distance and the Hamming distance, an affine function between that distance and a sub-similarity;
a determination subunit, configured to determine the sub-similarity corresponding to each distance according to the affine function corresponding to that distance;
a third determination subunit, configured to determine the similarity between the audio to be detected and the benchmark audio according to the sub-similarities.
Optionally, the third determination subunit is specifically configured to:
set a first weight for the sub-similarity of the edit distance, and a second weight for the sub-similarity of the Hamming distance;
set the sub-similarity of the Euclidean distance as a penalty term;
determine the similarity between the audio to be detected and the benchmark audio according to the first weight, the second weight and the penalty term.
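A sketch of the weighted combination described above: each normalized distance is mapped to a sub-similarity by a simple affine function (s = 1 - d), the edit and Hamming sub-similarities are weighted, and the Euclidean term is subtracted as a penalty. All weight values are illustrative assumptions; the patent does not fix them:

```python
def combined_similarity(edit_d, euclid_d, hamming_d,
                        w_edit=0.6, w_hamming=0.4, w_penalty=0.3):
    """Combine normalized distances into one similarity score in [0, 1].

    Each distance is mapped to a sub-similarity by the affine function
    s = 1 - d; the edit and Hamming sub-similarities are weighted, and the
    Euclidean term is subtracted as a penalty. Weights are illustrative."""
    s_edit, s_hamming = 1.0 - edit_d, 1.0 - hamming_d
    score = w_edit * s_edit + w_hamming * s_hamming - w_penalty * euclid_d
    return max(0.0, min(1.0, score))   # clamp to [0, 1]

score = combined_similarity(0.2, 0.1, 0.25)
```

With these example weights, the score could then be compared against the preset similarity threshold mentioned elsewhere in the text (for red-packet award, sound-lock unlocking, and so on).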
Optionally, the audio similarity detection apparatus further comprises:
a resource transfer unit, configured to, when the similarity between the audio to be detected and the benchmark audio is greater than a preset similarity threshold, perform a virtual resource transfer operation and/or display information related to the similarity detection result of the audio to be detected.
Optionally, the audio similarity detection apparatus further comprises:
an unlocking unit, configured to perform an unlocking operation on an audio lock when the similarity between the audio to be detected and the benchmark audio is greater than a preset similarity threshold.
A storage medium storing a plurality of instructions, the instructions being suitable for being loaded by a processor to execute any audio similarity detection method provided by the embodiments of the present invention.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to execute any audio similarity detection method provided by the embodiments of the present invention.
An embodiment of the present invention can obtain audio to be detected, select from it the audio that meets a preset condition, and obtain the characteristic sequence of the audio to be detected from the selected audio, so that interference audio in the audio to be detected is filtered out and the required audio features are retained; it also obtains the reference characteristic sequence of a benchmark audio. It then obtains the similarity distance between the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio, such as the edit distance, Euclidean distance and Hamming distance. Because this similarity distance reduces the influence of extraneous factors on the similarity detection result, the similarity between the audio to be detected and the benchmark audio can be determined from it, improving the accuracy of audio similarity detection.
Detailed description of the invention
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a scenario of the audio similarity detection method provided in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the audio similarity detection method provided in an embodiment of the present invention;
Fig. 3 is another schematic flowchart of the audio similarity detection method provided in an embodiment of the present invention;
Fig. 4 is another schematic flowchart of the audio similarity detection method provided in an embodiment of the present invention;
Fig. 5 is a schematic diagram of a terminal provided in an embodiment of the present invention displaying a karaoke interface;
Figs. 6(a) to 6(d) are initial time-domain sample diagrams provided in an embodiment of the present invention;
Figs. 7(a) to 7(d) are spectral feature diagrams provided in an embodiment of the present invention;
Fig. 8 is a schematic flowchart of obtaining a characteristic sequence provided in an embodiment of the present invention;
Fig. 9 is a schematic flowchart of screening a frequency sequence provided in an embodiment of the present invention;
Figs. 10(a) to 10(d) are spectral feature diagrams after feature filtering provided in an embodiment of the present invention;
Figs. 11(a) to 11(c) are schematic diagrams of a first-dimension characteristic sequence provided in an embodiment of the present invention;
Figs. 12(a) to 12(c) are schematic diagrams of a first coded characteristic sequence provided in an embodiment of the present invention;
Fig. 13 is a schematic diagram of a terminal provided in an embodiment of the present invention displaying a red-packet amount and a song;
Fig. 14 is a schematic diagram of a terminal provided in an embodiment of the present invention displaying duet prompt information to the user;
Fig. 15 is a schematic diagram of a terminal provided in an embodiment of the present invention displaying a voice message;
Fig. 16 is a schematic structural diagram of the audio similarity detection apparatus provided in an embodiment of the present invention;
Fig. 17 is another schematic structural diagram of the audio similarity detection apparatus provided in an embodiment of the present invention;
Fig. 18 is another schematic structural diagram of the audio similarity detection apparatus provided in an embodiment of the present invention;
Fig. 19 is another schematic structural diagram of the audio similarity detection apparatus provided in an embodiment of the present invention;
Fig. 20 is another schematic structural diagram of the audio similarity detection apparatus provided in an embodiment of the present invention;
Fig. 21 is a schematic structural diagram of the computer device provided in an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present invention.
Embodiments of the present invention provide an audio similarity detection method, apparatus, storage medium and computer device.
Referring to Fig. 1, Fig. 1 is a schematic diagram of a scenario of the audio similarity detection method provided by an embodiment of the present invention. The audio similarity detection method can be applied in an audio similarity detection apparatus, which may be integrated in a terminal that has a storage unit and an installed microprocessor with computing capability, such as a tablet computer, mobile phone or laptop. For example, the terminal may obtain audio to be detected, which may be audio recorded by a user. It may then select, from the audio to be detected, the audio that meets a preset condition and obtain the characteristic sequence of the audio to be detected from the selected audio. For example, the audio to be detected may be preprocessed by sampling, framing, windowing and the like to obtain preprocessed audio; an integral transform may be applied to the preprocessed audio to obtain its spectrum; the energy spectrum of the preprocessed audio may be determined from that spectrum; and the audio that meets the preset condition may be selected from the preprocessed audio according to the energy spectrum, so that interference audio in the audio to be detected is filtered out and the required audio features are retained. The terminal also obtains the reference characteristic sequence of a benchmark audio, which may be audio collected from a server or obtained in another way. At this point the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio are available; the two characteristic sequences may then be coded, for example with an extended Manchester code, and the similarity distances between the coded characteristic sequences, such as the edit distance, Euclidean distance and Hamming distance, may be determined. Because this similarity distance reduces the influence of extraneous factors on the similarity detection result, the similarity between the audio to be detected and the benchmark audio can finally be determined from it, improving the accuracy of audio similarity detection.
It should be noted that the scenario shown in Fig. 1 is only an example. The scenario of the audio similarity detection method described in the embodiments of the present invention is intended to illustrate the technical solutions of the embodiments more clearly and does not limit them. Those of ordinary skill in the art will appreciate that, as audio similarity detection methods evolve and new business scenarios appear, the technical solutions provided by the embodiments of the present invention remain equally applicable to similar technical problems.
Detailed descriptions are given below.
In this embodiment, the description is given from the perspective of the audio similarity detection apparatus, which may be integrated in a terminal that has a storage unit and an installed microprocessor with computing capability, such as a tablet computer, mobile phone or laptop.
An audio similarity detection method comprises: obtaining audio to be detected; selecting, from the audio to be detected, the audio that meets a preset condition, and obtaining the characteristic sequence of the audio to be detected from the selected audio; obtaining the reference characteristic sequence of a benchmark audio; obtaining the similarity distance between the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio; and determining the similarity between the audio to be detected and the benchmark audio according to the similarity distance.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the audio similarity detection method provided by an embodiment of the present invention. The audio similarity detection method may include the following steps.
In step S101, audio to be detected is obtained.
The audio to be detected may be audio of the user singing a song or speaking a passage. For example, when the audio similarity detection method is applied to a song-scoring scenario, the original singer's audio and the accompaniment of a song may be obtained as the benchmark audio, and the user's recording of that song may be obtained as the audio to be detected. The similarity between the benchmark audio and the audio to be detected can then be determined, and when the similarity is greater than a preset similarity threshold, the user may receive a red packet, experience points or the like.
When the audio similarity detection method is applied to a sound-lock scenario, a section of benchmark audio recorded in advance by the user may serve as the sound lock. At unlocking time, the audio to be detected recorded by the user for unlocking is obtained, and the similarity between the benchmark audio and the audio to be detected is determined; the sound lock is unlocked only when the similarity is greater than a preset similarity threshold (for example, close to one hundred percent).
It should be noted that the audio similarity detection method can also be applied to other fields of acoustic processing, such as pitch detection, loudness detection or sound quality detection.
For example, in the process of obtaining the audio to be detected, the speech or singing of the user may be captured in an audio data format with a sample rate of 16 kHz or another sample rate; the audio to be detected obtained in this way may be a continuous pulse-code modulation (PCM) signal with a bit depth of 16 bits or another bit depth.
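Reading such a 16-bit PCM stream into normalized floating-point samples for later processing might look like the following sketch (the little-endian byte order and the helper name are assumptions):

```python
import numpy as np

def load_pcm16(raw: bytes, sample_rate: int = 16000) -> np.ndarray:
    """Interpret raw bytes as little-endian signed 16-bit PCM samples and
    scale them to floats in [-1, 1). The 16 kHz rate matches the example
    in the text; the byte order is an assumption."""
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32)
    return samples / 32768.0

# Two samples: 0 and 16384 (i.e. 0.0 and 0.5 after scaling)
audio = load_pcm16((0).to_bytes(2, "little", signed=True)
                   + (16384).to_bytes(2, "little", signed=True))
```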
In step S102, the audio that meets the preset condition is selected from the audio to be detected, and the characteristic sequence of the audio to be detected is obtained from the selected audio.
After the audio to be detected is obtained, spectral feature extraction, feature filtering, screening and the like may be applied to it to obtain the required characteristic sequence. Here the preset condition may concern the sound intensity of the audio and can be set flexibly according to actual needs, and the characteristic sequence may include a frequency sequence screened out of the audio to be detected.
In some embodiments, selecting the audio that meets the preset condition from the audio to be detected and obtaining the characteristic sequence of the audio to be detected from the selected audio may include:
(1) preprocessing the audio to be detected to obtain preprocessed audio;
(2) obtaining the energy spectrum of the preprocessed audio;
(3) according to the energy spectrum, selecting the audio that meets the preset condition from the preprocessed audio, and setting the frequency sequence corresponding to the selected audio as the characteristic sequence of the audio to be detected.
First, to facilitate screening, the audio to be detected can be preprocessed. In some embodiments, preprocessing the audio to be detected may include: sampling the audio to be detected according to a preset sampling strategy to obtain sampled audio; framing the sampled audio according to a preset framing strategy to obtain framed audio; and windowing the framed audio to obtain discrete preprocessed audio.
Specifically, the audio to be detected can be sampled, framed, and windowed in sequence. Framing divides the audio into individual frames; for example, one minute of audio split at one frame per second yields 60 frames. Because framing may cause spectral energy leakage, the framed audio can further be windowed: a window (truncation) function intercepts the signal so that the spectral energy is more concentrated and closer to the true spectrum. After sampling, framing, and windowing, the audio becomes a signal of discrete amplitudes distributed along the time axis. For example, the audio to be detected can be sampled at 44100 Hz (or another sampling frequency) according to the preset sampling strategy, which can be any strategy satisfying the Nyquist sampling theorem. Then, using a frame length of, say, 512 or 1024 sampling points and a frame shift of one half or one third of the frame length, the sampled audio is framed; the framed audio can then be windowed with a Hamming window, a rectangular window, a Hanning window, or the like, yielding discrete preprocessed audio.
Here, frame length refers to the length of an audio data frame; for example, with 512 sampling points per frame at a sampling frequency of 44100 Hz, the frame length is 512/44100, or approximately 11.6 milliseconds. Frame shift refers to the overlap between two consecutive frames; for example, when consecutive frames overlap by half the frame length, the frame shift is half the frame length.
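The sampling, framing, and windowing steps described above can be sketched with NumPy as follows. This is a minimal illustration; the function name and parameter defaults are our own, not taken from the embodiment.

```python
import numpy as np

def frame_audio(signal, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping frames (frame shift = hop)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop: i * hop + frame_len]
                     for i in range(n_frames)])

# One second of a 440 Hz tone sampled at 44100 Hz.
sr = 44100
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)

frames = frame_audio(audio, frame_len=512, hop=256)  # hop = half the frame length
windowed = frames * np.hamming(512)                  # Hamming window per frame
print(frames.shape)               # (171, 512)
print(round(512 / sr * 1000, 1))  # frame length in ms: 11.6
```

With a hop of half the frame length, adjacent frames share 256 sampling points, matching the frame-shift example above.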
Next, the energy spectrum of the preprocessed audio is obtained. In some embodiments, obtaining the energy spectrum of the preprocessed audio may include: applying an integral transform to the preprocessed audio to obtain its spectrum, and determining the energy spectrum of the preprocessed audio from the spectrum.
The integral transform may be the Fourier transform, the Laplace transform, or the like; this embodiment is described using the Fourier transform. For example, a 2048-point or 1024-point short-time Fourier transform can be applied to the preprocessed audio to obtain the spectrum of each frame; the squared modulus of the spectrum then gives the energy spectrum of the preprocessed audio, which can be a matrix of the energy of each frame at each frequency.
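The energy spectrum computation above (squared modulus of the per-frame Fourier transform) can be sketched as follows; the function name and random test frames are illustrative assumptions.

```python
import numpy as np

def energy_spectrum(frames, n_fft=1024):
    """Per-frame energy spectrum: squared modulus of the windowed short-time FFT."""
    windowed = frames * np.hamming(frames.shape[1])
    spectrum = np.fft.rfft(windowed, n=n_fft, axis=1)  # one FFT per frame
    return np.abs(spectrum) ** 2                       # energy = |spectrum|^2

# 60 toy frames of 512 samples each.
frames = np.random.default_rng(0).standard_normal((60, 512))
S = energy_spectrum(frames, n_fft=1024)
print(S.shape)  # (60, 513): 60 frames x (1024/2 + 1) frequency bins
```

Each row of `S` is one frame's energy distribution over frequency, i.e. one row of the energy matrix described above.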
It should be noted that, besides the spectral features extracted via the Fourier transform, the features extracted in this embodiment of the present invention can also include other audio-processing parameters such as the short-time average zero-crossing rate, short-time energy, energy entropy, spectral centroid, spectral spread, spectral entropy, spectral flux, spectral roll-off, chroma features, and/or Mel-frequency cepstral coefficients; these different features suit different application scenarios.
Next, to filter out low-intensity interference, audio meeting the preset condition can be filtered out of the audio to be detected based on the energy spectrum of the preprocessed audio. In some embodiments, filtering out audio that meets the preset condition according to the energy spectrum may include: obtaining the sound intensity of the audio to be detected from the energy spectrum, and filtering out audio whose sound intensity exceeds a preset threshold, thereby obtaining audio whose sound intensity meets the preset condition.
For example, the energy spectrum S can be converted into a matrix P expressing sound intensity. The conversion formula can be:
P = a × log10(S / ref) (1)
where S is the energy spectrum matrix, P is the sound-intensity matrix, and a and ref are coefficients; for example, a can be 10 and ref can be 1 (or other values), and when S equals ref, P equals 0. The sound intensity of the audio to be detected can be determined from formula (1); at this point, audio whose sound intensity exceeds the preset threshold can be filtered out of the audio to be detected, yielding audio whose sound intensity meets the preset condition, so that low-intensity interference is removed. The preset threshold can be set flexibly according to actual needs, and its specific value is not limited here.
In some embodiments, filtering out audio whose sound intensity exceeds the preset threshold may include: normalizing the sound intensity of the audio to be detected into a preset sound-intensity range to obtain intensity-standardized audio, and then filtering out audio whose sound intensity exceeds the preset threshold from the intensity-standardized audio, yielding audio whose sound intensity meets the preset condition.
For example, the sound intensity P of the audio to be detected can be normalized into a range of 0 to b decibels (dB) to match the range of human auditory perception. The standardization formula is:
S_P = max(P, max(P) - b) (2)
where b can be set flexibly according to actual needs; for example, P can be normalized into 0 to 80 dB, i.e., b can be 80. S_P denotes the sound-intensity matrix of the intensity-standardized audio, and P the sound-intensity matrix before standardization.
A preset sound-intensity threshold can then be set: intensities in the standardized audio below the threshold are zeroed out, and intensities above the threshold are screened in, yielding audio whose sound intensity meets the preset condition. Since accompaniment, background sound, and the like in the audio to be detected are all interference, setting this threshold allows interference to be filtered out reasonably.
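Formulas (1) and (2) and the threshold screening can be sketched together as follows. The function names, coefficient values, and the example threshold are illustrative assumptions, not values fixed by the embodiment.

```python
import numpy as np

def power_to_db(S, a=10.0, ref=1.0, eps=1e-10):
    """Formula (1): P = a * log10(S / ref), so P = 0 where S == ref."""
    return a * np.log10(np.maximum(S, eps) / ref)

def standardize_and_screen(P, b=80.0, threshold=-30.0):
    """Formula (2): S_P = max(P, max(P) - b), then zero entries below threshold."""
    S_P = np.maximum(P, P.max() - b)
    return np.where(S_P >= threshold, S_P, 0.0)

S = np.array([[1.0, 100.0, 1e-9]])   # toy energy spectrum
P = power_to_db(S)
print(P)                             # [[  0.  20. -90.]]
print(standardize_and_screen(P))     # [[ 0. 20.  0.]]
```

Here a = 10 and ref = 1 reproduce the coefficient example above; the weakest component is first clamped by formula (2) and then zeroed by the threshold screening.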
After the audio meeting the preset condition is filtered out, the frequency sequence corresponding to the filtered audio can be set as the feature sequence of the audio to be detected. For example, the filtered audio can be sorted by sound intensity in descending order; the audio of maximum sound intensity is then extracted from the sorted audio, and its corresponding frequency sequence is the feature sequence of the audio to be detected.
For example, the filtered sound-intensity matrix S_P (the filtered audio) can be sorted by sound intensity in descending order; then the audio of greatest intensity is extracted from the sorted audio (e.g., the top six dimensions by sound intensity), and a frequency sequence of a preset dimension (e.g., six) is extracted from its frequency matrix, such as the frequency sequence of the six highest-intensity components of each frame. This frequency sequence is the final feature sequence of the audio to be detected.
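The per-frame top-n frequency screening described above might look like the following sketch; the function name and the toy matrices are our own.

```python
import numpy as np

def top_n_frequencies(intensity, freqs, n=6):
    """For each frame, return the frequencies of the n strongest components.

    intensity: (frames, bins) sound-intensity matrix
    freqs:     (bins,) frequency of each bin
    """
    idx = np.argsort(intensity, axis=1)[:, ::-1][:, :n]  # descending by intensity
    return freqs[idx]                                    # (frames, n) frequency sequence

intensity = np.array([[0.1, 0.9, 0.5, 0.3],
                      [0.7, 0.2, 0.8, 0.1]])
freqs = np.array([0.0, 100.0, 200.0, 300.0])
print(top_n_frequencies(intensity, freqs, n=2))
# frame 0 -> bins 1, 2 -> [100., 200.]; frame 1 -> bins 2, 0 -> [200., 0.]
```

With n = 6 this yields, per frame, the six-dimensional frequency sequence of maximum sound intensity described above.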
Compared with the prior art, which does not perform sufficient feature engineering (for example, audio features are not filtered or screened), and given that the audio to be detected naturally contains pauses and variations in strength, with the corresponding features differing in duration and magnitude in both the time and frequency domains, this embodiment of the present invention performs sufficient feature engineering on the audio to be detected: preprocessing it, obtaining the energy spectrum, filtering and sorting the spectral features by energy, and screening out the top n (e.g., n = 6) highest-energy feature dimensions, thereby reducing errors in the subsequent similarity determination.
It should be noted that when the audio to be detected contains interference such as accompaniment audio, for example when it includes both user audio and accompaniment audio, the accompaniment can be attenuated to improve the accuracy of the subsequent similarity determination. Optionally, in the process of obtaining the feature sequence of the audio to be detected, the root-mean-square (RMS) average energy of the user audio and the RMS average energy of the accompaniment audio can be obtained; the energy spectrum of the user audio and the energy spectrum of the accompaniment audio can be obtained; the audio to be detected can then be optimized according to the energy spectrum of the user audio, the RMS average energy of the user audio, the RMS average energy of the accompaniment audio, and the energy spectrum of the accompaniment audio, yielding optimized audio to be detected; and the feature sequence of the optimized audio to be detected can be obtained.
Here, optimization means attenuating or filtering interference such as the accompaniment included in the audio. Its purpose is to weaken the influence of interference, for example to reduce the effect of environmental noise on the similarity determination. Because the audio contains interference before optimization, the audio can be optimized to weaken the influence of interference on the similarity detection result; in the optimized audio, the interference has been attenuated or filtered out.
Optionally, obtaining the RMS average energy of the user audio and of the accompaniment audio may include: determining the RMS energy of the user audio and of the accompaniment audio; obtaining the frame count and frame length of each; and determining the RMS average energy of the user audio from its RMS energy, frame count, and frame length, and likewise for the accompaniment audio.
For example, first determine the RMS energy of each frame of the user audio, then obtain the frame count and frame length of the user audio; the RMS average energy of the user audio is determined from its RMS energy, frame count, and frame length. The calculation formula can be:
E = (1/M) × Σ_{i=1..M} sqrt( (1/N) × Σ_{n=1..N} x_i(n)² ) (3)
where M is the frame count, N is the frame length, and x_i(n) is the amplitude of the n-th sampling point of the i-th frame. The RMS average energy of the accompaniment audio can likewise be determined from formula (3).
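Formula (3) can be sketched as a direct translation into code; the toy frames below are illustrative.

```python
import math

def rms_average_energy(frames):
    """Formula (3): mean over M frames of the per-frame RMS amplitude,
    E = (1/M) * sum_i sqrt((1/N) * sum_n x_i(n)^2)."""
    M = len(frames)
    return sum(math.sqrt(sum(x * x for x in f) / len(f)) for f in frames) / M

# Two toy frames of N = 4 sampling points each.
frames = [[1.0, -1.0, 1.0, -1.0],   # per-frame RMS = 1.0
          [3.0, 3.0, -3.0, -3.0]]   # per-frame RMS = 3.0
print(rms_average_energy(frames))   # (1.0 + 3.0) / 2 = 2.0
```

The same function applies unchanged to the accompaniment audio, as noted above.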
At this point, the ratio between the RMS average energy of the user audio and the RMS average energy of the accompaniment audio can be determined, for example by dividing the former by the latter. The calculation formula can be:
ratio = E(user audio) / E(accompaniment audio) (4)
The ratio between the RMS average energies of the user audio and the accompaniment audio reflects the relative strength of the sound intensity of the user audio versus the accompaniment audio.
Then, the energy spectrum of the user audio and the energy spectrum of the accompaniment audio can be obtained, and the audio to be detected optimized according to these energy spectra and the ratio between the RMS average energies, yielding optimized audio to be detected. For example, the energy spectrum of the accompaniment audio, scaled by the ratio, is subtracted from the energy spectrum of the user audio. The calculation formula can be:
difference matrix = user-audio energy spectrum - accompaniment energy spectrum × ratio (5)
The difference matrix is the optimized audio to be detected: a feature matrix in which the accompaniment audio has been attenuated and the user audio (i.e., the vocal features) enhanced. The feature sequence of the optimized audio to be detected can then be obtained.
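Formulas (4) and (5) can be sketched together as follows. The function name is ours, and clamping negative energies to zero is our own addition for the sketch, not something stated in the embodiment.

```python
import numpy as np

def attenuate_accompaniment(user_spec, accomp_spec, user_rms, accomp_rms):
    """Formulas (4)-(5): scale the accompaniment energy spectrum by the RMS
    energy ratio and subtract it from the user-audio energy spectrum."""
    ratio = user_rms / accomp_rms            # formula (4)
    diff = user_spec - accomp_spec * ratio   # formula (5): difference matrix
    return np.maximum(diff, 0.0)             # clamp: energy cannot be negative

user_spec = np.array([[4.0, 6.0], [2.0, 8.0]])
accomp_spec = np.array([[1.0, 2.0], [1.0, 1.0]])
print(attenuate_accompaniment(user_spec, accomp_spec, user_rms=2.0, accomp_rms=1.0))
# ratio = 2.0 -> [[4-2, 6-4], [2-2, 8-2]] = [[2., 2.], [0., 6.]]
```

The returned matrix is the difference matrix of formula (5): the accompaniment contribution is weakened in proportion to how loud it is relative to the user audio.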
Compared with the prior art, which does not account for interference such as the accompaniment and environmental noise contained in the audio to be detected, so that when the audio has undergone extensive mixing, or the user audio and accompaniment differ greatly, directly determining the similarity leads to large errors, this embodiment of the present invention attenuates the accompaniment according to the relative strength of the user audio and the accompaniment, enhancing the user audio used for comparison. Whether or not the audio is a cappella, the similarity between the benchmark audio and the user audio can therefore be detected accurately.
In step S103, the reference feature sequence of the benchmark audio is obtained.
The benchmark audio can be obtained from a server or prerecorded. For example, in a song-scoring scenario, the original singer's audio and the accompaniment of a song can be downloaded from a server or prerecorded as the benchmark audio; in a voice-lock scenario, a segment of audio prerecorded by the user can serve as the benchmark audio (i.e., the voice lock). The reference feature sequence of the benchmark audio may include a frequency sequence, filtered out of the benchmark audio, that meets the preset condition; the reference feature sequence can be determined in advance and stored locally, or obtained by feature extraction from the benchmark audio when it is needed.
For example, in the process of obtaining the benchmark audio, it can be captured in an audio data format with a sample rate of 16 kHz (or another sample rate); the audio data can be a continuous PCM signal with a bit depth of 16 bits (or another bit rate).
Optionally, after the benchmark audio is obtained, target audio meeting the preset condition can be filtered out of the benchmark audio, and the reference feature sequence of the benchmark audio obtained from the filtered target audio.
Optionally, filtering out target audio meeting the preset condition from the benchmark audio and obtaining the reference feature sequence from the filtered target audio may include: preprocessing the benchmark audio to obtain preprocessed benchmark audio; obtaining the energy spectrum of the preprocessed benchmark audio; filtering out, according to the energy spectrum, target audio meeting the preset condition from the benchmark audio; and setting the frequency sequence corresponding to the filtered target audio as the reference feature sequence of the benchmark audio.
To facilitate screening, the benchmark audio can be preprocessed. Optionally, preprocessing the benchmark audio may include: sampling the benchmark audio according to a preset sampling strategy to obtain sampled benchmark audio; framing the sampled benchmark audio according to a preset framing strategy to obtain framed benchmark audio; and windowing the framed benchmark audio to obtain discrete-time preprocessed benchmark audio.
For example, the benchmark audio can be sampled at 44100 Hz (or another sampling frequency) according to the preset sampling strategy, which can be any strategy satisfying the Nyquist sampling theorem. Then, using a frame length of 512 or 1024 sampling points and a frame shift of one half or one third of the frame length according to the preset framing strategy, the sampled benchmark audio is framed; the framed benchmark audio can then be windowed with a Hamming window, a rectangular window, a Hanning window, or the like, yielding discrete-time preprocessed benchmark audio, i.e., a sequence of discrete-time audio signal amplitudes.
Optionally, obtaining the energy spectrum of the preprocessed benchmark audio may include: applying an integral transform to the preprocessed benchmark audio to obtain its spectrum, and determining the energy spectrum of the preprocessed benchmark audio from the spectrum.
The integral transform may be the Fourier transform, the Laplace transform, or the like. Taking the Fourier transform as an example, a 2048-point or 1024-point short-time Fourier transform can be applied to the preprocessed benchmark audio to obtain the spectrum of each frame; the squared modulus of the spectrum then gives the energy spectrum of the preprocessed benchmark audio, which can be a matrix of the energy of each frame of the benchmark audio at each frequency.
Optionally, filtering out target audio meeting the preset condition from the benchmark audio according to the energy spectrum may include: obtaining the sound intensity of the benchmark audio from the energy spectrum, and filtering out audio whose sound intensity exceeds a preset threshold from the benchmark audio to obtain target audio whose sound intensity meets the preset condition.
For example, the energy spectrum of the benchmark audio can be converted into sound intensity according to formula (1) above; at this point, audio whose sound intensity exceeds the preset threshold can be filtered out of the benchmark audio, yielding target audio whose sound intensity meets the preset condition, so that low-intensity interference is removed. The preset threshold can be set flexibly according to actual needs, and its specific value is not limited here.
Optionally, filtering out audio whose sound intensity exceeds the preset threshold from the benchmark audio may include: normalizing the sound intensity of the benchmark audio into a preset sound-intensity range to obtain intensity-standardized benchmark audio, and filtering out audio whose sound intensity exceeds the preset threshold from the intensity-standardized benchmark audio, yielding target audio whose sound intensity meets the preset condition.
For example, the sound intensity P of the benchmark audio can be normalized into 0 to 80 dB according to formula (2) above, matching the range of human auditory perception. A preset sound-intensity threshold can then be set: intensities in the standardized benchmark audio below the threshold are zeroed out, and intensities above the threshold are screened in, yielding target audio whose sound intensity meets the preset condition. Since accompaniment, background sound, and the like in the benchmark audio are all interference, setting this threshold allows interference to be filtered out reasonably.
After the target audio meeting the preset condition is filtered out, the frequency sequence corresponding to the filtered target audio can be set as the reference feature sequence of the benchmark audio. For example, the filtered target audio can be sorted by sound intensity in descending order; the audio of maximum sound intensity is then extracted from the sorted target audio, and its corresponding frequency sequence is the feature sequence of the benchmark audio, e.g., the frequency sequence of the six highest-intensity components of each frame, which is the final feature sequence of the benchmark audio. Since the audio to be detected naturally contains pauses and variations in strength, with the corresponding features differing in duration and magnitude in both the time and frequency domains, the benchmark audio is likewise preprocessed, its energy spectrum obtained, its spectral features filtered and sorted by energy, and the top n highest-energy feature dimensions screened out, thereby reducing errors in the subsequent similarity determination.
In some embodiments, when the benchmark audio includes target benchmark audio and interference audio, obtaining the reference feature sequence of the benchmark audio may include: obtaining a first RMS average energy of the target benchmark audio and a second RMS average energy of the interference audio; obtaining a first energy spectrum of the target benchmark audio and a second energy spectrum of the interference audio; optimizing the benchmark audio according to the first energy spectrum, the first RMS average energy, the second RMS average energy, and the second energy spectrum, yielding optimized benchmark audio; and obtaining the reference feature sequence of the optimized benchmark audio.
In some embodiments, obtaining the first and second RMS average energies may include: determining a first RMS energy of the target benchmark audio and a second RMS energy of the interference audio; obtaining a first frame count and first frame length of the target benchmark audio, and a second frame count and second frame length of the interference audio; and determining the first RMS average energy of the target benchmark audio from the first RMS energy, first frame count, and first frame length, and the second RMS average energy of the interference audio from the second RMS energy, second frame count, and second frame length.
For example, the first RMS average energy of the target benchmark audio and the second RMS average energy of the interference audio can be determined according to formula (3) above. The ratio between the first and second RMS average energies is then determined; next, the energy spectrum of the interference audio, scaled by this ratio, is subtracted from the energy spectrum of the target benchmark audio to optimize the benchmark audio, yielding optimized benchmark audio. The optimized benchmark audio can be a feature matrix in which the interference audio has been attenuated and the target benchmark audio enhanced; finally, the reference feature sequence of the optimized benchmark audio can be obtained. In this way, according to the relative strength of the target benchmark audio and the interference audio, the interference in the benchmark audio (e.g., accompaniment audio) is attenuated and the target benchmark audio used for comparison (e.g., the original singer's audio) is enhanced, so the similarity between the benchmark audio and the audio to be detected can be detected accurately.
In step S104, the similarity distance between the feature sequence of the audio to be detected and the reference feature sequence of the benchmark audio is obtained.
The similarity distance may include at least the edit distance, the Euclidean distance, and the Hamming distance. The edit distance can serve as the main component for measuring similarity; the Euclidean distance can measure the difference between coded sequences, penalizing the similarity result; the Hamming distance can measure the absolute consistency of coded sequences, giving positive feedback to the similarity result. These are described in detail below.
In some embodiments, obtaining the similarity distance between the feature sequence of the audio to be detected and the reference feature sequence of the benchmark audio may include: encoding the feature sequence of the audio to be detected according to a preset coding strategy to obtain a first coded feature sequence; encoding the reference feature sequence of the benchmark audio according to the preset coding strategy to obtain a second coded feature sequence; and determining the similarity distance between the first coded feature sequence and the second coded feature sequence.
To improve the accuracy and stability of the similarity determination, the feature sequence of the audio to be detected and the reference feature sequence of the benchmark audio can be encoded, and the similarity distance determined from the coded feature sequences. The preset coding strategy can be set flexibly according to actual needs; for example, it may include differential Manchester coding, non-return-to-zero inverted coding (NRZI), Manchester coding, extended Manchester coding, and the like.
In some embodiments, encoding the feature sequence of the audio to be detected according to the preset coding strategy to obtain the first coded feature sequence may include: comparing each pair of adjacent feature values in the feature sequence of the audio to be detected; when the former of two adjacent feature values is less than the latter, encoding the feature as a first code value; when the former equals the latter, encoding it as a second code value; when the former is greater than the latter, encoding it as a third code value; and generating the first coded feature sequence from the first, second, and/or third code values.
Taking extended Manchester coding as the preset coding strategy, its coding rule can be: if two adjacent feature values in the feature sequence change from low to high, the feature of the audio to be detected is encoded as the first code value, e.g., "1"; if they remain unchanged, as the second code value, e.g., "0"; if they change from high to low, as the third code value, e.g., "-1".
For example, starting from the first feature value in the feature sequence of the audio to be detected, the code for the first position can first be set to 0, and the first feature value then compared with the second; alternatively, the first feature value can be left uncoded and directly compared with the second. When the first feature value is less than the second, the code is "1"; when the first equals the second, "0"; and when the first is greater than the second, "-1". The second feature value is then compared with the third, and so on, until every pair of adjacent feature values in the sequence has been compared, yielding the first coded feature sequence corresponding to the audio to be detected. The first coded feature sequence consists of -1, 0, and 1, and can be used to characterize the rises and falls of the frequency features of the audio to be detected over the time scale.
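The extended Manchester coding rule above can be sketched as follows; coding the first position as 0 follows the first variant of the example, and the function name and toy sequence are ours.

```python
def extended_manchester(seq):
    """Encode rises/plateaus/falls between adjacent feature values as 1 / 0 / -1.
    The first position is coded 0, as in the example above."""
    codes = [0]
    for prev, cur in zip(seq, seq[1:]):
        if prev < cur:
            codes.append(1)    # low -> high: first code value
        elif prev == cur:
            codes.append(0)    # unchanged: second code value
        else:
            codes.append(-1)   # high -> low: third code value
    return codes

print(extended_manchester([220, 440, 440, 330, 550]))  # [0, 1, 0, -1, 1]
```

Applied to a frequency feature sequence, the output captures only the direction of change, which is what makes the coding robust to individual pitch differences, as discussed below.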
Likewise, for the benchmark audio, the reference feature sequence can be encoded according to the extended Manchester coding rule. In some embodiments, encoding the reference feature sequence of the benchmark audio according to the preset coding strategy to obtain the second coded feature sequence may include: comparing each pair of adjacent feature values in the feature sequence of the benchmark audio; when the former of two adjacent feature values is less than the latter, encoding the feature as the first code value; when the former equals the latter, encoding it as the second code value; when the former is greater than the latter, encoding it as the third code value; and generating the second coded feature sequence from the first, second, and/or third code values.
Since the audio to be detected and the benchmark audio are easily affected by individual and gender differences (for example, female voices are generally higher-pitched than male voices, and different people differ in fundamental frequency and in pronunciation duration when producing the same sound), eliminating such differences simply by setting thresholds and parameters is easily influenced by subjective factors and data scale, and is neither accurate nor stable enough. In this embodiment of the present invention, the feature sequence of the audio to be detected and the reference feature sequence of the benchmark audio are encoded with extended Manchester coding, and the similarity between the audio to be detected and the benchmark audio is characterized by the similarity of the coded feature sequences, eliminating the influence of interference factors such as accompaniment and individual and gender differences on the accuracy of the similarity detection result.
In some embodiments, the similarity distance includes at least the edit distance, the Euclidean distance and the Hamming distance, and determining the similarity distance between the first encoded feature sequence and the second encoded feature sequence may include: determining at least the edit distance, Euclidean distance and Hamming distance between the first encoded feature sequence and the second encoded feature sequence; and normalizing the edit distance, Euclidean distance and Hamming distance respectively to obtain the similarity distance.
Here, the edit distance between the two encoded feature sequences refers to the minimum number of edit operations required to convert one encoded feature sequence into the other. The larger the edit distance, the more the two encoded feature sequences differ; conversely, the smaller the edit distance, the fewer differences there are between them. An edit operation may include substituting one feature character for another, inserting a feature character, and deleting a feature character, where a feature character can be the "1", "0" or "-1" obtained by encoding. Determining the edit distance between the first encoded feature sequence and the second encoded feature sequence, i.e., the minimum number of edit operations required to convert the first encoded feature sequence into the second, allows the edit distance to measure the overall similarity of the two sequences, which better resolves the alignment problem caused by differences in pronunciation duration and the like.
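As a minimal sketch of the edit-distance computation described above (standard Levenshtein distance over the encoded feature characters; the patent does not prescribe a particular algorithm, so the single-row dynamic-programming form below is an implementation choice):

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b (Levenshtein distance)."""
    dp = list(range(len(b) + 1))  # dp[j]: distance from empty prefix to b[:j]
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,         # deletion
                                     dp[j - 1] + 1,     # insertion
                                     prev + (ca != cb)) # substitution
    return dp[-1]
```

The function works on any sequences of comparable elements, so it applies equally to lists of 1/0/-1 codes and to strings.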
The Euclidean distance refers to the straight-line distance between the first encoded feature sequence and the second encoded feature sequence regarded as two points in Euclidean space; in the embodiment of the present invention, the Euclidean distance is used to measure the degree of difference between the first encoded feature sequence and the second encoded feature sequence. For example, the second encoded feature sequence corresponding to the benchmark audio (such as the original-singer audio) can be written as (x1, x2, ..., xn) and the first encoded feature sequence corresponding to the audio to be detected (such as the user audio) as (y1, y2, ..., yn), where n is the length of the longer of the two encoded feature sequences; the value of n can be set flexibly according to actual needs, and a sequence shorter than n can be zero-padded. The Euclidean distance d2 between the first encoded feature sequence and the second encoded feature sequence can then be calculated as follows:

d2 = sqrt( (x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2 )
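A direct transcription of this formula, including the zero-padding to length n mentioned above (the helper name is illustrative):

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two code sequences; the shorter
    sequence is zero-padded to the longer length n before comparing."""
    n = max(len(x), len(y))
    x = list(x) + [0] * (n - len(x))
    y = list(y) + [0] * (n - len(y))
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```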
The Hamming distance refers to the number of positions at which the feature characters of the first encoded feature sequence and the second encoded feature sequence differ, i.e., the number of substitutions required to transform the first encoded feature sequence into the second. The Hamming distance can be used to measure the absolute, position-by-position consistency of the two sequences.
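The position-by-position count described here can be sketched as follows, assuming both encoded sequences have already been brought to the same length (e.g., by the zero-padding used for the Euclidean distance):

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length code sequences differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(xi != yi for xi, yi in zip(x, y))
```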
After obtaining the edit distance d1, the Euclidean distance d2 and the Hamming distance d3, the three distances can each be normalized. Since the raw values of d1, d2 and d3 may be large, and to facilitate the subsequent determination of the audio similarity, each distance is normalized into the range 0~1. For example, the edit distance d1 can be normalized according to the following formula (7) to obtain the normalized edit distance D1; the Euclidean distance d2 is normalized to obtain the normalized Euclidean distance D2; and the Hamming distance d3 is normalized to obtain the normalized Hamming distance D3. The normalized distances D1, D2 and D3 then serve as the similarity distance.
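The patent's normalization formula (7) is given only in its figures and is not reproduced here; as an illustrative assumption, the sketch below divides each distance by its maximum attainable value for length-n code sequences over the characters {-1, 0, 1}, which maps each into [0, 1]:

```python
def normalize_distances(d_edit, d_euclid, d_hamming, n):
    """Map raw distances into [0, 1] (assumed scheme, not the patent's
    formula (7)): divide by the maximum possible value for length n."""
    D1 = d_edit / n                    # at most n edit operations
    D2 = d_euclid / (2 * (n ** 0.5))  # per-position difference is at most 2
    D3 = d_hamming / n                # at most n differing positions
    return D1, D2, D3
```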
In step S105, the similarity between the audio to be detected and the benchmark audio is determined according to the similarity distance.
In some embodiments, determining the similarity between the audio to be detected and the benchmark audio according to the similarity distance may include: constructing, for each of the edit distance, the Euclidean distance and the Hamming distance, an affine function between that distance and a sub-similarity; determining the sub-similarity corresponding to each distance according to that distance's affine function; and determining the similarity between the audio to be detected and the benchmark audio according to the sub-similarities.
Here, establishing the affine function of the similarity with respect to the similarity distance means taking the normalized edit distance, Euclidean distance and Hamming distance as independent variables and the similarity as the dependent variable, and establishing a mapping relationship between them. The affine functions can be used to map the normalized edit distance, Euclidean distance and Hamming distance to sub-similarities normalized into the range 0~100.
To construct the affine function between each distance and its sub-similarity: the first affine function F(D1) between the sub-similarity and the edit distance D1 is established, with its expression shown in formula (8); the second affine function F(D2) between the sub-similarity and the Euclidean distance D2 is established, shown in formula (10); and the third affine function F(D3) between the sub-similarity and the Hamming distance D3 is established, shown in formula (12).
The values of n1 to n8 and n10 to n44 in formula (8) can be set flexibly according to actual needs; for example, after n1 to n8 and n10 to n44 take specific values, the first affine function F(D1) shown in formula (9) is obtained.
The values of c1 to c4 in formula (10) can be set flexibly according to actual needs; for example, after c1 to c4 take specific values, the second affine function F(D2) shown in formula (11) is obtained.
The values of m1 to m6 and m10 to m36 in formula (12) can be set flexibly according to actual needs; for example, after m1 to m6 and m10 to m36 take specific values, the third affine function F(D3) shown in formula (13) is obtained.
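The parameterized formulas (8)-(13) appear only in the patent's figures, so they are not reproduced here. As an assumed minimal instance of such an affine map, F(D) = -100·D + 100 sends identical sequences (D = 0) to a sub-similarity of 100 and maximally distant ones (D = 1) to 0:

```python
def affine_sub_similarity(D, a=-100.0, b=100.0):
    """Illustrative affine map from a normalized distance D in [0, 1]
    to a sub-similarity in [0, 100]; the coefficients a and b stand in
    for the patent's parameters n_i, c_i, m_i, which are not given
    in the text."""
    return a * D + b
```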
After obtaining the first affine function F(D1) corresponding to the edit distance D1, the second affine function F(D2) corresponding to the Euclidean distance D2 and the third affine function F(D3) corresponding to the Hamming distance D3, the first sub-similarity corresponding to D1 can be determined according to F(D1), the second sub-similarity corresponding to D2 according to F(D2), and the third sub-similarity corresponding to D3 according to F(D3). The similarity between the audio to be detected and the benchmark audio can then be determined from the first sub-similarity, the second sub-similarity and the third sub-similarity.
It should be noted that, when determining the sequence similarity, besides using the edit distance, Euclidean distance and Hamming distance, the similarity between the audio to be detected and the benchmark audio can also be determined using alignment algorithms such as dynamic time warping or the longest common substring.
In some embodiments, determining the similarity between the audio to be detected and the benchmark audio according to the sub-similarities may include: setting a first weight for the sub-similarity of the edit distance and a second weight for the sub-similarity of the Hamming distance; treating the sub-similarity of the Euclidean distance as a penalty term; and determining the similarity between the audio to be detected and the benchmark audio according to the first weight, the second weight and the penalty term.
For example, since the edit distance overcomes differences in pronunciation duration and pauses and has strong anti-interference capability, it can serve as the most important similarity component; since the Hamming distance measures the absolute consistency of the feature sequences, it can serve as an auxiliary similarity component; and since the Euclidean distance measures the geometric distance between the feature sequences and highlights their differences, it can serve as a penalty term in the similarity determination. Accordingly, a first weight can be set for the sub-similarity of the edit distance and a second weight for the sub-similarity of the Hamming distance, and the sub-similarity of the Euclidean distance can be treated as a penalty term, where the values of the first weight and the second weight can be set flexibly according to actual needs. The similarity between the audio to be detected and the benchmark audio is then determined according to the first weight, the second weight and the penalty term; the calculation formula can be as follows:
Here, SimilarityDegree denotes the similarity, and N denotes the number of feature dimensions in the feature sequence. For example, N can be 6: the similarity corresponding to each of the 6 encoded feature sequences is determined separately and then averaged, giving the similarity detection result between the audio to be detected and the benchmark audio. R1 denotes the first weight and R2 the second weight; their values can be set flexibly according to actual needs, for example R1 can be 0.7 and R2 can be 0.3, in which case the calculation formula for the similarity can be as follows:
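The exact combination formula is given only in the patent's figures. Under the stated roles (R1-weighted edit sub-similarity, R2-weighted Hamming sub-similarity, Euclidean sub-similarity as a penalty, averaged over N dimensions), one plausible combination might look like the sketch below; the way the penalty discounts the weighted sum is an assumption:

```python
def similarity_degree(sub_sims, R1=0.7, R2=0.3):
    """Combine per-dimension sub-similarities into one score in [0, 100].
    sub_sims: list of (edit_sim, euclid_sim, hamming_sim) tuples, one
    per feature dimension. Assumed scheme: the Euclidean sub-similarity
    acts as a penalty factor on the weighted sum, and the per-dimension
    scores are averaged."""
    scores = []
    for edit_sim, euclid_sim, hamming_sim in sub_sims:
        weighted = R1 * edit_sim + R2 * hamming_sim
        penalty = (100.0 - euclid_sim) / 100.0  # larger distance -> larger penalty
        scores.append(weighted * (1.0 - penalty))
    return sum(scores) / len(scores)
```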
The similarity determination thus unifies the edit distance, Euclidean distance and Hamming distance in a single similarity formula, and determines from the three distance values a similarity normalized into the range 0~100.
In some embodiments, after the step of determining the similarity between the audio to be detected and the benchmark audio according to the similarity distance, the audio similarity detection method may further include: when the similarity between the audio to be detected and the benchmark audio is greater than a preset similarity threshold, executing a virtual resource transfer operation and/or displaying related information about the similarity detection result of the audio to be detected.
For example, take a karaoke red envelope: this application scenario mainly involves playing the original-singer audio, the user singing, song scoring, detecting the similarity between the original-singer audio and the user audio, ranking the similarity, and claiming the red envelope. Specifically, a user first selects a segment of original-singer audio as the carrier of the red envelope. After a user clicks the red envelope, they can click the "preview" button, generating a play instruction based on which the audio similarity detection apparatus plays the original-singer audio so that the user can listen to it; alternatively, the user can directly click the "start singing" button, generating a collection instruction, and follow the accompaniment while imitating the original singer, with the audio similarity detection apparatus collecting the user audio based on the collection instruction. The collected user audio is then taken as the audio to be detected and the original-singer audio as the benchmark audio, and each is successively subjected to preprocessing, spectrum feature extraction, weakening of the accompaniment audio in the original-singer audio and the user audio, feature filtering and screening, extended Manchester encoding, similarity distance measurement, establishment of the affine functions of the similarity with respect to the distance measures, and similarity determination, yielding the similarity between the original-singer audio and the user audio. When the similarity is greater than the preset similarity threshold, the user can claim the red envelope, i.e., the audio similarity detection apparatus is triggered to execute the virtual resource transfer operation (the user claiming the red envelope here being the apparatus executing a virtual resource transfer operation), and the apparatus can display related information about the similarity detection result, such as the amount in the red envelope and the user's score. When the similarity is less than or equal to the preset similarity threshold, the user cannot claim the red envelope; related information about the similarity detection result can be displayed, such as a prompt to the user, after which the red envelope interface can be exited, and the user audio can be converted into a voice message carrying the score, the content of which can be the audio the user sang along with the accompaniment; and so on.
In some embodiments, after the step of determining the similarity between the audio to be detected and the benchmark audio according to the similarity distance, the audio similarity detection method may further include: when the similarity between the audio to be detected and the benchmark audio is greater than the preset similarity threshold, executing a voice-lock unlocking operation.
For example, in a voice-lock application scenario, a prerecorded benchmark audio can serve as the voice lock. When the audio similarity detection apparatus is not in use it is in a locked state; when unlocking is needed, the user imitates the benchmark audio to produce the audio to be detected, which is then successively subjected to preprocessing, spectrum feature extraction, feature filtering and screening, extended Manchester encoding, similarity distance measurement, establishment of the affine functions of the similarity with respect to the distance measures, and similarity determination, yielding the similarity between the audio to be detected and the benchmark audio. When the similarity is greater than the preset similarity threshold, the unlocking operation is executed; when the similarity is less than or equal to the preset similarity threshold, the lock is not released, and prompt information such as an unlock-failure message and the similarity between the audio to be detected and the benchmark audio can be displayed.
For example, a terminal such as a mobile phone, smartwatch, smart television or computer (i.e., the audio similarity detection apparatus) is in a screen-locked state when not in use. When unlocking is needed, the user imitates the benchmark audio towards the terminal; the terminal collects the audio to be detected, and when the similarity between the audio to be detected and the benchmark audio is greater than the preset similarity threshold, the terminal executes the unlocking operation, turns on, and enters the display interface. Alternatively, when the terminal is already unlocked and an application A needs to be opened, the user imitates the benchmark audio towards the terminal; the terminal collects the audio to be detected, and when the similarity between the audio to be detected and the benchmark audio is greater than the preset similarity threshold, the terminal executes the operation of opening application A. Alternatively, when the audio similarity detection apparatus is an access-control system, the user imitates the benchmark audio towards it when the door needs to be unlocked; the access-control system collects the audio to be detected and, when the similarity between the audio to be detected and the benchmark audio is greater than the preset similarity threshold, opens the door; and so on.
The embodiment of the present invention can thus stably and accurately detect the similarity between the audio to be detected and the benchmark audio. The similarity detection result is little affected by interfering factors such as accompaniment audio, environmental noise, and individual and gender differences, i.e., the influence of these factors on the similarity result is overcome, which solves the problem of a user obtaining a high similarity merely by playing the accompaniment or the original recording. The similarity between the audio to be detected and the benchmark audio can be detected whether or not the singing is a cappella, with good stability and higher accuracy of the similarity detection result.
From the above, the embodiment of the present invention can obtain the audio to be detected, screen out from it the audio satisfying the preset condition, and obtain the feature sequence of the audio to be detected from the screened audio, so that interfering sounds in the audio to be detected are filtered out and the required audio features are retained; the reference feature sequence of the benchmark audio is obtained likewise. Then, the similarity distance between the feature sequence of the audio to be detected and the reference feature sequence of the benchmark audio is obtained, such as the edit distance, Euclidean distance and Hamming distance; this similarity distance reduces the influence of many factors on the similarity detection result, and the similarity between the audio to be detected and the benchmark audio can then be determined according to the similarity distance, improving the accuracy of the audio similarity detection.
The method described in the above embodiment is described in further detail below by way of example.
This embodiment takes the audio similarity detection apparatus being a terminal as an example. The terminal can obtain a benchmark audio comprising the original-singer audio and the accompaniment audio, and obtain the audio to be detected recorded by the user; then the benchmark audio and the audio to be detected are successively subjected to S1 preprocessing, S2 spectrum feature extraction, S3 weakening of the accompaniment audio in the original-singer audio and the user audio, S4 feature filtering and screening, S5 extended Manchester encoding, S6 similarity distance measurement, S7 establishment of the affine functions of the similarity with respect to the distance measures, and S8 similarity calculation, yielding the similarity between the benchmark audio and the audio to be detected, as shown in Figure 3. Next, it is judged whether this similarity is greater than the preset similarity threshold; when it is, the virtual resource transfer operation can be executed and related information about the similarity detection result can be displayed.
Referring to Fig. 4, Fig. 4 is a flow diagram of the audio similarity detection method provided by an embodiment of the present invention. The method flow may include:
S201, the terminal obtains the audio to be detected and successively performs sampling, framing and windowing preprocessing on it, obtaining the preprocessed audio.
The terminal can obtain the audio of the user recording a song as the audio to be detected. For example, as shown in Figure 5, user A selects a segment of original-singer audio as the carrier of a red envelope, such as a karaoke red envelope for song XXX. After a user clicks the red envelope, they can choose to click the "preview" button to listen to the original-singer audio: activating the "preview" button generates a play instruction, based on which the terminal plays the original-singer audio, and the playback progress and lyrics can be shown in the display interface. Alternatively, the user can directly click the "start singing" button to generate a collection instruction and follow the accompaniment while imitating the original-singer audio; the terminal then collects the user audio based on the collection instruction, obtaining the audio to be detected.
To facilitate screening the audio to be detected, it can first be preprocessed, which includes: sampling the audio to be detected using a sampling strategy satisfying the Nyquist sampling theorem, e.g., with a sampling frequency of 44100 Hz or another sampling frequency, to obtain the sampled audio; then framing the sampled audio with a frame length of, e.g., 512 or 1024 samples and a frame shift of, e.g., one half or one third of the frame length, to obtain the framed audio; and finally windowing the framed audio with a Hamming window function, a rectangular window function, a Hann window function or the like, to obtain the preprocessed audio in the discrete time domain.
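The sample/frame/window pipeline of step S201 can be sketched as follows; the function assumes the audio is already a 1-D array sampled at, e.g., 44100 Hz, and uses a hop of half the frame length with a Hamming window, one of the combinations named above:

```python
import numpy as np

def preprocess(audio, frame_len=1024, hop=512):
    """Frame the sampled audio and multiply each frame by a Hamming
    window, giving the discrete-time-domain preprocessed audio."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack([audio[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return frames  # shape: (n_frames, frame_len)
```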
For example, as shown in Figs. 6(a) to 6(d), the benchmark audio may include the original-singer audio and the accompaniment audio, and the audio to be detected may be a male user audio or a female user audio. Fig. 6(a) can be the initial time-domain sample plot obtained after preprocessing the original-singer audio, Fig. 6(b) after preprocessing the accompaniment audio, Fig. 6(c) after preprocessing the male user audio, and Fig. 6(d) after preprocessing the female user audio.
S202, the terminal applies a Fourier transform to the preprocessed audio to obtain its spectrum, and determines the energy spectrum of the preprocessed audio from the spectrum.
The terminal can apply a short-time Fourier transform of, e.g., 2048 or 1024 points to the preprocessed audio, obtaining the spectrum corresponding to each frame of the preprocessed audio, from which a spectrum feature map can be generated. Taking the squared modulus of the spectrum of the preprocessed audio then gives the corresponding energy spectrum, which can be the matrix composed of the energy of each frame of audio distributed over each frequency.
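The per-frame FFT followed by a squared modulus can be sketched as follows, operating on the framed output of the preprocessing step (function names are illustrative):

```python
import numpy as np

def energy_spectrum(frames, n_fft=2048):
    """Per-frame FFT followed by squared modulus, giving the energy of
    each frame at each frequency bin (a frames x bins matrix)."""
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)  # one-sided spectrum
    return np.abs(spectrum) ** 2
```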
For example, as shown in Figs. 7(a) to 7(d), the benchmark audio may include the original-singer audio and the accompaniment audio, and the audio to be detected may be a male user audio or a female user audio. Fig. 7(a) can be the spectrum feature map obtained after applying the Fourier transform to the original-singer audio, Fig. 7(b) to the accompaniment audio, Fig. 7(c) to the male user audio, and Fig. 7(d) to the female user audio.
For example, as shown in Figure 8, taking the preprocessed audio being the user audio as an example, the terminal can pass the user audio through a 2048-point short-time Fourier transform and then extract the energy spectrum of the user audio, so that feature filtering and screening and the like can subsequently be performed based on this energy spectrum.
S203, the terminal obtains the sound intensity of the audio to be detected from the energy spectrum, screens out from the audio to be detected the audio whose sound intensity is greater than a preset threshold, and obtains the feature sequence of the audio to be detected from the screened audio.
To filter out interfering sounds with lower sound intensity, the terminal can, based on the energy spectrum of the preprocessed audio, screen out from the audio to be detected the audio whose sound intensity satisfies the preset condition. For example, as shown in Figure 9, the terminal can first standardize the feature matrix S of the energy spectrum into a sound intensity matrix P, then judge whether each sound intensity in the matrix P is greater than the preset threshold, setting those less than or equal to the preset threshold to zero and retaining (extracting) those greater than it; next, sort the retained sound intensities in descending order; and finally screen out from the sorted sound intensity matrix the frequency sequences of the 6 largest sound-intensity dimensions, obtaining the feature sequence of the audio to be detected.
Specifically, the terminal can convert the energy spectrum matrix S into the sound intensity matrix P according to the above formula (1), and can then screen out from the audio to be detected the audio whose sound intensity is greater than the preset threshold, so that interfering sounds with lower sound intensity are filtered out. The preset threshold can be set flexibly according to actual needs, and its specific value is not limited here.
Optionally, the terminal can normalize the sound intensity of the audio to be detected into a preset sound intensity range to obtain the sound-intensity-standardized audio, and screen out from the sound-intensity-standardized audio the audio whose sound intensity is greater than the preset threshold, obtaining the audio whose sound intensity satisfies the preset condition.
For example, the terminal can normalize the sound intensity P of the audio to be detected into 0~80 dB according to the above formula (2), which matches the auditory perception range of humans. A preset threshold on the sound intensity can then be set: sound intensities in the sound-intensity-standardized audio below the preset threshold are set to zero, and those above it are screened out, giving the audio whose sound intensity satisfies the preset condition. Since the accompaniment, background sounds and the like in the audio to be detected are all interfering sounds, setting this preset threshold filters interfering sounds reasonably.
The terminal can then sort the screened audio by sound intensity in descending order to obtain the sorted audio, extract from the sorted audio the preset audio with the largest sound intensities, and extract the frequency sequences of the preset dimensions from the frequency matrix of the preset audio, obtaining the feature sequence of the audio to be detected. For example, the frequency sequences of the 6 largest sound intensities of each frame can be extracted; these frequency sequences form the finally obtained feature sequence of the audio to be detected.
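The per-frame screening of S203 can be sketched as follows. The dB conversion and the threshold value are illustrative stand-ins: the patent's formulas (1) and (2) for converting the energy spectrum to sound intensity are not reproduced here.

```python
import numpy as np

def feature_sequence(energy, freqs, threshold_db=30.0, top_k=6):
    """Convert per-frame energy to a dB-style intensity, zero out
    intensities at or below the threshold (interfering sounds), then
    keep the frequencies of the top_k strongest bins of each frame."""
    intensity = 10.0 * np.log10(energy + 1e-12)  # rough intensity in dB
    intensity[intensity <= threshold_db] = 0.0   # drop weak bins
    top = np.argsort(intensity, axis=1)[:, ::-1][:, :top_k]  # strongest first
    return freqs[top]  # shape: (n_frames, top_k)
```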
Since the audio to be detected itself contains pauses and variations in strength, the corresponding features also differ in duration in the time domain and in magnitude in the frequency domain. By preprocessing the audio to be detected and filtering and sorting the spectrum features by energy in the embodiment of the present invention, e.g., screening out the 6 features with the largest energy, the error produced in the subsequent similarity determination can be reduced.
S204, the terminal obtains the root-mean-square mean energy values of the original-singer audio and the accompaniment audio in the benchmark audio, as well as the energy spectra of the original-singer audio and the accompaniment audio.
The benchmark audio may include the original-singer audio and the accompaniment audio, and may be obtained from a server or be a prerecorded song. The terminal can obtain the energy spectra of the original-singer audio and the accompaniment audio in the benchmark audio respectively. Optionally, the terminal can preprocess the benchmark audio to obtain the preprocessed benchmark audio, including: sampling the benchmark audio according to the preset sampling strategy to obtain the sampled benchmark audio; framing the sampled benchmark audio according to the preset framing strategy to obtain the framed benchmark audio; and windowing the framed benchmark audio to obtain the preprocessed benchmark audio in the discrete time domain. The terminal can then obtain the energy spectrum of the preprocessed benchmark audio, including: applying a Fourier transform to the preprocessed benchmark audio to obtain its spectrum, and determining the energy spectrum of the preprocessed benchmark audio from the spectrum.
Obtaining the root-mean-square mean energy values of the original-singer audio and the accompaniment audio in the benchmark audio may include: determining the first root-mean-square energy of the original-singer audio and the second root-mean-square energy of the accompaniment audio, for example according to the above formula (3); then obtaining the first frame count and first frame length of the original-singer audio, and the second frame count and second frame length of the accompaniment audio; and determining the root-mean-square mean energy value of the original-singer audio from the first root-mean-square energy, first frame count and first frame length, and the root-mean-square mean energy value of the accompaniment audio from the second root-mean-square energy, second frame count and second frame length.
For example, as shown in Figure 8, the benchmark audio includes an original vocal audio and an accompaniment audio. The terminal can pass the original vocal audio and the accompaniment audio through a 2048-point short-time Fourier transform, extract the energy spectrum of each, determine the RMS mean energy value of each, and determine the ratio between the RMS mean energy value of the original vocal audio and that of the accompaniment audio. Finally, the accompaniment energy spectrum, scaled by this ratio, can be subtracted from the energy spectrum of the original vocal audio, yielding a benchmark audio in which the accompaniment is weakened. This weakened benchmark audio can subsequently undergo feature filtering, screening, and so on, to obtain the feature sequence.
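The weakening itself can be sketched as ratio-weighted spectral subtraction. The subtraction of the ratio-scaled accompaniment energy spectrum follows the text; clipping negative values to zero is an added assumption, since an energy spectrum cannot be negative:

```python
import numpy as np

def weaken_accompaniment(vocal_spec, accomp_spec, vocal_rms, accomp_rms):
    """vocal_spec / accomp_spec: |STFT|^2 energy matrices of equal shape.
    Returns the vocal energy spectrum with the accompaniment spectrum,
    scaled by the ratio of RMS mean energies, subtracted out."""
    ratio = vocal_rms / accomp_rms
    return np.maximum(vocal_spec - ratio * accomp_spec, 0.0)
```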
S205: based on the RMS mean energy values and energy spectra of the original vocal audio and the accompaniment audio, the terminal weakens the accompaniment audio to obtain an optimized benchmark audio, and obtains a reference feature sequence of the optimized benchmark audio.

After obtaining the RMS mean energy values and energy spectra of the original vocal audio and the accompaniment audio, the terminal can determine the ratio between the RMS mean energy value of the original vocal audio and that of the accompaniment audio, then subtract the accompaniment energy spectrum scaled by this ratio from the energy spectrum of the original vocal audio, thereby optimizing the benchmark audio. The optimized benchmark audio can be a feature matrix in which the accompaniment audio is weakened and the original vocal audio is enhanced.
For example, as shown in Figures 10(a) to 10(d), the benchmark audio may include an original vocal audio and an accompaniment audio, and the audio to be detected may be a male user's audio or a female user's audio. Figure 10(a) may be the spectral feature map of the original vocal audio after accompaniment weakening and feature filtering; Figure 10(b) may be the spectral feature map of the accompaniment audio after feature filtering; Figure 10(c) may be the spectral feature map of the male user's audio after feature filtering; and Figure 10(d) may be the spectral feature map of the female user's audio after feature filtering.
The reference feature sequence of the optimized benchmark audio can then be obtained. For example, a target audio whose sound intensity is greater than a preset threshold can be screened out of the optimized benchmark audio. Optionally, the sound intensity of the optimized benchmark audio can first be normalized into a preset sound-intensity range, yielding an intensity-normalized benchmark audio; the target audio whose sound intensity exceeds the preset threshold is then screened out of the intensity-normalized benchmark audio, and the reference feature sequence of the benchmark audio is obtained from the screened-out target audio. For example, the screened-out target audio can be sorted by sound intensity in descending order; the preset audio with the greatest sound intensity is extracted from the sorted target audio, and a frequency sequence of a preset number of dimensions is extracted from the frequency matrix of the preset audio, giving the reference feature sequence of the benchmark audio. In this way, the accompaniment audio in the benchmark audio is weakened and the original vocal audio enhanced according to their relative strengths, so that the similarity between the benchmark audio and the audio to be detected can subsequently be detected accurately.
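The screening just described — normalize the intensity into a dB range, discard bins below the threshold, and keep the frequencies of the strongest bins per frame — might look like the following sketch. The 0..b dB normalization and the six-dimension choice follow the text; the example threshold value and the small epsilon are assumptions:

```python
import numpy as np

def reference_feature_sequence(energy_spec, freqs, threshold_db=5.0, n_dims=6):
    """energy_spec: (n_bins, n_frames) energy matrix; freqs: bin center
    frequencies. Returns an (n_dims, n_frames) matrix holding, per frame,
    the frequencies of the n_dims strongest bins above the threshold."""
    db = 10.0 * np.log10(energy_spec / energy_spec.max() + 1e-12)
    db -= db.min()                              # normalize into 0..b dB
    masked = np.where(db > threshold_db, energy_spec, 0.0)
    # per frame: bin indices sorted by energy, strongest first
    idx = np.argsort(masked, axis=0)[::-1][:n_dims]
    return freqs[idx]
```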
For example, as shown in Figures 11(a) to 11(c), the reference feature sequence of the obtained benchmark audio may include a six-dimensional feature sequence of the original vocal audio, and the feature sequence of the obtained audio to be detected may include a six-dimensional feature sequence of the male user's audio or a six-dimensional feature sequence of the female user's audio. Figure 11(a) may be the first-dimension feature sequence of the original vocal audio (the other five dimensions are not shown); Figure 11(b) may be the first-dimension feature sequence of the male user's audio (the other five dimensions are not shown); and Figure 11(c) may be the first-dimension feature sequence of the female user's audio (the other five dimensions are not shown).
S206: using extended Manchester coding, the terminal encodes the feature sequence of the audio to be detected and the reference feature sequence of the benchmark audio, obtaining coded feature sequences.

To improve the accuracy and stability of similarity determination, the feature sequence of the audio to be detected and the reference feature sequence of the benchmark audio can be encoded, for example with the coding rule of extended Manchester coding: if two adjacent feature values in a feature sequence change from low to high, the code is "1"; if they remain unchanged, the code is "0"; and if they change from high to low, the code is "-1".
For example, starting from the feature value in the first position of the feature sequence of the audio to be detected, the first-position feature value can be coded as 0 and then compared with the feature value in the second position; alternatively, the first-position feature value can be left uncoded and directly compared with the second-position feature value. When the first-position feature value is less than the second-position feature value, the code is "1"; when the two are equal, the code is "0"; and when the first-position feature value is greater than the second-position feature value, the code is "-1". The second-position feature value is then compared with the third-position feature value, and so on, until every pair of adjacent feature values in the feature sequence of the audio to be detected has been compared, yielding the coded feature sequence corresponding to the audio to be detected.

Likewise, the terminal can encode the reference feature sequence of the benchmark audio according to the same extended Manchester coding rule, obtaining the coded feature sequence corresponding to the benchmark audio.
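The coding rule above (rising pair → "1", unchanged → "0", falling → "-1", with the first position coded as 0 per the first variant described) can be sketched as:

```python
def extended_manchester_encode(seq):
    """Code each feature value against its predecessor; the first position
    is coded 0 (the text also allows leaving it uncoded)."""
    codes = [0]
    for prev, cur in zip(seq, seq[1:]):
        codes.append(1 if cur > prev else (-1 if cur < prev else 0))
    return codes
```

For instance, the sequence [3, 5, 5, 2] codes to [0, 1, 0, -1]; two singers producing the same rise-and-fall contour at different absolute pitches therefore yield identical codes, which is the point of the scheme.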
For example, as shown in Figures 12(a) to 12(c), the coded feature sequence of the benchmark audio may include a six-dimensional coded sequence of the original vocal audio, and the coded feature sequence of the audio to be detected may include a six-dimensional coded sequence of the male user's audio or of the female user's audio. Figure 12(a) may be the first-dimension coded sequence of the original vocal audio (the other five dimensions are not shown); Figure 12(b) may be the first-dimension coded sequence of the male user's audio (the other five dimensions are not shown); and Figure 12(c) may be the first-dimension coded sequence of the female user's audio (the other five dimensions are not shown).

Because the audio to be detected and the benchmark audio are easily affected by individual differences and gender (for example, a female voice is generally higher in frequency than a male voice, and different people differ in fundamental frequency and pronunciation length when uttering the same sound), extended Manchester coding is applied to the feature sequence of the audio to be detected and the reference feature sequence of the benchmark audio. The similarity between the audio to be detected and the benchmark audio is then characterized by the similarity of the coded feature sequences, eliminating the influence of interference factors such as accompaniment audio and individual and gender differences on the accuracy of the similarity detection result.
S207: the terminal determines the edit distance, Euclidean distance, and Hamming distance between the coded feature sequence of the audio to be detected and the coded feature sequence of the benchmark audio.

Here, the edit distance refers to the minimum number of edit operations required to convert the coded feature sequence of the audio to be detected into the coded feature sequence of the benchmark audio. The larger the edit distance, the more the two coded feature sequences differ; conversely, the smaller the edit distance, the fewer the differences. An edit operation may include substituting one feature character for another, inserting a feature character, or deleting a feature character, where a feature character can be a coded "1", "0", or "-1". Determining the edit distance between the two coded feature sequences thus amounts to determining the minimum number of edit operations needed for the conversion. The edit distance can be used to measure the overall similarity between the coded feature sequences, reducing the influence of alignment problems caused by differing pronunciation lengths on the similarity determination.
The Euclidean distance refers to the straight-line distance between the two coded feature sequences viewed as points in Euclidean space; it can be used to measure the degree of difference between the coded feature sequence of the audio to be detected and that of the benchmark audio, and can be determined according to the above formula (6).

The Hamming distance refers to the number of positions at which the corresponding feature characters of the two coded feature sequences differ, that is, the number of substitutions required to transform the coded feature sequence of the audio to be detected into that of the benchmark audio. The Hamming distance can be used to measure the position-wise absolute consistency between the two coded feature sequences.

After the edit distance, Euclidean distance, and Hamming distance are obtained, each can be normalized according to formula (7).
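The three distances can be sketched as follows; the edit distance is the standard Levenshtein recurrence over the coded characters, and the length-based normalization at the end is an assumption, since formula (7) is not reproduced in this excerpt:

```python
def edit_distance(a, b):
    """Minimum substitutions/insertions/deletions turning a into b."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            prev, row[j] = row[j], min(row[j] + 1,        # deletion
                                       row[j - 1] + 1,    # insertion
                                       prev + (ca != cb)) # substitution
    return row[-1]

def hamming_distance(a, b):
    """Number of positions where equal-length coded sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between equal-length coded sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def normalized(d, length):
    """Assumed normalization of a distance into [0, 1] by sequence length."""
    return d / length if length else 0.0
```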
S208: according to the affine functions between the edit distance, Euclidean distance, and Hamming distance and their sub-similarities, the terminal determines the sub-similarity corresponding to each distance, and determines the similarity between the audio to be detected and the benchmark audio from the sub-similarities.

For example, the terminal can construct an affine function between each of the edit, Euclidean, and Hamming distances and a corresponding sub-similarity, determine each sub-similarity from the affine function corresponding to each distance, and then determine the similarity between the audio to be detected and the benchmark audio from the sub-similarities.

Here, establishing an affine function of similarity with respect to a similarity distance means taking the normalized edit, Euclidean, and Hamming distances as independent variables and the similarity as the dependent variable, and establishing a mapping between the two; the affine functions map the normalized edit, Euclidean, and Hamming distances to sub-similarities normalized into the range 0 to 100.
For example, the first affine function F(D1) between the sub-similarity and the edit distance D1 may be as shown in the above formula (8); the second affine function F(D2) between the sub-similarity and the Euclidean distance D2 may be as shown in the above formula (10); and the third affine function F(D3) between the sub-similarity and the Hamming distance D3 may be as shown in the above formula (12). After obtaining these three affine functions, the first sub-similarity corresponding to the edit distance D1 can be determined from F(D1), the second sub-similarity corresponding to the Euclidean distance D2 from F(D2), and the third sub-similarity corresponding to the Hamming distance D3 from F(D3). The similarity between the audio to be detected and the benchmark audio can then be determined from the first, second, and third sub-similarities according to the above formula (14).
For example, since the edit distance can address differences in pronunciation length or pauses, it can serve as the most important component in determining similarity; since the Hamming distance measures the absolute consistency of the feature sequences, it can serve as an auxiliary component; and since the Euclidean distance measures the geometric distance and dissimilarity of the feature sequences, it can be treated as a penalty term. Accordingly, a first weight can be set for the sub-similarity of the edit distance, a second weight for the sub-similarity of the Hamming distance, and the sub-similarity of the Euclidean distance can be set as a penalty term; the similarity between the audio to be detected and the benchmark audio is then determined from the first weight, the second weight, and the penalty term.
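Since formulas (8)-(14) are not reproduced in this excerpt, the following is only a plausible sketch of the scheme described: each normalized distance is mapped through a decreasing affine function to a sub-similarity in 0..100, and the sub-similarities are combined with the edit distance weighted most heavily, the Hamming distance as an auxiliary, and the Euclidean term as a penalty. All coefficient values here are assumptions:

```python
def sub_similarity(d):
    """Assumed affine map from a normalized distance d in [0, 1] to a
    sub-similarity in [0, 100] (stands in for formulas (8), (10), (12))."""
    return 100.0 * (1.0 - d)

def overall_similarity(d_edit, d_euclid, d_hamming,
                       w_edit=0.7, w_hamming=0.3, penalty=0.2):
    """Combine the sub-similarities as the text describes: edit distance
    dominant, Hamming distance auxiliary, Euclidean term as a penalty.
    The weight values are illustrative, not the patent's formula (14)."""
    score = (w_edit * sub_similarity(d_edit)
             + w_hamming * sub_similarity(d_hamming)
             - penalty * 100.0 * d_euclid)
    return max(0.0, min(100.0, score))
```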
S209: when the similarity is greater than a preset similarity threshold, the terminal performs a virtual-resource transfer operation and displays information related to the similarity detection result of the audio to be detected.

After the similarity between the audio to be detected and the benchmark audio is obtained, it can be determined whether the similarity is greater than the preset similarity threshold. When it is, the user can receive a red packet; that is, the terminal is triggered to perform a virtual-resource transfer operation and, as shown for example in Figure 13, displays information related to the similarity detection result, such as the red-packet amount and the user's song. When the similarity is less than or equal to the preset similarity threshold, the user does not receive the red packet, and the terminal prompts the user to sing again, displaying information related to the similarity detection result such as "Not quite there, try again...", as shown for example in Figure 14. At this point, the red-packet interface can be exited, and the user's audio can be converted into a voice message with a score attached; the content of the voice message can be the audio of the user singing along with the accompaniment, as shown for example in Figure 15.
In this embodiment of the present invention, the audio to be detected can be sampled, framed, windowed, and processed to extract an energy spectrum; audio whose sound intensity exceeds a preset threshold is screened out of the processed audio, and the feature sequence of the audio to be detected is obtained from the screened-out audio. The RMS mean energy values, energy spectra, and so on, of the original vocal audio and the accompaniment audio in the benchmark audio are obtained to optimize the benchmark audio, and the reference feature sequence of the optimized benchmark audio is obtained. The feature sequence of the audio to be detected and the reference feature sequence of the benchmark audio are then encoded, and the similarity distances — such as the edit distance, Euclidean distance, and Hamming distance — between the coded feature sequence of the audio to be detected and that of the benchmark audio are determined. The similarity between the audio to be detected and the benchmark audio can then be determined from these similarity distances, allowing the similarity to be detected stably and accurately. The similarity detection result is less affected by interference factors such as accompaniment audio, environmental noise, and individual and gender differences, improving the accuracy of audio similarity detection.
To better implement the audio similarity detection method provided by the embodiments of the present invention, an embodiment of the present invention further provides an apparatus based on the above audio similarity detection method. The terms used have the same meanings as in the above audio similarity detection method; for implementation details, refer to the description in the method embodiments.
Referring to Figure 16, Figure 16 is a structural schematic diagram of the audio similarity detection apparatus provided by an embodiment of the present invention. The audio similarity detection apparatus may include an audio acquiring unit 301, a screening unit 302, a feature acquiring unit 303, a distance acquiring unit 304, a determination unit 305, and so on.
The audio acquiring unit 301 is configured to obtain the audio to be detected.

The audio acquiring unit 301 can obtain a song sung by the user as the audio to be detected in a song-scoring scenario, or obtain a passage recorded by the user as the audio to be detected in a voice-lock scenario, and so on. For example, the audio acquiring unit 301 can capture the user's speech or singing as the audio to be detected in an audio data format with a sample rate of 16 kHz or another sample rate; the obtained audio to be detected can be a continuous PCM signal with a bit depth of 16 bits or another bit depth.
The screening unit 302 is configured to screen out, from the audio to be detected, audio that meets a preset condition, and to obtain the feature sequence of the audio to be detected from the screened-out audio.

In some embodiments, as shown in Figure 17, the screening unit 302 may include:

a processing subunit 3021, configured to preprocess the audio to be detected to obtain preprocessed audio;

an obtaining subunit 3022, configured to obtain the energy spectrum of the preprocessed audio; and

a screening subunit 3023, configured to screen out, from the preprocessed audio according to the energy spectrum, audio that meets the preset condition, and to set the frequency sequence corresponding to the screened-out audio as the feature sequence of the audio to be detected.
First, to facilitate screening of the audio to be detected, the processing subunit 3021 can preprocess it. In some embodiments, the processing subunit 3021 is specifically configured to: sample the audio to be detected according to a preset sampling strategy to obtain sampled audio; frame the sampled audio according to a preset framing strategy to obtain framed audio; and window the framed audio to obtain discrete preprocessed audio.

Specifically, the processing subunit 3021 can successively sample, frame, and window the audio to be detected. For example, the audio to be detected can be sampled at a sampling frequency of 44100 Hz or another frequency according to the preset sampling strategy, which can be a strategy that satisfies the Nyquist sampling theorem. Then, according to the preset framing strategy — for example, a frame length of 512 or 1024 samples with a frame shift of half or one third of the frame length — the sampled audio is framed. The framed audio can then be windowed using a Hamming window function, a rectangular window function, a Hann window function, or the like, to obtain the discrete preprocessed audio.
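The framing and windowing chain can be sketched as follows (resampling to 44100 Hz is assumed to have been done already; the frame length, half-frame hop, and Hamming window follow the text):

```python
import numpy as np

def frame_and_window(audio, frame_length=1024, hop=512):
    """Split a sampled mono signal into overlapping frames (frame shift =
    half the frame length) and apply a Hamming window to each frame."""
    window = np.hamming(frame_length)
    n_frames = 1 + max(0, len(audio) - frame_length) // hop
    return np.stack([audio[i * hop: i * hop + frame_length] * window
                     for i in range(n_frames)])
```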
Next, the obtaining subunit 3022 obtains the energy spectrum of the preprocessed audio. In some embodiments, the obtaining subunit 3022 is specifically configured to: apply an integral transform to the preprocessed audio to obtain the spectrum corresponding to the preprocessed audio, and determine the energy spectrum of the preprocessed audio from the spectrum.

For example, the obtaining subunit 3022 can apply a 2048-point or 1024-point short-time integral transform to the preprocessed audio to obtain the spectrum of each frame, then take the squared magnitude of the spectrum to obtain the energy spectrum of the preprocessed audio. The energy spectrum can be a matrix composed of the energies that each frame carries at each frequency.
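That computation — a per-frame Fourier transform followed by the squared magnitude — can be sketched as:

```python
import numpy as np

def energy_spectrum(frames, n_fft=2048):
    """frames: (n_frames, frame_length) windowed frames. Returns the
    (n_bins, n_frames) energy matrix: each entry is the squared magnitude
    of the FFT, i.e. the energy a frame carries at that frequency bin."""
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)  # (n_frames, n_bins)
    return (np.abs(spectrum) ** 2).T                 # (n_bins, n_frames)
```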
Next, to filter out interference audio of lower sound intensity, audio whose sound intensity meets the preset condition can be screened out of the audio to be detected based on the energy spectrum of the preprocessed audio. In some embodiments, the screening subunit 3023 may include:

an obtaining module, configured to obtain the sound intensity of the audio to be detected from the energy spectrum; and

a screening module, configured to screen out, from the audio to be detected, audio whose sound intensity is greater than a preset threshold, obtaining the audio whose sound intensity meets the preset condition.

For example, the obtaining module can determine the sound intensity of the audio to be detected according to the above formula (1); the screening module can then screen out, from the audio to be detected, the audio whose sound intensity exceeds the preset threshold, obtaining the audio whose sound intensity meets the preset condition and filtering out interference audio of lower sound intensity. The preset threshold can be set flexibly according to actual needs, and its specific value is not limited here.

In some embodiments, the screening module is specifically configured to: normalize the sound intensity of the audio to be detected into a preset sound-intensity range, obtaining intensity-normalized audio; and screen out, from the intensity-normalized audio, audio whose sound intensity is greater than the preset threshold, obtaining the audio whose sound intensity meets the preset condition.
For example, the screening module can normalize the sound intensity P of the audio to be detected into 0 to b decibels (dB) according to the above formula (2), matching the range of human auditory perception. Sound intensities in the intensity-normalized audio that fall below the preset threshold can then be set to zero, while audio above the preset threshold is screened out, yielding the audio whose sound intensity meets the preset condition. Since accompaniment, background sounds, and the like in the audio to be detected are all interference audio, setting the preset threshold allows interference audio to be filtered out appropriately.
In some embodiments, after screening out the audio that meets the preset condition, the screening subunit 3023 can set the frequency sequence corresponding to the screened-out audio as the feature sequence of the audio to be detected. For example, the screened-out audio can be sorted by sound intensity in descending order to obtain sorted audio; the audio of maximum sound intensity is extracted from the sorted audio, and its corresponding frequency sequence is the feature sequence of the audio to be detected.

For example, a sorting subunit can sort the screened-out audio by sound intensity in descending order to obtain sorted audio; an extracting subunit then extracts the preset audio of maximum sound intensity from the sorted audio (for example, the audio of the six greatest sound intensities), and extracts a frequency sequence of a preset number of dimensions (for example, six) from the frequency matrix of the preset audio — for example, the frequency sequence of the six greatest sound intensities per frame. This frequency sequence is the finally obtained feature sequence of the audio to be detected.
Compared with the prior art, which does not perform sufficient feature engineering (for example, audio features are not filtered or screened), and given that the audio to be detected itself has characteristics such as pauses and variations in strength, with the corresponding features differing in length or magnitude in the time and frequency domains, the embodiments of the present invention perform sufficient feature-engineering processing on the audio to be detected: the audio is preprocessed, its energy spectrum is obtained, and the spectral features are filtered and sorted by energy, with the n (for example, n = 6) dimensions of greatest energy screened out, reducing the error in the subsequent similarity determination.
It should be noted that when interference audio such as accompaniment audio is present in the audio to be detected (for example, when the audio to be detected includes user audio and accompaniment audio), the screening unit 302 can weaken the accompaniment audio to improve the accuracy of the subsequent similarity determination. Optionally, in obtaining the feature sequence of the audio to be detected, the screening unit 302 can obtain the RMS mean energy value of the user audio and the RMS mean energy value of the accompaniment audio; obtain the energy spectrum of the user audio and the energy spectrum of the accompaniment audio; optimize the audio to be detected according to the energy spectrum of the user audio, the RMS mean energy value of the user audio, the RMS mean energy value of the accompaniment audio, and the energy spectrum of the accompaniment audio, obtaining optimized audio to be detected; and obtain the feature sequence of the optimized audio to be detected.

Optionally, the screening unit 302 can also determine the RMS energy of the user audio and the RMS energy of the accompaniment audio; obtain the frame count and frame length of the user audio and the frame count and frame length of the accompaniment audio; and determine the RMS mean energy value of the user audio from its RMS energy, frame count, and frame length, and the RMS mean energy value of the accompaniment audio from its RMS energy, frame count, and frame length. In the embodiments of the present invention, the accompaniment audio in the audio to be detected can be weakened and the user audio enhanced according to their relative strengths, so that the similarity between the benchmark audio and the user audio can be detected accurately.
The feature acquiring unit 303 is configured to obtain the reference feature sequence of the benchmark audio.

The benchmark audio can be obtained from a server or prerecorded. For example, in a song-scoring application scenario, the original vocal audio and accompaniment audio of a song can be downloaded from a server or prerecorded as the benchmark audio; in a voice-lock application scenario, a segment of audio prerecorded by the user can be obtained as the benchmark audio (i.e., the voice lock), and so on. The reference feature sequence of the benchmark audio may include a frequency sequence screened out of the benchmark audio as meeting a preset condition; the reference feature sequence can be determined in advance and stored locally, or obtained by feature extraction from the benchmark audio when it is needed.

Optionally, after obtaining the benchmark audio, the feature acquiring unit 303 can screen out, from the benchmark audio, a target audio that meets the preset condition, and obtain the reference feature sequence of the benchmark audio from the screened-out target audio. Optionally, the feature acquiring unit 303 can preprocess the benchmark audio to obtain preprocessed benchmark audio; obtain the energy spectrum of the preprocessed benchmark audio; screen out, from the benchmark audio according to the energy spectrum, the target audio that meets the preset condition; and set the frequency sequence corresponding to the screened-out target audio as the reference feature sequence of the benchmark audio.
Benchmark audio is screened for convenience, benchmark audio can be pre-processed, optionally, feature obtains single
Member 303 can sample benchmark audio according to default sampling policy, benchmark audio after being sampled;According to default framing plan
Sub-frame processing slightly is carried out to benchmark audio after sampling, obtains benchmark audio after framing;Benchmark audio after framing is carried out at adding window
Reason, obtains benchmark audio after the pretreatment of discrete time-domain.Optionally, feature acquiring unit 303 can be to reference note after pretreatment
Frequency carries out integral transformation, the corresponding frequency spectrum of benchmark audio after being pre-processed;Benchmark audio after pre-processing is determined according to frequency spectrum
Energy spectrum.Optionally, feature acquiring unit 303 can obtain the intensity of sound of benchmark audio according to energy spectrum;From benchmark audio
In filter out intensity of sound be greater than preset threshold audio, obtain the target audio that intensity of sound meets preset condition.Optionally,
The intensity of sound of benchmark audio can be normalized into preset sound strength range by feature acquiring unit 303, obtain intensity of sound
Standardized benchmark audio;The audio that intensity of sound is greater than preset threshold is filtered out from intensity of sound standardized benchmark audio, is obtained
Meet the target audio of preset condition to intensity of sound.
Optionally, the feature acquiring unit 303 may sort the screened target audio by sound intensity in descending order to obtain sorted target audio, and extract from the sorted target audio the audio with the largest sound intensity; the frequency sequence corresponding to that audio is the characteristic sequence of the benchmark audio. For example, the frequency sequence of the six largest sound intensities of each frame may be extracted, and that frequency sequence is the final characteristic sequence of the benchmark audio. Because the audio to be detected may itself contain pauses or variations in strength, and the corresponding features differ in length or magnitude in the time and frequency domains, pre-processing the benchmark audio, computing its energy spectrum, and filtering and sorting the spectral features by energy so as to retain only the n highest-energy dimensions can reduce the error introduced when the similarity is subsequently determined.
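The pipeline described above (framing, windowing, an integral transformation to an energy spectrum, then keeping the n strongest frequency components per frame) can be sketched as follows. This is a minimal illustration, not the patent's exact implementation: the frame length, hop size, window function, and default n are all assumed values.

```python
import numpy as np

def extract_feature_sequence(signal, sr=8000, frame_len=256, hop=128, n_top=6):
    """Frame, window, transform, and keep the n_top strongest frequencies
    per frame. Assumes len(signal) >= frame_len."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)                 # windowing step
    features = []
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        spectrum = np.fft.rfft(frame)              # integral transformation
        energy = np.abs(spectrum) ** 2             # energy spectrum
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
        top = np.argsort(energy)[-n_top:][::-1]    # strongest bins first
        features.append(freqs[top])
    return np.array(features)
```

A pure 1000 Hz tone, for instance, would yield 1000.0 as the first (strongest) frequency of each frame.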
In some embodiments, as shown in figure 18, when the benchmark audio includes target benchmark audio and interference audio, the feature acquiring unit 303 may include:
a mean value obtaining subunit 3031, configured to obtain a first root-mean-square average energy value of the target benchmark audio and a second root-mean-square average energy value of the interference audio;
an energy spectrum obtaining subunit 3032, configured to obtain a first energy spectrum of the target benchmark audio and a second energy spectrum of the interference audio;
an optimization subunit 3033, configured to optimize the benchmark audio according to the first energy spectrum, the first root-mean-square average energy value, the second root-mean-square average energy value, and the second energy spectrum, to obtain optimized benchmark audio;
a feature obtaining subunit 3034, configured to obtain the reference characteristic sequence of the optimized benchmark audio.
In some embodiments, the mean value obtaining subunit 3031 is specifically configured to: determine a first root-mean-square energy of the target benchmark audio and a second root-mean-square energy of the interference audio; obtain a first frame count and a first frame length of the target benchmark audio, and a second frame count and a second frame length of the interference audio; and determine the first root-mean-square average energy value of the target benchmark audio from the first root-mean-square energy, the first frame count, and the first frame length, and the second root-mean-square average energy value of the interference audio from the second root-mean-square energy, the second frame count, and the second frame length. In this way, according to the relative strength of the target benchmark audio and the interference audio, the interference audio in the benchmark audio (such as accompaniment audio) can be weakened and the target benchmark audio used for comparison (such as the original vocal audio) can be enhanced, so that the similarity between the benchmark audio and the audio to be detected can be detected accurately.
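One plausible reading of the root-mean-square average energy computed per frame and averaged over the frame count is sketched below. The patent does not reproduce its exact formula in this passage, so the averaging scheme here is an assumption for illustration only.

```python
import numpy as np

def rms_average_energy(frames):
    """Average per-frame RMS energy of a framed signal.

    `frames` has shape (frame_count, frame_length). Assumed scheme:
    RMS within each frame, then the mean across all frames.
    """
    per_frame_rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return float(per_frame_rms.mean())
```

The ratio of the two averages (target vs. interference) could then drive how strongly the interference component is attenuated.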
A distance acquiring unit 304, configured to obtain the similarity distance between the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio.
In some embodiments, as shown in figure 19, the distance acquiring unit 304 includes:
an encoding subunit 3041, configured to encode the characteristic sequence of the audio to be detected according to a preset encoding policy to obtain the characteristic sequence after the first encoding, and to encode the reference characteristic sequence of the benchmark audio according to the preset encoding policy to obtain the characteristic sequence after the second encoding;
a first determining subunit 3042, configured to determine the similarity distance between the characteristic sequence after the first encoding and the characteristic sequence after the second encoding.
In some embodiments, the encoding subunit 3041 is specifically configured to: according to the preset encoding policy, compare each pair of adjacent characteristic values in the characteristic sequence of the audio to be detected; when the former characteristic value of an adjacent pair is less than the latter, encode that position as a first encoded value; when the former characteristic value equals the latter, encode it as a second encoded value; and when the former characteristic value is greater than the latter, encode it as a third encoded value; and generate the characteristic sequence after the first encoding from the first, second, and/or third encoded values.
The preset encoding policy may be an extended Manchester encoding, whose rule may be: if an adjacent pair of characteristic values in the sequence changes from low to high, encode it as the first encoded value, for example "1"; if the pair remains unchanged, encode it as the second encoded value, for example "0"; if the pair changes from high to low, encode it as the third encoded value, for example "-1".
For example, starting from the first characteristic value of the characteristic sequence of the audio to be detected, the first characteristic value may itself be encoded as 0 and then compared with the second characteristic value; alternatively, the first characteristic value may be left unencoded and compared directly with the second. When the first characteristic value is less than the second, the pair is encoded as "1"; when the first equals the second, as "0"; and when the first is greater than the second, as "-1". The second characteristic value is then compared with the third, and so on, until every pair of adjacent characteristic values in the characteristic sequence of the audio to be detected has been compared, yielding the characteristic sequence after the first encoding corresponding to the audio to be detected. The characteristic sequence after the first encoding consists of -1, 0, and 1, and characterizes how the frequency features of the audio to be detected rise and fall over the time scale.
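The pairwise comparison rule above reduces to a short loop. This sketch follows the variant that leaves the first characteristic value unencoded, so an input of length N yields N-1 codes:

```python
def extended_manchester_encode(values):
    """Encode each adjacent pair of characteristic values as
    1 (low -> high), 0 (unchanged), or -1 (high -> low)."""
    codes = []
    for prev, curr in zip(values, values[1:]):
        if prev < curr:
            codes.append(1)       # rising
        elif prev == curr:
            codes.append(0)       # unchanged
        else:
            codes.append(-1)      # falling
    return codes
```

For example, the frequency sequence [300, 500, 500, 200] encodes to [1, 0, -1].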
Likewise, for the benchmark audio, the reference characteristic sequence may be encoded according to the same extended Manchester encoding rule. In some embodiments, the encoding subunit 3041 is specifically configured to: compare each pair of adjacent characteristic values in the characteristic sequence of the benchmark audio according to the preset encoding policy; when the former characteristic value of a pair is less than the latter, encode it as the first encoded value; when the former equals the latter, encode it as the second encoded value; and when the former is greater than the latter, encode it as the third encoded value; and generate the characteristic sequence after the second encoding from the first, second, and/or third encoded values.
The audio to be detected and the benchmark audio are easily affected by individual differences and by gender; for example, female voices are generally higher in frequency than male voices, different people differ in the fundamental frequency with which they pronounce the same sound, and pronunciation lengths also differ. If such individual differences were eliminated simply by setting thresholds and parameters, the result would be vulnerable to subjective factors and to the scale of the data, and therefore insufficiently accurate and stable. In the embodiment of the present invention, by contrast, the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio are encoded with the extended Manchester encoding, and the similarity between the audio to be detected and the benchmark audio is characterized by the similarity of the encoded characteristic sequences, which eliminates the influence of interference factors such as accompaniment audio, individual differences, and gender differences on the accuracy of the similarity detection result.
In some embodiments, the similarity distance includes at least an edit distance, a Euclidean distance, and a Hamming distance. The first determining subunit 3042 is specifically configured to: determine at least the edit distance, the Euclidean distance, and the Hamming distance between the characteristic sequence after the first encoding and the characteristic sequence after the second encoding; and normalize the edit distance, the Euclidean distance, and the Hamming distance respectively to obtain the similarity distance.
The edit distance refers to the minimum number of edit operations required to convert one of the two encoded characteristic sequences into the other. The larger the edit distance, the more the two encoded characteristic sequences differ; conversely, the smaller the edit distance, the fewer the differences. The edit operations may include substituting one characteristic character for another, inserting a characteristic character, and deleting a characteristic character, where a characteristic character may be the "1", "0", or "-1" produced by the encoding. The first determining subunit 3042 determines the edit distance between the characteristic sequence after the first encoding and the characteristic sequence after the second encoding, that is, the minimum number of edit operations required to convert the former into the latter. The edit distance measures the overall similarity of the two characteristic sequences and thus handles well the alignment problems caused by differences in pronunciation length and the like.
The Euclidean distance refers to the straight-line distance between the characteristic sequence after the first encoding and the characteristic sequence after the second encoding, viewed as two points in Euclidean space. In the embodiment of the present invention, the Euclidean distance measures the degree of difference between the two encoded characteristic sequences. For example, the first determining subunit 3042 may determine the Euclidean distance d2 between the characteristic sequence after the first encoding and the characteristic sequence after the second encoding according to the above-mentioned formula (6).
The Hamming distance refers to the number of positions at which the characteristic sequence after the first encoding and the characteristic sequence after the second encoding have different characteristic characters, that is, the number of substitutions required to transform one into the other. The Hamming distance measures the absolute consistency of the two sequences at corresponding positions.
After obtaining the edit distance d1, the Euclidean distance d2, and the Hamming distance d3, the first determining subunit 3042 may normalize each of them respectively to obtain the similarity distance.
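The three raw distances over the encoded sequences (values in {-1, 0, 1}) can be computed directly; this is a generic sketch, since the patent's normalization formulas are not reproduced in this passage. The Euclidean and Hamming distances as shown assume equal-length sequences.

```python
import math

def edit_distance(a, b):
    """Minimum insertions, deletions, and substitutions (Levenshtein)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def euclidean_distance(a, b):
    """Straight-line distance between the sequences as points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming_distance(a, b):
    """Count of positions whose characteristic characters differ."""
    return sum(x != y for x, y in zip(a, b))
```

Each raw distance would then be normalized (for instance, divided by its maximum possible value for the given sequence lengths) before the mapping to sub-similarities.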
A determination unit 305, configured to determine the similarity between the audio to be detected and the benchmark audio according to the similarity distance.
In some embodiments, as shown in figure 20, the determination unit 305 includes:
a construction subunit 3051, configured to construct, for each of the edit distance, the Euclidean distance, and the Hamming distance, an affine function between that distance and a sub-similarity;
a second determining subunit 3052, configured to determine the sub-similarity corresponding to each distance according to that distance's affine function;
a third determining subunit 3053, configured to determine the similarity between the audio to be detected and the benchmark audio according to the sub-similarities.
Here, the construction subunit 3051 establishing an affine function of the similarity distance means taking the normalized edit distance, Euclidean distance, and Hamming distance as independent variables and the similarity as the dependent variable, and establishing the mapping between the two. Using the affine functions, the normalized edit distance, Euclidean distance, and Hamming distance are mapped to sub-similarities normalized into the range 0 to 100.
For example, the construction subunit 3051 may establish a first affine function F(D1) between the sub-similarity and the edit distance D1, whose expression is shown in the above-mentioned formula (8); a second affine function F(D2) between the sub-similarity and the Euclidean distance D2, whose expression is shown in the above-mentioned formula (10); and a third affine function F(D3) between the sub-similarity and the Hamming distance D3, whose expression is shown in the above-mentioned formula (12).
After obtaining the first affine function F(D1) corresponding to the edit distance D1, the second affine function F(D2) corresponding to the Euclidean distance D2, and the third affine function F(D3) corresponding to the Hamming distance D3, the second determining subunit 3052 may determine the first sub-similarity corresponding to the edit distance D1 from F(D1), the second sub-similarity corresponding to the Euclidean distance D2 from F(D2), and the third sub-similarity corresponding to the Hamming distance D3 from F(D3). The third determining subunit 3053 may then determine the similarity between the audio to be detected and the benchmark audio from the first, second, and third sub-similarities.
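Since formulas (8), (10), and (12) are not reproduced in this passage, the following is only one plausible affine form consistent with the described behavior: a normalized distance of 0 (identical sequences) maps to a sub-similarity of 100, and a normalized distance of 1 maps to 0.

```python
def affine_sub_similarity(d_normalized, scale=100.0):
    """Map a normalized distance in [0, 1] to a sub-similarity in
    [0, 100] via an assumed decreasing affine function."""
    return scale * (1.0 - d_normalized)
```

Any decreasing affine function of the normalized distance would exhibit the same qualitative mapping; the slope and intercept here are assumptions.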
In some embodiments, the third determining subunit 3053 is specifically configured to: set a first weight value for the sub-similarity of the edit distance and a second weight value for the sub-similarity of the Hamming distance; set the sub-similarity of the Euclidean distance as a penalty term; and determine the similarity between the audio to be detected and the benchmark audio according to the first weight value, the second weight value, and the penalty term.
For example, since the edit distance tolerates differences in pronunciation length, pauses, and the like, and has strong anti-interference capability, it may serve as the most important component in determining the similarity; since the Hamming distance measures the absolute consistency of the characteristic sequences, it may serve as an auxiliary component; and since the Euclidean distance measures the geometric distance between the characteristic sequences and highlights their differences, it may serve as the penalty term in determining the similarity. In this case, the third determining subunit 3053 may set a first weight value for the sub-similarity of the edit distance, set a second weight value for the sub-similarity of the Hamming distance, and set the sub-similarity of the Euclidean distance as the penalty term, where the first and second weight values may be set flexibly according to actual needs. The third determining subunit 3053 then determines the similarity between the audio to be detected and the benchmark audio according to the first weight value, the second weight value, and the penalty term; the calculation may be as shown in the above-mentioned formula (14).
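Formula (14) is likewise not reproduced in this passage, so the combination below is only a sketch of the described scheme: a weighted sum of the edit- and Hamming-distance sub-similarities, with the Euclidean-distance sub-similarity acting as a penalty. The weights are illustrative assumptions, not values taken from the patent.

```python
def combined_similarity(sim_edit, sim_hamming, sim_euclid,
                        w_edit=0.7, w_hamming=0.3, penalty_weight=0.1):
    """Combine sub-similarities (each in [0, 100]) into one score.

    Assumed form: weighted sum of edit and Hamming sub-similarities,
    minus a penalty proportional to the Euclidean-distance gap.
    """
    score = w_edit * sim_edit + w_hamming * sim_hamming
    score -= penalty_weight * (100.0 - sim_euclid)  # geometric-gap penalty
    return max(0.0, min(100.0, score))              # clip to [0, 100]
```

With this form, identical encoded sequences (all sub-similarities 100) score 100, and the Euclidean penalty only lowers the score when the geometric gap is large.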
In some embodiments, the audio similarity detection device may further include: a resource transfer unit, configured to execute a virtual resource transfer operation and/or display information related to the similarity detection result of the audio to be detected when the similarity between the audio to be detected and the benchmark audio is greater than a preset similarity threshold.
In some embodiments, the audio similarity detection device may further include: an unlocking unit, configured to execute an audio unlock operation when the similarity between the audio to be detected and the benchmark audio is greater than the preset similarity threshold.
As can be seen from the above, in the embodiment of the present invention the audio acquiring unit 301 obtains the audio to be detected, the screening unit 302 screens out from the audio to be detected the audio that meets the preset condition, and the characteristic sequence of the audio to be detected is obtained from the screened audio, so that interference audio in the audio to be detected is filtered out and the required audio features are retained; the feature acquiring unit 303 obtains the reference characteristic sequence of the benchmark audio. The distance acquiring unit 304 then obtains the similarity distance between the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio, for example the edit distance, the Euclidean distance, and the Hamming distance, which reduces the influence of many factors on the similarity detection result. Finally, the determination unit 305 determines the similarity between the audio to be detected and the benchmark audio according to the similarity distance, improving the accuracy of audio similarity detection.
Correspondingly, an embodiment of the present invention further provides a computer device, which may be a terminal such as a tablet computer, a mobile phone, or a laptop. As shown in figure 21, the computer device may include a radio frequency (RF, Radio Frequency) circuit 601, a memory 602 including one or more computer-readable storage media, an input unit 603, a display unit 604, a sensor 605, an audio circuit 606, a Wireless Fidelity (WiFi, Wireless Fidelity) module 607, a processor 608 including one or more processing cores, a power supply 609, and other components. Those skilled in the art will appreciate that the computer device structure shown in Figure 21 does not constitute a limitation on the computer device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently. In particular:
The RF circuit 601 may be used to receive and send signals during information transmission and reception or during a call; in particular, after receiving downlink information from a base station, it hands the information to the one or more processors 608 for processing, and it sends uplink data to the base station. In general, the RF circuit 601 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM, Subscriber Identity Module) card, a transceiver, a coupler, a low-noise amplifier (LNA, Low Noise Amplifier), a duplexer, and so on. In addition, the RF circuit 601 may communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and so on.
The memory 602 may be used to store software programs and modules; the processor 608 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required by at least one function (such as a sound-playing function or an image-playing function), and so on, and the data storage area may store data created during use of the computer device (such as audio data or a phone book). In addition, the memory 602 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Correspondingly, the memory 602 may further include a memory controller to provide the processor 608 and the input unit 603 with access to the memory 602.
The input unit 603 may be used to receive input numbers or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. Specifically, in one embodiment, the input unit 603 may include a touch-sensitive surface and other input devices. The touch-sensitive surface, also called a touch display screen or a touchpad, collects touch operations by the user on or near it (such as operations performed on or near the touch-sensitive surface with a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connection apparatus according to a preset program. Optionally, the touch-sensitive surface may include a touch detection apparatus and a touch controller. The touch detection apparatus detects the user's touch position, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 608, and receives and executes commands sent by the processor 608. Furthermore, the touch-sensitive surface may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch-sensitive surface, the input unit 603 may also include other input devices, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and an on/off key), a trackball, a mouse, a joystick, and the like.
The display unit 604 may be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the computer device; these graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof. The display unit 604 may include a display panel, which may optionally be configured in the form of a liquid crystal display (LCD, Liquid Crystal Display), an organic light-emitting diode (OLED, Organic Light-Emitting Diode) display, or the like. Further, the touch-sensitive surface may cover the display panel; after detecting a touch operation on or near it, the touch-sensitive surface transmits the operation to the processor 608 to determine the type of the touch event, and the processor 608 then provides a corresponding visual output on the display panel according to the type of the touch event. Although in Figure 21 the touch-sensitive surface and the display panel implement input and output functions as two separate components, in some embodiments the touch-sensitive surface and the display panel may be integrated to implement both input and output functions.
The computer device may further include at least one sensor 605, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor may adjust the brightness of the display panel according to the brightness of the ambient light, and the proximity sensor may turn off the display panel and/or the backlight when the computer device is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor may detect the magnitude of acceleration in all directions (generally three axes), may detect the magnitude and direction of gravity when stationary, and may be used for applications that recognize the posture of the computer device (such as landscape/portrait switching, related games, and magnetometer pose calibration), vibration-recognition functions (such as a pedometer and tapping), and so on. Other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor may also be configured for the computer device; details are not described herein.
The audio circuit 606, a loudspeaker, and a microphone may provide an audio interface between the user and the computer device. The audio circuit 606 may transmit the electrical signal converted from received audio data to the loudspeaker, which converts it into a sound signal for output; on the other hand, the microphone converts a collected sound signal into an electrical signal, which the audio circuit 606 receives and converts into audio data. After the audio data is processed by the processor 608, it is sent, for example, to another computer device via the RF circuit 601, or output to the memory 602 for further processing. The audio circuit 606 may also include an earphone jack to provide communication between a peripheral earphone and the computer device.
WiFi is a short-range wireless transmission technology. Through the WiFi module 607, the computer device can help the user send and receive emails, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. Although Figure 21 shows the WiFi module 607, it can be understood that the module is not an essential component of the computer device and may be omitted as needed without changing the essence of the invention.
The processor 608 is the control center of the computer device. It connects the various parts of the entire computer device through various interfaces and lines, and executes the various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the memory 602 and calling the data stored in the memory 602, thereby monitoring the computer device as a whole. Optionally, the processor 608 may include one or more processing cores; preferably, the processor 608 may integrate an application processor, which mainly handles the operating system, the user interface, applications, and so on, and a modem processor, which mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 608.
The computer device further includes a power supply 609 (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the processor 608 through a power management system, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management system. The power supply 609 may also include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and any other such components.
Although not shown, the computer device may also include a camera, a Bluetooth module, and the like; details are not described herein. Specifically, in this embodiment, the processor 608 in the computer device loads the executable files corresponding to the processes of one or more applications into the memory 602 according to the following instructions, and the processor 608 runs the applications stored in the memory 602 to implement various functions:
obtaining audio to be detected; screening out audio that meets a preset condition from the audio to be detected, and obtaining the characteristic sequence of the audio to be detected from the screened audio; obtaining the reference characteristic sequence of the benchmark audio; obtaining the similarity distance between the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio; and determining the similarity between the audio to be detected and the benchmark audio according to the similarity distance.
Optionally, by running the application stored in the memory 602, the processor 608 may also implement the following functions: pre-processing the audio to be detected to obtain pre-processed audio; obtaining the energy spectrum of the pre-processed audio; screening out the audio that meets the preset condition from the pre-processed audio according to the energy spectrum; and setting the frequency sequence corresponding to the screened audio as the characteristic sequence of the audio to be detected.
Optionally, by running the application stored in the memory 602, the processor 608 may also implement the following functions: obtaining the first root-mean-square average energy value of the target benchmark audio and the second root-mean-square average energy value of the interference audio; obtaining the first energy spectrum of the target benchmark audio and the second energy spectrum of the interference audio; optimizing the benchmark audio according to the first energy spectrum, the first root-mean-square average energy value, the second root-mean-square average energy value, and the second energy spectrum, to obtain optimized benchmark audio; and obtaining the reference characteristic sequence of the optimized benchmark audio.
Optionally, by running the application stored in the memory 602, the processor 608 may also implement the following functions: encoding the characteristic sequence of the audio to be detected according to the preset encoding policy to obtain the characteristic sequence after the first encoding, and encoding the reference characteristic sequence of the benchmark audio according to the preset encoding policy to obtain the characteristic sequence after the second encoding; and determining the similarity distance between the characteristic sequence after the first encoding and the characteristic sequence after the second encoding.
Optionally, by running the application stored in the memory 602, the processor 608 may also implement the following functions: determining at least the edit distance, the Euclidean distance, and the Hamming distance between the characteristic sequence after the first encoding and the characteristic sequence after the second encoding; and normalizing the edit distance, the Euclidean distance, and the Hamming distance respectively to obtain the similarity distance.
Optionally, by running the application stored in the memory 602, the processor 608 may also implement the following functions: constructing, for each of the edit distance, the Euclidean distance, and the Hamming distance, an affine function between that distance and a sub-similarity; determining the sub-similarity corresponding to each distance according to that distance's affine function; and determining the similarity between the audio to be detected and the benchmark audio according to the sub-similarities.
In the above embodiments, each embodiment is described with its own emphasis. For parts not described in detail in one embodiment, refer to the detailed description of the audio similarity detection method above; details are not repeated here.
As can be seen from the foregoing, the embodiment of the present invention can obtain audio to be detected, screen out from it the audio that meets a preset condition, and obtain the characteristic sequence of the audio to be detected from the screened audio, so that interference audio within the audio to be detected is filtered out and the required audio features are extracted; the reference characteristic sequence of the benchmark audio is also obtained. Then, the similarity distance between the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio is obtained, for example the edit distance, Euclidean distance and Hamming distance. Because the similarity distance reduces the influence of many factors on the similarity detection result, the similarity between the audio to be detected and the benchmark audio can then be determined from the similarity distance, improving the accuracy of audio similarity detection.
Those skilled in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by instructions, or by instructions controlling related hardware; the instructions can be stored in a computer-readable storage medium, and loaded and executed by a processor.
To this end, an embodiment of the present invention provides a storage medium storing a plurality of instructions, which can be loaded by a processor to execute the steps in any audio similarity detection method provided by the embodiments of the present invention. For example, the instructions can execute the following steps:
obtaining audio to be detected; screening out the audio that meets a preset condition from the audio to be detected, and obtaining the characteristic sequence of the audio to be detected from the screened audio; obtaining the reference characteristic sequence of a benchmark audio; obtaining the similarity distance between the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio; and determining the similarity between the audio to be detected and the benchmark audio according to the similarity distance.
Optionally, the instructions can also execute the following steps: preprocessing the audio to be detected to obtain preprocessed audio; obtaining the energy spectrum of the preprocessed audio; screening out, according to the energy spectrum, the audio that meets the preset condition from the preprocessed audio, and setting the frequency sequence corresponding to the screened audio as the characteristic sequence of the audio to be detected.
Optionally, the instructions can also execute the following steps: obtaining a first root-mean-square average energy value of the target benchmark audio, and obtaining a second root-mean-square average energy value of the interference audio; obtaining a first energy spectrum of the target benchmark audio, and obtaining a second energy spectrum of the interference audio; optimizing the benchmark audio according to the first energy spectrum, the first root-mean-square average energy value, the second root-mean-square average energy value and the second energy spectrum, to obtain an optimized benchmark audio; and obtaining the reference characteristic sequence of the optimized benchmark audio.
Optionally, the instructions can also execute the following steps: encoding the characteristic sequence of the audio to be detected according to a preset encoding strategy to obtain a first encoded characteristic sequence, and encoding the reference characteristic sequence of the benchmark audio according to the preset encoding strategy to obtain a second encoded characteristic sequence; and determining the similarity distance between the first encoded characteristic sequence and the second encoded characteristic sequence.
Optionally, the instructions can also execute the following steps: determining at least the edit distance, Euclidean distance and Hamming distance between the first encoded characteristic sequence and the second encoded characteristic sequence; and normalizing the edit distance, Euclidean distance and Hamming distance respectively, to obtain the similarity distance.
Optionally, the instructions can also execute the following steps: constructing an affine function between each of the edit distance, the Euclidean distance and the Hamming distance and its corresponding sub-similarity; determining the sub-similarity corresponding to each distance according to the affine function corresponding to that distance; and determining the similarity between the audio to be detected and the benchmark audio according to the sub-similarities.
For the specific implementation of each of the above operations, refer to the foregoing embodiments; details are not repeated here.
The storage medium may include a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disc, or the like.
Since the instructions stored in the storage medium can execute the steps in any audio similarity detection method provided by the embodiments of the present invention, they can achieve the beneficial effects achievable by any such method; see the foregoing embodiments for details, which are not repeated here.
The audio similarity detection method, apparatus, storage medium and computer device provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention; the above description of the embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementation and application scope according to the idea of the present invention. In summary, the content of this description should not be construed as limiting the present invention.
Claims (18)
1. An audio similarity detection method, characterized by comprising:
obtaining audio to be detected;
screening out audio that meets a preset condition from the audio to be detected, and obtaining a characteristic sequence of the audio to be detected from the screened audio;
obtaining a reference characteristic sequence of a benchmark audio;
obtaining a similarity distance between the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio; and
determining a similarity between the audio to be detected and the benchmark audio according to the similarity distance.
2. The audio similarity detection method according to claim 1, characterized in that the screening out audio that meets a preset condition from the audio to be detected and obtaining the characteristic sequence of the audio to be detected from the screened audio comprises:
preprocessing the audio to be detected to obtain preprocessed audio;
obtaining an energy spectrum of the preprocessed audio; and
screening out, according to the energy spectrum, the audio that meets the preset condition from the preprocessed audio, and setting a frequency sequence corresponding to the screened audio as the characteristic sequence of the audio to be detected.
3. The audio similarity detection method according to claim 2, characterized in that the preprocessing the audio to be detected to obtain preprocessed audio comprises:
sampling the audio to be detected according to a preset sampling strategy to obtain sampled audio;
framing the sampled audio according to a preset framing strategy to obtain framed audio; and
windowing the framed audio to obtain the preprocessed audio in the discrete time domain.
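The sample/frame/window pipeline of claim 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the frame length, hop size and choice of a Hamming window are assumptions, since the claim only fixes that some preset framing strategy and window are used.

```python
import numpy as np

def preprocess(audio, frame_len=1024, hop=512):
    """Split a 1-D sampled signal into overlapping frames and apply
    a Hamming window to each frame (illustrative parameters)."""
    n_frames = 1 + max(0, (len(audio) - frame_len) // hop)
    window = np.hamming(frame_len)
    frames = np.stack([
        audio[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return frames  # shape: (n_frames, frame_len)
```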
4. The audio similarity detection method according to claim 2, characterized in that the obtaining the energy spectrum of the preprocessed audio comprises:
performing an integral transformation on the preprocessed audio to obtain a frequency spectrum corresponding to the preprocessed audio; and
determining the energy spectrum of the preprocessed audio according to the frequency spectrum.
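Reading the "integral transformation" of claim 4 as a discrete Fourier transform, the energy spectrum can be sketched as the squared magnitude of the per-frame FFT. This is one plausible realization, not necessarily the transform the patent intends:

```python
import numpy as np

def energy_spectrum(frames):
    """Per-frame energy spectrum: squared magnitude of the real FFT."""
    spectrum = np.fft.rfft(frames, axis=-1)   # frequency spectrum
    return np.abs(spectrum) ** 2              # energy spectrum
```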
5. The audio similarity detection method according to claim 2, characterized in that the screening out, according to the energy spectrum, the audio that meets the preset condition from the preprocessed audio comprises:
obtaining a sound intensity of the audio to be detected according to the energy spectrum; and
screening out, from the audio to be detected, audio whose sound intensity is greater than a preset threshold, to obtain audio whose sound intensity meets the preset condition.
6. The audio similarity detection method according to claim 5, characterized in that the screening out, from the audio to be detected, audio whose sound intensity is greater than a preset threshold to obtain the audio whose sound intensity meets the preset condition comprises:
normalizing the sound intensity of the audio to be detected into a preset sound intensity range to obtain intensity-normalized audio; and
screening out, from the intensity-normalized audio, audio whose sound intensity is greater than the preset threshold, to obtain the audio whose sound intensity meets the preset condition.
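The intensity normalization and threshold screening of claims 5 and 6 can be sketched as below. The approximations here are assumptions: sound intensity is taken as per-frame total spectral energy, the preset range is [0, 1], and the 0.5 threshold is illustrative.

```python
import numpy as np

def screen_frames(energy, threshold=0.5):
    """Normalize per-frame intensity to [0, 1], then keep the frames
    whose normalized intensity exceeds the threshold."""
    intensity = energy.sum(axis=-1)             # per-frame intensity
    span = intensity.max() - intensity.min()
    norm = (intensity - intensity.min()) / (span if span else 1.0)
    return np.where(norm > threshold)[0]        # indices kept
```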
7. The audio similarity detection method according to claim 1, characterized in that, when the benchmark audio includes a target benchmark audio and an interference audio, the obtaining the reference characteristic sequence of the benchmark audio comprises:
obtaining a first root-mean-square average energy value of the target benchmark audio, and obtaining a second root-mean-square average energy value of the interference audio;
obtaining a first energy spectrum of the target benchmark audio, and obtaining a second energy spectrum of the interference audio;
optimizing the benchmark audio according to the first energy spectrum, the first root-mean-square average energy value, the second root-mean-square average energy value and the second energy spectrum, to obtain an optimized benchmark audio; and
obtaining the reference characteristic sequence of the optimized benchmark audio.
8. The audio similarity detection method according to claim 7, characterized in that the obtaining the first root-mean-square average energy value of the target benchmark audio and the second root-mean-square average energy value of the interference audio comprises:
determining a first root-mean-square energy of the target benchmark audio, and determining a second root-mean-square energy of the interference audio;
obtaining a first frame count and a first frame length of the target benchmark audio, and obtaining a second frame count and a second frame length of the interference audio; and
determining the first root-mean-square average energy value of the target benchmark audio according to the first root-mean-square energy, the first frame count and the first frame length, and determining the second root-mean-square average energy value of the interference audio according to the second root-mean-square energy, the second frame count and the second frame length.
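One plausible reading of the quantity in claim 8, under the assumption that the root-mean-square average energy value averages the total squared amplitude over frame count times frame length, is:

```python
import numpy as np

def rms_average_energy(frames):
    """RMS average energy over all frames: total squared amplitude
    divided by (frame count * frame length), then the square root.
    An illustrative interpretation of claim 8, not the patent's formula."""
    n_frames, frame_len = frames.shape
    return np.sqrt(np.sum(frames ** 2) / (n_frames * frame_len))
```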
9. The audio similarity detection method according to any one of claims 1 to 8, characterized in that the obtaining the similarity distance between the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio comprises:
encoding the characteristic sequence of the audio to be detected according to a preset encoding strategy to obtain a first encoded characteristic sequence, and encoding the reference characteristic sequence of the benchmark audio according to the preset encoding strategy to obtain a second encoded characteristic sequence; and
determining the similarity distance between the first encoded characteristic sequence and the second encoded characteristic sequence.
10. The audio similarity detection method according to claim 9, characterized in that the encoding the characteristic sequence of the audio to be detected according to the preset encoding strategy to obtain the first encoded characteristic sequence comprises:
comparing, according to the preset encoding strategy, the magnitudes of every two adjacent characteristic values in the characteristic sequence of the audio to be detected;
when the former of two adjacent characteristic values is less than the latter, encoding the characteristic sequence of the audio to be detected as a first code value;
when the former of two adjacent characteristic values is equal to the latter, encoding the characteristic sequence of the audio to be detected as a second code value;
when the former of two adjacent characteristic values is greater than the latter, encoding the characteristic sequence of the audio to be detected as a third code value; and
generating the first encoded characteristic sequence based on the first code value, the second code value and/or the third code value.
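The adjacent-pair comparison of claim 10 is a ternary delta encoding and can be sketched as follows. The concrete code values 1, 0 and -1 are illustrative; the claim only fixes that there are first, second and third code values.

```python
def encode(features):
    """Encode a feature sequence by comparing each adjacent pair:
    rising -> 1 (first code value), equal -> 0 (second),
    falling -> -1 (third)."""
    codes = []
    for prev, nxt in zip(features, features[1:]):
        if prev < nxt:
            codes.append(1)
        elif prev == nxt:
            codes.append(0)
        else:
            codes.append(-1)
    return codes
```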
11. The audio similarity detection method according to claim 9, characterized in that the similarity distance includes at least an edit distance, a Euclidean distance and a Hamming distance, and the determining the similarity distance between the first encoded characteristic sequence and the second encoded characteristic sequence comprises:
determining at least the edit distance, the Euclidean distance and the Hamming distance between the first encoded characteristic sequence and the second encoded characteristic sequence; and
normalizing the edit distance, the Euclidean distance and the Hamming distance respectively, to obtain the similarity distance.
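The three distances of claim 11 can be sketched over two equal-length code sequences. The normalizers are assumptions: sequence length for edit and Hamming distance, and `code_range * sqrt(n)` for the Euclidean distance (with `code_range = 2` for ternary codes in {-1, 0, 1}); the patent does not specify these.

```python
import math

def levenshtein(a, b):
    """Classic single-row dynamic-programming edit distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def similarity_distances(x, y, code_range=2):
    """Edit, Euclidean and Hamming distances between two equal-length
    code sequences, each normalized into [0, 1]."""
    n = len(x)
    edit = levenshtein(x, y) / n
    euclid = math.dist(x, y) / (code_range * math.sqrt(n))
    hamming = sum(a != b for a, b in zip(x, y)) / n
    return edit, euclid, hamming
```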
12. The audio similarity detection method according to claim 11, characterized in that the determining the similarity between the audio to be detected and the benchmark audio according to the similarity distance comprises:
constructing an affine function between each of the edit distance, the Euclidean distance and the Hamming distance and its corresponding sub-similarity;
determining the sub-similarity corresponding to each distance according to the affine function corresponding to that distance; and
determining the similarity between the audio to be detected and the benchmark audio according to the sub-similarities.
13. The audio similarity detection method according to claim 12, characterized in that the determining the similarity between the audio to be detected and the benchmark audio according to the sub-similarities comprises:
setting a first weight for the sub-similarity of the edit distance, and setting a second weight for the sub-similarity of the Hamming distance;
setting the sub-similarity of the Euclidean distance as a penalty term; and
determining the similarity between the audio to be detected and the benchmark audio according to the first weight, the second weight and the penalty term.
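The combination scheme of claims 12 and 13 can be sketched as below. The affine form `1 - d` and the weight and penalty coefficients are illustrative assumptions; the claims only fix that each distance maps to a sub-similarity via some affine function, that the edit and Hamming sub-similarities are weighted, and that the Euclidean sub-similarity acts as a penalty term.

```python
def overall_similarity(edit_d, euclid_d, hamming_d,
                       w_edit=0.5, w_hamming=0.5, penalty=0.2):
    """Affine map from each normalized distance to a sub-similarity,
    then a weighted combination with a Euclidean penalty term."""
    sim = lambda d: 1.0 - d          # affine: slope -1, intercept 1
    score = (w_edit * sim(edit_d)
             + w_hamming * sim(hamming_d)
             - penalty * (1.0 - sim(euclid_d)))
    return max(0.0, min(1.0, score))  # clamp into [0, 1]
```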
14. The audio similarity detection method according to any one of claims 1 to 9, characterized in that, after the determining the similarity between the audio to be detected and the benchmark audio according to the similarity distance, the method comprises:
when the similarity between the audio to be detected and the benchmark audio is greater than a preset similarity threshold, performing a virtual resource transfer operation, and/or displaying information related to the similarity detection result of the audio to be detected.
15. The audio similarity detection method according to any one of claims 1 to 9, characterized in that, after the determining the similarity between the audio to be detected and the benchmark audio according to the similarity distance, the method comprises:
when the similarity between the audio to be detected and the benchmark audio is greater than a preset similarity threshold, performing an operation of unlocking an audio lock.
16. An audio similarity detection apparatus, characterized by comprising:
an audio obtaining unit, configured to obtain audio to be detected;
a screening unit, configured to screen out audio that meets a preset condition from the audio to be detected, and obtain a characteristic sequence of the audio to be detected from the screened audio;
a feature obtaining unit, configured to obtain a reference characteristic sequence of a benchmark audio;
a distance obtaining unit, configured to obtain a similarity distance between the characteristic sequence of the audio to be detected and the reference characteristic sequence of the benchmark audio; and
a determining unit, configured to determine a similarity between the audio to be detected and the benchmark audio according to the similarity distance.
17. A storage medium, characterized in that the storage medium stores a plurality of instructions, the instructions being suitable for being loaded by a processor to execute the audio similarity detection method according to any one of claims 1 to 16.
18. A computer device, comprising a memory and a processor, characterized in that the memory stores a computer program which, when executed by the processor, causes the processor to execute the audio similarity detection method according to any one of claims 1 to 16.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811233515.0A CN109087669B (en) | 2018-10-23 | 2018-10-23 | Audio similarity detection method and device, storage medium and computer equipment |
CN202110100844.3A CN112863547B (en) | 2018-10-23 | 2018-10-23 | Virtual resource transfer processing method, device, storage medium and computer equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811233515.0A CN109087669B (en) | 2018-10-23 | 2018-10-23 | Audio similarity detection method and device, storage medium and computer equipment |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110100844.3A Division CN112863547B (en) | 2018-10-23 | 2018-10-23 | Virtual resource transfer processing method, device, storage medium and computer equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109087669A true CN109087669A (en) | 2018-12-25 |
CN109087669B CN109087669B (en) | 2021-03-02 |
Family
ID=64843827
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811233515.0A Active CN109087669B (en) | 2018-10-23 | 2018-10-23 | Audio similarity detection method and device, storage medium and computer equipment |
CN202110100844.3A Active CN112863547B (en) | 2018-10-23 | 2018-10-23 | Virtual resource transfer processing method, device, storage medium and computer equipment |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110100844.3A Active CN112863547B (en) | 2018-10-23 | 2018-10-23 | Virtual resource transfer processing method, device, storage medium and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN109087669B (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109547843A (en) * | 2019-02-01 | 2019-03-29 | 腾讯音乐娱乐科技(深圳)有限公司 | The method and apparatus that audio-video is handled |
CN110010159A (en) * | 2019-04-02 | 2019-07-12 | 广州酷狗计算机科技有限公司 | Sound similarity determines method and device |
CN110491413A (en) * | 2019-08-21 | 2019-11-22 | 中国传媒大学 | A kind of audio content consistency monitoring method and system based on twin network |
CN110677718A (en) * | 2019-09-27 | 2020-01-10 | 腾讯科技(深圳)有限公司 | Video identification method and device |
CN110838296A (en) * | 2019-11-18 | 2020-02-25 | 锐迪科微电子科技(上海)有限公司 | Recording process control method, system, electronic device and storage medium |
CN111462775A (en) * | 2020-03-30 | 2020-07-28 | 腾讯科技(深圳)有限公司 | Audio similarity determination method, device, server and medium |
CN111583963A (en) * | 2020-05-18 | 2020-08-25 | 合肥讯飞数码科技有限公司 | Method, device and equipment for detecting repeated audio and storage medium |
CN112201265A (en) * | 2020-12-07 | 2021-01-08 | 成都启英泰伦科技有限公司 | LSTM voice enhancement method based on psychoacoustic model |
CN112700790A (en) * | 2020-12-11 | 2021-04-23 | 广州市申迪计算机***有限公司 | IDC machine room sound processing method, system, equipment and computer storage medium |
CN112885374A (en) * | 2021-01-27 | 2021-06-01 | 吴怡然 | Sound accuracy judgment method and system based on spectrum analysis |
CN113313183A (en) * | 2020-06-05 | 2021-08-27 | 谷歌有限责任公司 | Training speech synthesis neural networks by using energy scores |
CN113571033A (en) * | 2021-07-13 | 2021-10-29 | 腾讯音乐娱乐科技(深圳)有限公司 | Detection method and equipment for back stepping of accompaniment and computer readable storage medium |
CN113572547A (en) * | 2021-07-16 | 2021-10-29 | 上海科江电子信息技术有限公司 | Construction method of frequency spectrum integral graph, frequency spectrum matching method and frequency spectrum matcher |
CN115578999A (en) * | 2022-12-07 | 2023-01-06 | 深圳市声扬科技有限公司 | Method and device for detecting copied voice, electronic equipment and storage medium |
CN116434791A (en) * | 2023-06-12 | 2023-07-14 | 深圳福德源数码科技有限公司 | Configuration method and system for audio player |
TWI832698B (en) * | 2023-02-10 | 2024-02-11 | 宏碁股份有限公司 | Video conference evaluation method and system |
Citations (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6957357B2 (en) * | 2000-10-30 | 2005-10-18 | International Business Machines Corporation | Clock synchronization with removal of clock skews through network measurements in derivation of a convext hull |
CN101320566A (en) * | 2008-06-30 | 2008-12-10 | 中国人民解放军第四军医大学 | Non-air conduction speech reinforcement method based on multi-band spectrum subtraction |
CN101458931A (en) * | 2009-01-08 | 2009-06-17 | 无敌科技(西安)有限公司 | Method for eliminating environmental noise from voice signal |
CN101616264A (en) * | 2008-06-27 | 2009-12-30 | 中国科学院自动化研究所 | News video categorization and system |
CN101896937A (en) * | 2007-12-19 | 2010-11-24 | 米其林技术公司 | Method for processing a three-dimensional image of the surface of a tyre so that it can be used to inspect the said surface |
CN101996631A (en) * | 2009-08-28 | 2011-03-30 | 国际商业机器公司 | Method and device for aligning texts |
US7920992B2 (en) * | 2005-03-10 | 2011-04-05 | Carnegie Mellon University | Method and system for modeling uncertainties in integrated circuits, systems, and fabrication processes |
CN102024033A (en) * | 2010-12-01 | 2011-04-20 | 北京邮电大学 | Method for automatically detecting audio templates and chaptering videos |
CN102467939A (en) * | 2010-11-04 | 2012-05-23 | 北京彩云在线技术开发有限公司 | Song audio frequency cutting apparatus and method thereof |
CN102521281A (en) * | 2011-11-25 | 2012-06-27 | 北京师范大学 | Humming computer music searching method based on longest matching subsequence algorithm |
JP5182892B2 (en) * | 2009-09-24 | 2013-04-17 | 日本電信電話株式会社 | Voice search method, voice search device, and voice search program |
CN103871426A (en) * | 2012-12-13 | 2014-06-18 | 上海八方视界网络科技有限公司 | Method and system for comparing similarity between user audio frequency and original audio frequency |
CN103915103A (en) * | 2014-04-15 | 2014-07-09 | 成都凌天科创信息技术有限责任公司 | Voice quality enhancement system |
CN104091598A (en) * | 2013-04-18 | 2014-10-08 | 腾讯科技(深圳)有限公司 | Audio file similarity calculation method and device |
CN104133851A (en) * | 2014-07-07 | 2014-11-05 | 小米科技有限责任公司 | Audio similarity detecting method, audio similarity detecting device and electronic equipment |
CN104167211A (en) * | 2014-08-08 | 2014-11-26 | 南京大学 | Multi-source scene sound abstracting method based on hierarchical event detection and context model |
CN104685903A (en) * | 2012-10-09 | 2015-06-03 | 皇家飞利浦有限公司 | Method and apparatus for audio interference estimation |
CN104810025A (en) * | 2015-03-31 | 2015-07-29 | 天翼爱音乐文化科技有限公司 | Audio similarity detecting method and device |
CN104900238A (en) * | 2015-05-14 | 2015-09-09 | 电子科技大学 | Audio real-time comparison method based on sensing filtering |
CN104900239A (en) * | 2015-05-14 | 2015-09-09 | 电子科技大学 | Audio real-time comparison method based on Walsh-Hadamard transform |
CN105893549A (en) * | 2016-03-31 | 2016-08-24 | 中国人民解放军信息工程大学 | Audio retrieval method and device |
CN106095943A (en) * | 2016-06-14 | 2016-11-09 | 腾讯科技(深圳)有限公司 | Give song recitals and know well range detection method and device |
CN106250400A (en) * | 2016-07-19 | 2016-12-21 | 腾讯科技(深圳)有限公司 | A kind of audio data processing method, device and system |
CN106328168A (en) * | 2016-08-30 | 2017-01-11 | 成都普创通信技术股份有限公司 | Voice signal similarity detection method |
CN106601258A (en) * | 2016-12-12 | 2017-04-26 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Speaker identification method capable of information channel compensation based on improved LSDA algorithm |
US9721559B2 (en) * | 2015-04-17 | 2017-08-01 | International Business Machines Corporation | Data augmentation method based on stochastic feature mapping for automatic speech recognition |
CN107731233A (en) * | 2017-11-03 | 2018-02-23 | 王华锋 | A kind of method for recognizing sound-groove based on RNN |
CN107919133A (en) * | 2016-10-09 | 2018-04-17 | 赛谛听股份有限公司 | For the speech-enhancement system and sound enhancement method of destination object |
CN108665903A (en) * | 2018-05-11 | 2018-10-16 | 复旦大学 | A kind of automatic testing method and its system of audio signal similarity degree |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6415341B2 (en) * | 2015-01-30 | 2018-10-31 | 株式会社第一興商 | Karaoke system with pitch shift function for harmony singing |
CN106469413B (en) * | 2015-08-20 | 2021-08-03 | 深圳市腾讯计算机***有限公司 | Data processing method and device for virtual resources |
CN107393519B (en) * | 2017-08-03 | 2020-09-15 | 腾讯音乐娱乐(深圳)有限公司 | Display method, device and storage medium for singing scores |
CN107527206A (en) * | 2017-08-24 | 2017-12-29 | 维沃移动通信有限公司 | A kind of resource transfers method, server, terminal and resource transfers system |
CN107705105A (en) * | 2017-08-24 | 2018-02-16 | 维沃移动通信有限公司 | A kind of resource transfers method, server, terminal and resource transfers system |
CN107818798B (en) * | 2017-10-20 | 2020-08-18 | 百度在线网络技术(北京)有限公司 | Customer service quality evaluation method, device, equipment and storage medium |
CN107798561B (en) * | 2017-10-25 | 2021-08-13 | 网易传媒科技(北京)有限公司 | Audio playing and sharing method and device, storage medium and electronic equipment |
2018
- 2018-10-23 CN CN201811233515.0A patent/CN109087669B/en active Active
- 2018-10-23 CN CN202110100844.3A patent/CN112863547B/en active Active
Patent Citations (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6957357B2 (en) * | 2000-10-30 | 2005-10-18 | International Business Machines Corporation | Clock synchronization with removal of clock skews through network measurements in derivation of a convext hull |
US7920992B2 (en) * | 2005-03-10 | 2011-04-05 | Carnegie Mellon University | Method and system for modeling uncertainties in integrated circuits, systems, and fabrication processes |
CN101896937A (en) * | 2007-12-19 | 2010-11-24 | 米其林技术公司 | Method for processing a three-dimensional image of the surface of a tyre so that it can be used to inspect the said surface |
CN101616264A (en) * | 2008-06-27 | 2009-12-30 | 中国科学院自动化研究所 | News video categorization and system |
CN101320566A (en) * | 2008-06-30 | 2008-12-10 | 中国人民解放军第四军医大学 | Non-air conduction speech reinforcement method based on multi-band spectrum subtraction |
CN101458931A (en) * | 2009-01-08 | 2009-06-17 | 无敌科技(西安)有限公司 | Method for eliminating environmental noise from voice signal |
CN101996631A (en) * | 2009-08-28 | 2011-03-30 | 国际商业机器公司 | Method and device for aligning texts |
JP5182892B2 (en) * | 2009-09-24 | 2013-04-17 | 日本電信電話株式会社 | Voice search method, voice search device, and voice search program |
CN102467939A (en) * | 2010-11-04 | 2012-05-23 | 北京彩云在线技术开发有限公司 | Song audio frequency cutting apparatus and method thereof |
CN102024033A (en) * | 2010-12-01 | 2011-04-20 | 北京邮电大学 | Method for automatically detecting audio templates and chaptering videos |
CN102521281A (en) * | 2011-11-25 | 2012-06-27 | 北京师范大学 | Humming computer music searching method based on longest matching subsequence algorithm |
CN104685903A (en) * | 2012-10-09 | 2015-06-03 | 皇家飞利浦有限公司 | Method and apparatus for audio interference estimation |
CN103871426A (en) * | 2012-12-13 | 2014-06-18 | 上海八方视界网络科技有限公司 | Method and system for comparing similarity between user audio frequency and original audio frequency |
CN104091598A (en) * | 2013-04-18 | 2014-10-08 | 腾讯科技(深圳)有限公司 | Audio file similarity calculation method and device |
CN103915103A (en) * | 2014-04-15 | 2014-07-09 | 成都凌天科创信息技术有限责任公司 | Voice quality enhancement system |
CN104133851A (en) * | 2014-07-07 | 2014-11-05 | 小米科技有限责任公司 | Audio similarity detecting method, audio similarity detecting device and electronic equipment |
CN104167211A (en) * | 2014-08-08 | 2014-11-26 | 南京大学 | Multi-source scene sound abstracting method based on hierarchical event detection and context model |
CN104810025A (en) * | 2015-03-31 | 2015-07-29 | 天翼爱音乐文化科技有限公司 | Audio similarity detecting method and device |
US9721559B2 (en) * | 2015-04-17 | 2017-08-01 | International Business Machines Corporation | Data augmentation method based on stochastic feature mapping for automatic speech recognition |
CN104900239A (en) * | 2015-05-14 | 2015-09-09 | 电子科技大学 | Audio real-time comparison method based on Walsh-Hadamard transform |
CN104900238A (en) * | 2015-05-14 | 2015-09-09 | 电子科技大学 | Audio real-time comparison method based on sensing filtering |
CN105893549A (en) * | 2016-03-31 | 2016-08-24 | 中国人民解放军信息工程大学 | Audio retrieval method and device |
CN106095943A (en) * | 2016-06-14 | 2016-11-09 | 腾讯科技(深圳)有限公司 | Give song recitals and know well range detection method and device |
CN106250400A (en) * | 2016-07-19 | 2016-12-21 | 腾讯科技(深圳)有限公司 | A kind of audio data processing method, device and system |
CN106328168A (en) * | 2016-08-30 | 2017-01-11 | 成都普创通信技术股份有限公司 | Voice signal similarity detection method |
CN107919133A (en) * | 2016-10-09 | 2018-04-17 | 赛谛听股份有限公司 | For the speech-enhancement system and sound enhancement method of destination object |
CN106601258A (en) * | 2016-12-12 | 2017-04-26 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Speaker identification method capable of information channel compensation based on improved LSDA algorithm |
CN107731233A (en) * | 2017-11-03 | 2018-02-23 | 王华锋 | A kind of method for recognizing sound-groove based on RNN |
CN108665903A (en) * | 2018-05-11 | 2018-10-16 | 复旦大学 | A kind of automatic testing method and its system of audio signal similarity degree |
Non-Patent Citations (3)
Title |
---|
AGGELOS GKIOKAS ET AL.: "Deploying Deep Belief Nets for content based audio music similarity", IEEE IISA 2014, The 5th International Conference on Information, Intelligence, Systems and Applications * |
KRIS WEST ET AL.: "Incorporating Cultural Representations of Features Into Audio Music Similarity Estimation", IEEE Transactions on Audio, Speech, and Language Processing * |
LI Chao et al.: "Audio similarity measurement method based on distance correlation graph", Journal of Beijing University of Aeronautics and Astronautics * |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109547843A (en) * | 2019-02-01 | 2019-03-29 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and apparatus for processing audio and video |
CN109547843B (en) * | 2019-02-01 | 2022-05-17 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for processing audio and video |
CN110010159A (en) * | 2019-04-02 | 2019-07-12 | 广州酷狗计算机科技有限公司 | Sound similarity determination method and device |
CN110491413B (en) * | 2019-08-21 | 2022-01-04 | 中国传媒大学 | Twin network-based audio content consistency monitoring method and system |
CN110491413A (en) * | 2019-08-21 | 2019-11-22 | 中国传媒大学 | Audio content consistency monitoring method and system based on twin network |
CN110677718A (en) * | 2019-09-27 | 2020-01-10 | 腾讯科技(深圳)有限公司 | Video identification method and device |
CN110677718B (en) * | 2019-09-27 | 2021-07-23 | 腾讯科技(深圳)有限公司 | Video identification method and device |
WO2021098153A1 (en) * | 2019-11-18 | 2021-05-27 | 锐迪科微电子科技(上海)有限公司 | Method, system, and electronic apparatus for detecting change of target user, and storage medium |
CN110838296A (en) * | 2019-11-18 | 2020-02-25 | 锐迪科微电子科技(上海)有限公司 | Recording process control method, system, electronic device and storage medium |
CN110838296B (en) * | 2019-11-18 | 2022-04-29 | 锐迪科微电子科技(上海)有限公司 | Recording process control method, system, electronic device and storage medium |
CN111462775B (en) * | 2020-03-30 | 2023-11-03 | 腾讯科技(深圳)有限公司 | Audio similarity determination method, device, server and medium |
CN111462775A (en) * | 2020-03-30 | 2020-07-28 | 腾讯科技(深圳)有限公司 | Audio similarity determination method, device, server and medium |
CN111583963B (en) * | 2020-05-18 | 2023-03-21 | 合肥讯飞数码科技有限公司 | Repeated audio detection method, device, equipment and storage medium |
CN111583963A (en) * | 2020-05-18 | 2020-08-25 | 合肥讯飞数码科技有限公司 | Repeated audio detection method, device, equipment and storage medium |
CN113313183A (en) * | 2020-06-05 | 2021-08-27 | 谷歌有限责任公司 | Training speech synthesis neural networks by using energy scores |
CN112201265A (en) * | 2020-12-07 | 2021-01-08 | 成都启英泰伦科技有限公司 | LSTM voice enhancement method based on psychoacoustic model |
CN112700790A (en) * | 2020-12-11 | 2021-04-23 | 广州市申迪计算机***有限公司 | IDC machine room sound processing method, system, equipment and computer storage medium |
CN112885374A (en) * | 2021-01-27 | 2021-06-01 | 吴怡然 | Sound accuracy judgment method and system based on spectrum analysis |
CN113571033A (en) * | 2021-07-13 | 2021-10-29 | 腾讯音乐娱乐科技(深圳)有限公司 | Accompaniment stepping-back detection method, equipment and computer-readable storage medium |
CN113571033B (en) * | 2021-07-13 | 2024-06-14 | 腾讯音乐娱乐科技(深圳)有限公司 | Accompaniment stepping-back detection method, equipment and computer-readable storage medium |
CN113572547A (en) * | 2021-07-16 | 2021-10-29 | 上海科江电子信息技术有限公司 | Construction method of frequency spectrum integral graph, frequency spectrum matching method and frequency spectrum matcher |
CN113572547B (en) * | 2021-07-16 | 2023-04-18 | 上海科江电子信息技术有限公司 | Construction method of frequency spectrum integral graph, frequency spectrum matching method and frequency spectrum matcher |
CN115578999A (en) * | 2022-12-07 | 2023-01-06 | 深圳市声扬科技有限公司 | Method and device for detecting copied voice, electronic equipment and storage medium |
TWI832698B (en) * | 2023-02-10 | 2024-02-11 | 宏碁股份有限公司 | Video conference evaluation method and system |
CN116434791A (en) * | 2023-06-12 | 2023-07-14 | 深圳福德源数码科技有限公司 | Configuration method and system for audio player |
CN116434791B (en) * | 2023-06-12 | 2023-08-11 | 深圳福德源数码科技有限公司 | Configuration method and system for audio player |
Also Published As
Publication number | Publication date |
---|---|
CN112863547A (en) | 2021-05-28 |
CN112863547B (en) | 2022-11-29 |
CN109087669B (en) | 2021-03-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109087669A (en) | Audio similarity detection method, device, storage medium and computer equipment | |
CN103440862B (en) | Voice and music synthesis method, device and equipment | |
US10964300B2 (en) | Audio signal processing method and apparatus, and storage medium thereof | |
CN111210021B (en) | Audio signal processing method, model training method and related device | |
CN109166593A (en) | audio data processing method, device and storage medium | |
CN106652996B (en) | Prompt tone generation method and device and mobile terminal | |
CN103714824B (en) | Audio processing method, device and terminal device | |
CN108242235A (en) | Electronic equipment and its audio recognition method | |
CN104409081B (en) | Audio signal processing method and device | |
CN109903773A (en) | Audio processing method, device and storage medium | |
CN108470571A (en) | Audio detection method, device and storage medium | |
CN106062867A (en) | Voice font speaker and prosody interpolation | |
CN106782627B (en) | Audio file re-recording method and device | |
CN111261144A (en) | Voice recognition method, device, terminal and storage medium | |
CN110096611A (en) | Song recommendation method, mobile terminal and computer-readable storage medium | |
CN109616135B (en) | Audio processing method, device and storage medium | |
CN106328176B (en) | Method and apparatus for generating song audio | |
CN108319657A (en) | Method, storage medium and terminal for detecting strong rhythm points | |
CN107274876A (en) | Audio spectrum plotting instrument | |
CN110444190A (en) | Speech processing method, device, terminal device and storage medium | |
CN110505332A (en) | Noise reduction method, device, mobile terminal and storage medium | |
CN107371102 (en) | Audio playback volume control method, device, storage medium and mobile terminal | |
CN108172237A (en) | Voice communication data processing method, device, storage medium and mobile terminal | |
CN104092827A (en) | Method and device for setting terminal | |
CN107680614A (en) | Acoustic signal processing method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||