CN100424692C - Audio fast search method - Google Patents
- Publication number
- CN100424692C (application number CNB2005100863153A, CN200510086315A)
- Authority
- CN
- China
- Prior art keywords
- audio
- histogram
- similarity
- target audio
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a fast audio search method based on a time-frequency-domain description of the spectrum. Its features are as follows: the sub-band energy ratio of the audio signal is used as the basic feature and histograms as the modeling method, and the positions at which the target audio appears are detected by skipping through the stream; suitable sub-bands are selected so that, in a statistical sense, their signals are as robust as possible to noise and distortion; the vector quantization (VQ) boundaries are adjusted according to the spectral distribution of the target audio; a widely used histogram matching algorithm is adopted; and performance evaluation criteria for audio search algorithms are proposed, together with objective evaluation parameters.
Description
Technical field
The present invention relates to the technical field of multimedia audio search systems and, more specifically, to a fast audio search method.
Background technology
At present the information industry is developing at an unprecedented pace. Information media such as television, radio broadcasting, networks, and wireless communications have also developed rapidly, and every day these media carry a large volume of information. How to manage and monitor this information effectively, so as to safeguard national information security, is gradually gaining national attention. A sensitive-audio monitoring system based on time-frequency-domain audio processing is intended to meet the requirements for monitoring sensitive audio in the field of information security.
Summary of the invention
The present invention proposes a robust fast audio search method that is highly robust to distortions such as noise. Its most basic feature is time-frequency-domain processing of the spectrum. Normalizing the spectrum gives the feature vectors strong robustness and discriminative power. Sub-band energy ratio histograms are built from the processed spectrum, and a matching method based on histogram overlap is used to rapidly estimate the candidate positions of the target audio.
The fast audio search method is based on a time-frequency-domain description of the spectrum. It uses the sub-band energy ratio of the audio signal as the basic feature and histograms as the modeling method, and it detects the positions at which the target audio appears by skipping through the stream, which gives it a very high search speed. The method has four essential characteristics. First, suitable sub-bands are selected so that, in a statistical sense, their signals have the best robustness to noise and distortion. Second, the vector quantization boundaries are adjusted adaptively according to the spectral distribution of the target audio. Third, the method borrows the histogram matching algorithm widely used in image recognition; once the sub-band energy signals have been normalized, the false detections and missed detections that distortions such as background noise cause in conventional methods are avoided, and the computational cost is very small. Fourth, performance evaluation criteria for audio search algorithms are established and objective evaluation parameters are designed for analyzing retrieval results. Experiments show that the proposed algorithm not only achieves good retrieval precision and search speed under stationary background noise but is also robust to non-stationary noise.
The fast audio search method can quickly locate a target audio segment of interest in a massive audio stream under test. The flow is shown in Fig. 1, and the steps are:
1) First, features are extracted from the target audio segment and from the audio stream under test. Feature extraction begins by filtering the audio with band-pass filters; the sub-band energy is then computed separately from the signal in each passband, using frames of 256 samples with a frame shift of 128 samples. The frequency sub-bands are evenly distributed on a logarithmic frequency axis.
2) From the sub-band energies computed in step 1), the sub-band energy ratios of the target audio segment and of the audio stream under test are computed, and the sub-band energy ratio is used as the basic feature vector.
3) To improve the robustness of the features to noise, the feature vectors computed in step 2) are quantized. The quantization boundaries of each dimension are chosen so that every bin contains an equal number of the target audio's features in that dimension. A histogram model is built from the quantized feature vectors, and the quantization boundaries of each dimension are recorded. The feature vectors of the audio stream under test are then vector-quantized using the quantization boundaries of the target audio.
4) The histogram of the target audio slides along the features of the audio stream under test, and a histogram is built for the current position in the stream. The two histograms are matched to obtain a similarity. If the similarity exceeds a given threshold, the target audio is considered found at that position; otherwise the search jumps to the next possible position, estimated from the current similarity, and performs the next match.
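As a rough illustration of step 1), the following Python sketch computes per-frame sub-band energies with 256-sample frames and a 128-sample frame shift. It is only a sketch under stated assumptions: it sums FFT power inside log-spaced bands instead of running a bank of time-domain band-pass filters, and the sampling rate, band count, and band limits (`sr`, `n_bands`, `f_lo`, `f_hi`) are illustrative values that the text does not specify.

```python
import numpy as np

def subband_energies(x, sr=8000, n_bands=8, f_lo=100.0, f_hi=3500.0,
                     frame_len=256, hop=128):
    """Return an (n_frames, n_bands) array of per-frame sub-band energies.

    Assumption: band energies are approximated by summing FFT power in each
    band; sr, n_bands, f_lo and f_hi are hypothetical values.
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Band edges evenly spaced on a logarithmic frequency axis.
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    window = np.hanning(frame_len)
    energies = np.zeros((n_frames, n_bands))
    for t in range(n_frames):
        frame = x[t * hop : t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        for b in range(n_bands):
            in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
            energies[t, b] = power[in_band].sum()
    return energies
```

With these defaults a one-second signal sampled at 8 kHz yields 61 frames, and a pure tone concentrates its energy in the single band that contains its frequency.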
The present invention mainly comprises three modules: (1) feature extraction, (2) histogram construction, and (3) similarity measurement. Each is described in detail below.
Feature extraction. The method uses the sub-band energy ratio as the basic feature; the sub-band energy ratio describes the distribution trend of the sub-band energies at each instant. To improve the robustness of the features, the sub-band energy ratios are vector-quantized. The quantization boundaries are chosen so that every bin contains an equal number of the target audio's features in each dimension. The quantization boundaries and the quantized feature vectors are stored in a file.
The feature can be expressed as:
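$$\mathrm{Feature}(n) = \bigl(f(n),\, g(n)\bigr) \tag{5}$$

$$f(n) = \bigl(f_1(n),\, f_2(n),\, f_3(n),\, \ldots,\, f_M(n)\bigr) \tag{6}$$

$$g(n) = \bigl(g_1(n),\, g_2(n),\, g_3(n),\, \ldots,\, g_M(n)\bigr) \tag{7}$$

where $n$ denotes time and $M$ denotes the number of frequency bands of the feature vector.

$$f_i(n) = \alpha(n) \times E_i(n) \tag{8}$$

$$g_i(n) = \beta(n) \times \mathrm{ECR}_i(n) \tag{9}$$

$$\mathrm{ECR}_i(n) = \frac{E_i(n) - E_i(n-1)}{E_i(n-1)} \tag{10}$$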
where $E_i(n)$ denotes the frame energy output by the $i$-th band-pass filter at frame $n$. Because short-time energy is rather sensitive to high signal levels, the short-time average magnitude is used instead to measure changes in the amplitude of the audio signal; it is defined as:
[equation not reproduced in the source text]
$\alpha(n)$ normalizes each feature vector so as to eliminate the influence of volume; it is defined as:
[equation not reproduced in the source text]
where max denotes taking the maximum value.
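The feature definitions in equations (8) to (10) can be sketched as follows. Because the definitions of α(n) and β(n) are not available here, this sketch assumes that each normalizes a frame by the largest absolute value across its bands, which is one plausible reading of the remark that max is taken to remove the influence of volume. The function name `energy_features` and the small constant `eps` are hypothetical.

```python
import numpy as np

def energy_features(E, eps=1e-12):
    """Build Feature(n) = (f(n), g(n)) from sub-band energies E of shape (frames, M).

    Assumption: alpha(n) = 1 / max_i E_i(n) and beta(n) = 1 / max_i |ECR_i(n)|.
    The first frame is dropped because ECR needs the previous frame.
    """
    E = np.asarray(E, dtype=float)
    # Eq. (10): relative frame-to-frame change of each band's energy.
    ecr = (E[1:] - E[:-1]) / (E[:-1] + eps)
    cur = E[1:]
    alpha = 1.0 / (cur.max(axis=1, keepdims=True) + eps)
    beta = 1.0 / (np.abs(ecr).max(axis=1, keepdims=True) + eps)
    f = alpha * cur   # Eq. (8)
    g = beta * ecr    # Eq. (9)
    return np.concatenate([f, g], axis=1)
```

Under this assumed normalization, scaling the input energies by a constant factor leaves the features essentially unchanged, which is the volume invariance the text asks for.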
To improve the robustness of the features, the sub-band energy ratios are vector-quantized. The vector quantization boundaries are determined from the distribution of the target audio's sub-band energy ratios: they are chosen so that every bin contains an equal number of the target audio's features in each dimension.
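One way to obtain boundaries that put an equal number of target features in every bin is to use empirical quantiles per dimension. The sketch below takes that approach; reading the criterion as per-dimension quantiles, the bin count `n_bins`, and the function names are assumptions for illustration only.

```python
import numpy as np

def fit_boundaries(target_feats, n_bins=4):
    """Per-dimension boundaries giving (roughly) equal counts per bin.

    target_feats has shape (frames, dims); the result has shape (dims, n_bins - 1).
    """
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(np.asarray(target_feats, dtype=float), qs, axis=0).T

def quantize(feats, boundaries):
    """Map each feature value to a bin index using the target's boundaries."""
    feats = np.asarray(feats, dtype=float)
    codes = np.empty(feats.shape, dtype=int)
    for d in range(feats.shape[1]):
        codes[:, d] = np.searchsorted(boundaries[d], feats[:, d], side="right")
    return codes
```

The boundaries are fitted once on the target audio and then reused unchanged for the stream under test, so values outside the target's range simply fall into the first or last bin.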
Histogram construction and similarity measurement. After feature extraction, a model must be built for each audio segment. Many modeling methods exist; histogram matching is adopted because its computational cost is small and it is fairly robust to noise.
In addition, to increase the temporal discriminability of the template, the target audio of duration t is divided evenly into n sub-windows, and a histogram, denoted $h_i^R$, is built for each sub-window.
The distance measure is histogram overlap. For example, the distance between the target audio histogram and the histogram of the audio stream under test at time n can be expressed as:
[equation not reproduced in the source text]
where $h_i^R$ is the histogram of the target audio, $h_i^T(n)$ is the histogram of the audio under test at time n, and L is the number of bins in the histogram.
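A minimal sketch of sub-window histograms and an overlap measure follows. It assumes the common normalized histogram intersection (the sum of bin-wise minima divided by the number of frames in the window), which may differ in detail from the formula in the original document. The inputs are one-dimensional arrays of integer bin codes, and the function names are hypothetical.

```python
import numpy as np

def subwindow_histograms(codes, n_bins, n_sub):
    """Split a code sequence into n_sub equal parts and histogram each part."""
    return [np.bincount(part, minlength=n_bins)
            for part in np.array_split(np.asarray(codes), n_sub)]

def overlap_similarity(h_ref, h_test):
    """Normalized histogram intersection: 1.0 means identical histograms."""
    h_ref = np.asarray(h_ref)
    h_test = np.asarray(h_test)
    return float(np.minimum(h_ref, h_test).sum() / h_ref.sum())
```

Because a histogram discards the order of frames inside a window, splitting the template into sub-windows is what restores some sensitivity to temporal order.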
Because the similarity between histograms is correlated with the sliding position of the histogram, the similarity at time $n_1$ can be used to estimate an upper bound on the similarity at time $n_2$. If the estimated bound is below the specified threshold, the matching computation at that position can be skipped, which reduces the computational cost. The prediction formula is as follows:
[equation not reproduced in the source text]
where $S_{up}$ is the estimate of the similarity at time $n_2$ predicted from the similarity at time $n_1$.
The skip step of each sub-window can therefore be expressed as:
[equation not reproduced in the source text]
where $w_i$ is the skip step, $P_i$ is the number of features, $\theta$ is the specified threshold, $S_i$ is the current similarity, and floor(x) denotes the largest integer not exceeding x.
The final skip step w is given by:
[equation not reproduced in the source text]
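The skipping search can be sketched for a single window as follows. The sketch assumes normalized histogram intersection as the similarity, for which sliding the window by k frames can raise the similarity by at most k/D, where D is the window length in frames. That bound gives a safe skip of floor(D(θ − S)) + 1 frames whenever the current similarity S does not exceed θ. This is the standard active-search rule and is not confirmed to be the exact formula of the original document; combining several sub-windows into a final step is left out. The function name and the default threshold are hypothetical.

```python
import numpy as np

def active_search(ref_codes, stream_codes, n_bins, theta=0.8):
    """Return (position, similarity) pairs where similarity exceeds theta.

    Assumption: S(n + k) <= S(n) + k / D for normalized histogram
    intersection, so positions that cannot exceed theta are skipped.
    """
    ref_codes = np.asarray(ref_codes)
    stream_codes = np.asarray(stream_codes)
    D = len(ref_codes)
    h_ref = np.bincount(ref_codes, minlength=n_bins)
    hits = []
    n = 0
    while n + D <= len(stream_codes):
        h_win = np.bincount(stream_codes[n:n + D], minlength=n_bins)
        s = float(np.minimum(h_ref, h_win).sum() / D)
        if s > theta:
            hits.append((n, s))
            n += 1
        else:
            # Smallest shift after which the upper bound can exceed theta.
            n += int(np.floor(D * (theta - s))) + 1
    return hits
```

Under the stated bound, the skipped positions are exactly those whose similarity cannot exceed θ, so the result matches an exhaustive scan while evaluating far fewer windows when most of the stream is dissimilar to the target.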
Algorithm performance evaluation. The performance of the algorithm was evaluated by verifying the number of occurrences of advertisements in television programs. If the detected position of a target advertisement differs from its actual broadcast position by no more than 1 second, the advertisement is considered correctly detected. Search performance is measured by precision ξ and recall δ, together with the overall accuracy τ. They are given by:
[equation not reproduced in the source text]
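The 1-second matching criterion can be illustrated with the usual definitions of precision (correct detections divided by all detections) and recall (correct detections divided by all actual occurrences). These standard definitions are an assumption here, and the overall accuracy τ is omitted because its formula is not available. The function name `evaluate` is hypothetical, and times are in seconds.

```python
def evaluate(detected, actual, tol=1.0):
    """Return (precision, recall) with a detection counted as correct when it
    lies within tol seconds of a not-yet-matched actual occurrence."""
    matched = set()
    correct = 0
    for d in detected:
        for i, a in enumerate(actual):
            if i not in matched and abs(d - a) <= tol:
                matched.add(i)
                correct += 1
                break
    precision = correct / len(detected) if detected else 0.0
    recall = correct / len(actual) if actual else 0.0
    return precision, recall
```

Each actual occurrence can be matched at most once, so repeated detections of the same advertisement lower precision instead of inflating recall.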
Description of drawings
Fig. 1 is a flowchart of the fast audio retrieval of the present invention.
Fig. 2 shows the short-time energy waveform of an audio segment after comb filtering.
Fig. 3 shows the energy waveform of each frequency band after low-pass filtering.
Fig. 4 shows the energy waveform of each frequency band after normalization.
Embodiment
Fig. 1 shows the fast audio retrieval flow. The flow first applies a comb filter bank to the test audio and the reference audio and processes the result to obtain feature vectors; it then builds histograms for the reference audio; finally it searches the test audio using the reference audio histograms. Each skip is closely related to the current matching similarity of the search window.
Fig. 2 shows the sub-band short-time energy waveforms obtained after an audio segment passes through the comb filter bank. Different colors represent the energy waveforms of different frequency bands.
Fig. 3 shows the energy waveform of each frequency band after low-pass filtering, that is, the short-time energy curves obtained after the sub-band short-time energy waveforms pass through a low-pass smoothing filter.
Fig. 4 shows the normalized short-time energy curves finally obtained by normalizing, along the frequency axis, the short-time energy curves output by the low-pass smoothing filter.
Table 1: Retrieval results
Table 1: Comparison of experimental results
Claims (3)
1. A fast audio search method, which uses the sub-band energy ratio of the audio signal as the basic feature and histograms as the modeling method, and detects the positions at which the target audio appears by skipping, the method being characterized in that: first, suitable sub-bands are selected so that, in a statistical sense, the signals of these sub-bands have the best robustness to noise and distortion; second, the vector quantization boundaries are adjusted adaptively according to the spectral distribution of the target audio; third, the histogram matching algorithm widely used in image recognition is borrowed, and after the sub-band energy signals are normalized, the false detections and missed detections caused in conventional methods by distortion due to background noise interference are avoided, and the computational cost is very small; fourth, performance evaluation criteria for audio search algorithms are established and objective evaluation parameters are designed for analyzing retrieval results.
2. The fast audio search method according to claim 1, characterized in that the method can quickly locate a target audio segment of interest in a massive audio stream under test, the steps being:
1) first, features are extracted from the target audio segment and from the audio stream under test; feature extraction begins by filtering the audio with band-pass filters, and the sub-band energy is then computed separately from the signal in each passband, using frames of 256 samples with a frame shift of 128 samples; the frequency sub-bands are evenly distributed on a logarithmic frequency axis;
2) from the sub-band energies computed in step 1), the sub-band energy ratios of the target audio segment and of the audio stream under test are computed, and the sub-band energy ratio is used as the basic feature vector;
3) to improve the robustness of the features to noise, the feature vectors computed in step 2) are vector-quantized; the quantization boundaries of each dimension are chosen so that every bin contains an equal number of the target audio's features in that dimension; a histogram model is built from the quantized feature vectors, and the quantization boundaries of each dimension are recorded; the feature vectors of the audio stream under test are vector-quantized using the quantization boundaries of the target audio;
4) the histogram of the target audio slides along the features of the audio stream under test, and a histogram is built for the current position in the stream; the two histograms are matched to obtain a similarity; if the similarity exceeds a given threshold, the target audio is considered found at that position; otherwise the search jumps to the next possible position, estimated from the current similarity, and performs the next match.
3. The fast audio search method according to claim 2, characterized in that feature extraction, histogram construction, and similarity computation are performed by the following steps:
1) Feature extraction
The method uses the sub-band energy ratio as the basic feature; the sub-band energy ratio describes the distribution trend of the sub-band energies at each instant; to improve the robustness of the features, the sub-band energy ratios are vector-quantized; the quantization boundaries are chosen so that every bin contains an equal number of the target audio's features in each dimension; the quantization boundaries and the quantized feature vectors are stored in a file,
2) Histogram construction and similarity measurement
After feature extraction, a model must be built for each audio segment; many modeling methods exist, and histogram matching is adopted because its computational cost is small and it is fairly robust to noise,
In addition, to increase the temporal discriminability of the template, the target audio of duration t is divided evenly into 4 sub-windows, and a histogram, denoted $h_i^R$, is built for each sub-window,
The distance measure is histogram overlap, and the distance between the target audio histogram and the histogram of the audio stream under test at time n can be expressed as:
[equation not reproduced in the source text]
where $h_i^R$ is the target audio histogram, $h_i^T(n)$ is the histogram of the audio under test at time n, and L is the number of bins in the histogram,
Because the similarity between histograms is correlated with the sliding position of the histogram, the similarity at time $n_1$ is used to estimate an upper bound on the similarity at time $n_2$; if the estimated bound is below the specified threshold, the matching computation at that position can be skipped, which reduces the computational cost; the prediction formula is as follows:
[equation not reproduced in the source text]
where $S_{up}$ is the estimate of the similarity at time $n_2$ predicted from the similarity at time $n_1$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2005100863153A CN100424692C (en) | 2005-08-31 | 2005-08-31 | Audio fast search method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2005100863153A CN100424692C (en) | 2005-08-31 | 2005-08-31 | Audio fast search method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1924850A CN1924850A (en) | 2007-03-07 |
CN100424692C true CN100424692C (en) | 2008-10-08 |
Family
ID=37817492
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2005100863153A Expired - Fee Related CN100424692C (en) | 2005-08-31 | 2005-08-31 | Audio fast search method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100424692C (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103123787B (en) * | 2011-11-21 | 2015-11-18 | 金峰 | A kind of mobile terminal and media sync and mutual method |
CN104505101B (en) * | 2014-12-24 | 2017-11-03 | 北京巴越赤石科技有限公司 | A kind of real-time audio comparison method |
CN110299134B (en) * | 2019-07-01 | 2021-10-26 | 中科软科技股份有限公司 | Audio processing method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1510661A (en) * | 2002-12-23 | 2004-07-07 | | Method and apparatus for using time frequency related coding and/or decoding digital audio frequency |
US20050004910A1 (en) * | 2003-07-02 | 2005-01-06 | Trepess David William | Information retrieval |
WO2005010865A2 (en) * | 2003-07-31 | 2005-02-03 | The Registrar, Indian Institute Of Science | Method of music information retrieval and classification using continuity information |
Also Published As
Publication number | Publication date |
---|---|
CN1924850A (en) | 2007-03-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102760444B (en) | Support vector machine based classification method of base-band time-domain voice-frequency signal | |
CN103310789B (en) | A kind of sound event recognition method of the parallel model combination based on improving | |
US20160322064A1 (en) | Method and apparatus for signal extraction of audio signal | |
CN103646649A (en) | High-efficiency voice detecting method | |
KR20180063282A (en) | Method, apparatus and storage medium for voice detection | |
CN1655229A (en) | Apparatus, method, and medium for detecting and discriminating impact sound | |
CN104916289A (en) | Quick acoustic event detection method under vehicle-driving noise environment | |
CN102097095A (en) | Speech endpoint detecting method and device | |
CN109949823A (en) | A kind of interior abnormal sound recognition methods based on DWPT-MFCC and GMM | |
CN101159834A (en) | Method and system for detecting repeatable video and audio program fragment | |
CN101995437B (en) | Method for extracting features of crack acoustic emission signal of drawing part | |
US20140282664A1 (en) | Methods and apparatus to classify audio | |
CN101133442B (en) | Method of generating a footprint for a useful signal | |
CN110890087A (en) | Voice recognition method and device based on cosine similarity | |
CN100424692C (en) | Audio fast search method | |
CN110767248B (en) | Anti-modulation interference audio fingerprint extraction method | |
CN101594527B (en) | Two-stage method for detecting templates in audio and video streams with high accuracy | |
CN102759572B (en) | A kind of quality determining method of product and pick-up unit | |
CN101858939B (en) | Method and device for detecting harmonic signal | |
CN113782051B (en) | Broadcast effect classification method and system, electronic equipment and storage medium | |
CN102759571B (en) | Product quality test process and test device | |
CN106340310A (en) | Speech detection method and device | |
CN104318931A (en) | Emotional activity obtaining method and apparatus of audio file, and classification method and apparatus of audio file | |
CN116631443B (en) | Infant crying type detection method, device and equipment based on vibration spectrum comparison | |
CN114093385A (en) | Unmanned aerial vehicle detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20081008; Termination date: 20180831 |