CN106653061A

CN106653061A - Audio matching tracking device and tracking method thereof based on dictionary classification

Info

Publication number: CN106653061A
Application number: CN201610967738.4A
Authority: CN
Inventors: 胡瑞敏; 姜林; 胡霞; 王晓晨; 江游
Original assignee: Shenzhen Research Institute of Wuhan University
Current assignee: Shenzhen Research Institute of Wuhan University
Priority date: 2016-11-01
Filing date: 2016-11-01
Publication date: 2017-05-10

Abstract

The invention discloses an audio matching pursuit device based on dictionary classification, comprising a signal decomposition unit and a signal reconstruction unit. The signal decomposition unit comprises a dictionary building module, a signal classification module, a weight comparison module, a residual calculation module and a threshold control module; the signal reconstruction unit comprises a reconstruction coefficient extraction module and a signal synthesis module. The invention also discloses a pursuit method for the device. By classifying the signals and running the matching pursuit (MP) algorithm with a different sparse dictionary for each signal type, the method reduces the number of irrelevant traversals and the computational complexity. During classification pre-processing, the adapted sparse dictionary is determined by calculating the energy distribution interval of the original signal. The method reduces the dimension of the required dictionary, improves the encoding speed, and works well in use.

Description

An audio matching pursuit device based on dictionary classification and a pursuit method thereof

Technical field

The present invention relates to the field of audio coding, and in particular to an audio matching pursuit device based on dictionary classification.

Background art

Sparse representation generally means representing the original signal accurately with as few basis functions as possible, so as to capture the main features of the signal and thereby fundamentally reduce the cost of signal processing. Matching pursuit (MP) is one of the more widely used sparse representation algorithms. Its basic idea is to select the best-matching atom from an overcomplete dictionary at each iteration, so that the approximation of the signal is progressively improved. Because the overcomplete dictionary that MP uses to represent the signal can be chosen flexibly and adaptively according to the characteristics of the signal itself, and because atom selection follows a greedy, iteratively approximating strategy that keeps the final number of atoms small, MP is widely used in many areas of signal analysis, such as image processing, biomedical signal processing and audio processing.

As requirements on streaming media quality rise and the number of mobile terminal users keeps growing, the requirements on audio and video coding efficiency also increase. The conventional matching pursuit algorithm has high computational complexity and is therefore unsuitable for real-time processing. Various fast matching pursuit algorithms have been proposed, but most of them reduce running time at the expense of sparse representation efficiency, and their computation speed still struggles to meet the needs of large-scale problems.

Summary of the invention

The object of the present invention is to provide an audio matching pursuit device based on dictionary classification, so as to solve the problems raised in the background art above.

To achieve the above object, the present invention provides the following technical solution:

An audio matching pursuit device based on dictionary classification comprises a signal decomposition unit and a signal reconstruction unit. The signal decomposition unit comprises a dictionary building module, a signal classification module, a weight comparison module, a residual calculation module and a threshold control module; the signal reconstruction unit comprises a reconstruction coefficient extraction module and a signal synthesis module.

The pursuit method of the audio matching pursuit device based on dictionary classification comprises a signal decomposition method and a signal reconstruction method. The signal decomposition method comprises the following steps:

Step 1: select the corresponding sparse dictionaries according to the types of signal to be processed;

Step 2: determine the type of the signal to be processed and select the sparse dictionary adapted to that type; to this end, perform classification pre-processing on the original signal and calculate its degree of matching with the sparse dictionaries established in Step 1;

Step 3: denote the sparse dictionary obtained in Step 2 as D; to find the weight coefficient of each atom of the dictionary on the signal to be processed, take the inner product of each atom in the sparse dictionary with the signal in turn and find the maximum absolute value of the inner product;

Step 4: Step 3 gives the component of the signal along the atom with the largest weight; the signal residual after this iteration is the vector difference between the signal and that component, and the updated sparse coefficient is recorded in the sparse coefficient vector;

Step 5: the matching pursuit algorithm processes the signal by cumulative iteration, expressing the original signal as the sum of the weighted atoms plus a residual; iteration may stop once the signal residual falls to a certain value, which is determined jointly by the number of iterations and the signal-to-noise ratio.
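Steps 3 to 5 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary is assumed to be a NumPy array whose rows are unit-norm atoms, and the function name `matching_pursuit`, the parameters `target_snr_db` and `max_iter`, and the factor 20 in the amplitude-sum SNR approximation are illustrative choices that do not appear in the source.

```python
import numpy as np

def matching_pursuit(signal, dictionary, target_snr_db=30.0, max_iter=50):
    """Greedy matching pursuit over a dictionary whose rows are unit-norm atoms.

    Returns the sparse coefficient vector and the final residual.
    """
    signal = np.asarray(signal, dtype=float)
    residual = signal.copy()
    coeffs = np.zeros(dictionary.shape[0])
    for _ in range(max_iter):
        # Step 3: inner product of every atom with the current residual,
        # then pick the atom with the largest absolute value.
        products = dictionary @ residual
        i_opt = int(np.argmax(np.abs(products)))
        weight = products[i_opt]
        # Step 4: subtract the component along that atom and record the weight
        # at the atom's position in the sparse coefficient vector.
        residual = residual - weight * dictionary[i_opt]
        coeffs[i_opt] += weight
        # Step 5: stop when the approximate SNR reaches the target.
        noise = np.sum(np.abs(residual))
        if noise == 0.0:
            break
        # Assumed scaling: 20*log10 of the ratio of amplitude sums.
        snr_db = 20.0 * np.log10(np.sum(np.abs(signal)) / noise)
        if snr_db >= target_snr_db:
            break
    return coeffs, residual
```

The loop stops on whichever comes first, the SNR target or the iteration limit, which mirrors the joint stopping rule described in Step 5.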

The signal reconstruction method comprises the following steps:

Step 1: extract from the sparse coefficient code stream the atom weights, the atom indices and the dictionary type label needed to reconstruct the signal;

Step 2: determine from the dictionary type label which sparse dictionary was used at encoding time, multiply each atom weight obtained in Step 1 by its corresponding atom in that sparse dictionary, and accumulate the products in turn; the resulting output signal is the final fit of the original signal produced by this matching pursuit algorithm.
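The two reconstruction steps can be illustrated with a short sketch. It assumes the code stream has already been decoded into a dictionary type label, a list of atom indices and a list of atom weights, and that `dictionaries` is a hypothetical mapping from label to a NumPy array with one atom per row; none of these names come from the source.

```python
import numpy as np

def reconstruct(dict_label, atom_indices, atom_weights, dictionaries):
    """Rebuild the fitted signal from a label, atom indices and atom weights."""
    dictionary = dictionaries[dict_label]      # dictionary used at encoding time
    out = np.zeros(dictionary.shape[1])
    for i, w in zip(atom_indices, atom_weights):
        out += w * dictionary[i]               # accumulate weight * atom
    return out
```

Because reconstruction is a plain weighted sum, its cost grows with the number of transmitted coefficients rather than with the size of the dictionary.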

As a further scheme of the invention, the degree of matching between the signal to be processed and the sparse dictionaries established in Step 1, calculated in Step 2 of the signal decomposition method, is obtained as follows: calculate the frequency-domain values corresponding to the signal to be processed; normalize the time-domain values and the frequency-domain values separately and divide each into j segments of length a (a ≤ N/2, where N is the length of the signal to be processed); then calculate the energy of each segment, where the energy of a consecutive samples is approximated by the sum of the absolute amplitudes of those samples; finally, calculate the quadratic sums of the time-domain and frequency-domain segment energies separately and compare them.
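A rough sketch of this classification pre-processing is given below. Several details are assumptions made for illustration because the corresponding formulas are not reproduced in the text: the frequency-domain values are taken as the magnitudes of a full-length FFT, normalization divides by the total amplitude sum, and the quadratic sum is read as the sum of squared segment energies. The function names and the labels `"gabor"` and `"cosine"` are likewise hypothetical.

```python
import numpy as np

def segment_energies(values, seg_len):
    """Approximate per-segment energy by the sum of absolute amplitudes."""
    mags = np.abs(np.asarray(values)).astype(float)
    total = mags.sum()
    if total > 0:
        mags = mags / total                    # assumed normalization
    n_seg = len(mags) // seg_len
    return mags[: n_seg * seg_len].reshape(n_seg, seg_len).sum(axis=1)

def choose_dictionary(signal, seg_len):
    """Pick the dictionary whose domain shows the more concentrated energy."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.fft(signal)              # assumed frequency-domain values
    e_t = np.sum(segment_energies(signal, seg_len) ** 2)
    e_f = np.sum(segment_energies(spectrum, seg_len) ** 2)
    return "gabor" if e_t > e_f else "cosine"
```

With this reading, a single click concentrates its energy in one time segment and selects the Gabor dictionary, while a steady tone concentrates its energy in a few frequency bins and selects the cosine dictionary.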

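As a further scheme of the invention, the maximum absolute value of the inner product in Step 3 of the signal decomposition method is calculated as $|\langle S, g_{i_{opt}}\rangle| = \max_{i \in [1, M]} |\langle S, g_i\rangle|$, where $i_{opt} \in [1, M]$ is the index of the selected atom in the dictionary, $\langle S, g_i\rangle$ is the inner product of each atom $g_i$ with S, and $\langle S, g_{i_{opt}}\rangle$ is the maximum weight coefficient of the atoms in the dictionary on S.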
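One way to see why Step 3 selects the atom with the largest absolute inner product is the short derivation below. It assumes that the signal is real and that every atom $g_i$ has unit norm, which the text does not state explicitly, and it writes $R_i$ (a symbol introduced here for illustration only) for the residual left after removing the component of S along $g_i$.

```latex
\begin{aligned}
R_i &= S - \langle S, g_i\rangle\, g_i \\
\|R_i\|^2 &= \|S\|^2 - 2\langle S, g_i\rangle^2 + \langle S, g_i\rangle^2\,\|g_i\|^2 \\
          &= \|S\|^2 - |\langle S, g_i\rangle|^2 \qquad (\|g_i\| = 1)
\end{aligned}
```

Under that assumption, the atom that maximizes $|\langle S, g_i\rangle|$ is also the atom that leaves the smallest residual energy, which is what the greedy selection exploits.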

As a further scheme of the invention, the simplified signal-to-noise ratio calculation in Step 5 of the signal decomposition method takes the logarithm of the ratio between the absolute amplitudes of the original signal and those of the residual signal, where S is the amplitude of the original signal before sparse representation and S′ is the amplitude of the signal recovered from the sparse representation.
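The difference between the usual squared-amplitude SNR and the amplitude-sum simplification can be illustrated as follows. The exact scaling used in the patent is not reproduced in the text, so the factors 10 and 20 below are assumptions chosen so that both quantities are on the same decibel scale; the function names are hypothetical.

```python
import numpy as np

def snr_exact_db(original, recovered):
    """Conventional SNR from sums of squared amplitudes, in dB."""
    noise = np.asarray(original, dtype=float) - np.asarray(recovered, dtype=float)
    noise_energy = np.sum(noise ** 2)
    if noise_energy == 0.0:
        return float("inf")
    return 10.0 * np.log10(np.sum(np.square(original, dtype=float)) / noise_energy)

def snr_approx_db(original, recovered):
    """Cheaper approximation from sums of absolute amplitudes, in dB."""
    noise = np.asarray(original, dtype=float) - np.asarray(recovered, dtype=float)
    noise_sum = np.sum(np.abs(noise))
    if noise_sum == 0.0:
        return float("inf")
    return 20.0 * np.log10(np.sum(np.abs(original)) / noise_sum)
```

The approximation replaces a multiplication per sample with an absolute value, which is the saving the text refers to; the two values coincide when the signal and the residual each have constant magnitude and otherwise differ by an amount that depends on the shape of the waveforms.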

Compared with the prior art, the beneficial effects of the invention are as follows: by classifying signals and running the MP algorithm with a different sparse dictionary for each signal type, the invention reduces the number of irrelevant traversals and the computational complexity; during classification pre-processing, the adapted sparse dictionary is determined by calculating the energy distribution interval of the original signal; the method reduces the dimension of the required dictionary, improves the encoding speed, and works well in use.

Brief description of the drawings

Fig. 1 is a flow chart of the signal decomposition method of the audio matching pursuit device based on dictionary classification.

Fig. 2 is a flow chart of the signal reconstruction method of the audio matching pursuit device based on dictionary classification.

Fig. 3 is a structural diagram of the signal decomposition unit of the audio matching pursuit device based on dictionary classification.

Fig. 4 is a structural diagram of the signal reconstruction unit of the audio matching pursuit device based on dictionary classification.

In the figures: 101, dictionary building module; 102, signal classification module; 103, weight comparison module; 104, residual calculation module; 105, threshold control module; 201, reconstruction coefficient extraction module; 202, signal synthesis module.

Detailed description of the embodiments

The technical solution of this patent is described in more detail below with reference to specific embodiments.

Referring to Figs. 1 to 4, an audio matching pursuit device based on dictionary classification comprises a signal decomposition unit and a signal reconstruction unit. The signal decomposition unit comprises a dictionary building module 101, a signal classification module 102, a weight comparison module 103, a residual calculation module 104 and a threshold control module 105; the signal reconstruction unit comprises a reconstruction coefficient extraction module 201 and a signal synthesis module 202.

The dictionary building module 101 builds sparse dictionaries suited to different signal types. For example, a speech processing system selects a dictionary with speech characteristics; a transient signal processing system selects a dictionary suited to relatively fast-changing signals; and a system whose signals have no obvious characteristics, or that must process several types of signal at once, selects a more general-purpose dictionary.

The signal classification module 102 determines the type of the signal to be processed and selects the sparse dictionary adapted to that type. It performs classification pre-processing on the original signal and calculates its degree of matching with the sparse dictionaries built by the dictionary building module. The calculation is as follows: first calculate the frequency-domain values corresponding to the signal; normalize the time-domain values and the frequency-domain values separately and divide each into a number of fixed-length segments; then calculate the energy of each segment. Because the sum of squared amplitudes is roughly proportional to the sum of amplitudes, while the sum of amplitudes is far cheaper to compute, the energy is approximated by the sum of amplitudes. Next, calculate the sums of the time-domain and frequency-domain energies separately and compare them: the larger the sum, the more concentrated the signal energy distribution. If the time-domain energy sum exceeds the frequency-domain energy sum, the Gabor dictionary, whose energy is more concentrated in the time domain, is selected; otherwise the cosine dictionary, whose energy is more concentrated in the frequency domain, is selected. The sparse dictionary type is recorded.

The weight comparison module 103 finds the weight coefficient of each atom on the signal to be processed: using the sparse dictionary obtained from the signal classification module 102, it takes the inner product of each atom in the dictionary with the signal in turn and finds the maximum absolute value of the inner product.

The residual calculation module 104 calculates the residual between the signal to be processed and its component along the atom obtained in the weight comparison module. The component along the atom equals the product of the maximum-magnitude weight and that atom, and the residual equals the vector difference between the signal and that component. At the same time, the updated sparse coefficient is recorded in the sparse coefficient vector: the atom index gives the position of the coefficient in the vector, and the value at that position is the atom weight.

The threshold control module 105 uses a signal-to-noise ratio (SNR) threshold together with the iteration count to control the precision of the matching loop. The matching pursuit algorithm processes the signal by cumulative iteration, expressing the original signal as the sum of the weighted atoms plus a residual. Iteration may stop once the signal residual falls to a certain value, which is determined jointly by the number of iterations and the SNR: when the target SNR or the preset number of iterations is reached, the loop ends and the sparse coefficient vector is output; otherwise the process from the pre-processing module to the residual calculation module is repeated until the termination condition is met. Because the sum of squared amplitudes is roughly proportional to the sum of amplitudes, while the sum of amplitudes is far cheaper to compute, the SNR can be approximated as the logarithm of the ratio between the absolute amplitudes of the original signal and those of the residual signal.

The reconstruction coefficient extraction module 201 extracts from the sparse coefficient code stream the atom weights (sparse coefficients), the atom indices (the positions of the coefficients in the sparse coefficient vector) and the dictionary type label needed to reconstruct the signal.

The signal synthesis module 202 synthesizes the signal: it uses the dictionary type label obtained by the reconstruction coefficient extraction module 201 to determine which sparse dictionary was used at encoding time, multiplies each atom weight by its corresponding atom in that dictionary, and accumulates the products in turn. The resulting output signal is the final fit of the original signal produced by this matching pursuit algorithm.

The signal decomposition method comprises the following steps:

Step 1: select the corresponding sparse dictionaries according to the types of signal to be processed. For example, a speech processing system selects a dictionary with speech characteristics; a transient signal processing system selects a dictionary suited to relatively fast-changing signals; and a system whose signals have no obvious characteristics, or that must process several types of signal at once, selects a more general-purpose dictionary. Commonly used dictionaries are the Gabor dictionary, whose energy is more concentrated in the time domain, and the cosine dictionary, whose energy is more concentrated in the frequency domain. For the Gabor dictionary, w denotes the frequency scale, μ the time offset, σ the time scale and λ the kernel energy. Suppose the time scale ranges from 1 to N, where N is the length of the signal to be processed, and the frequency scale w, time scale σ and kernel energy λ have M₁ combinations; the dictionary size is then M₁×N. For the cosine dictionary, m denotes the analysis window index, u the frequency compression coefficient, k the frequency scale and L₀ the initial step length, that is, half the length of the first analysis window. Suppose the frequency scale ranges from 1 to N, and the analysis window index m and the frequency compression coefficient u have M₂ combinations in total; the dictionary size is then M₂×N.

Step 2: determine the type of the signal to be processed and select the sparse dictionary adapted to that type; to this end, perform classification pre-processing on the original signal and calculate its degree of matching with the sparse dictionaries established in Step 1. The degree of matching is calculated as follows: calculate the frequency-domain values corresponding to the signal to be processed; normalize the time-domain values and the frequency-domain values separately and divide each into j segments of length a (a ≤ N/2); then calculate the energy of each segment. Because the sum of squared amplitudes is roughly proportional to the sum of amplitudes, while the sum of amplitudes is far cheaper to compute, the energy of a consecutive samples is approximated by the sum of the absolute amplitudes of those samples. Then calculate the quadratic sums of the time-domain and frequency-domain segment energies separately and compare them: the larger the sum, the more concentrated the signal energy distribution. If the time-domain energy sum exceeds the frequency-domain energy sum, that is, $Energy_{t,j} > Energy_{f,j}$, select the Gabor dictionary, whose energy is concentrated in the time domain; otherwise select the cosine dictionary, whose energy is concentrated in the frequency domain. Record the sparse dictionary type;

Step 3: denote the sparse dictionary obtained in Step 2 as D; to find the weight coefficient of each atom of the dictionary on the signal to be processed, take the inner product of each atom in the sparse dictionary with the signal in turn and find the maximum absolute value of the inner product, which is calculated as $|\langle S, g_{i_{opt}}\rangle| = \max_{i \in [1, M]} |\langle S, g_i\rangle|$, where $i_{opt} \in [1, M]$ is the index of the selected atom in the dictionary, $\langle S, g_i\rangle$ is the inner product of each atom $g_i$ with S, and $\langle S, g_{i_{opt}}\rangle$ is the maximum weight coefficient of the atoms in the dictionary on S;

Step 4: Step 3 gives the component of the signal along the atom with the largest weight; the signal residual after this iteration is the vector difference between the signal and that component, and the updated sparse coefficient is recorded in the sparse coefficient vector. The component of S along the maximum atom is $\langle S, g_{i_{opt}}\rangle g_{i_{opt}}$, so the signal residual after this iteration is $S'_{later} = S - \langle S, g_{i_{opt}}\rangle g_{i_{opt}}$. At the same time, the sparse coefficient is recorded and updated in the sparse coefficient vector: the atom index $i_{opt}$ gives the position of the coefficient in the vector, and the value at that position is the atom weight $\langle S, g_{i_{opt}}\rangle$;

Step 5: the matching pursuit algorithm processes the signal by cumulative iteration, expressing the original signal as the sum of the weighted atoms plus a residual. The signal residual is $S'_{later}$, and iteration may stop once $S'_{later}$ falls to a certain value, which is determined jointly by the number of iterations and the signal-to-noise ratio (SNR). When the target SNR or the preset number of iterations is reached, the loop ends, and the sparse coefficient vector and the sparse dictionary type label obtained in Step 1 are output; otherwise Steps 2 to 5 are repeated until the termination condition is met. The SNR is defined on the sums of squared amplitudes. Because the sum of squared amplitudes is roughly proportional to the sum of amplitudes, while the sum of amplitudes is far cheaper to compute, the SNR calculation is simplified to use the sums of absolute amplitudes instead, where S is the amplitude of the original signal before sparse representation and S′ is the amplitude of the signal recovered from the sparse representation;
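Step 1 of the decomposition method names a Gabor dictionary and a cosine dictionary, but the atom expressions themselves are not reproduced in this text. The sketch below therefore uses generic stand-ins, clearly not the patent's parametrization: DCT-II style cosine atoms, and Gaussian-windowed cosines for the Gabor atoms. The function names and the default `sigmas` and `freqs` values are hypothetical.

```python
import numpy as np

def unit_rows(mat):
    """Scale every row (atom) to unit Euclidean norm."""
    return mat / np.linalg.norm(mat, axis=1, keepdims=True)

def cosine_dictionary(n):
    """Stand-in cosine dictionary: n DCT-II style atoms of length n."""
    t = np.arange(n)
    k = np.arange(n)[:, None]
    return unit_rows(np.cos(np.pi * (t + 0.5) * k / n))

def gabor_dictionary(n, sigmas=(2.0, 8.0), freqs=(0.0, 0.5, 1.0)):
    """Stand-in Gabor dictionary: Gaussian-windowed cosines at every time offset."""
    t = np.arange(n)
    atoms = []
    for sigma in sigmas:                       # time scale
        for w in freqs:                        # frequency scale
            for mu in range(n):                # time offset
                window = np.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2))
                atoms.append(window * np.cos(w * (t - mu)))
    return unit_rows(np.array(atoms))
```

In this sketch the Gabor dictionary holds one atom per combination of time scale, frequency scale and time offset, so it is much larger than the cosine dictionary; picking one dictionary per signal before the search is what keeps the number of inner products per iteration down.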

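The signal reconstruction method comprises the following steps:

Step 1: extract from the sparse coefficient code stream the atom weights, the atom indices and the dictionary type label needed to reconstruct the signal;

Step 2: determine from the dictionary type label which sparse dictionary was used at encoding time, multiply each atom weight obtained in Step 1 by its corresponding atom $g_i$ in that sparse dictionary, and accumulate the products in turn; the result $S_{out}$ is the final fit of the original signal S produced by this matching pursuit algorithm. $S_{out}$ is thus the sum, over the k extracted sparse coefficients, of each atom weight multiplied by its atom,

where k is the number of sparse coefficients.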
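The text does not specify the layout of the sparse coefficient code stream, so the following sketch invents a simple one purely for illustration: a one-byte dictionary type label, a two-byte atom count, then one (16-bit atom index, 32-bit float weight) pair per coefficient. Both function names and the byte layout are hypothetical.

```python
import struct

HEADER = "<BH"   # assumed: dictionary type label, number of coefficients
ENTRY = "<Hf"    # assumed: atom index, atom weight

def pack_stream(dict_label, indices, weights):
    """Serialize the label and the (index, weight) pairs into bytes."""
    out = struct.pack(HEADER, dict_label, len(indices))
    for i, w in zip(indices, weights):
        out += struct.pack(ENTRY, i, w)
    return out

def unpack_stream(data):
    """Recover the label, atom indices and atom weights from the bytes."""
    dict_label, count = struct.unpack_from(HEADER, data, 0)
    offset = struct.calcsize(HEADER)
    indices, weights = [], []
    for _ in range(count):
        i, w = struct.unpack_from(ENTRY, data, offset)
        offset += struct.calcsize(ENTRY)
        indices.append(i)
        weights.append(w)
    return dict_label, indices, weights
```

Storing weights as 32-bit floats means they are rounded on the way through the stream; a real codec would normally quantize and entropy-code them, which is outside the scope of this sketch.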

It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and non-restrictive; the scope of the invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and range of equivalency of the claims are intended to be embraced in the invention. No reference sign in the claims should be construed as limiting the claim concerned.

Moreover, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. The specification is written this way only for clarity; those skilled in the art should take the specification as a whole, and the technical solutions in the embodiments may also be suitably combined to form other embodiments that those skilled in the art can understand.

Claims

1. An audio matching pursuit device based on dictionary classification, characterized in that it comprises a signal decomposition unit and a signal reconstruction unit, the signal decomposition unit comprising a dictionary building module, a signal classification module, a weight comparison module, a residual calculation module and a threshold control module, and the signal reconstruction unit comprising a reconstruction coefficient extraction module and a signal synthesis module.

2. A pursuit method of the audio matching pursuit device based on dictionary classification as claimed in claim 1, characterized in that it comprises a signal decomposition method and a signal reconstruction method; the signal decomposition method comprises the following steps:

Step 1: select the corresponding sparse dictionaries according to the types of signal to be processed;

Step 2: determine the type of the signal to be processed and select the sparse dictionary adapted to that type; to this end, perform classification pre-processing on the original signal and calculate its degree of matching with the sparse dictionaries established in Step 1;

Step 3: denote the sparse dictionary obtained in Step 2 as D; to find the weight coefficient of each atom of the dictionary on the signal to be processed, take the inner product of each atom in the sparse dictionary with the signal in turn and find the maximum absolute value of the inner product;

Step 4: Step 3 gives the component of the signal along the atom with the largest weight; the signal residual after this iteration is the vector difference between the signal and that component, and the updated sparse coefficient is recorded in the sparse coefficient vector;

Step 5: the matching pursuit algorithm processes the signal by cumulative iteration, expressing the original signal as the sum of the weighted atoms plus a residual; iteration may stop once the signal residual falls to a certain value, which is determined jointly by the number of iterations and the signal-to-noise ratio;

the signal reconstruction method comprises the following steps:

Step 1: extract from the sparse coefficient code stream the atom weights, the atom indices and the dictionary type label needed to reconstruct the signal;

Step 2: determine from the dictionary type label which sparse dictionary was used at encoding time, multiply each atom weight obtained in Step 1 by its corresponding atom in that sparse dictionary, and accumulate the products in turn; the resulting output signal is the final fit of the original signal produced by this matching pursuit algorithm.

3. The pursuit method of the audio matching pursuit device based on dictionary classification according to claim 2, characterized in that the degree of matching between the signal to be processed and the sparse dictionaries established in Step 1, calculated in Step 2 of the signal decomposition method, is obtained as follows: calculate the frequency-domain values corresponding to the signal to be processed; normalize the time-domain values and the frequency-domain values separately and divide each into j segments of length a (a ≤ N/2); then calculate the energy of each segment, where the energy of a consecutive samples is approximated by the sum of the absolute amplitudes of those samples; finally, calculate the quadratic sums of the time-domain and frequency-domain segment energies separately and compare them.

4. The pursuit method of the audio matching pursuit device based on dictionary classification according to claim 2, characterized in that the maximum absolute value of the inner product in Step 3 of the signal decomposition method is calculated as $|\langle S, g_{i_{opt}}\rangle| = \max_{i \in [1, M]} |\langle S, g_i\rangle|$, where $i_{opt} \in [1, M]$ is the index of the selected atom in the dictionary, $\langle S, g_i\rangle$ is the inner product of each atom $g_i$ with S, and $\langle S, g_{i_{opt}}\rangle$ is the maximum weight coefficient of the atoms in the dictionary on S.

5. The pursuit method of the audio matching pursuit device based on dictionary classification according to claim 2, characterized in that the simplified signal-to-noise ratio calculation in Step 5 of the signal decomposition method takes the logarithm of the ratio between the absolute amplitudes of the original signal and those of the residual signal, where S is the amplitude of the original signal before sparse representation and S′ is the amplitude of the signal recovered from the sparse representation.