CN106782550A

CN106782550A - An automatic speech recognition system based on a DSP chip

Info

Publication number: CN106782550A
Application number: CN201611064684.7A
Authority: CN
Inventors: 田丽
Original assignee: Heilongjiang Bayi Agricultural University
Current assignee: Heilongjiang Bayi Agricultural University
Priority date: 2016-11-28
Filing date: 2016-11-28
Publication date: 2017-05-31

Abstract

The invention discloses an automatic speech recognition system based on a DSP chip, comprising a speech signal acquisition device, a wavelet filter, a speech signal pre-processing module, a speech signal feature extraction module, a neural network module, a pattern matching module, a speech recognition output module and a DSP chip. The speech signal acquisition device, wavelet filter, speech signal pre-processing module, speech signal feature extraction module, pattern matching module and speech recognition output module are connected in sequence, and the speech signal feature extraction module and the pattern matching module are each connected to the neural network module. The invention is based on the theory and methods of speech signal processing, wavelets and neural networks. It studies the dynamic recognition of speech signals and applies wavelet and neural network theory and methods to speech recognition, so that speech can be recognized automatically. The system has a simple structure, is easy to use and is low in cost.

Description

An automatic speech recognition system based on a DSP chip

Technical field

The present invention relates to the technical field of speech recognition, and in particular to an automatic speech recognition system based on a DSP chip.

Background art

Automatic speech recognition has long been an ideal pursued by mankind, and it is a direction that researchers have worked on assiduously in the near and medium term. Its ultimate goal is to let machines understand human language and perform the corresponding functions. Although considerable progress has been made in the field of speech recognition over the past 50 years, it is clear that a large gap remains between the current state and the ideal target. With the rapid development of computers, speech recognition has been studied in ever greater depth and has developed into a broad interdisciplinary subject, closely connected with acoustics, linguistics, psychology, signal processing, artificial intelligence, pattern recognition, information theory and computer science. It shows great application prospects in many fields, and many high-performance speech recognition systems have appeared one after another. Meanwhile, human-machine interaction by way of natural language has far-reaching significance, broad application prospects and a wide range of application fields. First, intelligent speech input based on pattern recognition technology can have a revolutionary impact on office automation. Second, the wide use of speech recognition technology in the service industry will greatly reduce tedious and monotonous work, save a large amount of manpower and improve work efficiency. Third, speech recognition can also show its strong advantages in dangerous or harsh working environments and on the battlefield. Therefore, research on speech recognition is of far-reaching significance for improving people's living standards, strengthening national defense and many other aspects.

Summary of the invention

In view of the technical problems described in the background art, the present invention proposes an automatic speech recognition system based on a DSP chip.

The automatic speech recognition system based on a DSP chip proposed by the present invention comprises a speech signal acquisition device, a wavelet filter, a speech signal pre-processing module, a speech signal feature extraction module, a neural network module, a pattern matching module, a speech recognition output module and a DSP chip. The speech signal acquisition device, wavelet filter, speech signal pre-processing module, speech signal feature extraction module, pattern matching module and speech recognition output module are connected in sequence, and the speech signal feature extraction module and the pattern matching module are each connected to the neural network module. The speech signal acquisition device, wavelet filter, speech signal pre-processing module, speech signal feature extraction module, neural network module, pattern matching module and speech recognition output module are all connected to the DSP chip.

Preferably, the speech signal pre-processing module comprises a pre-emphasis unit, a windowing unit and an end-point detection unit, which are connected in sequence. The pre-emphasis unit is connected to the wavelet filter, the end-point detection unit is connected to the speech signal feature extraction module, and the pre-emphasis unit is a pre-emphasizer.

Preferably, the neural network module comprises a training unit, a modeling unit and an inference unit, which are connected in sequence. The training unit is connected to the speech signal feature extraction module, and the inference unit is connected to the pattern matching module.

Preferably, the wavelet filter is used to select the useful information in the speech signal and to suppress the interference that irrelevant information causes to recognition; the speech signal pre-processing module is used to remove the non-speech segments of the signal; and the speech signal feature extraction module is used to extract an effective parameter sequence from the pre-processed speech signal for use by the neural network module and the pattern matching module.

In the present invention, the wavelet filter selects the useful information in the speech signal and suppresses the interference that irrelevant information causes to recognition. The speech signal pre-processing module removes the non-speech segments of the signal. The speech signal feature extraction module analyzes the pre-processed speech signal in the time domain and the frequency domain and extracts an effective parameter sequence for use by the neural network module and the pattern matching module. The neural network module summarizes the rules of speech recognition, and the pattern matching module matches the input speech signal against those rules, thereby achieving recognition. The invention is based on the theory and methods of speech signal processing, wavelets and neural networks. It studies the dynamic recognition of speech signals and applies wavelet and neural network theory and methods to speech recognition, so that speech can be recognized automatically. The system has a simple structure, is easy to use and is low in cost.

Brief description of the drawings

Fig. 1 is a structural schematic diagram of the automatic speech recognition system based on a DSP chip proposed by the present invention.

Specific embodiments

The present invention is further explained below with reference to a specific embodiment.

Embodiment

With reference to Fig. 1, this embodiment proposes an automatic speech recognition system based on a DSP chip, comprising a speech signal acquisition device, a wavelet filter, a speech signal pre-processing module, a speech signal feature extraction module, a neural network module, a pattern matching module, a speech recognition output module and a DSP chip. The speech signal acquisition device, wavelet filter, speech signal pre-processing module, speech signal feature extraction module, pattern matching module and speech recognition output module are connected in sequence, and the speech signal feature extraction module and the pattern matching module are each connected to the neural network module. The speech signal acquisition device, wavelet filter, speech signal pre-processing module, speech signal feature extraction module, neural network module, pattern matching module and speech recognition output module are all connected to the DSP chip. In this system, the wavelet filter selects the useful information in the speech signal and suppresses the interference that irrelevant information causes to recognition. The speech signal pre-processing module removes the non-speech segments of the signal. The speech signal feature extraction module analyzes the pre-processed speech signal in the time domain and the frequency domain and extracts an effective parameter sequence for use by the neural network module and the pattern matching module. The neural network module summarizes the rules of speech recognition, and the pattern matching module matches the input speech signal against those rules, thereby achieving recognition. The invention is based on the theory and methods of speech signal processing, wavelets and neural networks. It studies the dynamic recognition of speech signals and applies wavelet and neural network theory and methods to speech recognition, so that speech can be recognized automatically. The system has a simple structure, is easy to use and is low in cost.

In this embodiment, the speech signal pre-processing module comprises a pre-emphasis unit, a windowing unit and an end-point detection unit, which are connected in sequence. The pre-emphasis unit is connected to the wavelet filter, the end-point detection unit is connected to the speech signal feature extraction module, and the pre-emphasis unit is a pre-emphasizer. The neural network module comprises a training unit, a modeling unit and an inference unit, which are connected in sequence. The training unit is connected to the speech signal feature extraction module, and the inference unit is connected to the pattern matching module. The wavelet filter is used to select the useful information in the speech signal and to suppress the interference that irrelevant information causes to recognition; the speech signal pre-processing module is used to remove the non-speech segments of the signal; and the speech signal feature extraction module is used to extract an effective parameter sequence from the pre-processed speech signal, by time-domain and frequency-domain analysis, for use by the neural network module and the pattern matching module. The neural network module summarizes the rules of speech recognition, and the pattern matching module matches the input speech signal against those rules, thereby achieving recognition. The invention is based on the theory and methods of speech signal processing, wavelets and neural networks. It studies the dynamic recognition of speech signals and applies wavelet and neural network theory and methods to speech recognition, so that speech can be recognized automatically. The system has a simple structure, is easy to use and is low in cost.

In this embodiment, the speech signal acquisition device acquires the speech signal and transmits it to the wavelet filter. The wavelet filter selects the useful information in the speech signal, suppresses the interference that irrelevant information causes to recognition, and then transmits the speech signal to the speech signal pre-processing module. The role of the pre-emphasis unit is to boost the high frequencies, compensating for the high-frequency loss that occurs when sound is radiated from the lips. It passes the digitized speech signal s(n) through a low-order digital system, which may be fixed or slowly adaptive. The pre-emphasizer uses the most widely used fixed first-order system, whose transfer function is as follows:

Here the output s'(n) of the pre-emphasizer is related to the input s(n) of the system by the following difference equation:
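The transfer function and difference equation are not reproduced in this text. A minimal Python sketch, assuming the common textbook first-order form H(z) = 1 − μz⁻¹, that is s'(n) = s(n) − μ·s(n−1). The function name `pre_emphasize` and the value μ = 0.95 are illustrative assumptions and are not taken from the patent:

```python
import numpy as np

def pre_emphasize(s, mu=0.95):
    """Fixed first-order pre-emphasis: s'(n) = s(n) - mu * s(n-1).

    mu is an assumed coefficient (values close to 1 are typical);
    the sample before the start of the signal, s(-1), is taken as 0.
    """
    s = np.asarray(s, dtype=float)
    if s.size == 0:
        return s
    out = np.empty_like(s)
    out[0] = s[0]                    # s(-1) assumed to be zero
    out[1:] = s[1:] - mu * s[:-1]    # boosts high frequencies, attenuates DC
    return out

# A constant (purely low-frequency) input is strongly attenuated after the first sample.
print(pre_emphasize([1.0, 1.0, 1.0], mu=0.5))
```

Because the filter is a fixed first-order difference, it costs one multiply and one subtract per sample, which suits a DSP implementation.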

Commonly used window functions for the windowing unit include the rectangular window, the Hamming window and the Hanning window. In practice, the frequency characteristic of the Hamming window is better suited to the analysis of speech signals, so the system weights the signal with a Hamming window, whose function is as follows:

Its frequency characteristic is:
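Neither the window formula nor its frequency characteristic is preserved in this text. A minimal sketch, assuming the textbook Hamming definition w(n) = 0.54 − 0.46·cos(2πn/(N−1)) for 0 ≤ n ≤ N−1, and evaluating the frequency characteristic numerically with an FFT rather than from a closed-form expression. The function names and the FFT length are illustrative assumptions:

```python
import numpy as np

def hamming_window(n_points):
    """Textbook Hamming window of length n_points (assumed definition)."""
    if n_points < 2:
        return np.ones(n_points)
    n = np.arange(n_points)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (n_points - 1))

def window_frame(frame):
    """Weight one speech frame with a Hamming window of matching length."""
    frame = np.asarray(frame, dtype=float)
    return frame * hamming_window(frame.size)

def magnitude_response_db(window, n_fft=1024):
    """Magnitude response of a window in dB, normalized to 0 dB at DC.

    n_fft should be at least the window length.
    """
    spectrum = np.abs(np.fft.rfft(window, n_fft))
    spectrum = spectrum / spectrum[0]
    return 20.0 * np.log10(np.maximum(spectrum, 1e-12))

response = magnitude_response_db(hamming_window(256))
print(response[:4])
```

Plotting `response` against frequency would show the main lobe and the side lobes whose level motivates the choice of window.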

End-point detection unit: end-point detection must be performed on the several seconds of recorded speech in order to distinguish speech segments from silent segments. End-point decisions are possible because the short-time parameters of speech of different kinds have different probability density functions, and because several adjacent frames of speech should have consistent speech characteristics.

The speech signal pre-processing module then transmits the pre-processed speech signal to the speech signal feature extraction module, which performs feature extraction using linear prediction coefficients and linear prediction cepstrum coefficients. Linear prediction coefficients come from the linear prediction of speech, whose basic idea is that each sample of the speech signal can be represented by a weighted linear combination of several of its past samples. The weighting coefficients are determined so as to minimize the mean-square value of the prediction error.

If the prediction uses the p past samples, it is called linear prediction of order p. If the current sample of the signal is predicted by a weighted sum of the p past samples {x(n−1), x(n−2), ..., x(n−p)}, the predicted value of x(n) is:

where the weighting coefficients are denoted −a_{pl} and are called prediction coefficients. The prediction error is:
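The two expressions referred to above are not preserved in this text. Under the stated convention that the weights are −a_{pl}, they would take the usual form below. This is a reconstruction from the surrounding definitions, with l assumed to run from 1 to p, and not the patent's own typesetting:

```latex
\hat{x}(n) = -\sum_{l=1}^{p} a_{pl}\, x(n-l), \qquad
e(n) = x(n) - \hat{x}(n) = x(n) + \sum_{l=1}^{p} a_{pl}\, x(n-l)
```

With this sign convention the prediction polynomial is A_p(z) = 1 + Σ a_{pl} z^{−l}, which reduces to A₀(z) = 1 and e₀(n) = x(n) when p = 0.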

The prediction coefficients are optimal when the mean-square prediction error is minimized, i.e.

ε = E[e²(n)] = min

The prediction coefficients can be solved with the Durbin recursive algorithm, as follows. The iteration starts from order zero, p = 0. The zeroth-order predictor makes no prediction, so the prediction polynomial is

A₀(z) = 1

the prediction error is

e₀(n) = x(n)

and the prediction error power is

ε₀ = E[x²(n)]

This is the initial condition of the iteration. The iterative steps are as follows:

1. Initialize.

2. Assume that the parameters of the order-p predictor are known, i.e. A_p(z) and ε_p are known.

3. Calculate the reflection coefficient of the order-(p+1) predictor:

4. Calculate the prediction coefficients of the order-(p+1) predictor:

The prediction polynomial of the corresponding order-(p+1) predictor is:

A_{p+1}(z) = A_p(z) − γ_{p+1} z^{−(p+1)} A_p(z^{−1})

5. Calculate the order-(p+1) prediction error power:

6. Return to step 2.

After the calculation ends, the following three kinds of results are obtained: the prediction coefficients of the predictor of each order; the reflection coefficients of the predictor of each order; and the prediction error power of each order.
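The per-step formulas of the recursion are not preserved in this text. The steps can be sketched in Python as the standard Levinson-Durbin recursion, written to agree with the polynomial update A_{p+1}(z) = A_p(z) − γ_{p+1} z^{−(p+1)} A_p(z^{−1}) and with A_p(z) = 1 + Σ a_k z^{−k}. The use of the frame autocorrelation r(k) as input, the reflection-coefficient formula, and the function names `autocorrelation` and `durbin` are assumptions of this sketch:

```python
import numpy as np

def autocorrelation(frame, order):
    """Hypothetical helper: short-time autocorrelation r(0..order) of one frame."""
    frame = np.asarray(frame, dtype=float)
    return np.array([np.dot(frame[:frame.size - k], frame[k:])
                     for k in range(order + 1)])

def durbin(r, order):
    """Levinson-Durbin recursion.

    Returns (a, gammas, eps): a[0..order] with a[0] = 1 are the coefficients
    of A(z); gammas are the reflection coefficients of orders 1..order;
    eps[p] is the prediction error power of order p.
    """
    r = np.asarray(r, dtype=float)
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive")
    a = np.zeros(order + 1)
    a[0] = 1.0                      # A_0(z) = 1
    eps = [r[0]]                    # eps_0 = r(0), the signal power
    gammas = []
    for p in range(order):
        # Step 3 (assumed form): gamma_{p+1} = (r(p+1) + sum_k a_k r(p+1-k)) / eps_p
        gamma = (r[p + 1] + np.dot(a[1:p + 1], r[p:0:-1])) / eps[-1]
        # Step 4: coefficients of A_{p+1}(z) = A_p(z) - gamma z^-(p+1) A_p(z^-1)
        prev = a.copy()
        for k in range(1, p + 1):
            a[k] = prev[k] - gamma * prev[p + 1 - k]
        a[p + 1] = -gamma
        # Step 5: eps_{p+1} = eps_p * (1 - gamma^2)
        gammas.append(gamma)
        eps.append(eps[-1] * (1.0 - gamma * gamma))
    return a, np.array(gammas), np.array(eps)

# Autocorrelation of a first-order process, r(k) = 0.5**k
a, gammas, eps = durbin([1.0, 0.5, 0.25], order=2)
print(a, gammas, eps)
```

The function returns the three kinds of results listed above for the final order; keeping a copy of `a` inside the loop would give the coefficients of every intermediate order as well.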

Linear prediction cepstrum coefficients: because the speech signal is stationary over short intervals, speech characteristics can also be represented by a short-time spectrum, and the cepstrum is one commonly used form. The cepstrum is the inverse Fourier transform of the logarithm of the power spectrum obtained by Fourier-transforming the signal. It allows the periodic excitation pulses and the vocal tract to be separated, which yields the vocal tract parameters. Cepstrum coefficients can be computed directly from the definition of the cepstrum, or obtained recursively from the LPC coefficients. Compared with direct computation of the cepstrum coefficients, the LPC cepstrum (LPCCEP) requires little computation, so the system uses LPC cepstrum coefficients. For the cepstrum based on LPC analysis there is a very simple and effective recursive algorithm:

In the formula, C_m is a cepstrum coefficient, a_p is a prediction coefficient, m is the order of the cepstrum coefficient (m = 1, ..., Q), and p is the order of the prediction coefficients.

The extracted speech signal features are transmitted to the neural network module, which summarizes the rules of speech recognition. At the same time, the extracted speech signal features are transmitted to the pattern matching module, which performs matching and recognition on the input speech signal according to the speech recognition rules summarized by the neural network module.
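The recursion from LPC coefficients to cepstrum coefficients is not preserved in this text. A minimal sketch of the standard recursion, assuming the all-pole model 1/A(z) with A(z) = 1 + Σ a_k z^{−k}; under the opposite sign convention for the prediction coefficients the signs in the recursion flip. The function name `lpc_to_cepstrum` and the argument `n_ceps` (playing the role of Q) are illustrative:

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum coefficients C_1..C_Q of 1/A(z), with a[0] = 1 and a[1..p].

    C_m = -a_m - sum_{k=1}^{m-1} (k/m) C_k a_{m-k}   for m <= p
    C_m =      - sum_{k=m-p}^{m-1} (k/m) C_k a_{m-k} for m >  p
    """
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)            # c[0] is unused
    for m in range(1, n_ceps + 1):
        acc = 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k]
        a_m = a[m] if m <= p else 0.0   # a_m is zero beyond the predictor order
        c[m] = -a_m - acc
    return c[1:]

# First-order example: 1/(1 - 0.5 z^-1) has cepstrum C_m = 0.5**m / m
print(lpc_to_cepstrum([1.0, -0.5], 3))
```

Each coefficient needs at most p multiply-accumulate operations, which is the reason the recursion is cheaper than computing the cepstrum through two Fourier transforms and a logarithm.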

The above is only a preferred specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the invention and according to the technical solution and inventive concept of the invention, shall fall within the scope of protection of the present invention.

Claims

1. An automatic speech recognition system based on a DSP chip, comprising a speech signal acquisition device, a wavelet filter, a speech signal pre-processing module, a speech signal feature extraction module, a neural network module, a pattern matching module, a speech recognition output module and a DSP chip, characterized in that the speech signal acquisition device, wavelet filter, speech signal pre-processing module, speech signal feature extraction module, pattern matching module and speech recognition output module are connected in sequence, the speech signal feature extraction module and the pattern matching module are each connected to the neural network module, and the speech signal acquisition device, wavelet filter, speech signal pre-processing module, speech signal feature extraction module, neural network module, pattern matching module and speech recognition output module are all connected to the DSP chip.

2. The automatic speech recognition system based on a DSP chip according to claim 1, characterized in that the speech signal pre-processing module comprises a pre-emphasis unit, a windowing unit and an end-point detection unit, the pre-emphasis unit, windowing unit and end-point detection unit are connected in sequence, the pre-emphasis unit is connected to the wavelet filter, the end-point detection unit is connected to the speech signal feature extraction module, and the pre-emphasis unit is a pre-emphasizer.

3. The automatic speech recognition system based on a DSP chip according to claim 1, characterized in that the neural network module comprises a training unit, a modeling unit and an inference unit, the training unit, modeling unit and inference unit are connected in sequence, the training unit is connected to the speech signal feature extraction module, and the inference unit is connected to the pattern matching module.

4. The automatic speech recognition system based on a DSP chip according to claim 1, characterized in that the wavelet filter is used to select the useful information in the speech signal and to suppress the interference that irrelevant information causes to recognition, the speech signal pre-processing module is used to remove the non-speech segments of the signal, and the speech signal feature extraction module is used to extract an effective parameter sequence from the pre-processed speech signal for use by the neural network module and the pattern matching module.