CN106782510B - Place name voice signal recognition method based on continuous Gaussian mixture HMM model - Google Patents

Place name voice signal recognition method based on continuous Gaussian mixture HMM model

Info

Publication number
CN106782510B
CN106782510B CN201611177818.6A CN201611177818A CN106782510B CN 106782510 B CN106782510 B CN 106782510B CN 201611177818 A CN201611177818 A CN 201611177818A CN 106782510 B CN106782510 B CN 106782510B
Authority
CN
China
Prior art keywords
place name
probability
state
model
name voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611177818.6A
Other languages
Chinese (zh)
Other versions
CN106782510A (en)
Inventor
蔡熙
聂腾云
赖雪军
谢巍
车松勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yunda Hi Tech Co ltd
Original Assignee
Shanghai Yunda Freight Co ltd
Suzhou Jinfeng Iot Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yunda Freight Co ltd, Suzhou Jinfeng Iot Technology Co ltd
Priority to CN201611177818.6A
Publication of CN106782510A
Application granted
Publication of CN106782510B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a place name voice signal recognition method based on a continuous Gaussian mixture HMM model, wherein the training process of the continuous Gaussian mixture HMM model is as follows: defining an HMM model and initializing it; substituting the feature matrix of one class of place name voice signals into the model for training; computing the probability of occurrence of that class of place name voice signals from the trained model parameters; comparing this probability with the output probability before training and judging whether the relative error meets the output condition; if so, outputting the HMM model corresponding to that class of place name voice signals; if not, judging whether the number of training iterations has reached the maximum training threshold; if it has not, training again, and if it has, outputting the HMM model; and substituting the feature matrices of the place name voice signals of the different place names into the model to obtain a plurality of HMM models corresponding to different place names, which together form a place name voice recognition model library. The invention yields HMM models and a place name voice recognition model library suited to isolated-word place name voice recognition, and creates the conditions for accurate place name voice recognition.

Description

Place name voice signal recognition method based on continuous Gaussian mixture HMM model
Technical Field
The invention relates to a place name voice signal recognition method, in particular to a place name voice signal recognition method based on a continuous Gaussian mixture HMM model.
Background
With the rapid development of the economy and the increasingly prominent trend of globalization, modern logistics has developed at an unprecedented pace in developed countries and has generated huge economic and social benefits; logistics resources such as transportation, storage, sorting, packaging and distribution are spread across many fields, including manufacturing, agriculture and the circulation industry.
In the sorting link, sorting is at present done essentially by hand. Workers spend long periods in a noisy working environment, which inevitably causes physical and mental fatigue, and the monotony and repetitiveness of the task make them inattentive; sorting accuracy inevitably falls, causing more irrecoverable sorting errors. Manually inspecting product sorting on a production line, as is done in the industrial field, can no longer meet the requirements of modern industry.
Speech recognition has by now changed people's lives in many respects as an important interface for human-computer interaction. From voice control systems for smart homes to in-vehicle speech recognition systems, it has brought people much convenience, so integrating speech recognition technology into the logistics sorting link is an inevitable requirement for the development of the logistics industry.
One key to combining the logistics sorting link with speech recognition technology is how to recognize place name voice signals accurately and efficiently, so as to provide technical support for automatically and accurately sorting articles to their designated destinations. At present, technology for isolated-word speech recognition of place names is rarely seen, so research and development of place name voice recognition technology is urgently needed.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a place name speech signal recognition method based on a continuous Gaussian mixture HMM model.
The purpose of the invention is realized by the following technical scheme:
the place name voice signal recognition method based on the continuous Gaussian mixture HMM model comprises a training process of the continuous Gaussian mixture HMM model and a place name voice recognition process, wherein the training process of the continuous Gaussian mixture HMM model comprises the following steps:
S1, defining a continuous Gaussian mixture HMM model comprising the following parameters, $\lambda = (N, M, A, \pi, B)$, wherein:
N, the number of model states, is 4;
M, the number of Gaussian functions corresponding to each state: each state comprises three 39-dimensional Gaussian functions, and each of the N states in a model has the same number of Gaussian functions;
A, the state transition probability matrix, $A = \{a_{ij}\}$, $a_{ij} = P[q_{t+1} = j \mid q_t = i]$, $1 \le i, j \le N$, where $q_t = i$ denotes being in state i at time t, $q_{t+1} = j$ denotes being in state j at time t+1, and $a_{ij}$ denotes the probability of transitioning from state i to state j;
π, the initial probability distribution over the states, $\pi = \{\pi_i\}$, $\pi_i = P[q_1 = i]$, $1 \le i \le N$, where $\pi_i$ denotes the probability of starting from state i, the subscript i indexing the starting probability of each state;
B, the output probability density function, $B = \{b_j(o)\}$,
Figure GDA0002423261650000021
$1 \le j \le N$, where o is an observation vector and M is the number of Gaussian elements contained in each state; $c_{jl}$ is the weight of the l-th Gaussian mixture element of the j-th state, L is the normal (Gaussian) probability density function, $\mu_{jl}$ is the mean vector of the l-th Gaussian mixture element of the j-th state, and $U_{jl}$ is the covariance matrix of the l-th Gaussian mixture element of the j-th state;
S2, model initialization: the initial state vector $\pi = \{\pi_i\}$ is set to (1, 0, 0, 0); in the state transition matrix A, the probability of remaining in the current state and the probability of transitioning to the next state are each 0.5; each Gaussian function is 39-dimensional with mean 0 and variance 1, and each weight is 1/3;
S3, substituting the feature matrix of one class of place name voice signals into the model and performing one round of model parameter training using the Baum-Welch iterative algorithm; one class of place name voice signals means that the feature matrix data of all sample voice signals of one place name are pooled, clustered with the k-means clustering method, and divided into 4 classes corresponding to the 4 states;
S4, calculating, from the trained model parameters, the probability of occurrence of this class of place name voice signals using the Viterbi algorithm;
S5, comparing this probability with the output probability before training, and judging whether the relative error between the two meets the output condition;
S6, if the output condition is met, outputting the continuous Gaussian mixture HMM model corresponding to this class of place name voice signals;
S7, if the output condition is not met, judging whether the number of training iterations has reached the maximum training threshold;
S8, if the number of training iterations has not reached the maximum training threshold, repeating steps S3-S7; if it has, terminating the training and outputting the continuous Gaussian mixture HMM model;
S9, substituting the feature matrices of the place name voice signals of the different place names into the model and repeating steps S3-S8 to obtain a plurality of continuous Gaussian mixture HMM models corresponding to different place names, all of which together form a place name voice recognition model library.
Preferably, in the place name voice signal recognition method based on the continuous Gaussian mixture HMM model, the process of calculating the model parameters with the Baum-Welch algorithm in step S3 is as follows:
S31, constructing an objective optimization function Q by the Lagrange multiplier method, with all parameters of the continuous Gaussian mixture HMM model as variables;
S32, setting the partial derivative of Q with respect to each variable to 0 and deriving the relationship between the new HMM parameters and the old HMM parameters at which Q reaches its extremum, thereby obtaining an estimate of each HMM parameter;
S33, iterating repeatedly with this functional relationship between the new and old HMM model parameters until the HMM model parameters converge.
Preferably, in the place name voice signal recognition method based on the continuous Gaussian mixture HMM model, in step S6 a relative error of less than 0.000001 indicates that the model training has converged and that the output condition is met.
Preferably, in the place name voice signal recognition method based on the continuous Gaussian mixture HMM model, the place name voice recognition process is as follows:
S10, substituting a 39-dimensional place name voice signal feature matrix into the established place name voice recognition model library, computing with the Viterbi algorithm its output probability under the continuous Gaussian mixture HMM model corresponding to each class of place name voice signals, and recognizing the feature matrix as belonging to the class with the maximum output probability.
Preferably, in the place name voice signal recognition method based on the continuous Gaussian mixture HMM model, the place name voice recognition process is as follows:
S110, inputting the n×39 feature matrix of an unknown place name voice signal into the continuous Gaussian mixture HMM model corresponding to one class of place name voice signals in the established place name voice recognition model library, and recording it as an observation sequence $O = (o_1, o_2, \ldots, o_n)$; let $P_{in}$ denote the probability of being in state i after the n-th consecutive frame signal has been input; $p_{in}$ denotes the probability of observing the n-th frame signal in state i; $a_{ij}$ denotes the probability of transitioning from state i to state j;
when the 1st frame signal is input, $p_{i1} = f_i(o_1)$, $1 \le i \le 4$, where $f_i(o_1)$ denotes the probability of the first frame vector occurring in state i;
since the initial state is 1, $P_{11} = p_{11}$; $P_{21} = 0$; $P_{31} = 0$; $P_{41} = 0$;
when the 2nd frame signal is input, $p_{i2} = f_i(o_2)$, $1 \le i \le 4$;
then $P_{i2} = \max_{1 \le j \le 4} \{P_{j1} \cdot a_{ji} \cdot p_{i2}\}$, where $a_{ji}$ denotes the probability of transitioning from state j to state i;
and so on;
when the n-th frame signal is input, $p_{in} = f_i(o_n)$, $1 \le i \le 4$;
$P_{in} = \max_{1 \le j \le 4} \{P_{j(n-1)} \cdot a_{ji} \cdot p_{in}\}$, where n is the number of frames in the voice signal;
when all frame signals of the unknown place name voice signal have been input, $P_{1n}, P_{2n}, P_{3n}, P_{4n}$ are obtained, and the largest of them is the probability that the unknown place name voice signal occurs under the continuous Gaussian mixture HMM model corresponding to that class of place name voice signals;
S120, substituting the feature matrix of the unknown place name voice signal into the continuous Gaussian mixture HMM models corresponding to all other classes of place name voice signals to obtain the probability of the unknown place name voice signal under each model, and assigning the unknown place name voice signal to the class whose continuous Gaussian mixture HMM model gives the highest probability.
The technical scheme of the invention has the advantages that:
The method is well designed and its process is sound. By collecting a large number of place name voice samples and applying suitable algorithms and optimized training conditions, it can effectively train continuous Gaussian mixture HMM models suited to isolated-word place name voice recognition and build a place name voice recognition model library, laying the foundation for subsequent place name voice recognition and supporting accurate place name recognition.
The invention exploits the characteristics of place name voice signals: the selected continuous Gaussian mixture model has 4 states, each state comprises three 39-dimensional Gaussian functions, and the feature matrix of the place name voice signals is likewise 39-dimensional, which greatly reduces the amount of computation and makes both model training and voice recognition faster.
Drawings
FIG. 1 is a schematic process diagram of the present invention;
FIG. 2 is a schematic diagram of the hidden Markov chain of the present invention.
Detailed Description
Objects, advantages and features of the present invention will be illustrated and explained by the following non-limiting description of preferred embodiments. The embodiments are merely exemplary for applying the technical solutions of the present invention, and any technical solution formed by replacing or converting the equivalent thereof falls within the scope of the present invention claimed.
The invention discloses a place name speech signal recognition method based on a continuous Gaussian mixture HMM model, which comprises a training process of the continuous Gaussian mixture HMM model and a place name speech recognition process, wherein as shown in the attached figure 1, the training process of the continuous Gaussian mixture HMM model comprises the following steps:
S1, defining a continuous Gaussian mixture HMM model comprising the following parameters, $\lambda = (N, M, A, \pi, B)$, wherein:
N, the number of model states, is 4;
M, the number of Gaussian functions corresponding to each state: each state comprises three 39-dimensional Gaussian functions, and each of the N states in a model has the same number of Gaussian functions;
A, the state transition probability matrix, $A = \{a_{ij}\}$, $a_{ij} = P[q_{t+1} = j \mid q_t = i]$, $1 \le i, j \le N$, where $q_t = i$ denotes being in state i at time t, $q_{t+1} = j$ denotes being in state j at time t+1, and $a_{ij}$ denotes the probability of transitioning from state i to state j;
π, the initial probability distribution over the states, $\pi = \{\pi_i\}$, $\pi_i = P[q_1 = i]$, $1 \le i \le N$, where $\pi_i$ denotes the probability of starting from state i, the subscript i indexing the starting probability of each state;
B, the output probability density function, $B = \{b_j(o)\}$,
Figure GDA0002423261650000061
$1 \le j \le N$, where o is an observation vector and M is the number of Gaussian elements contained in each state; $c_{jl}$ is the weight of the l-th Gaussian mixture element of the j-th state, L is the normal (Gaussian) probability density function, $\mu_{jl}$ is the mean vector of the l-th Gaussian mixture element of the j-th state, and $U_{jl}$ is the covariance matrix of the l-th Gaussian mixture element of the j-th state.
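As an illustration of the output probability density defined in S1, the following minimal sketch evaluates a Gaussian mixture density for one state. It assumes the standard mixture form b_j(o) = Σ_l c_jl · N(o; μ_jl, U_jl), which matches the symbols defined above but is not copied from the patent figure, and it further assumes diagonal covariance matrices. The function names are hypothetical.

```python
import numpy as np

def gaussian_pdf_diag(o, mean, var):
    """Density of a Gaussian with diagonal covariance (assumption) at vector o."""
    d = o.shape[0]
    log_norm = -0.5 * (d * np.log(2.0 * np.pi) + np.sum(np.log(var)))
    log_exp = -0.5 * np.sum((o - mean) ** 2 / var)
    return np.exp(log_norm + log_exp)

def emission_density(o, weights, means, variances):
    """Assumed standard form: b_j(o) = sum over l of c_jl * N(o; mu_jl, U_jl)."""
    return sum(c * gaussian_pdf_diag(o, m, v)
               for c, m, v in zip(weights, means, variances))

# One state with M = 3 components in 39 dimensions, using the S2 initial values.
weights = np.full(3, 1.0 / 3.0)    # c_jl = 1/3
means = np.zeros((3, 39))          # mean 0
variances = np.ones((3, 39))       # variance 1
b = emission_density(np.zeros(39), weights, means, variances)
```

With identical components the mixture collapses to a single standard Gaussian, so b equals (2π)^(-39/2) at the origin. Densities this small are one reason practical implementations usually work with logarithms.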
S2, after the model is defined, the model parameters are initialized. Specifically, the initial state vector $\pi = \{\pi_i\}$ is set to (1, 0, 0, 0); in the state transition matrix A, the probability of remaining in the current state and the probability of transitioning to the next state are each 0.5; each Gaussian function is 39-dimensional with mean 0 and variance 1, and the weight of each Gaussian function is 1/3.
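The initialization in S2 can be sketched as follows. This is only an illustration: it reads the transition matrix as a left-to-right chain with 0.5 for staying and 0.5 for advancing, and it assumes that the last state loops to itself with probability 1 so that every row sums to 1, a detail the text does not state. The names `init_model`, `N_STATES`, `N_MIX` and `DIM` are hypothetical.

```python
import numpy as np

N_STATES, N_MIX, DIM = 4, 3, 39  # 4 states, 3 Gaussians per state, 39 dimensions

def init_model():
    """Build the initial parameters described in step S2."""
    pi = np.array([1.0, 0.0, 0.0, 0.0])        # always start in state 1
    A = np.zeros((N_STATES, N_STATES))
    for i in range(N_STATES - 1):
        A[i, i] = 0.5                           # stay in the current state
        A[i, i + 1] = 0.5                       # advance to the next state
    A[-1, -1] = 1.0                             # assumption: final state self-loops
    weights = np.full((N_STATES, N_MIX), 1.0 / N_MIX)   # each weight 1/3
    means = np.zeros((N_STATES, N_MIX, DIM))            # mean 0
    variances = np.ones((N_STATES, N_MIX, DIM))         # variance 1
    return {"pi": pi, "A": A, "weights": weights,
            "means": means, "variances": variances}

model = init_model()
```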
S3, substituting the feature matrix of one class of place name voice signals into the model and performing one round of model parameter training using the Baum-Welch iterative algorithm. Here, one class of place name voice signals means that the feature matrix data of all sample voice signals of one place name are pooled and clustered with the k-means clustering method, so that vectors close to one another fall into the same class; the data are divided into 4 classes, corresponding to the 4 states. Four classes are chosen because too few states make the result inaccurate, while too many states make the amount of computation large. The Baum-Welch iterative algorithm is in essence an application of the maximum likelihood (ML) criterion and uses an iterative optimization procedure, detailed as follows:
S31, constructing an objective optimization function Q by the Lagrange multiplier method, with all parameters of the continuous Gaussian mixture HMM model as variables;
S32, setting the partial derivative of Q with respect to each variable to 0 and deriving the relationship between the new HMM parameters and the old HMM parameters at which Q reaches its extremum, thereby obtaining an estimate of each HMM parameter;
S33, iterating repeatedly with this functional relationship between the new and old HMM model parameters until the HMM model parameters converge.
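Steps S31-S32 can be illustrated for the transition probabilities alone. The derivation below is the standard textbook Baum-Welch argument, not a formula taken from the patent; the symbols $\xi_t(i,j)$, $\gamma_t(i)$, $\eta_i$ and $T$ (the number of frames) are introduced here as assumptions.

```latex
% Posterior quantities computed with the old parameters \lambda:
%   \xi_t(i,j) = P(q_t = i,\; q_{t+1} = j \mid O, \lambda), \qquad
%   \gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j).
% Part of Q that depends on the new transition probabilities \bar a_{ij},
% with one Lagrange multiplier \eta_i per row constraint \sum_j \bar a_{ij} = 1:
\[
Q_A = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T-1} \xi_t(i,j)\,\log \bar a_{ij}
      + \sum_{i=1}^{N} \eta_i \Bigl( 1 - \sum_{j=1}^{N} \bar a_{ij} \Bigr)
\]
% Setting the partial derivative to zero (step S32):
\[
\frac{\partial Q_A}{\partial \bar a_{ij}}
  = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\bar a_{ij}} - \eta_i = 0
\quad\Longrightarrow\quad
\bar a_{ij} = \frac{1}{\eta_i} \sum_{t=1}^{T-1} \xi_t(i,j)
\]
% Summing over j and using the constraint gives \eta_i = \sum_t \gamma_t(i), hence
\[
\bar a_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}
\]
```

The mixture weights, mean vectors and covariance matrices are re-estimated by the same pattern, each with its own constraint where one applies.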
S4, calculating, from the trained model parameters, the probability of occurrence of this class of place name voice signals using the Viterbi algorithm.
S5, comparing the probability calculated in step S4 with the output probability before training and judging whether their relative error meets the output condition; the loop ends when the condition is met.
S6, if the output condition is met, that is, the relative error is less than 0.000001, the model training has converged, and the continuous Gaussian mixture HMM model corresponding to this class of place name voice signals is output.
S7, if the output condition is not met, that is, the relative error is greater than 0.000001, judging whether the number of training iterations has reached the maximum training threshold. The maximum training threshold is set because, when training samples are few, the training process may never converge and would otherwise loop forever; capping the number of training iterations ensures that training terminates normally.
S8, if the number of training iterations has not reached the maximum training threshold, repeating steps S3-S7; if it has, terminating the training and outputting the continuous Gaussian mixture HMM model.
S9, substituting the feature matrices of the place name voice signals of the different place names into the model and repeating steps S3-S8 to obtain a plurality of continuous Gaussian mixture HMM models corresponding to different place names, all of which together form a place name voice recognition model library.
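The control flow of steps S3-S8 can be sketched as a loop with two exits: convergence of the relative error, or the iteration cap. In this sketch `train_step` and `score` are hypothetical callables standing in for one Baum-Welch pass and the Viterbi probability, and the default `max_iters=100` is an assumed value, since the text does not give the threshold.

```python
def train_until_converged(train_step, score, tol=1e-6, max_iters=100):
    """Repeat training until the relative change in score is below tol.

    train_step: hypothetical callable performing one training pass (S3).
    score: hypothetical callable returning the current probability (S4).
    Returns (iterations_run, converged_flag).
    """
    prev = score()                        # output probability before training
    for iteration in range(1, max_iters + 1):
        train_step()                      # S3: one round of parameter training
        curr = score()                    # S4: probability after training
        rel_err = abs(curr - prev) / max(abs(prev), 1e-300)   # S5
        if rel_err < tol:                 # S6: converged, output the model
            return iteration, True
        prev = curr                       # S7/S8: not converged, train again
    return max_iters, False               # S8: cap reached, stop anyway

# Toy stand-in: a value that moves halfway toward 1.0 on every pass.
state = {"p": 0.5}

def toy_step():
    state["p"] += 0.5 * (1.0 - state["p"])

def toy_score():
    return state["p"]

iters, converged = train_until_converged(toy_step, toy_score)
```

The cap guarantees termination even when the score keeps changing by more than the tolerance, which is the situation S7 describes for small training sets.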
After the place name voice recognition model library has been formed, the feature matrix obtained by feature extraction from any place name voice signal is input into the library for recognition, as follows:
S10, substituting a 39-dimensional place name voice signal feature matrix into the established place name voice recognition model library, computing with the Viterbi algorithm its output probability under the continuous Gaussian mixture HMM model corresponding to each class of place name voice signals, and recognizing the feature matrix as belonging to the class with the maximum output probability.
In detail, each of the continuous Gaussian mixture HMM models corresponding to the different place names corresponds to a hidden Markov chain as shown in FIG. 2, whose parameters comprise a 4-state transition matrix and the Gaussian functions of states 1 to 4, so that recognition of an unknown place name voice signal proceeds as follows:
S110, inputting the n×39 feature matrix of an unknown place name voice signal into the continuous Gaussian mixture HMM model corresponding to one class of place name voice signals in the established place name voice recognition model library, and recording it as an observation sequence $O = (o_1, o_2, \ldots, o_n)$; let $P_{in}$ denote the probability of being in state i after the n-th consecutive frame signal has been input; $p_{in}$ denotes the probability of observing the n-th frame signal in state i; $a_{ij}$ denotes the probability of transitioning from state i to state j;
when the 1st frame signal is input, $p_{i1} = f_i(o_1)$, $1 \le i \le 4$, where $f_i(o_1)$ denotes the probability of the first frame vector occurring in state i;
since the initial state is defined to be state 1 and no other, only the probability for state 1 is calculated, so $P_{11} = p_{11}$; $P_{21} = 0$; $P_{31} = 0$; $P_{41} = 0$;
when the 2nd frame signal is input, $p_{i2} = f_i(o_2)$, $1 \le i \le 4$;
then $P_{i2} = \max_{1 \le j \le 4} \{P_{j1} \cdot a_{ji} \cdot p_{i2}\}$, where $P_{j1}$ denotes the probability of being in state j after the first frame signal and $a_{ji}$ denotes the probability of transitioning from state j to state i;
and so on;
when the n-th frame signal is input, $p_{in} = f_i(o_n)$, $1 \le i \le 4$;
$P_{in} = \max_{1 \le j \le 4} \{P_{j(n-1)} \cdot a_{ji} \cdot p_{in}\}$, where n is the number of frames in the voice signal;
when all frame signals of the unknown place name voice signal have been input, the last frame can only be in one of states 1 to 4, so exactly 4 probabilities $P_{1n}, P_{2n}, P_{3n}, P_{4n}$ are obtained, and the largest of them is the probability that the unknown place name voice signal occurs under the continuous Gaussian mixture HMM model corresponding to that class of place name voice signals;
S120, substituting the feature matrix of the unknown place name voice signal into the continuous Gaussian mixture HMM models corresponding to all other classes of place name voice signals to obtain the probability of the unknown place name voice signal under each model, and assigning the unknown place name voice signal to the class whose continuous Gaussian mixture HMM model gives the highest probability.
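The recursion of S110 and the selection of S120 can be sketched as below. Two departures from the text are deliberate and should be read as assumptions: the sketch works with log probabilities (sums instead of products) to avoid numerical underflow, and it takes precomputed per-frame log emission values as input rather than evaluating the Gaussian mixtures. The function names and the place names in the example are hypothetical.

```python
import numpy as np

def viterbi_log_score(log_emis, log_A):
    """Return max_i log P_in for one model.

    log_emis[t, i] is log p_i for frame t+1; log_A[j, i] is log a_ji.
    The chain is forced to start in state 1, as in S110.
    """
    n_frames, n_states = log_emis.shape
    log_P = np.full(n_states, -np.inf)      # P_21 = P_31 = P_41 = 0
    log_P[0] = log_emis[0, 0]               # P_11 = p_11
    for t in range(1, n_frames):
        # candidate[j, i] = log P_j(previous frame) + log a_ji
        candidate = log_P[:, None] + log_A
        log_P = candidate.max(axis=0) + log_emis[t]
    return log_P.max()

def recognize(models):
    """models maps a place name to (log_emis, log_A); returns the best name."""
    scores = {name: viterbi_log_score(le, la) for name, (le, la) in models.items()}
    return max(scores, key=scores.get), scores

# Toy example: 3 frames, left-to-right transitions, constant emission values.
A = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0, 1.0]])       # last row is an assumption
with np.errstate(divide="ignore"):
    log_A = np.log(A)                       # zeros become -inf
best, scores = recognize({
    "Shanghai": (np.log(np.full((3, 4), 0.5)), log_A),
    "Suzhou": (np.log(np.full((3, 4), 0.25)), log_A),
})
```

Because the logarithm is monotonic, taking the maximum of the log scores selects the same class as taking the maximum of the probabilities themselves.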
The invention has various embodiments, and all technical solutions formed by equivalent substitution or equivalent transformation fall within the protection scope of the invention.

Claims (4)

1. The place name speech signal recognition method based on the continuous Gaussian mixture HMM model is characterized by comprising the following steps: the method comprises a training process of a continuous Gaussian mixture HMM model and a place name speech recognition process, wherein the training process of the continuous Gaussian mixture HMM model comprises the following steps:
S1, defining a continuous Gaussian mixture HMM model comprising the following parameters, $\lambda = (N, M, A, \pi, B)$, wherein:
N, the number of model states, is 4;
M, the number of Gaussian functions corresponding to each state: each state comprises three 39-dimensional Gaussian functions, and each of the N states in a model has the same number of Gaussian functions;
A, the state transition probability matrix, $A = \{a_{ij}\}$, $a_{ij} = P[q_{t+1} = j \mid q_t = i]$, $1 \le i, j \le N$, where $q_t = i$ denotes being in state i at time t, $q_{t+1} = j$ denotes being in state j at time t+1, and $a_{ij}$ denotes the probability of transitioning from state i to state j;
π, the initial probability distribution over the states, $\pi = \{\pi_i\}$, $\pi_i = P[q_1 = i]$, $1 \le i \le N$, where $\pi_i$ denotes the probability of starting from state i, the subscript i indexing the starting probability of each state;
B, the output probability density function, $B = \{b_j(o)\}$,
Figure FDA0002423261640000011
wherein o is an observation vector and M is the number of Gaussian functions contained in each state; $c_{jl}$ is the weight of the l-th Gaussian mixture element of the j-th state, L is the normal (Gaussian) probability density function, $\mu_{jl}$ is the mean vector of the l-th Gaussian mixture element of the j-th state, and $U_{jl}$ is the covariance matrix of the l-th Gaussian mixture element of the j-th state;
S2, model initialization: the initial state vector $\pi = \{\pi_i\}$ is set to (1, 0, 0, 0); in the state transition matrix A, the probability of remaining in the current state and the probability of transitioning to the next state are each 0.5; each Gaussian function is 39-dimensional with mean 0 and variance 1, and each weight is 1/3;
S3, substituting the feature matrix of one class of place name voice signals into the model and performing one round of model parameter training using the Baum-Welch iterative algorithm; one class of place name voice signals means that the feature matrix data of all sample voice signals of one place name are pooled, clustered with the k-means clustering method, and divided into 4 classes corresponding to the 4 states;
S4, calculating, from the trained model parameters, the probability of occurrence of this class of place name voice signals using the Viterbi algorithm;
S5, comparing this probability with the output probability before training, and judging whether the relative error between the two meets the output condition;
S6, if the output condition is met, outputting the continuous Gaussian mixture HMM model corresponding to this class of place name voice signals; a relative error of less than 0.000001 indicates that the model training has converged and that the output condition is met;
S7, if the output condition is not met, judging whether the number of training iterations has reached the maximum training threshold;
S8, if the number of training iterations has not reached the maximum training threshold, repeating steps S3-S7; if it has, terminating the training and outputting the continuous Gaussian mixture HMM model;
S9, substituting the feature matrices of the place name voice signals of the different place names into the model and repeating steps S3-S8 to obtain a plurality of continuous Gaussian mixture HMM models corresponding to different place names, all of which together form a place name voice recognition model library.
2. The place name voice signal recognition method based on the continuous Gaussian mixture HMM model according to claim 1, wherein in step S3 the process of calculating the model parameters with the Baum-Welch algorithm is as follows:
S31, constructing an objective optimization function Q by the Lagrange multiplier method, with all parameters of the continuous Gaussian mixture HMM model as variables;
S32, setting the partial derivative of Q with respect to each variable to 0 and deriving the relationship between the new HMM parameters and the old HMM parameters at which Q reaches its extremum, thereby obtaining an estimate of each HMM parameter;
S33, iterating repeatedly with this functional relationship between the new and old HMM model parameters until the HMM model parameters converge.
3. The place name voice signal recognition method based on the continuous Gaussian mixture HMM model according to claim 1, wherein the place name voice recognition process is as follows:
S10, substituting a 39-dimensional place name voice signal feature matrix into the established place name voice recognition model library, computing with the Viterbi algorithm its output probability under the continuous Gaussian mixture HMM model corresponding to each class of place name voice signals, and recognizing the feature matrix as belonging to the class with the maximum output probability.
4. The place name voice signal recognition method based on a continuous Gaussian mixture HMM model according to claim 1, wherein the place name voice recognition process is as follows:
S110, inputting the $n \times 39$ feature matrix of an unknown place name voice signal into the continuous Gaussian mixture HMM model corresponding to one kind of place name voice signal in the established place name voice recognition model library, and recording the signal as an observation sequence $O = (o_1, o_2, \ldots, o_n)$; let $P_{in}$ denote the probability of being in state $i$ after the $n$ consecutive frames have been input; let $p_{in}$ denote the probability of observing the $n$-th frame signal in state $i$; and let $a_{ij}$ denote the probability of transitioning from state $i$ to state $j$;
when the 1st frame signal is input, $p_{i1} = f_i(o_1)$, $1 \le i \le 4$, where $f_i(o_1)$ represents the probability of the first frame vector occurring in state $i$;
since the initial state is 1, $P_{11} = p_{11}$; $P_{21} = 0$; $P_{31} = 0$; $P_{41} = 0$;
when the 2nd frame signal is input, $p_{i2} = f_i(o_2)$, $1 \le i \le 4$;
then $P_{i2} = \max_{1 \le j \le 4} \{ P_{j1} \, a_{ji} \, p_{i2} \}$, where $a_{ji}$ represents the probability of transitioning from state $j$ to state $i$;
and so on by analogy;
when the $n$-th frame signal is input, $p_{in} = f_i(o_n)$, $1 \le i \le 4$;
$P_{in} = \max_{1 \le j \le 4} \{ P_{j(n-1)} \, a_{ji} \, p_{in} \}$, where $n$ is the number of frames in the voice signal;
when all frame signals of the unknown place name voice signal have been input, $P_{1n}, P_{2n}, P_{3n}, P_{4n}$ are obtained, and the largest of them is the probability that the unknown place name voice signal appears in the continuous Gaussian mixture HMM model corresponding to that kind of place name voice signal;
S120, substituting the feature matrix of the unknown place name voice signal into the continuous Gaussian mixture HMM models corresponding to all other kinds of place name voice signals to obtain the probability that the unknown place name voice signal appears in each continuous Gaussian mixture HMM model, and assigning the unknown place name voice signal to the kind whose continuous Gaussian mixture HMM model gives the highest probability.
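The recursion in S110 can be sketched in Python for a single model. This is an illustration under stated assumptions rather than the patented implementation: the function name `viterbi_score` and the list-of-lists layout are hypothetical, the emission values $f_i(o_t)$ are assumed to be precomputed, and the transition matrix and numbers in the usage example are made up.

```python
from typing import Sequence


def viterbi_score(emis: Sequence[Sequence[float]],
                  trans: Sequence[Sequence[float]]) -> float:
    """Best-path probability of one observation sequence under one HMM.

    emis[t][i]  : f_i(o_{t+1}), emission value of frame t+1 in state i+1
    trans[j][i] : a_ji, probability of moving from state j+1 to state i+1
    The model is assumed to start in the first state, as in the claim.
    """
    n_states = len(trans)
    # First frame: only the initial state carries probability.
    best = [emis[0][0]] + [0.0] * (n_states - 1)
    for frame in emis[1:]:
        # P_in = max_j { P_j(n-1) * a_ji } * p_in for every state i.
        best = [
            max(best[j] * trans[j][i] for j in range(n_states)) * frame[i]
            for i in range(n_states)
        ]
    # The model's score is the largest of P_1n, ..., P_4n.
    return max(best)


# Illustrative 4-state left-to-right model and three frames of emission values.
trans = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
emis = [
    [0.8, 0.1, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.3, 0.7, 0.2],
]
score = viterbi_score(emis, trans)  # best path is state 1 -> 2 -> 3
```

The sketch multiplies raw probabilities to mirror the claim. For utterances with many frames such products underflow, so implementations commonly sum log-probabilities instead; that is a substitution, not something the claim states.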
CN201611177818.6A 2016-12-19 2016-12-19 Place name voice signal recognition method based on continuous Gaussian mixture HMM model Active CN106782510B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611177818.6A CN106782510B (en) 2016-12-19 2016-12-19 Place name voice signal recognition method based on continuous Gaussian mixture HMM model

Publications (2)

Publication Number Publication Date
CN106782510A CN106782510A (en) 2017-05-31
CN106782510B CN106782510B (en) 2020-06-02

Family

ID=58890206

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201611177818.6A Active CN106782510B (en) 2016-12-19 2016-12-19 Place name voice signal recognition method based on continuous Gaussian mixture HMM model

Country Status (1)

Country Link
CN (1) CN106782510B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107240396B (en) * 2017-06-16 2023-01-17 百度在线网络技术(北京)有限公司 Speaker self-adaptation method, device, equipment and storage medium
CN108417207B (en) * 2018-01-19 2020-06-30 苏州思必驰信息科技有限公司 Deep hybrid generation network self-adaption method and system
CN109344999A (en) * 2018-09-07 2019-02-15 华中科技大学 A kind of runoff probability forecast method
CN110120218B (en) * 2019-04-29 2021-06-22 东北大学 Method for identifying highway large-scale vehicles based on GMM-HMM
CN111508481B (en) * 2020-04-24 2022-11-08 展讯通信(上海)有限公司 Training method and device of voice awakening model, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101980336A (en) * 2010-10-18 2011-02-23 福州星网视易信息***有限公司 Hidden Markov model-based vehicle sound identification method
CN101980336B (en) * 2010-10-18 2012-01-11 福州星网视易信息***有限公司 Hidden Markov model-based vehicle sound identification method
CN102708871A (en) * 2012-05-08 2012-10-03 哈尔滨工程大学 Line spectrum-to-parameter dimensional reduction quantizing method based on conditional Gaussian mixture model
CN103034847A (en) * 2012-12-13 2013-04-10 河海大学 Face recognition method based on hidden markov models
CN104485103A (en) * 2014-11-21 2015-04-01 东南大学 Vector Taylor series-based multi-environment model isolated word identifying method

Also Published As

Publication number Publication date
CN106782510A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
CN106782510B (en) Place name voice signal recognition method based on continuous Gaussian mixture HMM model
US20220076150A1 (en) Method, apparatus and system for estimating causality among observed variables
CN110349597B (en) Voice detection method and device
CN110334726A (en) A kind of identification of the electric load abnormal data based on Density Clustering and LSTM and restorative procedure
CN106601230B (en) Logistics sorting place name voice recognition method and system based on continuous Gaussian mixture HMM model and logistics sorting system
Prabhavalkar et al. Backpropagation training for multilayer conditional random field based phone recognition
CN106340297A (en) Speech recognition method and system based on cloud computing and confidence calculation
EP3336714A1 (en) Language dialog system with acquisition of replys from user input
CN111768000A (en) Industrial process data modeling method for online adaptive fine-tuning deep learning
Li et al. Large margin HMMs for speech recognition
CN111274817A (en) Intelligent software cost measurement method based on natural language processing technology
Bahari Speaker age estimation using Hidden Markov Model weight supervectors
CN110853630A (en) Lightweight speech recognition method facing edge calculation
CN111477220A (en) Neural network speech recognition method and system for household spoken language environment
CN111222575B (en) KLXS multi-model fusion method and system based on HRRP target recognition
CN113255366A (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
Fang From dynamic time warping (DTW) to hidden markov model (HMM)
CN113344243A (en) Wind speed prediction method and system for optimizing ELM based on improved Harris eagle algorithm
Zhang et al. Improvement of dynamic hand gesture recognition based on HMM algorithm
CN118013038A (en) Text increment relation extraction method based on prototype clustering
CN113420508A (en) Unit combination calculation method based on LSTM
CN115206455B (en) Deep neural network-based rare earth element component content prediction method and system
CN109033413B (en) Neural network-based demand document and service document matching method
Grósz et al. A sequence training method for Deep Rectifier Neural Networks in speech recognition
CN114372181B (en) Equipment production intelligent planning method based on multi-mode data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230104

Address after: Floor 3, Building 2, No. 1588, Fengxing Road, Huaxin Town, Qingpu District, Shanghai, 200000

Patentee after: Shanghai Yunda Hi Tech Co.,Ltd.

Address before: 21588, East Industrial Park, building E1, Suzhou City, Jiangsu Province

Patentee before: SUZHOU JINFENG IOT TECHNOLOGY Co.,Ltd.

Patentee before: SHANGHAI YUNDA FREIGHT CO.,LTD.
