DE19508137A1 - Stepwise classification of arrhythmically segmented words - Google Patents
- Publication number
- DE19508137A1
- Authority
- DE
- Germany
- Prior art keywords
- classification
- word
- reference words
- words
- segments
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- G10L2015/085—Methods for reducing search complexity, pruning
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
Description
The invention relates to a method for the stepwise classification of arrhythmically segmented words and is used in automatic speech recognition.
Known speech recognition methods divide the signal of a word into segments, both during training and during testing. They compute agreed features for these segments and compare the features of the test word with the previously stored features of all trained words.
The distance measures obtained in this comparison form the basis for the subsequent classification. (Deller, J.R., Proakis, J.G., Hansen, J.H.L.: Discrete-Time Processing of Speech Signals. Macmillan Publishing Company, New York 1993.)
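The exhaustive comparison described above can be sketched as follows. This is only an illustration: the text does not name a distance measure, so the Euclidean segment distance, the linear pairing of segments, and the names `word_distance` and `classify_exhaustive` are assumptions, as are the toy feature values.

```python
import math

def segment_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def word_distance(test, ref):
    """Mean segment distance between a test word and a reference word.

    Segments are paired by a linear index mapping (an assumption made here
    so that words with different segment counts can still be compared).
    """
    n = len(test)
    total = 0.0
    for i, seg in enumerate(test):
        j = min(len(ref) - 1, i * len(ref) // n)
        total += segment_distance(seg, ref[j])
    return total / n

def classify_exhaustive(test, references):
    """Compare the test word with every trained word; return the closest label."""
    return min(references, key=lambda label: word_distance(test, references[label]))

# Hypothetical two-dimensional features per segment.
references = {
    "ja": [[0.1, 0.9], [0.8, 0.2]],
    "nein": [[0.7, 0.3], [0.2, 0.8], [0.6, 0.4]],
}
test_word = [[0.15, 0.85], [0.75, 0.25]]
best = classify_exhaustive(test_word, references)
```

Every trained word is visited once per test word, which is the cost that grows with vocabulary size.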
The computational effort for the comparison grows with the number of trained words and is limited by the permissible processing time. For large vocabularies and for the recognition of continuous speech, one therefore falls back on smaller phonetic units such as syllables or demisyllables, which are fewer in number in the language than words. Even so, German still has about 6000 classes of syllables and about 1600 classes of demisyllables. (Waigel, W.: Silbenorientierte Erkennung fließender Sprache mittels diskreter stochastischer Modellierung. Diss., TU München, 1990). To reduce the effort, methods for preclassification based on phonemes have also been proposed. (Schulze, E.: Verfahren zur Referenzselektion für ein automatisches Sprachverarbeitungssystem. DE 32 16 871). If the comparison yields several candidates, a post-classification is carried out. For this purpose, an automatic, discrimination-relevant weighting of states and features has been proposed. (Zünkler, K.: Verfahren zur Erkennung von Mustern in zeitvarianten Meßsignalen, DE 41 31 387).
The object of the invention is to provide a method that speeds up classification when arrhythmic segmentation is used.
According to the invention, this object is achieved by carrying out the classification in at least two steps. In the first step, candidates are compared and preselected using simple, coarse features such as the partial-word length, the number of arrhythmic segments, and the determined sequence of sound types. In subsequent steps, the feature patterns computed in the arrhythmic segments are used for classification. For the final decision between similar words, strongly stressed or conspicuously lengthened sounds enter the decision with greater weight.
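The cascade structure of this approach can be illustrated with a minimal sketch. The function name `classify_stepwise`, the dictionary layout of a word, the segment-count tolerance, and the fallback when every reference is pruned are all assumptions made for the example, not details taken from the patent.

```python
def classify_stepwise(test, references, coarse_ok, fine_distance):
    """Two-step classification: cheap preselection, then fine comparison.

    Returns (best_label, number_of_fine_comparisons).
    """
    # Step 1: keep only references that pass the cheap coarse check.
    candidates = [label for label, ref in references.items() if coarse_ok(test, ref)]
    if not candidates:
        # Fallback (assumption): if everything was pruned, compare against all.
        candidates = list(references)
    # Step 2: run the expensive pattern comparison on the survivors only.
    best = min(candidates, key=lambda label: fine_distance(test, references[label]))
    return best, len(candidates)

def same_segment_count(test, ref, tolerance=1):
    """Coarse check: segment counts may differ by at most `tolerance`."""
    return abs(test["n_segments"] - ref["n_segments"]) <= tolerance

def pattern_distance(test, ref):
    """Fine check: summed absolute difference of per-segment feature values."""
    return sum(abs(a - b) for a, b in zip(test["pattern"], ref["pattern"]))

# Hypothetical vocabulary with one scalar feature per segment.
references = {
    "eins": {"n_segments": 3, "pattern": [0.2, 0.8, 0.4]},
    "zwei": {"n_segments": 3, "pattern": [0.6, 0.7, 0.1]},
    "sieben": {"n_segments": 6, "pattern": [0.3, 0.9, 0.2, 0.5, 0.7, 0.1]},
    "einundzwanzig": {"n_segments": 9, "pattern": [0.2] * 9},
}
test = {"n_segments": 3, "pattern": [0.25, 0.75, 0.4]}

label, n_fine = classify_stepwise(test, references, same_segment_count, pattern_distance)
```

In this toy vocabulary only two of the four references reach the fine comparison, which is where the intended speed-up comes from.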
The invention is explained below with reference to two exemplary embodiments.
According to claim 1, when a word recognizer classifies an unknown test pattern, the first step eliminates those word hypotheses whose partial-word length variability rules out the present test partial-word length with high probability, whose segment count resulting from the arrhythmic, signal-adaptive segmentation deviates significantly from the segment count of the test word, and whose individual segment lengths or sequence of sound types, such as voiced or unvoiced, deviate too strongly from the values and sequences determined for the test word.
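A possible reading of this first step is sketched below. The patent gives no thresholds, so `max_sigma`, `max_seg_diff`, and `max_type_mismatch` are hypothetical, as are the class and field names. Treating each criterion as an independent rejection test is also an interpretation, and the check on individual segment lengths is left out for brevity.

```python
from dataclasses import dataclass

@dataclass
class Reference:
    label: str
    length_mean_ms: float  # mean partial-word length over training utterances
    length_std_ms: float   # spread of that length (its variability)
    n_segments: int        # segment count from the arrhythmic segmentation
    sound_types: str       # e.g. "VUV": V = voiced, U = unvoiced

@dataclass
class Observation:
    length_ms: float
    n_segments: int
    sound_types: str

def type_mismatches(a, b):
    """Position-wise mismatches plus the difference in sequence length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def passes_coarse_checks(test, ref, max_sigma=3.0, max_seg_diff=1, max_type_mismatch=1):
    """True if the reference survives all three coarse tests (assumed thresholds)."""
    length_ok = abs(test.length_ms - ref.length_mean_ms) <= max_sigma * ref.length_std_ms
    count_ok = abs(test.n_segments - ref.n_segments) <= max_seg_diff
    types_ok = type_mismatches(test.sound_types, ref.sound_types) <= max_type_mismatch
    return length_ok and count_ok and types_ok

def preselect(test, refs):
    """Labels of all word hypotheses that are not eliminated in step one."""
    return [r.label for r in refs if passes_coarse_checks(test, r)]

# Hypothetical reference statistics.
refs = [
    Reference("acht", 250.0, 25.0, 3, "VUU"),
    Reference("sieben", 420.0, 40.0, 5, "UVUVV"),
    Reference("neun", 330.0, 30.0, 3, "VVV"),
]
test = Observation(length_ms=310.0, n_segments=3, sound_types="VVV")
survivors = preselect(test, refs)
```

Here "acht" is rejected by the sound-type sequence and "sieben" by the segment count, so only "neun" would go on to the pattern comparison.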
Partial-word lengths are defined, for example, as the span from the start of the first vowel to the end of the last vowel. They can be measured more accurately than word lengths when the word begins or ends with an unvoiced sound.
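As a small illustration of this definition, the sketch below computes the partial-word length from a list of labelled segments. The tuple layout `(start_ms, end_ms, is_vowel)` and the function name are assumptions; the patent does not say how segments are represented.

```python
def partial_word_length(segments):
    """Span from the start of the first vowel to the end of the last vowel.

    `segments` is a list of (start_ms, end_ms, is_vowel) tuples in time order.
    Returns None if the word contains no vowel segment.
    """
    vowels = [(start, end) for start, end, is_vowel in segments if is_vowel]
    if not vowels:
        return None
    return vowels[-1][1] - vowels[0][0]

# Hypothetical word with unvoiced sounds at both edges and two vowels.
word = [
    (0, 60, False),
    (60, 150, True),
    (150, 200, False),
    (200, 330, True),
    (330, 400, False),
]
length_ms = partial_word_length(word)
```

The uncertain boundaries of the unvoiced sounds at 0 ms and 400 ms do not enter the result, which is the motivation given in the text.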
If the comparison based on computed patterns or generator probabilities yields several similar candidates that do not yet permit an unambiguous classification, then according to claim 2 these are distinguished in a post-classification in which strongly stressed and/or conspicuously lengthened sounds or syllables are given greater weight.
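One way to picture this post-classification is a weighted distance in which emphasized segments count more. The sketch below is an assumption throughout: the rule for detecting emphasis (duration or energy above 1.5 times the word mean), the boost factor of 2.0, and all names and values are hypothetical.

```python
def emphasis_weights(durations, energies, factor=1.5, boost=2.0):
    """Weight `boost` for segments that are clearly longer or louder than average."""
    mean_d = sum(durations) / len(durations)
    mean_e = sum(energies) / len(energies)
    return [
        boost if d > factor * mean_d or e > factor * mean_e else 1.0
        for d, e in zip(durations, energies)
    ]

def weighted_distance(test_feats, ref_feats, weights):
    """Weighted mean of per-segment absolute feature differences."""
    num = sum(w * abs(t - r) for w, t, r in zip(weights, test_feats, ref_feats))
    return num / sum(weights)

def post_classify(test_feats, weights, candidates):
    """Pick the candidate with the smallest weighted distance."""
    return min(
        candidates,
        key=lambda label: weighted_distance(test_feats, candidates[label], weights),
    )

# Hypothetical test word whose middle segment is conspicuously lengthened.
durations = [80, 260, 90]
energies = [1.0, 1.2, 0.9]
test_feats = [0.5, 0.5, 0.5]
candidates = {
    "word_a": [0.5, 0.8, 0.5],  # differs only in the lengthened segment
    "word_b": [0.3, 0.5, 0.7],  # matches the lengthened segment exactly
}

weights = emphasis_weights(durations, energies)
unweighted_choice = post_classify(test_feats, [1.0, 1.0, 1.0], candidates)
weighted_choice = post_classify(test_feats, weights, candidates)
```

With uniform weights the two candidates are close and "word_a" wins narrowly; once the lengthened segment is weighted more heavily, the decision shifts to "word_b", which matches that segment.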
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE1995108137 DE19508137A1 (en) | 1995-03-08 | 1995-03-08 | Stepwise classification of arrhythmically segmented words |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE1995108137 DE19508137A1 (en) | 1995-03-08 | 1995-03-08 | Stepwise classification of arrhythmically segmented words |
Publications (1)
Publication Number | Publication Date |
---|---|
DE19508137A1 (en) | 1996-09-12 |
Family
ID=7755982
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
DE1995108137 Withdrawn DE19508137A1 (en) | 1995-03-08 | 1995-03-08 | Stepwise classification of arrhythmically segmented words |
Country Status (1)
Country | Link |
---|---|
DE (1) | DE19508137A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE19705471A1 (en) * | 1997-02-13 | 1997-07-24 | Sibet Gmbh Sican Forschungs Un | Speech recognition and control method |
DE19705471C2 (en) * | 1997-02-13 | 1998-04-09 | Sican F & E Gmbh Sibet | Method and circuit arrangement for speech recognition and for voice control of devices |
DE112009003930B4 (en) * | 2009-01-30 | 2016-12-22 | Mitsubishi Electric Corporation | Voice recognition device |
Similar Documents
Publication | Title |
---|---|
DE69816177T2 (en) | Speech / pause differentiation using unguided adaptation of hidden Markov models |
DE69636057T2 (en) | Speaker verification system |
DE69031284T2 (en) | Method and device for speech recognition |
DE69432570T2 (en) | voice recognition |
DE69008023T2 (en) | Method and device for distinguishing voiced and unvoiced speech elements. |
DE68924134T2 (en) | Speech recognition system. |
DE19630109A1 (en) | Method for speaker verification using at least one speech signal spoken by a speaker, by a computer |
DE102008024258A1 (en) | A method for classifying and removing unwanted portions from a speech recognition utterance |
EP0925461A2 (en) | Process for the multilingual use of a hidden markov sound model in a speech recognition system |
DE60034772T2 (en) | REJECTION PROCEDURE IN LANGUAGE IDENTIFICATION |
EP0633559B1 (en) | Method and device for speech recognition |
DE10119284A1 (en) | Method and system for training parameters of a pattern recognition system assigned to exactly one implementation variant of an inventory pattern |
DE60018696T2 (en) | ROBUST LANGUAGE PROCESSING OF CHARACTERED LANGUAGE MODELS |
DE3711342A1 (en) | METHOD FOR RECOGNIZING CONTINUOUSLY SPOKEN WORDS |
EP1435087B1 (en) | Method for producing reference segments describing voice modules and method for modelling voice units of a spoken test model |
DE69026474T2 (en) | Speech recognition system |
DE19508137A1 (en) | Stepwise classification of arrhythmically segmented words |
DE3129353A1 (en) | Method for speaker-independent recognition of spoken words in telecommunications systems |
EP0817167B1 (en) | Speech recognition method and device for carrying out the method |
EP0470411A2 (en) | Training of speech reference patterns to situation dependent pronunciation variants |
DE10308611A1 (en) | Determination of the likelihood of confusion between vocabulary entries in phoneme-based speech recognition |
DE3935308C1 (en) | Speech recognition method by digitising microphone signal - using delta modulator to produce continuous of equal value bits for data reduction |
EP0540535B1 (en) | Process for speaker adaptation in an automatic speech-recognition system |
EP0962914B1 (en) | Method and apparatus for determining a confidence measure for speech recognition |
DE2448909C3 (en) | |
Legal Events
Code | Title |
---|---|
8141 | Disposal/no request for examination |