DE19508137A1 - Stepwise classification of arrhythmically segmented words - Google Patents

Stepwise classification of arrhythmically segmented words

Info

Publication number
DE19508137A1
Authority
DE
Germany
Prior art keywords
classification
word
reference words
words
segments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
DE1995108137
Other languages
German (de)
Inventor
Werner Prof Dr Ing Zuehlke
Karl Dr Ing Schran
Jiri Dipl Ing Navratil
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187 Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G10L2015/085 Methods for reducing search complexity, pruning

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The classification is used in speech recognition systems. In the first step, the test word is compared with all reference words with respect to a partial-word length, the number of segments, and the detected sound-type sequence. A reference word is admitted to the subsequent classification step only if its deviations lie within preset tolerances. To decide between similar reference words, the segments of stressed or lengthened sounds or syllables of the test word are weighted more heavily.

Description

The invention relates to a method for the stepwise classification of arrhythmically segmented words and is used in automatic speech recognition.

Known speech recognition methods divide the signal of a word into segments, both in training and in testing. They compute agreed features for these segments and compare the features of the test word with the previously stored features of all trained words.

The distance measures determined in this comparison form the basis for the subsequent classification. (Deller, J.R., Proakis, J.G., Hansen, J.H.L.: Discrete-Time Processing of Speech Signals. Macmillan Publishing Company, New York 1993.)

The computational effort for the comparison grows with the number of trained words and is limited by the permissible processing time. For large vocabularies and for the recognition of continuous speech, one therefore falls back on smaller phonetic units, such as syllables or demisyllables, of which the language contains fewer than it does words. Even so, German still has about 6000 classes of syllables and about 1600 classes of demisyllables. (Waigel, W.: Silbenorientierte Erkennung fließender Sprache mittels diskreter stochastischer Modellierung. Diss., TU München, 1990.) To reduce the effort, methods for preclassification on the basis of phonemes have also been proposed. (Schulze, E.: Verfahren zur Referenzselektion für ein automatisches Sprachverarbeitungssystem. DE 32 16 871.) If the comparison yields several candidates, a post-classification is carried out. For this purpose, an automatic weighting of states and features according to their relevance for discrimination has been proposed. (Zünkler, K.: Verfahren zur Erkennung von Mustern in zeitvarianten Meßsignalen, DE 41 31 387.)

The object of the invention is to provide a method that accelerates classification when arrhythmic segmentation is used.

According to the invention, this object is achieved in that the classification takes place in at least two steps. In the first step, words are compared and preselected according to simple, coarse features such as the partial-word length, the number of arrhythmic segments, and the determined sound-type sequence. In subsequent steps, the feature patterns computed in the arrhythmic segments are used for classification. For the final decision between similar words, sounds that are strongly stressed or spoken in a conspicuously lengthened manner enter the decision with greater weight.
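
The two-step scheme can be illustrated with a minimal Python sketch. This is not the applicants' implementation: the function name `classify_stepwise` and the two callables `coarse_ok` and `fine_distance` are hypothetical placeholders for the cheap preselection test and the more expensive pattern comparison.

```python
from typing import Callable, Iterable, Optional, TypeVar

W = TypeVar("W")  # any representation of a word pattern (assumption)


def classify_stepwise(
    test: W,
    references: Iterable[W],
    coarse_ok: Callable[[W, W], bool],
    fine_distance: Callable[[W, W], float],
) -> Optional[W]:
    """Return the closest reference that survives the coarse first step."""
    # Step 1: cheap preselection on coarse features.
    candidates = [ref for ref in references if coarse_ok(test, ref)]
    if not candidates:
        return None  # nothing lies within the tolerances
    # Step 2: expensive comparison, run only on the survivors.
    return min(candidates, key=lambda ref: fine_distance(test, ref))


# Toy usage with numbers standing in for word patterns.
best = classify_stepwise(
    10,
    [1, 9, 12, 40],
    coarse_ok=lambda t, r: abs(t - r) <= 5,
    fine_distance=lambda t, r: float(abs(t - r)),
)
```

The saving comes from the fact that `fine_distance` is evaluated only for references that pass `coarse_ok`, so its cost scales with the number of survivors rather than with the full vocabulary.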

The invention is explained below with reference to two exemplary embodiments.

According to claim 1, when a word recognizer classifies an unknown test pattern, the first step eliminates those word hypotheses whose partial-word-length variability excludes the present test partial-word length with high probability, whose segment count resulting from the arrhythmic, signal-adaptive segmentation deviates significantly from the segment count of the test word, and whose individual segment lengths or sequence of sound types, such as voiced or unvoiced, deviate too strongly from the values and sequences determined for the test word.
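
A minimal sketch of such a first-step filter follows. It is an illustration under stated assumptions, not the claimed method itself: the class `WordPattern`, the tolerance values, and the position-wise mismatch count are all hypothetical, the length variability of each reference word is simplified to one fixed tolerance, and a hypothesis is dropped as soon as any single criterion is violated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordPattern:
    """Coarse description of a word (field names are illustrative)."""
    label: str
    partial_length_ms: float   # partial-word length
    sound_types: List[str]     # one entry per segment: "V" voiced, "U" unvoiced


def type_mismatches(a: List[str], b: List[str]) -> int:
    """Position-wise differences plus the difference in sequence length."""
    common = min(len(a), len(b))
    differing = sum(1 for i in range(common) if a[i] != b[i])
    return differing + abs(len(a) - len(b))


def preselect(
    test: WordPattern,
    references: List[WordPattern],
    max_len_dev_ms: float = 60.0,   # assumed tolerance
    max_seg_dev: int = 1,           # assumed tolerance
    max_type_mismatch: int = 1,     # assumed tolerance
) -> List[WordPattern]:
    """Keep only references whose coarse features lie within the tolerances."""
    survivors = []
    for ref in references:
        if abs(ref.partial_length_ms - test.partial_length_ms) > max_len_dev_ms:
            continue  # partial-word length too different
        if abs(len(ref.sound_types) - len(test.sound_types)) > max_seg_dev:
            continue  # segment count too different
        if type_mismatches(ref.sound_types, test.sound_types) > max_type_mismatch:
            continue  # sound-type sequence too different
        survivors.append(ref)
    return survivors
```

All three checks need only a few integer or float comparisons per reference word, which is why they are suited to run against the whole vocabulary before any feature patterns are compared.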

Partial-word lengths are defined, for example, as the span between the beginning of the first vowel and the end of the last vowel. They can be measured more precisely than word lengths when the word begins or ends with an unvoiced sound.
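
The example definition above can be written down directly. The sketch assumes that segmentation has already produced a time-ordered list of segments, each flagged as vowel or non-vowel; the tuple layout and the function name `partial_word_length` are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# (start_ms, end_ms, is_vowel), assumed to be in time order
Segment = Tuple[float, float, bool]


def partial_word_length(segments: List[Segment]) -> Optional[float]:
    """Span from the start of the first vowel to the end of the last vowel."""
    vowels = [(start, end) for start, end, is_vowel in segments if is_vowel]
    if not vowels:
        return None  # no vowel found, so the length is undefined
    first_start = vowels[0][0]
    last_end = vowels[-1][1]
    return last_end - first_start


# A word with unvoiced onset and coda: only the span 80 ms .. 390 ms counts.
segments = [
    (0.0, 80.0, False),
    (80.0, 200.0, True),
    (200.0, 260.0, False),
    (260.0, 390.0, True),
    (390.0, 450.0, False),
]
length = partial_word_length(segments)
```

Because the unvoiced onset and coda are excluded, errors in locating the exact word boundaries do not affect the measured value.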

If, after the comparison based on computed patterns or generation probabilities, several similar candidates remain that do not yet permit an unambiguous classification, they are distinguished according to claim 2 in a post-classification in which strongly stressed and/or conspicuously lengthened sounds or syllables are given greater weight.
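
One possible reading of this weighting is a segment-wise distance in which stressed or lengthened segments count more. The sketch below is an assumption-laden illustration: the squared Euclidean segment distance, the weight of 2.0, and the requirement that test and reference have the same number of segments are choices made here, not details given in the application.

```python
from typing import List, Sequence


def weighted_distance(
    test_feats: Sequence[Sequence[float]],
    ref_feats: Sequence[Sequence[float]],
    stressed: Sequence[bool],
    stress_weight: float = 2.0,  # assumed weight for stressed/lengthened segments
) -> float:
    """Weighted mean of per-segment squared distances."""
    if not (len(test_feats) == len(ref_feats) == len(stressed)):
        raise ValueError("segment counts must match in this simplified sketch")
    total = 0.0
    weight_sum = 0.0
    for t, r, is_stressed in zip(test_feats, ref_feats, stressed):
        w = stress_weight if is_stressed else 1.0
        d = sum((a - b) ** 2 for a, b in zip(t, r))
        total += w * d
        weight_sum += w
    return total / weight_sum


# Two candidates that tie without weighting; the second test segment is stressed.
test = [[0.0], [1.0]]
ref_a = [[0.0], [2.0]]   # differs in the stressed segment
ref_b = [[1.0], [1.0]]   # differs in the unstressed segment
stressed = [False, True]
dist_a = weighted_distance(test, ref_a, stressed)
dist_b = weighted_distance(test, ref_b, stressed)
```

With uniform weights both candidates are equally distant; with the stress weight, the candidate that matches the stressed segment (`ref_b`) wins, which is the tie-breaking effect the post-classification aims at.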

Claims (2)

1. Method for the stepwise classification of arrhythmically segmented words in speech recognition systems, characterized in that, in a first step, the test word is compared with all reference words with respect to a partial-word length, the number of segments, and the determined sound-type sequence, and the reference words are admitted to the subsequent classification step only if the deviations lie within specified tolerances.

2. Method according to claim 1, characterized in that, to decide between similar reference words, the segments of stressed or lengthened sounds or syllables of the test word are weighted more heavily.

Priority Applications (1)

Application Number Priority Date Filing Date Title
DE1995108137 DE19508137A1 (en) 1995-03-08 1995-03-08 Stepwise classification of arrhythmically segmented words

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
DE1995108137 DE19508137A1 (en) 1995-03-08 1995-03-08 Stepwise classification of arrhythmically segmented words

Publications (1)

Publication Number Publication Date
DE19508137A1 (en) 1996-09-12

Family

ID=7755982

Family Applications (1)

Application Number Title Priority Date Filing Date
DE1995108137 Withdrawn DE19508137A1 (en) 1995-03-08 1995-03-08 Stepwise classification of arrhythmically segmented words

Country Status (1)

Country Link
DE (1) DE19508137A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19705471A1 (en) * 1997-02-13 1997-07-24 Sibet Gmbh Sican Forschungs Un Speech recognition and control method
DE19705471C2 (en) * 1997-02-13 1998-04-09 Sican F & E Gmbh Sibet Method and circuit arrangement for speech recognition and for voice control of devices
DE112009003930B4 (en) * 2009-01-30 2016-12-22 Mitsubishi Electric Corporation Voice recognition device

Similar Documents

Publication Publication Date Title
DE69816177T2 (en) Speech / pause differentiation using unguided adaptation of hidden Markov models
DE69636057T2 (en) Speaker verification system
DE69031284T2 (en) Method and device for speech recognition
DE69432570T2 (en) voice recognition
DE69008023T2 (en) Method and device for distinguishing voiced and unvoiced speech elements.
DE68924134T2 (en) Speech recognition system.
DE19630109A1 (en) Method for speaker verification using at least one speech signal spoken by a speaker, by a computer
DE102008024258A1 (en) A method for classifying and removing unwanted portions from a speech recognition utterance
EP0925461A2 (en) Process for the multilingual use of a hidden markov sound model in a speech recognition system
DE60034772T2 (en) REJECTION PROCEDURE IN LANGUAGE IDENTIFICATION
EP0633559B1 (en) Method and device for speech recognition
DE10119284A1 (en) Method and system for training parameters of a pattern recognition system assigned to exactly one implementation variant of an inventory pattern
DE60018696T2 (en) ROBUST LANGUAGE PROCESSING OF CHARACTERED LANGUAGE MODELS
DE3711342A1 (en) METHOD FOR RECOGNIZING CONTINUOUSLY SPOKEN WORDS
EP1435087B1 (en) Method for producing reference segments describing voice modules and method for modelling voice units of a spoken test model
DE69026474T2 (en) Speech recognition system
DE19508137A1 (en) Stepwise classification of arrhythmically segmented words
DE3129353A1 (en) Method for speaker-independent recognition of spoken words in telecommunications systems
EP0817167B1 (en) Speech recognition method and device for carrying out the method
EP0470411A2 (en) Training of speech reference patterns to situation dependent pronunciation variants
DE10308611A1 (en) Determination of the likelihood of confusion between vocabulary entries in phoneme-based speech recognition
DE3935308C1 (en) Speech recognition method by digitising microphone signal - using delta modulator to produce continuous of equal value bits for data reduction
EP0540535B1 (en) Process for speaker adaptation in an automatic speech-recognition system
EP0962914B1 (en) Method and apparatus for determining a confidence measure for speech recognition
DE2448909C3 (en)

Legal Events

Date Code Title Description
8141 Disposal/no request for examination