NO318112B1

NO318112B1 - Speech-to-speech conversion system and method

Info

Publication number: NO318112B1
Application number: NO19985178A
Authority: NO
Inventors: Bertil Lyberg
Original assignee: Teliasonera Ab Publ
Priority date: 1996-05-13
Filing date: 1998-11-06
Publication date: 2005-01-31
Also published as: SE9601812D0; WO1997043707A1; EP0976026A1; NO985178L; NO985178D0; SE519273C2; SE9601812L

Description

The present invention relates to a method and a system, in a voice-responsive communication system, for providing a spoken response to a speech input. The method comprises the steps of recognizing and interpreting the input speech, and using the interpretation to obtain speech information data from a database for use in formulating the spoken response. The speech-to-speech conversion system provides, at its output, spoken responses to speech inputs in at least two natural languages, and comprises speech recognition means for the input speech, interpretation means for interpreting the content of the recognized input speech, and a database containing speech information data for use in formulating said spoken responses.

Known speech recognition systems adapted to produce spoken responses to input speech include a database containing speech information in many different languages, and provide a recognition function for recognizing and interpreting information in the languages concerned. However, the known speech recognition systems that can form part of a speech-to-speech conversion system, or the like, are dedicated to a single language; that is, they respond only to input speech, e.g. spoken requests or questions, in the particular language that the system is adapted to handle and process.

GB 02165969 discloses an interactive dialogue system consisting of a speech recognizer for analyzing a user's utterances and a speech generator for conveying messages to the user.

In addition, the speech information data, which are stored in a database and used for formulating suitable synthetic spoken responses to the input speech, are normally produced in a dialect that corresponds to a standard national dialect. Therefore, when there are significant differences between the dialect of the input speech and the standard national dialect, it may in certain circumstances prove difficult for the database in known speech-to-speech conversion systems to interpret the received speech information, i.e. the voice input to the system, correctly. It can also be difficult for the person making the voice input to understand the spoken response fully. Even where such responses can be understood by a recipient, it would be more useful if the dialect of the spoken response were the same as the dialect of the related voice input.

Even in the artificial reproduction of a spoken language, the language needs to be reproduced naturally and with correct accentuation. In particular, a word can have very different meanings depending on how it is stressed. Likewise, the meaning of one and the same sentence can depend on where the stress is placed. In addition, the stressing of sentences, or parts of them, determines which sections are emphasized in the language, and this may be important in establishing the exact meaning of the spoken language.

The need for artificially produced speech to be as natural as possible, and to have correct accentuation, is of particular importance in voice-responsive communication equipment and/or systems that produce speech in various contexts. With known voice-responsive arrangements, the reproduced speech is sometimes difficult to understand and interpret. There is therefore a need for a speech-to-speech conversion system in which the artificial output speech is natural, has the correct accentuation, and is easy to understand.

For languages with well-developed sentence accent stress and/or pitch on individual words, identifying the natural meaning of words and sentences is very difficult. The fact that stresses can be placed incorrectly increases the risk of misinterpretation, or of the meaning being lost entirely on the listening party.

To avoid these problems, there is therefore a need for a speech-to-speech conversion system that can interpret the received speech information regardless of language and/or dialect, and can match the language and/or dialect of the output speech to that of the respective input speech. Likewise, in order to determine the meaning of individual words or phrases in a spoken sequence unambiguously, the speech-to-speech conversion system must be able to determine, and take account of, sentence accent and sentence stress in the spoken sequence.

An object of the present invention is to provide a method as described in the preamble of claim 1, characterized in that the database contains speech information data in at least two natural languages; in that the method is adapted to recognize and interpret input speech in said at least two languages, using statistics-based speech recognition and language modeling techniques to form a lexically and syntactically acceptable speech model for the language concerned, and to provide spoken responses to speech inputs in said languages; and in that the method includes the further steps of evaluating a recognized speech input to determine the language of the input, effecting a dialogue with the database to obtain speech information data for formulating a spoken response in the language of the input speech, and converting the speech information data obtained from the database into said spoken response.

Alternative embodiments of the method of the present invention are characterized by the features of claims 2-15.

It is a further object of the present invention to provide a voice-responsive communication system that uses a speech-to-speech conversion method in accordance with claims 1-15.

Another object of the present invention is to provide a system for speech-to-speech conversion as described in the preamble of claim 17, characterized in that the speech information data stored in the database are in said at least two natural languages; in that the speech recognition and interpretation means are adapted to recognize and interpret speech inputs in said at least two natural languages; and in that the system further includes evaluation means for evaluating the recognized speech inputs and determining the language of the inputs, dialogue processing means for effecting a dialogue with the database to obtain said speech information data in the language of the input speech, and text-to-speech conversion means for converting the speech information data obtained from the database into a spoken response.

Alternative embodiments of the invention are characterized by the features of claims 18-37.

It is a further object of the present invention to provide a voice-responsive communication system comprising a speech-to-speech conversion system in accordance with claims 17-37.

In a preferred method, separate databases can be used for each of said two languages, and a dialogue can be effected with only the database that contains speech information data in the language of the input speech. However, where at least part of the speech information data needed for a spoken response is stored in another of said databases, the method can include the further steps of effecting a dialogue with that other database to obtain the necessary speech information data, translating those data into the language of the first database, combining the speech information data from the databases, and converting the combined speech information data into a spoken response in the language of the input speech.
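
By way of illustration only, the retrieval with translation fallback described above might be sketched as follows. The in-memory databases, the key names and the `toy_translate` placeholder are assumptions introduced for this sketch; the specification does not prescribe any particular data layout or translation technique.

```python
from typing import Callable, Dict, List

# Hypothetical per-language databases, keyed by an illustrative topic name.
DATABASES: Dict[str, Dict[str, str]] = {
    "en": {"platform": "The train leaves from platform 4."},
    "sv": {"departure": "Avresa 10:15."},
}


def toy_translate(text: str, source: str, target: str) -> str:
    """Placeholder translator: tags the text instead of translating it."""
    return f"[{source}->{target}] {text}"


def collect_response_data(
    keys: List[str],
    input_language: str,
    databases: Dict[str, Dict[str, str]],
    translate: Callable[[str, str, str], str],
) -> str:
    """Gather the data for a response, translating items held in another language."""
    parts: List[str] = []
    for key in keys:
        own = databases[input_language].get(key)
        if own is not None:
            # The database in the input language holds the item: use it directly.
            parts.append(own)
            continue
        # Otherwise consult the other databases and translate what is found.
        for language, database in databases.items():
            if language != input_language and key in database:
                parts.append(translate(database[key], language, input_language))
                break
    # Combine the collected data before text-to-speech conversion.
    return " ".join(parts)
```

In this sketch an item that no database holds is simply omitted from the combined text; a real system would need to decide how to handle that case.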

In another preferred embodiment, the outcome of the evaluation process can be used to determine the database with which the dialogue is to be conducted in order to obtain the speech information data for a spoken response to the input speech.

The dialogue with a database, and/or between databases, can be effected using a database communication language such as SQL (Structured Query Language).
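
As a minimal sketch of such a dialogue, assuming a single hypothetical table `speech_information` with columns `topic`, `language` and `text` (none of which are named in the specification), a query could be issued from Python's standard `sqlite3` module as follows.

```python
import sqlite3
from typing import Optional


def open_demo_database() -> sqlite3.Connection:
    """Create a hypothetical in-memory database holding speech information data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE speech_information (topic TEXT, language TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO speech_information VALUES (?, ?, ?)",
        [
            ("departure", "en", "The train leaves at 10:15."),
            ("departure", "sv", "Avresa 10:15."),
        ],
    )
    return conn


def fetch_speech_information(
    conn: sqlite3.Connection, topic: str, language: str
) -> Optional[str]:
    """Return the stored text for a topic in the given language, or None."""
    row = conn.execute(
        "SELECT text FROM speech_information WHERE topic = ? AND language = ?",
        (topic, language),
    ).fetchone()
    return row[0] if row else None
```

The query is parameterized so that the topic derived from the interpreted speech is never spliced directly into the SQL text.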

In a preferred method in accordance with the present invention, the speech recognition and interpretation comprise the steps of extracting prosody information, i.e. the fundamental tone curve (pitch curve), from a speech input, and obtaining dialect information from said prosody information. The dialect information is used when converting the speech information data obtained from the database into a spoken response, so that the response is in the same language and dialect as the input speech. The preferred method includes the further steps of determining the intonation pattern of the fundamental tone of the input speech, and thereby the maximum and minimum values of its fundamental tone curve; determining the intonation pattern of the fundamental tone curve of a speech model, and thereby the maximum and minimum values of that curve and their respective positions; and comparing the intonation pattern of the input speech with that of the speech model to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech and those of the speech model. The identified time difference indicates dialect characteristics of the input speech. The time difference can be determined relative to a reference point in the intonation pattern, e.g. the point at which a consonant/vowel boundary occurs.
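
The comparison of intonation patterns can be illustrated with the following sketch. It assumes that each fundamental tone curve is available as a list of pitch values sampled at a fixed frame interval, that turning points are simple local extrema, and that turning points are paired in order of occurrence; these choices, and all function and parameter names, are assumptions of the sketch and are not taken from the specification.

```python
from typing import List, Sequence, Tuple


def find_turning_points(f0: Sequence[float], frame_ms: int) -> List[Tuple[str, int]]:
    """Return ("max" or "min", time in ms) for each local extremum of the curve."""
    points: List[Tuple[str, int]] = []
    for i in range(1, len(f0) - 1):
        if f0[i] > f0[i - 1] and f0[i] > f0[i + 1]:
            points.append(("max", i * frame_ms))
        elif f0[i] < f0[i - 1] and f0[i] < f0[i + 1]:
            points.append(("min", i * frame_ms))
    return points


def timing_differences(
    input_f0: Sequence[float],
    model_f0: Sequence[float],
    frame_ms: int,
    input_ref_ms: int = 0,
    model_ref_ms: int = 0,
) -> List[Tuple[str, int]]:
    """Time offsets (input minus model) of matching turning points.

    Each time is measured from a reference point, e.g. a consonant/vowel
    boundary, supplied separately for the input speech and the speech model.
    """
    input_points = find_turning_points(input_f0, frame_ms)
    model_points = find_turning_points(model_f0, frame_ms)
    differences: List[Tuple[str, int]] = []
    for (kind_in, t_in), (kind_model, t_model) in zip(input_points, model_points):
        if kind_in == kind_model:
            offset = (t_in - input_ref_ms) - (t_model - model_ref_ms)
            differences.append((kind_in, offset))
    return differences
```

A consistently positive or negative offset would then serve as the dialect indicator; how offsets map to particular dialects is not specified here and is left open.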

The method of the present invention can include the step of evaluating information about sentence accents from the prosody information.

The words in the speech model can be checked lexically, and the phrases in the speech model can be checked syntactically. Words and phrases that are not linguistically possible are excluded from the speech model. In addition, the orthography and the phonetic transcription of the words in the speech model can be checked, where the transcription information includes lexically abstracted accent information, such as stressed syllables, and information on the placement of secondary accent. The accent information can relate to tonal word accent I and accent II.
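
The lexical and syntactic checks on the speech model might be sketched as below, purely as an illustration. The toy lexicon, the part-of-speech tags and the set of permitted tag sequences are hypothetical stand-ins for whatever lexicon and grammar an actual system would use.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon mapping each known word to a part-of-speech tag.
LEXICON: Dict[str, str] = {
    "the": "DET",
    "train": "NOUN",
    "leaves": "VERB",
}

# Hypothetical set of tag sequences accepted as syntactically possible.
ALLOWED_PATTERNS: Set[Tuple[str, ...]] = {("DET", "NOUN", "VERB")}


def filter_hypotheses(
    hypotheses: List[List[str]],
    lexicon: Dict[str, str],
    allowed_patterns: Set[Tuple[str, ...]],
) -> List[List[str]]:
    """Keep only word sequences that pass the lexical and syntactic checks."""
    kept: List[List[str]] = []
    for words in hypotheses:
        # Lexical check: every word must exist in the lexicon.
        if not all(word in lexicon for word in words):
            continue
        # Syntactic check: the tag sequence must be a permitted pattern.
        pattern = tuple(lexicon[word] for word in words)
        if pattern in allowed_patterns:
            kept.append(words)
    return kept
```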

Moreover, the method in accordance with the present invention can use sentence accent information when interpreting the input speech.

The invention also provides a speech-to-speech conversion system which provides, at its output, spoken responses to input speech in at least two natural languages, including speech recognition means for speech inputs, interpretation means for interpreting the content of the recognized speech inputs, and a database containing speech information data for use in formulating said spoken responses, characterized in that the speech information data stored in the database are in said at least two natural languages; in that the speech recognition and interpretation means are adapted to recognize and interpret speech inputs in said at least two natural languages; and in that the system further comprises evaluation means for evaluating the recognized speech inputs and determining the language of the inputs, dialogue processing means for effecting a dialogue with the database to obtain said speech information data in the language of the input speech, and speech-to-speech conversion means for converting the speech information data obtained from the database into a spoken response.

The speech-to-speech conversion system in accordance with the present invention is adapted to receive speech inputs in two or more natural languages and comprises speech recognition means for the input speech, interpretation means for interpreting the content of the recognized input speech, and a database containing speech information data for use in formulating said spoken responses. It is characterized in that the speech information data stored in the database are in said at least two natural languages; in that the speech recognition and interpretation means can be adapted to recognize and interpret speech inputs in said at least two natural languages; and in that the system further includes evaluation means for evaluating the recognized speech inputs and determining the language of the inputs, dialogue processing means for effecting a dialogue with the database to obtain said speech information data in the language of the input speech, and text-to-speech conversion means for converting the speech information data obtained from the database into a spoken response.

The speech-to-speech conversion system can also be characterized in that the system is adapted to receive speech inputs in two or more natural languages and to provide, at its output, spoken responses in the respective input languages, and in that the system comprises, for each of the natural languages: speech recognition means, the inputs of all the speech recognition means being connected to a common input of the system; speech evaluation means for determining the language of a speech input in dependence on the output of each of the speech recognition means; a database containing speech information data used in formulating spoken responses in the language of the database; dialogue processing means for connection to a respective speech recognition means in dependence on the language of the input speech, the processing means being adapted to interpret the content of the recognized speech and, on the basis of the interpretation, to access and obtain speech information data from at least one of the respective databases; and text-to-speech conversion means for converting the speech information data obtained from the processing means into spoken responses to the respective speech inputs.
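
The arrangement of per-language recognizers fed from a common input, with an evaluator that determines the language from their outputs, can be sketched as follows. The sketch assumes that each recognizer returns a text hypothesis together with a confidence score and that the evaluator picks the highest score; the specification states only that the evaluation depends on the recognizer outputs, so this selection rule and the toy vocabulary-matching recognizers are assumptions.

```python
from typing import Callable, Dict, Set, Tuple

# A recognizer maps an input (here a plain string standing in for audio)
# to a (hypothesis, confidence score) pair. This interface is hypothetical.
Recognizer = Callable[[str], Tuple[str, float]]


def toy_recognizer(vocabulary: Set[str]) -> Recognizer:
    """Build a stand-in recognizer that scores input by vocabulary coverage."""

    def recognize(utterance: str) -> Tuple[str, float]:
        words = utterance.split()
        known = [word for word in words if word in vocabulary]
        score = len(known) / len(words) if words else 0.0
        return " ".join(known), score

    return recognize


def identify_language(
    utterance: str, recognizers: Dict[str, Recognizer]
) -> Tuple[str, str]:
    """Feed the common input to every recognizer and keep the best-scoring one."""
    results = {language: rec(utterance) for language, rec in recognizers.items()}
    best = max(results, key=lambda language: results[language][1])
    return best, results[best][0]


RECOGNIZERS: Dict[str, Recognizer] = {
    "en": toy_recognizer({"where", "is", "the", "station"}),
    "sv": toy_recognizer({"var", "ligger", "stationen"}),
}
```

The language returned by `identify_language` would then select which dialogue processing means and database handle the request.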

A further embodiment of the present system can be characterized in that the system comprises separate databases for each of said at least two languages; in that the system can comprise separate dialogue processing means for each of the databases, each dialogue processing means being adapted to effect a dialogue with at least one of the respective databases; and in that each of the dialogue processing means can be adapted to effect a dialogue with each of the databases.

The system can comprise translation means for translating the speech information data output from each of the databases into the language or languages of the other databases. If at least part of the speech information data needed for a spoken response is stored in a database in a language other than the one required for the spoken response, said information is obtained from said database and translated by the translation means into the language required for the spoken response. The translated speech information is then used by the dialogue processing means, either alone or in combination with other speech information, to provide an output for application to the text-to-speech conversion means.

Yet a further embodiment of the system can be characterized in that the system is adapted to receive speech inputs in two languages; in that the system comprises, for each of the two languages, a database, dialogue processing means and translation means; in that each of the dialogue processing means is adapted to communicate with each of the databases; and in that the data output of each of the databases is connected directly to one of the dialogue processing means, and to the other dialogue processing means via a translation means.

The system can comprise speech recognition and interpretation means for each of said at least two natural languages, the inputs of the speech recognition and interpretation means being connected to a common system input. The speech recognition and interpretation means comprise extraction means for extracting prosody information from the input speech, and means for obtaining dialect information from the prosody information. The dialect information is used by the text-to-speech conversion means when converting the speech information data into the spoken response, so that the dialect of the spoken response is matched to that of the input speech.

A further embodiment of the above system may be characterized in that the output of the evaluation means is used to select the database from which the dialogue processing means obtains the speech information data for formulating the spoken response to the speech input. The means for obtaining dialect information from the prosody information may comprise first analysis means for determining the intonation pattern of the fundamental tone of the input speech, and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; second analysis means for determining the intonation pattern of the fundamental tone curve of the speech model, and thereby the maximum and minimum values of that curve and their respective positions; and comparison means for comparing the intonation pattern of the input speech with that of the speech model, in order to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech and the occurrence of the corresponding values in the speech model, the identified time difference indicating the dialect characteristics of the input speech.
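The comparison of intonation patterns described above can be illustrated with a minimal Python sketch. It assumes the fundamental tone curve is available as a list of frequency values sampled at a fixed frame interval; the function names, the 10 ms frame length and the simple neighbour test for extrema are illustrative assumptions, not details taken from the description.

```python
from typing import List, Tuple

def extrema(f0: List[float], frame_s: float = 0.01) -> List[Tuple[str, float, float]]:
    """Return (kind, time_s, value) for each local maximum/minimum of an F0 curve."""
    found = []
    for i in range(1, len(f0) - 1):
        if f0[i - 1] < f0[i] > f0[i + 1]:
            found.append(("max", i * frame_s, f0[i]))
        elif f0[i - 1] > f0[i] < f0[i + 1]:
            found.append(("min", i * frame_s, f0[i]))
    return found

def timing_differences(input_f0: List[float], model_f0: List[float],
                       frame_s: float = 0.01) -> List[Tuple[str, float]]:
    """Pair extrema in order of occurrence and return (kind, input time - model time).

    Pairs whose kinds differ are skipped; a real system would need a more
    robust alignment (assumption of this sketch).
    """
    diffs = []
    for (k_in, t_in, _), (k_mod, t_mod, _) in zip(extrema(input_f0, frame_s),
                                                  extrema(model_f0, frame_s)):
        if k_in == k_mod:
            diffs.append((k_in, round(t_in - t_mod, 3)))
    return diffs

# Hypothetical curves: the input peak and trough occur one frame later than in the model.
model_curve = [100, 110, 120, 110, 100, 90, 100]
input_curve = [100, 100, 110, 120, 110, 100, 90, 100]
print(timing_differences(input_curve, model_curve))
```

A consistent positive or negative offset of this kind is what the description treats as an indicator of the dialect of the input speech; how offsets map to particular dialects is not specified here.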

Furthermore, the speech-to-speech conversion system may be characterized in that the dialogue with a database, and/or between databases, is carried out using a database communication language such as SQL (Structured Query Language).

Yet another embodiment of the speech-to-speech conversion system of the present invention may be characterized in that the time difference is determined relative to a reference point in the intonation pattern; that this reference point, against which the time difference is measured, is the point at which a consonant/vowel boundary occurs; and that the system further comprises means for obtaining sentence accent information from the prosody information.
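As a rough illustration of measuring against a consonant/vowel boundary, the sketch below computes the time from a boundary to an extremum of the fundamental tone curve. The function name is hypothetical, and the choice of the nearest preceding boundary is an assumption of this sketch; the description only states that the reference point is a consonant/vowel boundary.

```python
from bisect import bisect_right
from typing import List, Optional

def offset_from_boundary(extremum_s: float,
                         cv_boundaries_s: List[float]) -> Optional[float]:
    """Time in seconds from the nearest preceding consonant/vowel boundary
    to an F0 extremum. cv_boundaries_s must be sorted in ascending order.
    Returns None when no boundary precedes the extremum.
    """
    i = bisect_right(cv_boundaries_s, extremum_s)
    if i == 0:
        return None
    return round(extremum_s - cv_boundaries_s[i - 1], 3)

# Hypothetical boundary times and an F0 maximum at 0.27 s.
print(offset_from_boundary(0.27, [0.10, 0.25, 0.40]))
```

Computing this offset for both the input speech and the speech model, and subtracting one from the other, would give a time difference that does not depend on where the utterance starts in the recording.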

Yet a further embodiment of the invention may be characterized in that the speech recognition means comprises checking means for lexically checking the words in the speech model and for syntactically checking the phrases in the speech model, words and phrases that are not linguistically possible being excluded from the speech model; that the checking means is adapted to check the orthography and the phonetic transcription of the words in the speech model; that the transcription information includes lexically abstracted accent information, in the form of stressed syllables, and information on the placement of secondary accent, the accent information relating to tonal word accent I and accent II; and that the sentence accent information is used in interpreting the content of the recognized input speech. Furthermore, sentence stresses may be determined and used in interpreting the content of the recognized input speech.

The present invention may also be characterized by a voice-responsive communication system comprising a speech-to-speech conversion system according to any one of claims 17-37.

The speech-to-speech conversion system according to the present invention is adapted to provide, at its output, spoken responses to speech inputs in at least two natural languages. The language characteristics of the spoken responses, e.g. dialect, sentence accent and sentence stress, are matched to those of the input speech so as to provide natural, easily understood output speech with correct accentuation, and thus a user-friendly system. It will be apparent from the following description that this matching is achieved by extracting prosody information, i.e. the fundamental tone curve, from the input speech, and using it to determine dialect, sentence accent and sentence stress information for use in formulating the spoken responses.

The speech-to-speech conversion system can therefore be used in many applications, e.g. in a voice response communication system, to conduct a dialogue between a user of the system and a database that forms part of the system's speech recognition unit and contains speech information data for formulating spoken responses to spoken questions/requests from users. Such voice response communication systems can be used in telecommunications, banking, security systems, etc., to provide an easily understood, user-friendly system. The speech-to-speech conversion system illustrated in the figure is adapted to provide, at its output, spoken responses to speech inputs in two natural languages, languages A and B, which may be any natural languages, e.g. Swedish and English.

Key to Fig. 1:

A = Speech recognition, Language A.

B = Speech recognition, Language B.

C = Language A, Lexicon + Syntax.

D = Language B, Lexicon + Syntax.

E = Text-to-speech, Language A.

F = Text-to-speech, Language B.

G = Evaluation, Language A or Language B.

H = Dialogue processing + Database access, Language A.

I = Database, Language A.

J = Dialogue processing + Database access, Language B.

K = Database, Language B.

L = Translation language.

M = Language B.

N = Translation language.

O = Language B.

P = SQL.

Q = Language A.

R = Language B.

As shown in the accompanying figure, the system comprises speech recognition and interpretation units 1 and 2 for languages A and B respectively. The inputs of units 1 and 2 are connected to a common system input. Units 1 and 2 are used to recognize and interpret the content of the speech input in a manner described later.

An output of each of units 1 and 2 is connected to a separate input of an evaluation unit 3, which is adapted to evaluate the recognized speech inputs and to determine their language, i.e. language A or language B.

The system of the present invention also comprises two switching units 4 and 5, whose inputs are connected to an output of speech recognition and interpretation units 1 and 2 respectively. The switching units 4 and 5 are controlled, in a manner described later, by the evaluation unit 3; that is, the control inputs of units 4 and 5 are connected to separate outputs of the evaluation unit 3.

The outputs of switching units 4 and 5 are connected to an input of dialogue processing units 6 and 7 respectively. As will become apparent from the description below, dialogue processing units 6 and 7 conduct a dialogue with database units 8 and 9 to obtain speech information data in the language of the input speech, for use in formulating the spoken responses.

A lexicon and syntax unit 10 for language A is connected to another output of speech recognition and interpretation unit 1, to dialogue processing unit 6, and to an input of a text-to-speech conversion unit 12.

A lexicon and syntax unit 11 for language B is connected to another output of speech recognition and interpretation unit 2, to dialogue processing unit 7, and to an input of a text-to-speech conversion unit 13.

Text-to-speech conversion units 12 and 13 are also each connected, at another input, to an output of dialogue processing units 6 and 7 respectively.

The outputs of text-to-speech conversion units 12 and 13 are connected to a common speech output of the system.

As shown in the accompanying figure, there is two-way communication between dialogue processing unit 6 and database unit 8, and between dialogue processing unit 7 and database unit 9. These communication paths are used, as explained below, to conduct a dialogue between the respective processing and database units in order to provide the speech information data used in formulating the spoken responses. The two-way communication paths are interconnected so that a dialogue can also be conducted between processing unit 6 and database unit 9, and/or between processing unit 7 and database unit 8. In practice, the dialogue with a database unit, and/or between database units, is carried out using a database communication language such as SQL (Structured Query Language).
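A minimal sketch of such an SQL dialogue is given below, using Python's standard `sqlite3` module with in-memory databases standing in for database units 8 and 9. The table name `info`, its columns and the sample rows are hypothetical; the description does not specify a schema.

```python
import sqlite3

def make_db(rows):
    """Create an in-memory database with a hypothetical 'info' table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE info (topic TEXT PRIMARY KEY, answer TEXT)")
    db.executemany("INSERT INTO info VALUES (?, ?)", rows)
    return db

def query(db, topic):
    """Return the stored answer text for a topic, or None if it is absent."""
    row = db.execute("SELECT answer FROM info WHERE topic = ?", (topic,)).fetchone()
    return row[0] if row else None

# Stand-ins for database unit 8 (language A) and database unit 9 (language B).
db_8 = make_db([("opening_hours", "We open at nine.")])
db_9 = make_db([("opening_hours", "Vi oppnar klockan nio."),
                ("office", "Kontoret ligger i Lund.")])

print(query(db_8, "opening_hours"))  # found in the language A database
print(query(db_8, "office"))         # absent from the language A database
print(query(db_9, "office"))         # present only in the language B database
```

Because both databases answer the same parameterized query, either dialogue processing unit could address either database, which is the cross-connection the paragraph above describes.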

A translation unit 14 is provided for translating from language A to language B, and vice versa. As shown in the accompanying figure, a section 14a of translation unit 14 has an input for language B, connected to an output of database unit 9, and an output for language A, connected to an input of dialogue processing unit 6. Another section 14b of translation unit 14 has an input for language A, connected to an output of database unit 8, and an output for language B, connected to an input of dialogue processing unit 7.

The following paragraphs describe how the speech-to-speech conversion system accepts speech input in natural languages A and B and provides, at its output, spoken responses in the language of the respective speech input.

A speech input to the speech-to-speech conversion system, which may be in either language A or language B, is recognized and interpreted by each of speech recognition and interpretation units 1 and 2 in cooperation with the respective lexicon and syntax units 10 and 11, i.e. using statistically based speech recognition and language modelling techniques. This ensures that the recognized words and/or word combinations used to form a model of the input speech are both lexically and syntactically acceptable. The purpose of the lexicon/syntax check is to identify and exclude from the speech model every word that does not exist in the language concerned, and/or every phrase whose syntax does not conform to that language.
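The lexicon/syntax check can be pictured with the toy Python sketch below. The lexicon, the part-of-speech tags and the table of permitted tag pairs are invented for illustration; a real lexicon and syntax unit would be far richer, and the description does not say how syntax is represented.

```python
from typing import List

# Hypothetical lexicon for one language: word -> part-of-speech tag.
LEXICON = {"the": "DET", "train": "NOUN", "leaves": "VERB", "now": "ADV"}
# Hypothetical syntax: tag pairs that may follow one another.
ALLOWED_PAIRS = {("DET", "NOUN"), ("NOUN", "VERB"), ("VERB", "ADV")}

def lexically_valid(words: List[str]) -> bool:
    """True if every word exists in the lexicon."""
    return all(w in LEXICON for w in words)

def syntactically_valid(words: List[str]) -> bool:
    """True if every adjacent pair of tags is permitted (words must be in the lexicon)."""
    tags = [LEXICON[w] for w in words]
    return all(pair in ALLOWED_PAIRS for pair in zip(tags, tags[1:]))

def filter_hypotheses(hypotheses: List[List[str]]) -> List[List[str]]:
    """Keep only word sequences that pass both the lexical and the syntactic check."""
    return [h for h in hypotheses if lexically_valid(h) and syntactically_valid(h)]

candidates = [
    ["the", "train", "leaves", "now"],  # acceptable
    ["the", "trane", "leaves"],         # contains a word not in the lexicon
    ["train", "the", "leaves"],         # word order violates the toy syntax
]
print(filter_hypotheses(candidates))
```

Only the first candidate survives, which mirrors the exclusion of linguistically impossible words and phrases from the speech model.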

The respective language models created by units 1 and 10, and by units 2 and 11, are applied to and evaluated by evaluation unit 3, which determines which of languages A and B is the more likely for the input speech. The evaluation is made on the basis of probability, i.e. the probability that the speech input is in one or other of languages A and B, the differences between the language models, and whether the language modelling for one or other of the languages was completed successfully. The greater the difference between the language characteristics of languages A and B, the easier the task of evaluation unit 3.
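One way to picture the decision made by evaluation unit 3 is the Python sketch below. It assumes each recognizer reports a log-probability for its best hypothesis and a flag saying whether language modelling completed; the `RecognitionResult` type and the rule of picking the highest-scoring completed result are assumptions of this sketch, not the evaluation procedure of the description.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecognitionResult:
    language: str      # "A" or "B"
    log_prob: float    # hypothetical score of the best hypothesis
    completed: bool    # whether language modelling finished successfully

def choose_language(a: RecognitionResult, b: RecognitionResult) -> Optional[str]:
    """Pick the language whose recognizer completed and scored highest.

    Returns None if neither recognizer completed.
    """
    candidates = [r for r in (a, b) if r.completed]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.log_prob).language

print(choose_language(RecognitionResult("A", -12.0, True),
                      RecognitionResult("B", -30.5, True)))
```

The selected language would then determine which of switching units 4 and 5 is activated.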

Depending on the outcome of the evaluation performed by unit 3, i.e. the language selected for the input speech, one of switching units 4 and 5 is activated to connect the speech recognition and interpretation unit for the selected language to the corresponding dialogue processing unit.

Assume, for the purposes of this description, that language A has been selected as the most likely language of the input speech. Switching unit 4 is then activated, and the output of speech recognition and interpretation unit 1 is connected to an input of dialogue processing unit 6. Switching unit 5 remains in a non-activated state, and no connection is made between dialogue processing unit 7 and speech recognition and interpretation unit 2.

In the next step of the speech-to-speech conversion process, processing unit 6 enters into a linguistic dialogue with database unit 8, based on the speech model of the input speech, to obtain speech information data for formulating a spoken response to the speech input. The speech information data selected as a result of this dialogue is transferred via processing unit 6 to an input of text-to-speech conversion unit 12 for formulation of a spoken response. It will be apparent from the description below that the language characteristics of the spoken response are matched, as far as possible, to those of the input speech.

Where at least part of the speech information data needed for a spoken response is not stored in database unit 8 but may be stored in database unit 9, dialogue processing unit 6 enters into a dialogue with database unit 9 to obtain the necessary data. If the necessary speech information data is stored in database unit 9, it is accessed and transferred to dialogue processing unit 6 via section 14a of translation unit 14, i.e. it is translated from language B to language A. The translated speech information data is then used, either alone or in combination with speech information data obtained from database unit 8, to formulate a spoken response, which is converted into speech by text-to-speech conversion unit 12.
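The fallback path through translation section 14a can be sketched in Python as follows. Plain dictionaries stand in for database units 8 and 9, and a lookup table stands in for the translator; all names and sample sentences are hypothetical.

```python
from typing import Callable, Dict, Optional

def formulate_answer(topic: str,
                     own_db: Dict[str, str],
                     other_db: Dict[str, str],
                     translate: Callable[[str], str]) -> Optional[str]:
    """Look in the database of the input language first; otherwise fetch the
    entry from the other database and pass it through the translator."""
    text = own_db.get(topic)
    if text is not None:
        return text
    foreign = other_db.get(topic)
    if foreign is None:
        return None  # neither database holds the information
    return translate(foreign)

# Stand-ins for database unit 8 (language A) and database unit 9 (language B).
DB_8 = {"opening_hours": "We open at nine."}
DB_9 = {"office": "Kontoret ligger i Lund."}

# Toy stand-in for section 14a (language B to language A).
TOY_B_TO_A = {"Kontoret ligger i Lund.": "The office is in Lund."}

def translate_b_to_a(text: str) -> str:
    return TOY_B_TO_A.get(text, text)

print(formulate_answer("opening_hours", DB_8, DB_9, translate_b_to_a))
print(formulate_answer("office", DB_8, DB_9, translate_b_to_a))
```

Swapping the two databases and supplying a language A to language B translator would give the mirror-image path through section 14b.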

Clearly, if language B rather than language A is selected by evaluation unit 3 as the language of the input speech, units 7, 9 and 13 are used to formulate the spoken response, in the same way as outlined above for units 6, 8 and 12. Any information that can be obtained from database unit 8 is accessed by and transferred to dialogue processing unit 7, the translation of the transferred data being carried out by section 14b of translation unit 14.

Recognition and interpretation of numbers can give rise to technical problems, and unless these problems are overcome it will be difficult to obtain a correct and meaningful interpretation of the input speech. In particular, if the recognition and interpretation of the input speech is incorrect, it will be extremely difficult for evaluation unit 3 to determine the language of the input speech, and it will therefore not be possible to provide correct responses to the speech inputs.

In accordance with the present invention, these problems are therefore solved by extracting prosody information from the speech inputs and using this information to determine, in a manner described later, dialect, sentence accent and sentence stress information for use in the recognition and interpretation process and in the formulation of the spoken responses.

Prosody information, i.e. the fundamental tone curve, is extracted from the input speech by prosody extraction means (not shown) that form part of speech recognition and interpretation units 1 and 2. These units also include means (not shown) for obtaining dialect information from the prosody information.

Thus, according to the present invention, speech recognition and interpretation units 1 and 2 are adapted to operate in a manner known to those skilled in the art in order to recognize and interpret speech inputs to the system. Units 1 and 2 may, for example, operate using a hidden Markov model or an equivalent model. In essence, the function of units 1 and 2 is to convert speech input to the system into a form that faithfully represents the content of the input speech and that is suitable for evaluation by evaluation unit 3 and for use by dialogue processing units 6 and 7. In other words, the content of the text information data at the output of each of speech recognition and interpretation units 1 and 2 must be:

- an exact representation of the input speech, and

- usable by dialogue processing units 6 and 7 to access and extract speech information data from database units 8 and 9 respectively, for use in formulating a synthetic spoken response, i.e. through one of the respective text-to-speech conversion units 12 and 13.

In practice, recognition and interpretation are essentially carried out by identifying a number of phonemes in a segment of the input speech and combining them into allophone strings, the phonemes being interpreted as possible words or word combinations in order to build a model of the speech. The resulting speech model will have word and sentence accents that conform to a standardized pattern for the language of the input speech.

The information on the recognized words and word combinations generated by speech recognition and interpretation units 1 and 2 is checked, as outlined above, both lexically and syntactically. In practice, this is done using a lexicon with orthography and transcription.

Thus, in accordance with the present invention, the speech recognition and interpretation units 1 and 2 ensure that only those words and word combinations that are found acceptable, both lexically and syntactically, are used to create a model of the input speech. In practice, the intonation pattern of the speech model is a standardized intonation pattern for the language in question, or an intonation pattern established through training, or explicit knowledge, using a number of dialects of the language in question.

As mentioned above, the prosody information, i.e. the fundamental tone curve, can be extracted from the input speech by the extraction unit 3 and used to obtain dialect, sentence accent and sentence stress information for use by the speech-to-speech conversion system and method according to the present invention. In particular, the dialect information can be used by the speech-to-speech conversion system and method to match the dialect of the output speech to that of the input speech, and the sentence accent and stress information can be used in the recognition and interpretation of the input speech.

In accordance with the present invention, the means for obtaining dialect information from the prosody information comprise:
- a first analysis means for determining the intonation pattern of the fundamental tone of the input speech, and thereby the maximum and minimum values of the fundamental tone curve and their respective levels;
- a second analysis means for determining the intonation pattern of the fundamental tone curve of the speech model, and thereby the maximum and minimum values of the fundamental tone curve and their respective levels; and
- a comparison means for comparing the intonation pattern of the input speech with the intonation pattern of the speech model to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech and the occurrence of the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of the dialect characteristics of the input speech.

The time difference referred to above can be determined in relation to a reference point in the intonation pattern.

For the Swedish language, the difference in intonation pattern between dialects can be described by different points in time for the word and sentence accents, i.e. the time difference can be determined in relation to a point in the intonation pattern, e.g. the point at which a consonant/vowel boundary occurs.

Thus, in a preferred arrangement according to the present invention, the reference against which the time difference is measured is the point at which the consonant/vowel boundary, i.e. the C/V boundary, occurs.
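The comparison described above can be illustrated with a minimal Python sketch. It is not taken from the patent: it assumes that each fundamental tone curve is available as parallel lists of sample times and F0 values for one segment, that each segment has a single relevant maximum and minimum, and that the time of the C/V boundary is already known for both the input speech and the speech model. The function names `extrema_times` and `dialect_offsets` are hypothetical.

```python
def extrema_times(times, f0):
    """Return (t_max, t_min): the times at which the F0 contour
    reaches its maximum and its minimum value."""
    i_max = max(range(len(f0)), key=lambda i: f0[i])
    i_min = min(range(len(f0)), key=lambda i: f0[i])
    return times[i_max], times[i_min]


def dialect_offsets(input_contour, model_contour, cv_input, cv_model):
    """Compare the timing of F0 extrema in the input speech with the
    timing in the speech model, each measured from its own C/V boundary.

    Each contour is a (times, f0) pair. The result is (d_max, d_min);
    a positive value means the extremum occurs later in the input
    speech than in the speech model."""
    in_max, in_min = extrema_times(*input_contour)
    mo_max, mo_min = extrema_times(*model_contour)
    d_max = (in_max - cv_input) - (mo_max - cv_model)
    d_min = (in_min - cv_input) - (mo_min - cv_model)
    return d_max, d_min
```

How the resulting offsets map to a particular dialect is not specified in this sketch; in a system of the kind described, they would be passed on to the text-to-speech conversion stage so that the output intonation can be shifted accordingly.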

The identified time difference mentioned above indicates the dialect of the input speech, i.e. of the spoken language, and is applied to the text-to-speech conversion units 12 and 13 so that the intonation pattern, and thereby the dialect, of the speech output by the system can be corrected to correspond to the intonation pattern of the corresponding words and/or phrases in the input speech. This correction process thus makes it possible for the dialect information of the input speech to be incorporated into the output speech.

As mentioned above, the fundamental tone curve of the speech model is based on information resulting from the lexical (orthography and transcription) and syntactic checks. In addition, the transcription information includes lexically abstracted accent information of the stressed-syllable type, i.e. tonal word accents I and II, and information related to the placement of secondary accents, i.e. information of the kind given in, for example, dictionaries. This information can be used to adjust the recognition pattern of the speech recognition model, e.g. the Hidden Markov model, to take the transcription information into account. A more exact model of the input speech is therefore obtained during the interpretation process.

A further consequence of this speech model correction process is that, over time, the speech model will acquire an information pattern that has been established through a learning process.

Furthermore, in the system and method of the present invention, the speech model is compared with a spoken input sequence, and any difference between them can be determined and used to bring the speech model into agreement with the spoken sequence and/or to determine the stresses in the spoken sequence.

Moreover, identifying the stresses in a spoken sequence makes it possible to determine the exact meaning of the spoken sequence unambiguously. In particular, relative sentence stresses can be determined by classifying the ratio between the variation and the declination of the fundamental tone curve, whereby stressed sections, or individual words, can be identified. In addition, the pitch of the speech can be determined from the declination of the fundamental tone curve.

Thus, in order to take sentence stresses into account in the recognition and interpretation of the speech input to the speech-to-speech conversion system according to the present invention, the prosody extraction means and the associated speech recognition and interpretation unit, for each of the languages A and B, are adapted to determine:
- a first ratio between the variation and the declination of the fundamental tone curve of the input speech;
- a second ratio between the variation and the declination of the fundamental tone curve of the speech model; and
- a comparison between the first and second ratios, any identified difference being used to determine sentence accent placements.

Furthermore, classifying the ratio between the variation and the declination of the fundamental tone curve makes it possible to identify relative sentence stresses and stressed sections, or words.

The ratio between the variation and the declination of the fundamental tone curve can also be used to determine the dynamics of the fundamental tone curve.
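The ratio comparison described above can be sketched in Python as follows. The text does not define how the variation and the declination are measured, so this sketch makes its own assumptions: the declination is taken as the absolute slope of a least-squares line fitted to the whole F0 contour, the variation of a word is taken as the F0 range within that word, and a word counts as stressed when its ratio in the input speech exceeds the ratio in the speech model by a hypothetical factor `threshold`. All function names are illustrative, and word labels are assumed to be unique within an utterance.

```python
def fit_slope(times, f0):
    """Least-squares slope of F0 over time (assumed measure of declination)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_f = sum(f0) / n
    num = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def variation_to_declination(times, f0, words):
    """Return {label: ratio} for each word.

    words is a list of (label, start_index, end_index) with an exclusive
    end index. The ratio is the F0 range inside the word divided by the
    magnitude of the overall declination."""
    declination = abs(fit_slope(times, f0)) or 1e-9  # guard against a flat contour
    return {
        label: (max(f0[a:b]) - min(f0[a:b])) / declination
        for label, a, b in words
    }


def stressed_words(input_ratios, model_ratios, threshold=1.5):
    """Words whose input ratio exceeds the model ratio by the given factor."""
    return [
        word for word in input_ratios
        if input_ratios[word] > threshold * model_ratios[word]
    ]
```

The threshold value would have to be tuned on real speech; the sketch only shows that the comparison is made between two ratios rather than between raw F0 values, which makes it less sensitive to the overall pitch level of the speaker.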

Information obtained from the fundamental tone curve in relation to dialect, sentence accent and stress can be used by the units 1 and 2 in interpreting the input speech, i.e. the information can be used in the manner outlined above to obtain a better understanding of the content of the input speech, and to bring the intonation pattern of the speech model into agreement with the input speech.

Since the corrected speech model exhibits the language characteristics (including dialect information, sentence accent and stress) of the input speech, it can be used to give an improved understanding of the input speech and to increase the probability that the evaluation unit 3 selects the correct language for the input speech. The corrected speech model can also be used by the data processing units 6 and 7 to obtain the necessary speech information data from the database units 8 and 9 for formulating a response to a voice input in a speech-to-speech conversion system.
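The selection of a language followed by retrieval from the matching database can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: it assumes that each recognizer reports a numeric confidence score for the same utterance, that the language with the highest score is chosen, and that each language has its own database with a hypothetical table `info(topic, answer)`. Python's standard `sqlite3` module stands in for whatever SQL database a real system would use.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database for one language.

    rows is a list of (topic, answer) pairs; the table layout is a
    hypothetical example."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE info (topic TEXT PRIMARY KEY, answer TEXT)")
    db.executemany("INSERT INTO info VALUES (?, ?)", rows)
    return db


def choose_language(scores):
    """Pick the language whose recognizer reported the highest score."""
    return max(scores, key=scores.get)


def answer_text(scores, topic, databases):
    """Return (language, answer text) for the interpreted topic.

    The answer is None when the chosen database has no entry for the
    topic; a fuller system could then consult another database and
    translate the result."""
    lang = choose_language(scores)
    row = databases[lang].execute(
        "SELECT answer FROM info WHERE topic = ?", (topic,)
    ).fetchone()
    return lang, (row[0] if row else None)
```

The returned text would then be handed to the text-to-speech conversion unit for the chosen language.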

The ability to interpret the different dialects of a language readily, by using information from the fundamental tone curve, is of some significance, since such interpretations can be effected without having to train the speech recognition system. As a result, the size, and thereby the cost, of a speech recognition system in accordance with the present invention can be much smaller than is possible with known systems. Such systems therefore have clear advantages over known speech recognition systems.

The system is therefore adapted to recognize and accurately interpret the content of speech input in two, or more, natural languages, and to match language characteristics, e.g. the dialect, of the voice response to those of the voice inputs. This process provides a user-friendly system, since the language of the human/machine dialogue is in accordance with the dialect of the user concerned. The present invention is not limited to the embodiments outlined above, but can be modified within the scope of the accompanying patent claims and the inventive concept.

Claims

1. Method, in a voice response communication system, for providing a spoken response to an input speech, said method comprising the steps of recognizing and interpreting the input speech, and utilizing the interpretation to obtain speech information data from a database for use in formulating the spoken response, characterized in that the database contains speech information data in at least two natural languages, in that said method is adapted to recognize and interpret input speech in said at least two languages, using statistics-based speech recognition and language modeling techniques to form a lexically and syntactically acceptable speech model for the language in question, and to provide spoken responses to speech inputs in said languages, and in that said method includes the further steps of evaluating a recognized speech input to determine the language of the input, effecting a dialogue with the database to obtain speech information data for formulating a spoken response in the language of the input speech, and converting the speech information data obtained from the database into said spoken response.

2. Method in accordance with claim 1, characterized in that separate databases are used for each of the at least two languages.

3. Method in accordance with claim 2, characterized in that said dialogue is effected with only that one of the databases which contains the speech information data in the language of the input speech.

4. Method in accordance with claim 2, characterized in that the dialogue is effected with that one of the databases which contains the speech information in the language of the input speech, and in that, where at least a part of the speech information data necessary for a spoken response is stored in another of the databases, the method comprises the further steps of effecting a dialogue with the other of the databases to obtain the necessary speech information data, translating that information data into the language of said one of the databases, combining the speech information data from the databases, and converting the combined speech information data into a spoken response in the language of the input speech.

5. Method in accordance with one of the preceding claims, characterized in that the outcome of the evaluation process is used to determine the database with which the dialogue is to be conducted in order to obtain the speech information data for a spoken response to the input speech.

6. Method in accordance with one of the preceding claims, characterized in that the dialogue with a database, and/or between databases, is effected using a database communication language, such as SQL (Structured Query Language).

7. Method in accordance with one of the preceding claims, characterized in that the speech recognition and interpretation comprise the steps of extracting prosody information from a speech input, and obtaining dialect information from said prosody information, said dialect information being used in converting the speech information data provided by the database into a spoken response, the spoken responses being in the same language as the input speech.

8. Method in accordance with claim 7, characterized in that the prosody information extracted from the speech input is the fundamental tone curve of the input speech.

9. Method in accordance with claim 8, characterized by the steps of determining the intonation pattern of the fundamental tone curve of the input speech, and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; determining the intonation pattern of the fundamental tone curve of a speech model, and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparing the intonation pattern of the input speech with the intonation pattern of the speech model to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech and the occurrence of the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of the dialect characteristics of the input speech.

10. Method in accordance with claim 9, characterized in that the time difference is determined in relation to a reference point in the intonation pattern.

11. Method in accordance with claim 10, characterized in that the reference point in the intonation pattern, against which the time difference is measured, is the point at which a consonant/vowel boundary occurs.

12. Method in accordance with one of claims 7-11, characterized in that the method comprises the step of obtaining information on sentence accents from the prosody information.

13. Method in accordance with claim 12, characterized in that the words in the speech model are checked lexically, in that the phrases in the speech model are checked syntactically, in that the words and phrases which are not linguistically possible are excluded from the speech model, in that the orthography and the phonetic transcription of the words in the speech model are checked, and in that the transcription information includes lexically abstracted accent information, of the stressed-syllable type, and information related to the placement of secondary accents.

14. Method in accordance with claim 13, characterized in that the accent information relates to tonal word accent I and accent II.

15. Method in accordance with one of claims 12-14, characterized in that the method comprises the step of using the sentence accent information in interpreting the input speech.

16. A voice-responsive communication system utilizing a method in accordance with one of the preceding claims to provide a spoken response to a speech input to the system.

17. Speech-to-speech conversion system for providing, at the output thereof, spoken responses to speech inputs in at least two natural languages, comprising speech recognition means for the input speech, interpretation means for interpreting the content of the recognized input speech, and a database containing speech information data for use in formulating said spoken responses, characterized in that the speech information data stored in the database is in said at least two natural languages, in that the speech recognition and interpretation means are adapted to recognize and interpret speech inputs in said at least two natural languages, and in that the system further includes evaluation means for evaluating the recognized speech inputs and determining the language of the inputs, dialogue processing means for effecting a dialogue with the database to obtain said speech information data in the language of the input speech, and text-to-speech conversion means for converting the speech information data obtained from the database into a spoken response.

18. Speech-to-speech conversion system in accordance with claim 17, characterized in that the system is adapted to receive speech inputs in two, or more, natural languages and to provide, at the output thereof, spoken responses in the respective languages of the speech inputs, and in that the system comprises, for each of the natural languages, speech recognition means, the inputs of each of the speech recognition means being connected to a common input of the system; speech evaluation means for determining, in dependence on the output of each of the speech recognition means, the language of a speech input; a database containing speech information data for use in formulating spoken responses in the language of the database; dialogue processing means for connection to a respective speech recognition means, in dependence on the language of the input speech, the processing means being adapted to interpret the content of the recognized speech and, on the basis of the interpretation, to access and obtain speech information data from at least one of the respective databases; and text-to-speech conversion means for converting speech information data obtained from the processing means into spoken responses to the respective speech inputs.

19. Speech-to-speech conversion system in accordance with claim 17, characterized in that the system comprises a separate database for each of said at least two languages.

20. Speech-to-speech conversion system in accordance with claim 19, characterized in that the system comprises separate dialogue processing means for each of the databases, each dialogue processing means being adapted to conduct a dialogue with at least one of the respective databases.

21. Speech-to-speech conversion system in accordance with claim 20, characterized in that each of the dialogue processing means is adapted to conduct a dialogue with each of the databases.

22. Speech-to-speech conversion system in accordance with claim 21, characterized in that the system comprises translation means for translating the speech information data output from each of the databases into the language or languages of the other databases.

23. Speech-to-speech conversion system in accordance with claim 22, characterized in that, if at least part of the speech information data needed for a spoken response is stored in a database in a language other than the one required for the spoken response, said information is retrieved from said database and translated by the translation means into the language required for the spoken response, and that the translated speech information is used by the dialogue processing means, either alone or in combination with other speech information, to provide an output for application to the text-to-speech conversion means.

24. Speech-to-speech conversion system in accordance with claim 23, characterized in that the system is adapted to receive speech inputs in two languages, that the system comprises, for each of the two languages, a database, dialogue processing means and translation means, that each of the dialogue processing means is adapted to communicate with each of the databases, and that the data output of each database is connected directly to one of the dialogue processing means and, via a translation means, to the other dialogue processing means.
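
The routing described in claims 23 and 24 (direct access to the database in the response language, access to the other database through a translation means) can be sketched as follows. The dictionaries `DATABASES` and `TRANSLATIONS`, the functions `translate` and `fetch`, and the sample entries are all hypothetical stand-ins; the patent does not specify how the databases or the translation means are implemented.

```python
# Hypothetical per-language databases holding speech information data.
DATABASES = {
    "sv": {"departure": "tåget går klockan nio"},
    "en": {"platform": "platform four"},
}

# Hypothetical lookup table standing in for the translation means.
TRANSLATIONS = {
    ("en", "sv", "platform four"): "spår fyra",
    ("sv", "en", "tåget går klockan nio"): "the train leaves at nine",
}


def translate(text, source, target):
    """Translate text from source to target language via the lookup table."""
    return TRANSLATIONS[(source, target, text)]


def fetch(key, response_language):
    """Return the data for key in response_language, translating if needed."""
    own = DATABASES[response_language]
    if key in own:
        # Direct connection: the data is already in the required language.
        return own[key]
    for language, database in DATABASES.items():
        if language != response_language and key in database:
            # Indirect connection: the data passes through the translation means.
            return translate(database[key], language, response_language)
    raise KeyError(key)


print(fetch("departure", "sv"))
print(fetch("platform", "sv"))
```

The dialogue processing means could then combine directly retrieved and translated items before passing the result to the text-to-speech conversion means.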

25. Speech-to-speech conversion system in accordance with one of claims 17-24, characterized in that the system comprises speech recognition and interpretation means for each of said at least two natural languages, the inputs of the speech recognition and interpretation means being connected to a common system input.

26. Speech-to-speech conversion system in accordance with one of claims 17-25, characterized in that the output from the evaluation means is used to select the database from which the speech information data are to be retrieved by the dialogue processing means for formulating the spoken response to the speech input.

27. Speech-to-speech conversion system in accordance with one of claims 17-26, characterized in that the dialogue with a database, and/or between databases, is conducted using a database communication language, for example SQL (Structured Query Language).

28. Speech-to-speech conversion system in accordance with one of claims 17-27, characterized in that the speech recognition and interpretation means comprise extraction means for extracting prosody information from the input speech, and means for obtaining dialect information from the prosody information, the dialect information being used by the text-to-speech conversion means when converting the speech information data into the spoken response, so that the dialect of the spoken response is matched to that of the input speech.

29. Speech-to-speech conversion system in accordance with claim 28, characterized in that the prosody information extracted from the input speech is the fundamental tone curve of the input speech.

30. Speech-to-speech conversion system in accordance with claim 29, characterized in that the means for obtaining dialect information from the prosody information comprise: first analysis means for determining the intonation pattern of the fundamental tone of the input speech, and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; second analysis means for determining the intonation pattern of the fundamental tone curve of the speech model, and thereby the maximum and minimum values of that fundamental tone curve and their respective positions; and comparison means for comparing the intonation pattern of the input speech with the intonation pattern of the speech model in order to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech and the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference indicating the dialect characteristics of the input speech.

31. Speech-to-speech conversion system in accordance with claim 30, characterized in that the time difference is determined relative to a reference point in the intonation pattern.

32. Speech-to-speech conversion system in accordance with claim 31, characterized in that the reference point in the intonation pattern, against which the time difference is measured, is the point at which a consonant/vowel boundary occurs.
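
The comparison in claims 30 to 32 can be sketched under simplifying assumptions: each fundamental tone curve is a short list of sampled values, only the single global maximum and minimum are considered, and the position of the consonant/vowel boundary is supplied with the curve. The function names and sample values are hypothetical and serve only to show the timing measurement.

```python
def extrema(times, f0):
    """Return (time of maximum, time of minimum) of a fundamental tone curve."""
    i_max = max(range(len(f0)), key=f0.__getitem__)
    i_min = min(range(len(f0)), key=f0.__getitem__)
    return times[i_max], times[i_min]


def relative_extrema(times, f0, boundary):
    """Positions of the extrema measured from the consonant/vowel boundary."""
    t_max, t_min = extrema(times, f0)
    return t_max - boundary, t_min - boundary


def timing_offsets(input_curve, model_curve):
    """Time difference (input minus model) for the maximum and the minimum.

    Each curve is a tuple (times, f0, boundary), with times and boundary in ms.
    """
    in_max, in_min = relative_extrema(*input_curve)
    mo_max, mo_min = relative_extrema(*model_curve)
    return in_max - mo_max, in_min - mo_min


times = [0, 10, 20, 30, 40, 50]                      # ms
input_curve = (times, [110, 120, 150, 180, 140, 100], 10)
model_curve = (times, [110, 170, 150, 130, 120, 100], 10)

# The input peak falls 20 ms later than the model peak; the minima coincide.
print(timing_offsets(input_curve, model_curve))      # (20, 0)
```

In this sketch a non-zero offset would be read as the dialect indication; how offsets map to particular dialects is not specified here.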

33. Speech-to-speech conversion system in accordance with one of claims 28-32, characterized in that the system further comprises means for obtaining information on sentence accents from the prosody information.

34. Speech-to-speech conversion system in accordance with claim 33, characterized in that the speech recognition means comprise checking means for lexically checking the words in the speech model and for syntactically checking the phrases in the speech model, words and phrases that are not linguistically possible being excluded from the speech model, that the checking means are adapted to check the orthography and the phonetic transcription of the words in the speech model, and that the transcription information includes lexically abstracted accent information, of the stressed-syllable type, and information relating to the placement of secondary accent.

35. Speech-to-speech conversion system in accordance with claim 34, characterized in that the accent information relates to tonal word accent I and accent II.

36. Speech-to-speech conversion system in accordance with one of claims 33-35, characterized in that the sentence accent information is used in interpreting the content of the recognized input speech.

37. Speech-to-speech conversion system in accordance with one of claims 28-36, characterized in that the sentence stresses are determined and used in interpreting the content of the recognized input speech.

38. Voice-responsive communication system comprising a speech-to-speech conversion system in accordance with one of claims 17-37.