WO2002061729A1 - Procede et systeme pour l'interaction vocale personne/ordinateur - Google Patents

Procede et systeme pour l'interaction vocale personne/ordinateur (Method and system for human/computer voice interaction)

Info

Publication number
WO2002061729A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
computer
noise
human
response sentence
Prior art date
Application number
PCT/JP2001/000628
Other languages
English (en)
Japanese (ja)
Inventor
Tadamitsu Ryu
Masato Numabe
Yoichi Saitoh
Shinichiro Kubo
Hiroyuki Shimazaki
Original Assignee
Cai Co., Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cai Co., Ltd filed Critical Cai Co., Ltd
Priority to PCT/JP2001/000628 priority Critical patent/WO2002061729A1/fr
Publication of WO2002061729A1 publication Critical patent/WO2002061729A1/fr

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • The present invention relates to a method and system for voice dialogue exchanged between a computer and a human, and more particularly to a voice interaction method and system in which the computer correctly recognizes conversational sentences from a human, enabling a dialogue as natural as one between humans.
  • In such a dialogue, the human provides information to the computer. For example, computers have been used before a doctor's consultation to listen to the patient's symptoms and condition and fill out a sheet.
  • To make this possible, the computer must correctly recognize and interpret the speech uttered by humans, that is, the conversational sentences, and the various programs created for establishing conversations must cooperate properly.
  • Research on speech dialogue between computers and humans has advanced rapidly in recent years, and theoretical aspects such as the theory of speech recognition and efforts to create artificial brains have reached a considerable level.
  • However, adjustment techniques, such as devising ways to recognize speech correctly even in the presence of ambient noise, and normalization techniques have not yet been researched sufficiently.
  • In a voice dialogue, a human speaks, and the conversational sentence from the human is received by a voice input device such as a microphone connected to the computer.
  • The computer interprets the input conversational sentence and creates a response sentence to it.
  • This response sentence is output from a voice generating device such as a speaker, and the output is also picked up by the voice input device, such as the microphone, connected to the computer.
  • The computer, just as if this were a human conversation, will then try to interpret its own response sentence and create a further response to it.
  • Such an operation can confuse the program operating the computer and prevent proper dialogue.
  • Moreover, focusing on the environment in which the dialogue takes place, the surroundings are not completely silent, and in many cases there is a considerable level of noise.
  • The telephone has long existed as a device for holding a voice dialogue between humans, and the same problems on the transmitting side and the receiving side have been solved there by various methods.
  • In the "noise canceling device" of Japanese Patent Application Laid-Open No. 9-133177, an audio signal is input from a first microphone, while noise and an audio signal are input from a second microphone. The noise/voice signal from the second microphone is then inverted in phase and combined with the voice signal from the first microphone to obtain a noise-reduced signal, which is fed to the voice input of the device in use.
  • Similarly, noise and surrounding sounds are reduced by synthesizing the signal obtained by inverting the phase of the audio signal with the audio output signal of the device and outputting the synthesized signal from the speaker; a toy sketch of this phase-inversion idea follows.
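The two-microphone scheme above amounts to adding a phase-inverted copy of the noise pickup to the primary signal. The toy sketch below, in Python, illustrates only that idea; the signal model, the function name, and the assumption of a perfectly time-aligned noise reference are simplifications for illustration, not the publication's actual circuit.

```python
import numpy as np

def cancel_noise(primary: np.ndarray, noise_ref: np.ndarray) -> np.ndarray:
    """Combine the primary microphone signal with the phase-inverted
    noise reference from the second microphone."""
    n = min(len(primary), len(noise_ref))
    return primary[:n] + (-noise_ref[:n])  # phase inversion is negation

# Toy demonstration: speech plus noise on mic 1, the same noise on mic 2.
t = np.linspace(0.0, 1.0, 8000)
speech = 0.6 * np.sin(2 * np.pi * 220 * t)
noise = 0.3 * np.random.default_rng(0).standard_normal(t.shape)
reduced = cancel_noise(speech + noise, noise)
assert np.allclose(reduced, speech)  # ideal case: noise fully removed
```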
  • However, since that invention deals with human-to-human dialogue, even if it is applied to a voice dialogue between a computer and a human, the computer still cannot interpret human conversational sentences.
  • Humans listen to the other party separately from their own utterances, without being distracted by them; they recognize and interpret only what the other party has said, and create a response sentence taking their own past remarks into account.
  • A computer, by contrast, if no adjustment is made to the audio signal obtained by the microphone, judges all of it to be a speech recognition target and starts the process of creating a response sentence. As described above, this confuses the program that runs the dialogue.
  • A noise canceller for the audio input to the microphone therefore plays an important role in letting the computer accurately recognize the conversational sentence, but the noise canceller invention for telephones described above cannot be applied directly. That is, the method of always canceling noise, as in the above-mentioned publication, has the drawback that noise cannot be removed properly when the volume difference between the noise and the conversation becomes small. Attempts have therefore been made to collect noise over a predetermined period and learn it in order to create a noise canceling signal.
  • However, no way has yet been developed to apply such an attempt specifically to a method of voice dialogue between computers and humans, and the development of such adjustment or normalization technology has been in demand.
  • An object of the present invention is to provide, in response to the above demand, a voice interaction method and system capable of natural and accurate dialogue between a computer and a human. Another object of the present invention is to provide a speech dialogue method and system that reliably cancels the response sentence from the computer, leaving only the human conversational sentence as the object of the computer's conversation recognition.
  • A further object of the present invention is to provide a speech dialogue method and system in which conversational sentences are easily recognized by the computer, thus enabling their accurate recognition.
  • To achieve these objects, the invention according to claim 1 is a speech dialogue method comprising the steps of: receiving a conversational sentence from a human with a voice input device; interpreting it and creating a response sentence with the computer; outputting the response sentence from a voice generating device; and, when the output response sentence enters the voice input device, canceling it with a voice canceller device and removing it from the conversation recognition targets of the computer.
  • In the invention according to claim 2, the response sentence canceling step sets a flag on the response sentence creation signal created by the computer and, after the response sentence is output from the voice generating device, cancels the voice input to the voice input device within a predetermined time, using the response sentence creation signal as a reference signal.
  • The invention according to claim 3 is the voice interaction method according to claim 1 or 2, further comprising a step of removing noise from the voice received by the voice input device with a noise canceller device, after which the computer creates a response sentence to the conversational sentence from the human.
  • In the invention according to claim 4, the noise canceling step accumulates, for a predetermined period, noise from time zones when the voice level is so low that it is clear no speech is being made by a human or the computer, or, in other time zones, noise from which the utterance content has been canceled; it learns this accumulated noise, and during the next utterance from a human it cancels the learned noise from the voice signal of the voice input device and removes it.
  • A second aspect of the present invention is a voice dialogue system between a computer and a human, comprising: a voice input device that picks up the conversation between the computer and the human; a voice canceller device that cancels, in the output signal from the voice input device, the response sentence sent from the computer and removes it from the conversation recognition targets of the computer; a conversation sentence recognition/response sentence creation unit in the computer that interprets the conversational sentence from the human and creates a response sentence to it; and a voice generating device that outputs the response sentence from the computer.
  • In the invention according to claim 6, the voice canceller device comprises: means for flagging the response sentence creation signal created by the computer; a clock that measures the time from when the response sentence is output from the voice generating device until the voice input device receives a voice; and canceller means which, when that time is less than a specified time, determines that the voice received by the voice input device is the response sentence created by the computer and cancels it using the response sentence creation signal as a reference signal. A minimal sketch of this flag-and-clock logic follows.
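Read this way, claims 5 and 6 need only three pieces: the flagged response-sentence creation signal, a clock started when the response is output, and a comparison against the specified time. The sketch below is a minimal illustration under those assumptions; the window length, the subtraction model, and the energy threshold are invented for the example and are not specified by the patent.

```python
import time
from typing import Optional

import numpy as np

SPECIFIED_TIME = 1.0  # seconds; the actual window is not given in the patent

class VoiceCanceller:
    """Flag the response-sentence creation signal, start a clock at output
    time, and cancel any input arriving within the specified time against
    the stored reference signal."""

    def __init__(self) -> None:
        self.reference: Optional[np.ndarray] = None
        self.output_time: Optional[float] = None

    def on_response_output(self, response_signal: np.ndarray) -> None:
        self.reference = response_signal     # flagging means
        self.output_time = time.monotonic()  # the clock starts

    def filter_input(self, mic_signal: np.ndarray) -> Optional[np.ndarray]:
        """Return a signal for conversation recognition, or None when the
        input is judged to be the computer's own response sentence."""
        if (self.reference is not None and self.output_time is not None
                and time.monotonic() - self.output_time < SPECIFIED_TIME):
            n = min(len(mic_signal), len(self.reference))
            residual = mic_signal[:n] - self.reference[:n]
            # Little residual energy means the input was our own response.
            if np.mean(residual ** 2) < 1e-3 * np.mean(mic_signal[:n] ** 2):
                return None
            return residual  # a human spoke over the response: keep the rest
        return mic_signal
```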
  • The invention according to claim 7 is the voice dialogue system according to claim 5 or 6, further comprising a noise canceller device that removes noise from the voice received by the voice input device, whereby the conversation sentence recognition/response sentence creation unit of the computer receives, as conversation recognition targets, only noise-free conversational sentences from humans.
  • In the invention according to claim 8, in the system according to claim 7, the noise canceller device comprises noise learning/noise removing means which accumulates, for a predetermined period, noise from time periods when the voice level is low enough to make it clear that no speech is being made by a person or the computer, or, in other time periods, noise from which the utterance content has been canceled; learns it; and, during the next utterance from a human, cancels the learned noise from the voice signal of the voice input device and removes it.
  • FIG. 1 is a flowchart of one embodiment of the kind of dialogue between a computer and a human to which the voice dialogue method according to the present invention is applied.
  • FIG. 2 is a block diagram showing one embodiment of a system for carrying out the dialogue with a human shown in FIG. 1.
  • FIG. 3 is a flowchart of another embodiment of a dialog between a computer and a human.
  • FIG. 4 shows an example of dialogue when a computer provides travel guidance to a human.
  • FIG. 5 is a block diagram showing an embodiment of the host computer 9 of the system for executing the dialogue shown in FIG. 3.
  • FIG. 6 is a table showing a conventional relational database.
  • FIG. 7 is a flowchart showing the flow of one embodiment of a voice dialogue method between a computer and a human according to the present invention.
  • FIG. 8 is a flowchart showing the flow of the method for canceling the response sentence signal by the voice canceller device employed in FIG. 7.
  • FIGS. 9(A) and 9(B) are, respectively, a schematic configuration diagram of an embodiment of a voice dialogue system between a computer and a human according to the present invention, and a block diagram of the configuration within the computer 30.
  • FIG. 10 is a schematic diagram for explaining the operation of the voice interaction system between the computer and the human shown in FIG. 9.
  • First, a database and a program are recorded (step 1). If there is a voice input, it undergoes word decomposition and sentence analysis, and the presence of information items is determined (step 2); decision J1 then judges whether the information items necessary to identify a record are included in the input speech. If the answer is "No", the human is asked for the required information items (step 3); if the answer to J1 is "Yes", or once the information items necessary to identify the record have been collected through step 3, the program proceeds accordingly (step 4).
  • In this way, the computer interacts with humans using data stored in a relational database held in memory.
  • FIG. 6 is a table showing a typical conventional relational database structure.
  • S1 to Sn are attributes serving as search keys, that is, schemas, and T11 to Tmn are tuples, which are their contents or values. Each row makes up one record. If the relational database is for travel, the schemas S1 to Sn may be, for example, "destination", "purpose", "days", and so on, and each record holds the tuples for these schemas.
  • Each record holds key information such as, for example, "Hawaii", "Thai package", "Helsinki in Finland", "Kyoto", "Aomori in Mutsu", "Okibashi", "English", and so on.
  • The memory also records programs specifying the dialogue sequence, which defines the order in which each schema is to be put on the topic in dialogue with humans, and the wording to use when each schema becomes the topic, together with its variations.
  • The computer's CPU can call these programs and carry out the dialogue according to them.
  • The dialogue sequence begins with a conversational sentence such as "What's your business?"
  • In step 2, when a human utters a voice to the computer, the computer recognizes the phonemes using the microphone, voice recognition software, and the like. Using a word dictionary, a syntax dictionary, a case dictionary, and so on, the utterance is decomposed into words and subjected to sentence analysis, yielding conversational sentences such as "I want to go to Hawaii." or "I want to see the aurora."
  • Decision J1 judges whether a value corresponding to each schema (herein referred to as an "information item") exists in the speech input obtained by the word decomposition and sentence analysis.
  • Many records are stored in the relational database, and this decision determines which of the records the voice input calls for. That is, if the input does not include the information items required to identify a record, the user is asked in step 3 about the missing information items until all the required information items have been heard. This identifies one (or a few) records.
  • In step 4, using the information items of the identified record, the computer's CPU runs the program according to the dialogue sequence. Typically, the dialogue sequence proceeds in a predetermined order using all of the record's information items corresponding to the appropriate schemas. In the illustrated embodiment, the missing information item hearing process of step 3 asks its question by putting the name of the schema into the question sentence. A sketch of this record-identification flow follows.
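A minimal sketch of steps 2 through 4, assuming records are rows of tuple values keyed by schema names; the schema names and sample records are invented for illustration, and the question is formed by embedding the schema name, as described above.

```python
SCHEMAS = ["destination", "purpose", "days"]

RECORDS = [
    {"destination": "Hawaii", "purpose": "beach", "days": "7"},
    {"destination": "Finland", "purpose": "aurora", "days": "5"},
]

def matching_records(heard: dict) -> list:
    """Step 2 / decision J1: keep the records consistent with every
    information item heard so far."""
    return [r for r in RECORDS
            if all(r.get(k) == v for k, v in heard.items())]

def dialogue(heard: dict) -> None:
    candidates = matching_records(heard)
    while len(candidates) > 1:
        # Step 3: ask by putting the schema name into the question sentence.
        missing = next(s for s in SCHEMAS if s not in heard)
        heard[missing] = input(f"Please tell me the {missing}: ")
        candidates = matching_records(heard)
    # Step 4: proceed with the program using the identified record
    # (an empty list would mean nothing matched).
    print("Identified record:", candidates)

# e.g. dialogue({"purpose": "aurora"}) identifies the Finland record at once.
```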
  • FIG. 2 shows one embodiment of a system for carrying out the dialogue shown in FIG. 1.
  • FIG. 2 shows an embodiment of a system for implementing a voice interaction method via an in-home network, but is not limited to this.
  • Alternatively, humans and computers may interact directly without going through the Internet.
  • Such a topic dialogue system generally comprises a voice input device 1 such as a microphone, a voice output device 3 such as a speaker or headphones, a user terminal 5, a communication line 7 such as an Internet connection, an intranet, or a LAN, and a host computer 9 that manages the system.
  • The voice input device 1 converts a voice uttered by the human user into a digital signal that can be processed by a computer.
  • The voice output device 3 converts a voice generation signal created by the computer into sound.
  • The user terminal 5 may be any of various well-known personal computers connectable to the Internet.
  • The processing result at the user terminal 5 is transmitted to the host computer 9 via the communication line 7, and the processing result at the host computer 9 can be received by the user terminal 5 via the communication line 7.
  • The host computer 9 is provided with a memory 11 for recording various data and programs, and a CPU 13 for calling programs recorded in the memory and performing various controls.
  • The memory 11 comprises a relational database section 11a consisting of schemas and tuples, a dialogue sequence section 11b recording a program that defines the order in which each schema is to be discussed as a topic, and a wording recording section 11c recording a program that defines the wording when each schema becomes the topic.
  • The CPU 13 comprises: information item determination control means 13a, which analyzes the user's input voice by word decomposition and sentence analysis to determine whether an information item corresponding to each schema is present; essential information item hearing control means 13b, which asks the user for the missing information items when those required to identify a record in the relational database are not all included in the input voice; and program progress control means 13c, which uses the information items of the record to advance the program in accordance with the dialogue sequence.
  • FIG. 3 is a flowchart of another embodiment of a dialog between a computer and a human.
  • FIG. 5 is a block diagram showing an embodiment of the host computer 9 of the system for performing the dialogue as shown in FIG.
  • The dialogue shown in FIG. 3 differs from the dialogue shown in FIG. 1 in that the computer can identify the scene (topic) from the human's voice and interact accordingly, and in that small-scene dialogues that digress slightly from the main topic can be inserted during the conversation.
  • First, databases and programs are recorded (step 11), and then the human is asked for an index for identifying the scene (step 12).
  • Once the relational database is specified, it is recorded in the cache memory (step 13). If there is a voice input from the user, it undergoes word decomposition and sentence analysis to determine the presence of information items (step 14), and decision J11 judges whether the information items necessary to identify a record are included in the input voice. If "No", the human is asked for the information items (step 15); if decision J11 is "Yes", or once the information items necessary to identify the record have been obtained in step 15, the program proceeds (step 16). When a predetermined schema and/or tuple becomes the topic, the subroutine for the small scene is entered (step 17), and when the small-scene subroutine is completed, the program returns to the original dialogue sequence and the remaining program is executed (step 18).
  • Here, a plurality of relational databases are recorded in the memory of the computer, and each is given an index by which the scene (topic) it handles can be distinguished from the others.
  • The computer's memory also stores relational databases that define small scenes associated with a given schema and/or tuple.
  • The relational database that defines a small scene consists of a structural example (corresponding to a schema), made up of multiple items, and a content example (corresponding to a tuple), which gives the contents of the structural example.
  • Also recorded are programs that define the dialogue sequence and the item sequence, that is, the order in which each schema and item is made a topic, and the wording when each schema or item becomes the topic.
  • FIG. 4 shows an example of dialogue when a computer provides travel guidance to a human.
  • In step 12, the computer first utters a question to the human, such as "What's your business?", to inquire about an index for identifying the scene toward which the conversation is directed.
  • When the human responds, for example, "Looking for a summer vacation destination", the scene of "travel guidance" is specified from input index words such as "travel destination" and "search".
  • If the user's response instead contains negative words, for example complaints that the service was bad or that the hotel was unsatisfactory, the scene of "complaint" is specified.
  • In this way, the scene (topic) can be identified by finding an index word contained in the user's utterance; this is a characteristic feature of the method. That is, one specific scene can be selected from a large number of scenes by finding a word serving as a predetermined index. A sketch of such index lookup follows.
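The index lookup can be pictured as a table from scenes to their registered index words, with the first match selecting the scene's relational database. A sketch under that assumption follows; the word lists are placeholders, not the patent's registered indexes.

```python
from typing import Optional

SCENE_INDEX = {
    "travel guidance": ["travel destination", "summer vacation", "search"],
    "complaint": ["bad", "dissatisfied", "unsatisfactory"],
}

def identify_scene(utterance: str) -> Optional[str]:
    """Return the first scene one of whose index words appears in the
    utterance, or None so that the index question can be asked again."""
    for scene, index_words in SCENE_INDEX.items():
        if any(word in utterance for word in index_words):
            return scene
    return None

print(identify_scene("Looking for a summer vacation destination"))
# -> "travel guidance"
```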
  • In step 13, the CPU calls the relational database of the travel guidance scene (topic) specified by the index from the memory and records it rewritably in the cache memory.
  • The schemas of the travel scene are, for example, "destination", "purpose", "days", "departure (time)", "number of people", "breakdown of companions", "budget", "designation of airline", "designation of hotel", "room specifications", "meal availability", "options", "passport", "necessity of visa", "payment method", and so on. In the computer, the essential information item presence/absence determining step (step 14) therefore judges whether all the essential information items for identifying a record are present. Then, in a counseling mode that searches for the target record through a series of questions and answers, a dialogue for eliciting the missing information items is started.
  • Suppose the computer asks "Where do you want to go?" and the user answers "Is the UK or America better?". From this answer the computer detects that the user has not yet decided on a destination, and it transitions to an advice mode to help settle the destination as soon as possible.
  • This is based on the knowledge that in dialogue between humans the counseling mode and the advice mode appear alternately as the dialogue develops, applied here for the first time to dialogue between a computer and a human.
  • The user's answer "family" is then used as an information item for recommending a travel package.
  • In step 17, a transition is made to the small scene of "bed-sharing child charge" using "family" as a keyword. Specifically, after asking about the composition and ages of the family, the computer explains the charge for a child sharing a bed and asks whether it would apply. If the answer in the sixth line of the dialogue shown in FIG. 4 were, for example, "honeymoon", it would instead be possible to shift to a small scene using "honeymoon" as a keyword, taking up a variety of topics such as pick-up from the airport to the hotel by limousine or a special dinner in a private room.
  • The program then proceeds according to the dialogue sequence using the information items of the record. Normally, it proceeds by confirming the entered schemas to the user in order. In the present embodiment, the user has indicated that he will go on a 10-day trip to Orlando, Florida. In case of a dispute at a later date, however, it may be difficult to establish that the user's intention was confirmed for all the conditions of the package trip. It is therefore preferable to clearly confirm items such as "destination", "days", and "cost".
  • In the small scene, a dialogue with the human is conducted using a relational database consisting of a structural example, made up of a plurality of items, and a content example giving the contents of the structural example.
  • The order in which the items appear in the dialogue is determined by the item sequence.
  • The items of the family travel small scene include "breakdown of family", "sex of child", "age of child", "whether or not the bed-sharing child charge applies", and "number of people". In the example above, the dialogue begins with "Please tell us about your family."
  • The illustrated preferred embodiment is characterized in that the structural example defining the small scene is a past dialogue example.
  • That is, a small scene composed of a plurality of pieces of content information collected as described above can be recorded in memory as a dialogue example. The next time the same small scene becomes a topic, control is performed so that the dialogue proceeds with an item sequence based on that dialogue example. A sketch of this reuse follows.
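A small sketch of this reuse, under assumed data shapes: a finished small scene's item order is stored as a dialogue example and is preferred over the default item sequence the next time the same small scene becomes the topic.

```python
DIALOGUE_EXAMPLES: dict = {}  # small-scene name -> item order actually used

def record_example(scene_name: str, items_in_order: list) -> None:
    """Store a finished small scene as a dialogue example."""
    DIALOGUE_EXAMPLES[scene_name] = list(items_in_order)

def item_sequence(scene_name: str, default_order: list) -> list:
    """Prefer the item order of a past dialogue example over the default."""
    return DIALOGUE_EXAMPLES.get(scene_name, default_order)

record_example("bed-sharing child charge",
               ["breakdown of family", "age of child", "apply charge?"])
print(item_sequence("bed-sharing child charge",
                    ["age of child", "breakdown of family", "apply charge?"]))
# -> the recorded order, not the default
```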
  • Small scenes can be constructed in an unlimited hierarchy. In other words, one small scene can have a subordinate small scene, and that small scene can have a further subordinate small scene, and so on; a sketch of such nesting follows.
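One way to model the nesting is a scene structure that maps items to optional subordinate scenes and is walked recursively, returning to the outer scene's remaining items when a sub-scene's item sequence ends (steps 17 and 18). The scene names and items below are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Scene:
    name: str
    items: list                                     # item sequence for this scene
    sub_scenes: dict = field(default_factory=dict)  # item -> subordinate Scene

def run_scene(scene: Scene, depth: int = 0) -> None:
    for item in scene.items:
        print("  " * depth + f"[{scene.name}] ask about: {item}")
        sub = scene.sub_scenes.get(item)
        if sub is not None:
            run_scene(sub, depth + 1)  # step 17: enter the subroutine
        # step 18: returning resumes the remaining items of the outer scene

bed_sharing = Scene("bed-sharing child charge",
                    ["breakdown of family", "age of child", "apply charge?"])
travel = Scene("travel guidance",
               ["destination", "number of people", "budget"],
               sub_scenes={"number of people": bed_sharing})
run_scene(travel)
```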
  • As a result, the variety of conversations between the computer and humans expands without limit, which completely dispels the monotony peculiar to interacting with computers, long a criticism of conventional technology that merely follows prepared conversations.
  • The system of the present embodiment is the same as the topic dialogue system shown in FIG. 2 except for the configuration of the host computer 9, so only the differing configuration will be described.
  • The same reference numerals as in FIG. 2 are used for the same components as in FIG. 2.
  • The memory 11 of the host computer 9 comprises a relational database section 11a, a dialogue sequence section 11b recording a program that defines the order in which each schema is to be placed in the topic, and a wording recording section 11c recording a program that defines the wording when each schema becomes the topic.
  • The relational database section 11a is divided into sections 11aa to 11an so that relational databases for a large number of different scenes (topics) can be stored.
  • Each relational database is registered in advance with one or more words serving as an index that distinguishes it from the others; by finding such a word, one relational database is specified.
  • Each relational database is the same as that in FIG. 2 in that it consists of schemas and tuples.
  • The memory 11 is also provided with one or more relational databases 11d that define small scenes associated with a given schema and/or tuple. When there are multiple relational databases defining small scenes, the relational database section 11d is likewise divided into sections 11da to 11dm, as with the relational database section 11a.
  • The relational database that defines a small scene likewise consists of a structural example (equivalent to a schema), made up of a plurality of items, and a content example (equivalent to a tuple) giving its contents. For each relational database that defines a small scene, it is also necessary to determine in advance from which schema or tuple of the scene-defining relational databases the transfer to that small scene is made when it appears.
  • The memory 11 further comprises an item sequence section 11e, which records the item sequence, that is, the order in which each item is made a topic, and a wording recording section 11f, which records a program defining the wording when each item becomes the topic.
  • The host computer 9 further includes, in addition to the memory 11, a cache memory 15 for calling the relational database specified by the index and recording it rewritably.
  • As in FIG. 2, the CPU 13 is provided with information item presence/absence determination control means 13a, essential information item hearing control means 13b, and program progress control means 13c.
  • The CPU 13 further includes index inquiry control means 13d, by which the computer asks the human for an index specifying which scene the dialogue concerns, and cache memory recording control means 13e, which calls the relational database of the scene specified by the input index from the memory and records it rewritably in the cache memory.
  • The CPU 13 also includes subroutine progress control means 13f, which calls a relational database defining a small scene and executes, as a subroutine, the program according to its item sequence, and return sequence control means 13g, which, when the item sequence is completed, returns to the dialogue sequence and proceeds with the remaining program.
  • The essential information item hearing control means 13b asks its question by putting the name of the schema into the question sentence, thereby eliciting the missing information item.
  • The method of voice dialogue between a computer and a human generally comprises: a step of receiving a conversational sentence from a human with a voice input device such as a microphone (step 21); a step in which the computer creates a response sentence in accordance with a program for conducting the conversation (step 22); a step in which the response sentence is output from a voice generating device such as a speaker (step 23); a step in which the output response sentence also enters the voice input device such as the microphone (step 24); and a step of canceling the response sentence with the voice canceller device and removing it from the conversation recognition targets of the computer (step 25).
  • The response sentence canceling step sets a flag on the response sentence creation signal (step 31) and, after the response sentence has been output from the voice generating device, cancels the voice input to the voice input device within a predetermined time, using the response sentence creation signal as a reference signal (step 32).
  • For the computer to interpret the conversational sentence and create a response sentence, the input signal is preferably a pure signal with little noise. It is therefore preferable to interpose, between step 21 and step 22, a step (step 26) of removing noise from the audio signal received by the voice input device with the noise canceller device.
  • In the noise canceling step, noise is removed from the voice signal input from the voice input device such as the microphone, so that only the voice signal corresponding to the conversational sentence from the human remains.
  • In the noise canceling step, noise is accumulated for a predetermined period during time zones when the voice level is low enough that it is clear no speech is being made by a human or the computer, or, in other time zones, after the utterance content has been canceled; the accumulated noise is learned, and during the next utterance from a human the learned noise is canceled from the voice signal of the voice input device and removed. As a result, even when the volume of the noise increases and the difference from the volume of the human's conversational sentence becomes small, the noise can be reliably eliminated.
  • In practice, the noise collection time required for this learning is about 3 seconds; a sketch of one such learning scheme follows.
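The patent does not commit to a particular learning algorithm. Simple spectral subtraction is one common way to realize this kind of learned cancellation, and the sketch below uses it: frames from the roughly 3-second quiet period are averaged into a noise magnitude spectrum, which is then subtracted frame by frame from the next utterance. The frame size and the method itself are assumptions.

```python
import numpy as np

FRAME = 256  # samples per frame; an assumption for the sketch

def learn_noise(quiet_signal: np.ndarray) -> np.ndarray:
    """Average the magnitude spectra of frames from a speech-free period
    (about 3 seconds of signal, per the text above)."""
    usable = len(quiet_signal) // FRAME * FRAME
    frames = quiet_signal[:usable].reshape(-1, FRAME)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

def remove_noise(utterance: np.ndarray, noise_mag: np.ndarray) -> np.ndarray:
    """Subtract the learned noise magnitude from each frame, keeping the
    original phase, and floor the result at zero."""
    out = []
    for i in range(0, len(utterance) - FRAME + 1, FRAME):
        spec = np.fft.rfft(utterance[i:i + FRAME])
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        out.append(np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=FRAME))
    return np.concatenate(out)
```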
  • The voice dialogue system includes a keyboardless computer 30 having a microphone 31 for picking up the conversation between the computer and the human, and a speaker 32 for outputting the response sentences created by the computer.
  • The computer 30 recognizes the phonemes of the voice input from the microphone 31 and performs word decomposition and sentence analysis on them as a conversational sentence. For the voice signal from the microphone 31, it comprises a voice canceller device 34, which cancels the response sentence created by the computer and removes it from the conversation recognition targets, and a conversation sentence recognition/response sentence creation unit 35, which interprets the conversational sentences and creates response sentences for them.
  • The voice canceller device 34 comprises: means for flagging the response sentence creation signal created by the computer; a clock 34b that measures the time from when the response sentence is output from the voice generating device until the voice input device receives a voice; and canceller means 34c which, if that time is within the specified time, determines that the voice received by the voice input device is the response sentence created by the computer and cancels it using the response sentence creation signal as a reference signal.
  • As the noise canceller device 37, various types, including conventionally known devices, can be adopted, whereby the conversation sentence recognition/response sentence creation unit 35 receives, as conversation recognition targets, only noise-free conversational sentences from humans.
  • The noise canceller device 37 accumulates, for a predetermined time, the noise of time periods when the sound level is low, that is, when neither the human nor the computer is speaking, or, in other time periods, the noise from which the utterance content has been canceled, and learns it. Then, during the next utterance from a human, the learned noise is canceled from the voice signal of the voice input device and removed.
  • With the above configuration, even if a human interrupts while the computer is outputting a response sentence, the computer can correctly recognize the human's utterance, with the effect that word decomposition and sentence analysis can be performed and a response sentence created.
  • The conventional method has the disadvantage that if a human speaks before the computer finishes its output, the computer cannot recognize the voice, or the program becomes confused and the conversation becomes impossible.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The invention concerns a human/computer voice interaction method comprising the following steps: (21) receiving human speech with a voice input device capable of picking up a human/computer dialogue; (22) interpreting this speech and having the computer create response words to it; (23) converting these response words, created by the computer, into speech with a voice generator; (24) the response words also entering the voice input device; and (25) canceling these response words with a voice canceling device so as to remove them from the content to be recognized by the computer.
PCT/JP2001/000628 2001-01-31 2001-01-31 Procede et systeme pour l'interaction vocale personne/ordinateur WO2002061729A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2001/000628 WO2002061729A1 (fr) 2001-01-31 2001-01-31 Procede et systeme pour l'interaction vocale personne/ordinateur

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2001/000628 WO2002061729A1 (fr) 2001-01-31 2001-01-31 Procede et systeme pour l'interaction vocale personne/ordinateur

Publications (1)

Publication Number Publication Date
WO2002061729A1 true WO2002061729A1 (fr) 2002-08-08

Family

ID=11736963

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2001/000628 WO2002061729A1 (fr) 2001-01-31 2001-01-31 Procede et systeme pour l'interaction vocale personne/ordinateur

Country Status (1)

Country Link
WO (1) WO2002061729A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014232289A (ja) * 2013-05-30 2014-12-11 三菱電機株式会社 誘導音声調整装置、誘導音声調整方法および誘導音声調整プログラム

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5914769B2 (ja) * 1980-03-04 1984-04-06 三洋電機株式会社 音声機器
JPS612960B2 (fr) * 1978-08-30 1986-01-29 Fujitsu Ltd
JPH04287099A (ja) * 1991-03-15 1992-10-12 Nippondenso Co Ltd 音声認識システム
JPH05323993A (ja) * 1992-03-16 1993-12-07 Toshiba Corp 音声対話システム

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS612960B2 (fr) * 1978-08-30 1986-01-29 Fujitsu Ltd
JPS5914769B2 (ja) * 1980-03-04 1984-04-06 三洋電機株式会社 音声機器
JPH04287099A (ja) * 1991-03-15 1992-10-12 Nippondenso Co Ltd 音声認識システム
JPH05323993A (ja) * 1992-03-16 1993-12-07 Toshiba Corp 音声対話システム

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014232289A (ja) * 2013-05-30 2014-12-11 三菱電機株式会社 誘導音声調整装置、誘導音声調整方法および誘導音声調整プログラム

Similar Documents

Publication Publication Date Title
US7788095B2 (en) Method and apparatus for fast search in call-center monitoring
US9626959B2 (en) System and method of supporting adaptive misrecognition in conversational speech
US9263039B2 (en) Systems and methods for responding to natural language speech utterance
CN100578614C (zh) 用语音应用语言标记执行的语义对象同步理解
US8064573B2 (en) Computer generated prompting
US20100217591A1 (en) Vowel recognition system and method in speech to text applictions
US8812314B2 (en) Method of and system for improving accuracy in a speech recognition system
KR20120038000A (ko) 대화의 주제를 결정하고 관련 콘텐트를 획득 및 제시하는 방법 및 시스템
US20130136243A1 (en) Method and Apparatus For Voice Interactive Messaging
JP3437617B2 (ja) 時系列データ記録再生装置
WO2002061729A1 (fr) Procede et systeme pour l'interaction vocale personne/ordinateur
Thirion et al. The South African directory enquiries (SADE) name corpus
WO2002067244A1 (fr) Procede de reconnaissance de la parole pour interaction de la parole, systeme et programme de reconnaissance de la parole
MXPA97009035 System and method for the sound interface with hyperlink information

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN JP KR US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

121 Ep: the epo has been informed by wipo that ep was designated in this application
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: "NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC (EPO FORM 1205A DATED 24/11/03)"

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP