WO2005091128A1 - Voice processing device and system, and voice processing method - Google Patents
Voice processing device and system, and voice processing method
- Publication number
- WO2005091128A1 (PCT/JP2005/004959)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- processing
- identification information
- client
- voice
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/487—Arrangements for providing information services, e.g. recorded voice services or time announcements
- H04M3/493—Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
- H04M3/4938—Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/14—Session management
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/14—Session management
- H04L67/146—Markers for unambiguous identification of a particular session, e.g. session cookie or URL-encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/40—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W8/00—Network data management
- H04W8/26—Network addressing or numbering for mobility support
Definitions
- the present invention relates to a voice processing technology, and more particularly, to a system, a device, and a method for transmitting voice information input on a terminal (client) side to a voice processing device via a network for processing.
- in a conventional technique, a mobile phone terminal makes a telephone connection to a voice processing server using a phone-to function or the like; the voice processing server processes the user's voice (voice recognition, speaker verification, etc.) and transmits the result to a Web server; the Web server creates a screen reflecting the processing result; and the mobile phone terminal downloads and displays the screen.
- a technique in which the above are linked in this way is known (for example, see Japanese Patent No. 3452250 (Reference 1)).
- in this system, a mobile phone terminal 11 and a voice processing server 13 transmit and receive data through a circuit switching network 15, while the mobile phone terminal 11 and a Web server 12 transmit and receive data through a packet network 14.
- voice information such as feature vectors and compressed voice data has also been transmitted from a client such as a personal digital assistant (PDA) or an in-vehicle terminal to a voice processing server via a packet network.
- a technique for performing voice processing such as voice recognition and speaker verification in this manner is also known (see Japanese Patent Application Publication No. 2003-5949 (Reference 2)).
- the prior art of the above Reference 1 is a method of linking a telephone number and a mobile phone terminal ID, and therefore cannot be used by a terminal that has no telephone number.
- when such terminals are used, a new technique is needed that allows the server side to grasp the relationship between the screen downloaded to the client and the voice data transmitted from the client.
- an object of the present invention is to enable the server side to control the relationship between the information downloaded from an information providing server (information providing device) such as a Web server to a client (terminal) and the voice information transmitted from the client to a voice processing server (voice processing device).
- another object of the present invention is to enable the downloading of appropriate information reflecting the results of voice processing even when the voice processing server and the information providing server are accessed from a plurality of clients.
Means for solving the problem
- a voice processing system according to the present invention includes a terminal that transmits input voice information and outputs received information, a voice processing device that performs voice processing based on the voice information from the terminal, and an information providing device that receives the result of the voice processing performed by the voice processing device and transmits information reflecting the voice processing result to the terminal.
- the terminal, the voice processing device, and the information providing device share process identification information corresponding to a series of processes performed by the voice processing device and the information providing device based on the voice information.
- in a voice processing method according to the present invention, the terminal transmits the input voice information to the voice processing device; the voice processing device performs voice processing of the voice information from the terminal and transmits the voice processing result to the information providing device; and the information providing device prepares information reflecting the voice processing result and transmits the prepared information to the terminal.
- the terminal, the voice processing device, and the information providing device share process identification information corresponding to a series of processes performed by the voice processing device and the information providing device based on the voice information.
- an information providing server device according to the present invention includes: first receiving means for receiving a service request signal from a client; identification information generating means for, when the service request signal is received, generating process identification information corresponding to a series of processes performed based on voice information from the client; means for generating first information to be presented to the client based on the process identification information; first transmitting means for transmitting the process identification information and the first information to the client; and second receiving means for receiving a voice processing result and the process identification information from a voice processing server that receives the voice information and the process identification information from the client and performs voice processing.
- a client device according to the present invention communicates with a voice processing server that performs voice processing of voice information from the client device, and with an information providing server that transmits to the client device information reflecting the voice processing result of the voice processing server.
- the client device includes: unique identification information output means for outputting the unique identification information of the client device as process identification information corresponding to a series of processes performed by the voice processing server and the information providing server; first transmitting means for transmitting a service request signal and the process identification information to the information providing server when a service is requested; and second transmitting means for transmitting the input voice information to the voice processing server together with the process identification information.
- a voice processing server device according to the present invention includes: first receiving means for receiving a voice processing request signal from a client; identification information generating means for, when the voice processing request signal is received, generating process identification information corresponding to a series of processes performed based on voice information from the client; first transmitting means for transmitting the process identification information to the client; second receiving means for receiving the voice information and the process identification information from the client; voice processing executing means for performing voice processing of the voice information from the client; and transmitting means for transmitting the voice processing result of the voice processing executing means, together with the process identification information from the client, to an information providing server that generates information reflecting the voice processing result in association with the process identification information and transmits that information to the client.
- a program according to the present invention is a program for causing a computer constituting the information providing server device, the client device, or the voice processing server device to realize the functions of the respective devices.
- An information processing system includes a client and a plurality of servers,
- a series of processes (A), (B), and (C) are managed by common process identification information shared by a client, one server, and another server.
- here, the client corresponds to the terminal, the voice processing server to the voice processing device, and the information providing server to the information providing device.
- according to the present invention, the server can grasp the relationship between the information downloaded from the information providing server to the client and the voice information transmitted from the client to the voice processing server. As a result, even when the voice processing server and the information providing server are accessed from a plurality of clients, each user can download appropriate information reflecting the voice processing result.
- when processing such as a search is performed based on voice information uttered by the user and the result is displayed on the screen, or when the user downloads appropriate information based on the uttered voice information, it is possible to provide content in which voice processing and the screen are linked.
- FIG. 1 is a diagram showing a configuration of a conventional system.
- FIG. 2 is a diagram showing a configuration of an example of the present invention.
- FIG. 3 is a diagram showing a configuration of a first exemplary embodiment of the present invention.
- FIG. 4 is a diagram showing a configuration of a second exemplary embodiment of the present invention.
- FIG. 5 is a diagram showing a configuration of a third exemplary embodiment of the present invention.
- FIG. 6 is a diagram showing a configuration of a client in a first specific example of the present invention.
- FIG. 7 is a diagram showing a configuration of a Web server in a first specific example of the present invention.
- FIG. 8 is a diagram showing a configuration of a voice processing server according to a first specific example of the present invention.
- FIG. 9 is a diagram showing a configuration of a client in a second specific example of the present invention.
- FIG. 10 is a diagram showing a configuration of a Web server in a second specific example of the present invention.
- FIG. 11 is a diagram showing a configuration of a voice processing server in a third specific example of the present invention.
- FIG. 12 is a diagram for explaining the operation of the first specific example of the present invention.
- FIG. 13 is a diagram for explaining the operation of the second specific example of the present invention.
- FIG. 14 is a diagram for explaining the operation of the third example of the present invention.
- FIG. 15 is a diagram for explaining an example of transition of a screen (page) displayed on the client in the first specific example of the present invention.
- FIG. 16 is a diagram for explaining another example of a transition of a screen (page) displayed on the client in the first specific example of the present invention.
- referring to FIG. 2, a client (terminal) 10, a Web server (information providing server, information providing device) 20, and a voice processing server (voice processing device) 30 are connected to a network.
- the client 10 has a voice data input unit and a browser function, and has a communication function of connecting to a packet network 40 such as an IP network as a network.
- the client 10, the web server 20, and the voice processing server 30 share process identification information corresponding to a series of processes performed by the web server 20 and the voice processing server 30 based on voice data.
- the process identification information for example, an ID assigned to the session of the utterance process (referred to as “session ID”) or a unique ID held by the client 10 can be used.
- FIG. 3 is a diagram showing a configuration of the first exemplary embodiment of the present invention.
- the Web server 20 includes a session ID generation unit that generates a session ID for each session.
- a session ID is generated in the web server 20 when the client 10 requests the web server 20 for a service using voice processing.
- the generated session ID is transmitted from the Web server 20 to the client 10 when the client 10 downloads the screen information from the Web server 20.
- the session ID may be transmitted by being included in the screen information.
- when transmitting the voice information of the input voice to the voice processing server 30, the client 10 also transmits the session ID received from the Web server 20 to the voice processing server 30.
- the session ID may be included in the voice information or may be transmitted separately.
- the voice processing server 30 performs voice processing (voice recognition, speaker verification, etc.) based on the received voice information.
- the voice processing server 30 also transmits the session ID when transmitting the voice processing result to the web server 20.
- the method of transmitting the session ID may be included in the voice processing result.
- using the session ID information, the Web server 20 can associate the result of the voice processing in the voice processing server 30 with the client 10 that requested the service, and can have the client 10 download a screen reflecting the processing result. At that time, the Web server 20 transmits a screen (page) including voice processing result information, such as the voice recognition result of an utterance, to the client 10, and downloads screen information corresponding to the voice processing result in accordance with a selection from the client 10.
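The association described above can be sketched as follows. This is a minimal illustration, not part of the patent disclosure; the in-memory table, the `uuid`-based session ID, and all names are assumptions made for the sketch.

```python
import uuid

class WebServerSessions:
    """Minimal sketch: associate each voice processing result with the
    client that requested the service, keyed by session ID."""

    def __init__(self):
        self._sessions = {}  # session ID -> client identifier

    def start_session(self, client_id):
        # Generated when the client requests a service using voice processing.
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = client_id
        return session_id

    def on_voice_result(self, session_id, result):
        # The voice processing server returns the result with the session ID;
        # the session ID tells us which client should receive the screen.
        client_id = self._sessions.pop(session_id)
        return client_id, f"screen reflecting: {result}"

server = WebServerSessions()
sid = server.start_session("client-10")
client, screen = server.on_voice_result(sid, "weather in Tokyo")
print(client)   # client-10
print(screen)   # screen reflecting: weather in Tokyo
```

Because the table is keyed per session rather than per connection, results arriving from the voice processing server for several clients at once are routed to the correct requesters.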
- FIG. 4 is a diagram showing the configuration of the second exemplary embodiment of the present invention, in which an ID held by the client 10 is used as a unique ID. The processing procedure when the ID held in advance by the client 10 is used as a client-unique ID (unique ID), or when an ID unique to the client is generated from the ID held in advance, will be described.
- when the client 10 requests the Web server 20 for a service using voice processing, the client 10 notifies the Web server 20 of the ID held in advance as a unique ID. Alternatively, the client 10 newly generates a client-unique ID from the previously held ID and notifies the Web server 20 of the generated unique ID. As a method of generating a unique ID, for example, time stamp information may be added to the ID held in advance.
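The time-stamp method mentioned above can be sketched as follows; the `terminal_id-timestamp` format is an illustrative assumption, as the document only says that time stamp information may be added to the held ID.

```python
import time

def make_unique_id(terminal_id: str) -> str:
    """Sketch: append time stamp information to the ID the client
    already holds to obtain a client-unique ID."""
    timestamp = int(time.time() * 1000)  # millisecond resolution
    return f"{terminal_id}-{timestamp}"

uid = make_unique_id("TERM0001")
print(uid.startswith("TERM0001-"))  # True
```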
- the screen information of the requested service is downloaded from the Web server 20 to the client 10.
- the screen downloaded from the Web server 20 is displayed on the screen display unit 140 of the client 10. The client 10 receives the voice signal input by the user, converts it into voice information, and transmits the voice information to the voice processing server 30 together with the unique ID.
- the voice processing server 30 performs voice processing based on the received voice information.
- the voice processing server 30 also transmits the unique ID to the web server 20 when transmitting the voice processing result to the web server 20.
- Web server 20 receives the voice processing result and the unique ID from voice processing server 30.
- the Web server 20 can associate the voice processing result with the client 10 that has requested the service by using the unique ID from the voice processing server 30, and the screen information reflecting the voice processing result is transmitted to the client. 10 can be downloaded.
- at that time, the Web server 20 transmits a screen (page) including voice processing result information, such as the voice recognition result of the utterance, to the client 10, and downloads screen information corresponding to the voice processing result in accordance with a selection from the client 10.
- FIG. 5 is a diagram showing the configuration of the third exemplary embodiment of the present invention.
- the voice processing server 30 includes a session ID generation unit that generates a session ID for each session. The processing procedure of this embodiment will be described with reference to FIG. 5.
- a session ID is generated by the session ID generation unit 31 of the voice processing server 30 and is notified to the client 10.
- the client 10 notifies the Web server 20 of the received session ID.
- the voice processing server 30 performs voice processing based on voice information received from the client 10.
- the voice processing server 30 also transmits the session ID to the web server 20 when transmitting the voice processing result to the web server 20.
- using the session ID information, the Web server 20 can associate the voice processing result with the client that made the service request, and can have the client 10 download a screen reflecting the processing result. At that time, the Web server 20 transmits a screen (page) including voice processing result information, such as the voice recognition result of the utterance, to the client 10, and downloads screen information corresponding to the voice processing result in accordance with a selection from the client 10.
- as described above, the session ID may be transmitted from the Web server 20 to the client 10 by being included in the screen information, from the client 10 to the voice processing server 30 by being included in the voice information or separately, and from the voice processing server 30 to the Web server 20 by being included in the voice processing result.
- the client 10 is connected to a web server 20 and a voice processing server 30 via a network (packet network) 40.
- Examples of the client include a mobile terminal, a PDA (Personal Digital Assistant), an in-vehicle terminal, a PC (personal computer), and a home terminal.
- the Web server 20 and the voice processing server 30 are each realized by, for example, a computer equipped with Windows XP (registered trademark) or Windows 2000 (registered trademark) as an OS (operating system), or a computer equipped with Solaris (registered trademark) as an OS.
- as the network (packet network) 40, an IP network such as the Internet (wired/wireless) or an intranet is used.
- the Web server 20 has a session ID generation unit that generates a session ID.
- FIG. 6 is a diagram showing a configuration of the client 10 according to the first specific example of the present invention.
- client 10 includes a data input unit 110 that functions as a voice input unit and inputs voice data, a screen display unit 140, a data communication unit 130, and a control unit 120.
- FIG. 7 is a diagram showing a configuration of the Web server 20.
- the Web server 20 includes a data communication unit 210, a content management unit (information management unit) 220, and a session ID generation unit 230.
- FIG. 8 is a diagram showing a configuration of the voice processing server 30.
- the voice processing server 30 includes a data communication unit 310, a control unit 320, and a voice processing execution unit 330.
- FIG. 12 is a diagram for explaining the sequence operation of this specific example. This specific example will be described with reference to FIG. 6 to FIG. 8 and FIG. 12.
- the client 10 requests the Web server 20 for a service including voice processing (step S101). Specifically, a service request signal is transmitted to the Web server 20 by a click operation of a button on the screen displayed on the client 10, and a program for executing the service is started on the Web server 20.
- the service request signal from the client 10 is received by the data communication unit 210 (step S201) and transmitted to the content management unit 220.
- the session ID generation unit 230 receives the service request signal and generates a session ID (Step S202).
- the session ID may be generated, for example, by counting up from a predetermined initial value each time an access is made.
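The count-up scheme can be sketched as follows; this is illustrative only, and the initial value and zero-padded format are assumptions.

```python
import itertools

class SessionIdGenerator:
    """Sketch of the count-up scheme: start from a predetermined
    initial value and increment on every access."""

    def __init__(self, initial_value: int = 1000):
        self._counter = itertools.count(initial_value)

    def next_id(self) -> str:
        # Zero-padded so IDs have a fixed width (an assumption for the sketch).
        return f"{next(self._counter):08d}"

gen = SessionIdGenerator(initial_value=1000)
print(gen.next_id())  # 00001000
print(gen.next_id())  # 00001001
```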
- the generated session ID is transmitted to content management section 220.
- the content management unit 220 generates a screen to be downloaded to the client 10 based on the received session ID (step S203).
- the session ID may be included in the link destination URL (Uniform Resource Locator) information of the result acquisition button.
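Embedding the session ID in the link-destination URL might look like the following sketch; the host, path, and `session_id` parameter name are assumptions made for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def result_button_url(session_id: str) -> str:
    # Hypothetical host and path; the document only requires that the
    # session ID appear in the URL of the result acquisition button.
    query = urlencode({"session_id": session_id})
    return f"http://webserver.example/result?{query}"

url = result_button_url("00001000")
print(url)  # http://webserver.example/result?session_id=00001000

# The Web server can later recover the session ID from the requested URL.
qs = parse_qs(urlsplit(url).query)
print(qs["session_id"][0])  # 00001000
```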
- the generated screen is downloaded to the client through the data communication unit 210 of the Web server 20 (step S204).
- the session ID is sent to the client 10 as well.
- the screen information and the session ID received from the Web server 20 are received by the data communication unit 130 (step S102), and transmitted to the control unit 120 of the client 10.
- the screen information is transmitted from the control unit 120 to the screen display unit 140 and displayed.
- the screen information on the client 10 includes, for example, a prompt urging the user to speak.
- the voice uttered by the user is input to the data input unit 110 of the client 10 (step S104), and transmitted to the control unit 120 in the client 10.
- Necessary data processing is performed by the control unit 120 of the client 10 (step S105).
- the data processing for example, digitization processing of input voice, voice detection processing, voice analysis processing, voice compression processing, and the like are performed.
- as the speech data, for example, digitized speech data, compressed speech data, feature vectors, etc. are used (for details, see "Speech Recognition by Stochastic Models," by Seiji Nakagawa, pp. 10-12, The Institute of Electronics, Information and Communication Engineers (Reference 3)).
- at this time, processing for including the session ID in the voice data is performed.
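One way to include the session ID in the transmitted voice data is a small header in front of each payload. The length-prefixed wire format below is purely an illustrative assumption, since the document leaves the encoding open.

```python
import struct

def pack_voice_packet(session_id: str, voice_data: bytes) -> bytes:
    # Hypothetical format: 2-byte big-endian length of the session ID,
    # the session ID bytes, then the raw voice data.
    sid = session_id.encode("ascii")
    return struct.pack("!H", len(sid)) + sid + voice_data

def unpack_voice_packet(packet: bytes):
    (sid_len,) = struct.unpack_from("!H", packet, 0)
    sid = packet[2:2 + sid_len].decode("ascii")
    return sid, packet[2 + sid_len:]

packet = pack_voice_packet("00001000", b"\x01\x02\x03")
sid, voice = unpack_voice_packet(packet)
print(sid)    # 00001000
print(voice)  # b'\x01\x02\x03'
```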
- the data processed by the control unit 120 of the client 10 is sequentially transmitted from the data communication unit 130 to the voice processing server 30.
- in the voice processing server 30, the data sequentially transmitted from the client is received by the data communication unit 310 (step S301); when the control unit 320 determines that the data is voice data, the data is transmitted to the voice processing execution unit 330.
- the voice processing execution unit 330 includes at least one of a recognition engine, a recognition dictionary, a synthesis engine, a synthesis dictionary, a speaker verification engine, and the like (none of which are shown) required for voice processing, and performs voice processing sequentially (step S302).
- the content of the voice processing varies depending on the type of data transmitted from the client 10. For example, if the transmitted data is compressed voice data, decompression of the compressed data, voice analysis, and matching are performed. On the other hand, when a feature vector is transmitted from the client 10, only the matching process is performed.
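The type-dependent processing described above amounts to a simple dispatch. The sketch below uses stand-in stage functions in place of a real codec, analysis front end, and matcher; every name here is an assumption for illustration.

```python
# Stand-ins for the real processing stages.
def decompress(data):      return f"pcm({data})"
def analyze(pcm):          return f"features({pcm})"
def match(features):       return f"result({features})"

def process(kind: str, payload: str) -> str:
    if kind == "compressed_voice":
        # Compressed voice data: decompress, analyze, then match.
        return match(analyze(decompress(payload)))
    elif kind == "feature_vector":
        # Feature vectors already computed on the client: match only.
        return match(payload)
    raise ValueError(f"unknown payload kind: {kind}")

print(process("compressed_voice", "chunk1"))  # result(features(pcm(chunk1)))
print(process("feature_vector", "fv1"))       # result(fv1)
```

Sending feature vectors from the client skips the decompression and analysis stages on the server, which is why only matching remains in that branch.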
- the voice processing result transmitted from the voice processing server 30 to the Web server 20 is at least one of recognition result information, speaker verification information, and voice (synthesized voice, voice obtained by converting input voice, and the like). Including one.
- at this time, the session ID is also transmitted from the voice processing server 30 to the Web server 20, for example, by being included in the voice processing result.
- Web server 20 receives the voice processing result and the session ID by data communication unit 210 (step S205), and transmits them to content management unit 220.
- the content management unit 220 creates, for each session ID, result information based on the voice processing result (for example, voice recognition result information; see screen 1003 in FIGS. 15 and 16 described later) or content information (screen, audio, video, etc.) in which the voice processing result is reflected (step S206).
- the result information and the content, or only the content, created for each session ID are downloaded from the Web server 20 to the client 10 that made the service request (step S207), and the client 10 receives the result information and content (step S106).
- the link destination URL of the button for acquiring the result of the screen downloaded from the web server 20 to the client 10 is a URL including the session ID.
- the content management unit 220 places the content information in which the result of the voice processing is reflected at a location on the Web server 20 represented by the URL including the session ID.
- when the result acquisition button of the client 10 (for example, the "display map" button on screen 1003 in FIG. 15) is clicked, the URL including the session ID is designated, and the content information corresponding to that URL (for example, the map screen of screen 1004 in FIG. 15) is downloaded.
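Placing the content at a location represented by a session-ID-bearing URL, and serving it when the result acquisition button is clicked, can be sketched as follows; the paths and content strings are illustrative assumptions.

```python
# Sketch: a content store keyed by a session-ID-bearing URL path.
content_store = {}

def publish_result(session_id: str, content: str) -> str:
    # The Web server places content reflecting the voice processing
    # result at a path that embeds the session ID.
    path = f"/result/{session_id}/map"
    content_store[path] = content
    return path

def handle_get(path: str) -> str:
    # Clicking the result acquisition button issues a GET for this path.
    return content_store[path]

path = publish_result("00001000", "<map screen>")
print(path)              # /result/00001000/map
print(handle_get(path))  # <map screen>
```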
- the Web server 20 can also use the voice processing result for various other processes, such as performing a search based on it.
- the processes of the client 10, the Web server 20, and the voice processing server 30 shown in FIG. 12 may be realized by programs executed on the computers constituting the client 10, the Web server 20, and the voice processing server 30.
- the Web server 20 and the voice processing server 30 may be realized on one computer, or may be realized by a remote computer.
- in this case, the transfer of the ID between the Web server 20 and the voice processing server 30 may be performed via an argument of a subroutine call or via a commonly referenced variable.
- the present embodiment can be applied to a system in which a client that makes a processing request to a server is mounted on the same computer as the server.
- the present invention can be applied to an arbitrary management system in which a plurality of servers cooperate and perform a client request.
- FIG. 9 is a diagram showing a configuration of the client 10 according to the second specific example of the present invention.
- referring to FIG. 9, the client 10 includes a data input unit 110 that functions as a voice input unit and inputs voice data, a screen display unit 140, a data communication unit 130, a control unit 120, and a unique ID holding/generation unit (unique identification information output means) 150.
- FIG. 10 is a diagram showing a configuration of the Web server 20.
- Web server 20 includes a data communication unit 210 and a content management unit 220.
- the voice processing server 30 has the configuration shown in FIG. 8, and includes a data communication unit 310, a control unit 320, and a voice processing execution unit 330.
- FIG. 13 is a diagram for explaining the sequence operation of this specific example. This specific example will be described with reference to FIG. 9, FIG. 10, FIG. 8, and FIG. 13.
- when requesting a service, the client 10 uses the unique ID holding/generation unit 150 to output the ID (terminal identification information) held in advance as a unique ID (unique identification information) and transmits it to the control unit 120 (step S111). Alternatively, the unique ID holding/generation unit 150 generates a client-unique ID from the ID held in advance and notifies the control unit 120 of the generated unique ID; for example, time stamp information may be added to the ID held in advance.
- the control unit 120 receives the service request and the ID, and transmits the received unique ID to the Web server 20 via the data communication unit 130 (Step S112).
- the Web server 20 receives, at the data communication unit 210, the service request signal for the service using voice processing together with the unique ID (step S211).
- Data communication section 210 transmits the service request signal and the unique ID to content management section 220.
- after checking the requested service, the content management unit 220 generates a screen (first information) to be downloaded to the client 10 based on the received unique ID (step S212).
- at this time, as in the first specific example, the unique ID may be included in the link destination URL (Uniform Resource Locator) information of the result acquisition button.
- the screen generated by the content management unit 220 is downloaded to the client 10 through the data communication unit 210 (step S213).
- the screen information received from the Web server 20 is received by the data communication unit 130 (step S113), and transmitted to the control unit 120.
- the screen information is transmitted from the control unit 120 to the screen display unit 140 and displayed (step S114).
- the voice uttered by the user is input to the data input unit 110 of the client 10 (step S115), and transmitted to the control unit 120.
- the control unit 120 performs the data processing described in the first specific example. At the time of this data processing, processing for including the unique ID in the voice data is performed.
- the processed data is sequentially transmitted from the data communication unit 130 to the voice processing server 30 (step S116).
- the process of including the unique ID in the voice data is the same as in the above specific example.
- in the voice processing server 30, the data sequentially transmitted from the client 10 is received by the data communication unit 310 (step S311); when the control unit 320 determines that the data is voice data, the data is transmitted to the voice processing execution unit 330.
- as in the first specific example, the voice processing execution unit 330 includes at least one of a recognition engine, a recognition dictionary, a synthesis engine, a synthesis dictionary, a speaker verification engine, and the like (none of which are shown) required for voice processing (voice recognition, speaker verification, etc.), and performs voice processing sequentially (step S312). After the voice processing ends, the voice processing result is transmitted from the voice processing execution unit 330 to the data communication unit 310 via the control unit 320, and is transmitted from the data communication unit 310 to the Web server 20 (step S313). At this time, the unique ID is also transmitted from the voice processing server 30 to the Web server 20.
- the transmission method is the same as in the above specific example.
- the Web server 20 receives the voice processing result and the unique ID transmitted from the voice processing server 30 by the data communication unit 210 (Step S214), and transmits the result to the content management unit 220.
- the content management unit 220 of the Web server 20 associates the unique ID with information reflecting the audio processing result (second information: the audio processing result information and the content information corresponding to the result, or only the content information corresponding to the result) and prepares it (step S215).
- the content management unit 220 of the Web server 20 can determine the client 10 to which the information reflecting the sound processing result is to be transmitted, based on the unique ID of the client.
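A minimal sketch of this per-ID bookkeeping on the Web server side follows; the class and method names are illustrative assumptions:

```python
class ContentManager:
    """Maps each unique ID to the requesting client and stores the voice
    processing result under that ID, so the information reflecting the
    result can be routed back to the correct client."""

    def __init__(self) -> None:
        self._clients: dict[str, str] = {}  # unique ID -> client address
        self._results: dict[str, str] = {}  # unique ID -> processing result

    def register_request(self, unique_id: str, client_addr: str) -> None:
        self._clients[unique_id] = client_addr

    def store_result(self, unique_id: str, result: str) -> None:
        # Called when the voice processing server reports (result, unique ID).
        self._results[unique_id] = result

    def destination_for(self, unique_id: str) -> str:
        # The client to which the information reflecting the result is sent.
        return self._clients[unique_id]

    def result_for(self, unique_id: str):
        return self._results.get(unique_id)
```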
- the Web server 20 sends, to the client 10 that made the service request, the result information (for example, the voice recognition result screen 1003 in FIG. 15) and the content (for example, the screen 1004 in FIG. 15) created for each unique ID.
- the client 10 receives the downloaded information (step S117), which is displayed on the screen of the client 10.
- the method of downloading the created content information is the same as the specific example described above.
- Each process of the client 10, the Web server 20, and the voice processing server 30 shown in FIG. 13 may be realized by a program executed on the computer that constitutes the client 10, the Web server 20, and the voice processing server 30, respectively.
- the voice processing server 30 includes a processing unit that generates a session ID.
- FIG. 11 is a diagram showing the configuration of the voice processing server 30. Referring to FIG. 11, the voice processing server 30 of this specific example differs from the voice processing server 30 shown in FIG. 8 in that a session ID generation unit 340 is added. Note that the client 10 of this specific example has the configuration shown in FIG. 6, and the Web server 20 has the configuration shown in FIG. 10. Hereinafter, the operation of this specific example will be described.
- FIG. 14 is a diagram for explaining the sequence operation of this specific example, which will be described with reference to FIGS. 6, 10, 11, and 14.
- the client 10 requests the Web server 20 for a service including voice processing (step S121).
- the Web server 20 receives the service request signal at the data communication unit 210 (step S221), and transmits the signal to the content management unit 220.
- the content management section 220 receives the service request signal, checks the service, generates a screen for the requested service (step S222), and transmits (downloads) the screen to the client 10 through the data communication section 210 (step S223).
- the client 10 receives the screen information from the Web server 20 (step S122), and further transmits a voice processing request signal to the voice processing server 30 (step S123).
- the data communication unit 310 receives the voice processing request signal (step S321) and transmits it to the control unit 320.
- the control unit 320 transmits the voice processing request signal to the session ID generation unit 340.
- the session ID generation unit 340 of the voice processing server 30 receives the request signal and generates a session ID.
- the generated session ID is transmitted from the session ID generation unit 340 to the data communication unit 310 via the control unit 320.
- the data communication unit 310 of the voice processing server 30 transmits the session ID to the client 10 (Step S322).
- the client 10 receives the session ID from the voice processing server 30 (step S124), and transmits the session ID to the control unit 120 via the data communication unit 130.
- the session ID is transmitted to Web server 20 via data communication unit 130 of client 10 (step S125).
- the session ID is received by the data communication unit 210 (step S224), and transmitted to the content management unit 220 for management.
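The session ID generation unit 340 only needs to issue an identifier that is unique per voice processing request; a counter combined with a random suffix is one sketch (the `sess-<n>-<hex>` format is purely an assumption for illustration):

```python
import itertools
import uuid

class SessionIdGenerator:
    """Sketch of a session ID generation unit: each voice processing
    request gets a fresh, unique identifier."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)  # monotonically increasing part

    def new_session_id(self) -> str:
        # Counter guarantees uniqueness; the random suffix makes IDs
        # hard to guess across server restarts.
        return f"sess-{next(self._counter)}-{uuid.uuid4().hex[:8]}"
```

In the sequence above, the voice processing server would return this ID to the client (step S322), which forwards it to the Web server (step S125) so both servers can correlate the same request.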
- the voice uttered by the user is input to the data input unit 110 (step S126), and transmitted to the control unit 120.
- the control unit 120 performs the same data processing as in the specific example described above.
- the session ID may be included in the audio data.
- the processed data is sequentially transmitted from the data communication unit 130 of the client 10 to the voice processing server 30 (step S127).
- the data sequentially transmitted from the client 10 is received by the data communication unit 310 (step S323); the control unit 320 determines that the data is voice data and transmits it to the voice processing execution unit 330.
- the speech processing execution unit 330 has at least one of the engines required for speech processing (speech recognition, speaker verification, and the like), such as a recognition engine, a recognition dictionary, a synthesis engine, a synthesis dictionary, and a speaker verification engine, and performs voice processing sequentially (step S324). After the voice processing ends, the voice processing result is transmitted from the voice processing execution unit 330 to the data communication unit 310 via the control unit 320, and from the data communication unit 310 to the Web server 20 (step S325). The result of the audio processing is the same as in the above specific example. At this point, the session ID is also transmitted from the voice processing server 30 to the Web server 20, in the same manner as in the specific example.
- Web server 20 receives the voice processing result and the session ID in data communication section 210 (step S225), and transmits them to content management section 220.
- the content management unit 220 of the Web server 20 creates, for each session ID, information reflecting the speech processing result corresponding to that session ID (the speech processing result information and the content information corresponding to the result, or only the content information corresponding to the result) (step S226).
- the Web server 20 transmits the result information (for example, the voice recognition result screen of screen 1003 in FIG. 15) and the content (for example, screen 1004 in FIG. 15) to the client that made the service request for each session ID.
- the client 10 receives the downloaded information from the Web server 20, or only the content (for example, the map screen 1004 in FIG. 15) is downloaded (step S226).
- at the start of the audio processing by the voice processing server 30, the client 10 is notified of the link destination URL of the result acquisition button on the screen downloaded to the client 10.
- the client 10 performs processing to make this URL include the session ID, and the Web server 20 places the content information reflecting the audio processing result at the URL containing the session ID.
- by clicking the result acquisition button on the client screen (for example, the "display map" button on the screen 1003 in FIG. 15), the content information reflecting the audio processing result can be downloaded to the client 10.
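Placing the result at a URL that contains the session ID can be sketched as follows; the `b.<ID>.html` path pattern follows the per-ID page naming used elsewhere in this document, and the exact scheme is an assumption:

```python
def result_url(base_url: str, session_id: str) -> str:
    """Build the per-session URL at which the Web server publishes the
    content information reflecting the voice processing result."""
    return f"{base_url}/b.{session_id}.html"
```

The client only needs to link its result acquisition button to this URL; no further coordination with the voice processing server is required to fetch the result.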
- Each process of the client 10, the Web server 20, and the voice processing server 30 shown in FIG. 14 may be realized by a program executed on the computer that constitutes the client 10, the Web server 20, and the voice processing server 30, respectively.
- FIG. 15 is a diagram illustrating an example of the transition of screens (pages) displayed on the screen display unit 140 of the client 10 in the first specific example of the present invention, whose sequence operation was described above.
- the screen display of the client 10 in the first specific example of the present invention will be described with reference to FIG. 15.
- the screen 1001 is a screen downloaded from the Web server 20 (the top page of "map search"), and the "voice input" button 1011 is linked to a CGI (for example, http://….jp/a.cgi).
- the user makes a service request by clicking the “voice input” button 1011 displayed on the screen (corresponding to step S101 in FIG. 12).
- a process called "a.cgi" is started, and the input information is passed to it.
- the Web server 20 creates HTML and returns it to the client 10 as a response.
- the "voice input" screen 1002 appears, prompting the user, for example, "Please say the address of the map you want to search for, such as 'Mita, Minato-ku, Tokyo'" (corresponding to steps S102 to S104 in FIG. 12). The ID is embedded as a tag in this screen. In the state of this screen 1002, the user performs voice input (utterance).
- the "display result" button 1012 on the screen is linked to the page (http://…/b.ID.html) generated for each ID.
- the recognition result recognized by the voice processing server 30 is displayed as on the next screen 1003. Note that the recognition result screen 1003 displays content downloaded from the Web server 20 to the client 10.
- when the user clicks the "display map" button 1013 on the screen, the Web server 20 also downloads the content information (corresponding to step S106 in FIG. 12), and the map screen (page) 1004 is displayed.
- the screen 1004 may be displayed directly as a result of the screen 1002, without displaying the recognition result screen 1003. That is, since the screen 1003 of the voice recognition result by the voice processing server 30 is created for each ID, a configuration may be adopted in which clicking the "display result" button 1012 of the screen 1002 directly displays the screen 1004 reflecting the voice recognition result (in this case, the screen 1003 in FIG. 15 is omitted).
- FIG. 15 and FIG. 16 described below show an example of a screen of a map guidance system by voice input.
- the present invention is not limited to such a system and can be applied to any management of utterances.
- FIG. 16 is a diagram showing a modification of FIG. 15.
- in this modification, the "display result" button 1012 of the screen 1002 in FIG. 15 is not displayed.
- as shown by the screen 1002a in FIG. 16, the recognition result screen 1003 is displayed without the user clicking a "display result" button.
- Clicking the "Show Map” button 1013 displays the map on screen 1004.
- alternatively, the map of the screen 1004 may be directly displayed as a result of the voice input on the screen 1002a, without displaying the screen 1003.
- the Web server 20 transmits the URL information of the screen to the client 10, and the client 10 automatically accesses the received URL, whereby the screens 1003 and 1004 shown in FIGS. 15 and 16 are displayed.
- a "re-input voice" button may be created on the screen 1004 in FIG. 15 or FIG. 16.
- when the user clicks the "re-input voice" button on the screen 1004, a new ID is created.
- the screen 1002 in FIG. 15 or the screen 1002a in FIG. 16 is then displayed, and voice input can be performed again.
- a "TOP page" button may also be created on the screen 1004 in FIG. 15 or FIG. 16.
- the present invention can be applied to a service providing system in which a screen is displayed on a client, a request is made by voice, and the result is displayed on the screen.
- since data can be transmitted and received through a packet network, a personal digital assistant (PDA), a PC, an in-vehicle terminal, a home terminal, or the like can be used as the client.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Telephonic Communication Services (AREA)
- Computer And Data Communications (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Description
Claims
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006511243A JP4725512B2 (ja) | 2004-03-18 | 2005-03-18 | 音声処理システム、音声処理方法、音声処理サーバ装置、およびプログラム |
US10/593,041 US7835728B2 (en) | 2004-03-18 | 2005-03-18 | Voice processing unit and system, and voice processing method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004079078 | 2004-03-18 | ||
JP2004-079078 | 2004-03-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005091128A1 true WO2005091128A1 (ja) | 2005-09-29 |
Family
ID=34993882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/004959 WO2005091128A1 (ja) | 2004-03-18 | 2005-03-18 | 音声処理装置とシステム及び音声処理方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US7835728B2 (ja) |
JP (1) | JP4725512B2 (ja) |
WO (1) | WO2005091128A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010527467A (ja) * | 2007-04-02 | 2010-08-12 | グーグル・インコーポレーテッド | 電話による要求への位置を基にした応答 |
JP2017017669A (ja) * | 2015-06-30 | 2017-01-19 | 百度在線網絡技術(北京)有限公司 | 声紋による通信方法、装置及びシステム |
CN113542260A (zh) * | 2021-07-12 | 2021-10-22 | 宏图智能物流股份有限公司 | 一种基于分发方式的仓库用语音传输方法 |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3885523B2 (ja) * | 2001-06-20 | 2007-02-21 | 日本電気株式会社 | サーバ・クライアント型音声認識装置及び方法 |
JP2008287674A (ja) * | 2007-05-21 | 2008-11-27 | Olympus Corp | 情報処理装置、クライアント装置、情報処理システム及びサービス接続方法 |
US10354689B2 (en) * | 2008-04-06 | 2019-07-16 | Taser International, Inc. | Systems and methods for event recorder logging |
CN103871410B (zh) * | 2012-12-11 | 2017-09-29 | 联想(北京)有限公司 | 一种数据处理方法和装置 |
US11172293B2 (en) * | 2018-07-11 | 2021-11-09 | Ambiq Micro, Inc. | Power efficient context-based audio processing |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000040051A (ja) * | 1998-07-23 | 2000-02-08 | Toyo Commun Equip Co Ltd | クライアント・サーバーシステムにおけるメッセージ伝送方法及び装置 |
JP2002359688A (ja) * | 2001-03-30 | 2002-12-13 | Ntt Comware Corp | 音声認識による情報提供サーバならびにその方法 |
JP2003125109A (ja) * | 2001-10-18 | 2003-04-25 | Hitachi Software Eng Co Ltd | 音声入力サービス提供方法及びシステム |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5717740A (en) * | 1995-12-27 | 1998-02-10 | Lucent Technologies Inc. | Telephone station account number dialing device and method |
US5915001A (en) * | 1996-11-14 | 1999-06-22 | Vois Corporation | System and method for providing and using universally accessible voice and speech data files |
US6636596B1 (en) * | 1999-09-24 | 2003-10-21 | Worldcom, Inc. | Method of and system for providing intelligent network control services in IP telephony |
JP3452250B2 (ja) | 2000-03-15 | 2003-09-29 | 日本電気株式会社 | 無線携帯端末通信システム |
US6654722B1 (en) * | 2000-06-19 | 2003-11-25 | International Business Machines Corporation | Voice over IP protocol based speech system |
JP3885523B2 (ja) | 2001-06-20 | 2007-02-21 | 日本電気株式会社 | サーバ・クライアント型音声認識装置及び方法 |
-
2005
- 2005-03-18 WO PCT/JP2005/004959 patent/WO2005091128A1/ja active Application Filing
- 2005-03-18 JP JP2006511243A patent/JP4725512B2/ja active Active
- 2005-03-18 US US10/593,041 patent/US7835728B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000040051A (ja) * | 1998-07-23 | 2000-02-08 | Toyo Commun Equip Co Ltd | クライアント・サーバーシステムにおけるメッセージ伝送方法及び装置 |
JP2002359688A (ja) * | 2001-03-30 | 2002-12-13 | Ntt Comware Corp | 音声認識による情報提供サーバならびにその方法 |
JP2003125109A (ja) * | 2001-10-18 | 2003-04-25 | Hitachi Software Eng Co Ltd | 音声入力サービス提供方法及びシステム |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010527467A (ja) * | 2007-04-02 | 2010-08-12 | グーグル・インコーポレーテッド | 電話による要求への位置を基にした応答 |
US8650030B2 (en) | 2007-04-02 | 2014-02-11 | Google Inc. | Location based responses to telephone requests |
US8856005B2 (en) | 2007-04-02 | 2014-10-07 | Google Inc. | Location based responses to telephone requests |
US9600229B2 (en) | 2007-04-02 | 2017-03-21 | Google Inc. | Location based responses to telephone requests |
US9858928B2 (en) | 2007-04-02 | 2018-01-02 | Google Inc. | Location-based responses to telephone requests |
US10163441B2 (en) | 2007-04-02 | 2018-12-25 | Google Llc | Location-based responses to telephone requests |
JP2017017669A (ja) * | 2015-06-30 | 2017-01-19 | 百度在線網絡技術(北京)有限公司 | 声紋による通信方法、装置及びシステム |
US9865267B2 (en) | 2015-06-30 | 2018-01-09 | Baidu Online Network Technology (Beijing) Co., Ltd. | Communication method, apparatus and system based on voiceprint |
CN113542260A (zh) * | 2021-07-12 | 2021-10-22 | 宏图智能物流股份有限公司 | 一种基于分发方式的仓库用语音传输方法 |
Also Published As
Publication number | Publication date |
---|---|
JPWO2005091128A1 (ja) | 2008-05-22 |
JP4725512B2 (ja) | 2011-07-13 |
US20070143102A1 (en) | 2007-06-21 |
US7835728B2 (en) | 2010-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR100430953B1 (ko) | 네트워크 협동 대화 서비스를 제공하기 위한 시스템 및 방법 | |
KR101027548B1 (ko) | 통신 시스템용 보이스 브라우저 다이얼로그 인에이블러 | |
JP4725512B2 (ja) | 音声処理システム、音声処理方法、音声処理サーバ装置、およびプログラム | |
RU2491617C2 (ru) | Способ и устройство для реализации распределенных мультимодальных приложений | |
US20120059655A1 (en) | Methods and apparatus for providing input to a speech-enabled application program | |
US20030139933A1 (en) | Use of local voice input and remote voice processing to control a local visual display | |
CN108028044A (zh) | 使用多个识别器减少延时的语音识别*** | |
JP2017535852A (ja) | コンピュータベースの翻訳システムおよび方法 | |
US7277733B2 (en) | System and method for providing web content provision service using subscriber terminal in exchange system | |
US6631350B1 (en) | Device-independent speech audio system for linking a speech driven application to specific audio input and output devices | |
US20090012888A1 (en) | Text-to-speech streaming via a network | |
JP5768346B2 (ja) | 通信システム、並びに、通信端末及び通信プログラム | |
CN112084245A (zh) | 基于微服务架构的数据管理方法、装置、设备及存储介质 | |
KR101403680B1 (ko) | 주소록 관리 서버, 이동형 단말기 및 그의 제어방법 | |
US8073930B2 (en) | Screen reader remote access system | |
JP7319639B1 (ja) | 音声入力システム及びそのプログラム | |
KR100536911B1 (ko) | 인터넷 전화 서비스 제공 시스템 및 방법 | |
JP2011109342A (ja) | 携帯端末装置及び携帯端末装置の存在通知方法 | |
KR100785101B1 (ko) | 무선 인터넷 단말기에서의 전화번호 정보 처리방법 | |
JPH11234451A (ja) | 情報取得システム | |
JP2005339149A (ja) | データ処理装置、データ処理方法およびデータ処理プログラム | |
JP2006508596A (ja) | ネットワークのオーディオデータを処理する方法およびその方法を実行する装置 | |
Tsourakis et al. | An architecture for multimodal applications over wireless data networks | |
KR20020084337A (ko) | 웹브라우저의 url입력창을 이용한 통신 시스템 및 방법 | |
Bühler et al. | CONNECTING SPOKEN LANGUAGE DIALOGUE SYSTEMS TO THE INTERNET |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2006511243 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2007143102 Country of ref document: US Ref document number: 10593041 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: DE |
|
122 | Ep: pct application non-entry in european phase | ||
WWP | Wipo information: published in national office |
Ref document number: 10593041 Country of ref document: US |