JP2003532164A - How to control the processing of content information

Info

Publication number: JP2003532164A
Application number: JP2001581272A
Authority: JP
Inventors: ジェイエルエイスウィレンズ，ピーター; ミドルジャンス，ジャコブス; アルバーダ，オケ; スタインビス，ヴォルカー
Original assignee: Philips Electronics NV
Current assignee: Koninklijke Philips NV
Priority date: 2000-05-03
Filing date: 2001-04-26
Publication date: 2003-10-28
Also published as: EP1281173A1; WO2001084539A1; KR20020027382A; CN1381039A; CN1193343C

Abstract

(57) [Abstract] Voice control of the playback, or other processing, of video or audio content information uses voice commands that are semantically related to the content information.

Description

Detailed Description of the Invention

[0001] The invention relates in particular to voice control for the playback of content information by consumer electronics (CE) equipment.

[0002] Voice-controlled equipment is known from, for example, US Pat. No. 4,506,377, US Pat. No. 4,558,459, US Pat. No. 4,856,072, US Pat. No. 5,255,326 and US Pat. No. 5,950,166, all of which are incorporated herein by reference. In particular, US Pat. No. 5,255,326 relates to an interactive audio system that uses a sound signal processor coupled to a microprocessor as an interactive audio control system. A pair of transceivers, operating as stereo speakers and as receiving microphones, is coupled to the signal processor to receive voice commands from a primary user. The voice commands are processed to operate various kinds of devices, such as a television, tape deck, radio or CD player, so that they supply signals to the processor; the processor in turn supplies signals to the transceiver speakers to generate the desired sound. Additional infrared sensors may be used to continually triangulate the position of the primary listener and to send signals back through the transceiver system to the processor, which continually adjusts the sound balance so as to keep the "sweet spot" of the sound focused on the primary listener. Additional devices may be controlled by the signal processor in response to voice commands that are matched against stored commands, producing an output from the signal processor that operates those devices in accordance with the spoken command. The system is capable of responding to voice commands while simultaneously reproducing stereo sound from any one of the sound sources operated by the system.

[0003] Speech recognition is a technology whose aspects are described in, for example, US Pat. No. 5,987,409, US Pat. No. 5,946,655, US Pat. No. 5,613,034, US Pat. No. 5,228,110 and US Pat. No. 5,955,930, all of which are incorporated herein by reference.

[0004] Known speech control and voice control of devices or applications are limited to a fixed set of commands tied to the equipment. The inventors have recognized that the user-friendliness of voice-controllable equipment, and its ergonomics in operation, improve when the voice command, or some of the voice commands, are linked to the information content to be played rather than to the device or platform. In other words, the inventors consider the control of consumer electronics to be content-centric rather than device-centric.

[0005] Accordingly, in one aspect of the invention, it is proposed to integrate voice commands with the content information on, or of, a data carrier such as a CD, a DVD or a solid-state memory. The commands are preferably adapted to the semantics of the content information. For example, if the content information comprises audio, such as a collection of songs, one or more particular songs can be selected by speaking the title or part of the lyrics. To enable this feature, special metadata is added to the content of the CD. This metadata is typically, but not necessarily, a representation of the vocabulary required by the voice controller or application of the device in order to enable voice control for the particular CD and its tunes. Alternatively, or in addition, the user hums or sings (or attempts to sing) part of the desired tune in order to select it for playback. In this context, reference is made to US Pat. No. 5,963,957 (attorney docket PHA 23,241), issued 10/5/99 to Mark Hoffberg for BIBLIOGRAPHIC MUSIC DATA BASE WITH NORMALIZED MUSICAL THEMES and incorporated herein by reference. That patent relates to an information processing system that has a music database. The music database stores homophonic reference sequences of music notes. The reference sequences are all normalized to the same scale degree so that they can be stored lexicographically. Upon finding a match between a string of input notes and a particular reference sequence through an N-ary query, the system provides bibliographic information associated with the matching reference sequence. This system can also be used to convert input hummed by the user, via the N-ary query, into a playback command.
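The hum-to-play lookup described above can be illustrated with a minimal sketch. It assumes that melodies are given as MIDI-style note numbers, that normalization simply transposes every melody so that it starts at 0, and that a binary search over the sorted themes stands in for the N-ary query; the note values and the function names `normalize` and `find_track` are hypothetical and are not taken from the cited patent.

```python
from bisect import bisect_left

def normalize(notes):
    """Transpose a melody so that its first note becomes 0."""
    return tuple(n - notes[0] for n in notes)

# Hypothetical catalogue: opening notes (MIDI-style numbers) of each track.
CATALOGUE = {
    "Mustang Danny": [64, 64, 67, 69, 67],
    "Sweet Dommel Valley": [60, 62, 64, 60, 67],
    "Nat the Lab": [57, 60, 57, 55, 57],
}

# Normalized themes kept in lexicographic order so they can be bisected.
INDEX = sorted((normalize(notes), title) for title, notes in CATALOGUE.items())
KEYS = [theme for theme, _ in INDEX]

def find_track(hummed):
    """Return the title whose normalized theme starts with the hummed notes."""
    if not hummed:
        return None
    query = normalize(hummed)
    pos = bisect_left(KEYS, query)  # first stored theme that is >= the query
    if pos < len(KEYS) and KEYS[pos][:len(query)] == query:
        return INDEX[pos][1]
    return None
```

Because every theme is transposed to the same starting point, a user who hums in a different key still lands on the same entry; a real system would additionally have to tolerate wrong or missing notes, which this sketch does not attempt.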

[0006] Without further measures, the audio output of the system could cause undesired triggering of the voice-controlled processing, for example while a song is being played. Such undesired triggering is prevented, for example, through echo cancellation; by pressing an activation button on a remote control, such as the Pronto (TM), a universal programmable remote from Philips Electronics, in order to activate voice command reception; or by having the equipment register the user making a specific gesture. If the content information comprises video, key scenes are labeled with keywords, and speaking these words starts playback at the beginning of the relevant scene. A keyword profile of the video content may be used to identify a scene either through a one-to-one mapping of the user's voice input onto the keywords, or through a semantic mapping of the user's voice input onto an indexed list of the keyword labels of the content and their synonyms. Unintended activation is preferably prevented, for example by using a certain fixed command or part of one, such as a prefix. Similarly, interactive software applications that use graphics, for example virtual reality or video games, are made voice controllable by enabling the processing to associate voice input with controllable features of the graphics objects that are displayed or are to be displayed. For example, an action to be performed by a graphics object, such as an avatar, is made voice controllable or voice selectable by having the user speak the right word that fits the semantic context. This is suitable for video games that allow multiple modalities (for example, both hand input through a joystick and voice input), as well as for educational programs that teach another language, or that teach children the right word or expression for a certain concept such as a tangible object or an action. The speech is converted into data that is processed to identify the intended action. This is achieved, for example, by semantically matching the speech data against the items in a predetermined look-up table and finding the candidate for the closest match. The association between voice input and intended action is made trainable by taking the user's history into account.
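The two mapping strategies for a keyword profile, together with the prefix guard, can be sketched as follows. This is only an illustration under stated assumptions: the scene labels, start times, synonym sets, the prefix word "player" and the function name `scene_start` are all hypothetical, and the utterance is assumed to have been converted to text by a recognizer already.

```python
# Hypothetical keyword profile of one video:
# keyword label -> (scene start in seconds, set of synonyms).
SCENES = {
    "car chase": (1520, {"chase", "pursuit"}),
    "wedding": (3010, {"marriage", "ceremony"}),
}
PREFIX = "player"  # assumed fixed prefix guarding against accidental activation

def scene_start(utterance):
    """Return the start time of the scene named in the utterance, or None."""
    words = utterance.lower().split()
    if not words or words[0] != PREFIX:
        return None  # no prefix: ignore, e.g. speech coming from the soundtrack
    phrase = " ".join(words[1:])
    if phrase in SCENES:  # one-to-one mapping onto a keyword label
        return SCENES[phrase][0]
    for start, synonyms in SCENES.values():  # semantic mapping via synonyms
        if phrase in synonyms:
            return start
    return None
```

The exact-label lookup is tried first so that a label always wins over a synonym of another scene; the synonym pass is a deliberately simple stand-in for whatever semantic matching a real system would use.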

[0007] In another aspect of the invention, the voice commands are derived from the content when the content is stored locally after being downloaded from the Web and/or played out. For example, keywords in the lyrics of a song are identified and stored in association with the piece of audio to which they relate. This can be done by a dedicated software application, which analyzes either the digital data or the audible lyrics during a first playback of the audio content, for example by separating the vocal part from the instrumental part and analyzing the vocal part. The voice commands thus created can be used in addition to, or instead of, the basic set that comes with the particular content.

[0008] In another aspect of the invention, the user can download from the Web pre-existing or customized commands that relate to specific content information and that are to be stored on the user's equipment as semantically associated with that content, for the purpose of enabling voice control. The user can thus build his or her own library of electronic content information that is fully voice driven and can be regarded as a resource for the home network. For example, the user has a collection of CDs and DVDs in his or her own jukebox and/or on hard disk. If the content concerns generally available audio and video, a service provider can create a library of annotations for each piece of content in advance, and the user can download the elements relevant to his or her own collection. The annotations for a CD or DVD can be tied to the identifier of the disc and to its segments. When the user speaks, for example, the name of an album, the album name is linked to an identifier that enables retrieval and selection of the CD or DVD in the jukebox. The name of a song or scene can be linked both to the identifier of the CD or DVD and to the relevant key frame. The user speaks terms such as "movie" and "car chase", and obtains the available movies that have scenes relating to car chases.

[0009] In another aspect of the invention, the voice commands are linked to content represented in an electronic program guide (EPG), for example content broadcast by a service provider. Here too, the speech interface enables the user to select a specific program or program category that matches the words spoken.

[0010] In another aspect of the invention, the commands spoken by the user are processed via a server, for example a home server or a server on the Web, and are sent back as instructions to the Web-enabled playback equipment. The server has an inventory of the available content and a dictionary of words representing the semantics of the content. The Web-enabled equipment identifies the content to the server, for example through the identification code of the CD or DVD or through the header of a file, whereupon a voice command for this content is easily matched to an instruction for control, for example through a look-up table.
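A minimal sketch of the server-side matching might look like this, assuming the playback equipment sends a content identifier together with the already recognized phrase. The identifiers, the instruction tuples and the function name `resolve` are hypothetical; the source only states that matching can be done through a look-up table.

```python
# Hypothetical server inventory:
# content identifier -> {spoken phrase: instruction for the playback device}.
INVENTORY = {
    "CD-0001": {
        "mustang danny": ("play_track", 3),
        "leaking oil": ("play_track", 7),
    },
    "DVD-0042": {
        "car chase": ("play_from", 1520),
    },
}

def resolve(content_id, spoken):
    """Map a recognized phrase to an instruction for the identified content."""
    table = INVENTORY.get(content_id, {})  # unknown content: empty table
    return table.get(spoken.strip().lower())  # unknown phrase: None
```

Keying the table by content identifier first keeps each vocabulary small, so the same phrase can mean different things for different discs without ambiguity.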

[0011] Voice control enables content information to be selected, for example for playback, for storage, or for fast-forwarding until stopped. In addition, content that has been bookmarked with keywords in advance can be browsed under voice control in order to retrieve the excerpts that match the voice input at the keyword level.

[0012] Another aspect of the invention concerns copying content information from one storage medium, for example a CD or DVD, to another storage medium. The first storage medium holds the content information together with the control information that enables voice control as described above. The information for voice control is preferably copy protected, so that the copy does not contain the control commands. This is regarded as a feature that supports the content information industry. If a consumer wants a complete copy of the voice-controlled version, the consumer can download the voice control information, at a price, from a server on the Internet, the information being identified by its link to the CD number or DVD number. This has the advantage that the author's rights are acknowledged, even if the price is merely symbolic. The feature therefore helps to maintain awareness that content information is the intellectual property of the author or the author's assignee.

[0013] US application Ser. No. 09/345,339 (attorney docket PHA 23,700), filed 7/1/99 for Mark Hoffberg and Eugene Shetyn for CONTENT-DRIVEN SPEECH- OR AUDIO-BROWSER, is incorporated herein by reference. That patent document relates to searching the Internet to find resources that provide streaming audio, such as live Internet broadcasts. The resources are identified on the basis of their respective file extensions and are categorized according to, for example, natural language or style of music. The user can browse the collection on the basis of text or music input.

[0014] The expression "voice command" as used herein is meant to indicate a voice control input that consists of one or more keywords, but that may also contain more verbose linguistic expressions.

[0015] The invention is explained in further detail, by way of example, with reference to the accompanying drawing.

[0016] The invention enables voice control of a device or software application that uses content pre-recorded on a storage medium. The voice commands used are semantically related to, associated with, or based on the content stored on the storage medium. The commands therefore differ from one sample of content to the next. For example, the commands available for a CD with tunes by composer or songwriter X differ from the commands for a CD with tunes written by composer or songwriter Y.

[0017] For a CD player, operation is as follows. The user inserts a CD by the performer Daan van Schooneveld into the player. The CD stores the tunes together with the software that lets the user interact with the CD through voice control. When the user says "Mustang Danny", the player starts playing the rock song of that title, which is one of the tracks on the Schooneveld CD. When the user says "leaking oil", the player starts playing the blues song whose lyrics include "I wept gently in the rain as the gearbox was still leaking oil", and so on. A similar control scenario applies to voice control of a set-top box or of another device with a CD drive. A user-programmable delay between voice commands may be needed to separate the commands per song. Alternatively, a specific expression can be used as a delimiter between the commands per song. For example, the user may say "Play Mustang Danny twice and Leaking oil once". This is interpreted as playing the song "Mustang Danny" twice in succession, and then playing the song associated with "leaking oil" once. The expressions "twice" and "once" serve as delimiters that identify each song and what the system is to do with it before the system is ready to receive another voice command.
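The delimiter-based interpretation described above can be sketched as a small parser. It assumes the utterance has already been transcribed to text and that only the delimiter words "once" and "twice" occur; the table `COUNTS` and the function name `parse_playlist` are hypothetical.

```python
import re

COUNTS = {"once": 1, "twice": 2}  # assumed delimiter words and repeat counts

def parse_playlist(utterance):
    """Split an utterance into (title, repeat count) pairs at each delimiter."""
    # Drop a leading "play", if present.
    text = re.sub(r"^\s*play\s+", "", utterance.strip(), flags=re.IGNORECASE)
    pairs = []
    # Each match is the shortest run of text that ends in a delimiter word.
    for title, count in re.findall(r"(.+?)\s+(once|twice)\b", text,
                                   flags=re.IGNORECASE):
        # Remove the connecting "and" that may precede a later title.
        title = re.sub(r"^\s*(and\s+)?", "", title, flags=re.IGNORECASE)
        pairs.append((title.strip(), COUNTS[count.lower()]))
    return pairs
```

For the example in the text, `parse_playlist("Play Mustang Danny twice and Leaking oil once")` yields `[("Mustang Danny", 2), ("Leaking oil", 1)]`; each title would then be resolved against the commands stored with the CD.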

[0018] Voice control of a jukebox application on a PC works as follows. A jukebox application is a software application that allows CD content to be archived on the hard disk drive (HDD) of a PC. The user archives the "Greatest Hits" CD by Jos Swillens on the HDD. When the user says "Swil, Beemer", the jukebox starts playing "My Beemer fits my crewcut", which is one of the tracks of the Swillens CD archived on the PC. A voice command may contain more verbose linguistic expressions than keywords alone. For example, when the user says "play from Swillens' greatest hits the title about the crewcut", the system processes the voice input to match it with one of the available options, for example by using a suitable search algorithm on an indexed list. When the user says "Swil, always be nice to your patent attorney", the jukebox starts playing the symphonic classic "Always be nice etc.".
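One way to match a verbose utterance against an indexed list is to score each archived track by the words it shares with the utterance. The sketch below assumes a transcribed utterance, a hand-picked stop-word list and a plain token-overlap score; these, and the names `tokens` and `best_match`, are illustrative assumptions rather than the search algorithm of the source, which is left unspecified.

```python
# Hypothetical index of archived tracks: (artist, album, title).
TRACKS = [
    ("Swillens", "Greatest Hits", "My Beemer fits my crewcut"),
    ("Swillens", "Greatest Hits", "Always be nice to your patent attorney"),
    ("Middeljans", "Greatest Hits", "Sweet Dommel Valley"),
]
STOPWORDS = {"play", "from", "the", "title", "about", "my", "to", "your"}

def tokens(text):
    """Lower-case, strip surrounding punctuation, drop filler words."""
    words = (w.strip(",.'\"").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}

def best_match(utterance):
    """Return the track sharing the most tokens with the utterance, or None."""
    spoken = tokens(utterance)
    scored = [(len(spoken & tokens(" ".join(track))), track) for track in TRACKS]
    score, track = max(scored, key=lambda pair: pair[0])
    return track if score > 0 else None
```

Exact token overlap does not resolve an abbreviation such as "Swil" on its own; in this sketch "Swil, Beemer" still selects the right track because "Beemer" is unique in the index, whereas a real system would need prefix or phonetic matching.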

[0019] The user also archives the "Greatest Hits" CD by Koos Middeljans on the PC. When the user says "Koos, Sweet Dommel Valley", the jukebox starts playing the folk song of that title, which is one of the tracks of the archived CD. When the user says "Koos, Nat the Lab", the jukebox starts playing "Nat the Lab", another track of the Middeljans "Greatest Hits" CD archived on the PC. When the user says "Middeljans, greatest hits, random", the jukebox starts playing the tracks of this CD in random order.

[0020] Content protection with regard to copyright is a delicate issue. Copy protection measures, for example DRM (digital rights management), are available and are being implemented. To contribute to this, the voice commands supplied together with the semantically associated content information of a CD or DVD can be implemented in such a way that they cannot be copied anywhere other than to the onboard memory of the player. Any copy made elsewhere loses this feature and is therefore less desirable.

[0021] In another example, the user downloads content via the Internet together with the semantically associated control data that enables voice-controlled selection and playback in the same manner as described for the jukebox. In this example, the control data is preferably an integral part of the downloaded data.

[0022] For background on jukebox technology, reference is made to US application Ser. No. 09/326,506 (attorney docket PHA 23,417), filed 6/4/99 for Pieter van der Meulen for VIRTUAL JUKEBOX and incorporated herein by reference.

[0023] The same content information can be tied to phonetically different sets of voice commands, for example to allow for differences in language and in pronunciation between geographic regions and so facilitate voice recognition. In this context, the user preferably has a choice of the language that he or she wishes to use for voice control of the system. The storage capacity of the storage medium may be too small to hold the commands in all the languages that are likely to be used. If the voice commands are not available from the medium in one of the languages likely to be used, the playback device is preferably able to download equivalent voice commands in the desired language, the system then translating the commands into the corresponding instructions at run time. A dedicated service could be made available on the Internet. In this context, reference is made to US application Ser. No. 09/160,490 (attorney docket PHA 23,500), filed 9/25/98 for Adrian Turner et al. for CUSTOMIZED UPGRADING OF INTERNET-ENABLED DEVICES BASED ON USER-PROFILE (Smart Connect (TM)), and to US application Ser. No. 09/519,546 (attorney docket US 000014), filed 3/6/00 for Erik Ekkel et al. for PERSONALIZING CE EQUIPMENT CONFIGURATION AT SERVER VIA WEB-ENABLED DEVICE, both incorporated herein by reference. These documents describe services provided to CE end users via the Internet.
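The language fallback described above can be sketched as a two-step lookup: use the command set on the medium if the chosen language is present, otherwise download it once and keep it in local memory. The dictionaries, the phrases, and the functions `fetch_remote` and `commands_for` are hypothetical stand-ins; no real download service or API is implied.

```python
# Hypothetical per-language command sets stored on the medium:
# language code -> {spoken phrase: instruction}.
ON_DISC = {
    "en": {"leaking oil": ("play_track", 7)},
    "nl": {"lekkende olie": ("play_track", 7)},
}

def fetch_remote(language):
    """Stand-in for a download from an Internet service (assumed, not a real API)."""
    remote = {"de": {"auslaufendes oel": ("play_track", 7)}}
    return remote.get(language)

def commands_for(language, cache):
    """Prefer the set on the medium; otherwise download once and cache locally."""
    if language in ON_DISC:
        return ON_DISC[language]
    if language not in cache:
        cache[language] = fetch_remote(language) or {}
    return cache[language]
```

Every language maps its own phrases onto the same instruction, so the playback logic is independent of the language the user chooses.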

[0024] It is expected that in the future more and more audio and video content will be supplied to end users via the Internet. Recordings are then made at home under secure conditions. Local recording preferably enables consumers to create their own set of commands that is semantically associated with the particular content information. This preferably requires some editing and a dedicated graphical user interface (GUI) that assists the user in establishing the relationships between content segments, voice input commands, and actions or desired processing. For example, if the content information is not annotated, the user has to specify which segments are to be controllable as separate items, which voice commands are to be used, and which action is to be performed on which segment under which command. Once created, the set of commands can be stored together with the particular content in the same file, or it can be linked to the particular content by means of a unique identifier.

[0025] In a more advanced system, the phonetic transcriptions either cover all relevant forms, independently of the phoneme inventory, or are restricted to, for example, a subset of the vocabulary or only the exceptions to standard pronunciation. The same applies, mutatis mutandis, to optional acoustic models (acoustic references). A language model can optionally be used as well; it contains a description of how people typically interact with the system and utter sentences, given by way of example sentences, patterns or phrases, (stochastic) finite-state grammars, (stochastic) context-free grammars, or other kinds of grammars. The language model may contain only the modifications with respect to some standard way of communicating. For speech understanding, the system optionally comprises a description of which action is to be triggered by which word, command, phrase or expression, typically given via a grammar. The system may comprise a dialog model that describes how the system reacts to user input and how the system enters dialog mode. For example, under certain circumstances the system may ask for clarification or for reconfirmation of a command. The system may use relationships between the data that makes up the speech recognizer and other data. For example, the system has a display that shows what the user can say in order to play the current track.

[0026] The storage medium, for example a CD, a DVD or a solid-state (for example, flash) memory, preferably carries a bit pattern that is recognized during start-up and that confirms the availability of the voice command feature. The confirmation may be conveyed to the user, for example, through a pop-up screen on a display or through spoken, pre-recorded text supplied via a speaker.

[0027] As to the formatting of the voice control software on the medium, CD-DA has spare capacity in the R-W channels that can be used to add the voice command feature without losing backward compatibility of the CD. The lead-in track may not have sufficient storage for versions in different languages, but the data can be downloaded from the disc into local memory. In that case, each language needs to be present on the disc only once. CD-ROM, on the other hand, has a file structure that makes it simple to accommodate the voice control files on the disc as required. DVD likewise has a file structure and allows the same approach as CD-ROM. Flash memory, HDD and the like can be handled similarly.

[0028] FIG. 1 is a block diagram of a system 100 according to the invention. System 100 comprises a playback device 102 for playing content information 104 stored on a carrier 106. Carrier 106 comprises, for example, a CD, a DVD or a solid-state memory. Alternatively, carrier 106 comprises an HDD onto which content information 104 has been downloaded via the Internet or another data network. In these examples, content information 104 is stored in digital form. It will be clear to those skilled in the art that content information 104 may also be stored in analog form. Device 102 has a rendering subsystem 108 that makes content information 104 available to the end user. For example, if content information 104 comprises audio, subsystem 108 comprises one or more loudspeakers; if content information 104 comprises video information, subsystem 108 comprises a display monitor.

【００２９】本発明によると、担体１０６は、コンテンツ情報１０４と意味的に関連のある
制御情報１１０を有する。制御情報１１０は、マイクロホンを介するユーザによ
るボイス入力１１４が制御情報中の情報アイテムに適合するか否かを判断するこ
とをデータ処理サブシステム１１２に可能にさせる。適合する場合、関連する再
生モードが選択され、その例は上記の通りである。一方で、制御情報１１０と他
方でコンテンツ情報１０４との間の意味関係は、上記にオーディオコンテンツの
再生の例において説明したように非常に直感的な適合により装置１０２とのユー
ザ対話を容易にする。利用できるコンテンツ及び／又は選択されるモードに関し
て、ローカルディスプレイ、例えば、小型ＬＣＤ１１６を介してビジュアルフィ
ードバックが供給される。According to the present invention, the carrier 106 has control information 110 that is semantically related to the content information 104. The control information 110 enables the data processing subsystem 112 to determine whether a voice input 114, given by the user via a microphone, matches an information item in the control information. If it does, the associated playback mode is selected, examples of which are given above. The semantic relationship between the control information 110 on the one hand and the content information 104 on the other facilitates user interaction with the device 102 through very intuitive matching, as explained above in the examples of playing audio content. Visual feedback regarding the available content and/or the selected mode is provided via a local display, e.g., a small LCD 116.
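
The matching step described in this paragraph can be sketched as follows. This is only a minimal illustration, assuming a speech recognizer has already turned the voice input 114 into text. The names `ControlItem`, `match_voice_input`, and `CONTROL_INFO`, and the sample phrases and mode names, are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ControlItem:
    """One hypothetical information item of control information 110."""
    phrase: str         # words the user may speak
    playback_mode: str  # mode selected when the phrase matches


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so the comparison is forgiving."""
    return " ".join(text.lower().split())


def match_voice_input(recognized: str,
                      items: Sequence[ControlItem]) -> Optional[ControlItem]:
    """Return the first item whose phrase equals the recognized text, else None."""
    wanted = normalize(recognized)
    for item in items:
        if normalize(item.phrase) == wanted:
            return item
    return None


# Invented control information for one carrier (illustration only).
CONTROL_INFO = [
    ControlItem("example song title", "play_track_3"),
    ControlItem("play all", "play_album"),
]
```

Exact string comparison is used here purely for brevity; an actual device would more likely compare acoustic or phonetic representations.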

【００３０】担体１０６は、一つづつ装置１０２に挿入され得る構成要素でもよい。或いは
、装置１０２は、担体１０６のような多数の担体（図示せず）の中から、又は、
例えば、ＣＤ並びにソリッド・ステート・メモリのように物理的に異なる担体の
中からさえもコンテンツを選択することを可能にするジュークボックス機能１１
８を有する。The carrier 106 may be a component that is inserted into the device 102 one at a time. Alternatively, the device 102 has a jukebox function 118 that allows content to be selected from among a number of carriers (not shown) such as carrier 106, or even from among physically different carriers, for example a CD and a solid-state memory.

【００３１】本願では、制御情報１１０は、担体１０６にコンテンツ情報１０４と一緒に記
憶又は記録されて示される。従って、ＣＤ、ＤＶＤ又はフラッシュは、事前記録
されたボイス制御アプリケーション及びコマンドを有して供給され得る。或いは
、制御情報１１０は、ボイス入力１１４を制御情報１１０中の利用できる一つ以
上のアイテムと適合させるためにデータ処理システム１１２で実行される専用の
ソフトウェアアプリケーションと協働する。この構造では、ソフトウェアアプリ
ケーションは、インターネット又は装置１０２をセットアップするセットアップ
ディスクのような制御情報のチャネルとは異なるチャネルを介して供給される。In the present application, control information 110 is shown stored or recorded on carrier 106 along with content information 104. Thus, a CD, DVD or flash can be supplied with prerecorded voice control applications and commands. Alternatively, the control information 110 cooperates with a dedicated software application running on the data processing system 112 to match the voice input 114 with one or more available items in the control information 110. In this structure, the software application is provided via a channel different from the control information channel, such as the Internet or a setup disc that sets up the device 102.

【００３２】ボイス制御自体が公知であり、装置の動作モードを選択するために装置とユー
ザ対話することも公知である。本発明は、制御インタフェースを用いることに関
わり、その一部は、再生するのに利用できるコンテンツ情報と意味的に関連付け
られる。Voice control itself is known, and it is also known to interact with the device to select the mode of operation of the device. The present invention involves using a control interface, some of which is semantically associated with content information available for playback.

【００３３】本発明のシステムと好ましくは一体化されるオプションは次のものを含む。シ
ステム１００は、伝えたコマンドを入力したユーザに応答して可聴又はビジュア
ルフィードバックを供給する。例えば、システム１００は、適合がある場合に例
えば、コマンドワード又は幾つかのコマンドワードを事前記録されたボイスで繰
り返すことによって、又は、適合がある場合に事前記録されたボイスでワード「
確認済」を供給することによってコマンドの受け取りを確認する。この特徴は、
１つの情報コンテンツアイテム当たり比較的小数の所定コマンドで容易に実行さ
れ得る。確認データは、制御データ１１０内に統合され得る。ユーザによって与
えられるボイスコマンドが理解されない場合、即ち、システム１００がこれを認
識せず制御データ１１０中に適合を見つけられない場合、システム１００は負の
状態を示す可聴フィードバックを供給する。例えば、システム１００は、事前記
録されたボイスで「このコマンドを処理できません」「このアーティストを検索
できません」、又は「この曲を検索できません」或いは同様の意味のワードを供
給する。可聴フィードバックの代わりに、又は、それに加えて、システム１００
は、ビジュアルフィードバック、例えば、システム１００がボイス入力を処理で
きる場合には緑の点滅ライトを、処理できない場合には赤のライトを提供する。
同じ観点から、システム１００は、事前記録された、又は、合成ボイスでアーテ
ィスト名及び再生するよう選択されるコンテンツの曲名又はアルバム名を発音す
る。合成ボイスは、この特徴のためにテキスト音声合成エンジンを使用し、シス
テムは、ダウンロードされた又は媒体担体から利用できる情報を使用することが
できる。テキスト音声合成（ＴＴＳ）システムは、コンピュータ文書（例えば、
ワードプロセッサ文書、ウェブページ）からのワードをスピーカーを通じて可聴
音声に変換する。ＴＴＳシステムでは、ワードはキャリヤ文章のイントネーショ
ン等を有する音声書き換えと夫々一緒に記憶されることが好ましい。更に、オプ
ションとして、制御データ１１０は、ユーザに対してどのコマンド、例えば、ど
の歌のキーワードが利用できるかを説明する事前記録された又は合成ボイスデー
タを有する。事前記録された又は合成ボイスデータも制御データ１１０の一部と
なり得る。ユーザは、システムが可聴フォードバックを供給することを望まない
とき、これをオン又はオフにする。Options that are preferably integrated with the system of the present invention include the following. The system 100 provides audible or visual feedback in response to the user entering a spoken command. For example, the system 100 confirms receipt of the command by repeating the command word, or some of the command words, in a pre-recorded voice when there is a match, or by supplying the word "confirmed" in a pre-recorded voice when there is a match. This feature can be implemented easily with a relatively small number of predetermined commands per information content item. The confirmation data may be integrated within the control data 110. If the voice command given by the user is not understood, i.e., the system 100 does not recognize it and cannot find a match in the control data 110, the system 100 provides audible feedback indicating the negative status. For example, the system 100 supplies, in a pre-recorded voice, the words "Cannot process this command", "Cannot find this artist", or "Cannot find this song", or words of similar meaning. Instead of or in addition to audible feedback, the system 100 provides visual feedback, for example a blinking green light if the system 100 can process the voice input and a red light if it cannot. In the same vein, the system 100 pronounces, in a pre-recorded or synthesized voice, the artist name and the song or album title of the content selected for playback. The synthesized voice uses a text-to-speech engine for this feature, and the system can use information that has been downloaded or is available from the media carrier. A text-to-speech (TTS) system converts words from computer documents (e.g., word-processor documents, web pages) into audible speech through a speaker. In a TTS system, the words are preferably each stored together with a phonetic transcription, the intonation of the carrier sentence, and so on. Further, as an option, the control data 110 includes pre-recorded or synthesized voice data that explains to the user which commands, e.g., which song keywords, are available. The pre-recorded or synthesized voice data can also be part of the control data 110. The user can turn the audible feedback on or off, for example when the user does not want the system to provide it.
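
The feedback options in this paragraph could be combined along the following lines. This is a sketch under stated assumptions: the function name `build_feedback`, the `Light` enumeration, and the exact message strings are hypothetical, and the caller is assumed to play the returned message in a pre-recorded or synthesized voice.

```python
from enum import Enum
from typing import Optional, Tuple


class Light(Enum):
    """Visual feedback states mentioned in the description."""
    GREEN_BLINKING = "green blinking"
    RED = "red"


def build_feedback(matched_phrase: Optional[str],
                   audible_enabled: bool = True) -> Tuple[Optional[str], Light]:
    """Return (spoken message or None, light) for one recognition result.

    matched_phrase is the command found in the control data, or None
    when no match was found.
    """
    if matched_phrase is not None:
        message = f"Confirmed: {matched_phrase}"
        light = Light.GREEN_BLINKING
    else:
        message = "Cannot process this command"
        light = Light.RED
    # The user may switch audible feedback off; the light is still reported.
    return (message if audible_enabled else None, light)
```

Returning the light even when audible feedback is disabled is a design choice made for this sketch; the description only says that visual feedback may replace or accompany audible feedback.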

【００３４】図２は、ＥＰＧを有するシステム２００を示す図であり、このとき利用できる
コンテンツ情報が識別され、ディスプレイモニタ２０６上に行２０２及び列２０
４に配置される。例えば、各行は、夫々のＴＶチャネルを表示し、各列は特定の
タイムスロットを表示する。各特定行及び列の対、例えば、行２０８及び列２１
０の交差するところでは、特定のチャネルからの特定のタイムスロットにおいて
利用できるコンテンツを表示するラベル又はタイトル２１２が示される。例えば
、トピックカテゴリー及び時間によって他のタイプの配置も代わりに使用され得
、又は、１つのチャネル或いはリソース（例えば、インターネット）当たりのプ
ロファイル等によるユーザの好みによってランク付けされ得る。ユーザは、ウィ
ンドウ２１４の境界内にある、表示されるＥＰＧの一部分を得るために、例えば
、ウィンドウ２１４を適切なユーザインタフェース（例えば、ワイヤレスキーボ
ード又は別の指向性装置上の矢印、図示せず）を通じてＥＰＧのグリッド上を移
動させることでＥＰＧをブラウジングすることができる。その際ユーザは、表示
される部分中の関連のあるラベルをクリック又はハイライトすることによって特
定のコンテンツ情報を選択することができる。FIG. 2 is a diagram of a system 200 with an EPG, in which the available content information is identified and arranged in rows 202 and columns 204 on a display monitor 206. For example, each row represents a respective TV channel and each column represents a particular time slot. At the intersection of each particular row and column pair, e.g., row 208 and column 210, a label or title 212 is shown that represents the content available from the particular channel in the particular time slot. Other types of arrangement may be used instead, for example by topic category and time, or ranked by user preference, e.g., according to a profile per channel or per resource (e.g., the Internet). The user can browse the EPG, for example, by moving a window 214 over the EPG grid through a suitable user interface (e.g., arrow keys on a wireless keyboard or another pointing device, not shown) so as to view the portion of the EPG that lies within the boundaries of window 214. The user can then select specific content information by clicking or highlighting the relevant label in the displayed portion.
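
The grid-and-window arrangement can be pictured with a small sketch. It assumes the EPG is held as a list of rows (channels), each a list of titles per time slot; the function name `window_view` and the sample titles are hypothetical.

```python
from typing import List


def window_view(grid: List[List[str]], top: int, left: int,
                height: int, width: int) -> List[List[str]]:
    """Return the titles that fall inside the window, clipped to the grid."""
    return [row[left:left + width] for row in grid[top:top + height]]


# Rows are channels, columns are time slots (invented titles).
EPG_GRID = [
    ["News", "Movie A", "Sports"],
    ["Quiz", "Movie B", "Drama"],
    ["Talk", "Concert", "Movie C"],
]
```

Moving the window then amounts to changing `top` and `left`, for instance in response to arrow keys.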

【００３５】典型的には、ＥＰＧは、サービスプロバイダによってインターネットを介して
供給される。本発明では、ＥＰＧは、所望のラベルの従来のクリック又はハイラ
イト以外のＥＰＧとのユーザ対話モードを可能にする更なる制御ソフトウェア２
１６で高められる。制御ソフトウェア２１６は、ＥＰＧと一緒にダウンロード、
アップデート、又は、リフレッシュされることが好ましい。制御ソフトウェア２
１６は、ユーザ選択のためにＥＰＧ中のプログラムを識別するラベルの意味と関
連する制御情報２１８を有する。例えば、ユーザがユーザ入力装置２２０を通じ
て例えば、マイクロホンによるボイス入力によってデータ処理サブシステムに「
映画」といった表現が入力されるとき、ＥＰＧのグリッドは、「映画」といった
カテゴリーに従って利用できるプログラムだけをウィンドウ２１４中に示すよう
再び構成され、又は、映画プログラムは他のカテゴリー中のプログラムと異なる
ことをグラフィック的に示す。ユーザは、好ましくは音声コマンド下で「映画」
といったカテゴリーの中をブラウジングする。ユーザは、自身の好みの映画を観
、航空イベントについてのクラシック映画のＥＰＧに示されるタイトルである、
“The Magnificent Six and Okke”といった表現をボイス入力として入力す
る。別の例では、ユーザは、「今晩」及び「８時から１２時まで」を入力し、ウ
ィンドウ２１４はその日に８時（８：００ｐｍ）以降利用できるプログラムのコ
レクションを少なくとも部分的に示すよう位置される。別の例では、ユーザは、
ウィンドウ２１４中に表示されるＥＰＧの部分中に興味深いプログラムを識別し
、マイクロホン２２０にプログラムのタイトルを表わすワードを言う。次に、ユ
ーザは、「観る」或いは「録画」と言う。タイトルを表わすワードは、制御情報
２１８と比較するために適切なフォーマットに変換される。適合を見つけると、
制御ソフトウェア２１６は、マイクロプロセッサ２２２にチューナ２２４及び表
示モニタ２０６或いは録画装置２２６を制御させる。このようにして、ユーザは
、ボイス制御を用いてＥＰＧと対話することができる。Typically, an EPG is supplied by a service provider via the Internet. In the present invention, the EPG is enhanced with further control software 216 that enables modes of user interaction with the EPG other than the conventional clicking or highlighting of the desired label. The control software 216 is preferably downloaded, updated, or refreshed together with the EPG. The control software 216 has control information 218 that is associated with the semantics of the labels identifying the programs in the EPG for user selection. For example, when the user enters an expression such as "movies" into the data processing subsystem through the user input device 220, e.g., by voice input via a microphone, the EPG grid is reconfigured to show in window 214 only the programs available in the category "movies", or the movie programs are indicated graphically as differing from programs in other categories. The user then browses within the category "movies", preferably under voice command. The user sees a movie to his or her liking and enters, as voice input, the expression "The Magnificent Six and Okke", which is the title shown in the EPG for a classic movie about an aviation event. In another example, the user enters "tonight" and "from 8 to 12", and window 214 is positioned to show, at least partially, the collection of programs available on that day from 8 o'clock (8:00 pm) onward. In another example, the user identifies an interesting program in the portion of the EPG displayed in window 214 and speaks the words representing the title of the program into microphone 220. Next, the user says "watch" or "record". The words representing the title are converted into a format suitable for comparison with the control information 218. Upon finding a match, the control software 216 causes the microprocessor 222 to control the tuner 224 and the display monitor 206 or the recording device 226. In this way, the user can interact with the EPG using voice control.
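
The EPG interactions described in this paragraph (filtering by category, filtering by time, and acting on a spoken title followed by "watch" or "record") can be sketched as below. Everything here is an assumption made for illustration: the `Program` record, the function names, the device labels, and all sample data other than the title quoted in the description. Recognized speech is again assumed to arrive as text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Program:
    """One hypothetical EPG entry together with its control information."""
    title: str
    category: str    # stored in lower case
    channel: int
    start_hour: int  # 24-hour clock


def by_category(programs: List[Program], category: str) -> List[Program]:
    """Keep only programs in the spoken category, e.g. "movies"."""
    return [p for p in programs if p.category == category.lower()]


def from_hour(programs: List[Program], hour: int) -> List[Program]:
    """Keep only programs starting at or after the given hour."""
    return [p for p in programs if p.start_hour >= hour]


def dispatch(programs: List[Program], spoken_title: str,
             action: str) -> Optional[Tuple[str, int]]:
    """Map a spoken title plus "watch" or "record" to (target, channel)."""
    targets = {"watch": "tuner+display", "record": "recorder"}
    if action not in targets:
        return None
    for p in programs:
        if p.title.lower() == spoken_title.lower():
            return (targets[action], p.channel)
    return None


EPG = [
    Program("The Magnificent Six and Okke", "movies", 4, 20),
    Program("Evening News", "news", 1, 19),
    Program("Late Movie", "movies", 7, 23),
]
```

For example, `dispatch(EPG, "the magnificent six and okke", "record")` yields an instruction for the recorder on channel 4, while an unknown title yields `None`, which could trigger the negative feedback described earlier.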

【図面の簡単な説明】[Brief description of drawings]

【図１】本発明のシステムのブロック図である。[FIG. 1] A block diagram of the system of the present invention.

【図２】本発明のシステムのブロック図である。[FIG. 2] A block diagram of the system of the present invention.

Continuation of front page
(51) Int. Cl.⁷: G10L 15/28; FI: G10L 3/00 551G
(81) Designated states: EP (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, TR), CN, JP, KR
(72) Inventor: Swillens, Peter J. L. A., Prof. Holstlaan 6, 5656 AA Eindhoven, Netherlands
(72) Inventor: Middeljans, Jakobus, Prof. Holstlaan 6, 5656 AA Eindhoven, Netherlands
(72) Inventor: Alberda, Okke, Prof. Holstlaan 6, 5656 AA Eindhoven, Netherlands
(72) Inventor: Steinbiss, Volker, Prof. Holstlaan 6, 5656 AA Eindhoven, Netherlands
F-terms (reference): 5D015 AA04 AA05 KK01 LL10

Claims

【特許請求の範囲】[Claims]

【請求項１】エンドユーザがコンテンツ情報の処理を制御することを可能
にする方法であって、処理されるべきコンテンツ情報と意味的に関連付けられる音声コマンドを処理
することを含む方法。1. A method for enabling an end user to control the processing of content information, the method comprising processing a voice command semantically associated with content information to be processed.

【請求項２】上記情報コンテンツと共に音声制御ソフトウェアを供給する
ことを含む請求項１記載の方法。2. The method of claim 1 including providing voice control software with the information content.

【請求項３】上記コマンドは、処理するために上記コンテンツ情報を識別
する請求項１記載の方法。3. The method of claim 1, wherein the command identifies the content information for processing.

【請求項４】上記コンテンツ情報は、オーディオを有し、上記コマンドは
、上記オーディオ中に現れるワードを有する請求項１記載の方法。4. The method of claim 1, wherein the content information comprises audio and the command comprises words that appear in the audio.

【請求項５】上記コンテンツ情報は、ビデオ情報を有し、上記コマンドは
、上記ビデオ中のイベント又はオブジェクトを識別する請求項１記載の方法。5. The method of claim 1, wherein the content information comprises video information and the command identifies an event or object in the video.

【請求項６】上記コンテンツ情報は、記憶媒体に記憶され、上記コマンド
は上記処理を制御するために上記記憶媒体に記憶される請求項１記載の方法。6. The method of claim 1, wherein the content information is stored on a storage medium and the commands are stored on the storage medium to control the process.

【請求項７】上記音声コマンドの処理の状態に関して上記エンドユーザに
フィードバックを供給する請求項１記載の方法。7. The method of claim 1, wherein feedback is provided to the end user regarding the status of processing of the voice command.

【請求項８】コンテンツ情報、及び、エンドユーザが音声を通じて上記コ
ンテンツ情報の処理を制御することを可能にする音声コマンドを表わすデータを
有する記憶媒体。8. A storage medium having content information and data representing voice commands that enable an end user to control the processing of said content information through voice.

【請求項９】上記音声コマンドは、上記コンテンツ情報に意味的に関連す
る請求項８記載の媒体。9. The medium of claim 8, wherein the voice command is semantically related to the content information.

【請求項１０】光ディスク、磁気ディスク、ソリッド・ステート・メモリ
の少なくとも一つを有する請求項８記載の媒体。10. The medium according to claim 8, comprising at least one of an optical disc, a magnetic disc, and a solid state memory.

【請求項１１】コンテンツ情報を処理する電子装置であって、音声コマンドを受ける音声入力と、上記コンテンツ情報、及び、上記コンテンツ情報の意味に対して固有の制御ソ
フトウェアを有する記憶媒体を受ける入力と、上記音声コマンドの制御下で上記ソフトウェアを介して上記コンテンツ情報を
処理するデータプロセッサとを有する電子装置。11. An electronic device for processing content information, comprising: a voice input for receiving a voice command; an input for receiving a storage medium having the content information and control software specific to the meaning of the content information; and a data processor for processing the content information via the software under control of the voice command.

【請求項１２】上記データプロセッサは、上記コンテンツ情報に意味的に
関連する音声コマンドに応答して上記コンテンツ情報を処理する請求項１１記載
の装置。12. The apparatus of claim 11, wherein the data processor processes the content information in response to a voice command semantically related to the content information.

【請求項１３】上記記憶媒体は、光ディスク、磁気ディスク、ソリッド・
ステート・メモリの少なくとも一つを有する請求項１１記載の装置。13. The apparatus of claim 11, wherein the storage medium comprises at least one of an optical disc, a magnetic disk, and a solid-state memory.

【請求項１４】上記ボイスコマンドの処理の状態をエンドユーザに示す出
力を有する請求項１１記載の装置。14. The apparatus of claim 11 having an output that indicates to an end user the status of processing of said voice command.

【請求項１５】エンドユーザが制御データによって支持されるよう音声制
御を通じて特定のコンテンツ情報の処理を制御することを可能にする、特定のコ
ンテンツ情報の意味と関連付けられる制御データを供給する方法。15. A method of providing control data associated with the meaning of particular content information that enables an end user to control the processing of the particular content information through voice control as supported by the control data.

【請求項１６】ユーザがデータネットワークを介して上記制御データをダ
ウンロードすることを可能にする請求項１５記載の方法。16. The method of claim 15, which allows a user to download the control data via a data network.

【請求項１７】上記ダウンロードされた制御データは、上記特定のコンテ
ンツ情報のコピー物と使用するためのものである請求項１５記載の方法。17. The method of claim 15, wherein the downloaded control data is for use with a copy of the particular content information.

【請求項１８】ユーザがデータネットワークを介して上記コンテンツ情報
をダウンロードすることを可能にする請求項１５記載の方法。18. The method of claim 15, which enables a user to download the content information via a data network.

【請求項１９】上記コンテンツ情報はＥＰＧを有し、上記ＥＰＧと対話す
ることを含む請求項１５記載の方法。19. The method of claim 15, wherein the content information comprises an EPG and includes interacting with the EPG.

【請求項２０】プログラムリスティングによって表示されるコンテンツ情
報の意味に対して固有の制御データを有し、エンドユーザが音声入力を用いてＥ
ＰＧと対話することを可能にするよう作動されるＥＰＧ。20. An EPG having control data specific to the meaning of the content information represented by a program listing, and operative to enable an end user to interact with the EPG using voice input.

【請求項２１】上記音声入力の処理の状態に関して上記エンドユーザにフ
ィードバックを供給することを制御するソフトウェアを有する請求項２０記載の
ＥＰＧ。21. The EPG of claim 20, having software that controls providing feedback to the end user regarding the status of processing of the voice input.

【請求項２２】プログラムリスティングによって表示されるコンテンツ情
報の意味に対して固有の制御データを有し、エンドユーザが音声入力を用いてＥ
ＰＧと対話することを可能にするよう作動されるＥＰＧ。22. An EPG having control data specific to the meaning of the content information represented by a program listing, and operative to enable an end user to interact with the EPG using voice input.

【請求項２３】コンテンツ情報を電子処理することを制御する音声コマン
ドであって、上記コンテンツ情報の意味によって決定されるコマンド。23. A voice command for controlling electronic processing of content information, the command being determined by the meaning of the content information.