JP2022073709A

JP2022073709A - Information processing apparatus, information processing method, and program

Info

Publication number: JP2022073709A
Application number: JP2020183866A
Authority: JP
Inventors: 祐志真田; Yuji Sanada; 直也來田; Naoya Kida
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2020-11-02
Filing date: 2020-11-02
Publication date: 2022-05-17

Abstract

PROBLEM TO BE SOLVED: To provide an information processing apparatus, an information processing method, and a program capable of searching for a specific part that a user desires to view, regardless of the genre of the video content.
SOLUTION: A pattern DB unit 122 of an information processing apparatus 10 stores genre-specific patterns, which are patterns of text information useful for searching video content 1211, for each genre. In the information processing apparatus 10, a genre setting unit 1141 sets the genre of the video content 1211 for which text information is to be generated. A recording processing unit 1142 determines, on the basis of the genre-specific pattern corresponding to the genre set by the genre setting unit 1141, whether text information generated from image data or sound data of the video content 1211 is useful for searching within that genre, and records the text information determined to be useful in association with the video content 1211. A search unit 112 executes a search on the text information recorded by the recording processing unit 1142.
SELECTED DRAWING: Figure 2

Description

The present disclosure relates to an information processing apparatus, an information processing method, and a program.

As the capacity of video content recording devices has grown and video content distribution technology has advanced, the amount of video content available to users has increased and viewing methods have diversified. Consequently, a user who wants to watch a specific topic must first find the desired content among a large amount of video content and then search for the specific part within that content.

As a method of searching for a specific part of video content, a method has been used in which a single predetermined technique is applied to generate searchable indexes and, for each index, time stamp information indicating its start time on the same time axis as the video content is stored in a database together with the index (for example, Patent Document 1).

The information processing apparatus described in Patent Document 1 divides the audio data included in video content into multiple segments along the time axis of the video. For each resulting audio segment, it stores time stamp information indicating the start time on the time axis of the video, text information obtained by converting the audio data into a character string, and the video, in association with one another. Patent Document 1 explains that this allows the user to search for a specific part of the desired content.

Patent Document 1: Japanese Patent No. 6382423

Video content comes in a wide variety of genres, and depending on the genre, some video content does not include audio data that can be converted into character strings. By contrast, the information processing apparatus described in Patent Document 1 generates its search index using a single technique: converting the audio data included in the video content into character strings to generate text information.

Consequently, the information processing apparatus described in Patent Document 1 has the problem that, for video content of a genre whose audio data cannot be converted into character strings useful for searching, the user cannot search for the specific part he or she desires to view.

The present disclosure has been made in view of the above circumstances, and its object is to provide an information processing apparatus, an information processing method, and a program that allow a user to search for the specific part he or she desires to view, regardless of the genre of the video content.

To achieve the above object, the information processing apparatus of the present disclosure includes a pattern DB unit that stores genre-specific patterns, which are patterns of text information useful for searching video content, for each genre. The information processing apparatus also includes a genre setting unit that sets the genre of the video content for which text information is to be generated, and a recording processing unit. Based on the genre-specific pattern that is stored in the pattern DB unit and corresponds to the genre set by the genre setting unit, the recording processing unit determines whether text information generated from the image data or audio data of the video content is useful for searching within that genre, and records the text information determined to be useful in association with the video content. The information processing apparatus further includes a search unit that executes a search on the text information recorded by the recording processing unit.

According to the present disclosure, text information adapted to the genre of the video content is used for the search, so the user can search for the specific part he or she desires to view regardless of the genre of the video content.

FIG. 1 is a block diagram showing a hardware configuration example of the information processing apparatus according to the embodiment of the present disclosure.
FIG. 2 is a block diagram showing a functional configuration example of the information processing apparatus.
FIG. 3 is a table showing an example of chapter information.
FIG. 4 is a flowchart showing an example of a chapter information generation processing flow.

(Embodiment)
Hereinafter, embodiments for carrying out the present disclosure will be described in detail with reference to the drawings.

The information processing apparatus 10 according to the embodiment of the present disclosure records video content and searches the video content for a specific part. FIG. 1 is a block diagram showing a hardware configuration example of the information processing apparatus 10 according to the present embodiment, and FIG. 2 is a functional block diagram showing a functional configuration example of the information processing apparatus 10.

As shown in FIG. 1, the information processing apparatus 10 includes a CPU (Central Processing Unit) 11 that executes arithmetic processing related to video content, a storage device 12 that stores data acquired or generated by the CPU 11, an operation input device 13 that accepts input operations from the user, a video content input device 14 that accepts the video data of video content and accompanying digital data, and an output device 15 that outputs information including video content and search results.

By executing a program stored in the storage device 12, the CPU 11 functions as a UI (User Interface) unit 111, a search unit 112, a genre designation unit 113, a generation unit 114, and a video output unit 115, as shown in FIG. 2. The program can, for example, be stored in a non-volatile semiconductor memory built into the CPU 11 or be loaded into a temporary volatile semiconductor memory.

The storage device 12 is an arbitrary storage device, for example, a non-volatile semiconductor memory such as a flash memory or an EPROM (Erasable Programmable Read Only Memory), or a magnetic disk, a flexible disk, an optical disk, a compact disk, a mini disk, or a DVD (Digital Versatile Disc).

As shown in FIG. 2, the storage device 12 includes a video recording unit 121 that records the video content 1211 and the chapter information 1212 generated from the video content 1211, and a pattern DB unit 122, which is a database (DB) that stores patterns of text information according to the genre of the video content 1211.

The operation input device 13 is an arbitrary device that accepts input operations from the user, for example, a pointing device such as a mouse or touch pad, or a keyboard. The operation input device 13 accepts a keyword for searching for the specific part of the video content 1211 desired by the user and passes the keyword to the search unit 112 via the UI unit 111 of the CPU 11. The operation input device 13 also accepts a user input operation that designates the genre of the video content 1211 and passes the genre to the generation unit 114 via the genre designation unit 113 of the CPU 11.

The video content input device 14 has a data communication function and accepts the video data of video content and accompanying digital data. The digital data of the video content is, for example, the video content name and meta information including attributes or related information. The video content input device 14 passes the acquired data to the generation unit 114 of the CPU 11.

The output device 15 is an arbitrary information display device that outputs characters and images and displays them to the user, for example, an LCD (Liquid Crystal Display). The output device 15 may display the user input operation screen output by the UI unit 111.

Next, each functional unit of the CPU 11 shown in FIG. 2 will be described in detail.

The genre designation unit 113 of the CPU 11 designates the genre of the video content 1211 based on the information the user has entered into the operation input device 13 about the genre of the video content 1211, and inputs the genre to the generation unit 114. Based on the video data of the video content 1211 input from the video content input device 14 and the digital data of the video content 1211, the generation unit 114 generates chapter information 1212 including text information and time stamp information, and records it in the video recording unit 121 of the storage device 12.

The generation unit 114 includes a genre setting unit 1141, a recording processing unit 1142, an image recognition unit 1143, and a voice recognition unit 1144.

The genre setting unit 1141 sets the genre of the video content 1211 for which the chapter information 1212 is to be generated either to a genre determined from the name of the video content (video content name) and its meta information, or to the genre designated by the genre designation unit 113. After the genre is set, the generation unit 114 generates text information using the text information generation method associated with the set genre.

For example, when the video stream of a television program is input to the video content input device 14, the television program name and the genre-related meta information included in the program's official information (Service Information) are input to the genre setting unit 1141 of the generation unit 114. When the genre designation unit 113 has not acquired a designation from the user, the genre setting unit 1141 determines the genre based on the meta information attached to the video content 1211.
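The genre-setting behavior described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the disclosure: the function name `set_genre`, the `"genre"` metadata key, the `TITLE_HINTS` table, and the `"unknown"` fallback are all assumptions introduced here. The disclosure states only that a user designation is used when present and that the name and meta information are used otherwise.

```python
from typing import Optional

# Hypothetical keyword-to-genre hints for guessing a genre from the title.
TITLE_HINTS = {"quiz": "quiz program", "news": "news"}


def set_genre(user_genre: Optional[str], meta: dict, title: str) -> str:
    """Return the genre to use when generating chapter information."""
    # A genre designated by the user (genre designation unit) is used as-is.
    if user_genre:
        return user_genre
    # Otherwise fall back to genre metadata attached to the content.
    if meta.get("genre"):
        return meta["genre"]
    # As a last resort, guess from the content name (assumed heuristic).
    lowered = title.lower()
    for keyword, genre in TITLE_HINTS.items():
        if keyword in lowered:
            return genre
    return "unknown"
```

For instance, `set_genre(None, {"genre": "news"}, "Evening Report")` falls through to the metadata and yields `"news"`.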

In accordance with the text information generation method associated with the genre set by the genre setting unit 1141, the recording processing unit 1142 inputs the image data of the video content 1211 received by the video content input device 14 to the image recognition unit 1143, or inputs the audio data to the voice recognition unit 1144. Alternatively, in accordance with that text information generation method, the recording processing unit 1142 inputs the image data to the image recognition unit 1143 and also inputs the audio data to the voice recognition unit 1144.

The image recognition unit 1143 generates text information from the image data input from the recording processing unit 1142 and returns the text information to the recording processing unit 1142. The voice recognition unit 1144 generates text information from the audio data input from the recording processing unit 1142 and returns the text information to the recording processing unit 1142.

The recording processing unit 1142 determines whether the text information generated by the image recognition unit 1143, the text information generated by the voice recognition unit 1144, or the text information generated by both units is suitable for the genre set by the genre setting unit 1141.

First, the recording processing unit 1142 reads the determination pattern for the genre set by the genre setting unit 1141 from the pattern DB unit 122, which stores the genre-specific patterns of text information. The recording processing unit 1142 then reads the text information generated by the image recognition unit 1143 and the voice recognition unit 1144 and collates it with the genre-specific pattern to determine whether the generated text information is useful for searching. The text information may be determined to be useful for searching within the set genre when the collation shows that at least part of it matches the genre-specific pattern.

The recording processing unit 1142 records the text information determined to be useful in the video recording unit 121 as chapter information 1212, in association with the video content 1211.
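The usefulness determination can be illustrated with a minimal sketch. It assumes that a genre-specific pattern is simply a list of keywords and that "at least part of the text matches" means a case-insensitive substring match; the disclosure does not fix either choice. The names `GENRE_PATTERNS` and `is_useful` are hypothetical, and the quiz-program keywords are the example terms given in this description.

```python
# Hypothetical in-memory stand-in for the pattern DB unit 122.
GENRE_PATTERNS = {
    "quiz program": ["question", "correct answer", "winner", "quiz king"],
}


def is_useful(text: str, genre: str, patterns: dict = GENRE_PATTERNS) -> bool:
    """Return True if any pattern registered for the genre occurs in the text."""
    lowered = text.lower()
    # A genre with no registered patterns never matches.
    return any(pattern in lowered for pattern in patterns.get(genre, []))
```

A real pattern database could equally hold regular expressions or other matchers; keyword lists are used here only because they keep the sketch short.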

The UI unit 111 accepts a keyword for searching for the specific part of the video content 1211 desired by the user and inputs the keyword to the search unit 112. The search unit 112 searches the chapter information 1212 stored in the video recording unit 121 for the keyword input from the UI unit 111 and returns the search results to the UI unit 111.

When the user performs a designation operation on the search results returned to the UI unit 111, the video output unit 115 refers to the chapter information 1212 containing the text information of the designated search result and outputs the video content 1211 corresponding to that text information.

Next, the various data stored in the storage device 12 shown in FIG. 2 will be described in detail.

The pattern DB unit 122 is a database that stores, for each genre, patterns of text information useful for a user's keyword search of the video content 1211. These per-genre patterns of text information are called genre-specific patterns. A genre-specific pattern is used to determine whether the text information generated by the image recognition unit 1143 or the voice recognition unit 1144 is useful for searching within the genre set by the genre setting unit 1141. In other words, the suitability of the text information generated by the image recognition unit 1143 and the voice recognition unit 1144 is judged by collating it with the genre-specific patterns in the pattern DB unit 122.

The text information stored in the pattern DB unit 122 can be updated automatically or manually. An update is performed, for example, by connecting to the Internet and downloading from a preset server. The user may also customize the contents as needed by entering input on the operation input device 13.

Furthermore, the text information the user has used in searches may be learned automatically as a pattern for the user's favorite genre or for a frequently used genre identified from the playback history, and the learning result may be reflected in the pattern DB unit 122. The learning result may be presented to the user when the video output unit 115 outputs video content 1211 as a recommended video, and may also be reflected in the user's search keywords or in the display order of the search results.

For example, if the genre is "quiz program", then "question", "correct answer", "winner", and "quiz king" are text information suitable for the genre. In other words, this text information constitutes the genre-specific pattern for the genre "quiz program".

By using such genre-specific patterns of text information to judge the suitability of text information, it is possible to determine whether the text information is useful when the image recognition unit 1143 and the voice recognition unit 1144 generate it from the on-screen captions (image data) or the program host's speech (audio data) of quiz program video content input from the video content input device 14.

The video content 1211 recorded in the video recording unit 121 is either video content for which the generation unit 114 is to generate chapter information 1212 or video content for which chapter information 1212 has already been generated.

The chapter information 1212 recorded in the video recording unit 121 includes text information that was generated from the image data or audio data of the video content 1211 and was determined to be useful for searching within the set genre. The chapter information 1212 associates this text information with the name of the video content 1211 and a time stamp, which is a start time on the time axis of the video content 1211. In other words, the chapter information 1212 associates text information with the video content 1211 from which that text information was created.

FIG. 3 is a diagram showing an example of the chapter information 1212 according to the present embodiment. In FIG. 3, the elements constituting the chapter information 1212 are the name of the video content 1211 stored in the video recording unit 121 (video content name), a time stamp on the same time axis as the video content 1211, and the text information that was generated by the image recognition unit 1143 or the voice recognition unit 1144 and determined to be useful for searching within the set genre. The chapter information 1212 may also include the genre set by the genre setting unit 1141 or text information specified by the user when designating the genre.
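One way to picture a single row of chapter information is as a small record type. The sketch below is an assumption for illustration only: the class name `ChapterInfo`, the field names, the use of seconds for the time stamp, and the sample values are not taken from FIG. 3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChapterInfo:
    """One chapter entry: text tied to a position in a named video."""
    content_name: str            # name of the video content
    timestamp_s: float           # start time on the video's time axis, in seconds (assumed unit)
    text: str                    # text judged useful for searching within the genre
    genre: Optional[str] = None  # optional: the genre that was set


# Hypothetical sample entry, not data from the disclosure.
example = ChapterInfo("Friday Quiz Night", 754.0, "the correct answer is", "quiz program")
```

Keeping the time stamp alongside the text is what lets a search hit be turned into a playback position.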

The operation of the information processing apparatus 10 configured as described above will be described with reference to the flowchart shown in FIG. 4. FIG. 4 is a flowchart showing an example of the chapter information generation processing flow executed by the CPU 11.

First, the generation unit 114 of the CPU 11 sets the genre of the video content 1211 acquired from the video content input device 14 (step S101: genre setting step). Specifically, the genre setting unit 1141 sets the genre either to a genre determined from the video content name and meta information input from the video content input device 14, or to the genre designated by the genre designation unit 113.

Next, the generation unit 114 selects the text information generation method corresponding to the genre set in step S101 (step S102). Specifically, as the method by which the generation unit 114 generates the text information included in the chapter information 1212, the text information generation method corresponding to the genre set in step S101 is selected. For each genre, the text information generation method defines whether the image data, the audio data, or both are used as the source data from which text information is generated. The text information generation method also defines, for each genre, which of the genre-specific patterns read from the pattern DB unit 122 is used as the data against which the text information is collated.

In accordance with the text information generation method selected in step S102, the recording processing unit 1142 inputs the image data of the video content 1211 received by the video content input device 14 to the image recognition unit 1143, or inputs the audio data to the voice recognition unit 1144. Alternatively, in accordance with the text information generation method, the recording processing unit 1142 inputs the image data to the image recognition unit 1143 and also inputs the audio data to the voice recognition unit 1144.
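The per-genre text information generation method described above can be pictured as a small lookup table that says which recognizers to run and which pattern to collate against. The following is a sketch under assumptions: `GenerationMethod`, `METHODS`, `DEFAULT_METHOD`, the "music video" entry, and the recognizer callables are hypothetical names and values, and the audio-only default is a choice made here rather than something the disclosure specifies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class GenerationMethod:
    use_image: bool   # feed image data to the image recognizer
    use_audio: bool   # feed audio data to the voice recognizer
    pattern_key: str  # which genre-specific pattern to collate against


# Hypothetical method table keyed by genre.
METHODS = {
    "quiz program": GenerationMethod(True, True, "quiz program"),
    "music video": GenerationMethod(True, False, "music video"),
}
DEFAULT_METHOD = GenerationMethod(False, True, "default")  # assumed fallback


def generate_text(
    genre: str,
    image: bytes,
    audio: bytes,
    recognize_image: Callable[[bytes], str],
    recognize_audio: Callable[[bytes], str],
) -> List[str]:
    """Run only the recognizers that the genre's method calls for."""
    method = METHODS.get(genre, DEFAULT_METHOD)
    texts: List[str] = []
    if method.use_image:
        texts.append(recognize_image(image))
    if method.use_audio:
        texts.append(recognize_audio(audio))
    return texts
```

Passing the recognizers in as callables keeps the sketch independent of any particular OCR or speech recognition library.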

The image recognition unit 1143 generates text information from the input image data by image recognition, and the voice recognition unit 1144 generates text information from the input audio data by voice recognition (step S103). The generated text information is input to the recording processing unit 1142.

Next, to determine whether the text information generated by the image recognition unit 1143 or the voice recognition unit 1144 is suitable for the genre set by the genre setting unit 1141, the recording processing unit 1142 collates the text information with the genre-specific pattern in the pattern DB unit 122 (step S104).

The recording processing unit 1142 reads the genre-specific pattern in accordance with the text information generation method and collates the text information generated by the image recognition unit 1143 or the voice recognition unit 1144 with the pattern it has read, thereby determining whether the generated text information is suitable for searching (step S105).

When collating the generated text information with the genre-specific pattern shows that it is suitable for searching (step S105: Yes), the recording processing unit 1142 saves the text information in the video recording unit 121 as chapter information 1212 (step S106: recording processing step). At this time, the name of the video content 1211 from which the text information was generated and a time stamp indicating the start time on the time axis of the video content 1211 are also saved as chapter information 1212, in association with the text information.

When the generated text information is determined not to be suitable for searching (step S105: No), the process returns to step S103 and the next text information is generated and judged. After the text information has been saved in the video recording unit 121 (step S106), it is determined whether the user has given an end instruction (step S107). If the process is not to end (step S107: No), the process returns to step S103 and the generation and judgment of text information continue. If the user has given an end instruction (step S107: Yes), the chapter information generation process ends.
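The loop formed by steps S103 to S107 can be sketched as below. Several simplifications are assumed here: recognition (S103) is represented by an already-produced sequence of `(timestamp, text)` pairs, collation (S104 and S105) is a case-insensitive keyword match, and the user's end instruction (S107) is a callable named `stop_requested`. None of these names appear in the disclosure.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def generate_chapters(
    segments: Iterable[Tuple[float, str]],
    genre: str,
    patterns: Dict[str, List[str]],
    stop_requested: Callable[[], bool] = lambda: False,
) -> List[Tuple[float, str]]:
    """Keep only the segments whose text matches the genre's pattern."""
    chapters: List[Tuple[float, str]] = []
    for timestamp_s, text in segments:              # S103: next generated text
        lowered = text.lower()
        useful = any(p in lowered for p in patterns.get(genre, []))  # S104, S105
        if not useful:
            continue                                # S105: No, back to S103
        chapters.append((timestamp_s, text))        # S106: record as chapter info
        if stop_requested():                        # S107: end instruction?
            break
    return chapters
```

As in the flow described above, the end check is reached only after a chapter has been saved; unsuitable text goes straight back to generating the next text. In this sketch the loop also ends when the input segments run out, which the flowchart does not describe.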

The search unit 112 searches the text information included in the chapter information 1212 generated by the chapter information generation process shown in FIG. 4 for the keyword the user has entered into the operation input device 13 (search step). The UI unit 111 displays the search results on the output device 15. When the user designates a specific position in the video content 1211 from these search results, the video output unit 115 refers to the chapter information 1212, reads the video content 1211, and starts playback from that position.
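The search step can be illustrated with a minimal sketch that treats the chapter information as a list of dictionaries and the search as a case-insensitive substring match. The function name `search_chapters`, the dictionary keys, and the ordering of the results are assumptions; the disclosure does not specify how matching or ranking is performed.

```python
from typing import Dict, List


def search_chapters(chapters: List[Dict], keyword: str) -> List[Dict]:
    """Return chapter entries whose text contains the keyword.

    Each entry is assumed to have 'content_name', 'timestamp_s', and 'text'.
    """
    needle = keyword.lower()
    hits = [c for c in chapters if needle in c["text"].lower()]
    # Order hits by content name, then by position in the video (assumed ordering).
    return sorted(hits, key=lambda c: (c["content_name"], c["timestamp_s"]))
```

Each hit carries the content name and time stamp, which is the information a player would need to start playback from the matching position.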

In this way, the information processing apparatus 10 generates chapter information 1212 containing text information useful for searching, using a text information generation method suited to the genre of the video content 1211, and executes searches on the generated chapter information 1212, thereby obtaining search results that match the user's wishes. For example, even for video content 1211 whose audio data cannot be converted into character strings useful for searching, the user can search for the specific part he or she desires to view.

As described above, the information processing apparatus 10 according to the present embodiment stores, in the pattern DB unit 122 of the storage device 12, genre-specific patterns, which are patterns of text information useful for searching the video content 1211, for each genre. The genre setting unit 1141 of the generation unit 114 sets the genre of the video content 1211 for which text information is to be generated. Based on the genre-specific pattern that is stored in the pattern DB unit 122 and corresponds to the set genre, the recording processing unit 1142 determines whether text information generated from the image data or audio data of the video content 1211 is useful for searching within the genre, and records the text information determined to be useful as chapter information 1212 in association with the video content 1211. The search unit 112 then searches the chapter information 1212 for the keyword entered by the user. This allows the user to search for the specific part he or she desires to view, regardless of the genre of the video content 1211.

In the above embodiment, the recording processing unit 1142 of the generation unit 114 refers to the pattern DB unit 122 to determine whether the text information generated by the image recognition unit 1143 or the voice recognition unit 1144 is suitable, and stores the text information determined to be suitable in the video recording unit 121. The present disclosure, however, is not limited to this. It suffices that the generation unit 114 uses the genre information of the video content 1211 to store text information useful for searching in the video recording unit 121 as chapter information 1212. For example, the recording processing unit 1142 need not use the pattern DB unit 122 inside the information processing apparatus 10; a database on a server may be used in its place.
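
One way to picture this variation is to hide the origin of the patterns behind a small interface, so that the usefulness check does not care whether patterns come from a local table or from a server. The sketch below is an assumption-laden illustration: `PatternSource`, `LocalPatternSource`, `ServerPatternSource`, and `is_useful` are hypothetical names, and the server access is represented by an injected `fetch` callable rather than any real network API.

```python
import re
from typing import Callable, Dict, List, Protocol


class PatternSource(Protocol):
    """Anything that can supply regex strings for a genre (hypothetical)."""

    def patterns_for(self, genre: str) -> List[str]: ...


class LocalPatternSource:
    """Stand-in for a pattern DB held inside the apparatus."""

    def __init__(self, table: Dict[str, List[str]]):
        self._table = table

    def patterns_for(self, genre: str) -> List[str]:
        return self._table.get(genre, [])


class ServerPatternSource:
    """Stand-in for a database on a server.

    `fetch` is an injected callable (assumed here, not from the source) that
    returns the pattern list for a genre; results are cached per genre.
    """

    def __init__(self, fetch: Callable[[str], List[str]]):
        self._fetch = fetch
        self._cache: Dict[str, List[str]] = {}

    def patterns_for(self, genre: str) -> List[str]:
        if genre not in self._cache:
            self._cache[genre] = self._fetch(genre)
        return self._cache[genre]


def is_useful(text: str, genre: str, source: PatternSource) -> bool:
    """True if any pattern for the genre matches part of the text."""
    return any(
        re.search(p, text, re.IGNORECASE) for p in source.patterns_for(genre)
    )


# Usage: the same check works against either source.
local = LocalPatternSource({"cooking": [r"\d+ (g|ml)"]})
calls: List[str] = []


def fake_fetch(genre: str) -> List[str]:
    calls.append(genre)  # record each simulated server round trip
    return [r"\d+ (g|ml)"] if genre == "cooking" else []


remote = ServerPatternSource(fake_fetch)
local_result = is_useful("Add 200 ml of water", "cooking", local)
remote_result = is_useful("Add 200 ml of water", "cooking", remote)
remote_again = is_useful("Stir gently", "cooking", remote)
```

The per-genre cache in `ServerPatternSource` is a design choice of this sketch, made so that repeated checks for the same genre do not repeat the server lookup; the disclosure itself does not specify caching.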

The hardware configuration and flowcharts shown in the above embodiment are examples and can be changed and modified as desired. The functions realized by the CPU 11 and the storage device 12 can be realized using an ordinary computer system rather than a dedicated system.

For example, a program for executing the operations of the above embodiment may be stored on and distributed via a computer-readable recording medium such as a CD-ROM (Compact Disc Read-Only Memory), DVD, MO (Magneto-Optical Disc), or memory card, and installed on a computer, thereby configuring a computer capable of realizing each function. When each function is realized by dividing the work between an OS (Operating System) and an application, or by cooperation between the OS and the application, only the portion other than the OS may be stored on the recording medium.

The present disclosure is not limited to the above embodiment, and various modifications are of course possible without departing from the gist of the present disclosure.

10 information processing apparatus, 11 CPU, 12 storage device, 13 operation input device, 14 video content input device, 15 output device, 111 UI unit, 112 search unit, 113 genre designation unit, 114 generation unit, 115 video output unit, 121 video recording unit, 122 pattern DB unit, 1141 genre setting unit, 1142 recording processing unit, 1143 image recognition unit, 1144 voice recognition unit, 1211 video content, 1212 chapter information.

Claims

An information processing apparatus comprising:
a pattern DB unit that stores genre-based patterns, each being a pattern, for a respective genre, of text information useful for searching video content;
a genre setting unit that sets a genre of the video content for which text information is to be generated;
a recording processing unit that determines, based on the genre-based pattern that is stored in the pattern DB unit and corresponds to the genre set by the genre setting unit, whether text information generated from image data or audio data of the video content is useful for searching in the genre, and records the text information determined to be useful in association with the video content; and
a search unit that executes a search on the text information recorded by the recording processing unit.

The information processing apparatus according to claim 1, wherein the genre setting unit determines and sets the genre of the video content based on meta information attached to the video content.

The information processing apparatus according to claim 1, wherein the genre setting unit sets the genre to a genre designated by a user operation.

The information processing apparatus according to any one of claims 1 to 3, wherein the recording processing unit records chapter information, which is information in which the text information determined to be useful for searching in the genre set by the genre setting unit is associated with a name of the video content and a time stamp indicating a start time on a time axis of the video content.

The information processing apparatus according to any one of claims 1 to 4, wherein the recording processing unit collates the text information generated from the image data or the audio data of the video content against the genre-based pattern that is stored in the pattern DB unit and corresponds to the genre set by the genre setting unit, and, when at least a part of the text information matches the genre-based pattern, determines that the text information is useful for searching in the genre and records it.

The information processing apparatus according to any one of claims 1 to 5, further comprising:
an image recognition unit that generates the text information from the image data of the video content by image recognition; and
a voice recognition unit that generates the text information from the audio data of the video content by voice recognition,
wherein the recording processing unit determines, in accordance with a text information generation method corresponding to the genre set by the genre setting unit, whether either or both of the text information generated by the image recognition unit and the text information generated by the voice recognition unit are useful for searching in the genre, and records the text information determined to be useful in association with the video content.

The information processing apparatus according to any one of claims 1 to 6, further comprising a video output unit that, when a user performs a designating operation on a search result of the search unit, outputs the video content based on the chapter information including the text information of the designated search result.

An information processing method comprising:
a genre setting step of setting a genre of video content for which text information is to be generated;
a recording processing step of determining, based on a genre-based pattern that is a pattern of text information useful for searching the video content, whether text information generated from the video content is useful for searching in the genre set in the genre setting step, and recording the text information determined to be useful in association with the video content; and
a search step of executing a search on the text information recorded in the recording processing step.

A program causing a computer to function as:
a genre setting unit that sets a genre of video content for which text information is to be generated;
a recording processing unit that, when determining, based on a genre-based pattern that is a pattern of text information useful for searching the video content, that text information generated from the video content is useful for searching in the genre set by the genre setting unit, records the text information in association with the video content; and
a search unit that executes a search on the text information recorded by the recording processing unit.