JP2009218874A

JP2009218874A - Recording/reproducing device

Info

Publication number: JP2009218874A
Application number: JP2008060784A
Authority: JP
Inventors: Satoshi Wakabayashi; 聡若林
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2008-03-11
Filing date: 2008-03-11
Publication date: 2009-09-24

Abstract

PROBLEM TO BE SOLVED: To provide a recording/reproducing device that can prevent a thumbnail image the user does not want others to see from being displayed as the original image, while suppressing an increase in processing load. SOLUTION: When a thumbnail image generating means 4 generates and stores a thumbnail image for each preset reproduction start position, a voice detection means 6 analyzes the audio information recorded in a predetermined section corresponding to the reproduction start position of that thumbnail image, and determines whether a previously designated predetermined audio pattern is present in it. For a thumbnail image whose audio information is determined to contain the predetermined pattern, a thumbnail image processing means 7 processes the image so that the clarity of at least part of its area is reduced. A thumbnail image display means 5 causes a display means 8 to display the thumbnail image. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a recording/reproducing apparatus capable of recording and reproducing video information and audio information, and in particular to a recording/reproducing apparatus having a function of generating and displaying thumbnail images associated with the reproduction start positions used when reproducing the recorded video information and audio information.

As is well known, there are recording/reproducing apparatuses, such as DVD (Digital Versatile Disc) recorders and HDD (Hard Disc Drive) recorders, that record and reproduce video information and audio information (hereinafter referred to as content) such as program content broadcast from a broadcasting station.

In such a recording/reproducing apparatus, many reproduction start positions can be defined for the recorded content. A reproduction start position is usually called a bookmark, a chapter, or simply a mark. The information on the reproduction start positions (mark information) may be held and used inside the recording/reproducing apparatus, or may be recorded on the recording medium and read out for use.

The marks described above can be set freely by the title creator or by the user who reproduces the content. In some cases the recording/reproducing apparatus can also set marks automatically by detecting a change in the audio information, a change in the video information, or the like. The moving image recorded at the position corresponding to a set mark is then captured as a frame image, reduced in size as necessary, and displayed on the screen as a thumbnail image in a list format (thumbnail image list).

In other cases, the moving image at the mark position is not captured; instead, a separate still image is prepared for the thumbnail image and that still image is displayed on the screen. By looking at the thumbnail images shown in the thumbnail image list, the user can get a rough idea of the content stored at the position corresponding to each mark and can start reproduction from that position.

As a conventional example of generating and displaying thumbnail images, there is a technique in which the user designates specifying information that identifies the video information on which a thumbnail image is to be based, and that specifying information is stored; after recording of the content on the recording medium has finished, the video information indicated by the specifying information is acquired from the recorded content, and a thumbnail image is generated from the acquired video information and stored (see, for example, Patent Document 1 below).

There is also a technique for identifying the people who appear in a video so that their faces can be displayed separately. It detects and identifies faces in the video, detects the frames in which a face appears, extracts face images from those frames, groups the faces of the same person across all extracted face images, and extracts a representative face image for each person (see Patent Document 2 below).
Patent Document 1: JP 2003-32581 A (abstract)
Patent Document 2: JP 2001-167110 A (abstract)

However, when, as described in Patent Document 1, the user designates specifying information that identifies the video information on which a thumbnail image is to be based, and a thumbnail image is generated from the video information acquired from the recorded content according to that specifying information, the thumbnail image may contain an image that the user has no problem seeing but does not want to show to other people or to children. One example is an image relating to a person, such as an individual's face.

When the user does not want to show images of people, it is possible, for example by using the technique described in Patent Document 2, to analyze whether a person appears in a thumbnail image and, if so, to apply mosaic processing or the like to the thumbnail image before displaying it. However, detecting a person in an image involves a large amount of processing, even for an image as small as a thumbnail.

The present invention has been made in view of the above problem, and its object is to provide a recording/reproducing apparatus that can prevent a thumbnail image the user does not want seen from being displayed as the original image, while suppressing an increase in the amount of processing.

In order to achieve the above object, the present invention is a recording/reproducing apparatus comprising:
video/audio recording means for recording video information and audio information related to the video information on a recording medium in association with each other;
video/audio reproduction means for reproducing the video information and the audio information recorded on the recording medium;
display means for displaying the video information reproduced by the video/audio reproduction means;
reproduction start position setting means for setting reproduction start positions of the video information and the audio information for the video/audio reproduction means;
thumbnail image generation means for generating, for each reproduction start position set by the reproduction start position setting means, a thumbnail image corresponding to that position, and storing the generated thumbnail image;
voice detection means for acquiring the audio information recorded in a predetermined section corresponding to the reproduction start position of a thumbnail image generated by the thumbnail image generation means, analyzing the acquired audio information, and determining whether it contains audio of a predetermined pattern designated in advance;
thumbnail image processing means for processing the thumbnail image stored in the thumbnail image generation means, and storing it again in the thumbnail image generation means, so as to reduce the clarity of at least a partial region of the thumbnail image corresponding to audio information that the voice detection means has determined to contain audio of the predetermined pattern; and
thumbnail image display means for causing the display means to display the thumbnail image stored in the thumbnail image generation means.

According to the present invention, it is possible to prevent a thumbnail image the user does not want seen from being displayed as the original image, while suppressing an increase in the amount of processing.

Hereinafter, the present invention will be described in detail based on a preferred embodiment shown in the drawings. FIG. 1 is a functional block diagram showing the configuration of an embodiment of the recording/reproducing apparatus according to the present invention. In this embodiment, a video/audio recording means 1 records video information and audio information on a recording medium 10 such as a DVD or HD, while a video/audio reproduction means 2 reproduces the video information and audio information recorded on the recording medium 10. The reproduced video information is displayed on a display means 8. When the video/audio reproduction means 2 reproduces video information, the title of the content containing that video information may also be displayed. A reproduction start position setting means 3 sets marks for the video information. Specifically, for the content recorded on the recording medium 10, it sets marks based on mark information input by the user, or the apparatus itself sets marks automatically by detecting changes in the audio information, changes in the video information, and the like. A thumbnail image generation means 4 generates a thumbnail image in association with each mark set by the reproduction start position setting means 3.

This embodiment further comprises: a voice detection means 6 that acquires the audio information corresponding to the position of a thumbnail image generated by the thumbnail image generation means 4, analyzes the acquired audio information, and determines whether it contains audio of a predetermined pattern designated in advance; a thumbnail image processing means 7 that processes the thumbnail image stored in the thumbnail image generation means 4 so as to reduce the clarity of at least a partial region of the thumbnail image corresponding to audio information that the voice detection means 6 has determined to contain audio of the predetermined pattern; and a thumbnail image display means 5 that causes the display means 8 to display, as a list, the processed thumbnail images stored in the thumbnail image generation means 4. By reducing the clarity of thumbnail images whose audio information contains audio of the predetermined pattern in this way, it is possible to prevent a thumbnail image the user does not want seen from being displayed as the original image, while suppressing an increase in the amount of processing.

FIG. 2 is a block diagram showing a specific configuration of the recording/reproducing apparatus shown in FIG. 1. In FIG. 2, the recording medium 10 is a disc on which video files and still image data files can be recorded and reproduced. The apparatus includes a drive device 11 that rotates the recording medium 10 and reads and writes information, and a buffer 12 that holds the data being read or written. As a recording-side component, it includes an encoder unit 13 that encodes the audio and video data taken in through an input means 16, such as a television tuner, and supplies it to the buffer 12. As reproduction-side components, it includes an image decoder unit 14 that receives image data from the buffer 12 and decodes it; a still image data unit 15 that stores the still image data obtained by decoding and capturing in the image decoder unit 14 and, when thumbnail image data has been prepared on the recording medium 10, extracts and stores that thumbnail image data; and an audio decoder unit 22 that receives audio data from the buffer 12 and decodes it.

The apparatus also includes: a voice detection unit 17 that acquires, from the audio decoder unit 22, the audio information corresponding to the position of a thumbnail image and determines whether that audio information contains audio of a predetermined pattern designated in advance by the user; a display device 18 that displays the image data decoded by the image decoder unit 14 and the image data stored in the still image data unit 15; a selection means 19, for example a mouse or a remote control, with which the user selects a mark position based on the video shown on the display device 18; a control means 20, for example a CPU, that controls the operation of the encoder unit 13, the image decoder unit 14, the still image data unit 15, the voice detection unit 17, the display device 18, and the audio decoder unit 22; and a memory 21 in which control information, such as information indicating whether person detection is to be performed, is set.

The operation of the recording/reproducing apparatus configured as described above, specifically the thumbnail image display processing, will now be described with reference to the flowcharts of FIGS. 3 to 5. FIG. 3 shows the thumbnail image display processing, which is carried out mainly by the control means 20, for example a CPU. Here it is assumed that the titles of a plurality of contents recorded on the recording medium 10 are displayed on the display device 18. First, in step S101, the user looks at the titles displayed on the display device 18 and uses the selection means 19 to designate the frame from which a thumbnail image is to be generated. Next, in step S102, the frame designated by the user is interpreted, and the location on the recording medium 10 from which the content corresponding to this frame is to be read is determined. Next, in step S103, the MPEG-2 moving image data recorded at the location determined in step S102 is acquired, via the drive device 11, from the encoded moving image recorded on the recording medium 10. Next, in step S104, the moving image data acquired in step S103 is decoded and captured by the image decoder unit 14, and the thumbnail image data for the frame designated in step S101 is stored in the still image data unit 15. If separate thumbnail image data corresponding to the frame designated in step S101 has been prepared on the recording medium 10, that thumbnail image data is read and stored in the still image data unit 15 without reading the MPEG-2 moving image recorded at the designated frame.

Next, in step S105, it is determined whether the thumbnail image data stored in the still image data unit 15 in step S104 was prepared separately from the moving image data. If step S105 determines that the thumbnail image data was prepared separately from the moving image data, steps S106 and S107 are skipped and processing proceeds to step S108. If step S105 determines that it was not prepared separately, that is, that the thumbnail image data was obtained by decoding the moving image data, step S106 obtains information from the memory 21 and determines whether the apparatus is in the mode in which the whole thumbnail image may be displayed or in the mode in which the clarity of at least a partial region is reduced before display (thumbnail non-display mode). This information is set in the memory 21 by either of the following methods (1) and (2).

(1) The user sets in advance, using the selection means 19, either the mode in which the whole thumbnail image can be displayed or the mode in which the clarity of a predetermined region of the thumbnail image is reduced (thumbnail non-display mode), and that setting is saved in the memory 21.
(2) The viewing restriction information (information that restricts which programs and content can be viewed) recorded on the recording medium to be reproduced or in the content is saved in the memory 21.

As described above, in step S106 the information saved in the memory 21 is obtained and it is determined whether the thumbnail non-display mode is set. If it is determined that the thumbnail non-display mode is set, person detection and processing of the thumbnail image are performed in step S107 using the voice detection unit 17. The details of step S107 are described later. If step S106 determines that the thumbnail non-display mode is not set, processing proceeds to step S108. In step S108, either the processed thumbnail image or the unmodified thumbnail image is displayed on the display device 18.
The above is the general flow of the thumbnail image display processing.
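The branching in steps S105 to S108 can be summarized in a short sketch. This is only an illustration under stated assumptions: the function name, the parameter names, and the `detect_and_process` callback standing in for step S107 are hypothetical and do not appear in the specification.

```python
def choose_thumbnail(thumbnail, prepared_separately, non_display_mode, detect_and_process):
    """Return the image that step S108 would display.

    prepared_separately: outcome of the check in step S105.
    non_display_mode:    outcome of the check in step S106 (flag read from memory 21).
    detect_and_process:  hypothetical callable standing in for step S107.
    """
    if prepared_separately:
        # S105: separately prepared thumbnail data skips S106 and S107.
        return thumbnail
    if not non_display_mode:
        # S106: the whole thumbnail may be displayed unmodified.
        return thumbnail
    # S107: audio-based detection and processing, then display in S108.
    return detect_and_process(thumbnail)
```

Only thumbnails captured from the moving image while the thumbnail non-display mode is set reach the audio-based check, which is consistent with the stated aim of keeping the amount of processing low.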

Next, the person detection and processing of the thumbnail image performed in step S107 will be described with reference to the flowchart of FIG. 4. First, in step S201, several seconds of audio data from a predetermined section near the location from which the thumbnail image was acquired are obtained via the drive device 11. In step S202, the audio data obtained in step S201 is decoded by the audio decoder unit 22. Next, in step S203, the voice detection unit 17 determines whether the audio data decoded in step S202 contains a human voice, which is the predetermined audio pattern designated in advance. If step S203 determines that a human voice is present, image processing is applied in step S204 to the thumbnail image region (x,y)-(x1,y2) (a rectangular region). Processing then moves to step S205, where the processed image is combined with the original image to generate the processed thumbnail image.
The above is the flow of the person detection and processing of the thumbnail image.

An example of a method for acquiring the audio data corresponding to the location from which the thumbnail image was acquired is described below. When the content is an MPEG-2 transport stream, as shown in FIG. 6, the image data, audio data, and other data of the content are stored in a plurality of transport packets of 188 bytes each. Specifically, each transport packet consists of header information and a payload, and one frame of image data or audio data is divided, together with control information, and stored in the payload portions.
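As a rough illustration of the 188-byte transport packet and its header, the following sketch reads a few header fields. The sync byte value `0x47` and the bit positions are taken from the MPEG-2 Systems specification, not from this description, and the function and key names are hypothetical.

```python
TS_PACKET_SIZE = 188   # packet length stated in the description
SYNC_BYTE = 0x47       # assumption: sync byte per the MPEG-2 Systems specification


def parse_ts_header(packet: bytes) -> dict:
    """Extract a few fields from the 4-byte transport packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport packet")
    return {
        # Set when a new PES packet begins in this packet's payload.
        "payload_unit_start": bool(packet[1] & 0x40),
        # 13-bit packet identifier that distinguishes video, audio, and other data.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        # 4-bit counter used to detect lost packets.
        "continuity_counter": packet[3] & 0x0F,
    }
```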

The data formed by combining several payload portions consists of PES (Packetized Elementary Stream) packet data, obtained by dividing an image stream or audio stream into pieces of suitable size and packetizing them, and a PES packet header containing information such as the reproduction time. One item of information in the PES packet header is the time at which the image data or audio data contained in the PES packet data is to be presented (PTS: Presentation Time Stamp).
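For reference, a minimal sketch of reading the PTS from a PES packet header follows. The byte offsets and the 33-bit, five-byte PTS encoding are assumptions taken from the MPEG-2 Systems specification rather than from this description, and `decode_pts` is a hypothetical name.

```python
def decode_pts(pes: bytes):
    """Return the 33-bit PTS from the start of a PES packet, or None if absent."""
    if pes[0:3] != b"\x00\x00\x01":
        raise ValueError("missing PES start code prefix")
    if not pes[7] & 0x80:
        # The PTS flag is not set, so this header carries no PTS.
        return None
    b = pes[9:14]  # five bytes holding the PTS interleaved with marker bits
    return (
        (((b[0] >> 1) & 0x07) << 30)  # PTS bits 32..30
        | (b[1] << 22)                # PTS bits 29..22
        | ((b[2] >> 1) << 15)         # PTS bits 21..15
        | (b[3] << 7)                 # PTS bits 14..7
        | (b[4] >> 1)                 # PTS bits 6..0
    )
```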

To find the audio data corresponding to a thumbnail image, as shown in FIG. 7, the PTS of the image data V (Video Data) that the user designated, or that the device selected automatically, as the thumbnail image is obtained. The audio data A (Audio Data) located after the image data V is then searched, and the audio data A with the same PTS is selected. Because of the difference in the amount of image data and audio data per unit time, audio data A is generally placed after image data V with the same PTS, so only the part of the stream after the designated location needs to be searched. In the figure, SP denotes sub-picture data (Subpicture Data).
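The forward-only search for audio data with the same PTS might look like the sketch below. The representation of the stream as a list of `(kind, pts)` tuples and the function name are hypothetical simplifications for illustration; a real implementation would work on demultiplexed PES packets.

```python
def find_matching_audio(packets, video_index):
    """Return the index of the audio entry whose PTS equals that of the video entry.

    packets: list of (kind, pts) tuples in stream order, where kind is
             "V" (image data), "A" (audio data) or "SP" (sub-picture data).
    Only entries after video_index are examined, as the description suggests.
    """
    kind, target_pts = packets[video_index]
    if kind != "V":
        raise ValueError("video_index must point at image data")
    for i in range(video_index + 1, len(packets)):
        entry_kind, pts = packets[i]
        if entry_kind == "A" and pts == target_pts:
            return i
    return None  # no audio with the same PTS after the video entry
```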

The method of detecting a human voice, used in step S203 to determine whether the audio data contains a human voice, is described next. A human voice is detected using, for example, method (1) or (2) below.
(1) The frequencies of the sampled audio are examined, and a voice is judged to be present when the level within the frequency range of the human voice (about 100 to 3500 Hz) is at or above a certain value. Audio band correction, dynamic range compression, noise cancellation, or the like may be applied before the frequency determination.
(2) In addition, the decoded audio is converted into feature data and speech recognition is performed by comparing it with the features of previously recorded data, in order to recognize whether the audio contains human language. In this case it is not necessary to recognize the individual words; it is sufficient to recognize whether the audio is language.
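Method (1) can be illustrated with a small band-energy check. This is a sketch under assumptions: it uses a naive discrete Fourier transform for clarity, measures the share of spectral power inside the 100 to 3500 Hz band rather than an absolute level, and the 0.5 threshold is a hypothetical value that the description does not specify.

```python
import cmath
import math


def voice_band_ratio(samples, sample_rate, low_hz=100.0, high_hz=3500.0):
    """Return the fraction of spectral power that falls inside the voice band."""
    n = len(samples)
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2):  # skip the DC bin, stay below the Nyquist bin
        freq = k * sample_rate / n
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        power = abs(coeff) ** 2
        total += power
        if low_hz <= freq <= high_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0


def has_voice(samples, sample_rate, threshold=0.5):
    """Hypothetical decision rule: enough of the power lies in the voice band."""
    return voice_band_ratio(samples, sample_rate) >= threshold
```

A naive transform costs on the order of n squared operations, so a practical device would more likely use an FFT or a band-pass filter; the sketch only shows the decision being made.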

Next, the image processing performed in step S204 is described. Step S204 uses image processing that reduces clarity, that is, processing after which the details of the image cannot be made out but its general nature is still vaguely recognizable, so that a viewer can guess roughly what the image shows. Examples of such image processing methods are:
・Blurring
・Mosaic processing

Here, "blurring" is processing in which, for a given pixel, a calculation is performed on the color of that pixel and the colors of its four surrounding pixels (above, below, left, and right), and the result is used as the new color of that pixel.
For example, if the color of the pixel being processed and the colors of the surrounding pixels are mixed at a ratio of 2:1, the color of the pixel (x, y) being processed is given as follows.
・Average of surrounding colors = (color at (x, y+1) + color at (x+1, y) + color at (x, y-1) + color at (x-1, y)) / 4
・Color component at (x, y) = (color component at (x, y) × 2 + average of surrounding color components) / 3
The above calculation is performed for each of the R, G, and B colors.
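The 2:1 blending formula can be sketched as follows. The description does not say how border pixels are handled or whether the calculation reads already-blurred neighbors; this sketch assumes that only existing neighbors are averaged and that all inputs are read from the unmodified image. The function name and the list-of-tuples image format are hypothetical.

```python
def blur(image):
    """Blend each pixel 2:1 with the average of its up/down/left/right neighbors.

    image: list of rows, each row a list of (R, G, B) tuples.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            neighbours = [
                image[ny][nx]
                for nx, ny in ((x, y + 1), (x + 1, y), (x, y - 1), (x - 1, y))
                if 0 <= nx < width and 0 <= ny < height  # assumption: skip missing ones
            ]
            if not neighbours:
                continue  # a 1x1 image has nothing to blend with
            blended = []
            for c in range(3):  # R, G, B handled independently
                average = sum(p[c] for p in neighbours) / len(neighbours)
                blended.append(round((image[y][x][c] * 2 + average) / 3))
            out[y][x] = tuple(blended)
    return out
```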

In the above example the four surrounding pixels were mixed at 2:1, but the calculation is not limited to four pixels; for a given pixel, two pixels in each of the up, down, left, and right directions and one pixel in the diagonal directions may also be used.

"Mosaic processing" is a technique that replaces a given region with the average color of the pixels in that region. For example, for every block of 4 pixels × 4 pixels, the average of the colors in the block is calculated and the whole 4 × 4 block is replaced with that average color. The block does not have to be 4 pixels × 4 pixels; other sizes are also possible.
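A minimal sketch of block-average mosaic processing is shown below. The handling of partial blocks at the right and bottom edges (averaging whatever pixels exist) is an assumption, and the function name and image format are hypothetical.

```python
def mosaic(image, block=4):
    """Replace each block x block tile with the average color of its pixels.

    image: list of rows, each row a list of (R, G, B) tuples.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            # Assumption: tiles at the edges may be smaller than block x block.
            cells = [
                (x, y)
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            average = tuple(
                round(sum(image[y][x][c] for x, y in cells) / len(cells))
                for c in range(3)
            )
            for x, y in cells:
                out[y][x] = average
    return out
```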

Next, the processing that determines the region to which blurring or mosaic processing is applied (hereinafter referred to as the blur region) is described. Either of the following two kinds of region, (1) or (2), can be selected as the blur region.
(1) The entire thumbnail image.
(2) Only part of the thumbnail image.
When only part of the image is processed, there are various ways to decide where that part is. Here, from the viewpoint of keeping the amount of processing low, a method that extracts the skin-colored portion and uses it as the blur region is described as an example, with reference to the flowchart of FIG. 5.

First, in step S301, it is determined whether to blur the entire thumbnail image or only a partial region. If it is determined that the entire thumbnail image is to be blurred, the region of the entire thumbnail image is set as the blur target region in step S306. If it is determined that only part is to be blurred, then in step S302, when the color data of the thumbnail image is expressed in RGB, each pixel is converted to YUV data using the following conversion formulas.
Y = (77*R/256) + (150*G/256) + (29*B/256)
U (Cb) = -(44*R/256) - (87*G/256) + (131*B/256) + 128
V (Cr) = (131*R/256) - (110*G/256) - (21*B/256) + 128
(A value below 0 is set to 0, and a value of 256 or more is set to 255.)
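The conversion formulas translate directly into integer arithmetic. The description does not state the rounding rule, so this sketch assumes that the three weighted terms are summed before a single floor division by 256; `rgb_to_yuv` is a hypothetical name.

```python
def clamp(value):
    """Limit a value to 0..255, as stated in the note under the formulas."""
    return max(0, min(255, value))


def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (0..255 per channel) to (Y, U, V)."""
    # Assumption: sum the weighted terms first, then floor-divide by 256.
    y = (77 * r + 150 * g + 29 * b) // 256
    u = (-44 * r - 87 * g + 131 * b) // 256 + 128
    v = (131 * r - 110 * g - 21 * b) // 256 + 128
    return clamp(y), clamp(u), clamp(v)
```

For pure red, the V formula gives a value above 255, which is where the clamping rule comes into play.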

次に、ノイズとして入ってしまった肌色を削除するため、ステップＳ３０３でサムネイル画像のノイズ検出及び除去処理を行う。ノイズ検出及び除去処理は、画像処理でよく知られる手法を用いる。今回は、一例としてサムネイル画像のエッジを抽出し、エッジとして抽出された各ピクセルに対して、その各対象ピクセルと、その上下及び左右のピクセルに対して、上下及び左右どちらに色のつながりがあるかを調べ、よりつながりがある方向に対して、対象ピクセルと平均化してノイズを除去する方法である。もちろんこれ以外の手法を採用することも可能である。次に、ステップＳ３０４で、各ピクセルのYUVピクセルデータが肌色と判定されている閾値内に入っているか否かを調べる。肌色の閾値は上記YUVのデータに対して、
64≦Y≦240, 94≦U≦125、131≦V≦254
とする。なおこの閾値はサムネイル画像の画像条件などにより変更することも可能である。 Next, in step S303, noise detection and removal processing is performed on the thumbnail image in order to eliminate skin color that has entered the image as noise. A technique well known in image processing is used for this. In this example, the edges of the thumbnail image are extracted, and for each pixel extracted as an edge, the target pixel is compared with the pixels above, below, left, and right of it to determine whether the color continuity is stronger in the vertical or the horizontal direction; the target pixel is then averaged along the direction of stronger continuity to remove the noise. Other methods can of course be employed. Next, in step S304, it is checked whether the YUV data of each pixel falls within the thresholds used to identify skin color. For the YUV data above, the skin color thresholds are
64 ≦ Y ≦ 240, 94 ≦ U ≦ 125, 131 ≦ V ≦ 254
These thresholds can be changed according to, for example, the image conditions of the thumbnail image.
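
The threshold test of step S304 can be sketched as below. This is an illustrative sketch only: the names `is_skin`, `skin_mask`, and the `SKIN_*` constants are hypothetical, pixels are assumed to be `(Y, U, V)` tuples of 8-bit integers, and the ranges are passed as parameters because the text says they may be changed.

```python
# Inclusive threshold ranges taken from the text, as (low, high) pairs.
SKIN_Y = (64, 240)
SKIN_U = (94, 125)
SKIN_V = (131, 254)


def is_skin(y, u, v, y_range=SKIN_Y, u_range=SKIN_U, v_range=SKIN_V):
    """Return True when a YUV pixel lies inside all three threshold ranges."""
    return (y_range[0] <= y <= y_range[1]
            and u_range[0] <= u <= u_range[1]
            and v_range[0] <= v <= v_range[1])


def skin_mask(yuv_rows):
    """Map a 2-D list of (Y, U, V) pixels to a 2-D list of booleans."""
    return [[is_skin(*pixel) for pixel in row] for row in yuv_rows]
```

A pixel is flagged only when all three components are in range, so a pixel with a valid Y and V but a U of 126 is not treated as skin color.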

次に、ステップＳ３０５で、上記ステップＳ３０２からＳ３０４において処理され、肌色として判定された領域を(x,y)-(x1,y2)と設定する。なお、この領域は複数あることもある。最後に、ステップＳ３０５又はステップＳ３０６において設定された領域(x,y)-(x1,y2)に対して、ステップＳ３０７でぼかし処理を行う。 Next, in step S305, the region determined to be skin color through the processing of steps S302 to S304 is set as (x, y)-(x1, y2). There may be more than one such region. Finally, in step S307, the blurring process is applied to the region (x, y)-(x1, y2) set in step S305 or step S306.
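
Steps S305 and S307 can be sketched as follows. Several points are assumptions made for illustration: the function names are hypothetical, the region is returned as the top-left and bottom-right corners of a single rectangle enclosing all flagged pixels (the text allows several regions), the image is a grayscale 2-D list, and a simple box blur stands in for the blurring process, whose method is not described in this passage.

```python
def bounding_region(mask):
    """Return (x0, y0, x1, y1) enclosing every True cell in a 2-D mask, or None.

    Simplification: one box for all flagged pixels; the text allows several.
    """
    coords = [(x, y) for y, row in enumerate(mask)
              for x, flagged in enumerate(row) if flagged]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def box_blur_region(image, region, radius=1):
    """Blur a rectangular region of a grayscale image (2-D list of ints).

    Each pixel in the region becomes the integer mean of its
    (2*radius+1) x (2*radius+1) neighborhood, clipped at the image border.
    Assumption: a box blur stands in for the unspecified blurring process.
    """
    x0, y0, x1, y1 = region
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]  # read from the original, write to a copy
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            total = count = 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[ny][nx]
                    count += 1
            result[y][x] = total // count
    return result
```

Blurring the whole thumbnail (the step S306 case) corresponds to passing the region `(0, 0, width - 1, height - 1)`.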

以上の説明によって明らかなように、本実施の形態によれば、サムネイル画像を表示する際に、処理量の増大を抑制しつつ、見られたくないサムネイル画像を原画像のまま表示してしまうことを防止することができる。この結果、他人や子供に個人情報に関する画像やアダルト画像などの情報を隠蔽し、安心してサムネイル画像一覧を見せることができる。 As is apparent from the above description, according to the present embodiment, when thumbnail images are displayed, a thumbnail image that the user does not want others to see can be prevented from being displayed as the original image, while an increase in the processing amount is suppressed. As a result, information such as images related to personal information and adult images can be concealed from other people and children, and the thumbnail image list can be shown to them with peace of mind.

なお、上記の実施の形態では人間を対象としたが、人間の表示に限らず、コンテンツオーナーが明瞭度を低下させることを望むサムネイル画像がある場合、そのサムネイル画像を示す所定パターンの音声があるかどうかで判定し、その結果に基づきサムネイル画像を加工して表示するようにすることも可能である。例えば、犬のサムネイル画像を表示したくない場合は、犬の声があるかどうかで判定し、サムネイル画像を原画像のまま表示しないようにするなどである。 Although the above embodiment targets humans, the invention is not limited to the display of humans. When there is a thumbnail image whose clarity the content owner wishes to reduce, it is also possible to determine whether a sound of a predetermined pattern indicating that thumbnail image is present, and to process and display the thumbnail image based on the result. For example, when thumbnail images of dogs should not be displayed, the determination is based on whether a dog's voice is present, and the thumbnail image is not displayed as the original image.
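
The generalization above amounts to making the sound pattern a configurable input to the same decision flow. A minimal sketch under stated assumptions: `process_thumbnails` is a hypothetical name, `detect_pattern` and `obscure` are hypothetical callables supplied by the caller (the text does not specify how a sound pattern is matched), and thumbnails and audio sections are treated as opaque values.

```python
def process_thumbnails(entries, detect_pattern, obscure):
    """Return thumbnails, obscuring those whose audio section matches.

    entries: iterable of (thumbnail, audio_section) pairs, one per
        reproduction start position.
    detect_pattern: hypothetical callable, audio_section -> bool.
    obscure: hypothetical callable, thumbnail -> processed thumbnail.
    """
    processed = []
    for thumbnail, audio_section in entries:
        if detect_pattern(audio_section):
            processed.append(obscure(thumbnail))  # reduce clarity before display
        else:
            processed.append(thumbnail)           # keep the original image
    return processed
```

Swapping in a different `detect_pattern` (for instance, one that responds to a dog's voice rather than a human voice) changes which thumbnails are obscured without touching the rest of the flow.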

本発明に係る記録再生装置の一実施の形態の構成を示す機能ブロック図である。 A functional block diagram showing the configuration of one embodiment of the recording/reproducing apparatus according to the present invention.
図１に示した記録再生装置の具体的な構成を示すブロック図である。 A block diagram showing the specific configuration of the recording/reproducing apparatus shown in FIG. 1.
図２に示した記録再生装置の動作を説明するために、サムネイル画像の表示処理の大まかな流れを示したフローチャートである。 A flowchart showing the general flow of thumbnail image display processing, for explaining the operation of the recording/reproducing apparatus shown in FIG. 2.
図２に示した記録再生装置の動作を説明するために、サムネイル画像の人物検出及び加工処理の流れを示すフローチャートである。 A flowchart showing the flow of person detection and processing of a thumbnail image, for explaining the operation of the recording/reproducing apparatus shown in FIG. 2.
図２に示した記録再生装置の動作を説明するために、ぼかし領域検出処理の流れを示すフローチャートである。 A flowchart showing the flow of blur region detection processing, for explaining the operation of the recording/reproducing apparatus shown in FIG. 2.
図２に示した記録再生装置の動作を説明するために、ＭＰＥＧ２トランスポートストリームの構成を示した図である。 A diagram showing the structure of an MPEG2 transport stream, for explaining the operation of the recording/reproducing apparatus shown in FIG. 2.
図２に示した記録再生装置の動作を説明するために、画像データと音声データの位置の関係を示した図である。 A diagram showing the positional relationship between image data and audio data, for explaining the operation of the recording/reproducing apparatus shown in FIG. 2.

符号の説明 Explanation of symbols

１０記録媒体
１１駆動装置
１２バッファ
１３エンコーダ部
１４画像デコーダ部
１５静止画データ部
１６入力手段
１７音声検出部
１８表示デバイス
１９選択手段
２０制御手段
２１メモリ
２２音声デコーダ部

10 Recording medium
11 Drive device
12 Buffer
13 Encoder unit
14 Image decoder unit
15 Still image data unit
16 Input means
17 Voice detection unit
18 Display device
19 Selection means
20 Control means
21 Memory
22 Audio decoder unit

Claims

映像情報と、この映像情報に関連する音声情報とを対応付けて記録媒体に記録する映像音声記録手段と、
前記記録媒体に記録された前記映像情報及び前記音声情報を再生する映像音声再生手段と、
前記映像音声再生手段によって再生された前記映像情報を表示するための表示手段と、
前記映像音声再生手段による前記映像情報及び前記音声情報の再生開始位置を設定する再生開始位置設定手段と、
前記再生開始位置設定手段によって設定された前記再生開始位置ごとに、それぞれの位置に対応するサムネイル画像を生成し、生成された前記サムネイル画像を記憶するサムネイル画像生成手段と、
前記サムネイル画像生成手段によって生成された前記サムネイル画像に対応した前記再生開始位置に対応した所定区間に記録されている音声情報を取得し、取得された前記音声情報を解析し、その中にあらかじめ指定された所定パターンの音声が有るか否かを判定する音声検出手段と、
前記音声検出手段によって前記所定パターンの音声が有ると判定された前記音声情報に対応する前記サムネイル画像の少なくとも一部の領域の明瞭度を低下させるように、前記サムネイル画像生成手段に記憶された前記サムネイル画像を加工して前記サムネイル画像生成手段に記憶し直すサムネイル画像加工手段と、
前記サムネイル画像生成手段に記憶された前記サムネイル画像を前記表示手段に表示させるサムネイル画像表示手段とを、
備えた記録再生装置。 A recording/reproducing apparatus comprising:
video/audio recording means for recording video information and audio information related to the video information in association with each other on a recording medium;
video/audio reproduction means for reproducing the video information and the audio information recorded on the recording medium;
display means for displaying the video information reproduced by the video/audio reproduction means;
reproduction start position setting means for setting a reproduction start position of the video information and the audio information for the video/audio reproduction means;
thumbnail image generation means for generating, for each reproduction start position set by the reproduction start position setting means, a thumbnail image corresponding to that position, and for storing the generated thumbnail image;
voice detection means for acquiring the audio information recorded in a predetermined section corresponding to the reproduction start position corresponding to the thumbnail image generated by the thumbnail image generation means, analyzing the acquired audio information, and determining whether or not a sound of a predetermined pattern designated in advance is present in it;
thumbnail image processing means for processing the thumbnail image stored in the thumbnail image generation means so as to reduce the clarity of at least a partial region of the thumbnail image corresponding to the audio information determined by the voice detection means to contain the sound of the predetermined pattern, and for storing the processed thumbnail image back in the thumbnail image generation means; and
thumbnail image display means for causing the display means to display the thumbnail image stored in the thumbnail image generation means.