JP2016119577A

JP2016119577A - Digest generation device, digest generation method, and program

Info

Publication number: JP2016119577A
Application number: JP2014258334A
Authority: JP
Inventors: Minoru Ito (伊藤 稔)
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2014-12-22
Filing date: 2014-12-22
Publication date: 2016-06-30

Abstract

PROBLEM TO BE SOLVED: To provide a digest generation device, a digest generation method, and a program that detect highlight scenes simply and at high speed and generate a digest video from them.
SOLUTION: A digest generation device 1 includes a digest video generation part 110 that calculates the ratio of a frame image occupied by a hue corresponding to the color of the ground surface of a stadium, determines on the basis of the ratio whether the frame image is a zoom shot or a shot other than a zoom shot, and uses a frame image determined to be a zoom shot as a candidate for a highlight scene constituting the digest video.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a digest generation device, a digest generation method, and a program.

With the growing number of television channels and of video distribution services delivered over high-speed, high-capacity Internet connections, opportunities to view video content both inside and outside the home have increased. As one way to make effective use of personal disposable time, it has become common to watch a digest edited from only the highlight scenes of a video.
For example, Patent Document 1 describes an information processing apparatus that extracts and plays back only the periods in which a person of interest, among the persons appearing in video content, appears on screen.

Patent Document 1: JP 2008-283486 A

In general, using more frame images improves the accuracy of highlight scene detection. However, when video is broadcast in, for example, the 1080i format, the frame frequency is 29.97 Hz, so little time can be spent on each frame to detect a player. Therefore, to detect highlight scenes and generate a digest video without delay during recording or viewing, the computation must be as simple and as fast as possible.
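As a rough illustration of the time budget implied by a 29.97 Hz frame frequency, the per-frame interval can be computed directly. The function name is hypothetical, and the sketch only restates the arithmetic; it is not part of the disclosed device.

```python
def frame_budget_ms(frame_rate_hz: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / frame_rate_hz


# At 29.97 Hz, all per-frame processing has to finish in roughly 33 ms
# if detection is to keep pace with recording or viewing.
budget = frame_budget_ms(29.97)
print(f"{budget:.2f} ms per frame")
```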

The present invention has been made to solve this problem, and its object is to provide a digest generation device, a digest generation method, and a program that can detect highlight scenes simply and at high speed and generate a digest video from them.

A digest generation device according to the present invention includes a digest video generation unit that calculates the ratio of a frame image occupied by a hue corresponding to the color of the ground surface of a stadium, determines on the basis of the ratio whether the frame image is a zoom shot or a shot other than a zoom shot, and uses a frame image determined to be a zoom shot as a candidate for a highlight scene constituting a digest video.
With this configuration, highlight scenes can be detected simply and at high speed and generated as a digest video.

A digest generation method according to the present invention includes a step in which a digest generation device calculates the ratio of a frame image occupied by a hue corresponding to the color of the ground surface of a stadium, a step of determining on the basis of the ratio whether the frame image is a zoom shot or a shot other than a zoom shot, and a step of using a frame image determined to be a zoom shot as a candidate for a highlight scene constituting a digest video.
With this configuration, highlight scenes can be detected simply and at high speed and generated as a digest video.

According to the present invention, it is possible to provide a digest generation device, a digest generation method, and a program that detect highlight scenes simply and at high speed and generate a digest video from them.

FIG. 1 is a diagram showing a schematic configuration of the digest generation device 1 according to Embodiment 1.
FIG. 2 is a flowchart showing the processing procedure of the digest generation method according to Embodiment 1.
FIG. 3 is an example of an acquired frame image according to Embodiment 1.
FIG. 4 is an example of an image after simple binarization and contour tracking according to Embodiment 1.
FIG. 5 is an example of an image divided into 4x3 regions according to Embodiment 1.
FIG. 6 is an example of a masked image according to Embodiment 1.
FIG. 7 is an example of an image segmented as a player according to Embodiment 1.
FIG. 8 is an example of another masked image according to Embodiment 1.
FIG. 9 is an example of another binarized image according to Embodiment 1.
FIG. 10 is an example of another image segmented as a player according to Embodiment 1.
FIG. 11 is a diagram showing the result of projecting, for each specific hue, the pixels belonging to that hue onto the Y axis for an image according to Embodiment 2.
FIG. 12 is a diagram showing the result of projecting, for each specific hue, the pixels belonging to that hue onto the X axis for an image according to Embodiment 2.

(Embodiment 1)
Hereinafter, the digest generation device and the digest generation method according to Embodiment 1 will be described with reference to the drawings.
The digest generation device and the digest generation method according to Embodiment 1 detect highlight scenes focusing on an arbitrary player in sports coverage delivered by television broadcast or video distribution, in particular soccer match video, and automatically generate a digest video.

First, the configuration of the digest generation device according to Embodiment 1 will be described.
FIG. 1 is a diagram showing a schematic configuration of the digest generation device 1 according to Embodiment 1.
The digest generation device 1 includes a tuner 10, a demultiplexer 20, an SI/PSI unit 30, a video decoder 40, an audio decoder 50, a caption decoder 60, a frame buffer 70, a display 80, a speaker 90, a database 100, a digest video generation unit 110, and the like.

The tuner 10 selects a channel by frequency selection processing on the RF (Radio Frequency) signal of a digital television broadcast of soccer match video input to the digest generation device 1, and outputs a transport stream (TS).
The demultiplexer 20 separates the transport stream into video and audio elementary streams (ES).
The demultiplexer 20 also separates program information (SI/PSI) from the transport stream.

The SI/PSI unit 30 receives and analyzes the program information.
The video decoder 40, the audio decoder 50, and the caption decoder 60 each decode an elementary stream and output video data, audio data, and caption data, respectively.
The display 80 receives the video data and/or caption data via the frame buffer 70 and displays the video and/or captions.
The speaker 90 receives the audio data and outputs sound.

The database 100 holds information such as soccer players' uniform numbers, names, and teams, and also acquires new information from outside the digest generation device 1 via WiFi (registered trademark) or ETHERNET (registered trademark).
The digest video generation unit 110 acquires the video data decoded by the video decoder 40 via the frame buffer 70, determines whether a highlight scene is present, and then generates a digest video.

Each component implemented by the digest video generation unit 110 can be realized, for example, by executing a program under the control of an arithmetic unit (not shown) included in the digest video generation unit 110, which is a computer.
More specifically, the digest video generation unit 110 loads a program stored in a storage unit (not shown) into a main storage device (not shown) and executes the program under the control of the arithmetic unit. Each component is not limited to being realized by software in the form of a program, and may be realized by any combination of hardware, firmware, and software.

The program described above can be stored on various types of non-transitory computer readable media and supplied to a computer. Non-transitory computer readable media include various types of tangible storage media.
Examples of non-transitory computer readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (random access memory)).

The program may also be supplied to a computer by various types of transitory computer readable media. Examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer readable medium can supply the program to a computer via a wired communication path, such as an electric wire or an optical fiber, or via a wireless communication path.
The configuration of the digest generation device 1 has been described briefly above. For details of the components that receive and display digital television broadcasts, see, for example, Japanese Unexamined Patent Application Publication No. 2010-154258.

Next, the digest generation method according to Embodiment 1 will be described.
Which scenes are enjoyable as highlight scenes depends on the individual viewer. Shots on goal and scoring scenes are typical highlight scenes, but in the digest generation method according to Embodiment 1, a scene in which a player designated by the viewer appears large in the frame is treated as a highlight scene.

In a soccer match, more than 70% of the video consists of overhead views of the field (field shots), about 20% consists of zoomed-in views of players (zoom shots), and a little under 10% shows spectators, managers, and the like. A zoomed-in view of a player is often shown when an attention-grabbing event occurs, such as when the player makes a good play or, conversely, makes a mistake.
Replay scenes inserted during a soccer match are also often composed of zoomed-in views. Therefore, a video composed only of scenes in which the player the viewer wants to see appears in a zoom shot works as a digest.

FIG. 2 is a flowchart showing the processing procedure of the digest generation method according to Embodiment 1.
Here, the digest video generation unit 110 is mainly responsible for each process: it detects that a player is present in an acquired frame image and uses the detection result to generate a digest video.
First, when the digest generation process starts, one frame is acquired from the frame buffer (step S10), and the acquired frame is reduced in size (step S20).

FIG. 3 is an example of an acquired frame image according to Embodiment 1. In the actual color image, the player on the left wears a red shirt, white shorts, and white socks, and the player on the right wears a white shirt, navy blue shorts, orange socks, and yellow shoes. The ground is grass with white lines drawn on it, and the fence has white lettering on a blue background. The ball is white, and the spectator seats are mostly reddish colors.

Next, as preprocessing before player detection, it is determined whether the acquired frame is a field shot or a zoom shot. If the proportion of the grass hue in the lower part of the acquired frame is greater than a threshold, the frame is determined to be a field shot; if it is less than the threshold, the frame is determined to be a zoom shot.

Specifically, the color space of the frame is converted to HSV (step S30), a hue-value (H-V) histogram of the lower third of the frame is calculated (step S40), and the ratio of grass green is calculated (step S50).
It is then determined whether the green ratio is smaller than a threshold (step S60). When the green ratio is larger than the threshold (No in step S60), the acquired frame is determined to be a field shot or another shot that is not a zoom shot, and the process returns to step S10. When the green ratio is smaller than the threshold (Yes in step S60), the acquired frame is determined to be a zoom shot, and the process proceeds to detecting a player in the frame.
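Steps S30 to S60 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the frame is assumed to be a list of rows of (R, G, B) tuples with 0-255 channels, and the green hue range (60 to 180 degrees), the minimum value, and the 0.5 threshold are assumed figures that the text does not specify. The function names are hypothetical.

```python
import colorsys


def green_ratio_lower_third(frame, hue_range=(60.0, 180.0), min_value=0.2):
    """Fraction of pixels in the lower third of the frame whose hue is 'grass green'.

    hue_range (degrees) and min_value are assumed figures for illustration.
    """
    height = len(frame)
    lower = frame[height - max(1, height // 3):]  # bottom third of the rows
    total = green = 0
    for row in lower:
        for r, g, b in row:
            # S30: convert the pixel to HSV (all components in 0..1).
            h, _s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            total += 1
            # S40/S50: count pixels bright enough and inside the green hue range.
            if v >= min_value and hue_range[0] <= h * 360.0 < hue_range[1]:
                green += 1
    return green / total if total else 0.0


def is_zoom_shot(frame, threshold=0.5):
    """S60: a frame counts as a zoom shot when the green ratio is below the threshold."""
    return green_ratio_lower_third(frame) < threshold
```

A frame whose lower third is mostly grass would be skipped, and only frames that pass `is_zoom_shot` would go on to player detection.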

Player detection exploits the fact that, in a zoom shot, the player appears in the center of the screen. Objects in the image are found by contour tracking on a binary image, and regions that may be players are selected using a constraint condition. An example constraint condition is that the bounding rectangle of the obtained contour is vertically long. Most players standing on the field satisfy this constraint.
With this constraint condition, the accuracy of binarization strongly affects the accuracy of player detection. With simple binarization, a player may not separate cleanly from the advertising board behind or from a white line extending horizontally.

FIG. 4 is an example of an image after simple binarization and contour tracking according to Embodiment 1.
Here the bounding rectangle of the contour obtained by contour tracking is a horizontally long region. It therefore does not satisfy the constraint condition above, and the player may not be detected.
The digest generation method according to Embodiment 1 may use the simple binarization described above, but to further improve player detection performance, it binarizes based on the proportions of the hues contained in the central region of the frame.

Specifically, the acquired frame image is divided into an arbitrary MxN grid of regions, here 4x3.
FIG. 5 is an example of an image divided into 4x3 regions according to Embodiment 1.
Then, a hue-value (H-V) histogram is calculated for the two central regions, and the proportion of each hue contained in those two regions is calculated (step S70). That is, taking hue as a value from 0 to 360 degrees, the hue range is divided into six bins of 60 degrees each, and the share of the two central regions that falls in each of the six bins is determined.
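Step S70 can be sketched as below, assuming the 4x3 grid means four columns by three rows so that the two central regions are the two middle cells of the middle row; the text does not state this explicitly. The frame layout (rows of (R, G, B) tuples) and the function names are also assumptions. The sketch bins on hue only and ignores the value axis of the H-V histogram, so achromatic pixels (white, gray, black) fall into bin 0.

```python
import colorsys


def hue_bin(pixel, bins=6):
    """Index (0..bins-1) of the 60-degree hue bin for an (R, G, B) pixel."""
    r, g, b = pixel
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return min(int(h * bins), bins - 1)


def central_hue_shares(frame, cols=4, rows=3, bins=6):
    """Share of each hue bin inside the two central cells of a cols x rows grid."""
    height, width = len(frame), len(frame[0])
    mid_row = rows // 2
    top = mid_row * height // rows
    bottom = (mid_row + 1) * height // rows
    left = (cols // 2 - 1) * width // cols   # left edge of the two middle columns
    right = (cols // 2 + 1) * width // cols  # right edge of the two middle columns
    counts = [0] * bins
    for row in frame[top:bottom]:
        for pixel in row[left:right]:
            counts[hue_bin(pixel, bins)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins
```

The returned list has one entry per 60-degree bin, so its largest entry identifies the hue that dominates the center of the frame.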

Next, the hue with the highest proportion (usage ratio) among the six hues is detected (step S80), and everything other than the detected hue is masked (step S90). In the example image shown in FIG. 5, the hue with the highest proportion in the two central regions includes the color of the left player's uniform, so the image is masked so as to keep that hue.
FIG. 6 is an example of a masked image according to Embodiment 1.
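The hue selection and masking of steps S80 and S90 might look like the following sketch. It takes the per-bin shares as input and returns a 0/1 mask directly, which folds the masking and the later binarization into one step for brevity. The `rank` parameter is a hypothetical addition so that the same routine can select the second most frequent hue as well; the frame layout and function names are assumptions.

```python
import colorsys


def hue_bin(pixel, bins=6):
    """Index (0..bins-1) of the 60-degree hue bin for an (R, G, B) pixel."""
    r, g, b = pixel
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return min(int(h * bins), bins - 1)


def mask_by_ranked_hue(frame, shares, rank=0):
    """Binary mask keeping only pixels in the rank-th most frequent hue bin.

    shares: per-bin proportions measured in the central regions (step S70).
    rank=0 keeps the most frequent hue (S80), rank=1 the second most frequent.
    """
    order = sorted(range(len(shares)), key=lambda i: shares[i], reverse=True)
    keep = order[rank]
    # S90: every pixel outside the selected hue bin is masked out (0).
    return [[1 if hue_bin(p, len(shares)) == keep else 0 for p in row]
            for row in frame]
```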

Then, the masked image is binarized (step S100), contours are detected (step S110), and regions satisfying the constraint condition are segmented as players (step S120).
FIG. 7 is an example of an image segmented as a player according to Embodiment 1.
Each rectangle is the bounding region of a contour. Under the constraint condition above, the contour detected as a player is the rectangle covering the left player. The other rectangles contain contours that fail the constraint condition and are not players.
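The bounding rectangles and the vertically-long constraint (steps S110 and S120) can be illustrated as below. Note the substitution: instead of contour tracking, this sketch uses 4-connected component labeling, which yields the same bounding rectangle for each object. The `min_aspect` ratio of 1.5 is an assumed value; the text only requires the rectangle to be vertically long.

```python
from collections import deque


def bounding_boxes(mask):
    """Bounding boxes (x, y, width, height) of 4-connected foreground regions."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected region.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes


def player_candidates(mask, min_aspect=1.5):
    """Keep only boxes that are vertically long (height >= min_aspect * width)."""
    return [b for b in bounding_boxes(mask) if b[3] >= min_aspect * b[2]]
```

A horizontal object such as a white line or a board produces a wide box and is filtered out, while a standing player produces a tall box and is kept.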

To detect the other player, everything other than the hue with the second highest proportion in the two central regions is masked so as to keep that hue; the result is binarized, contours are found, and regions satisfying the constraint condition are segmented as players (steps S80 to S120).
FIGS. 8, 9, and 10 are, respectively, an example of another masked image, another binarized image, and another image segmented as a player according to Embodiment 1.

Then, it is determined whether a player was detected by the processing so far (step S130). When no player was detected (No in step S130), the process returns to step S10. When a player was detected (Yes in step S130), the process proceeds to identifying the detected player.
To identify the player, the rectangular image segmented in step S120 is binarized (step S140), the binarized image is eroded and dilated (step S150) and labeled (step S160), and character recognition (OCR) is performed (step S170).
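The erosion and dilation of step S150 can be sketched as a morphological opening. The assumptions here are a 3x3 square structuring element, erosion applied before dilation, and pixels outside the image treated as background; none of these details are given in the text, and the function names are hypothetical.

```python
NEIGHBORHOOD = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def erode(mask):
    """A pixel stays 1 only if its whole 3x3 neighborhood is 1 (borders count as 0)."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in NEIGHBORHOOD))
             for x in range(w)] for y in range(h)]


def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3x3 neighborhood is 1."""
    h, w = len(mask), len(mask[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in NEIGHBORHOOD))
             for x in range(w)] for y in range(h)]


def open_mask(mask):
    """Erosion followed by dilation: removes specks smaller than the 3x3 element."""
    return dilate(erode(mask))
```

Applied to the binarized player rectangle, this removes isolated noise pixels before labeling and character recognition while leaving larger strokes, such as digits, roughly intact.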

Then, it is determined whether the character recognition result is a number (step S180). When it is not a number (No in step S180), the process returns to step S10. When it is a number (Yes in step S180), the reliability of the recognition result, that is, the confidence that it is a number, is calculated (step S190).

The recognized number is then taken to be a uniform number: the database is consulted (step S200), the name of the player associated with that uniform number is obtained from the database (step S210), and the frame image acquired in step S10 is registered as a highlight scene for that player (step S220).
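Steps S180 and S200 to S220 amount to a number check, a lookup, and a registration. The sketch below uses an in-memory dictionary as a stand-in for the database 100; the roster contents, the function name, and the use of frame indices to represent registered frames are all hypothetical. Distinguishing two teams that share a uniform number is not covered here.

```python
from collections import defaultdict

# Hypothetical roster: uniform number -> player name.
ROSTER = {10: "Player X", 7: "Player Y"}


def register_highlight(ocr_text, frame_index, roster, highlights):
    """Register frame_index as a highlight for the player with the recognized number.

    Returns the player name, or None when the OCR result is not a number (S180)
    or the number is not in the roster.
    """
    if not ocr_text.isdecimal():              # S180: is the OCR result a number?
        return None
    name = roster.get(int(ocr_text))          # S200/S210: look up the player name
    if name is None:
        return None
    highlights[name].append(frame_index)      # S220: register the frame
    return name


highlights = defaultdict(list)
register_highlight("10", 120, ROSTER, highlights)
register_highlight("10", 121, ROSTER, highlights)
```

After all frames are processed, `highlights` maps each player name to the frames in which that player was recognized, from which a per-player digest could be assembled.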

Then, it is determined whether all frames of the video have been processed (step S230). When not all frames have been processed (No in step S230), the process returns to step S10. When all frames have been processed (Yes in step S230), the digest video is complete and the process ends.

In the digest generation device and digest generation method according to Embodiment 1, whether a frame is a field shot or a zoom shot is determined from the proportion of the grass hue in soccer match video, but the hue used for this determination naturally differs by sport. In general, the shot may be determined using the hue of the ground or surface of the playing field (the area, excluding spectator seats and the like, within which players move during play), for example the white of a boxing ring or the brown of a tennis clay court.

In the digest generation device or digest generation method according to Embodiment 1, when identifying a player, image recognition of a face near the rectangular region detected as the player may be used instead of character recognition of the uniform number.
Also, color information for the surface of the playing field may be registered in the database, and whether a frame is a zoom shot or a field shot may be determined based on that color information. This can improve the accuracy of shot determination.

Furthermore, color information for the surface of the playing field according to the weather (sunny, cloudy, rain, snow, and so on) may be registered in the database, and whether a frame is a zoom shot or a field shot may be determined based on that color information. For example, if it is snowing during the match, shots may be determined using the white of snow instead of the green of grass.
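One way to picture weather-dependent surface colors is a lookup table keyed by sport and weather. Everything in this sketch is assumed for illustration: the table layout, the HSV ranges, the fallback to the sunny entry, and the function names. Because white has no meaningful hue, the snow entry matches on low saturation and high value rather than on a hue range.

```python
import colorsys

# Hypothetical table: (sport, weather) -> HSV ranges for the playing surface.
SURFACE_COLORS = {
    ("soccer", "sunny"): {"hue": (60.0, 180.0), "saturation": (0.2, 1.0), "value": (0.2, 1.0)},
    ("soccer", "snow"): {"hue": None, "saturation": (0.0, 0.15), "value": (0.8, 1.0)},
}


def surface_spec(sport, weather, table=SURFACE_COLORS, fallback="sunny"):
    """Color spec for the given sport and weather, falling back to the sunny entry."""
    return table.get((sport, weather), table[(sport, fallback)])


def matches_surface(pixel, spec):
    """True if the (R, G, B) pixel lies inside the HSV ranges of the spec."""
    r, g, b = pixel
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if spec["hue"] is not None:
        lo, hi = spec["hue"]
        if not lo <= h * 360.0 < hi:
            return False
    s_lo, s_hi = spec["saturation"]
    v_lo, v_hi = spec["value"]
    return s_lo <= s <= s_hi and v_lo <= v <= v_hi
```

The shot determination would then count pixels for which `matches_surface` is true instead of counting a fixed green hue.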

Also, color information for the uniforms worn by players may be registered in the database, and players may be segmented based on that color information. For example, registering the color information of home and away uniforms in the database in advance can improve the accuracy of player segmentation. Moreover, if the uniform color information is known, the proportions of the hues in the central region of the frame image need not necessarily be used.

Also, although the uniform color is used to detect players in the digest generation device or digest generation method according to Embodiment 1, a color other than the uniform color, such as the color of a boxer's gloves, may be used.

As described above, the digest generation device 1 according to Embodiment 1 includes the digest video generation unit 110, which calculates the ratio of a frame image occupied by a hue corresponding to the color of the ground surface of a stadium, determines on the basis of the ratio whether the frame image is a zoom shot or a shot other than a zoom shot, and uses a frame image determined to be a zoom shot as a candidate for a highlight scene constituting the digest video.

In the digest generation device 1 according to Embodiment 1, the digest video generation unit 110 preferably detects a player in a frame image determined to be a zoom shot, recognizes the detected player's uniform number, and generates highlight scenes for each detected player.
In addition, the digest video generation unit 110 preferably determines the proportions of the hues in the central region of the frame image when detecting a player.

(Embodiment 2)
The digest generation device and the digest generation method according to Embodiment 2 likewise detect highlight scenes in soccer match video focusing on an arbitrary player and automatically generate a digest video.

The digest generation device and digest generation method according to Embodiment 2 are characterized by the processing procedure for detecting a player in an image determined to be a zoom shot. Accordingly, the schematic configuration of the digest generation device 1, the processing procedure up to determining that an image acquired from the frame buffer is a zoom shot (steps S10 to S60), and the processing procedure from determining whether a player was detected through creating the digest video (steps S130 to S220) may be the same as in Embodiment 1, and their description is omitted here.

In the digest generation method according to Embodiment 2, for an image obtained from the frame buffer, the pixels belonging to each specific hue are projected onto the X axis and the Y axis, and a player is identified based on the result.
FIG. 11 shows the result of projecting, for each specific hue, the pixels belonging to that hue onto the Y axis for an image according to Embodiment 2. The vertical axis is the pixel position in the Y direction of the input image, that is, along the image height. The horizontal axis is the number of pixels belonging to the specific hue at each Y coordinate, that is, the number of such pixels on each horizontal line of the image. Here, the hues are divided into six series, and series 1 to 4, which account for the most pixels in the image, are plotted.

The hue of series 1 occupies most of the upper half of the input frame image, and the hue of series 2 occupies most of the region from the center of the image down to about the lower quarter. In other words, the pixels of the series 1 and series 2 hues are distributed widely in the horizontal direction.
In contrast, the hue of series 4 is present around the vertical center of the image, but in a lower proportion than the hues of series 1 and 2. However, because it extends over a wide range in the Y direction, its distribution is vertically elongated.

FIG. 12 shows the result of projecting, for each specific hue, the pixels belonging to that hue onto the X axis for an image according to the second embodiment. The horizontal axis is the X coordinate of the input image and spans the image width in pixels. The vertical axis is the number of pixels belonging to a specific hue at each X coordinate, that is, the number of such pixels on a given vertical line of the image.
It can be seen that the pixels of the series 1 and series 2 hues are spread horizontally from the left and right edges of the screen toward the center. The series 4 hue, on the other hand, peaks at the center of the image, which shows that the pixels of that hue are distributed vertically at the center.

From the above, the pixels of the series 1 and series 2 hues are spread horizontally across the whole image, whereas the pixels of the series 4 hue are distributed vertically at the center of the screen. If the constraint placed on a player in the image is that the player appears vertically elongated, the series 4 pixels are highly likely to correspond to a player located at the center of the screen, and that region can be segmented as the player.
Processing then continues, for example with the determination of whether a player has been detected (step S130).
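The selection of the player's hue from the two projections can be sketched as follows. The source only states that the player is assumed to be vertically elongated and located near the center of the screen; the concrete rule used here (ratio of occupied rows to occupied columns, a centroid test, and the thresholds `min_aspect` and `center_tol`) is an assumption for illustration, as are the function names. The inputs are per-series count lists, with `proj_y[s][y]` and `proj_x[s][x]` giving the number of pixels of series `s` on row `y` and in column `x`.

```python
def extent(profile, min_count=1):
    """Number of coordinates whose pixel count reaches min_count."""
    return sum(1 for count in profile if count >= min_count)


def centroid(profile):
    """Count-weighted mean coordinate of a projection (None if empty)."""
    total = sum(profile)
    if total == 0:
        return None
    return sum(i * count for i, count in enumerate(profile)) / total


def find_player_series(proj_y, proj_x, min_aspect=1.5, center_tol=0.25):
    """Return the index of the hue series most likely to be the player.

    A series qualifies if it covers at least min_aspect times more rows
    than columns (vertically elongated) and its horizontal centroid lies
    within center_tol * width of the image center. Both thresholds are
    hypothetical values, not taken from the source.
    Returns None when no series qualifies.
    """
    width = len(proj_x[0])
    best = None
    for s in range(len(proj_y)):
        rows = extent(proj_y[s])
        cols = extent(proj_x[s])
        if rows == 0 or cols == 0:
            continue  # this hue series does not appear in the image
        aspect = rows / cols
        cx = centroid(proj_x[s])
        centered = abs(cx - (width - 1) / 2) <= center_tol * width
        if aspect >= min_aspect and centered:
            if best is None or aspect > best[1]:
                best = (s, aspect)
    return None if best is None else best[0]
```

A `None` result would correspond to the case where no player is detected, which the flow then handles at step S130.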

As described above, in the digest generation device 1 according to the second embodiment, the digest video generation unit 110 preferably projects the hue of each pixel constituting the frame image onto the X-axis and Y-axis directions when detecting a player.

1 Digest generation device
40 Video decoder
70 Frame buffer
80 Display
90 Speaker
100 Database
110 Digest video generation unit

Claims

A digest generation device comprising a digest video generation unit that calculates the ratio that a hue corresponding to the color of the ground surface of a stadium occupies in a frame image, determines based on the ratio whether the frame image is a zoom shot or a shot other than a zoom shot, and uses a frame image determined to be a zoom shot as a candidate for a highlight scene constituting a digest video.

The digest generation device according to claim 1, wherein the digest video generation unit detects a player in the frame image determined to be the zoom shot, recognizes the jersey number of the detected player, and generates a highlight scene for each detected player.

The digest generation device according to claim 2, wherein the digest video generation unit obtains the hue ratio of a central region of the frame image when detecting the player.

The digest generation device according to claim 2, wherein the digest video generation unit projects the hue of each pixel constituting the frame image onto the X-axis and Y-axis directions when detecting the player.

A digest generation method in which a digest generation device performs:
a step of calculating the ratio that a hue corresponding to the color of the ground surface of a stadium occupies in a frame image;
a step of determining, based on the ratio, whether the frame image is a zoom shot or a shot other than a zoom shot; and
a step of using a frame image determined to be a zoom shot as a candidate for a highlight scene constituting a digest video.

A program for causing the digest generation device to execute each step of the digest generation method according to claim 5.