JP7428855B2 - Video analysis system, video analysis device, video analysis method, and program

Info

Publication number: JP7428855B2
Application number: JP2020045954A
Authority: JP
Inventors: 直也中嶋; 加寿代水野; 直晃山下
Original assignee: Individual
Current assignee: Individual
Priority date: 2020-03-17
Filing date: 2020-03-17
Publication date: 2024-02-07
Anticipated expiration: 2040-03-17
Also published as: JP2021149232A

Description

Article 30, Paragraph 2 of the Patent Act applies. Event date: December 8, 2019. Meeting name: Yahoo! JAPAN Science Conference 2019. Venue: 1-3 Kioicho, Chiyoda-ku, Tokyo (Yahoo Japan Corporation headquarters).

The present invention relates to a video analysis system, a video analysis device, a video analysis method, and a program.

Conventionally, an invention of an information processing device has been disclosed that comprises: an extraction unit that detects scene changes in video content and extracts an image of each scene from the video content; an analysis unit that analyzes the temporal transition of acoustic information included in the video content; and a display processing unit that displays, on a display unit, a list of the scene images extracted by the extraction unit and an image showing the temporal transition of the acoustic information analyzed by the analysis unit, wherein the analysis unit analyzes, as the temporal transition of the acoustic information, the temporal transition of at least one of the magnitude and the frequency of an audio signal included in the video content (Patent Document 1).

Japanese Unexamined Patent Application Publication No. 2017-097888 (JP2017-097888A)

With conventional techniques, it has sometimes not been possible to perform, at high speed, a process of extracting an image similar to a comparison target image from among one or more frame images included in a video.

The present invention has been made in consideration of these circumstances, and one of its objectives is to perform, at high speed, a process of extracting an image similar to a comparison target image from among one or more frame images included in a video.

One aspect of the present invention is a video analysis system comprising: a first processing unit that receives, from an operator, instruction information instructing processing content for a video and executes an API according to the instruction information; and a second processing unit that executes, in a distributed environment, at least part of the processing on the video according to the executed API, wherein the first processing unit causes an output unit to output a result of the processing executed by the second processing unit, the processing executed by the second processing unit includes a similar image extraction process of extracting an image similar to a comparison target image from among one or more frame images included in the video, and the second processing unit executes at least part of the similar image extraction process in a distributed environment.

According to one aspect of the present invention, a process of extracting an image similar to a comparison target image from among one or more frame images included in a video can be performed at high speed.

FIG. 1 is a diagram showing an example of the configuration and usage environment of a video analysis device 100 that constitutes a video analysis system.
FIG. 2 is a diagram for explaining the functions of the video analysis device 100 according to the first embodiment.
FIG. 3 is a diagram showing an example of a result output image IM displayed by the operator terminal device 20.
FIG. 4 is a diagram for explaining the functions of the video analysis device 100 according to the second embodiment.

Hereinafter, embodiments of the video analysis system of the present invention will be described with reference to the drawings.

<First embodiment>
FIG. 1 is a diagram showing an example of the configuration and usage environment of a video analysis device 100 that constitutes a video analysis system. The video analysis device 100 is realized by one or more processors. The video analysis device 100 communicates with the submitter terminal device 10, the operator terminal device 20, the video distribution device 200, and the like via the network NW. The network NW includes, for example, a WAN (Wide Area Network), a LAN (Local Area Network), the Internet, a cellular network, and a public line. Note that the video analysis device 100 may be a part of the video distribution device 200 or may be a device separate from the video distribution device 200. Further, although FIG. 1 shows the components of the video analysis system as being implemented in the video analysis device 100, the front-end processing unit 120, for example, may instead be implemented in the operator terminal device 20. In that case, the operator terminal device 20 constitutes a part of the video analysis system.

The submitter terminal device 10 transmits to the video distribution device 200 a video that the video distribution device 200 will distribute to an unspecified number of terminal devices. The submitter who operates the submitter terminal device 10 is a person in charge at an entity (a content provider) different from the operator that runs the video distribution device 200.

The operator terminal device 20 is used by an operator who operates the video analysis device 100 and uses its processing results. The operator terminal device 20 includes an output unit such as a display device and a speaker. The operator terminal device 20 displays an interface image provided by the video analysis device 100, through which various kinds of instruction information can be input. The video analysis device 100 has, for example, the function of a web server, and the operator terminal device 20 displays the interface image provided by the video analysis device 100 by means of a UA (User Agent) such as a browser or an application program.

The video analysis device 100 includes, for example, a communication unit 110, a front-end processing unit 120 (an example of a first processing unit), a file system 130, and a back-end processing unit 140 (an example of a second processing unit). The front-end processing unit 120, the file system 130, and the back-end processing unit 140 are realized, for example, by a hardware processor such as a CPU (Central Processing Unit) executing a program (software). Some or all of these components may be realized by hardware (a circuit unit; including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or by cooperation between software and hardware. The program may be stored in advance in a storage device (a storage device with a non-transitory storage medium) such as an HDD (Hard Disk Drive) or flash memory, or it may be stored on a removable storage medium (a non-transitory storage medium) such as a DVD or CD-ROM and installed in the storage device when the storage medium is loaded into a drive device. Because the back-end processing unit 140 performs processing in a distributed environment, the hardware processor is preferably one capable of parallel processing, such as a multi-core processor. Note that the back-end processing unit 140 may instead perform distributed processing by time sharing on a single-core processor.

The communication unit 110 is a communication interface for accessing the network NW. For example, the communication unit 110 includes a network card.

The front-end processing unit 120 receives, from the operator terminal device 20, instruction information instructing processing content for a video, and executes an API (Application Programming Interface) according to the instruction information. The front-end processing unit 120 causes the operator terminal device 20 to output the results of the processing executed by the back-end processing unit 140. The front-end processing unit 120 is realized, for example, by adding peripheral functions to FFmpeg, which is free software. The front-end processing unit 120 functions as a command-line tool and also allows fine-grained control by calling the library directly. By enabling Python callbacks, the front-end processing unit 120 can acquire information efficiently, and it can perform sequential processing, seek processing, and the like on long videos.

The file system 130 expands data into the storage unit 150 and manages it. The file system 130 mediates the data passed between the front-end processing unit 120 and the back-end processing unit 140.

The back-end processing unit 140 executes, in a distributed environment, at least part of the processing on the video according to the API executed by the front-end processing unit 120. A distributed environment typically refers to an environment in which processing is executed by a plurality of processing entities that can operate concurrently. The processing executed by the back-end processing unit 140 includes a similar image extraction process that extracts, from among one or more frame images included in a video, an image similar to a comparison target image, and the back-end processing unit 140 executes at least part of the similar image extraction process in a distributed environment. The comparison target image is, for example, a thumbnail image acquired together with the video from the submitter terminal device 10. A thumbnail image acquired in this way may be a mistakenly selected frame image from another video, or, for a video in a series, it may be unclear which volume it corresponds to. The similar image extraction process is therefore intended to provide information for judging whether the thumbnail image is appropriate for the video. In addition to the similar image extraction process, the back-end processing unit 140 can perform processing such as video transcoding and video encoding (downsizing for visual confirmation) in a distributed environment as necessary.
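The idea of several processing entities working concurrently can be sketched as follows. This is only an illustration, not the implementation described in the specification: the function names `process_segment` and `run_distributed`, the segment file names, and the use of a thread pool as a stand-in for the distributed environment are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def process_segment(segment_name: str) -> str:
    """Hypothetical per-segment work; a real worker would decode and analyze the file."""
    return f"{segment_name}:done"


def run_distributed(segment_names: list[str], workers: int = 4) -> list[str]:
    """Hand each segment to a pool of workers and collect results in input order."""
    # Assumption: a thread pool stands in for the plurality of processing entities.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_segment, segment_names))


results = run_distributed(["seg_000.mpg", "seg_001.mpg", "seg_002.mpg"])
```

Because `pool.map` preserves input order, the caller can match each result to its segment even though the workers may finish in a different order.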

FIG. 2 is a diagram for explaining the functions of the video analysis device 100 according to the first embodiment. First, a video and a thumbnail image are submitted from the submitter terminal device 10. Then, when the operator operates the operator terminal device 20, a video split process, a key frame extraction process, a feature extraction process, and a thumbnail similarity determination process are instructed to the front-end processing unit 120, either one at a time or as a single sequence of processes. The instruction information may be a plurality of pieces of information that instruct these processes one at a time, or it may be information that instructs them as a single sequence.

First, the front-end processing unit 120 executes an API for causing the back-end processing unit 140 to perform the video split process. A plurality of task queue units 142 are resident in the back-end processing unit 140 as daemon programs. One of the task queue units 142 responds to the execution of the API and notifies, for example, one process execution unit 144.

The process execution unit 144 has an OS (Operating System) function. On receiving the notification, the process execution unit 144 performs the video split process instructed by the API as a single batch. Because running the video split process in a distributed environment would not improve efficiency much, the back-end processing unit 140 performs it as a batch process. The video submitted from the submitter terminal device 10 is delimited according to a standard such as MPEG2-PS, and the process execution unit 144 uses FFmpeg functions to divide the video along the delimiters defined by the standard. When the process execution unit 144 finishes, the task queue unit 142 registers the resulting divided videos in the file system 130.
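One way such a split might be invoked is through FFmpeg's segment muxer with stream copy, which cuts at existing keyframe boundaries instead of re-encoding. The specification does not say which FFmpeg functions are used, so the command below is an assumption, as are the helper name `build_split_command`, the file names, and the 10-second segment length.

```python
def build_split_command(source: str, out_pattern: str, segment_seconds: int = 10) -> list[str]:
    """Build an ffmpeg command line that splits a video without re-encoding."""
    return [
        "ffmpeg",
        "-i", source,
        "-c", "copy",                            # copy streams, so cuts land on keyframes
        "-f", "segment",                         # use the segment muxer
        "-segment_time", str(segment_seconds),   # assumed target length per segment
        "-reset_timestamps", "1",                # each segment starts near time zero
        out_pattern,                             # e.g. segment_000.mpg, segment_001.mpg, ...
    ]


cmd = build_split_command("input.mpg", "segment_%03d.mpg")
# To run it for real (requires ffmpeg on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the sketch testable on machines without FFmpeg installed.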

Next, the front-end processing unit 120 executes an API for causing the back-end processing unit 140 to perform the key frame extraction process. One task queue unit 142 responds to the execution of the API and notifies a plurality of process execution units 144 so that the key frame extraction process is performed as distributed processing. The key frame extraction process extracts a key frame (representative frame) from each of the divided videos. For example, the process execution unit 144 extracts from a divided video two frame images whose luminance changes greatly relative to the preceding and following frame images, and takes as the key frame the frame image located temporally midway between them, that is, a calm frame image with little luminance change. When the process execution unit 144 finishes, the task queue unit 142 registers the key frame for each divided video in the file system 130.
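The midpoint rule described above can be sketched on a list of per-frame mean luminance values. This is a simplified reading, not the patented procedure: the function name `pick_key_frame` is hypothetical, luminance change is measured only against the preceding frame, and each frame is assumed to have already been reduced to a single number.

```python
def pick_key_frame(luminance: list[float]) -> int:
    """Return the index of the frame midway between the two largest luminance jumps."""
    if len(luminance) < 3:
        return 0  # too short to contain two jumps; fall back to the first frame
    # changes[i] is the absolute luminance change between frame i and frame i + 1.
    changes = [abs(b - a) for a, b in zip(luminance, luminance[1:])]
    # Positions of the two largest changes, expressed as the frame just after each jump.
    top_two = sorted(range(len(changes)), key=lambda i: changes[i], reverse=True)[:2]
    first, second = sorted(i + 1 for i in top_two)
    # The temporally intermediate frame is taken as the key frame.
    return (first + second) // 2


luminance = [10, 11, 60, 61, 62, 61, 63, 15, 14]
key_index = pick_key_frame(luminance)  # jumps land on frames 2 and 7, so the midpoint is 4
```

In this example the selected frame sits in the stable stretch between the two jumps, which matches the intent of choosing a calm frame.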

Next, the front-end processing unit 120 executes an API for causing the back-end processing unit 140 to perform the feature extraction process on the key frames and the thumbnail image. One task queue unit 142 responds to the execution of the API and notifies a plurality of process execution units 144 so that the feature extraction process is performed as distributed processing. The process execution unit 144 may, for example, perform processing such as SIFT (Scale-Invariant Feature Transform) or Harris, or it may obtain an output by inputting a key frame into a model that uses machine learning, such as a CNN (Convolutional Neural Network). When the process execution unit 144 finishes, the task queue unit 142 registers the feature values of each key frame and of the thumbnail image in the file system 130.

Next, the front-end processing unit 120 executes an API for causing the back-end processing unit 140 to perform the thumbnail similarity determination process. One task queue unit 142 responds to the execution of the API and notifies one or more process execution units 144 to perform the thumbnail similarity determination process. The process execution unit 144 compares the feature values of the thumbnail image with the feature values of each of the key frames, one at a time, and sorts the key frames in descending order of similarity. The feature values are expressed, for example, as vectors, and the process execution unit 144 calculates the distance between them as the similarity. When the process execution unit 144 finishes, the task queue unit 142 registers the similarity ranking result in the file system 130. The similarity ranking result is the result of ranking the key frames in descending order of feature similarity.
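A minimal sketch of the ranking step follows. It assumes Euclidean distance as the distance measure, with a smaller distance treated as a higher similarity; the specification names neither the metric nor the vector dimensions, and the function name `rank_key_frames` and the sample vectors are illustrative only.

```python
import math


def rank_key_frames(
    thumbnail: list[float], key_frames: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Rank key frames from most to least similar to the thumbnail."""
    # Assumption: Euclidean distance; a smaller distance means a more similar frame.
    distances = [(name, math.dist(thumbnail, vec)) for name, vec in key_frames.items()]
    return sorted(distances, key=lambda item: item[1])


ranking = rank_key_frames(
    [0.0, 1.0],
    {"kf_a": [3.0, 5.0], "kf_b": [0.0, 1.5], "kf_c": [1.0, 1.0]},
)
# ranking lists kf_b first (distance 0.5), then kf_c (1.0), then kf_a (5.0)
```

The first few entries of such a ranking correspond to what the result screen would present as similar images (1), (2), and (3).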

When the similarity ranking result is registered in the file system 130, the front-end processing unit 120 outputs information based on the similarity ranking result to the operator terminal device 20, either automatically or in response to an operation by the operator. The operator terminal device 20 displays an image according to the information it receives. FIG. 3 is a diagram showing an example of the result output image IM displayed by the operator terminal device 20. As shown, the result output image IM includes the video title and playback time as well as the thumbnail image and the similar images included in the similarity ranking result. In the figure, similar image (1) is the key frame whose feature values were determined to be most similar to those of the thumbnail image. The same applies, in order, to similar images (2) and (3). By viewing such an image, the operator can confirm whether an appropriate thumbnail image has been attached to the video.

According to the first embodiment described above, the process of extracting an image similar to a comparison target image (thumbnail image) from among one or more frame images included in a video can be performed at high speed.

<Second embodiment>
The second embodiment will be described below. FIG. 4 is a diagram for explaining the functions of the video analysis device 100 according to the second embodiment. In the second embodiment, when the front-end processing unit 120 executes an API for instructing various processes, that information is passed to the job queue unit 146. The job queue unit 146 generates a list of processes. In addition, a plurality of task registration units 147, for example, are resident. As a result, even when a plurality of instructions for videos are given at substantially the same time, as many tasks can be handled as there are task registration units 147. Each task registration unit 147 registers the content of a process acquired from the list in the scheduler 148. The scheduler 148 generates, for example, a list of processes in which priorities are assigned to processes that must run in order. Process execution units 144 are generated automatically in a number corresponding to the list of processes generated by the scheduler 148. However, the number of process execution units 144 is capped at the limit imposed by the hardware. Each process execution unit 144 refers to the list of the scheduler 148, acquires the target information in the file system 130 by itself, and performs the same kinds of processes as in the first embodiment (the video split process, key frame extraction process, feature extraction process, thumbnail similarity determination process, video transcoding, video encoding, and so on). Parallel processing is thus executed according to the number of process execution units 144, which achieves high-speed processing. When hardware resources are insufficient, a process waits while remaining registered in the scheduler 148.
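The flow from job queue to scheduler to a capped number of workers can be sketched as below. This is an illustrative model, not the structure of the actual device: the `Scheduler` class, the `PRIORITY` table, the `worker_count` helper, and the use of the CPU count as the hardware limit are all assumptions.

```python
import heapq
import os

# Hypothetical priorities: a lower number runs earlier, mirroring the pipeline order.
PRIORITY = {"split": 0, "keyframe": 1, "feature": 2, "similarity": 3}


class Scheduler:
    """Holds registered processes and releases them in priority order."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str, str]] = []
        self._counter = 0  # tie-breaker that keeps registration order stable

    def register(self, kind: str, target: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], self._counter, kind, target))
        self._counter += 1

    def drain(self) -> list[tuple[str, str]]:
        ordered = []
        while self._heap:
            _, _, kind, target = heapq.heappop(self._heap)
            ordered.append((kind, target))
        return ordered


def worker_count(task_total: int, hardware_limit=None) -> int:
    """Number of process execution units to spawn, capped by the hardware."""
    limit = hardware_limit or os.cpu_count() or 1  # assumption: CPU count is the cap
    return max(1, min(task_total, limit))


# The job queue holds instructions in arrival order; registration hands them to the scheduler.
job_queue = [("feature", "video_a"), ("split", "video_b"), ("split", "video_a"), ("keyframe", "video_a")]
scheduler = Scheduler()
for kind, target in job_queue:
    scheduler.register(kind, target)
ordered = scheduler.drain()
workers = worker_count(len(ordered), hardware_limit=2)
```

Here four processes are registered but only two workers would be created, so the remaining processes stay queued until a worker becomes free, which corresponds to waiting while registered in the scheduler.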

According to the second embodiment described above, the same effects as in the first embodiment can be achieved.

Although modes for carrying out the present invention have been described above using embodiments, the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

10 Submitter terminal device
20 Operator terminal device
100 Video analysis device
110 Communication unit
120 Front-end processing unit
130 File system
140 Back-end processing unit
142 Task queue unit
144 Process execution unit
146 Job queue unit
147 Task registration unit
148 Scheduler

Claims

A video analysis system comprising:
a first processing unit that receives, from an operator, instruction information that includes at least information specifying a target video and a thumbnail image and that instructs processing content for the video, and that executes an API (Application Programming Interface) according to the instruction information; and
a second processing unit that executes, in a distributed environment, at least part of the processing on the video according to the executed API,
wherein the processing executed by the second processing unit includes a similar image extraction process of extracting an image similar to the thumbnail image from among one or more frame images included in the video,
the second processing unit executes at least part of the similar image extraction process in a distributed environment,
the first processing unit causes a display unit to display one or more frame images, extracted by the second processing unit, that have a high degree of similarity to the thumbnail image, and
the similar image extraction process executes a video split process as a batch process and executes, in a distributed environment, at least the processing for each of the divided videos obtained as a result of the video split process.

The video analysis system according to claim 1, wherein the processing for each of the divided videos includes a key frame extraction process and a feature extraction process.

The video analysis system according to claim 1 or 2, wherein the second processing unit comprises:
a task queue unit that holds the content of the instruction information and, in response to the API being executed by the first processing unit, issues a notification of the execution of the API; and
a plurality of process execution units that, in response to the notification, execute in a distributed environment the processing according to the API executed by the first processing unit.

The video analysis system according to claim 1 or 2, wherein the second processing unit comprises:
a job queue unit that holds the content of the instruction information in a list;
a task registration unit that acquires the content of a process from the list and passes it to a scheduler; and
a plurality of process execution units that acquire the content of processes from the scheduler and execute the processes in a distributed environment.

A video analysis device comprising:
a first processing unit that receives, from an operator, instruction information that includes at least information specifying a target video and a thumbnail image and that instructs processing content for the video, and that executes an API according to the instruction information; and
a second processing unit that executes, in a distributed environment, at least part of the processing on the video according to the executed API,
wherein the processing executed by the second processing unit includes a similar image extraction process of extracting an image similar to the thumbnail image from among one or more frame images included in the video,
the second processing unit executes at least part of the similar image extraction process in a distributed environment,
the first processing unit causes a display unit to display one or more frame images, extracted by the second processing unit, that have a high degree of similarity to the thumbnail image, and
the similar image extraction process executes a video split process as a batch process and executes, in a distributed environment, at least the processing for each of the divided videos obtained as a result of the video split process.

A video analysis method in which a video analysis device executes:
a first process of receiving, from an operator, instruction information that includes at least information specifying a target video and a thumbnail image and that instructs processing content for the video, and executing an API according to the instruction information; and
a second process of executing, in a distributed environment, at least part of the processing on the video according to the executed API,
wherein the second process includes a similar image extraction process of extracting an image similar to the thumbnail image from among one or more frame images included in the video, and a process of executing at least part of the similar image extraction process in a distributed environment,
the first process includes causing a display unit to display one or more frame images, extracted by the second process, that have a high degree of similarity to the thumbnail image, and
the similar image extraction process executes a video split process as a batch process and executes, in a distributed environment, at least the processing for each of the divided videos obtained as a result of the video split process.

A program that causes a video analysis device to execute:
a first process of receiving, from an operator, instruction information that includes at least information specifying a target video and a thumbnail image and that instructs processing content for the video, and executing an API according to the instruction information; and
a second process of executing, in a distributed environment, at least part of the processing on the video according to the executed API,
wherein the second process includes a similar image extraction process of extracting an image similar to the thumbnail image from among one or more frame images included in the video, and a process of executing at least part of the similar image extraction process in a distributed environment,
the first process includes causing a display unit to display one or more frame images, extracted by the second process, that have a high degree of similarity to the thumbnail image, and
the similar image extraction process executes a video split process as a batch process and executes, in a distributed environment, at least the processing for each of the divided videos obtained as a result of the video split process.