JP2023119326A

JP2023119326A - Video image analysis apparatus and video image analysis method

Info

Publication number: JP2023119326A
Application number: JP2022022167A
Authority: JP
Inventors: Masaru Ishimaru; Himio Yamauchi; Tadayoshi Kimura; Masayuki Tokunaga
Original assignee: TVS Regza Corp
Current assignee: TVS Regza Corp
Priority date: 2022-02-16
Filing date: 2022-02-16
Publication date: 2023-08-28
Also published as: WO2023155433A1; CN116806347A

Abstract

To provide a video image analysis apparatus and a video image analysis method for analyzing a plurality of pieces of image data with various resolutions. SOLUTION: A video image analysis apparatus according to one embodiment switches the size of a detection window of a neural network and its model parameters according to the resolution of image data obtained by digitally sampling a video image frame, and detects a detection target from the image data. SELECTED DRAWING: Figure 1

Description

The embodiments relate to a video analysis apparatus and a video analysis method for analyzing video and images.

In image recognition using a neural network, a frame of a fixed size (number of pixels), called a detection window, is used to detect specific information in an image. For example, when a detection window is used to detect information in an image obtained by digitally sampling a video frame, the information contained in a detection window of a given size differs when the resolution differs, even if the angle of view (shooting area) of the image is the same (for example, at a lower resolution the window picks up information from a wider area). Usually, the image is enlarged or reduced to balance it against the size of the detection window.
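The resolution dependence described above can be illustrated with a little arithmetic. The following is a minimal sketch, not part of the disclosed apparatus: the function name and the 1920×1080 resolution are assumptions chosen for illustration, and the same 16×16 window is applied to both images.

```python
def window_coverage(image_w, image_h, win_w, win_h):
    """Return the fraction of the image area covered by one detection window."""
    return (win_w * win_h) / (image_w * image_h)


# Same angle of view sampled at two resolutions (1920x1080 is an assumed value).
low = window_coverage(720, 480, 16, 16)
high = window_coverage(1920, 1080, 16, 16)

# A fixed-size window covers a larger share of the scene at the lower resolution.
print(f"low-resolution coverage:  {low:.6f}")
print(f"high-resolution coverage: {high:.6f}")
```

Because the window covers a different share of the scene in each case, a network trained on one resolution sees differently scaled content on the other.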

Japanese Patent No. 6706788
Japanese Patent No. 6867117

However, when a neural network is used to detect objects of similar size from a plurality of images with different resolutions, the differences in the information within the detection window make stable learning difficult. In addition, reducing the size of the image to fit the detection window causes feature values to be lost.

An object of the present invention is to provide a video analysis apparatus, method, and program for analyzing a plurality of pieces of image data with different resolutions.

A video analysis apparatus according to one embodiment switches the detection window size and model parameters of a neural network according to the resolution of image data obtained by digitally sampling a video frame, and detects a detection target from the image data.

FIG. 1 is a configuration diagram of a video analysis apparatus according to the first embodiment.
FIG. 2 is a schematic diagram showing an example of a data flow analyzed by the video analysis apparatus according to the first embodiment.
FIG. 3 is a flowchart showing analysis processing by the video analysis apparatus according to the first embodiment.
FIG. 4 is a schematic diagram showing an example of video sampling data analyzed by the video analysis apparatus according to the embodiment.
FIG. 5 is a flowchart showing analysis processing by the video analysis apparatus according to the second embodiment.
FIG. 6 is an example of table data provided in the video analysis apparatus according to the second embodiment.
FIG. 7 is a flowchart showing analysis processing by the video analysis apparatus according to the third embodiment.
FIG. 8 is a flowchart showing analysis processing by the video analysis apparatus according to the fourth embodiment.

Hereinafter, embodiments will be described with reference to the drawings.
(First embodiment)
For example, when images are obtained by digital sampling from video of the same scene, the resolution of the images may differ depending on the digital sampling. This embodiment shows an example in which a neural network is used to detect a target of roughly constant size (referred to as a detection target) from a plurality of pieces of image data that have the same or a similar angle of view (shooting range) but different resolutions.

FIG. 1 is a configuration diagram of a video analysis apparatus according to the embodiment.

The video analysis apparatus 1 is an apparatus that detects a detection target from input video or images and outputs the result to the outside. It may include a computer with a CPU and memory, and digital signal processing means such as a DSP (Digital Signal Processor).

The video input unit 11 takes image data, which is digital data, into the video analysis apparatus 1 from the outside and outputs the image data. The input image data may be any image data, such as a still image taken from video data. The video input unit 11 extracts meta information (number of pixels or resolution) from the input image data and outputs it to the angle-of-view acquisition unit 12 as angle-of-view information.
When outputting image data, the video input unit 11 may also adjust the angle of view (shooting range) of the output image data to match a sample, for example a sample of the image data to be analyzed.

The angle-of-view acquisition unit 12 acquires the angle-of-view information from the video input unit 11.

The detection NN selection unit 13 selects a neural network NN for detecting the detection target from the image data, based on the angle-of-view information and other inputs from the angle-of-view acquisition unit 12. The neural network NN is, for example, a Deep Neural Network (DNN) and may include a Convolutional Neural Network (CNN). The neural network NN may also include any other neural network, such as a Recurrent Neural Network (RNN) or Long Short-Term Memory (LSTM). These neural networks are common techniques, and their description is omitted.

The neural network NN has a detection window size as a parameter. The detection window is the region of the image data in which the detection target is searched for, and the detection window size is the number of pixels contained in the detection window.

The storage unit 14 is a memory that stores various information such as a table 141 and NN model parameters 142.

The table 141 stores, for each item of angle-of-view information of the image data input from the video input unit 11, the associated detection window size of the neural network NN, the NN model, and the like. In other words, the neural network NN is switched to the NN model associated with the angle-of-view information of the image data. For example, data TB1 indicates that when the resolution of the image data input from the video input unit 11 is 720 [pixels] × 480 [pixels], the image data is processed by a neural network NN whose NN model is detection NN1 and whose detection window size is 16×16. The table 141 may be set at the time of shipment, or the video analysis apparatus 1 may download it from a server or the like via the Internet. Any other method of setting the table 141 in the storage unit 14 may also be used.

The NN model parameters 142 are the parameters used by the neural network NN, and correspond to the NN model to be used as listed in the table 141. For example, the model parameters of detection NN1 in data TB1 are MP1, the model parameters of detection NN2 in data TB2 are MP2, and the model parameters of detection NN3 in data TB3 are MP3. The NN model parameters 142 are assumed to have been obtained by training on data of the detection window size set in the table 141.
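The relationship between the table 141 and the NN model parameters 142 can be sketched as a simple lookup keyed by resolution. This is a minimal illustration under stated assumptions: only the 720×480 row (TB1, detection NN1, 16×16) is taken from the description, while the other two rows, the class name `TableEntry`, and the function name `select_detection_nn` are hypothetical.

```python
from typing import NamedTuple, Tuple


class TableEntry(NamedTuple):
    nn_model: str                 # NN model to use
    window_size: Tuple[int, int]  # detection window (width, height) in pixels
    model_params: str             # identifier of the trained parameter set


# Resolution (width, height) -> entry. Only the first row comes from the text.
TABLE_141 = {
    (720, 480): TableEntry("detection NN1", (16, 16), "MP1"),
    (1280, 720): TableEntry("detection NN2", (32, 32), "MP2"),    # hypothetical row
    (1920, 1080): TableEntry("detection NN3", (48, 48), "MP3"),   # hypothetical row
}


def select_detection_nn(resolution):
    """Return the table entry associated with the given image resolution."""
    try:
        return TABLE_141[resolution]
    except KeyError:
        raise ValueError(f"no detection NN registered for {resolution}") from None


entry = select_detection_nn((720, 480))
print(entry.nn_model, entry.window_size, entry.model_params)
```

Keeping the window size and the parameter set in the same row reflects the requirement that each parameter set was trained on data of that window size.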

The image recognition unit 15 runs the neural network NN selected by the detection NN selection unit 13 on the image data taken in by the video input unit 11.

The feature detection unit 151 performs feature detection with the neural network NN as part of the function of the image recognition unit 15. The feature detection unit 151 may be, for example, a CNN.

The region calculation unit 152 detects a specific object by image recognition in the image data taken in by the video input unit 11. The region calculation unit 152 may use a general object recognition technique or may be a neural network.

The result output unit 16 outputs the result of the analysis by the image recognition unit 15 to an external device such as a monitor (not shown).

FIG. 2 is a schematic diagram showing an example of a data flow analyzed by the video analysis apparatus according to the embodiment. The video analysis apparatus 1 of this embodiment takes an apple as the analysis target AO and detects flaws on the apple, such as worm damage. Such a flaw is referred to as the detection target DO.

Assume that the video analysis apparatus 1 obtains low-resolution image data SD1 by digitally sampling photographed data of an apple, which is the analysis target AO.

When the low-resolution image data SD1 is input, the angle-of-view acquisition unit 12 acquires the angle-of-view information and passes it to the detection NN selection unit 13. Based on the resolution in the angle-of-view information, the detection NN selection unit 13 acquires information from the storage unit 14 and sets the detection window size DW1 to 5×7 and the detection NN model to the DNN for low resolution (detection neural network NN1). The image recognition unit 15 analyzes the image data SD1 with the neural network NN1. Here, the neural network NN1 is assumed to have been trained on data of detection window size DW1 that contains the detection target DO. Through the analysis by the image recognition unit 15, the detection target DO is detected from the image data SD1.

Similarly, when the video analysis apparatus 1 obtains high-resolution image data SD2 by digitally sampling photographed data of the apple that is the analysis target AO, the image recognition unit 15 analyzes the image data SD2 with the detection window size DW2 set to 25×37 and the detection NN model set to the DNN for high resolution (detection neural network NN2). The angle of view of the image data SD2 is assumed to be the same as that of the image data SD1, and, as shown in FIG. 2, the size of the apple (the analysis target AO) relative to the size of the image data SD is assumed to be the same in the image data SD1 and SD2. The neural network NN2 is assumed to have been trained on data of detection window size DW2 that contains the detection target DO, and the detection target DO is detected from the image data SD2 through the analysis by the image recognition unit 15.

When no distinction is needed, the image data SD1 or SD2, the detection window size DW1 or DW2, and the neural network NN1 or NN2 are referred to as the image data SD, the detection window size DW, and the neural network NN, respectively. In this embodiment, the table 141 and the NN model parameters 142 of the video analysis apparatus 1 in FIG. 1 are each shown with three data entries, but each may have more than three.

FIG. 3 is a flowchart showing analysis processing by the video analysis apparatus according to the embodiment.

In the video analysis apparatus 1, when the video input unit 11 obtains image data SD (a still image) containing the analysis target AO (the apple in FIG. 2), the angle-of-view acquisition unit 12 acquires the angle-of-view information of the image data SD (step S101). The angle-of-view acquisition unit 12 may instead acquire the angle-of-view information by another route that does not use the still image, for example from a user setting.

The detection NN selection unit 13 uses the table 141 in the storage unit 14 (the table relating the angle of view to the detection window size and NN model) to select the NN model associated with the angle-of-view information acquired in step S101 (step S102). Taking TB1 in FIG. 1 as a concrete example, the detection NN selection unit 13 selects the detection window size DW "16×16" and the detection NN model "detection NN1" associated with the resolution information "720×480" acquired in step S101.

When the detection NN selection unit 13 passes the selected NN information and detection window size to the image recognition unit 15, the image recognition unit 15 performs detection processing for the target object (detection target DO) on the image data output by the video input unit 11 (step S103). More specifically, in step S103 the whole image SD is not fed to the NN of the image recognition unit 15; instead, the image is fed to the NN one detection window DW at a time. Because the detection window DW is smaller than the image SD, the image recognition unit 15 first determines a detection window position on the image SD (range 1), for example, and runs the NN calculation on range 1. When the analysis of range 1 is finished, it determines another detection window position (range 2) and runs the NN calculation on range 2. The detection window position may then be shifted little by little in this way until the entire image has been analyzed.
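The window-by-window analysis of step S103 can be sketched as a sliding-window loop. This is an illustrative sketch rather than the disclosed implementation: the function names, the `stride` parameter, and the `classify` callback (standing in for the NN calculation) are assumptions, and the image is a plain list of pixel rows.

```python
def window_positions(image_w, image_h, win_w, win_h, stride):
    """Yield the top-left (x, y) of every window that fits inside the image."""
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield x, y


def detect(image, win_w, win_h, stride, classify):
    """Run classify() on each window crop and collect positions that match.

    image: list of rows, each a list of pixel values.
    classify: callable taking a crop and returning True on detection
              (a stand-in for the selected neural network NN).
    """
    image_h, image_w = len(image), len(image[0])
    hits = []
    for x, y in window_positions(image_w, image_h, win_w, win_h, stride):
        crop = [row[x:x + win_w] for row in image[y:y + win_h]]
        if classify(crop):
            hits.append((x, y))
    return hits


# Toy 6x4 image with a 2x2 "flaw" of ones in the lower-right corner.
image = [[0] * 6 for _ in range(4)]
for yy in (2, 3):
    for xx in (4, 5):
        image[yy][xx] = 1

hits = detect(image, 2, 2, 1, lambda crop: all(v == 1 for row in crop for v in row))
print(hits)
```

A smaller stride examines more overlapping windows at a higher computational cost; the description leaves the amount of shift open.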

When a detection target is detected as a result of the NN calculation, the result output unit 16 may output the result to a monitor (not shown) or the like (step S104). The output in step S104 may take the form of an image displayed on a screen, or of a log saved to a file in the storage unit 14 or elsewhere.

With the above procedure, a detection target DO that has a constant size within each image SD can be detected from a plurality of pieces of image data SD that have the same angle of view but different resolutions.

FIG. 4 is a schematic diagram showing an example of video sampling data analyzed by the video analysis apparatus according to the embodiment. The detection range DA may be determined according to the type of data input to the video input unit 11 (video, image, and so on) and the nature of the detection target DO.

FIG. 4(a) shows an example in which a plurality of detection targets DO11 and DO12 exist in the image data SD1. The video analysis apparatus 1 can detect the detection targets DO11 and DO12 by executing the processing of FIG. 3 while moving the detection window DW over the entire image data SD1. For example, in a product quality check, inputting photographed video of the product to the video analysis apparatus 1 makes it possible to detect how many abnormalities the product has. In a pass/fail inspection where a product is rejected as soon as a single abnormality is detected, the result output unit 16 or the like may output the result and the recognition process in the image recognition unit 15 may be terminated at that point. FIG. 4(b) and the subsequent figures are described in the following embodiments.
(Second embodiment)
This embodiment shows an example of detecting a detection target DO in the case of FIG. 4(b).

FIG. 4(b) shows an example in which one or more detection targets DO exist in known specific regions of the image data SD2. When the region DA in which the target exists is fixed within the image data SD2, as in FIG. 4(b), the video analysis apparatus 1 may treat the known specific region as the detection range DA and perform recognition only within that detection range DA.

FIG. 5 is a flowchart showing analysis processing by the video analysis apparatus according to the second embodiment.

The angle-of-view acquisition unit 12 acquires the angle-of-view information of the input image data SD2 and passes it to the detection NN selection unit 13 (step S201). The detection NN selection unit 13 acquires the detection target DO and the detection range DA (step S202). The detection target DO and the detection range DA may be set by the user from, for example, a keyboard connected to the video analysis apparatus 1. Alternatively, an image such as FIG. 4(b) may be displayed on a monitor (not shown) connected to the video analysis apparatus 1 so that the user can set the detection target DO and the detection range DA.

The detection NN selection unit 13 selects a detection window size and an NN model according to the input angle-of-view information, detection target DO, and detection range DA, and sets them in the image recognition unit 15 (step S202).

FIG. 6 is an example of table data provided in the video analysis apparatus according to the second embodiment.

The table 1411 is data held in the storage unit 14. In addition to the contents of the table 141 in FIG. 1, it associates a "detection target" and a "detection range" with the resolution of the input image. For example, data TB11 indicates that when the resolution of the image data input from the video input unit 11 is 720×480, the neural network NN model used for the analysis is "detection NN1", the detection window size of detection NN1 is "16×16" (for example, DW21 in FIG. 4(b)), the detection target DO is "abnormality in part X" (for example, DO21 in FIG. 4(b)), and the detection range DA is "360x100+320x120" (for example, DA21 in FIG. 4(b)). For the detection target DO21, the type of abnormality, such as a scratch, is determined in advance as "abnormality in part X", and the detection range DA21 is the rectangle whose diagonal corners are the two points (x1, y1) = (360, 100) and (x2, y2) = (320, 120). The detection NN1 in this embodiment is trained on image data of the detection window DW21 "16×16" that contains the detection target DO21 "abnormality in part X".

As another example, data TB14 indicates that when the resolution of the image data input from the video input unit 11 is 720×480, the neural network NN model used for the analysis is "detection NN1", the detection window size of detection NN1 is "16×16" (for example, DW22 in FIG. 4(b)), the detection target DO is "abnormality in part Y" (for example, DO22 in FIG. 4(b)), and the detection range DA is "180x240+240x240" (for example, DA22 in FIG. 4(b)). The detection NN1 in this embodiment is trained on the data of TB14 in addition to the data of TB11. As in the example of DW21 and DW22 in FIG. 4(b), the size of the detection window DW may be changed according to the type of detection target DO. In that case, a separate neural network NN may be trained for each detection window size. For example, in FIG. 4(b), when the features of the detection target DO21 (abnormality in part X) and the detection target DO22 (abnormality in part Y) differ greatly, using dedicated NNs trained separately for each may improve the recognition accuracy for the detection target DO.

Returning to FIG. 5, the detection NN selection unit 13 selects the NN model to be used from the information in the table 1411 stored in the storage unit 14 or the like (step S203). The image recognition unit 15 performs detection processing for the detection target DO, using the NN selected in step S203, on the area of the detection window DW within the detection range DA (step S204). When a detection target DO is detected in step S204 (for example, in DW21 in FIG. 4(b)), the result output unit 16 may store information such as the position of the detected detection target DO in the storage unit 14 or the like (Yes in step S205, step S206). Instead of step S206, the process may proceed to step S208 and output the result to the result output unit 16.

When no detection target DO is detected in step S204 (for example, DA23 in FIG. 4(b)), the process moves to the next detection range DA and repeats the same processing from step S203 (No in step S205, No in step S207). When the processing for all detection ranges DA is finished, the information on the detection targets DO stored in step S206 may be output to the result output unit 16 (Yes in step S207, step S208).
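The loop over detection ranges in FIG. 5 can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and the dictionary layout of the table rows are hypothetical, `run_nn` is a stub standing in for steps S203 and S204, and the detected position `(365, 105)` is invented for the usage example. The range strings are copied from the table as opaque labels.

```python
def analyze_detection_ranges(image, rows, run_nn):
    """Process each detection range DA in turn and collect the detections.

    rows: table rows (dicts) describing one detection range each.
    run_nn: callable(image, row) returning a list of detected positions;
            a stand-in for NN selection and detection (steps S203-S204).
    """
    stored = []
    for row in rows:                    # one pass per detection range DA
        hits = run_nn(image, row)       # steps S203-S204
        if hits:                        # step S205: Yes
            stored.append({             # step S206: store the result
                "target": row["target"],
                "range": row["range"],
                "positions": hits,
            })
        # step S205: No -> continue with the next detection range
    return stored                       # step S208: output everything stored


rows = [
    {"target": "abnormality in part X", "range": "360x100+320x120",
     "nn_model": "detection NN1", "window": (16, 16)},
    {"target": "abnormality in part Y", "range": "180x240+240x240",
     "nn_model": "detection NN1", "window": (16, 16)},
]


def fake_nn(image, row):
    """Stub detector: pretends a flaw is found only in part X."""
    return [(365, 105)] if row["target"] == "abnormality in part X" else []


report = analyze_detection_ranges(None, rows, fake_nn)
print(report)
```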

The above procedure enables image recognition over a plurality of detection ranges DA. In this embodiment, recognition of the detection target DO is performed only on the ranges set in advance (detection ranges DA) rather than on the entire image, so the processing takes less time.
(Third embodiment)
This embodiment shows an example in which, as in FIG. 4(c), the detection target DO is detected within the region of an object (analysis target AO) that appears in the video. For example, when a food inspection checks for abnormalities in an analysis target AO shown in the video (image data SD), the video analysis apparatus 1 of this embodiment first detects the analysis target AO and then detects the detection target DO within the region of that analysis target AO.

FIG. 7 is a flowchart showing analysis processing by the video analysis apparatus according to the third embodiment. The following description uses FIG. 4(c) as an example.

The angle-of-view acquisition unit 12 acquires the angle-of-view information of the input image data SD3 and passes it to the detection NN selection unit 13 (step S301). At the same time, in the image recognition unit 15, the region calculation unit 152 performs detection processing for the analysis target AO on the image data SD3 (step S302). For step S302, information about the analysis target AO (for example, "apple") may be set in the image recognition unit 15 in advance. For example, the user may set the information about the analysis target AO in the image recognition unit 15 from a keyboard or the like connected to the video analysis apparatus 1.

The region calculation unit 152 detects the analysis target AO in the input image data SD3 and identifies its region. For example, in quality control of apples, when a plurality of apples are present in the image data SD3, the display areas of those apples are calculated and acquired. A general object recognition technique is used for this identification. A display area may be the shape of the analysis target AO itself, a rectangle containing the analysis target AO, or some other representation. The image recognition unit 15 uses the calculated display area as the detection range DA. In the example of FIG. 4(c), analysis target AO3 and analysis target AO31 are detected as analysis targets AO in step S302.

From the input angle-of-view information and the information including the detection target DO, the detection NN selection unit 13 selects the associated detection window size and NN model, for example from the table 1411 in FIG. 6, and sets them in the image recognition unit 15 (step S303). The image recognition unit 15 performs detection processing for the detection target DO, using the selected detection NN, on the detection range DA (the display area of the analysis target AO) given by the region calculation unit 152 within the input video (step S304). The result of the processing in step S304 may be output to a monitor or the like (step S305). Steps S303 to S305 may be executed for every analysis target AO detected in step S302.
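The two-stage flow of this embodiment, in which object regions are found first and the detection window is then scanned only inside them, can be sketched as below. This is an assumption-laden illustration: regions are represented as `(x, y, width, height)` rectangles, the function names and `stride` parameter are hypothetical, and `classify_at` is a stub standing in for the detection NN.

```python
def windows_in_region(region, win_w, win_h, stride):
    """List the top-left (x, y) of every window that fits inside a region.

    region: (x, y, width, height) rectangle around one analysis target AO.
    """
    x0, y0, w, h = region
    return [
        (x, y)
        for y in range(y0, y0 + h - win_h + 1, stride)
        for x in range(x0, x0 + w - win_w + 1, stride)
    ]


def detect_in_objects(regions, win_w, win_h, stride, classify_at):
    """Scan each object region separately and return hits per region index.

    classify_at: callable taking a window position and returning True on
                 detection (a stand-in for the selected detection NN).
    """
    results = {}
    for idx, region in enumerate(regions):   # repeat for every detected AO
        positions = windows_in_region(region, win_w, win_h, stride)
        results[idx] = [pos for pos in positions if classify_at(pos)]
    return results


# Two hypothetical object regions; the stub reports a flaw at (10, 10) only.
regions = [(0, 0, 4, 4), (10, 10, 2, 2)]
found = detect_in_objects(regions, 2, 2, 2, lambda pos: pos == (10, 10))
print(found)
```

Restricting the scan to the object regions is what shortens processing compared with scanning the whole image.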

According to the video analysis apparatus 1 of this embodiment, the detection target DO can be detected in a short time because recognition is limited to the detection range DA, which is the display area of the analysis target AO acquired beforehand, rather than covering the entire image.
(Fourth embodiment)
This embodiment shows an example in which, as in FIG. 4(d), detection targets DO scattered throughout the image data SD are detected. For example, a phenomenon called film grain can occur in motion picture film (in each individual frame of the film), which is analog data. This embodiment shows an example in which the video analysis apparatus 1 determines detection window positions at random within the image data SD and performs film grain detection processing on the image data inside each detection window.

図８は、第４の実施形態に係る映像解析装置による解析処理を示すフローチャートである。以下、図４（ｄ）を例として説明する。 FIG. 8 is a flowchart showing analysis processing by the video analysis device according to the fourth embodiment. Hereinafter, FIG. 4D will be described as an example.

画角取得部１２は、入力画像データＳＤ４の画角情報を取得し、検出用ＮＮ選択部１３に入力する（ステップＳ４０１）。検出用ＮＮ選択部１３は、検出対象ＤＯ（フィルムグレインに相当する）の情報を取得する（ステップＳ４０２）。検出対象ＤＯの情報は、映像解析装置１に接続された例えばキーボードなどからユーザが検出用ＮＮ選択部１３に設定してもよい。 The angle-of-view acquisition unit 12 acquires the angle-of-view information of the input image data SD4 and inputs it to the detection NN selection unit 13 (step S401). The detection NN selection unit 13 acquires information on the detection target DO (corresponding to film grain) (step S402). The information on the detection target DO may be set in the detection NN selection unit 13 by the user, for example from a keyboard connected to the video analysis apparatus 1.

検出用ＮＮ選択部１３は、取得した画角情報、検出対象ＤＯに従って、例えば図６のテーブル１４１１から、紐づけられる検出窓サイズおよびＮＮモデルを選択し、画像認識部１５に設定する（ステップＳ４０３）。ステップＳ４０３において選択されたニューラルネットワークＮＮは、例えば図６のテーブル１４１１における検出対象「フィルムグレイン」を含む検出窓サイズのデータで予めフィルムグレインが学習されている。 The detection NN selection unit 13 selects, according to the acquired angle-of-view information and the detection target DO, the linked detection window size and NN model from, for example, the table 1411 of FIG. 6, and sets them in the image recognition unit 15 (step S403). The neural network NN selected in step S403 has been trained on film grain in advance, using data of the detection window size associated with the detection target "film grain" in, for example, the table 1411 of FIG. 6.
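The selection in step S403 amounts to a table lookup keyed by the angle-of-view information and the detection target. The following is a minimal sketch under stated assumptions: the contents of table 1411 are not given in this passage, so the keys (`"2K"`, `"4K"`), window sizes, and model names below are hypothetical placeholders, as are the names `NNConfig` and `select_detection_nn`.

```python
from typing import NamedTuple


class NNConfig(NamedTuple):
    window_size: tuple[int, int]   # detection window (width, height) in pixels
    model_name: str                # identifier of the stored NN model parameters


# Hypothetical stand-in for table 1411: (view info, detection target) -> config.
TABLE_1411 = {
    ("2K", "film grain"): NNConfig((64, 64), "grain_2k"),
    ("4K", "film grain"): NNConfig((128, 128), "grain_4k"),
}


def select_detection_nn(view_info: str, target: str) -> NNConfig:
    """Return the window size and NN model linked to the given key."""
    try:
        return TABLE_1411[(view_info, target)]
    except KeyError:
        # No entry registered for this combination.
        raise LookupError(f"no NN registered for {view_info!r} / {target!r}") from None
```

For example, `select_detection_nn("4K", "film grain")` would return the configuration registered for that combination, which the image recognition unit would then use for detection.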

また、図６のテーブル１４１１において、検出対象「フィルムグレイン」に対してさらに紐づける項目を増やしてもよい。例えば、項目としてフィルムグレインを検出する映像フィルムの「撮影年代」、実写、アニメなどの「映像種類」が考えられる。また映像フィルムから画像データを取り出すプロセスの違い、例えば、アナログデータからデジタル化したデータか、デジタル化したデータをさらにサンプリングしたデータかなどの「デジタル化プロセス」、ブルーレイ、ＤＶＤ、４Ｋ／２Ｋブルーレイなどの「データ源」、フィルムのサイズ（８ｍｍ、１６ｍｍ、２４ｍｍ、３５ｍｍなど）の「フィルムサイズ」などの項目も考えられる。これらの項目を考慮した場合には、それぞれに対してニューラルネットワークＮＮの検出窓サイズおよびＮＮモデルを設定することでもよい。また、項目の組み合わせによって、別々のＮＮモデルを設定することでもよく、組み合わせた項目に合致するデータを用いて紐づけられたＮＮモデルの学習を実施する。 Also, in the table 1411 of FIG. 6, the number of items linked to the detection target "film grain" may be increased. For example, possible items include the "filming era" of the video film in which film grain is to be detected, and the "video type", such as live action or animation. Other possible items include the "digitization process", which reflects differences in how the image data was extracted from the video film (for example, whether the data was digitized from analog data or was further sampled from already digitized data); the "data source", such as Blu-ray, DVD, or 4K/2K Blu-ray; and the "film size" (8 mm, 16 mm, 24 mm, 35 mm, and so on). When these items are taken into account, a detection window size and an NN model of the neural network NN may be set for each of them. Separate NN models may also be set for combinations of items, in which case each linked NN model is trained using data that matches the combined items.
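One way to picture a combination of items is as a composite key that both selects an NN model and filters the training data for that model. The sketch below is illustrative only: the class `GrainKey`, its field names, and the helper `matching_samples` are assumptions introduced here, and the example values are placeholders rather than entries from the actual table.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class GrainKey:
    """Hypothetical combination of items linked to the target "film grain"."""
    filming_era: str            # e.g. "1960s"
    video_type: str             # e.g. "live action" or "animation"
    digitization_process: str   # e.g. "from analog" or "resampled"
    data_source: str            # e.g. "Blu-ray" or "DVD"
    film_size: str              # e.g. "16mm" or "35mm"


def matching_samples(
    samples: Iterable[tuple[GrainKey, Any]], key: GrainKey
) -> list[Any]:
    """Keep only the training data whose items match the given combination."""
    return [data for sample_key, data in samples if sample_key == key]
```

Since `GrainKey` is frozen and therefore hashable, the same object could also serve as a dictionary key that maps each combination to its own detection window size and NN model.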

画像認識部１５は、検出窓ＤＷ４の位置を決定する（ステップＳ４０４）。ステップＳ４０４においては、画像データＳＤ４中の任意の位置にランダムに検出窓ＤＷ４の位置を決定することでもよいし、ある規則に従って決定した画像データＳＤ４中の位置を検出窓ＤＷ４の位置として決定することでもよい。 The image recognition unit 15 determines the position of the detection window DW4 (step S404). In step S404, the position of the detection window DW4 may be determined randomly at an arbitrary position in the image data SD4, or a position in the image data SD4 determined according to a certain rule may be used as the position of the detection window DW4.

ステップＳ４０４において位置選択された検出窓ＤＷ４のデータに対して、図３のステップＳ１０３と同様に、ステップＳ４０３で選択したＮＮにより検出対象ＤＯの検出処理を実行する（ステップＳ４０５）。検出対象ＤＯが検出された場合（ステップＳ４０６のＹｅｓ）、結果出力部１６は、検出した検出対象ＤＯの位置などの情報を、外部のモニタなどに出力することでもよい（ステップＳ４０７）。一方、検出対象ＤＯが検出されなかった場合（ステップＳ４０６のＮｏ）、ステップＳ４０４に戻り、検出窓ＤＷ４に対する次の位置を決定し、同様の処理を繰り返す。戻った場合のステップＳ４０４において画像認識部１５は、前回までに決定された検出窓ＤＷ４の範囲に重ならないように検出窓ＤＷ４の位置を決定することが望ましい。 For the data of the detection window DW4 positioned in step S404, detection processing of the detection target DO is executed by the NN selected in step S403 (step S405), as in step S103 of FIG. 3. When the detection target DO is detected (Yes in step S406), the result output unit 16 may output information such as the position of the detected detection target DO to an external monitor or the like (step S407). On the other hand, when the detection target DO is not detected (No in step S406), the process returns to step S404, the next position of the detection window DW4 is determined, and the same processing is repeated. When the process returns to step S404, the image recognition unit 15 preferably determines the position of the detection window DW4 so that it does not overlap the ranges of the detection window DW4 determined in previous iterations.
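The loop of steps S404 to S407 can be sketched as below, assuming random placement with rejection of positions that overlap earlier windows. Several details are assumptions added for illustration: the function names, the `(x, y, width, height)` window form, the `detect` callable standing in for the selected NN, and in particular the `max_tries` limit, since the description does not state a termination condition for the case where nothing is found.

```python
import random
from typing import Callable, Optional

Rect = tuple[int, int, int, int]   # assumed window form: (x, y, width, height)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def search_film_grain(
    image_size: tuple[int, int],
    window_size: tuple[int, int],
    detect: Callable[[Rect], bool],
    max_tries: int = 1000,          # assumption: stop after a fixed number of tries
    rng: Optional[random.Random] = None,
) -> Optional[Rect]:
    """Place windows at random, skipping overlaps, until detect() succeeds."""
    rng = rng or random.Random()
    img_w, img_h = image_size
    win_w, win_h = window_size
    if win_w > img_w or win_h > img_h:
        raise ValueError("detection window is larger than the image")
    tried: list[Rect] = []
    for _ in range(max_tries):
        # Step S404: pick a candidate position inside the image.
        window = (rng.randint(0, img_w - win_w), rng.randint(0, img_h - win_h), win_w, win_h)
        if any(overlaps(window, earlier) for earlier in tried):
            continue                # avoid ranges already examined
        tried.append(window)
        if detect(window):          # steps S405 and S406
            return window           # step S407: report the position
    return None
```

Rejection sampling is only one possible way to avoid overlap; a rule-based scan, which the description also allows for step S404, would avoid wasted candidates once the image becomes crowded with examined windows.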

以上の手順により、映像フィルムをデジタルサンプリングして得た画像データＳＤに発生するフィルムグレインの検出が可能となる。 By the above procedure, it is possible to detect the film grain occurring in the image data SD obtained by digitally sampling the video film.

以上に述べた少なくとも１つの実施形態によれば、解像度の異なる複数の画像データを解析する映像解析装置および映像解析方法を提供することができる。 According to at least one embodiment described above, it is possible to provide a video analysis apparatus and a video analysis method for analyzing a plurality of image data with different resolutions.

なお、図面に示した解析画面などに表示される条件パラメータやそれらに対する選択肢、値、評価指標などの名称や定義、種類などは、本実施形態において一例として示したものであり、本実施形態に示されるものに限定されるものではない。 It should be noted that the names, definitions, types, and the like of the condition parameters displayed on the analysis screens shown in the drawings, and of their options, values, evaluation indices, and so on, are shown only as examples in the present embodiment and are not limited to those shown in the present embodiment.

本発明のいくつかの実施形態を説明したが、これらの実施形態は例として提示したものであり、発明の範囲を限定することは意図していない。これら新規な実施形態は、その他の様々な形態で実施されることが可能であり、発明の要旨を逸脱しない範囲で、種々の省略、置き換え、変更を行うことができる。これら実施形態やその変形は、発明の範囲や要旨に含まれるとともに、特許請求の範囲に記載された発明とその均等の範囲に含まれる。さらにまた、請求項の各構成要素において、構成要素を分割して表現した場合、或いは複数を合わせて表現した場合、或いはこれらを組み合わせて表現した場合であっても本発明の範疇である。また、複数の実施形態を組み合わせてもよく、この組み合わせで構成される実施例も発明の範疇である。 While several embodiments of the invention have been described, these embodiments have been presented by way of example and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and modifications can be made without departing from the scope of the invention. These embodiments and modifications thereof are included in the scope and gist of the invention, and are included in the scope of the invention described in the claims and equivalents thereof. Furthermore, in each constituent element of the claims, even if the constituent element is divided and expressed, a plurality of constituent elements are expressed together, or a combination of these is expressed, it is within the scope of the present invention. Moreover, a plurality of embodiments may be combined, and examples configured by such combinations are also within the scope of the invention.

また、図面は、説明をより明確にするため、実際の態様に比べて、各部の幅、厚さ、形状等について模式的に表される場合がある。ブロック図においては、結線されていないブロック間もしくは、結線されていても矢印が示されていない方向に対してもデータや信号のやり取りを行う場合もある。フローチャートに示す処理は、ＩＣチップ、デジタル信号処理プロセッサ（ＤｉｇｉｔａｌＳｉｇｎａｌＰｒｏｃｅｓｓｏｒまたはＤＳＰ）などのハードウェアもしくはマイクロコンピュータを含めたコンピュータなどで動作させるソフトウェア（プログラムなど）またはハードウェアとソフトウェアの組み合わせによって実現してもよい。また請求項を制御ロジックとして表現した場合、コンピュータを実行させるインストラクションを含むプログラムとして表現した場合、及び前記インストラクションを記載したコンピュータ読み取り可能な記録媒体として表現した場合でも本発明の装置を適用したものである。また、使用している名称や用語についても限定されるものではなく、他の表現であっても実質的に同一内容、同趣旨であれば、本発明に含まれるものである。 Also, in order to make the description clearer, the drawings may show the width, thickness, shape, and the like of each part schematically compared with the actual mode. In the block diagrams, data and signals may be exchanged between blocks that are not connected, or, even between connected blocks, in directions not indicated by arrows. The processes shown in the flowcharts may be implemented by hardware such as an IC chip or a digital signal processor (DSP), by software (such as a program) run on a computer, including a microcomputer, or by a combination of hardware and software. Furthermore, even when the claims are expressed as control logic, as a program including instructions to be executed by a computer, or as a computer-readable recording medium on which the instructions are recorded, the apparatus of the present invention is still applied. Also, the names and terms used are not limiting, and other expressions are included in the present invention as long as they have substantially the same content and the same meaning.

１…映像解析装置、１１…映像入力部、１２…画角取得部、１３…検出用ＮＮ選択部、１４…記憶部、１５…画像認識部、１６…結果出力部、１４１…テーブル、１４２…ＮＮモデルパラメータ、１５１…特徴検出部、１５２…領域計算部。 DESCRIPTION OF SYMBOLS: 1... video analysis apparatus, 11... video input unit, 12... angle-of-view acquisition unit, 13... detection NN selection unit, 14... storage unit, 15... image recognition unit, 16... result output unit, 141... table, 142... NN model parameters, 151... feature detection unit, 152... area calculation unit.

Claims

映像フレームをデジタルサンプリングして得た画像データの解像度に応じてニューラルネットワークの検出窓のサイズ及びモデルパラメータを切り替えて、前記画像データから検出対象を検出する映像解析装置。 A video analysis apparatus for detecting a detection target from image data by switching the detection window size and model parameters of a neural network according to the resolution of image data obtained by digitally sampling a video frame.

前記画像データは、表示範囲である画角が調整されたデータである請求項１に記載の映像解析装置。 2. The video analysis apparatus according to claim 1, wherein said image data is data in which an angle of view, which is a display range, is adjusted.

前記解像度に紐づけられたニューラルネットワークの前記検出窓のサイズと前記モデルパラメータとを格納する記憶部を備える請求項１または請求項２のいずれか１項に記載の映像解析装置。 3. The video analysis apparatus according to claim 1 or claim 2, further comprising a storage unit that stores the size of the detection window of the neural network and the model parameters linked to the resolution.

前記解像度と予め決められた検出対象とに紐づけられた前記ニューラルネットワークの前記検出窓のサイズと前記モデルパラメータとを格納する記憶部を備える請求項３に記載の映像解析装置。 4. The video analysis apparatus according to claim 3, further comprising a storage unit that stores the size of the detection window of the neural network and the model parameters associated with the resolution and a predetermined detection target.

前記解像度と予め決められた前記画像データ上の検出範囲とに紐づけられた前記ニューラルネットワークの前記検出窓のサイズと前記モデルパラメータとを格納する記憶部を備える請求項３に記載の映像解析装置。 5. The video analysis apparatus according to claim 3, further comprising a storage unit that stores the size of the detection window of the neural network and the model parameters linked to the resolution and a predetermined detection range on the image data.

前記画像データから特定の物体を検出し、前記物体から前記検出対象を検出する請求項１乃至請求項５のいずれか１項に記載の映像解析装置。 6. The video analysis apparatus according to any one of claims 1 to 5, wherein a specific object is detected from the image data, and the detection target is detected from the object.

前記映像フレームのフレームグレインを前記検出対象として検出する請求項１乃至請求項５のいずれか１項に記載の映像解析装置。 7. The video analysis apparatus according to any one of claims 1 to 5, wherein the frame grain of the video frame is detected as the detection target.

映像フレームをデジタルサンプリングして得た画像データの解像度に応じてニューラルネットワークの検出窓のサイズ及びモデルパラメータを切り替えて、前記画像データから検出対象を検出する映像解析方法。 A video analysis method for detecting a detection target from the image data by switching the detection window size and model parameters of a neural network according to the resolution of image data obtained by digitally sampling a video frame.