JP2022068012A

JP2022068012A - Information processing device and information processing method

Info

Publication number: JP2022068012A
Application number: JP2020176922A
Authority: JP
Inventors: 功雄清水; Norio Shimizu
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-10-21
Filing date: 2020-10-21
Publication date: 2022-05-09

Abstract

To provide a technique for acquiring from a video a partial video of a frame section in which each annotator performs annotation work taking into account the quality of annotation work for the video and person-hours involved in the annotation work. SOLUTION: From a video, a first partial video corresponding to a first frame section and a second partial video corresponding to a second frame section having an overlapping portion with the first frame section are acquired. The length of the overlapping portion is controlled on the basis of a parameter representing the proficiency of an operator performing annotation work on each of the first partial video and the second partial video and/or a parameter pertaining to the video. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for annotation work on moving images.

Annotation tools, which assign correct answer information (correct answer labels) to target data, are used as programs for creating the learning data used in machine learning. An annotation tool may provide, for example, a group of functions for reducing the workload on the user who creates the learning data (that is, a group of functions for assisting the user). When a moving image is annotated, the amount of work increases with the scene changes and the playback length of the moving image, so the moving image is sometimes divided into an arbitrary number of parts that are then assigned to annotation workers.

Japanese Unexamined Patent Publication No. 2019-159819
Japanese Unexamined Patent Publication No. 2009-147538

When minimizing the man-hours of video annotation work is the priority, existing tools can already divide a moving image in units of frames and assign the parts to a plurality of annotation workers. However, manual annotation of moving images is difficult work, and a worker who lacks skill or experience may select the wrong label. Quality checks are therefore important. If the moving image is simply divided without securing overlapping portions, however, the system cannot automatically check the work, which makes quality difficult to ensure.

On the other hand, when quality assurance is the priority, a plurality of annotation workers can perform annotation work on the same target moving image. For a long target moving image, however, the workers then annotate duplicated footage, so the total man-hours (cost) are likely to increase.

The present invention provides a technique for acquiring, from a moving image, the partial moving image of the frame section on which each annotation worker performs annotation work, in consideration of the quality of the annotation work for the moving image and the man-hours of that work.

One aspect of the present invention is characterized by comprising: acquisition means for acquiring, from a moving image, a first partial moving image corresponding to a first frame section and a second partial moving image corresponding to a second frame section having an overlapping portion with the first frame section; and control means for controlling the length of the overlapping portion based on a parameter representing the proficiency of a worker who performs annotation work on each of the first partial moving image and the second partial moving image and/or a parameter related to the moving image.

According to the configuration of the present invention, the partial moving image of the frame section on which each annotation worker performs annotation work can be acquired from a moving image in consideration of the quality of the annotation work for the moving image and the man-hours of that work.

FIG. 1: A block diagram showing a system configuration example.
FIG. 2: A block diagram showing a hardware configuration example of the information processing device 100.
FIG. 3: A flowchart of processing performed by the information processing device 100.
FIG. 4: A flowchart of processing performed by the information processing device 100.
FIG. 5: A flowchart of processing performed by the information processing device 100.
FIG. 6: A flowchart of processing performed by the information processing device 100.
FIG. 7: A diagram showing a display example of the GUI 701.
FIG. 8: A flowchart showing details of the processing in step S302 and step S303.
FIG. 9: A diagram showing an example of controlling the length of the overlapping portion (overlap width).
FIG. 10: A diagram showing an example of controlling the length of the overlapping portion (overlap width).
FIG. 11: A diagram showing an example of controlling the length of the overlapping portion (overlap width).

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although a plurality of features are described in the embodiments, not all of these features are essential to the invention, and the features may be combined arbitrarily. In the accompanying drawings, the same or similar configurations are given the same reference numbers, and duplicate descriptions are omitted.

Supervised learning is one example of a method for training a learning model (constructing a learning model) based on so-called machine learning. In supervised learning, a data set containing learning data, in which data to be input to the learning model is associated with the correct answer label to be predicted from that data, is used to construct the learning model. When no such data set exists, it is built by, for example, collecting the input data and then having an annotation worker (worker) perform annotation work that adds correct answer labels to the data as annotations. To make this work easier for the worker, an annotation tool with functions that support adding correct answer labels to data may be used.

An annotation tool presents data to be annotated, such as images, moving images, and documents (hereinafter also referred to as "target data"), to the worker, and accepts from the worker the designation of the correct answer label to be added to the target data as an annotation. The annotation tool then adds the designated correct answer label to the target data as an annotation, thereby generating learning data to be included in the data set.

Some annotation tools use a learning model constructed by prior machine learning (hereinafter also referred to as a "trained model") to make the labeling work of adding correct answer labels to the target data more efficient. As a specific example, such a tool has the trained model analyze the target data and extract candidates for the correct answer label to be added to the target data as an annotation, and presents the extracted candidates to the worker. The worker can then select, from the candidates presented by the annotation tool, the one to be added to the target data as the correct answer label.
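The candidate-presentation step can be illustrated with a minimal sketch. It assumes the trained model has already produced a score per label; the function name `suggest_labels`, the score dictionary, and the `top_k` and `threshold` parameters are hypothetical and do not come from the source.

```python
from typing import Dict, List


def suggest_labels(scores: Dict[str, float], top_k: int = 3,
                   threshold: float = 0.1) -> List[str]:
    """Return up to `top_k` label candidates, best first.

    `scores` stands in for the output of a trained model (assumption):
    it maps each label to a confidence value. Candidates scoring below
    `threshold` are not shown to the worker.
    """
    # Sort labels by descending score; ties keep their original order.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, score in ranked[:top_k] if score >= threshold]


if __name__ == "__main__":
    model_scores = {"car": 0.7, "truck": 0.2, "bus": 0.05, "bike": 0.05}
    print(suggest_labels(model_scores))
```

The worker would then pick one of the returned candidates as the correct answer label, or enter a different label if none fits.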

In the following description, the correct answer label added as an annotation to the moving image to be annotated (including video, animation, streaming data, and the like) is assumed to contain at least character information. However, the correct answer label may be or may contain any kind of information, and is not limited to being or containing specific information.

As a function for annotating a moving image, the annotation tools exemplified above may divide the moving image into a plurality of partial moving images according to the frames at which scenes change and the playback length of the moving image, and assign the partial moving images to respective workers.

On the other hand, in manual annotation of moving images, a worker may select the wrong label because of low proficiency or a work mistake. As a quality measure, the results of a worker's annotation of a partial moving image are therefore checked visually by hand, and ensuring quality can be labor-intensive. Moreover, when quality assurance is prioritized, the total man-hours are likely to increase, as described above.

The following describes a technique for acquiring, from a moving image, the partial moving image of the frame section on which each worker performs annotation work, in consideration of the quality of the annotation work for the moving image and the man-hours of that work.

First, a configuration example of the system according to the present embodiment will be described with reference to the block diagram of FIG. 1. As shown in FIG. 1, the system has an information processing device 100 and a terminal device 190, which are configured to be able to communicate data with each other via a wired and/or wireless network 180. An input device 101 and an output device 117 are connected to the information processing device 100.

First, the input device 101 will be described. The input device 101 is a user interface that the user operates to input various kinds of information to the information processing device 100. A keyboard, a mouse, a touch panel screen, a trackball, a pen tablet, or the like can be used as such a user interface.

Next, the output device 117 will be described. The output device 117 is a display device capable of displaying the processing results of the information processing device 100 as images, characters, and the like. A display device having a liquid crystal screen or a touch panel screen, for example, can be used. The output device 117 is not limited to a display device and may be, for example, a communication device that transmits various kinds of information to an external device.

Next, the terminal device 190 will be described. The terminal device 190 is a device that a worker operates to perform annotation work on the partial moving image assigned to that worker. The terminal device 190 downloads the partial moving image to be annotated from the information processing device 100. The worker operates the terminal device 190 to annotate the partial moving image, and the terminal device 190 assigns a correct answer label to each frame of the partial moving image according to that annotation work. When the terminal device 190 receives an upload instruction from the worker, it uploads the annotated partial moving image to the information processing device 100. For simplicity, FIG. 1 shows only one terminal device 190 connected to the network 180, but in practice a plurality of terminal devices 190 (for example, one per worker) are connected to the network 180. Each terminal device 190 operates in the same way, generating an annotated partial moving image and uploading it to the information processing device 100.

Next, the information processing device 100 will be described. The information processing device 100 is a computer device that generates, from a moving image, the partial moving images to be assigned to the workers and evaluates the annotation work performed by each worker. A smartphone, a tablet terminal device, a PC (personal computer), or the like can be used as such a computer device.

A hardware configuration example of the information processing device 100 will be described with reference to the block diagram of FIG. 2. The configuration shown in FIG. 2 is one example of a hardware configuration applicable to the information processing device 100; the device is not limited to this configuration, which can be changed or modified as appropriate.

The CPU 201 executes various kinds of processing using computer programs and data stored in the ROM 202 and the RAM 203. The CPU 201 thereby controls the operation of the entire information processing device 100 and executes or controls the various kinds of processing described as being performed by the information processing device 100.

The ROM 202 stores setting data of the information processing device 100, computer programs and data related to starting up the information processing device 100, computer programs and data related to its basic operation, and the like.

The RAM 203 has an area for storing computer programs and data loaded from the ROM 202 and the auxiliary storage device 204, and an area for storing data received from an external device via the I/F 207. The RAM 203 also has a work area that the CPU 201 uses when executing various kinds of processing. In this way, the RAM 203 can provide various areas as needed.

The auxiliary storage device 204 is a non-volatile memory such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive). The auxiliary storage device 204 stores an OS (operating system) as well as computer programs and data for causing the CPU 201 to execute or control the various kinds of processing described as being performed by the information processing device 100. The computer programs and data stored in the auxiliary storage device 204 are loaded into the RAM 203 as needed under the control of the CPU 201 and are then processed by the CPU 201. The DB 103 and the DB 113 shown in FIG. 1 can be provided in the auxiliary storage device 204.

The I/F 207 is a communication interface used for data communication with the terminal device 190 via the network 180. The CPU 201, the ROM 202, the RAM 203, the auxiliary storage device 204, the output device 117, the input device 101, and the I/F 207 are all connected to a system bus 208.

In the following, the functional units of the information processing device 100 shown in FIG. 1 are described as the agents of processing. In practice, however, the function of each functional unit is realized by the CPU 201 executing a computer program that causes the CPU 201 to perform that function. The same applies to the terminal device 190: its functional units shown in FIG. 1 are likewise described as the agents of processing, but when, for example, the terminal device 190 has the hardware configuration shown in FIG. 2, the function of each functional unit is realized by the CPU 201 executing the corresponding computer program.

In the following, a plurality of workers perform annotation work on a moving image, so for each of the workers the information processing device 100 acquires, from the moving image, the partial moving image of the frame section on which that worker is to perform annotation work. The information processing device 100 manages the partial moving image of each worker and, upon receiving a download request for a partial moving image from a worker's terminal device 190, transmits the requested partial moving image to that terminal device 190. The processing that the information processing device 100 performs to generate the partial moving image for each worker will be described with reference to the flowchart of FIG. 3.

<Step S301>
The acquisition unit 102 acquires the moving image to be annotated and stores it in the DB (database) 103. The acquisition unit 102 is not limited to any specific method of acquiring the moving image. For example, it may acquire (receive) the moving image from an external device connected to the information processing device 100, or it may acquire a moving image captured by an imaging device. The information processing device 100 also performs various initial settings for the subsequent processing.

<Step S302>
The determination unit 104 acquires the difficulty of the annotation work for the moving image stored in the DB 103 in step S301. The determination unit 104 is not limited to any specific method of acquiring the "difficulty of annotation work for a moving image".

For example, when the moving image is long (has many frames) and few workers annotate it, the frame section that each worker annotates becomes long. In such a situation, mistakes caused by lapses in concentration and the like are likely, so the difficulty of the annotation work for such a moving image is relatively high.

Also, the finer the moving image (video) is (for example, the more variations of objects appear in the moving image), the higher the difficulty of the annotation work for that moving image. Conversely, the coarser the video in the moving image is (for example, the larger the objects appearing in the video), the lower the difficulty of the annotation work for that moving image.

The determination unit 104 may acquire the difficulty based only on the complexity of the moving image, such as the number of frames and the fineness of the video described above. Alternatively, the determination unit 104 may acquire the difficulty based only on each worker's proficiency in annotation work (for example, the total actual man-hours of annotation work the worker has performed so far, or an administrator's evaluation of that work). The determination unit 104 may also acquire the difficulty based on both the complexity of the moving image and each worker's proficiency in annotation work.

The "difficulty of annotation work for a moving image" is defined according to various factors. Several examples of this difficulty are described below.

<Step S303>
When the user of the information processing device 100 operates the input device 101 to input an instruction to "acquire, from the moving image, the partial moving image that each worker is to annotate", the operation unit 107 acquires the instruction from the input device 101. In response, the operation unit 107 instructs the division unit 108 to start processing.

Upon receiving the instruction to start processing from the operation unit 107, the division unit 108 acquires, for each of the plurality of workers, the partial moving image of the frame section on which that worker performs annotation work from the moving image stored in the DB 103 in step S301. In doing so, for a frame section of interest and a frame section that has an overlapping portion with it, the division unit 108 controls the length of the overlapping portion (overlap width) based on the difficulty acquired in step S302. Next, the details of the processing in steps S302 and S303 will be described with reference to the flowchart of FIG. 8.

<Step S801>
The determination unit 104 acquires the "parameter indicating the annotation proficiency of each worker who annotates the moving image", which is managed by the management unit 105. This parameter may be set from outside. It may also be updated according to the results of evaluating the worker's annotation work (regardless of who performs the evaluation).
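The source does not say how the proficiency parameter is updated from evaluation results. As one possible illustration only, the sketch below blends the stored value with the latest evaluation using an exponential moving average. The function name `update_proficiency`, the [0, 1] scale for both values, and the `rate` parameter are assumptions, not part of the described device.

```python
def update_proficiency(current: float, evaluation: float,
                       rate: float = 0.2) -> float:
    """Blend the stored proficiency with a new evaluation result.

    Both values are assumed to lie in [0, 1] (1 = most proficient).
    `rate` controls how strongly a single evaluation moves the stored
    value; the exponential moving average is an illustrative choice.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    updated = (1.0 - rate) * current + rate * evaluation
    # Clamp so that out-of-range evaluations cannot leave the [0, 1] scale.
    return min(max(updated, 0.0), 1.0)


if __name__ == "__main__":
    proficiency = 0.5
    for score in (1.0, 1.0, 0.4):  # three successive evaluation results
        proficiency = update_proficiency(proficiency, score)
    print(proficiency)
```

A small `rate` keeps one unusually good or bad evaluation from swinging the parameter, while a value of 1.0 would simply replace the stored proficiency with the latest evaluation.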

<Step S802>
The determination unit 106 determines the complexity of the moving image acquired in step S301 and thereby acquires a "parameter indicating the complexity of the moving image". As described above, the "complexity of a moving image" can be defined based on various kinds of information about the moving image.

The "complexity of a moving image" is, for example, higher the more frames the moving image has and lower the fewer it has. Likewise, it is higher (lower) the more (fewer) objects appear in the moving image, and higher (lower) the more (fewer) scenes the moving image contains. It is also higher the finer the moving image (video) is and lower the coarser it is (for example, higher the more variations of objects appear in the moving image and lower the fewer there are).
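The monotonic relationships listed above can be captured in a small scoring function. The following is a sketch under stated assumptions rather than the determination unit's actual method: the `VideoStats` fields, the saturation caps, and the equal weighting of the four terms are all hypothetical. The only property taken from the text is that the score never decreases when any of the counts increases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoStats:
    """Hypothetical summary of a moving image used to judge complexity."""
    frame_count: int
    object_count: int
    scene_count: int
    object_variations: int


def _saturate(value: float, cap: float) -> float:
    """Map a non-negative count to [0, 1], saturating at `cap`."""
    return min(max(value, 0.0), cap) / cap


def complexity_score(stats: VideoStats) -> float:
    """Return a complexity in [0, 1]; larger counts never lower the score.

    The caps below are arbitrary illustrative values (assumptions).
    """
    terms = (
        _saturate(stats.frame_count, 100_000),
        _saturate(stats.object_count, 50),
        _saturate(stats.scene_count, 100),
        _saturate(stats.object_variations, 20),
    )
    return sum(terms) / len(terms)


if __name__ == "__main__":
    short_clip = VideoStats(1_000, 5, 3, 2)
    long_clip = VideoStats(50_000, 30, 40, 10)
    print(complexity_score(short_clip), complexity_score(long_clip))
```

Saturating each term keeps one extreme value, such as a very long video, from dominating the score, which is a design choice of this sketch and not something the source prescribes.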

<Step S803>
The determination unit 104 acquires the "difficulty of annotation work for the moving image" based on the parameter acquired in step S801 and/or the parameter acquired in step S802.
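The "and/or" combination in this step can be sketched as a function that accepts either parameter or both. This is a minimal illustration: the name `difficulty`, the [0, 1] scales, and the simple averaging are assumptions, since the source only states that the difficulty is based on one or both parameters.

```python
from typing import Optional, Sequence


def difficulty(proficiencies: Optional[Sequence[float]] = None,
               complexity: Optional[float] = None) -> float:
    """Combine worker proficiency and/or video complexity into a difficulty.

    Proficiencies and complexity are assumed to lie in [0, 1]. Lower
    proficiency and higher complexity both raise the difficulty. When
    both parameters are given they are averaged (illustrative choice).
    """
    terms = []
    if proficiencies:
        mean_proficiency = sum(proficiencies) / len(proficiencies)
        terms.append(1.0 - mean_proficiency)
    if complexity is not None:
        terms.append(complexity)
    if not terms:
        raise ValueError("at least one parameter is required")
    return sum(terms) / len(terms)


if __name__ == "__main__":
    print(difficulty(proficiencies=[0.9, 0.8]))           # proficiency only
    print(difficulty(complexity=0.7))                     # complexity only
    print(difficulty(proficiencies=[0.3, 0.4], complexity=0.7))  # both
```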

<Step S804>
For each of the plurality of workers, the division unit 108 acquires the partial moving image of the frame section on which that worker performs annotation work from the moving image stored in the DB 103 in step S301. In doing so, for a frame section of interest and a frame section that has an overlapping portion with it, the division unit 108 controls the length of the overlapping portion based on the difficulty acquired in step S803.

Let the workers who annotate the moving image be denoted S1, ..., SN (where N is the number of workers). Starting from the first frame of the moving image, the division unit 108 sets a frame section F1 corresponding to S1, ..., and a frame section FN corresponding to SN. The division unit 108 sets each frame section F(i+1) so that it has an overlapping portion with the frame section Fi (1 ≤ i < N), and controls the length L of that overlapping portion according to the difficulty described above.
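The layout of F1, ..., FN can be sketched as follows. This is only an illustration under stated assumptions: the function name `frame_sections` is hypothetical, sections are half-open frame index ranges [start, end), the moving image is first cut into near-equal base segments, and each later section is extended backward by the requested overlap. The source does not specify how boundaries are chosen or which side is extended.

```python
from typing import List, Sequence, Tuple


def frame_sections(total_frames: int,
                   overlaps: Sequence[int]) -> List[Tuple[int, int]]:
    """Return half-open frame sections [start, end) for N workers.

    N = len(overlaps) + 1. overlaps[i - 1] is the requested overlap
    length L, in frames, between section F_i and section F_(i+1).
    """
    n = len(overlaps) + 1
    if total_frames < n:
        raise ValueError("need at least one frame per worker")
    # Near-equal base boundaries from the first frame of the video.
    bounds = [i * total_frames // n for i in range(n + 1)]
    sections = [(bounds[0], bounds[1])]
    for i, requested in enumerate(overlaps, start=1):
        # Never let the overlap exceed the previous base segment.
        base_length = bounds[i] - bounds[i - 1]
        overlap = max(0, min(requested, base_length))
        sections.append((bounds[i] - overlap, bounds[i + 1]))
    return sections


if __name__ == "__main__":
    # Three workers, 900 frames: 30 shared frames, then 60 shared frames.
    print(frame_sections(900, [30, 60]))
```

With these inputs the sections are `(0, 300)`, `(270, 600)`, and `(540, 900)`, so frames 270 to 299 are annotated by both S1 and S2, and frames 540 to 599 by both S2 and S3.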

For example, when the difficulty is taken to be "each worker's proficiency in annotation work", the division unit 108 makes the length L shorter the higher the proficiency of Si and of S(i+1) is (that is, the lower the difficulty), and makes the length L longer the lower their proficiency is (that is, the higher the difficulty).

As another example, when the difficulty is taken to be the "complexity of the moving image", the division unit 108 makes the length L longer the higher the complexity is (that is, the higher the difficulty), and shorter the lower the complexity is (that is, the lower the difficulty).

As yet another example, when the difficulty is taken to be both "each worker's proficiency in annotation work" and the "complexity of the moving image", the division unit 108 makes the length L shorter the higher the proficiency of Si and of S(i+1) is, and longer the lower their proficiency is. At the same time, the division unit 108 makes the length L longer the higher the complexity of the moving image is, and shorter the lower the complexity is.

In this way, the division unit 108 sets the frame section corresponding to each worker and, for a frame section of interest and a frame section that has an overlapping portion with it, can control the length of the overlapping portion according to the difficulty level.
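The section-setting rule above can be sketched as follows. This is only an illustrative sketch, not the method of the embodiment: the names `Section`, `overlap_length`, and `set_sections`, the normalization of proficiency and complexity to the range 0 to 1, the linear formula, and the constants `base`, `lo`, and `hi` are all assumptions introduced here. The only properties taken from the description are that L shrinks as proficiency rises and grows as complexity rises.

```python
from dataclasses import dataclass


@dataclass
class Section:
    worker: str
    start: int  # first frame index (inclusive)
    end: int    # last frame index (inclusive)


def overlap_length(skill_a, skill_b, complexity, base=30, lo=10, hi=120):
    """Hypothetical rule: skills and complexity are assumed to lie in [0, 1].

    Lower proficiency and higher complexity both raise the difficulty,
    and a higher difficulty yields a longer overlap.
    """
    difficulty = (1.0 - (skill_a + skill_b) / 2.0) + complexity  # 0..2
    return max(lo, min(hi, round(base * (0.5 + difficulty))))


def set_sections(total_frames, workers, skills, complexity):
    """Assign one frame section per worker, from the first frame onward."""
    n = len(workers)
    cuts = [round(i * total_frames / n) for i in range(n + 1)]
    sections = []
    for i, worker in enumerate(workers):
        start, end = cuts[i], cuts[i + 1] - 1
        if i > 0:
            # Extend the start backward so that F(i+1) overlaps Fi by L frames.
            length = overlap_length(skills[i - 1], skills[i], complexity)
            start = max(0, start - length)
        sections.append(Section(worker, start, end))
    return sections
```

For instance, `set_sections(300, ["A", "B", "C"], [1.0, 1.0, 1.0], 0.0)` gives worker B a section that begins 15 frames before worker A's section ends, while lower proficiencies or a higher complexity would widen that overlap.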

Then, for each of the plurality of workers, the division unit 108 acquires, from the moving image stored in the DB 103 in step S301, the partial moving image of the frame section set for that worker. The division unit 108 then stores the partial moving image acquired for each worker in the DB 103.

The communication unit 191 of the terminal device 190 accesses the DB 103 and downloads the partial moving image corresponding to the worker who is the user of the terminal device 190. After the download, the worker operates the terminal device 190 to perform annotation work on the downloaded partial moving image, and the assigning unit 192 assigns a correct answer label to each frame of the partial moving image in accordance with the annotation work.

The process by which the assigning unit 192 adds a correct answer label to the partial moving image as an annotation may differ for each task, depending on the type of annotation work. As a specific example, in the case of a task of detecting an object captured in the partial moving image, the assigning unit 192 executes processing for specifying the position of the object in the partial moving image and for adding a label indicating the object. As another example, in the case of a task of classifying the partial moving image by scene, the assigning unit 192 executes processing for adding, to each classified scene, a label indicating the category of the scene. In this way, the content of the labeling process performed by the assigning unit 192 on the partial moving image may be changed as appropriate according to the type of partial moving image to be annotated, the purpose of the labeling, and the like.

When the annotation work is completed, the worker operates the terminal device 190 to input an instruction to upload the annotated partial moving image to the information processing apparatus 100. Upon receiving this input, the communication unit 191 uploads the partial moving image annotated by the assigning unit 192 (the partial moving image in which a correct answer label is assigned to each frame) to the DB 113. The annotated partial moving image may be stored in the DB 113 in, for example, a data format usable as teacher data in supervised learning.

In FIG. 1, the DB 103 and the DB 113 are shown as being included in the information processing apparatus 100. However, one or more of these DBs may be provided in an external device accessible by the information processing apparatus 100.

When the annotated partial moving images of all the workers have been uploaded to the DB 113, the information processing apparatus 100 performs processing according to the flowchart of FIG. 4 at an appropriate timing.

For example, the information processing apparatus 100 checks, periodically or irregularly, whether the annotated partial moving images of all the workers have been uploaded to the DB 113. On confirming that they have all been uploaded, the information processing apparatus 100 performs the processing according to the flowchart of FIG. 4.

As another example, the user of the information processing apparatus 100 checks whether the annotated partial moving images of all the workers have been uploaded to the DB 113. On confirming that they have all been uploaded, the user uses the input device 101 to input an instruction to start the processing according to the flowchart of FIG. 4. When this start instruction is input, the information processing apparatus 100 performs the processing according to the flowchart of FIG. 4.

<Step S401>
The reading unit 114 acquires the annotation work results stored in the DB 113 (the annotated partial moving images uploaded from the terminal devices 190 of the respective workers).

<Step S402>
For each of the overlapping portions described above, the inspection unit 115 detects the deviation between the annotation work performed on that overlapping portion by the respective workers. For example, for each frame constituting the overlapping portion, the inspection unit 115 obtains the difference between the correct answer labels (the difference between the correct answer label (numerical value) assigned to the frame by one worker and the correct answer label (numerical value) assigned by the other worker). The inspection unit 115 then obtains the total S of the differences for all the frames included in the overlapping portion as "the deviation between the annotation work performed on the overlapping portion by the respective workers".
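The total S described in step S402 can be illustrated with a short sketch. The function name `overlap_deviation` and the dictionary layout (frame index mapped to a numeric label) are hypothetical. The description only says "difference", so taking the absolute value of each per-frame difference, to keep opposite-signed differences from cancelling, is an assumption made here.

```python
def overlap_deviation(labels_a, labels_b, overlap_frames):
    """Total S of per-frame label differences over one overlapping portion.

    labels_a / labels_b map a frame index to the numeric correct answer
    label assigned by each of the two workers (hypothetical layout).
    The absolute difference is an assumption of this sketch.
    """
    return sum(abs(labels_a[f] - labels_b[f]) for f in overlap_frames)


# Two workers labelled the same three overlapping frames.
worker_a = {85: 1, 86: 1, 87: 2}
worker_b = {85: 1, 86: 3, 87: 0}
total_s = overlap_deviation(worker_a, worker_b, range(85, 88))
```

For non-numeric labels such as bounding boxes or category names, a different per-frame distance would be needed; the description leaves that choice open.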

<Step S403>
If even one of the deviations obtained for the respective overlapping portions in step S402 is regarded as abnormal, the notification unit 116 determines that "the annotation work on the moving image has an abnormality equal to or greater than a predetermined standard". For example, if even one of the totals S obtained for the respective overlapping portions in step S402 exceeds a threshold value, the notification unit 116 determines that "the annotation work on the moving image has an abnormality equal to or greater than a predetermined standard".

When the notification unit 116 determines that "the annotation work on the moving image has an abnormality equal to or greater than a predetermined standard", it outputs the result of the determination (an alert) to the output device 117. As a result, the screen of the output device 117 displays a screen notifying the user that "the annotation work on the moving image has an abnormality equal to or greater than a predetermined standard".
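The determination and alert of step S403 amount to a threshold test over the per-overlap totals. A minimal sketch, assuming the totals S are already available as a list, is given below; `abnormal_overlaps`, `alert_message`, and the message wording are hypothetical and do not come from the embodiment.

```python
def abnormal_overlaps(totals, threshold):
    """Indices of the overlapping portions whose total S exceeds the threshold."""
    return [i for i, s in enumerate(totals) if s > threshold]


def alert_message(totals, threshold):
    """Return an alert string if any overlap is abnormal, otherwise None."""
    bad = abnormal_overlaps(totals, threshold)
    if not bad:
        return None
    return (
        "Annotation work on the moving image has an abnormality "
        f"above the predetermined standard (overlaps: {bad})"
    )
```

Returning the offending indices rather than a bare boolean is a design choice of this sketch; it would let a reviewer jump directly to the overlapping portion in question.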

Next, the processing performed by the information processing apparatus 100 to further adjust the length of the overlapping portion will be described with reference to the flowchart of FIG. 5. The processing according to the flowchart of FIG. 5 is performed after the frame section of each worker has been set in step S804 described above, in order to adjust the positions (frame positions) of the ends of the frame sections.

<Step S502>
The division unit 108 acquires the positions (frame positions) of the ends of the frame section of each worker set in step S804 described above.

<Step S503>
The detection unit 109 acquires the frame positions at which the scene changes in the moving image (scene change frame positions). The method of acquiring the scene change frame positions is not limited to a specific method. For example, scene change frame positions attached to the moving image in advance as meta information may be acquired, or they may be obtained from the moving image using a learning model or the like.

Then, for each frame position acquired in step S502, the division unit 108 checks whether a scene change frame position exists within a certain range (a number of frames serving as a threshold) of that frame position. In other words, the division unit 108 checks whether a scene change frame position exists near an end of a frame section in which a worker performs annotation work.

If this check shows that, among the frame positions acquired in step S502, there is a frame position that has a scene change frame position within the certain range (that is, a scene change frame position exists near an end of a frame section in which a worker performs annotation work), the processing proceeds to step S504.

On the other hand, if none of the frame positions acquired in step S502 has a scene change frame position within the certain range (that is, no scene change frame position exists near an end of a frame section in which a worker performs annotation work), the processing according to the flowchart of FIG. 5 ends, and the partial moving images are acquired from the frame sections thus determined.

<Step S504>
Among the frame positions acquired in step S502, the division unit 108 moves each frame position that has a scene change frame position within the certain range (a target frame position) so that its distance from the scene change frame position becomes larger. In other words, the division unit 108 lengthens the overlapping portion whose end is the target frame position.
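Steps S503 and S504 can be sketched as a single boundary-adjustment function. The name `adjust_boundary`, the `is_start` flag, and the exact amount of movement are assumptions: the description only requires that the boundary end up farther from the scene change and that the overlap become longer. In this sketch a leading end is therefore only ever moved earlier and a trailing end only ever moved later, until it lies more than `margin` frames from every nearby scene change.

```python
def adjust_boundary(pos, is_start, scene_cuts, margin, total_frames):
    """Move a section boundary away from nearby scene change frame positions.

    pos:        frame position of the section end being checked
    is_start:   True for the leading end of a section, False for the trailing end
    scene_cuts: scene change frame positions
    margin:     threshold, in frames, for "near" (the certain range)
    """
    near = [c for c in scene_cuts if abs(c - pos) <= margin]
    if not near:
        return pos  # no scene change nearby: the boundary is kept as is
    if is_start:
        # Moving the leading end earlier lengthens the overlap.
        return max(0, min(pos, min(near) - margin - 1))
    # Moving the trailing end later lengthens the overlap.
    return min(total_frames - 1, max(pos, max(near) + margin + 1))
```

With `margin=10`, a leading end at frame 100 and a scene change at frame 105, the leading end moves to frame 94, so the overlap grows by six frames and the scene change falls well inside the later section.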

Even when the frame section of each worker is set as described above on the basis of the proficiency of the workers and the complexity of the moving image, the overlap width is not sufficient if a scene change frame position exists near an end of a frame section. In such cases, controlling the overlap width by the processing according to the flowchart of FIG. 5 is therefore useful.

However, for a moving image that contains many scene breaks, for example, the processing according to the flowchart of FIG. 5 may increase the overlap width too much, which in the end amounts to having a plurality of workers annotate the same entire moving image. Therefore, in view of balancing quality assurance against work efficiency, certain upper and lower limits may be set on the increase in the overlap width.
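One way to picture the upper and lower limits mentioned above is to clamp the overlap after the boundary adjustment. The function `clamp_overlap` and its parameters are hypothetical, and this sketch clamps the resulting overlap width itself, whereas the description speaks of limits on the increase; either reading keeps the overlap from growing without bound.

```python
def clamp_overlap(prev_end, new_start, min_len, max_len):
    """Return a start frame for the later section so that its overlap with
    the earlier section (which ends at prev_end) stays in [min_len, max_len]."""
    overlap = prev_end - new_start + 1
    overlap = max(min_len, min(max_len, overlap))
    return max(0, prev_end - overlap + 1)
```

For example, if the earlier section ends at frame 99 and the adjustment proposed starting the later section at frame 0, a `max_len` of 40 would pull the start back to frame 60.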

When the annotated partial moving images of all the workers have been uploaded to the DB 113, the information processing apparatus 100 may perform processing according to the flowchart of FIG. 6 instead of the processing according to the flowchart of FIG. 4. The processing according to the flowchart of FIG. 6 is described below.

<Step S601>
The estimation unit 601 uses a trained model to estimate a correct answer label for each estimation unit in the moving image. The estimation unit may be a frame, a scene, or a partial moving image.

<Step S602>
The inspection unit 115 collates the estimation result of step S601 with the annotation work result of each partial moving image. For example, when the estimation unit 601 has estimated a correct answer label for each frame of the moving image, the inspection unit 115 determines, for each partial moving image, whether the correct answer label of each frame in the partial moving image matches the estimated label obtained for that frame (match or mismatch).

<Step S603>
The notification unit 116 determines whether "there is an abnormality equal to or greater than a predetermined standard" on the basis of the collation result obtained by the inspection unit 115 in step S602. For example, when the collation result is "mismatch" for a certain number of frames or more among the frames included in a partial moving image, the notification unit 116 determines that "there is an abnormality equal to or greater than a predetermined standard".
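The collation of step S602 and the determination of step S603 can be sketched per frame as follows. `count_mismatches` and `is_abnormal` are hypothetical names, the labels are assumed to be directly comparable values (for example, category strings), and skipping frames for which no estimate exists is an assumption of this sketch.

```python
def count_mismatches(worker_labels, estimated_labels):
    """Number of frames whose worker label differs from the model's estimate.

    Both arguments map a frame index to a label. Frames without an
    estimate are skipped (an assumption of this sketch).
    """
    return sum(
        1
        for frame, label in worker_labels.items()
        if frame in estimated_labels and estimated_labels[frame] != label
    )


def is_abnormal(worker_labels, estimated_labels, max_mismatches):
    """True if the mismatch count reaches the given number of frames."""
    return count_mismatches(worker_labels, estimated_labels) >= max_mismatches
```

Because the trained model can itself be wrong, a mismatch here is better read as a prompt for human review than as proof of a worker error.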

When the notification unit 116 determines that "there is an abnormality equal to or greater than a predetermined standard", it outputs the result of the determination (an alert) to the output device 117. As a result, the screen of the output device 117 displays a screen notifying the user that "there is an abnormality equal to or greater than a predetermined standard".

[First Embodiment]
In this embodiment, a specific use case of the information processing apparatus 100 will be described. In this embodiment, the GUI (graphical user interface) 701 illustrated in FIG. 7 is displayed on the output device 117. The user operates the GUI 701 using the input device 101. In the following, it is assumed that the operation unit 107 receives the various instructions and information that the user inputs by operating the GUI 701, and that the notification unit 116 controls the display of the GUI 701.

This embodiment takes as an example annotation work for creating the correct answer data used to train a model that executes a task of detecting a specific object in a moving image. A GUI for obtaining the partial moving images on which the workers perform such annotation work will be described.

The area 702 is an area in which each frame of the moving image to be annotated is displayed. The area 703 is an area for displaying the timeline of the moving image. FIG. 7 shows a state in which the partial moving image of each worker has been obtained from the moving image: the name "worker A" is displayed on the partial moving image corresponding to worker A, and the name "worker B" is displayed on the partial moving image corresponding to worker B.

The area 704 is an area for selecting the moving image to be annotated. The area 704 is provided with a "Browse" button for designating the moving image to be annotated and an "Import" button for inputting an instruction to acquire the designated moving image.

When the user operates the input device 101 to press the "Browse" button, the output device 117 displays a screen for designating the folder that holds moving images. On this screen, the user can operate the input device 101 to designate the desired moving image (the moving image to be annotated). After designating the moving image to be annotated, when the user presses the "Import" button using the input device 101, the acquisition unit 102 acquires the designated moving image into the information processing apparatus 100.

The area 705 is an area for displaying the complexity of the moving image acquired in response to the user operation on the area 704. The area 705 is provided with areas for displaying the length of the moving image (number of frames or duration), the granularity of the moving image (the number of variations of the objects appearing in the moving image and the fineness of the objects themselves), and other information.

The area 706 is an area for designating the workers who perform annotation work on the moving image. When the user presses a "Select" button using the input device 101, a list of workers is displayed, and when the user designates one of them using the input device 101, the name of the designated worker is displayed in the area to the left of the button. Each worker is managed together with that worker's proficiency in annotation work (skill level), and the skill level is displayed together with the worker's name. In FIG. 7, worker A, worker B, and worker C have been designated as the workers who perform annotation work on the moving image, so the area 706 displays worker A, worker B, and worker C, each with the corresponding skill level. The skill level is expressed as, for example, "high", "medium", or "low", or as "3", "2", or "1".

When the user presses the button 707 using the input device 101, the information processing apparatus 100 ends the processing currently being performed with the GUI 701. Alternatively, when the button 707 is pressed, the information processing apparatus 100 may display in the area 702 the moving image previously acquired by a user operation on the area 704, and treat that moving image as the target of subsequent operations in the GUI 701.

When the user presses the button 708 using the input device 101, the information processing apparatus 100 performs the processing according to the flowchart of FIG. 3 on the moving image acquired in response to the user operation on the area 704. In doing so, the information processing apparatus 100 obtains the difficulty level described above on the basis of the complexity derived from one or more of the items of information displayed in the area 705 and the proficiency in annotation work of each worker displayed in the area 706. As described above, the difficulty level may be obtained from the complexity alone, or from the proficiency of the workers alone. The information processing apparatus 100 then acquires, on the basis of the difficulty level, the partial moving images corresponding to worker A, worker B, and worker C from the moving image acquired by the user operation on the area 704, and stores them in the DB 103.

In the area 706, worker A, worker B, and worker C are listed in this order from the top and are numbered in this order. Accordingly, the information processing apparatus 100 treats the first partial moving image from the first frame of the moving image as the partial moving image to be annotated by worker A, attaches the identification information of worker A to it, and registers it in the DB 103. Likewise, the second partial moving image from the first frame is treated as the partial moving image to be annotated by worker B and is registered in the DB 103 with the identification information of worker B attached, and the third partial moving image from the first frame is treated as the partial moving image to be annotated by worker C and is registered in the DB 103 with the identification information of worker C attached.
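The ordering rule above (the i-th worker in the list receives the i-th partial moving image from the head of the moving image) can be sketched as a simple pairing. The function `register_partials` and the record layout are hypothetical; how the DB 103 actually stores identification information is not specified in the description.

```python
def register_partials(sections, workers):
    """Pair the i-th section from the head of the video with the i-th worker.

    sections: list of (start_frame, end_frame) tuples, in any order
    workers:  worker identifiers in the order listed in the GUI
    Returns records that could be stored alongside each partial video.
    """
    if len(sections) != len(workers):
        raise ValueError("one section is required per worker")
    ordered = sorted(sections)  # ascending start frame = order from the head
    return [
        {"worker_id": worker, "start": start, "end": end}
        for worker, (start, end) in zip(workers, ordered)
    ]
```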

When the user presses the button 709 using the input device 101, the information processing apparatus 100 performs the processing according to the flowchart of FIG. 4 or the processing according to the flowchart of FIG. 6. The alert may be displayed in the GUI 701 of FIG. 7 or in a window separate from the GUI 701.

Next, examples of controlling the length of the overlapping portion (the overlap width) will be described with reference to FIGS. 9 and 10. FIG. 9 shows an example in which the overlap width is controlled according to the proficiency of the workers. As shown in FIG. 9(a), when the proficiency of worker A and the proficiency of worker B are both low, the overlap width is made longer; as shown in FIG. 9(b), when the proficiency of worker A and the proficiency of worker B are both high, the overlap width is made shorter.

FIGS. 10(a) and 10(b) show an example in which the overlap width is controlled according to the complexity of the moving image. As shown in FIG. 10(a), when the complexity of the moving image is high, the overlap width is made longer; as shown in FIG. 10(b), when the complexity of the moving image is low, the overlap width is made shorter.

FIG. 10(c) shows an example in which the overlap width is controlled according to both the proficiency of the workers and the complexity of the moving image. As shown in FIG. 10(c), when the proficiency of worker A and the proficiency of worker B are both high, the overlap width is made shorter, but the higher the complexity of the moving image, the longer the overlap width is made. The overlap width set under the condition "the proficiency of worker A and the proficiency of worker B are both high, and the complexity of the moving image is low" is shorter than the overlap width set under the condition "the complexity of the moving image is low" alone. Similarly, the overlap width set under the condition "the proficiency of worker A and the proficiency of worker B are both low, and the complexity of the moving image is high" is longer than the overlap width set under the condition "the complexity of the moving image is high" alone.

Next, an example of controlling the overlap width by the processing according to the flowchart of FIG. 5 will be described with reference to FIG. 11. As shown in FIG. 11(a), suppose that the boundary between scene 1 and scene 2 (the scene change frame position at which scene 1 changes to scene 2) lies near the trailing end of one frame section, and that the leading end of the other frame section lies near this scene change frame position. In this case, as shown in FIG. 11(b), the frame position of the leading end of the other frame section is moved so that its distance from the scene change frame position becomes larger. In other words, the overlapping portion between the one frame section and the other frame section is made longer.

This embodiment has proposed an example of balancing quality and efficiency when acquiring a partial moving image for each worker from a moving image, based mainly on the following technical idea. When the complexity of the moving image is high, the probability that a worker makes an operation error or an error of judgment increases. Likewise, when the proficiency of a worker is low, the probability of an operation error or an error of judgment increases. It is therefore desirable to provide a means of detecting errors when such factors lead, directly or indirectly, to the occurrence of an error.

As described above, according to this embodiment, in a situation where a plurality of workers add correct answer labels to a moving image as annotations, the workers' selection of correct answer label candidates can be supported in a more suitable manner. Specifically, by controlling the overlap width according to the complexity of the moving image and the proficiency of the workers in annotation work as described above, the annotation man-hours can be kept down while increasing the likelihood that an error is detected even when a worker mistakenly selects a candidate different from the correct answer label candidate that the worker intended.

[Second Embodiment]
In the first embodiment, the information processing apparatus 100 and the terminal device 190 of each worker are separate devices. However, the information processing apparatus 100 and the terminal device 190 may be integrated into a single computer device that has the functions of both. In this case, the workers share this single computer device to perform annotation work.

The information processing apparatus 100 may also be implemented by a plurality of computer devices. For example, the information processing apparatus 100 may be implemented by a computer device that generates the partial moving image of each worker from the moving image and registers it in the DB 103, and a computer device that performs the processing according to the flowchart of FIG. 4 or FIG. 6 on the annotated partial moving images uploaded to the DB 113.

The output device 117 may instead be a device that outputs audio, in which case the alert can be given as audio by the output device 117. The output device 117 may also be a device that has both display and audio output functions.

The input device 101 may include a sound collecting device such as a microphone and may, for example, collect speech uttered by the user. In this case, the operation unit 107 may perform various kinds of analysis, such as acoustic analysis and natural language processing, on the user's speech input via the input device 101, and recognize the content of the speech as an instruction input by the user.

Further, the alert notification method is not limited to a specific method. For example, the alert may be issued by controlling the display form of characters and images on the screen, or the information processing device 100 may be provided with a speaker and the alert may be issued as audio via the speaker.

In addition, the numerical values, processing timing, processing order, data (information) configuration, destinations, and sources, and the screen configurations and their operation methods used in each of the above embodiments are given as examples for the sake of concrete explanation, and are not intended to be limiting.

In addition, some or all of the embodiments described above may be used in combination as appropriate. Further, some or all of the embodiments described above may be used selectively.

(Other embodiments)
The present invention can also be realized by a process in which a program that realizes one or more functions of the above-described embodiments is supplied to a system or device via a network or storage medium, and one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that realizes one or more functions.

The invention is not limited to the above embodiments, and various changes and modifications can be made without departing from the spirit and scope of the invention. Therefore, the claims are attached to make the scope of the invention public.

100: Information processing device
101: Input device
102: Acquisition unit
103: DB
104: Determination unit
105: Management unit
106: Judgment unit
107: Operation unit
108: Division unit
109: Cut detection unit
113: DB
114: Reading unit
115: Inspection unit
116: Notification unit
117: Output device
118: Estimation unit
190: Terminal device
191: Communication unit
192: Grant unit

Claims

An information processing device comprising:
an acquisition means for acquiring, from a moving image, a first partial moving image corresponding to a first frame section and a second partial moving image corresponding to a second frame section having an overlapping portion with the first frame section; and
a control means for controlling the length of the overlapping portion based on a parameter representing the skill level of a worker performing annotation work on each of the first partial moving image and the second partial moving image and/or a parameter related to the moving image.

The information processing device according to claim 1, wherein the control means controls the length of the overlapping portion based on the skill level of a first worker who performs annotation work on the first partial moving image and the skill level of a second worker who performs annotation work on the second partial moving image.

The information processing device according to claim 1 or 2, wherein the control means controls the length of the overlapping portion according to the number of frames of the moving image.

The information processing device according to any one of claims 1 to 3, wherein the control means controls the length of the overlapping portion according to the number of variations of objects appearing in the moving image.

The information processing device according to any one of claims 1 to 4, wherein the control means adjusts the frame position of an end of the overlapping portion based on the frame position of a scene break in the moving image.

The information processing device according to any one of claims 1 to 5, further comprising:
an inspection means for inspecting the annotation work on the moving image for an abnormality; and
a notification means for issuing an alert when the abnormality is found.

The information processing device according to claim 6, wherein the inspection means performs the inspection based on the result of the annotation work on the overlapping portion in the first partial moving image and the result of the annotation work on the overlapping portion in the second partial moving image.

The information processing device according to claim 6, wherein the inspection means performs the inspection based on an estimation result obtained by a trained model estimating the annotation work for the moving image and the result of the annotation work on the moving image.

An information processing method performed by an information processing device, the method comprising:
an acquisition step in which an acquisition means of the information processing device acquires, from a moving image, a first partial moving image corresponding to a first frame section and a second partial moving image corresponding to a second frame section having an overlapping portion with the first frame section; and
a control step in which a control means of the information processing device controls the length of the overlapping portion based on a parameter representing the skill level of a worker performing annotation work on each of the first partial moving image and the second partial moving image and/or a parameter related to the moving image.

A computer program for causing a computer to function as each means of the information processing device according to any one of claims 1 to 8.
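
For illustration only, the following Python sketch shows one way the claimed means could fit together: an overlap length derived from worker skill and frame count, an overlap end adjusted toward a scene break, and an inspection that compares the two workers' results in the overlapping portion. Everything in it is an assumption made for this example and is not taken from the embodiments: the names `Section`, `overlap_length`, `split_with_overlap`, and `overlap_disagreements`, the skill scale of 0 to 1, the `max_ratio` and `tolerance` parameters, the specific formula, and the per-frame label dictionaries.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    start: int  # first frame index (inclusive)
    end: int    # end frame index (exclusive)


def overlap_length(skill_a: float, skill_b: float, total_frames: int,
                   max_ratio: float = 0.2) -> int:
    """Hypothetical rule: a less skilled pair or a longer video gets a longer overlap.

    Skills are assumed to lie in [0, 1], with 1 meaning most skilled.
    """
    weakest = min(skill_a, skill_b)
    return round(total_frames * max_ratio * (1.0 - weakest))


def split_with_overlap(total_frames: int, boundary: int, overlap: int,
                       scene_cuts: tuple[int, ...] = (),
                       tolerance: int = 0) -> tuple[Section, Section]:
    """Return two sections that share the frames [boundary, overlap_end)."""
    overlap_end = min(boundary + overlap, total_frames)
    # Assumed adjustment: move the overlap end to the nearest scene cut
    # that lies within `tolerance` frames of it.
    near = [c for c in scene_cuts
            if boundary < c <= total_frames and abs(c - overlap_end) <= tolerance]
    if near:
        overlap_end = min(near, key=lambda c: abs(c - overlap_end))
    return Section(0, overlap_end), Section(boundary, total_frames)


def overlap_disagreements(first: Section, second: Section,
                          labels_first: dict[int, str],
                          labels_second: dict[int, str]) -> list[int]:
    """Return the overlap frame indices where the two workers' labels differ."""
    lo = max(first.start, second.start)
    hi = min(first.end, second.end)
    return [f for f in range(lo, hi)
            if labels_first.get(f) != labels_second.get(f)]
```

In this sketch, a non-empty list from `overlap_disagreements` would be the condition under which a notification means issues an alert; how many disagreements count as an abnormality is left open here.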