WO2019232707A1 - Method and device for weakly-supervised video object splitting - Google Patents

Method and device for weakly-supervised video object splitting

Info

Publication number
WO2019232707A1
WO2019232707A1 (PCT/CN2018/090069, CN2018090069W)
Authority
WO
WIPO (PCT)
Prior art keywords
test
video
frame
video object
bounding box
Prior art date
Application number
PCT/CN2018/090069
Other languages
French (fr)
Chinese (zh)
Inventor
马汝辉
张宗璞
宋涛
华扬
管海兵
Original Assignee
上海交通大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海交通大学
Priority to PCT/CN2018/090069 priority Critical patent/WO2019232707A1/en
Publication of WO2019232707A1 publication Critical patent/WO2019232707A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments

Definitions

  • the present invention relates to the field of video processing in computer vision, and in particular to a weakly supervised video object segmentation method and device.
  • Video object segmentation (VOS) is one of the hot research topics in the field of computer video processing.
  • Video object segmentation can generate an image mask for each frame in the video that contains the foreground and background information of the object.
  • the mask generated by the video object segmentation framework has a wide range of applications, such as video editing, autonomous driving, video surveillance, and video coding applications based on video content.
  • the current mainstream methods in this field include supervised (Supervised VOS), unsupervised (Unsupervised VOS), and semi-supervised (Semi-supervised VOS) video object segmentation frameworks.
  • the supervised video object segmentation framework assumes that in the current test sequence, all video frames have prior knowledge of manual calibration, and through interaction with users, collaboratively generate and improve image masks for video object segmentation. Such methods are not suitable for automated video object segmentation tasks, as they often perform poorly without human intervention.
  • the unsupervised video object segmentation framework assumes that there is no prior knowledge of the current video sequence and directly performs object segmentation on each video frame. Due to the lack of context information for the current video sequence, such frameworks often have poor performance due to the misintroduction of irrelevant objects.
  • the semi-supervised video object segmentation framework assumes that in the current test sequence, artificial calibration information for the first frame has been given.
  • the framework improves the test accuracy of the segmentation framework in the current video sequence by learning the information in the first frame.
  • the semi-supervised framework requires manual calibration of the image mask of the first frame. Because calibration is time-consuming and labor-intensive, the application space for such frameworks in real life is limited.
  • because the semi-supervised framework has difficulty controlling overfitting while learning from the first frame, such frameworks often produce incomplete object masks in the image mask when testing video frames after the first frame, which significantly affects video object segmentation performance.
  • in addition to video object segmentation, video object tracking (VOT) is another hot research topic in the field of computer video processing. Video object tracking generates a bounding box around the test object for each frame in the video, and continuously updates the position and size of the bounding box according to the relationship between consecutive frames.
  • the present invention provides a method for segmenting a weakly supervised video object, including the following steps:
  • S01 constructs a video object segmentation model; after the first frame of the test video and the bounding box of the test object in the first frame are input, the video object segmentation model is pre-trained based on an iterative algorithm;
  • S02 tracks the bounding box of the test object in each frame after the first frame of the test video, and updates the bounding box of the test object;
  • step S03, based on the bounding box of the test object output in step S02, performs pixel-level prediction on each frame after the first frame of the test video, and generates an image mask of the current frame containing foreground and background information;
  • step S04, based on the result output in step S02, optimizes the image mask output in step S03 to obtain the final object segmentation calculation result of the current frame.
  • in step S01, the bounding box of the test object in the first frame is obtained by manual calibration.
  • in step S01, the step of pre-training the video object segmentation model based on the iterative algorithm includes:
  • S11 uses the current video object segmentation model to generate an image mask for the first frame of the test video;
  • S12 optimizes the image mask of the first frame of the test video based on the bounding box of the test object in the first frame;
  • S13 trains the current video object segmentation model using the optimized image mask;
  • S14 repeats steps S11 to S13, and ends after reaching the number of iterations.
  • in step S12, optimizing the image mask of the first frame of the test video includes deleting irrelevant objects and completing missing parts of the test object.
  • the bounding box of the test object includes position information and size information of the test object.
  • the predicted range is a subregion near the test object, and the subregion is given by the bounding box of the test object.
  • in step S04, optimizing the image mask output in step S03 includes:
  • removing irrelevant objects;
  • optimizing the edges of the test object according to the test object bounding box;
  • smoothing defects of the test object according to the test object bounding box.
  • the invention also discloses a weakly supervised video object segmentation device, including a weakly supervised video object segmentation pre-training module, a video object tracking module, a video object segmentation test module, and a video object segmentation optimization module:
  • the weakly supervised video object segmentation pre-training module pre-trains a video object segmentation model by inputting a test object bounding box and a first frame of a test video;
  • the video object tracking module is used to track the bounding box of the test object in each frame after the first frame of the test video, so as to accurately predict the position and size of the object;
  • the video object segmentation test module is used to perform pixel-level prediction for each frame after the first frame of the test video to generate an image mask that distinguishes the foreground from the background;
  • the video object segmentation optimization module uses the test object bounding box generated by the video object tracking module to optimize the image mask generated by the video object segmentation test module.
  • the workflow of the weakly supervised video object segmentation device disclosed in the present invention includes:
  • Step 1 The weakly supervised video object segmentation device starts to operate.
  • the weakly supervised video object segmentation pre-training module uses the test object bounding box and the first frame of the test video to generate, based on the iterative algorithm, a first-frame image mask for pre-training the video segmentation model, and performs the pre-training;
  • Step 2 The video object tracking module tracks the bounding box of the test object in each frame after the first frame of the test video, and transmits the tracking result to the video object segmentation test module;
  • Step 3: the video object segmentation test module, based on the test object bounding box given by the video object tracking module, performs pixel-level prediction on a sub-region near the test object in each frame after the first frame of the test video, generates an image mask containing foreground and background information, and passes the image mask to the video object segmentation optimization module;
  • Step 4: the video object segmentation optimization module uses the test object bounding box output by the video object tracking module to optimize the image mask generated by the video object segmentation test module and generate the final object segmentation calculation result for the current frame.
  • if the test video still contains frames to be tested, the weakly supervised video object segmentation device repeats steps 2 to 4.
  • the present invention assumes that, in the current test sequence, only a manually calibrated object bounding box is given for the first frame instead of a complete image mask, which can significantly reduce the manual calibration cost required to run the framework and facilitates applied research on related work in real life.
  • the present invention can learn the context information of the current sequence from the first frame of the video sequence, avoid the defects introduced by irrelevant objects, and improve the performance of the video object segmentation test.
  • the video object segmentation optimization module assisted by video object tracking in the present invention can use the object position and size information from the object bounding box to avoid the defect of incomplete object masks; at the same time, according to the object position, the possibility of irrelevant objects being introduced into the object mask is reduced, further improving the performance of video object segmentation.
  • FIG. 3 is a structural diagram of a video object segmentation device system according to a preferred embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a video object segmentation apparatus according to a preferred embodiment of the present invention.
  • S01 constructs a video object segmentation model; after the first frame of the test video and the bounding box of the test object in the first frame are input, the video object segmentation model is pre-trained based on an iterative algorithm;
  • S02 tracks the bounding box of the test object in each frame after the first frame of the test video, and updates the bounding box of the test object;
  • step S03, based on the bounding box of the test object output in step S02, performs pixel-level prediction on each frame after the first frame of the test video, and generates an image mask of the current frame containing foreground and background information;
  • step S04, based on the result output in step S02, optimizes the image mask output in step S03 to obtain the final object segmentation calculation result of the current frame;
  • the optimization mainly includes: removing irrelevant objects, optimizing the edges of the test object in the image mask according to the bounding box of the test object, and smoothing the defects of the test object in the image mask according to the bounding box of the test object.
  • the optimized image mask is used as the final object segmentation calculation result for the current frame.
  • FIG. 2 shows a schematic flowchart of an iterative algorithm according to a preferred embodiment of the present invention.
  • Pre-training a video object segmentation model based on an iterative algorithm includes the following steps:
  • S11 uses the current video object segmentation model to generate an image mask for the first frame of the test video;
  • S12 optimizes the image mask of the first frame of the test video based on the bounding box of the test object in the first frame;
  • S13 trains the current video object segmentation model using the optimized image mask;
  • S14 repeats steps S11 to S13, and ends after reaching the number of iterations.
  • refined_pred = REFINE_MASK(prediction, bounding_box)
  • FIG. 3 is a structural diagram of a video object segmentation device system according to a preferred embodiment of the present invention, including a weakly supervised video object segmentation pre-training module, a video object tracking module, a video object segmentation test module, and a video object segmentation optimization module:
  • the weakly supervised video object segmentation pre-training module pre-trains a video object segmentation model by inputting a test object bounding box and a first frame of a test video;
  • the video object tracking module is used to track the bounding box of the test object in each frame after the first frame of the test video, so as to accurately predict the position and size of the object;
  • the video object segmentation test module is used to perform pixel-level prediction for each frame after the first frame of the test video to generate an image mask that distinguishes the foreground from the background;
  • the video object segmentation optimization module uses the test object bounding box generated by the video object tracking module to optimize the image mask generated by the video object segmentation test module.
  • FIG. 4 is a schematic flowchart of a video object segmentation device according to a preferred embodiment of the present invention, including:
  • Step 1 The weakly supervised video object segmentation device starts to operate.
  • the weakly supervised video object segmentation pre-training module uses the test object bounding box and the first frame of the test video to generate, based on the iterative algorithm, a first-frame image mask for pre-training the video segmentation model, and performs the pre-training;
  • Step 2 The video object tracking module tracks the bounding box of the test object in each frame after the first frame of the test video, and transmits the tracking result to the video object segmentation test module;
  • Step 3: the video object segmentation test module, based on the test object bounding box given by the video object tracking module, performs pixel-level prediction on a sub-region near the test object in each frame after the first frame of the test video, generates an image mask containing foreground and background information, and passes the image mask to the video object segmentation optimization module;
  • Step 4: the video object segmentation optimization module uses the test object bounding box output by the video object tracking module to optimize the image mask generated by the video object segmentation test module and generate the final object segmentation calculation result for the current frame. If there is a frame to be tested in the test video, steps 2 to 4 are repeated.
  • the weakly supervised video object segmentation pre-training module only needs to manually calibrate the bounding box of the test object, rather than the image mask of the first frame of the test video, which significantly reduces the cost of manual calibration.
  • the object segmentation pre-training module can learn the context information of the current sequence from the first frame of the test video, avoiding the defects introduced by irrelevant objects, and improving the performance of the video object segmentation device.
  • a video object segmentation optimization module assisted by video object tracking is introduced; by using the position and size information from the bounding box of the test object, the defect of an incomplete image mask for the current frame can be avoided, and the position information of the test object reduces the possibility of irrelevant objects being introduced into the image mask, further improving the performance of video object segmentation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a weakly-supervised video object segmentation method, comprising: constructing a video object segmentation model and, when the first frame of a test video and a test object bounding box are input, pre-training the video object segmentation model on the basis of an iterative algorithm; tracking the test object bounding box in each frame subsequent to the first frame of the test video; performing pixel-level prediction for each frame subsequent to the first frame of the test video to generate an image mask containing the foreground and background information of the current frame; and optimizing the image mask on the basis of the tracking result to produce the final object segmentation result for the current frame. Also disclosed is a weakly-supervised video object segmentation device, comprising a weakly-supervised video object segmentation pre-training module, a video object tracking module, a video object segmentation test module, and a video object segmentation optimization module. The present invention reduces the labor cost required for video object segmentation and improves video object segmentation performance.

Description

Method and device for weakly supervised video object segmentation
Technical Field
The present invention relates to the field of video processing in computer vision, and in particular to a weakly supervised video object segmentation method and device.
Background Art
Video object segmentation (VOS) is one of the hot research topics in the field of computer video processing. Video object segmentation generates, for each frame in a video, an image mask that contains the foreground and background information of the object. The masks generated by a video object segmentation framework have a wide range of applications, such as video editing, autonomous driving, video surveillance, and video coding based on video content. The current mainstream methods in this field include supervised (Supervised VOS), unsupervised (Unsupervised VOS), and semi-supervised (Semi-supervised VOS) video object segmentation frameworks.
The supervised video object segmentation framework assumes that, in the current test sequence, all video frames come with manually calibrated prior knowledge, and collaboratively generates and improves the image masks for video object segmentation through interaction with users. Such methods are not suitable for automated video object segmentation tasks, because they usually perform poorly without human intervention. The unsupervised video object segmentation framework assumes that there is no prior knowledge of the current video sequence and directly performs object segmentation on each video frame. Lacking context information for the current video sequence, such frameworks often perform poorly because irrelevant objects are mistakenly introduced.
The semi-supervised video object segmentation framework assumes that, in the current test sequence, the manual calibration information of the first frame has been given. The framework improves the test accuracy of segmentation in the current video sequence by learning the information in the first frame. However, similar to the supervised object segmentation framework, the semi-supervised framework requires a manually calibrated image mask for the first frame. Because calibration is time-consuming and labor-intensive, the application space for such frameworks in real life is limited. At the same time, because the semi-supervised framework has difficulty controlling overfitting while learning from the first frame, such frameworks often produce incomplete object masks when testing video frames after the first frame, which significantly degrades video object segmentation performance.
In addition to video object segmentation, video object tracking (VOT) is another hot research topic in the field of computer video processing. Video object tracking generates a bounding box around the test object for each frame in the video, and continuously updates the position and size of the bounding box according to the relationship between consecutive frames.
Therefore, those skilled in the art are committed to developing a weakly supervised video object segmentation method and device that fuse the object bounding box with the video object segmentation device and use video object tracking to assist video object segmentation, thereby reducing the manual calibration cost, avoiding the introduction of irrelevant objects, and improving video object segmentation performance.
Summary of the Invention
In view of the above-mentioned shortcomings of the prior art, the technical problem to be solved by the present invention is how to reduce the cost of manual calibration, avoid the introduction of irrelevant objects, and improve video object segmentation performance.
To achieve the above object, the present invention provides a weakly supervised video object segmentation method, including the following steps:
S01: construct a video object segmentation model; after the first frame of the test video and the bounding box of the test object in the first frame are input, pre-train the video object segmentation model based on an iterative algorithm;
S02: track the bounding box of the test object in each frame after the first frame of the test video, and update the bounding box of the test object;
S03: based on the bounding box of the test object output in step S02, perform pixel-level prediction on each frame after the first frame of the test video, and generate an image mask of the current frame containing foreground and background information;
S04: based on the result output in step S02, optimize the image mask output in step S03 to obtain the final object segmentation calculation result of the current frame.
Further, in step S01, the bounding box of the test object in the first frame is obtained by manual calibration.
Further, in step S01, the step of pre-training the video object segmentation model based on the iterative algorithm includes:
S11: use the current video object segmentation model to generate an image mask for the first frame of the test video;
S12: optimize the image mask of the first frame of the test video based on the bounding box of the test object in the first frame;
S13: train the current video object segmentation model using the optimized image mask;
S14: repeat steps S11 to S13, and end after the number of iterations is reached.
Further, in step S12, optimizing the image mask of the first frame of the test video includes deleting irrelevant objects and completing missing parts of the test object.
Further, in step S02, the bounding box of the test object includes position information and size information of the test object.
Further, in step S03, the predicted range is a sub-region near the test object, and the sub-region is given by the bounding box of the test object.
Further, in step S04, optimizing the image mask output in step S03 includes:
removing irrelevant objects;
optimizing the edges of the test object according to the bounding box of the test object; and
smoothing the defects of the test object according to the bounding box of the test object.
The present invention also discloses a weakly supervised video object segmentation device, including a weakly supervised video object segmentation pre-training module, a video object tracking module, a video object segmentation test module, and a video object segmentation optimization module:
the weakly supervised video object segmentation pre-training module pre-trains the video object segmentation model based on an iterative algorithm by inputting the test object bounding box and the first frame of the test video;
the video object tracking module is used to track the bounding box of the test object in each frame after the first frame of the test video, so as to accurately predict the position and size of the object;
the video object segmentation test module is used to perform pixel-level prediction for each frame after the first frame of the test video and generate an image mask that distinguishes the foreground from the background;
the video object segmentation optimization module uses the test object bounding box generated by the video object tracking module to optimize the image mask generated by the video object segmentation test module.
Further, the workflow of the weakly supervised video object segmentation device disclosed in the present invention includes:
Step 1: the weakly supervised video object segmentation device starts to operate; the weakly supervised video object segmentation pre-training module uses the test object bounding box and the first frame of the test video to generate, based on the iterative algorithm, a first-frame image mask for pre-training the video segmentation model, and performs the pre-training;
Step 2: the video object tracking module tracks the bounding box of the test object in each frame after the first frame of the test video, and passes the tracking result to the video object segmentation test module;
Step 3: the video object segmentation test module, based on the test object bounding box given by the video object tracking module, performs pixel-level prediction on a sub-region near the test object in each frame after the first frame of the test video, generates an image mask containing foreground and background information, and passes the image mask to the video object segmentation optimization module;
Step 4: the video object segmentation optimization module uses the test object bounding box output by the video object tracking module to optimize the image mask generated by the video object segmentation test module, and generates the final object segmentation calculation result for the current frame.
Further, if there is a frame to be tested in the test video, the weakly supervised video object segmentation device repeats steps 2 to 4.
The weakly supervised video object segmentation method and device disclosed in the present invention have the following technical effects:
(1) The present invention assumes that, in the current test sequence, only a manually calibrated object bounding box is given for the first frame instead of a complete image mask, which can significantly reduce the manual calibration cost required to run the framework and facilitates applied research on related work in real life.
(2) The present invention can learn the context information of the current sequence from the first frame of the video sequence, avoid the defects introduced by irrelevant objects, and improve the performance of the video object segmentation test.
(3) The video object segmentation optimization module assisted by video object tracking in the present invention can use the object position and size information from the object bounding box to avoid the defect of incomplete object masks; at the same time, according to the object position, the possibility of irrelevant objects being introduced into the object mask is reduced, further improving the performance of video object segmentation.
The concept, specific structure, and technical effects of the present invention will be further described below with reference to the drawings to fully understand the objects, features, and effects of the present invention.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a video object segmentation method according to a preferred embodiment of the present invention;
FIG. 2 is a schematic flowchart of an iterative algorithm according to a preferred embodiment of the present invention;
FIG. 3 is a system structural diagram of a video object segmentation device according to a preferred embodiment of the present invention;
FIG. 4 is a schematic workflow diagram of a video object segmentation device according to a preferred embodiment of the present invention.
Detailed Description of the Embodiments
The following describes several preferred embodiments of the present invention with reference to the accompanying drawings of the specification to make the technical content clearer and easier to understand. The present invention can be embodied in many different forms, and its protection scope is not limited to the embodiments mentioned herein.
FIG. 1 is a schematic flowchart of a weakly supervised video object segmentation method in a preferred embodiment of the present invention, which includes the following steps:
S01: construct a video object segmentation model; after the first frame of the test video and the bounding box of the test object in the first frame are input, pre-train the video object segmentation model based on an iterative algorithm.
In this embodiment, a manually calibrated bounding box of the test object is given instead of a complete image mask, which significantly reduces the required manual calibration cost and facilitates the application of related work in real life.
S02: track the bounding box of the test object in each frame after the first frame of the test video, and update the bounding box of the test object.
Video object tracking is a hot research problem in the field of computer video processing: a bounding box surrounding the test object is generated for each frame in the test video, and the position and size of the bounding box are continuously updated according to the relationship between consecutive frames of the test video.
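As an illustrative aside (not part of the disclosed method, which does not prescribe a particular tracker), the frame-to-frame bounding-box update can be approximated with simple template matching. The function name track_bounding_box, the search_margin parameter, and the use of OpenCV are assumptions made only for this sketch; for simplicity the box size is kept fixed, whereas the embodiment also updates it.

import cv2

def track_bounding_box(prev_frame, cur_frame, box, search_margin=32):
    # box = (x, y, w, h) in the previous frame; returns the updated box in the
    # current frame found by normalized cross-correlation within a search
    # window around the previous position.
    x, y, w, h = box
    template = prev_frame[y:y + h, x:x + w]
    x0, y0 = max(0, x - search_margin), max(0, y - search_margin)
    x1 = min(cur_frame.shape[1], x + w + search_margin)
    y1 = min(cur_frame.shape[0], y + h + search_margin)
    window = cur_frame[y0:y1, x0:x1]
    scores = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, best = cv2.minMaxLoc(scores)  # best = (dx, dy) of the top match
    return (x0 + best[0], y0 + best[1], w, h)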
S03: based on the bounding box of the test object output in step S02, perform pixel-level prediction on each frame after the first frame of the test video, and generate an image mask of the current frame containing foreground and background information.
The sub-region near the test object is given by the object bounding box maintained by the video object tracking module; predicting only this sub-region reduces, at the source, the possibility of irrelevant objects being introduced into the image mask of the current frame.
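A minimal sketch of restricting the pixel-level prediction to the sub-region given by the bounding box is shown below; predict_fn stands in for whatever segmentation network the embodiment uses, and pad_ratio is an assumed padding factor, neither of which is specified in the original text.

import numpy as np

def segment_in_subregion(frame, box, predict_fn, pad_ratio=0.25):
    # Run the pixel-level predictor only on a padded crop around the tracked
    # bounding box, then paste the crop mask back into a full-size mask so
    # that pixels far from the object can never be marked as foreground.
    h, w = frame.shape[:2]
    x, y, bw, bh = box
    px, py = int(bw * pad_ratio), int(bh * pad_ratio)
    x0, y0 = max(0, x - px), max(0, y - py)
    x1, y1 = min(w, x + bw + px), min(h, y + bh + py)
    crop_mask = predict_fn(frame[y0:y1, x0:x1])  # binary mask for the crop
    full_mask = np.zeros((h, w), dtype=np.uint8)
    full_mask[y0:y1, x0:x1] = crop_mask
    return full_mask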
S04: based on the result output in step S02, optimize the image mask output in step S03 to obtain the final object segmentation calculation result of the current frame.
The optimization mainly includes: removing irrelevant objects, optimizing the edges of the test object in the image mask according to the bounding box of the test object, and smoothing the defects of the test object in the image mask according to the bounding box of the test object. The optimized image mask is used as the final object segmentation calculation result for the current frame.
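The three refinement operations could, for example, be realized with connected-component analysis and hole filling as sketched below; the function name refine_mask and the specific SciPy morphology routines are assumptions, since the embodiment only names the operations and does not prescribe their implementation.

import numpy as np
from scipy import ndimage

def refine_mask(mask, box):
    # mask: binary (H, W) prediction; box: (x, y, w, h) from the tracker.
    x, y, w, h = box
    inside = np.zeros(mask.shape, dtype=bool)
    inside[y:y + h, x:x + w] = True
    labels, n = ndimage.label(mask > 0)
    keep = np.zeros(mask.shape, dtype=bool)
    for i in range(1, n + 1):
        component = labels == i
        if (component & inside).any():      # remove irrelevant objects outside the box
            keep |= component
    keep &= inside                          # refine edges: clip leakage beyond the box
    keep = ndimage.binary_fill_holes(keep)  # smooth defects: fill holes in the object
    return keep.astype(np.uint8)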
FIG. 2 shows a schematic flowchart of an iterative algorithm according to a preferred embodiment of the present invention. Pre-training the video object segmentation model based on the iterative algorithm includes the following steps:
S11: use the current video object segmentation model to generate an image mask for the first frame of the test video;
S12: optimize the image mask of the first frame of the test video based on the bounding box of the test object in the first frame;
S13: train the current video object segmentation model using the optimized image mask;
S14: repeat steps S11 to S13, and end after the number of iterations is reached.
The iterative generation algorithm in this embodiment first uses the original object segmentation model to generate an image mask for the first frame of the current test video. It then optimizes the first-frame image mask using the test object bounding box, including deleting irrelevant objects and completing missing parts of the test object. The optimized first-frame image mask is used as training calibration data to train the original object segmentation model, after which the workflow of generating the first-frame image mask, optimizing it, and training the current video object segmentation model is repeated. After N iterations, the finally optimized mask is used as the prediction for the first frame of the current test video. The more iterations, the longer the pre-training takes, and the better the pre-trained video object segmentation model usually performs on subsequent video frames. In this embodiment, the pseudocode of the main routine of the iterative algorithm is as follows:
SEGMENTATION_PRE_TRAIN(bounding_box, model, first_frame)
    for i = 1, 2, ..., N
        prediction = PREDICT_MASK(model, first_frame)
        refined_pred = REFINE_MASK(prediction, bounding_box)
        model = TRAIN_MODEL(model, refined_pred)
    return model
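For readability, a Python rendering of this pseudocode is given below; PREDICT_MASK, REFINE_MASK, and TRAIN_MODEL are represented by model.predict, refine_mask (as sketched above), and model.fit respectively, which are placeholder interfaces assumed for illustration rather than part of the disclosure.

def segmentation_pre_train(model, first_frame, bounding_box, n_iters=3):
    # Iteratively: predict the first-frame mask, refine it with the manually
    # calibrated bounding box, then fine-tune the model on the refined mask.
    for _ in range(n_iters):
        prediction = model.predict(first_frame)                # PREDICT_MASK
        refined_pred = refine_mask(prediction, bounding_box)   # REFINE_MASK
        model.fit(first_frame, refined_pred)                   # TRAIN_MODEL
    return model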
FIG. 3 shows a system structural diagram of a video object segmentation device according to a preferred embodiment of the present invention, including a weakly supervised video object segmentation pre-training module, a video object tracking module, a video object segmentation test module, and a video object segmentation optimization module (an illustrative code skeleton of these modules is sketched after the module descriptions below):
the weakly supervised video object segmentation pre-training module pre-trains the video object segmentation model based on an iterative algorithm by inputting the test object bounding box and the first frame of the test video;
the video object tracking module is used to track the bounding box of the test object in each frame after the first frame of the test video, so as to accurately predict the position and size of the object;
the video object segmentation test module is used to perform pixel-level prediction for each frame after the first frame of the test video and generate an image mask that distinguishes the foreground from the background;
the video object segmentation optimization module uses the test object bounding box generated by the video object tracking module to optimize the image mask generated by the video object segmentation test module.
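Purely as an illustrative sketch of how the four modules might be organized in code (all class and method names are assumptions; the disclosure does not prescribe an implementation), reusing the helper functions sketched earlier:

class WeaklySupervisedVOSDevice:
    # Each method corresponds to one module of the device described above.
    def __init__(self, model):
        self.model = model  # the video object segmentation model

    def pre_train(self, first_frame, box, n_iters=3):   # pre-training module
        self.model = segmentation_pre_train(self.model, first_frame, box, n_iters)

    def track(self, prev_frame, cur_frame, box):         # tracking module
        return track_bounding_box(prev_frame, cur_frame, box)

    def segment(self, frame, box):                       # segmentation test module
        return segment_in_subregion(frame, box, self.model.predict)

    def refine(self, mask, box):                         # optimization module
        return refine_mask(mask, box)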
FIG. 4 shows a schematic workflow diagram of a video object segmentation device according to a preferred embodiment of the present invention, including the following steps (an illustrative driver loop over these steps is sketched after the step descriptions):
Step 1: the weakly supervised video object segmentation device starts to operate; the weakly supervised video object segmentation pre-training module uses the test object bounding box and the first frame of the test video to generate, based on the iterative algorithm, a first-frame image mask for pre-training the video segmentation model, and performs the pre-training;
Step 2: the video object tracking module tracks the bounding box of the test object in each frame after the first frame of the test video, and passes the tracking result to the video object segmentation test module;
Step 3: the video object segmentation test module, based on the test object bounding box given by the video object tracking module, performs pixel-level prediction on a sub-region near the test object in each frame after the first frame of the test video, generates an image mask containing foreground and background information, and passes the image mask to the video object segmentation optimization module;
Step 4: the video object segmentation optimization module uses the test object bounding box output by the video object tracking module to optimize the image mask generated by the video object segmentation test module, and generates the final object segmentation calculation result for the current frame; if there is a frame to be tested in the test video, steps 2 to 4 are repeated.
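Under the same assumptions as the module skeleton above, the repetition of steps 2 through 4 could be driven by a loop like the following; run_device and the list-of-frames input are illustrative only.

def run_device(device, frames, init_box):
    # Step 1: pre-train on the first frame; then repeat steps 2-4 for every
    # remaining frame of the test video.
    device.pre_train(frames[0], init_box)
    box, results = init_box, []
    for prev_frame, cur_frame in zip(frames, frames[1:]):
        box = device.track(prev_frame, cur_frame, box)     # step 2: update the bounding box
        mask = device.segment(cur_frame, box)               # step 3: pixel-level prediction
        results.append(device.refine(mask, box))            # step 4: optimize the mask
    return results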
In this embodiment, the weakly supervised video object segmentation pre-training module only requires manual calibration of the bounding box of the test object, rather than an image mask for the first frame of the test video, which significantly reduces the cost of manual calibration. At the same time, the weakly supervised video object segmentation pre-training module can learn the context information of the current sequence from the first frame of the test video, avoiding the defects introduced by irrelevant objects and improving the performance of the video object segmentation device.
This embodiment introduces a video object segmentation optimization module assisted by video object tracking. By using the position and size information from the bounding box of the test object, the defect of an incomplete image mask for the current frame can be avoided; at the same time, based on the position information of the test object, the possibility of irrelevant objects being introduced into the image mask is reduced, further improving the performance of video object segmentation.
The preferred embodiments of the present invention have been described in detail above. It should be understood that those of ordinary skill in the art can make many modifications and changes according to the concept of the present invention without creative work. Therefore, any technical solution that a person skilled in the art can obtain through logical analysis, reasoning, or limited experiments based on the concept of the present invention and on the basis of the prior art should fall within the protection scope determined by the claims.

Claims (10)

  1. A weakly supervised video object segmentation method, characterized by comprising the following steps:
    S01: constructing a video object segmentation model, and after the first frame of a test video and a bounding box of a test object in the first frame are input, pre-training the video object segmentation model based on an iterative algorithm;
    S02: tracking the bounding box of the test object in each frame after the first frame of the test video, and updating the bounding box of the test object;
    S03: based on the bounding box of the test object output in step S02, performing pixel-level prediction on each frame after the first frame of the test video, and generating an image mask of the current frame containing foreground and background information;
    S04: based on the result output in step S02, optimizing the image mask output in step S03 to obtain the final object segmentation calculation result of the current frame.
  2. The weakly supervised video object segmentation method according to claim 1, characterized in that, in step S01, the bounding box of the test object in the first frame is obtained by manual calibration.
  3. The weakly supervised video object segmentation method according to claim 1, characterized in that, in step S01, the step of pre-training the video object segmentation model based on the iterative algorithm comprises:
    S11: using the current video object segmentation model to generate an image mask for the first frame of the test video;
    S12: optimizing the image mask of the first frame of the test video based on the bounding box of the test object in the first frame;
    S13: training the current video object segmentation model using the optimized image mask;
    S14: repeating steps S11 to S13, and ending after the number of iterations is reached.
  4. The weakly supervised video object segmentation method according to claim 3, characterized in that, in step S12, optimizing the image mask of the first frame of the test video includes deleting irrelevant objects and completing missing parts of the test object.
  5. The weakly supervised video object segmentation method according to claim 1, characterized in that, in step S02, the bounding box of the test object includes position information and size information of the test object.
  6. The weakly supervised video object segmentation method according to claim 1, characterized in that, in step S03, the predicted range is a sub-region near the test object, and the sub-region is given by the bounding box of the test object.
  7. The weakly supervised video object segmentation method according to claim 1, characterized in that, in step S04, optimizing the image mask output in step S03 includes:
    removing irrelevant objects;
    optimizing the edges of the test object according to the bounding box of the test object; and
    smoothing the defects of the test object according to the bounding box of the test object.
  8. A weakly supervised video object segmentation device, characterized by comprising a weakly supervised video object segmentation pre-training module, a video object tracking module, a video object segmentation test module, and a video object segmentation optimization module, wherein:
    the weakly supervised video object segmentation pre-training module pre-trains a video object segmentation model based on an iterative algorithm by inputting a test object bounding box and the first frame of a test video;
    the video object tracking module is used to track the bounding box of the test object in each frame after the first frame of the test video, so as to accurately predict the position and size of the object;
    the video object segmentation test module is used to perform pixel-level prediction for each frame after the first frame of the test video and generate an image mask that distinguishes the foreground from the background;
    the video object segmentation optimization module uses the test object bounding box generated by the video object tracking module to optimize the image mask generated by the video object segmentation test module.
  9. The weakly supervised video object segmentation device according to claim 8, characterized in that its workflow comprises:
    Step 1: the weakly supervised video object segmentation device starts to operate; the weakly supervised video object segmentation pre-training module uses the test object bounding box and the first frame of the test video to generate, based on the iterative algorithm, a first-frame image mask for pre-training the video segmentation model, and performs the pre-training;
    Step 2: the video object tracking module tracks the bounding box of the test object in each frame after the first frame of the test video, and passes the tracking result to the video object segmentation test module;
    Step 3: the video object segmentation test module, based on the test object bounding box given by the video object tracking module, performs pixel-level prediction on a sub-region near the test object in each frame after the first frame of the test video, generates an image mask containing foreground and background information, and passes the image mask to the video object segmentation optimization module;
    Step 4: the video object segmentation optimization module uses the test object bounding box output by the video object tracking module to optimize the image mask generated by the video object segmentation test module, and generates the final object segmentation calculation result for the current frame.
  10. The weakly supervised video object segmentation device according to claim 9, characterized in that, if there is a frame to be tested in the test video, steps 2 to 4 are repeated.
PCT/CN2018/090069 2018-06-06 2018-06-06 Method and device for weakly-supervised video object splitting WO2019232707A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/090069 WO2019232707A1 (en) 2018-06-06 2018-06-06 Method and device for weakly-supervised video object splitting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/090069 WO2019232707A1 (en) 2018-06-06 2018-06-06 Method and device for weakly-supervised video object splitting

Publications (1)

Publication Number Publication Date
WO2019232707A1 true WO2019232707A1 (en) 2019-12-12

Family

ID=68769701

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/090069 WO2019232707A1 (en) 2018-06-06 2018-06-06 Method and device for weakly-supervised video object splitting

Country Status (1)

Country Link
WO (1) WO2019232707A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110058609A1 (en) * 2009-09-04 2011-03-10 Stmicroelectronics Pvt. Ltd. System and method for object based parametric video coding
CN102270346A (en) * 2011-07-27 2011-12-07 宁波大学 Method for extracting target object from interactive video
US20120294530A1 (en) * 2010-01-22 2012-11-22 Malavika Bhaskaranand Method and apparatus for video object segmentation
CN103559719A (en) * 2013-11-20 2014-02-05 电子科技大学 Interactive graph cutting method
CN106780536A (en) * 2017-01-13 2017-05-31 深圳市唯特视科技有限公司 A kind of shape based on object mask network perceives example dividing method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110058609A1 (en) * 2009-09-04 2011-03-10 Stmicroelectronics Pvt. Ltd. System and method for object based parametric video coding
US20120294530A1 (en) * 2010-01-22 2012-11-22 Malavika Bhaskaranand Method and apparatus for video object segmentation
CN102270346A (en) * 2011-07-27 2011-12-07 宁波大学 Method for extracting target object from interactive video
CN103559719A (en) * 2013-11-20 2014-02-05 电子科技大学 Interactive graph cutting method
CN106780536A (en) * 2017-01-13 2017-05-31 深圳市唯特视科技有限公司 A kind of shape based on object mask network perceives example dividing method

Similar Documents

Publication Publication Date Title
WO2020019671A1 (en) Breast lump detection and classification system and computer-readable storage medium
WO2018223857A1 (en) Text line recognition method and system
WO2020096099A1 (en) Machine learning method and device
WO2018027549A1 (en) System and method for trying on clothes virtually
WO2013172580A1 (en) Image-processing apparatus for removing haze contained in video, and method therefor
WO2019000462A1 (en) Face image processing method and apparatus, storage medium, and electronic device
WO2021085757A1 (en) Video frame interpolation method robust against exceptional motion, and apparatus therefor
WO2021150033A1 (en) Electronic device and controlling method of electronic device
EP3577571A1 (en) Electronic apparatus for compressing language model, electronic apparatus for providing recommendation word and operation methods thereof
WO2021215730A1 (en) Computer program, method, and device for generating virtual defect image by using artificial intelligence model generated on basis of user input
CN108898618B (en) Weak surveillance video object segmentation method and device
WO2020027584A1 (en) Method and an apparatus for performing object illumination manipulation on an image
WO2020168606A1 (en) Advertisement video optimising method, apparatus and device and computer readable storage medium
WO2018053964A1 (en) Android-based method and apparatus for voice input of punctuation mark
WO2019232707A1 (en) Method and device for weakly-supervised video object splitting
WO2023182727A1 (en) Image verification method, diagnostic system performing same, and computer-readable recording medium having the method recorded thereon
WO2022103195A1 (en) Robot system
WO2018205481A1 (en) Method for carrying out corrosion resistant marking by using ultrafast laser
WO2017049667A1 (en) Display device and image display method therefor
WO2019074185A1 (en) Electronic apparatus and control method thereof
WO2021145713A1 (en) Apparatus and method for generating virtual model
WO2019190142A1 (en) Method and device for processing image
WO2014086050A1 (en) Broken line repair method, broken line repair structure, and broken line repair system
WO2020149493A1 (en) Electronic device and method for controlling same
WO2022124865A1 (en) Method, device, and computer program for detecting boundary of object in image

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18921665

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.04.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18921665

Country of ref document: EP

Kind code of ref document: A1