CN111414860A - Real-time portrait tracking and segmenting method - Google Patents


Info

Publication number
CN111414860A
Authority
CN
China
Prior art keywords
portrait
frame
tracking
mask
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010200344.2A
Other languages
Chinese (zh)
Inventor
张明琦
李云夕
熊永春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Quwei Science & Technology Co ltd
Original Assignee
Hangzhou Quwei Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Quwei Science & Technology Co ltd filed Critical Hangzhou Quwei Science & Technology Co ltd
Priority to CN202010200344.2A priority Critical patent/CN111414860A/en
Publication of CN111414860A publication Critical patent/CN111414860A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/162 Detection; Localisation; Normalisation using pixel segmentation or colour matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time portrait tracking and segmenting method, which specifically comprises the following steps: (1) a training stage, in which the parameters of the portrait segmentation model are trained and the segmentation network is trained offline; this stage comprises two parts, data preprocessing and model training; (2) a prediction stage, in which the picture sequence frames of a video are input into the portrait tracking and segmentation algorithm: a portrait tracking frame is obtained by the KCF tracking algorithm, a portrait region is cropped out according to the tracking frame, the region is preprocessed and input into the segmentation model, and the output is post-processed to obtain the portrait segmentation mask corresponding to the input frame; this process is repeated over the video frames in order until the final video portrait mask sequence is obtained. The invention has the beneficial effects that: the running speed of the algorithm is improved; picture data are easier to acquire and label than video data; and the running speed of the model is increased, meeting the real-time requirement of the mobile terminal.

Description

Real-time portrait tracking and segmenting method
Technical Field
The invention relates to the technical field of image processing, in particular to a real-time portrait tracking and segmenting method.
Background
Tracking algorithms and segmentation algorithms belong to two different technical fields. In a given scene, a tracking algorithm continuously tracks a specified target to obtain its position information, while a segmentation algorithm performs semantic segmentation on a specified target to obtain a sequence of target masks. Combining the two therefore opens up wide applications; for example, a portrait tracking and segmentation algorithm used in the short-video industry can provide a basis for features such as video special-effect rendering.
Most current tracking and segmentation algorithms are based on deep learning. On the data side, because the tracking network and the segmentation network must be trained simultaneously, video must be used as training data, and labeling video data takes a great deal of manual time, so video annotation becomes a difficulty. On the model side, the complexity of the combined tracking and segmentation task increases the complexity of the model structure, so the running time of the algorithm is long and the real-time requirement cannot be met.
Disclosure of Invention
To overcome the defects in the prior art, the invention provides a real-time portrait tracking and segmenting method that improves the running speed of the algorithm.
In order to achieve the purpose, the invention adopts the following technical scheme:
a real-time portrait tracking and segmenting method specifically comprises the following steps:
(1) a training stage, in which the parameters of the portrait segmentation model are trained and the segmentation network is trained offline; this stage comprises two parts, data preprocessing and model training;
(2) a prediction stage, in which the picture sequence frames of a video are input into the portrait tracking and segmentation algorithm: a portrait tracking frame is obtained by the KCF tracking algorithm, a portrait region is cropped out according to the tracking frame, the region is preprocessed and input into the segmentation model, and the output is post-processed to obtain the portrait segmentation mask corresponding to the input frame; this process is repeated over the video frames in order until the final video portrait mask sequence is obtained.
The invention adopts the traditional KCF tracking algorithm, which improves the running speed of the algorithm; pictures are used as training data for the segmentation network, and picture data are easier to acquire and label than video data; and a lightweight segmentation network is designed, which improves the running speed of the model and meets the real-time requirement of the mobile terminal. The invention therefore improves the running speed of the whole algorithm, so that the tracking and segmentation algorithm meets the real-time performance requirement on the mobile terminal.
Preferably, in the step (1), the specific operation method is as follows:
(11) collecting different portrait data, and accurately marking a portrait area, wherein the background area is 0, and the portrait area is 1, so as to obtain a corresponding binary portrait mask;
(12) performing data enhancement processing on the training portrait data, then scaling the long edge of each image to 224, scaling the short edge proportionally, and padding the remainder with 0, to obtain an RGB input image I_x of size 224×224×3; the same scaling operation is performed on the corresponding binary portrait mask to obtain a training portrait mask I_y;
(13) MobileNetV2 is used as the encoding module of the segmentation network, and the whole encoding module performs 32× downsampling on the input image to obtain a feature map F; the decoding module recovers details of the feature map F in a U-net-like manner, and after the feature map has been restored to 56×56, a 4× upsampling layer is applied directly to obtain an output portrait mask Y of size 224×224×1;
(14) a cross-entropy loss and a Dice loss are computed between the output portrait mask Y and the training portrait mask I_y to obtain the loss function Loss;
(15) the entire model is iterated using the loss function Loss, updating the model parameters.
Preferably, in step (12), the data enhancement processing includes mirroring, rotation, brightness-contrast transformation, and affine transformation.
Preferably, in the step (2), the specific operation method is as follows:
(21) the video is split into frames to obtain the picture sequence frames;
(22) the first frame picture is input into the KCF tracking algorithm, and the portrait frame to be tracked is manually marked to initialize the KCF tracking algorithm; this frame is taken as the portrait tracking frame B_1 of the first frame;
(23) suppose the portrait tracking frame of the current frame is B_t, where t denotes the position of the picture frame in the video; the KCF tracking algorithm predicts the portrait tracking frame B_{t+1} of the next frame from the portrait tracking frame B_t of the current frame;
(24) according to the portrait tracking frame B_t obtained in step (23), adaptive cropping is performed on the portrait area to obtain the portrait region P_t;
(25) the portrait region P_t is preprocessed: the long edge of P_t is scaled to 224, the short edge is scaled proportionally, and the remainder is padded with 0, giving the RGB model input I_t of size 224×224×3;
(26) the RGB model input I_t is passed through the portrait segmentation model obtained in step (1) to obtain the portrait mask output Y_t;
(27) the portrait mask output Y_t is post-processed to optimize the result, obtaining the binary portrait mask N_t corresponding to the original image frame;
(28) steps (23) to (27) are repeated until the portrait segmentation of the last frame is completed, yielding all portrait mask frames.
Preferably, in step (24), the adaptive cropping specifically comprises: first, the width-to-height ratio and the height-to-width ratio of the portrait tracking frame are checked, and if either ratio is less than 0.5, the range of the short side is expanded so that the ratio reaches 0.5; then, the width and the height of the portrait tracking frame are expanded in the same proportion to obtain the portrait region P_t, ensuring that the whole cropped area contains the complete portrait.
Preferably, in step (27), post-processing the portrait mask output Y_t specifically comprises: first, binarization with a threshold of 0.5 is performed to obtain a binarized portrait mask; second, connected-component analysis is performed on the portrait mask, mis-segmented regions are removed, and the largest portrait region is retained; the result is then scaled back to the size of the portrait region P_t in step (24); finally, according to the cropping information of the portrait region P_t, the mask is padded with 0 to obtain the binary portrait mask N_t corresponding to the original image frame.
The invention has the beneficial effects that: the running speed of the algorithm is improved; the picture data is easier to obtain and label; the running speed of the model is increased, and the real-time requirement of a mobile terminal is met; the running speed of the whole algorithm is improved, so that the tracking segmentation algorithm can meet the real-time performance requirement at the mobile terminal.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a flow chart of the segmentation model of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
In the embodiment shown in FIG. 1, a real-time portrait tracking and segmenting method specifically includes the following steps:
(1) a training stage, in which the parameters of the portrait segmentation model are trained; since the KCF tracking algorithm requires no offline training, only the segmentation network is trained offline in this step, which comprises two parts, data preprocessing and model training; the specific operation method is as follows:
(11) collecting different portrait data, and accurately marking a portrait area, wherein the background area is 0, and the portrait area is 1, so as to obtain a corresponding binary portrait mask;
(12) in order to improve the generalization of the network, data enhancement processing is performed on the training portrait data, including mirroring, rotation, brightness-contrast transformation, and affine transformation; then the long edge of each image is scaled to 224, the short edge is scaled proportionally, and the remainder is padded with 0, giving an RGB input image I_x of size 224×224×3; the same scaling operation is performed on the corresponding binary portrait mask to obtain the training portrait mask I_y (a preprocessing sketch follows step (15));
(13) MobileNetV2 is used as the encoding module of the segmentation network; the encoding module mainly consists of a series of MobileNetV2 units, and the whole encoding module performs 32× downsampling on the input image to obtain a feature map F; the decoding module recovers details of the feature map F in a U-net-like manner, and, to reduce the amount of computation, a 4× upsampling layer is applied directly once the feature map has been restored to 56×56, yielding an output portrait mask Y of size 224×224×1 (a network sketch follows step (15));
(14) a cross-entropy loss and a Dice loss are computed between the output portrait mask Y and the training portrait mask I_y to obtain the loss function Loss;
(15) the entire model is iterated using the loss function Loss, updating the model parameters, as sketched below.
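By way of illustration, the letterbox preprocessing of steps (12) and (25) might be implemented as follows. This is a minimal sketch assuming OpenCV and NumPy; the function name and the top-left placement of the resized content are illustrative assumptions, not taken from the patent:

```python
import cv2
import numpy as np

def letterbox_224(img):
    """Scale the long edge to 224, scale the short edge proportionally,
    and pad the remainder with zeros (steps (12) and (25))."""
    h, w = img.shape[:2]
    scale = 224.0 / max(h, w)
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    # bilinear for RGB images, nearest-neighbour for single-channel masks
    interp = cv2.INTER_LINEAR if img.ndim == 3 else cv2.INTER_NEAREST
    resized = cv2.resize(img, (new_w, new_h), interpolation=interp)
    shape = (224, 224, 3) if img.ndim == 3 else (224, 224)
    canvas = np.zeros(shape, dtype=img.dtype)
    canvas[:new_h, :new_w] = resized  # assumed top-left placement; padding stays 0
    return canvas
```

The same routine is applied to an input image I_x and to its binary mask I_y, so the pair stays aligned.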
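The encoder-decoder of step (13) could be sketched in PyTorch as below. Only the stated constraints (MobileNetV2 encoder, 32× downsampling, detail recovery to 56×56, direct 4× upsampling, 224×224×1 output) come from the text; the channel widths and the omission of the U-net-style skip connections are simplifying assumptions:

```python
import torch
import torch.nn as nn
import torchvision  # torchvision >= 0.13 assumed for the weights argument

class PortraitSegNet(nn.Module):
    def __init__(self):
        super().__init__()
        # MobileNetV2 features: 224x224x3 input -> 7x7x1280 map (32x downsampling)
        self.encoder = torchvision.models.mobilenet_v2(weights=None).features

        def up(cin, cout):  # one 2x upsampling stage of the decoder
            return nn.Sequential(nn.ConvTranspose2d(cin, cout, 2, stride=2),
                                 nn.ReLU(inplace=True))

        self.decode = nn.Sequential(up(1280, 256),  # 7 -> 14
                                    up(256, 128),   # 14 -> 28
                                    up(128, 64))    # 28 -> 56
        self.head = nn.Conv2d(64, 1, kernel_size=1)
        self.up4 = nn.Upsample(scale_factor=4, mode='bilinear',
                               align_corners=False)  # direct 4x: 56 -> 224

    def forward(self, x):
        f = self.encoder(x)                # feature map F
        y = self.head(self.decode(f))      # 56x56 mask logits
        return torch.sigmoid(self.up4(y))  # 224x224x1 portrait mask Y
```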
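The loss of step (14) then combines cross-entropy and Dice terms; weighting them equally, as here, is an assumption the patent does not fix:

```python
import torch
import torch.nn.functional as F

def portrait_loss(y, i_y, eps=1e-6):
    """Loss of step (14): cross-entropy plus Dice between the output mask Y
    and the training mask I_y, both of shape (N, 1, 224, 224) in [0, 1]."""
    bce = F.binary_cross_entropy(y, i_y)
    inter = (y * i_y).sum(dim=(1, 2, 3))
    dice = 1.0 - (2.0 * inter + eps) / (
        y.sum(dim=(1, 2, 3)) + i_y.sum(dim=(1, 2, 3)) + eps)
    return bce + dice.mean()
```

Step (15) is then the usual iteration: optimizer.zero_grad(), portrait_loss(model(i_x), i_y).backward(), optimizer.step().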
(2) a prediction stage, in which the picture sequence frames of a video are input into the portrait tracking and segmentation algorithm: a portrait tracking frame is obtained by the KCF tracking algorithm, a portrait region is cropped out according to the tracking frame, the region is preprocessed and input into the segmentation model, and the output is post-processed to obtain the portrait segmentation mask corresponding to the input frame; this process is repeated over the video frames in order until the final video portrait mask sequence is obtained; the specific operation method is as follows:
(21) the video is split into frames to obtain the picture sequence frames;
(22) the first frame picture is input into the KCF tracking algorithm, and the portrait frame to be tracked is manually marked to initialize the KCF tracking algorithm; this frame is taken as the portrait tracking frame B_1 of the first frame;
(23) suppose the portrait tracking frame of the current frame is B_t, where t denotes the position of the picture frame in the video (t = 1, 2, 3, ...); the KCF tracking algorithm predicts the portrait tracking frame B_{t+1} of the next frame from the portrait tracking frame B_t of the current frame (a tracking sketch follows step (28));
(24) according to the portrait tracking frame B_t obtained in step (23), adaptive cropping is performed on the portrait area to obtain the portrait region P_t; the adaptive cropping specifically comprises: first, the width-to-height ratio and the height-to-width ratio of the portrait tracking frame are checked, and if either ratio is less than 0.5, the range of the short side is expanded so that the ratio reaches 0.5; then, the width and the height of the portrait tracking frame are expanded in the same proportion to obtain the portrait region P_t, ensuring that the whole cropped area contains the complete portrait (a cropping sketch follows step (28));
(25) the portrait region P_t is preprocessed: the long edge of P_t is scaled to 224, the short edge is scaled proportionally, and the remainder is padded with 0, giving the RGB model input I_t of size 224×224×3;
(26) the RGB model input I_t is passed through the portrait segmentation model obtained in step (1) to obtain the portrait mask output Y_t;
(27) the portrait mask output Y_t is post-processed to optimize the result, obtaining the binary portrait mask N_t corresponding to the original image frame; the post-processing specifically comprises: first, binarization with a threshold of 0.5 is performed to obtain a binarized portrait mask; second, connected-component analysis is performed on the portrait mask, mis-segmented regions are removed, and the largest portrait region is retained; the result is then scaled back to the size of the portrait region P_t in step (24); finally, according to the cropping information of the portrait region P_t, the mask is padded with 0 to obtain the binary portrait mask N_t corresponding to the original image frame (a post-processing sketch follows step (28));
(28) steps (23) to (27) are repeated until the portrait segmentation of the last frame is completed, yielding all portrait mask frames (N_1, N_2, N_3, ...), as combined in the loop sketch below.
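For steps (22)-(23), OpenCV ships a KCF tracker (in opencv-contrib builds; on some versions it is exposed as cv2.legacy.TrackerKCF_create). A sketch, where frames is the picture sequence from step (21) and init_box is the manually marked B_1 as (x, y, w, h); reusing the last box on a tracking failure is an assumption:

```python
import cv2

def track_portrait(frames, init_box):
    """Steps (22)-(23): initialize KCF on the first frame with the manually
    marked portrait frame B_1, then predict B_{t+1} from B_t frame by frame."""
    tracker = cv2.TrackerKCF_create()  # or cv2.legacy.TrackerKCF_create()
    tracker.init(frames[0], init_box)
    boxes = [init_box]
    for frame in frames[1:]:
        ok, box = tracker.update(frame)
        boxes.append(tuple(int(v) for v in box) if ok else boxes[-1])
    return boxes  # B_1, B_2, ..., B_T
```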
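The adaptive cropping of step (24) might look like this; the expansion factor of 1.5 is an assumed value, since the text only requires a same-proportion expansion that covers the complete portrait:

```python
def adaptive_crop(frame, box, expand=1.5, min_ratio=0.5):
    """Step (24): if width/height or height/width is below 0.5, widen the
    short side until the ratio reaches 0.5, then expand both sides in the
    same proportion and crop the portrait region P_t out of the frame."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    if w / h < min_ratio:
        w = h * min_ratio        # expand the short side (width)
    elif h / w < min_ratio:
        h = w * min_ratio        # expand the short side (height)
    w, h = w * expand, h * expand  # same-proportion expansion
    H, W = frame.shape[:2]
    x0, y0 = max(int(cx - w / 2), 0), max(int(cy - h / 2), 0)
    x1, y1 = min(int(cx + w / 2), W), min(int(cy + h / 2), H)
    return frame[y0:y1, x0:x1], (x0, y0, x1, y1)  # P_t and its crop box
```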
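The post-processing of step (27) maps the 224×224 network output back onto the original frame. A sketch, assuming the letterbox of step (25) placed the content at the top-left corner as in the preprocessing sketch above:

```python
import cv2
import numpy as np

def postprocess(y_t, crop_box, frame_shape, thresh=0.5):
    """Step (27): binarize Y_t at 0.5, keep the largest connected component,
    scale back to the crop size, and pad with 0 to the full frame mask N_t."""
    x0, y0, x1, y1 = crop_box
    pw, ph = x1 - x0, y1 - y0
    s = 224.0 / max(pw, ph)
    content = y_t[:int(round(ph * s)), :int(round(pw * s))]  # undo letterbox padding
    mask = (content > thresh).astype(np.uint8)               # binarize, threshold 0.5
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n > 1:  # label 0 is the background component
        keep = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
        mask = (labels == keep).astype(np.uint8)             # drop mis-segmented regions
    mask = cv2.resize(mask, (pw, ph), interpolation=cv2.INTER_NEAREST)
    n_t = np.zeros(frame_shape[:2], dtype=np.uint8)          # 0 outside the crop
    n_t[y0:y1, x0:x1] = mask
    return n_t
```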
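Putting steps (21)-(28) together with the hypothetical helpers above, a minimal prediction loop could read as follows; model stands for any callable wrapping the trained segmentation network that maps a 224×224×3 image to a 224×224 probability map (for the PyTorch sketch this would include tensor conversion and normalization, omitted here):

```python
def segment_video(frames, init_box, model):
    """Steps (21)-(28): track, crop, preprocess, segment, post-process."""
    boxes = track_portrait(frames, init_box)        # B_1 ... B_T via KCF
    masks = []
    for frame, box in zip(frames, boxes):
        crop, crop_box = adaptive_crop(frame, box)  # portrait region P_t
        i_t = letterbox_224(crop)                   # RGB model input I_t
        y_t = model(i_t)                            # portrait mask output Y_t
        masks.append(postprocess(y_t, crop_box, frame.shape))  # binary mask N_t
    return masks  # the video portrait mask sequence (N_1, N_2, N_3, ...)
```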
The invention adopts the traditional KCF tracking algorithm, which improves the running speed of the algorithm; pictures are used as training data for the segmentation network, and picture data are easier to acquire and label than video data; and a lightweight segmentation network is designed, which improves the running speed of the model and meets the real-time requirement of the mobile terminal. The invention therefore improves the running speed of the whole algorithm, so that the tracking and segmentation algorithm meets the real-time performance requirement on the mobile terminal.

Claims (6)

1. A real-time portrait tracking and segmenting method is characterized by comprising the following steps:
(1) a training stage, in which the parameters of the portrait segmentation model are trained and the segmentation network is trained offline; this stage comprises two parts, data preprocessing and model training;
(2) a prediction stage, in which the picture sequence frames of a video are input into the portrait tracking and segmentation algorithm: a portrait tracking frame is obtained by the KCF tracking algorithm, a portrait region is cropped out according to the tracking frame, the region is preprocessed and input into the segmentation model, and the output is post-processed to obtain the portrait segmentation mask corresponding to the input frame; this process is repeated over the video frames in order until the final video portrait mask sequence is obtained.
2. The real-time portrait tracking and segmenting method as claimed in claim 1, wherein in step (1), the specific operation method is as follows:
(11) collecting different portrait data, and accurately marking a portrait area, wherein the background area is 0, and the portrait area is 1, so as to obtain a corresponding binary portrait mask;
(12) performing data enhancement processing on the training portrait data, then scaling the long edge of each image to 224, scaling the short edge proportionally, and padding the remainder with 0, to obtain an RGB input image I_x of size 224×224×3; the same scaling operation is performed on the corresponding binary portrait mask to obtain a training portrait mask I_y;
(13) MobileNetV2 is used as the encoding module of the segmentation network, and the whole encoding module performs 32× downsampling on the input image to obtain a feature map F; the decoding module recovers details of the feature map F in a U-net-like manner, and after the feature map has been restored to 56×56, a 4× upsampling layer is applied directly to obtain an output portrait mask Y of size 224×224×1;
(14) a cross-entropy loss and a Dice loss are computed between the output portrait mask Y and the training portrait mask I_y to obtain the loss function Loss;
(15) the entire model is iterated using the loss function Loss, updating the model parameters.
3. The real-time portrait tracking and segmenting method as claimed in claim 2, wherein in step (12) the data enhancement processing comprises mirroring, rotation, brightness-contrast transformation, and affine transformation.
4. The real-time portrait tracking and segmenting method as claimed in claim 1, wherein in step (2), the specific operation method is as follows:
(21) the video is split into frames to obtain the picture sequence frames;
(22) the first frame picture is input into the KCF tracking algorithm, and the portrait frame to be tracked is manually marked to initialize the KCF tracking algorithm; this frame is taken as the portrait tracking frame B_1 of the first frame;
(23) suppose the portrait tracking frame of the current frame is B_t, where t denotes the position of the picture frame in the video; the KCF tracking algorithm predicts the portrait tracking frame B_{t+1} of the next frame from the portrait tracking frame B_t of the current frame;
(24) according to the portrait tracking frame B_t obtained in step (23), adaptive cropping is performed on the portrait area to obtain the portrait region P_t;
(25) the portrait region P_t is preprocessed: the long edge of P_t is scaled to 224, the short edge is scaled proportionally, and the remainder is padded with 0, giving the RGB model input I_t of size 224×224×3;
(26) the RGB model input I_t is passed through the portrait segmentation model obtained in step (1) to obtain the portrait mask output Y_t;
(27) the portrait mask output Y_t is post-processed to optimize the result, obtaining the binary portrait mask N_t corresponding to the original image frame;
(28) steps (23) to (27) are repeated until the portrait segmentation of the last frame is completed, yielding all portrait mask frames.
5. The real-time portrait tracking and segmenting method as claimed in claim 4, wherein in step (24) the adaptive cropping comprises: first, the width-to-height ratio and the height-to-width ratio of the portrait tracking frame are checked, and if either ratio is less than 0.5, the range of the short side is expanded so that the ratio reaches 0.5; then, the width and the height of the portrait tracking frame are expanded in the same proportion to obtain the portrait region P_t, ensuring that the whole cropped area contains the complete portrait.
6. The real-time portrait tracking and segmenting method as claimed in claim 4, wherein in step (27) post-processing the portrait mask output Y_t specifically comprises: first, binarization with a threshold of 0.5 is performed to obtain a binarized portrait mask; second, connected-component analysis is performed on the portrait mask, mis-segmented regions are removed, and the largest portrait region is retained; the result is then scaled back to the size of the portrait region P_t in step (24); finally, according to the cropping information of the portrait region P_t, the mask is padded with 0 to obtain the binary portrait mask N_t corresponding to the original image frame.
CN202010200344.2A 2020-03-20 2020-03-20 Real-time portrait tracking and segmenting method Pending CN111414860A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010200344.2A CN111414860A (en) 2020-03-20 2020-03-20 Real-time portrait tracking and segmenting method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010200344.2A CN111414860A (en) 2020-03-20 2020-03-20 Real-time portrait tracking and segmenting method

Publications (1)

Publication Number Publication Date
CN111414860A (en) 2020-07-14

Family

ID=71493140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010200344.2A Pending CN111414860A (en) 2020-03-20 2020-03-20 Real-time portrait tracking and segmenting method

Country Status (1)

Country Link
CN (1) CN111414860A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062761A (en) * 2017-12-25 2018-05-22 北京奇虎科技有限公司 Image partition method, device and computing device based on adaptive tracing frame
CN109977981A (en) * 2017-12-27 2019-07-05 深圳市优必选科技有限公司 Scene analysis method based on binocular vision, robot and storage device
CN110399847A (en) * 2019-07-30 2019-11-01 北京字节跳动网络技术有限公司 Extraction method of key frame, device and electronic equipment
CN110490858A (en) * 2019-08-21 2019-11-22 西安工程大学 A kind of fabric defect Pixel-level classification method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fanyi Xiao et al., "Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals", IEEE *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932546A (en) * 2020-08-20 2020-11-13 展讯通信(上海)有限公司 Image segmentation model training method, image segmentation method, device, equipment and medium
CN112487974A (en) * 2020-11-30 2021-03-12 叠境数字科技(上海)有限公司 Video stream multi-person segmentation method, system, chip and medium
CN112529914A (en) * 2020-12-18 2021-03-19 北京中科深智科技有限公司 Real-time hair segmentation method and system
CN112529914B (en) * 2020-12-18 2021-08-13 北京中科深智科技有限公司 Real-time hair segmentation method and system
CN113516672A (en) * 2021-09-07 2021-10-19 北京美摄网络科技有限公司 Image segmentation method and device, electronic equipment and readable storage medium
CN113516672B (en) * 2021-09-07 2022-02-25 北京美摄网络科技有限公司 Image segmentation method and device, electronic equipment and readable storage medium
CN114041767A (en) * 2021-10-11 2022-02-15 宁波春建电子科技有限公司 Heart rate detection method based on depth camera and millimeter wave radar

Similar Documents

Publication Publication Date Title
CN111414860A (en) Real-time portrait tracking and segmenting method
Kumar et al. Image smog restoration using oblique gradient profile prior and energy minimization
CN106709964B (en) Sketch generation method and device based on gradient correction and multidirectional texture extraction
CN110969589A (en) Dynamic scene fuzzy image blind restoration method based on multi-stream attention countermeasure network
CN110796662B (en) Real-time semantic video segmentation method
CN116797488A (en) Low-illumination image enhancement method based on feature fusion and attention embedding
CN108989731B (en) Method for improving video spatial resolution
CN108986185B (en) Image data amplification method based on deep learning
CN112233129A (en) Deep learning-based parallel multi-scale attention mechanism semantic segmentation method and device
CN115393396B (en) Unmanned aerial vehicle target tracking method based on mask pre-training
CN112767280B (en) Single image raindrop removing method based on loop iteration mechanism
CN112184585A (en) Image completion method and system based on semantic edge fusion
CN110992374A (en) Hair refined segmentation method and system based on deep learning
CN113902925A (en) Semantic segmentation method and system based on deep convolutional neural network
CN111652231B (en) Casting defect semantic segmentation method based on feature self-adaptive selection
CN114627269A (en) Virtual reality security protection monitoring platform based on degree of depth learning target detection
CN113962905A (en) Single image rain removing method based on multi-stage feature complementary network
CN111507279B (en) Palm print recognition method based on UNet + + network
CN116895037A (en) Frame insertion method and system based on edge information and multi-scale cross fusion network
CN115457448B (en) Intelligent extraction system for video key frames
CN108550119B (en) Image denoising method combined with edge information
CN112200751B (en) Image enhancement method
CN112733714B (en) VGG network-based automatic crowd counting image recognition method
CN114627139A (en) Unsupervised image segmentation method, unsupervised image segmentation device and unsupervised image segmentation equipment based on pixel feature learning
CN115393491A (en) Ink video generation method and device based on instance segmentation and reference frame

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200714)