CN111798456A - Instance segmentation model training method and device and instance segmentation method - Google Patents

Instance segmentation model training method and device and instance segmentation method

Info

Publication number
CN111798456A
CN111798456A CN202010454014.6A CN202010454014A CN111798456A CN 111798456 A CN111798456 A CN 111798456A CN 202010454014 A CN202010454014 A CN 202010454014A CN 111798456 A CN111798456 A CN 111798456A
Authority
CN
China
Prior art keywords
training
model
training set
detected
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010454014.6A
Other languages
Chinese (zh)
Inventor
卢运西
徐兆坤
黄银君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Cloud Computing Co Ltd
Original Assignee
Suning Cloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Cloud Computing Co Ltd filed Critical Suning Cloud Computing Co Ltd
Priority to CN202010454014.6A priority Critical patent/CN111798456A/en
Publication of CN111798456A publication Critical patent/CN111798456A/en
Priority to PCT/CN2021/095363 priority patent/WO2021238826A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a training method and device for an instance segmentation model, and an instance segmentation method. The method comprises the following steps: pruning a pre-constructed deep learning model; acquiring a training set and labeling it, wherein the training set is a set of RGBD images, acquired by different depth cameras, of a scene containing target objects, and each RGBD image comprises a depth map and a color map; and training the pruned deep learning model with the labeled training set to obtain an instance segmentation model. By pruning the network structure of an existing instance segmentation model, the whole model is made lighter, which improves both the training speed and the prediction speed of the model. Meanwhile, to prevent the prediction accuracy of the model from dropping because of the reduced number of network layers, the depth map is added as input, expanding the number of channels and improving the training accuracy and prediction accuracy of the model.

Description

Instance segmentation model training method and device and instance segmentation method
Technical Field
The invention belongs to the field of target detection, and particularly relates to a training method and device of an instance segmentation model and an instance segmentation method.
Background
With the continuous improvement of technology, techniques in the field of artificial intelligence have matured and been put into practical use, greatly improving people's quality of life. Nowadays, image and video acquisition systems are installed in many scenes; running state-of-the-art artificial intelligence techniques on these systems can greatly improve their understanding of image and video content, providing intelligent monitoring capability for scenes such as offline unmanned stores, security systems and public places. However, existing instance segmentation models use a large number of network layers to extract image features, so the whole training process is slow when the amount of data is large. Moreover, existing instance segmentation models are generally trained only on color maps, and the prediction accuracy of a model trained only on color maps is often insufficient in scenes such as offline unmanned stores, security systems and public places. Therefore, an efficient and fast deep learning segmentation algorithm providing the relevant technical capability is urgently needed.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a training method and device for an instance segmentation model, and an instance segmentation method.
The embodiment of the invention provides the following specific technical scheme:
In a first aspect, a method for training an instance segmentation model is disclosed, the method comprising:
pruning a pre-constructed deep learning model;
acquiring a training set and marking the training set, wherein the training set is a set of RGBD images with target objects in a scene acquired by different depth cameras, and the RGBD images comprise a depth image and a color image;
and training the pruned deep learning model by using the labeled training set to obtain an instance segmentation model.
Preferably, the method further comprises: preprocessing the training set before labeling, specifically comprising:
performing three-dimensional reconstruction according to the acquired depth map in the training set to obtain a first modeling result, and simultaneously performing three-dimensional reconstruction according to the acquired depth map in the RGBD image which is acquired by different depth cameras and does not have any target object in the scene corresponding to the training set to obtain a second modeling result;
according to the second modeling result, performing background removal processing on the first modeling result to obtain a foreground image containing a target object;
determining a truncation distance corresponding to each depth camera according to the foreground image containing the target object, wherein the truncation distance is a moving distance from the target object to the depth camera;
performing truncation processing on the depth map in each training set according to the truncation distance corresponding to each depth camera, and performing normalization processing on the depth map after the truncation processing;
and carrying out normalization processing on the color image in the training set.
Preferably, the labeling of the training set specifically includes:
calculating the integrity of each target object in the RGBD image with the target object;
and when the integrity of all the target objects is greater than a first preset value, labeling the target objects in the RGBD image with the target objects to obtain a target object detection frame, and generating corresponding labels.
Preferably, the training the pruned deep learning model by using the labeled training set to obtain the instance segmentation model specifically includes:
extracting features of the marked training set, and fusing the extracted features to obtain a feature region;
performing segmentation processing on the feature region to obtain a segmentation result of the feature region, and performing regression and classification processing on the feature region to obtain a detection frame of the feature region, a classification result corresponding to the detection frame, and an instance score associated with the segmentation result;
multiplying the segmentation result and the corresponding example score to obtain an example segmentation result;
utilizing the corresponding target object detection frame to perform truncation processing on the example segmentation result, calculating an error between the truncated example segmentation result and the corresponding label, and meanwhile calculating an error between the detection frame and the corresponding target object detection frame;
calculating a total loss value according to an error between the example segmentation result after the truncation processing and the corresponding label and an error between the detection frame and the corresponding target object detection frame;
and judging the total loss value, stopping training the deep learning model when the total loss value is smaller than a second preset value, and determining the corresponding deep learning model as the example segmentation model when the total loss value is smaller than the second preset value.
Preferably, the regression processing on the feature region specifically includes:
predicting the central point of the feature region, and calculating the width and the height of the feature region according to the central point to generate the detection frame;
the method further comprises the following steps:
and when the number of the generated detection frames is more than one, performing maximum pooling on each generated detection frame and storing the detection frames after the maximum pooling meeting the first preset condition.
Preferably, the training of the pruned deep learning model by using the labeled training set specifically further comprises:
and training the deep learning model according to the learning rate corresponding to the current total loss value.
Preferably, pruning the pre-constructed deep learning model specifically includes:
obtaining an influence factor corresponding to a network layer to be pruned in the deep learning model, wherein the influence factor is a scaling factor obtained by performing normalization calculation on the network layer to be pruned;
and when the influence factor is smaller than a third preset value, pruning the network layer corresponding to the influence factor.
In a second aspect, a method for instance segmentation is disclosed, the method comprising:
acquiring a picture to be detected;
inputting the picture to be detected into a pre-trained example segmentation model for identification, and outputting a detection frame and an example segmentation result of the picture to be detected;
wherein the pre-trained instance segmentation model is obtained by training based on the method of the first aspect.
Preferably, before the picture to be detected is input to a pre-trained example segmentation model for recognition, the method further includes:
acquiring the number of the pictures to be detected;
when the number of the pictures to be detected is more than one, splicing the pictures to be detected;
the inputting the picture to be detected into a pre-trained example segmentation model for identification, and outputting the detection frame and the example segmentation result of the picture to be detected specifically comprises:
inputting the spliced pictures to be detected into the example segmentation model for identification, and outputting detection frames and example segmentation results of all the pictures to be detected;
the method further comprises the following steps:
and splitting the detection frames and the example segmentation results of all the pictures to be detected to obtain the detection frames and the example segmentation results corresponding to each picture to be detected.
In a third aspect, an apparatus for training an instance segmentation model is disclosed, the apparatus comprising:
the pruning module is used for pruning the pre-constructed deep learning model;
the acquisition module is used for acquiring a training set; the training set is a set of RGBD images with target objects in a scene collected by different depth cameras, and the RGBD images comprise a depth map and a color map;
the preprocessing module is used for labeling the training set;
and the training module is used for training the pruned deep learning model by utilizing the labeled training set to obtain an example segmentation model.
The embodiment of the invention has the following beneficial effects:
1. According to the invention, the deep learning model is pruned so that the network structure is lighter and both training and prediction are fast; meanwhile, a depth map is added when training the deep learning network, expanding the number of channels and improving both training accuracy and prediction accuracy;
2. When training the deep learning model, the depth maps in the training data are truncated and normalized and the color maps are normalized, which improves the accuracy of the training data and therefore the training accuracy of the model;
3. The training data are labeled with a dedicated labeling strategy and data with low integrity are discarded, which improves the effectiveness of the labeled training data and therefore the training accuracy of the model;
4. In the instance segmentation model of the present application, the center point of a target object is predicted with an anchor-free method and the width and height are then obtained by regression to produce the detection frame; the detection frames are de-duplicated by maximum pooling, which improves detection robustness in crowded scenes and effectively avoids missing detection frames when people are densely packed;
5. When the model is used for prediction, the input data are spliced and the output results are split, which enables parallel processing, improves execution efficiency and makes efficient use of computing resources, making the method better suited to application scenarios involving video monitoring.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of a training method of an example segmentation model provided in embodiment 1 of the present application;
FIG. 2 is a block diagram of an example segmentation model provided in embodiment 1 of the present application;
FIG. 3 is a flow chart of an example segmentation method provided in embodiment 2 of the present application;
fig. 4 is a schematic structural diagram of a training apparatus for an example segmentation model provided in embodiment 3 of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As described in the background art, existing instance segmentation network algorithms have high complexity, and their real-time performance is difficult to guarantee. A more efficient and faster segmentation algorithm is therefore needed to better distinguish different individuals in videos and images and to obtain richer information about human bodies, so that target objects can be better identified in scenes such as unmanned stores, security systems and public places. Based on this, the present application provides a training method for an instance segmentation model, which yields a model that is lighter, faster in training and prediction, and more accurate.
Example 1
As shown in fig. 1, a training method of an example segmentation model includes the following steps:
s11, constructing a deep learning model;
the method builds a basic network structure based on a YOLACT model, and builds a deep learning model. The deep learning model comprises: the model has a specific structure as shown in fig. 2, and includes a content 18 network, an FPN network, two network branches (protonet and Pred _ heads) connected to the FPN network, a crop network, and the like. The content 18 network is used for extracting features, the FPN network is used for fusing the features, the protonet network branch is used for segmenting the characteristic diagram to obtain segmentation results including a foreground and a background, and the Pred _ headers are used for predicting the characteristic diagram to obtain a detection frame, a category and a confidence coefficient of a target object and an example segmentation score associated with the prediction results of the protonet network branch.
S12, pruning the deep learning model;
In order to make the network lighter, so that both training and prediction with the network are fast, certain network layers are selected for pruning.
In the application, a network layer is pruned by a coarse-grained method, and the method comprises the following specific steps:
1. acquiring an influence factor corresponding to a network layer to be pruned in the deep learning model, wherein the influence factor is a scaling factor obtained by carrying out normalization calculation on the network layer to be pruned;
wherein, the network layer to be pruned is a convolutional layer.
2. And when the influence factor is smaller than a preset value, pruning the network layer corresponding to the influence factor.
Specifically, a batch normalization layer is added after each convolution layer (i.e., after each layer of the ResNet-18 network), and the normalization is computed for each convolution layer. The calculation formula of the batch normalization layer includes a parameter gamma, which is a scaling factor; when gamma is smaller than a preset value, the corresponding channel is less important, and the corresponding network layer can therefore be pruned. In addition, a regularization term on gamma can be added to the training objective, so that automatic pruning is realized during model training.
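The following is a minimal Python/PyTorch sketch of this scaling-factor pruning scheme, in the spirit of the network-slimming approach cited in the non-patent literature of this application; the threshold and the regularization weight are illustrative assumptions.

import torch.nn as nn

def sparsity_penalty(model, lam=1e-4):
    # L1 regularization on the batch-norm scaling factors (gamma); adding this term
    # to the training loss drives the gamma of unimportant channels toward zero.
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))

def channels_to_prune(model, threshold=0.01):
    # Collect, per batch normalization layer, the channels whose |gamma| is smaller
    # than the preset value; the corresponding convolution channels can be pruned.
    plan = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            mask = m.weight.detach().abs() < threshold
            if mask.any():
                plan[name] = mask.nonzero(as_tuple=True)[0].tolist()
    return plan

# During training (sketch): loss = task_loss + sparsity_penalty(model)
# After (or periodically during) training: prune_plan = channels_to_prune(model)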
S13, acquiring and labeling a training set, wherein the training set is a set of RGBD images with target objects in a scene acquired by different depth cameras, and the RGBD images comprise a depth map and a color map;
the purpose of labeling is to preprocess the RGBD image, so that a target object detection frame and a label can be obtained.
The process of labeling the training set specifically includes:
s131, calculating the integrity of each target object in the RGBD image with the target object;
and S132, when the integrity of all the target objects is larger than a preset value, labeling the target objects in the RGBD image with the target objects to obtain a target object detection frame, and generating corresponding labels.
For example, the preset value may be 1/2, and therefore, when the integrity of the target object is greater than 1/2, the target object in the RGBD image with the target object is labeled and a corresponding label is generated. By the special marking strategy, the effectiveness of training data can be improved, and the precision of subsequent model training and prediction can be improved.
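A small Python sketch of this labeling strategy is given below; how integrity is actually measured is not specified by the patent, so the "visible area over full object area" interpretation, the field names and the 1/2 default are assumptions.

def label_training_image(objects, min_integrity=0.5):
    # 'objects' is assumed to be a list of dicts, each with a precomputed
    # 'integrity' in [0, 1] (e.g. visible area / full object area), a 'bbox'
    # (x1, y1, x2, y2) and a 'class_name'.
    if not objects:
        return None
    # S131/S132: only label the image when every target object is complete enough;
    # low-integrity samples are discarded to keep the training data effective.
    if any(obj["integrity"] <= min_integrity for obj in objects):
        return None
    return [{"bbox": obj["bbox"], "label": obj["class_name"]} for obj in objects]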
In order to further improve the accuracy of model training and prediction, the training set may be further processed, specifically including:
1. performing three-dimensional reconstruction according to the acquired depth map in the training set to obtain a first modeling result, and simultaneously performing three-dimensional reconstruction according to the acquired depth map in the RGBD image which is acquired by different depth cameras and does not have any target object in the scene corresponding to the training set to obtain a second modeling result;
specifically, the three-dimensional model including the target object and the three-dimensional model not including any target object are constructed by jointly calibrating the depth maps acquired by different depth cameras.
2. According to the second modeling result, performing background removal processing on the first modeling result to obtain a foreground image containing the target object;
3. determining a truncation distance corresponding to each depth camera according to a foreground image containing a target object, wherein the truncation distance is a moving distance from the target object to the depth camera;
since the moving range of the target object under each depth camera is not fixed, the truncation distance may be a dynamic range.
4. Carrying out truncation processing on the depth map in each training set through the truncation distance corresponding to each depth camera, and carrying out normalization processing on the depth map subjected to truncation processing;
by performing truncation processing on the depth map by using the truncation distance, some noise in the depth map can be filtered.
5. And carrying out normalization processing on the color map in the training set.
Through the processing process, the accuracy of the training data is improved, and therefore the training precision of the model is improved.
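A hedged Python sketch of the per-image part of steps 1-5 follows, covering only the truncation and normalization (the joint calibration and three-dimensional reconstruction are outside its scope); scaling to [0, 1] is an assumption, since the patent only states that the maps are normalized.

import numpy as np

def preprocess_rgbd(depth, color, near, far):
    # Truncate the depth map to the camera-specific truncation range [near, far],
    # which filters noise outside the range the target object moves in, then
    # normalize the truncated depth map and the color map.
    depth = np.clip(depth.astype(np.float32), near, far)
    depth = (depth - near) / max(far - near, 1e-6)
    color = color.astype(np.float32) / 255.0
    # Stack into a 4-channel RGBD input (color map + depth map).
    return np.concatenate([color, depth[..., None]], axis=-1)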
And S14, training the pruned deep learning model by using the labeled training set to obtain an example segmentation model.
Step S14 specifically includes:
s141, extracting features of the marked training set, and fusing the extracted features to obtain a feature region;
Specifically, the labeled training set is input into the ResNet-18 network, which includes a plurality of convolution layers that extract features from the training set at multiple scales. After feature extraction is completed, the multi-scale features are input into the FPN network to obtain feature regions.
The ResNet-18 network outputs two kinds of features: low-level features, which carry little semantic information but locate targets accurately, and high-level features, which carry rich semantic information but locate targets only coarsely. The FPN network is a feature pyramid network that fuses the two kinds of features, alleviating the multi-scale problem and improving target detection performance.
S142, carrying out segmentation processing on the feature region to obtain a segmentation result of the feature region, and carrying out regression and classification processing on the feature region to obtain a detection frame of the feature region, a classification result and confidence corresponding to the detection frame and an instance score associated with the segmentation result;
the method comprises the steps of connecting two branches of an FPN network, inputting characteristic regions into the two network branches (protonet and Pred _ heads) respectively, wherein the protonet network branches are used for segmenting characteristic regions to obtain segmentation results including foreground and background, and the Pred _ heads are used for predicting the characteristic regions to obtain detection frames, categories and confidence degrees of target objects and example scores related to the segmentation results.
S143, multiplying the segmentation result and the corresponding example score to obtain an example segmentation result;
s144, utilizing the corresponding target object detection frame to perform truncation processing on the example segmentation result, calculating an error between the truncated example segmentation result and the corresponding label, and meanwhile calculating an error between the detection frame and the corresponding target object detection frame;
s145, calculating a total loss value according to the error between the example segmentation result after the truncation processing and the corresponding label and the error between the detection frame and the corresponding target object detection frame;
in the scheme, the total loss value is the sum of the error between the example segmentation result after the truncation processing and the corresponding label and the error between the detection frame and the corresponding target object detection frame.
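An illustrative Python/PyTorch sketch of steps S143 to S145 (combining prototypes with the instance scores, truncating with the labeled detection frame, and summing the two errors) is given below; binary cross-entropy for the mask error and smooth-L1 for the box error are assumptions, as the patent only speaks of errors.

import torch
import torch.nn.functional as F

def training_losses(prototypes, coefs, pred_boxes, gt_masks, gt_boxes):
    # prototypes: (P, H, W) prototype masks for one image; coefs: (N, P) instance scores;
    # pred_boxes / gt_boxes: (N, 4); gt_masks: (N, H, W) binary label masks.
    # S143: instance segmentation results = prototypes weighted by the instance scores.
    masks = torch.sigmoid(torch.einsum("phw,np->nhw", prototypes, coefs))
    mask_err = 0.0
    for mask, gt_mask, box in zip(masks, gt_masks, gt_boxes):
        x1, y1, x2, y2 = (int(v) for v in box)
        # S144: truncate the instance segmentation result with the labeled detection frame
        # and compute the error against the corresponding label.
        mask_err = mask_err + F.binary_cross_entropy(mask[y1:y2, x1:x2], gt_mask[y1:y2, x1:x2])
    box_err = F.smooth_l1_loss(pred_boxes, gt_boxes)   # error between predicted and labeled boxes
    return mask_err + box_err                          # S145: total loss = sum of the two errors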
And S146, judging the total loss value, stopping training the deep learning model when the total loss value is smaller than a second preset value, and determining the corresponding deep learning model as an example segmentation model when the total loss value is smaller than the second preset value.
When the total loss value is less than the preset value, the whole model has converged, and training can be stopped at this point.
In addition, in the process of training the model, the deep learning model is optimally trained by using a gradient descent algorithm, and in order to improve the convergence rate of the model, corresponding learning rates can be set for loss values in different stages, and the method specifically comprises the following steps:
and training the deep learning model according to the learning rate corresponding to the current loss value.
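A minimal illustration of such a loss-dependent learning-rate schedule follows; both the thresholds and the learning rates are assumptions, since the patent does not specify the mapping.

def learning_rate_for_loss(total_loss, schedule=((1.0, 1e-3), (0.5, 1e-4), (0.0, 1e-5))):
    # Return the learning rate associated with the stage the current total loss falls in.
    for threshold, lr in schedule:
        if total_loss >= threshold:
            return lr
    return schedule[-1][1]

# Applied to a PyTorch optimizer (sketch):
# for group in optimizer.param_groups:
#     group["lr"] = learning_rate_for_loss(total_loss.item())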
After the model training is completed, the model may be verified to ensure the prediction accuracy of the model, and specifically, the method may include the following implementation steps:
1. acquiring a verification set and marking the verification set, wherein the verification set is a set of RGBD images with target objects acquired by different depth cameras, and the RGBD images comprise a depth image and a color image;
2. inputting the marked verification set into an example segmentation model to obtain an output result;
The output result can be produced at a fixed interval, for example once every 5 iterations of the model, which keeps the model verification process reasonable and efficient.
3. And comparing the output result with the real result to verify the example segmentation model.
Example 2
Based on the example segmentation model obtained by training in the above embodiment 1, an embodiment of the present invention further provides an example segmentation method, as shown in fig. 3, the method includes:
s31, acquiring a picture to be detected;
and S32, inputting the picture to be detected into a pre-trained example segmentation model for identification, and outputting a detection frame and an example segmentation result of the picture to be detected.
The identification process of the picture to be detected may refer to the model training process in embodiment 1. Before the detection frame and instance segmentation result of the picture to be detected are output, the confidence needs to be compared with a preset value; with reference to fig. 2, after this comparison the Crop module outputs only the detection frames whose confidence is higher than the preset value, together with the corresponding instance segmentation results.
The pre-trained example segmentation model is obtained by training based on the method described in embodiment 1.
In order to improve the prediction speed of different pictures, the scheme further comprises the following steps:
s41, before the pictures to be detected are input to a pre-trained example segmentation model for identification, the number of the pictures to be detected is obtained;
s42, when the number of the pictures to be detected is more than one, splicing the pictures to be detected;
s43, inputting the spliced pictures to be detected into an example segmentation model for identification, and outputting detection frames and example segmentation results of all the pictures to be detected;
and S44, splitting the detection frames and the example segmentation results of all the pictures to be detected to obtain the detection frames and the example segmentation results corresponding to each picture to be detected.
Based on the processing process (splicing the pictures to be detected before prediction and splitting the prediction result after prediction), a plurality of pictures can be predicted at the same time, so that the parallelization capability of model prediction is greatly improved, the efficient utilization of computing resources is improved, and the method is more suitable for the application scene related to video monitoring.
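A minimal Python/PyTorch sketch of steps S41 to S44 follows; stacking the pictures along the batch dimension is one possible reading of "splicing" and is an assumption, as is the (boxes, masks) output signature of the model.

import torch

def predict_many(model, images):
    # S42: splice the pictures to be detected into a single input.
    batch = torch.stack(images, dim=0)
    with torch.no_grad():
        boxes, masks = model(batch)        # S43: one parallel forward pass for all pictures
    # S44: split the joint outputs so that each picture gets its own detection frames
    # and instance segmentation results.
    return [(boxes[i], masks[i]) for i in range(len(images))]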
Example 3
Based on the foregoing embodiment 1, an embodiment of the present invention further provides a training apparatus for an example segmentation model, as shown in fig. 4, the apparatus includes:
a pruning module 41, configured to prune the pre-constructed deep learning model;
an obtaining module 42, configured to obtain a training set; the training set is a set of RGBD images with target objects in a scene collected by different depth cameras, and the RGBD images comprise a depth map and a color map;
a preprocessing module 43, configured to label the training set;
and the training module 44 is configured to train the pruned deep learning model by using the labeled training set to obtain an instance segmentation model.
Further, the preprocessing module 43 is further configured to preprocess the training set before labeling, and specifically includes:
performing three-dimensional reconstruction according to the acquired depth map in the training set to obtain a first modeling result, and simultaneously performing three-dimensional reconstruction according to the acquired depth map in the RGBD image which is acquired by different depth cameras and does not have any target object in the scene corresponding to the training set to obtain a second modeling result;
according to the second modeling result, performing background removal processing on the first modeling result to obtain a foreground image containing the target object;
determining a truncation distance corresponding to each depth camera according to a foreground image containing a target object, wherein the truncation distance is a moving distance from the target object to the depth camera;
carrying out truncation processing on the depth map in the training set through the truncation distance corresponding to each depth camera, and carrying out normalization processing on the depth map after the truncation processing;
and carrying out normalization processing on the color map in the training set.
Further, the preprocessing module 43 is specifically configured to:
calculating the integrity of each target object in the RGBD image with the target object;
and when the integrity of all the target objects is greater than a first preset value, labeling the target objects in the RGBD image with the target objects to obtain a target object detection frame, and generating corresponding labels.
Further, the training module 44 specifically includes:
the feature extraction and fusion module 441 is configured to perform feature extraction on the labeled training set, and fuse features obtained after extraction to obtain a feature region;
a prediction module 442, configured to perform segmentation processing on the feature region to obtain a segmentation result about the feature region, and perform regression and classification processing on the feature region to obtain a detection frame about the feature region, a classification result corresponding to the detection frame, and an instance score associated with the segmentation result;
a processing module 443, configured to multiply the segmentation result and the corresponding instance score to obtain an instance segmentation result;
the processing module 443 is further configured to perform truncation processing on the instance segmentation result by using the corresponding target object detection box;
a calculating module 444, configured to calculate an error between the truncated instance segmentation result and the corresponding tag, and calculate an error between the detection frame and the corresponding target object detection frame;
the calculating module 444 is further configured to calculate a total loss value according to an error between the truncated instance segmentation result and the corresponding tag, and an error between the detection frame and the corresponding target object detection frame;
the determining module 445 is configured to determine the total loss value, stop training the deep learning model when the total loss value is smaller than a second preset value, and determine the corresponding deep learning model as the example segmentation model when the total loss value is smaller than the second preset value.
Further, the prediction module 442 is specifically configured to:
predicting the central point of the characteristic region, and calculating the width and the height of the characteristic region according to the central point to generate a detection frame;
the prediction module 442 is further configured to:
and when the number of the generated detection frames is more than one, performing maximum pooling on each generated detection frame and storing the detection frames after the maximum pooling meeting the first preset condition.
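As a sketch of how the prediction module's anchor-free decoding and maximum-pooling de-duplication might look in Python/PyTorch; the CenterNet-style heatmap and width/height heads, the 3x3 pooling window and the score threshold used as the "first preset condition" are all assumptions consistent with, but not mandated by, the text.

import torch
import torch.nn.functional as F

def decode_centers(heatmap, wh, score_thresh=0.3):
    # heatmap: (1, 1, H, W) predicted center-point scores; wh: (1, 2, H, W) width/height.
    # Maximum pooling keeps only local maxima, de-duplicating overlapping detections.
    pooled = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    keep = (pooled == heatmap) & (heatmap > score_thresh)
    ys, xs = torch.nonzero(keep[0, 0], as_tuple=True)
    boxes = []
    for y, x in zip(ys.tolist(), xs.tolist()):
        # Regress width and height at each retained center point to form the detection frame.
        w, h = wh[0, 0, y, x].item(), wh[0, 1, y, x].item()
        boxes.append((x - w / 2, y - h / 2, x + w / 2, y + h / 2))
    return boxes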
Further, the training module 44 is further configured to: and training the deep learning model according to the learning rate corresponding to the current total loss value.
Further, the pruning module 41 is specifically configured to:
acquiring an influence factor corresponding to a network layer to be pruned in the deep learning model, wherein the influence factor is a scaling factor obtained by carrying out normalization calculation on the network layer to be pruned;
and when the influence factor is smaller than a third preset value, pruning the network layer corresponding to the influence factor.
It should be noted that the training apparatus for the instance segmentation model provided in this embodiment is only illustrated with the above division of functional modules; in practical applications, the functions may be distributed to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the training apparatus for the instance segmentation model of this embodiment and the training method embodiment of the instance segmentation model in embodiment 1 belong to the same concept; their specific implementation processes and beneficial effects are described in detail in the training method embodiment and are not repeated herein.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples only show some embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for training an instance segmentation model, the method comprising:
pruning a pre-constructed deep learning model;
acquiring a training set and marking the training set, wherein the training set is a set of RGBD images with target objects in a scene acquired by different depth cameras, and the RGBD images comprise a depth image and a color image;
and training the pruned deep learning model by using the labeled training set to obtain an instance segmentation model.
2. The method of claim 1, further comprising: preprocessing the training set before labeling, specifically comprising:
performing three-dimensional reconstruction according to the acquired depth map in the training set to obtain a first modeling result, and simultaneously performing three-dimensional reconstruction according to the acquired depth map in the RGBD image which is acquired by different depth cameras and does not have any target object in the scene corresponding to the training set to obtain a second modeling result;
according to the second modeling result, performing background removal processing on the first modeling result to obtain a foreground image containing a target object;
determining a truncation distance corresponding to each depth camera according to the foreground image containing the target object, wherein the truncation distance is a moving distance from the target object to the depth camera;
carrying out truncation processing on the depth map in the training set through the truncation distance corresponding to each depth camera, and carrying out normalization processing on the depth map after the truncation processing;
and carrying out normalization processing on the color image in the training set.
3. The method of claim 1, wherein labeling the training set specifically comprises:
calculating the integrity of each target object in the RGBD image with the target object;
and when the integrity of all the target objects is greater than a first preset value, labeling the target objects in the RGBD image with the target objects to obtain a target object detection frame, and generating corresponding labels.
4. The method of claim 3, wherein training the pruned deep learning model with the labeled training set to obtain the instance segmentation model specifically comprises:
extracting features of the marked training set, and fusing the extracted features to obtain a feature region;
performing segmentation processing on the feature region to obtain a segmentation result of the feature region, and performing regression and classification processing on the feature region to obtain a detection frame of the feature region, a classification result corresponding to the detection frame, and an instance score associated with the segmentation result;
multiplying the segmentation result and the corresponding example score to obtain an example segmentation result;
utilizing the corresponding target object detection frame to perform truncation processing on the example segmentation result, calculating an error between the truncated example segmentation result and the corresponding label, and meanwhile calculating an error between the detection frame and the corresponding target object detection frame;
calculating a total loss value according to an error between the example segmentation result after the truncation processing and the corresponding label and an error between the detection frame and the corresponding target object detection frame;
and judging the total loss value, stopping training the deep learning model when the total loss value is smaller than a second preset value, and determining the corresponding deep learning model as the example segmentation model when the total loss value is smaller than the second preset value.
5. The method according to claim 4, wherein performing regression processing on the feature region specifically comprises:
predicting the central point of the feature region, and calculating the width and the height of the feature region according to the central point to generate the detection frame;
the method further comprises the following steps:
and when the number of the generated detection frames is more than one, performing maximum pooling on each generated detection frame and storing the detection frames after the maximum pooling meeting the first preset condition.
6. The method of claim 4, wherein training the pruned deep learning model using the labeled training set further comprises:
and training the deep learning model according to the learning rate corresponding to the current total loss value.
7. The method according to any one of claims 1 to 6, wherein pruning the pre-constructed deep learning model specifically comprises:
obtaining an influence factor corresponding to a network layer to be pruned in the deep learning model, wherein the influence factor is a scaling factor obtained by performing normalization calculation on the network layer to be pruned;
and when the influence factor is smaller than a third preset value, pruning the network layer corresponding to the influence factor.
8. An instance segmentation method, the method comprising:
acquiring a picture to be detected;
inputting the picture to be detected into a pre-trained example segmentation model for identification, and outputting a detection frame and an example segmentation result of the picture to be detected;
the pre-trained example segmentation model is obtained by training based on the method of any one of claims 1-7.
9. The method according to claim 8, wherein before inputting the picture to be detected into a pre-trained instance segmentation model for recognition, the method further comprises:
acquiring the number of the pictures to be detected;
when the number of the pictures to be detected is more than one, splicing the pictures to be detected;
the inputting the picture to be detected into a pre-trained example segmentation model for identification, and outputting the detection frame and the example segmentation result of the picture to be detected specifically comprises:
inputting the spliced pictures to be detected into the example segmentation model for identification, and outputting detection frames and example segmentation results of all the pictures to be detected;
the method further comprises the following steps:
and splitting the detection frames and the example segmentation results of all the pictures to be detected to obtain the detection frames and the example segmentation results corresponding to each picture to be detected.
10. An apparatus for training an instance segmentation model, the apparatus comprising:
the pruning module is used for pruning the pre-constructed deep learning model;
the acquisition module is used for acquiring a training set; the training set is a set of RGBD images with target objects in a scene collected by different depth cameras, and the RGBD images comprise a depth map and a color map;
the preprocessing module is used for labeling the training set;
and the training module is used for training the pruned deep learning model by utilizing the labeled training set to obtain an example segmentation model.
CN202010454014.6A 2020-05-26 2020-05-26 Instance segmentation model training method and device and instance segmentation method Pending CN111798456A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010454014.6A CN111798456A (en) 2020-05-26 2020-05-26 Instance segmentation model training method and device and instance segmentation method
PCT/CN2021/095363 WO2021238826A1 (en) 2020-05-26 2021-05-24 Method and apparatus for training instance segmentation model, and instance segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010454014.6A CN111798456A (en) 2020-05-26 2020-05-26 Instance segmentation model training method and device and instance segmentation method

Publications (1)

Publication Number Publication Date
CN111798456A true CN111798456A (en) 2020-10-20

Family

ID=72806274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010454014.6A Pending CN111798456A (en) 2020-05-26 2020-05-26 Instance segmentation model training method and device and instance segmentation method

Country Status (2)

Country Link
CN (1) CN111798456A (en)
WO (1) WO2021238826A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330709A (en) * 2020-10-29 2021-02-05 奥比中光科技集团股份有限公司 Foreground image extraction method and device, readable storage medium and terminal equipment
CN113139983A (en) * 2021-05-17 2021-07-20 北京华捷艾米科技有限公司 Human image segmentation method and device based on RGBD
WO2021238826A1 (en) * 2020-05-26 2021-12-02 苏宁易购集团股份有限公司 Method and apparatus for training instance segmentation model, and instance segmentation method
CN113781500A (en) * 2021-09-10 2021-12-10 中国科学院自动化研究所 Method and device for segmenting cabin segment image instance, electronic equipment and storage medium
CN116721342A (en) * 2023-06-05 2023-09-08 淮阴工学院 Hybrid rice quality recognition device based on deep learning

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114155427A (en) * 2021-12-17 2022-03-08 成都交大光芒科技股份有限公司 Visual monitoring self-adaptive on-off state identification method and system for contact network switch
CN114550117A (en) * 2022-02-21 2022-05-27 京东鲲鹏(江苏)科技有限公司 Image detection method and device
CN114612825B (en) * 2022-03-09 2024-03-19 云南大学 Target detection method based on edge equipment
CN115052154B (en) * 2022-05-30 2023-04-14 北京百度网讯科技有限公司 Model training and video coding method, device, equipment and storage medium
CN115100579B (en) * 2022-08-09 2024-03-01 郑州大学 Intelligent video damage segmentation system in pipeline based on optimized deep learning
CN115760748B (en) * 2022-11-14 2023-06-16 江苏科技大学 Ice circumferential crack size measurement method based on deep learning
CN115993365B (en) * 2023-03-23 2023-06-13 山东省科学院激光研究所 Belt defect detection method and system based on deep learning
CN116993660A (en) * 2023-05-24 2023-11-03 淮阴工学院 PCB defect detection method based on improved EfficientDet
CN116433747B (en) * 2023-06-13 2023-08-18 福建帝视科技集团有限公司 Construction method and detection device for detection model of wall thickness of bamboo tube

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107403430A (en) * 2017-06-15 2017-11-28 中山大学 A kind of RGBD image, semantics dividing method
US20200065976A1 (en) * 2018-08-23 2020-02-27 Seoul National University R&Db Foundation Method and system for real-time target tracking based on deep learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109598340A (en) * 2018-11-15 2019-04-09 北京知道创宇信息技术有限公司 Method of cutting out, device and the storage medium of convolutional neural networks
CN109949316B (en) * 2019-03-01 2020-10-27 东南大学 Power grid equipment image weak supervision example segmentation method based on RGB-T fusion
CN110378345B (en) * 2019-06-04 2022-10-04 广东工业大学 Dynamic scene SLAM method based on YOLACT instance segmentation model
CN110782467B (en) * 2019-10-24 2023-05-30 新疆农业大学 Horse body ruler measuring method based on deep learning and image processing
CN111798456A (en) * 2020-05-26 2020-10-20 苏宁云计算有限公司 Instance segmentation model training method and device and instance segmentation method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107403430A (en) * 2017-06-15 2017-11-28 中山大学 A kind of RGBD image, semantics dividing method
US20200065976A1 (en) * 2018-08-23 2020-02-27 Seoul National University R&Db Foundation Method and system for real-time target tracking based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DANIEL BOLYA et al.: "YOLACT: Real-time Instance Segmentation", https://arxiv.org/pdf/1904.02689v2.pdf *
ZHUANG LIU et al.: "Learning Efficient Convolutional Networks through Network Slimming", https://arxiv.org/pdf/1708.06519v1.pdf *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021238826A1 (en) * 2020-05-26 2021-12-02 苏宁易购集团股份有限公司 Method and apparatus for training instance segmentation model, and instance segmentation method
CN112330709A (en) * 2020-10-29 2021-02-05 奥比中光科技集团股份有限公司 Foreground image extraction method and device, readable storage medium and terminal equipment
CN113139983A (en) * 2021-05-17 2021-07-20 北京华捷艾米科技有限公司 Human image segmentation method and device based on RGBD
CN113781500A (en) * 2021-09-10 2021-12-10 中国科学院自动化研究所 Method and device for segmenting cabin segment image instance, electronic equipment and storage medium
CN113781500B (en) * 2021-09-10 2024-04-05 中国科学院自动化研究所 Method, device, electronic equipment and storage medium for segmenting cabin image instance
CN116721342A (en) * 2023-06-05 2023-09-08 淮阴工学院 Hybrid rice quality recognition device based on deep learning
CN116721342B (en) * 2023-06-05 2024-06-11 淮阴工学院 Hybrid rice quality recognition device based on deep learning

Also Published As

Publication number Publication date
WO2021238826A1 (en) 2021-12-02

Similar Documents

Publication Publication Date Title
CN111798456A (en) Instance segmentation model training method and device and instance segmentation method
CN109919031B (en) Human behavior recognition method based on deep neural network
CN111310731B (en) Video recommendation method, device, equipment and storage medium based on artificial intelligence
CN106845621B (en) Dense population number method of counting and system based on depth convolutional neural networks
CN112380921A (en) Road detection method based on Internet of vehicles
CN112183240B (en) Double-current convolution behavior identification method based on 3D time stream and parallel space stream
CN113297956B (en) Gesture recognition method and system based on vision
CN112818821B (en) Human face acquisition source detection method and device based on visible light and infrared light
CN113269224A (en) Scene image classification method, system and storage medium
CN111415338A (en) Method and system for constructing target detection model
CN113221770A (en) Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning
CN113516146A (en) Data classification method, computer and readable storage medium
CN116129291A (en) Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device
CN115424017A (en) Building internal and external contour segmentation method, device and storage medium
CN112907138B (en) Power grid scene early warning classification method and system from local to whole perception
CN114519863A (en) Human body weight recognition method, human body weight recognition apparatus, computer device, and medium
CN113673308A (en) Object identification method, device and electronic system
CN115131826B (en) Article detection and identification method, and network model training method and device
CN116580232A (en) Automatic image labeling method and system and electronic equipment
CN115937492A (en) Transformer equipment infrared image identification method based on feature identification
CN115187906A (en) Pedestrian detection and re-identification method, device and system
CN115311680A (en) Human body image quality detection method and device, electronic equipment and storage medium
CN114387496A (en) Target detection method and electronic equipment
CN114329050A (en) Visual media data deduplication processing method, device, equipment and storage medium
CN112541469A (en) Crowd counting method and system based on self-adaptive classification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20201020