CN115861739A - Training method, device, equipment, storage medium and product of image segmentation model - Google Patents

Training method, device, equipment, storage medium and product of image segmentation model

Info

Publication number
CN115861739A
CN115861739A
Authority
CN
China
Prior art keywords
image
target
foreground
motion
segmentation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310078429.1A
Other languages
Chinese (zh)
Other versions
CN115861739B (en)
Inventor
刘继超
詹慧媚
付晓雪
金岩
邱敏
唐至威
胡国锋
冯谨强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainayun IoT Technology Co Ltd
Qingdao Hainayun Digital Technology Co Ltd
Qingdao Hainayun Intelligent System Co Ltd
Original Assignee
Hainayun IoT Technology Co Ltd
Qingdao Hainayun Digital Technology Co Ltd
Qingdao Hainayun Intelligent System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainayun IoT Technology Co Ltd, Qingdao Hainayun Digital Technology Co Ltd, Qingdao Hainayun Intelligent System Co Ltd filed Critical Hainayun IoT Technology Co Ltd
Priority to CN202310078429.1A priority Critical patent/CN115861739B/en
Publication of CN115861739A publication Critical patent/CN115861739A/en
Application granted granted Critical
Publication of CN115861739B publication Critical patent/CN115861739B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a training method, device, equipment, storage medium and product of an image segmentation model, belonging to the technical field of image processing. The method comprises the following steps: obtaining a binarization label according to a target foreground and a target background image; superimposing multiple frames of the target foreground in the target background image according to motion information of the target foreground in an initial frame image to obtain a motion-blurred image; inputting the initial frame image and the motion-blurred image into an image segmentation model as training data, and obtaining a predicted segmentation image output by the image segmentation model; and training the image segmentation model according to the predicted segmentation image and the binarization label. The method can automatically generate the data required for training, without manually collecting training data or manually labeling it; training the model with the superimposed motion-blurred image yields an image segmentation model that segments more accurately and separates the foreground effectively and precisely.

Description

Training method, device, equipment, storage medium and product of image segmentation model
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, a storage medium, and a product for training an image segmentation model.
Background
Foreground segmentation is one of the research difficulties in the field of image analysis. Existing foreground segmentation methods are mainly divided into methods based on traditional image algorithms and methods based on deep learning.
Foreground segmentation methods based on traditional image algorithms mainly rely on the background subtraction principle: a background model is constructed and a difference image is computed to locate the foreground target and perform foreground segmentation. Foreground segmentation algorithms based on deep learning have mostly evolved from detection and semantic segmentation tasks, and explore the relationship between foreground and background within a single image to perform foreground segmentation.
However, most of the image segmentation models are trained based on the target segmentation data set, and a large amount of manual labeling and manual training data collection are required before training, which undoubtedly increases the difficulty of model training.
Disclosure of Invention
The application provides a training method, device, equipment, storage medium and product of an image segmentation model, which are used to solve the problem that a large amount of manual labeling and manually collected training data is required before an existing image segmentation model can be trained.
In a first aspect, the present application provides a training method for an image segmentation model, including:
setting a target foreground in a target background image to obtain an initial frame image, and obtaining a binarization label of the initial frame image according to the target foreground and the target background image;
superposing a plurality of frames of target foregrounds in the target background image according to motion information of the target foregrounds in the initial frame image to obtain a motion blurred image, wherein the motion information comprises a motion direction and a motion translation matrix;
inputting the initial frame image and the motion blurred image as training data to an image segmentation model, and obtaining a prediction segmentation image output by the image segmentation model;
and training the image segmentation model according to the predicted segmentation image and the binarization label to obtain a trained image segmentation model.
In a second aspect, the present application provides an image segmentation method, including:
acquiring an image to be detected;
extracting a background image from the image to be detected;
and respectively inputting the image to be detected and the background image into an image segmentation model to obtain a target segmentation image output by the image segmentation model, wherein the image segmentation model is obtained by training according to the method of the first aspect.
In a third aspect, the present application provides a training apparatus for an image segmentation model, including:
the image processing unit is used for setting a target foreground in a target background image to obtain an initial frame image and obtaining a binary label of the initial frame image according to the target foreground and the target background image;
the image processing unit is further configured to superimpose the multiple frames of target foregrounds in the target background image according to motion information of the target foregrounds in the initial frame image to obtain a motion blurred image, wherein the motion information includes a motion direction and a motion translation matrix;
the first processing unit is used for inputting the initial frame image and the motion blurred image into an image segmentation model as training data and acquiring a prediction segmentation image output by the image segmentation model;
and the first processing unit is further used for training the image segmentation model according to the predicted segmentation image and the binarization label to obtain a trained image segmentation model.
In a fourth aspect, the present application provides an image segmentation apparatus, comprising:
the acquisition unit is used for acquiring an image to be detected;
the second processing unit is used for extracting a background image from the image to be detected;
the second processing unit is further configured to input the image to be detected and the background image to an image segmentation model respectively, so as to obtain a target segmentation image output by the image segmentation model, where the image segmentation model is a model obtained by training according to the method of the first aspect.
In a fifth aspect, the present application provides an electronic device, comprising: a processor, a memory and a transceiver;
the processor, the memory and the transceiver are interconnected through circuitry;
the memory stores computer execution instructions;
a transceiver for transceiving data;
the processor executes the computer-executable instructions stored by the memory, causing the processor to perform the method according to the first or second aspect.
In a sixth aspect, the present application provides a computer-readable storage medium having stored thereon computer-executable instructions for implementing the method according to the first or second aspect when executed by a processor.
In a seventh aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the method of the first aspect.
According to the training method, device, equipment, storage medium and product of an image segmentation model provided by the present application, an initial frame image is obtained by setting a target foreground in a target background image, and a binarization label of the initial frame image is obtained according to the target foreground and the target background image; multiple frames of the target foreground are superimposed in the target background image according to motion information of the target foreground in the initial frame image to obtain a motion-blurred image, the motion information comprising a motion direction and a motion translation matrix; the initial frame image and the motion-blurred image are input into an image segmentation model as training data, and a predicted segmentation image output by the image segmentation model is obtained; and the image segmentation model is trained according to the predicted segmentation image and the binarization label to obtain a trained image segmentation model. In this way, the data required for training can be generated automatically: training data need not be collected manually, and labeled data can be generated without manual annotation. Training the model with the superimposed motion-blurred image improves the segmentation precision of the image segmentation model and yields a model that segments accurately. The trained image segmentation model can then segment images accurately, so the foreground is segmented effectively and precisely.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic view of an application scenario of a training method of an image segmentation model provided in the present application;
FIG. 2 is a schematic flow chart of a training method of an image segmentation model provided in the present application;
FIG. 3 is a schematic flow chart of another training method for an image segmentation model provided in the present application;
FIG. 4 is a schematic diagram of foreground and background images provided herein;
FIG. 5 is a diagram illustrating a motion-blurred image obtained by a training method of an image segmentation model provided in the present application;
FIG. 6 is an image directly synthesized from multiple constructed frames;
FIG. 7 is an image synthesized using SuBSENSE;
fig. 8 is an ASPP structural diagram of the GNet model provided in the present application;
FIG. 9 is a schematic flowchart of an image segmentation method provided in the present application;
FIG. 10 is a first schematic structural diagram of the GNet model provided in the present application;
fig. 11 is a second schematic structural diagram of the GNet model provided in the present application;
FIG. 12 is a schematic structural diagram of an image segmentation model training apparatus according to the present application;
fig. 13 is a schematic structural diagram of an image segmentation apparatus provided in the present application;
FIG. 14 is a first block diagram of an electronic device for implementing the training method of an image segmentation model or the image segmentation method according to an embodiment of the present application;
fig. 15 is a second block diagram of an electronic device for implementing the training method of an image segmentation model or the image segmentation method according to an embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
For a clear understanding of the technical solutions of the present application, a detailed description of the prior art solutions will be given first.
Foreground segmentation methods based on traditional image algorithms mainly rely on the background subtraction principle: a background model is constructed and a difference image is computed to locate the foreground target and perform foreground segmentation. Foreground segmentation algorithms based on deep learning have mostly evolved from detection and semantic segmentation tasks, and explore the relationship between foreground and background within a single image to perform foreground segmentation.
However, most of these segmentation models are trained on a target segmentation data set, and a large amount of manual labeling and manually collected training data is required before training, which undoubtedly increases the difficulty of model training; moreover, the generalization of the models is difficult to guarantee.
Therefore, in the present application, a target foreground is set in a target background image to obtain an initial frame image, and a binarization label of the initial frame image is obtained according to the target foreground and the target background image. Multiple frames of the target foreground are then superimposed in the target background image according to the motion information of the target foreground in the initial frame image to obtain a motion-blurred image. The initial frame image and the motion-blurred image are input into an image segmentation model as training data, a predicted segmentation image output by the image segmentation model is obtained, and the image segmentation model is trained according to the predicted segmentation image and the binarization label to obtain a trained image segmentation model. Because the initial frame image and the binarization label are generated from the target foreground and the target background image, and the motion-blurred image is obtained by superimposing multiple frames of the target foreground on the target background, the data required for training are generated automatically: training data need not be collected manually, labeled data are generated without manual annotation, and training with the superimposed motion-blurred image improves the segmentation accuracy of the model.
Therefore, the inventor proposes a technical solution of the embodiment of the present application based on the above-mentioned inventive findings. An application scenario of the training method for the image segmentation model provided in the embodiment of the present application is described below.
As shown in fig. 1, in an application scenario of the training method for an image segmentation model provided in the embodiment of the present application, the target foreground is a person and the target background image is a street. The electronic device 1 sets the target foreground in the target background image to obtain an initial frame image, and obtains a binarization label of the initial frame image, that is, a binarized image, according to the target foreground and the target background image. Multiple frames of the target foreground are superimposed in the target background image according to the motion information of the target foreground in the initial frame image to obtain a motion-blurred image, the motion information comprising a motion direction and a motion translation matrix. The electronic device 1 inputs the initial frame image and the motion-blurred image into an image segmentation model as training data and obtains a predicted segmentation image output by the image segmentation model; the electronic device 1 then trains the image segmentation model based on the predicted segmentation image and the binarization label to obtain a trained image segmentation model. In this way, the data required for training can be generated automatically and labeled data can be produced without manually collecting training data or manual annotation, and training the model with the superimposed motion-blurred image improves the segmentation precision of the image segmentation model, yielding a model that segments accurately.
Embodiments of the present application will be described below in detail with reference to the accompanying drawings.
Fig. 2 is a flowchart illustrating a method for training an image segmentation model according to the present application; the method is applied to an electronic device. The electronic device may be any of various forms of digital computer, such as a cellular phone, smart phone, laptop computer, desktop computer, workstation, personal digital assistant, server, blade server, mainframe computer, or other suitable computer. As shown in fig. 2, the method includes:
step 201, setting the target foreground in the target background image to obtain an initial frame image, and obtaining a binarization label of the initial frame image according to the target foreground and the target background image.
In this embodiment, a background image data set and a foreground data set are preset; a background image is selected from the background image data set as the target background image, and a foreground is selected from the foreground data set as the target foreground. The target foreground is set in the target background image to obtain an initial frame image, and a binarization label of the initial frame image is obtained according to the target foreground and the target background image. The binarization label is compared with the predicted segmentation image to evaluate how well the image segmentation model is being trained.
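As an illustration of this step, the following sketch composites a foreground onto a background and derives the binarization label from the foreground's alpha mask. It is a minimal sketch under the assumption that the foreground comes with an alpha channel; the function and variable names are hypothetical, and NumPy is assumed.

import numpy as np

def make_initial_frame_and_label(fg_rgba: np.ndarray, bg: np.ndarray, top: int, left: int):
    """Composite an RGBA foreground onto a BGR background at (top, left) and
    return the initial frame image together with its binarization label.

    fg_rgba: H_o x W_o x 4 foreground with alpha channel (assumed available).
    bg:      H_b x W_b x 3 target background image.
    """
    h_o, w_o = fg_rgba.shape[:2]
    frame = bg.copy()
    label = np.zeros(bg.shape[:2], dtype=np.uint8)

    alpha = fg_rgba[..., 3:4].astype(np.float32) / 255.0          # per-pixel opacity
    roi = frame[top:top + h_o, left:left + w_o].astype(np.float32)
    fg_rgb = fg_rgba[..., :3].astype(np.float32)

    # Overlay the foreground on the background region it covers.
    frame[top:top + h_o, left:left + w_o] = (alpha * fg_rgb + (1 - alpha) * roi).astype(np.uint8)
    # Binarization label: foreground pixels are 255, background pixels are 0.
    label[top:top + h_o, left:left + w_o] = (alpha[..., 0] > 0.5).astype(np.uint8) * 255
    return frame, label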
Step 202, superposing the multiple frames of target foregrounds in the target background image according to the motion information of the target foregrounds in the initial frame image to obtain a motion blurred image, wherein the motion information comprises a motion direction and a motion translation matrix.
In this embodiment, motion information of the target foreground in the initial frame image is obtained, and multiple frames of the target foreground are superimposed in the target background image according to this motion information to obtain a motion-blurred image; the motion information comprises a motion direction and a motion translation matrix. The multiple frames of the target foreground occupy different positions in the target background image: the position of each frame of the target foreground in the target background image is determined from the motion information of the previous frame of the target foreground in the target background image, and the multiple frames are superimposed in the same target background image to obtain the motion-blurred image. The superposition blurs the edge of the target foreground, and processing and augmenting the images in this way yields more training data and can improve the accuracy of the model.
Step 203, inputting the initial frame image and the motion blurred image as training data to the image segmentation model, and obtaining a prediction segmentation image output by the image segmentation model.
In the present embodiment, the initial frame image and the motion-blurred image are input into the image segmentation model as training data, and a predicted segmentation image output by the image segmentation model is obtained; the predicted segmentation image is a binarized image.
And step 204, training the image segmentation model according to the predicted segmentation image and the binarization label to obtain a trained image segmentation model.
In this embodiment, an image segmentation model is trained based on a predicted segmented image and a binarization label to obtain a trained image segmentation model, and the trained image segmentation model is used to perform segmentation processing on the image.
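As a minimal illustration of this training step, the sketch below computes a binary cross-entropy loss between the model's predicted segmentation and the binarization label and applies one optimizer update. PyTorch is assumed, and the dual-input model and data loading are hypothetical placeholders rather than the patent's exact implementation.

import torch
import torch.nn.functional as F

def train_step(model, optimizer, init_frame, blurred, label):
    """One training iteration: predict a segmentation from the image pair and
    supervise it with the binarization label.

    init_frame, blurred: float tensors of shape (B, 3, H, W)
    label:               float tensor of shape (B, 1, H, W) with values in {0, 1}
    """
    model.train()
    optimizer.zero_grad()
    logits = model(init_frame, blurred)          # dual-input segmentation model
    loss = F.binary_cross_entropy_with_logits(logits, label)
    loss.backward()
    optimizer.step()
    return loss.item()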
In the method of this embodiment, a target foreground is set in a target background image to obtain an initial frame image, and a binarization label of the initial frame image is obtained according to the target foreground and the target background image; multiple frames of the target foreground are superimposed in the target background image according to the motion information of the target foreground in the initial frame image to obtain a motion-blurred image; the initial frame image and the motion-blurred image are input into an image segmentation model as training data, a predicted segmentation image output by the image segmentation model is obtained, and the image segmentation model is trained according to the predicted segmentation image and the binarization label to obtain a trained image segmentation model. Because the initial frame image and the binarization label are generated from the target foreground and the target background image, and the motion-blurred image is obtained by superimposing multiple frames of the target foreground on the target background, the required training data are generated automatically: there is no need to collect training data manually, labeled data are produced without manual annotation, and training the model with the superimposed motion-blurred image improves the segmentation precision of the image segmentation model, yielding a model that segments more accurately.
Fig. 3 is a schematic flowchart of another training method for an image segmentation model provided in the present application, where the method is applied to an electronic device, and as shown in fig. 3, the method includes:
step 301a, a background image data set and a foreground data set are obtained, the background image data set includes a plurality of background images, and the foreground data set includes a plurality of foregrounds.
In this embodiment, a background image data set and a foreground data set are obtained. The background image data set may be the Cityscapes data set, an urban scene data set that is currently one of the widely used public data sets in the field of computer vision. The foreground data set may be the COCO data set, a large-scale data set that can be used for image detection, semantic segmentation, and image caption generation. It should be noted that other data sets may also be used; the data sets are not limited to those above.
Step 301b, selecting several foregrounds from the foreground data set as target foregrounds, and selecting several background images from the background image data set as target background images.
The background image data set includes multiple background images and the foreground data set includes multiple foregrounds. Several foregrounds are selected arbitrarily from the foreground data set, for example n foregrounds with 0 ≤ n ≤ 5, and the selected foregrounds are used as target foregrounds; several background images are selected arbitrarily from the background image data set, and the selected background images are used as target background images. A target foreground is set in a target background image to obtain an initial frame image, which can be used as training data. Generating training data automatically in this way effectively augments the training data, and no training data needs to be collected manually.
Step 301, setting the target foreground in the target background image to obtain an initial frame image, and obtaining a binarization label of the initial frame image according to the target foreground and the target background image.
In a possible implementation manner, setting a target foreground in a target background image to obtain an initial frame image includes:
step 3011, determine a first distance from the left boundary of the target foreground to the left boundary of the target background image according to the length of the target background image and the length of the target foreground.
In this embodiment, the image is initialized, and the first distance from the left boundary of the target foreground to the left boundary of the target background image is determined according to the length of the target background image and the length of the target foreground. Referring to fig. 4, I_o is the target foreground, I_b is the target background image, w_o is the target foreground length, and w_b is the target background image length. The first distance from the left boundary of the target foreground to the left boundary of the target background image is calculated from w_o and w_b and is denoted w_left, with w_left ∈ (0, w_b − w_o).
Step 3012, determine a second distance from the upper boundary of the target foreground to the upper boundary of the target background image according to the height of the target background image and the height of the target foreground.
In this embodiment, the second distance from the upper boundary of the target foreground to the upper boundary of the target background image is determined according to the height of the target background image and the height of the target foreground. Referring to fig. 4, I_o is the target foreground, I_b is the target background image, h_o is the target foreground height, and h_b is the target background image height. The second distance from the upper boundary of the target foreground to the upper boundary of the target background image is calculated from h_o and h_b and is denoted h_top, with h_top ∈ (0, h_b − h_o).
And step 3013, determining a target position of the target foreground in the target background image according to the first distance and the second distance.
In this embodiment, the target position of the target foreground in the target background image is calculated according to the first distance and the second distance. Alternatively, a third distance from the right boundary of the target foreground to the right boundary of the target background image is determined according to the length of the target background image and the length of the target foreground, a fourth distance from the lower boundary of the target foreground to the lower boundary of the target background image is determined according to the height of the target background image and the height of the target foreground, and the target position of the target foreground in the target background image is determined according to the third distance and the fourth distance.
Optionally, determining a target position of the target foreground in the target background image according to the first distance and the second distance includes:
selecting a first position arbitrarily from the positions whose distance to the left boundary of the background image is less than the first distance; selecting a second position arbitrarily from the positions whose distance to the upper boundary of the background image is less than the second distance; and calculating the target position of the center point of the target foreground in the target background image according to the first position, the second position, and the size of the target foreground.
In the present embodiment, the first distance satisfies w_left ∈ (0, w_b − w_o) and the second distance satisfies h_top ∈ (0, h_b − h_o). A first position is selected arbitrarily from the positions whose distance to the left boundary of the background image is less than the first distance, and a second position is selected arbitrarily from the positions whose distance to the upper boundary of the background image is less than the second distance. The target position of the center point of the target foreground in the target background image is then calculated from the first position, the second position, and the size of the target foreground, where the size of the target foreground comprises the target foreground length and the target foreground height. The target position is expressed as (w_1 + 0.5 × w_o, h_1 − 0.5 × h_o), where w_1 is the first position, h_1 is the second position, w_o is the target foreground length, and h_o is the target foreground height.
Step 3014, set the target foreground at the target position to get the initial frame image.
In this embodiment, the target foreground is covered at the target position in the target background image to obtain an initial frame image, and the initial frame image can be automatically generated according to the target foreground and the target background image.
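To make steps 3011–3014 concrete, the sketch below draws a random in-bounds offset and pastes the foreground there. It is a simplified illustration (no alpha blending, foreground assumed smaller than the background) with hypothetical names; NumPy is assumed.

import numpy as np

def place_foreground(fg: np.ndarray, bg: np.ndarray, rng: np.random.Generator):
    """Set the target foreground at a random valid position in the target
    background image and return the initial frame image plus the offset used."""
    h_o, w_o = fg.shape[:2]
    h_b, w_b = bg.shape[:2]

    # First distance (left margin) and second distance (top margin):
    # any value that keeps the foreground fully inside the background.
    w_left = rng.integers(0, w_b - w_o)   # w_left in (0, w_b - w_o)
    h_top = rng.integers(0, h_b - h_o)    # h_top in (0, h_b - h_o)

    frame = bg.copy()
    frame[h_top:h_top + h_o, w_left:w_left + w_o] = fg   # cover the region with the foreground
    return frame, (h_top, w_left)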
Step 302, according to the motion information of the target foreground in the initial frame image, superimposing the multiple frames of target foreground in the target background image to obtain a motion-blurred image, wherein the motion information includes a motion direction and a motion translation matrix.
In a possible implementation manner, superimposing multiple frames of target foregrounds in a target background image according to motion information of the target foregrounds in an initial frame image to obtain a motion-blurred image, including:
step 3021, determining the movement direction of the target foreground of the i-th frame in the target background image according to the movement direction of the target foreground of the i-1 th frame in the target background image and a preset offset angle, wherein i ∈ [1,N-1], and the movement direction of the target foreground of the 0 th frame in the target background image is determined according to the movement direction of the target foreground in the initial frame image.
In this embodiment, a preset offset angle is obtained, and the motion direction of the next frame of the target foreground in the target background image is determined from the motion direction of the previous frame of the target foreground in the target background image and the preset offset angle. Substituting the motion direction of the (i−1)-th frame target foreground in the target background image and the preset offset angle into formula (1) gives the motion direction of the i-th frame target foreground in the target background image, where i ∈ [1, N−1] and the motion direction of the 0-th frame target foreground in the target background image is the same as the motion direction of the target foreground in the initial frame image. Formula (1) is expressed as:
θ_i = θ_{i−1} + Δθ        formula (1)
where θ_i is the motion direction of the i-th frame target foreground in the target background image, Δθ is the preset offset angle, θ_{i−1} is the motion angle of the (i−1)-th frame target foreground in the target background image, N is the total number of frames of the multi-frame target foreground, and N ∈ [5, 15].
The motion direction of the 0-th frame target foreground in the target background image is expressed by formula (2): it is the motion direction of the target foreground in the initial frame image, given by θ_0, the motion angle of the 0-th frame target foreground in the target background image.
Step 3022, determining a motion vector of the target foreground of the ith frame in the target background image relative to the target foreground of the (i-1) th frame according to the preset inter-frame motion speed and the motion direction of the target foreground of the ith frame in the target background image.
In this embodiment, a preset inter-frame motion speed is obtained, and the preset inter-frame motion speed and the motion direction of the i-th frame target foreground in the target background image are substituted into formula (3) to calculate the motion vector of the i-th frame target foreground in the target background image relative to the (i−1)-th frame target foreground. Formula (3) is expressed as:
d_xi = d_i · cos θ_i,  d_yi = d_i · sin θ_i        formula (3)
where d_i is the preset inter-frame motion speed, θ_i is the motion direction of the i-th frame target foreground in the target background image, d_xi is the motion vector of the i-th frame target foreground relative to the (i−1)-th frame target foreground on the x-axis, and d_yi is the motion vector of the i-th frame target foreground relative to the (i−1)-th frame target foreground on the y-axis.
Step 3023, determining a candidate translation matrix according to the motion vector, and determining a target motion translation matrix corresponding to the i-th frame target foreground according to the candidate translation matrix and the previous motion translation matrix corresponding to the (i−1)-th frame target foreground, where the motion translation matrix corresponding to the 0-th frame target foreground is determined according to the motion direction of the target foreground in the initial frame image.
In this embodiment, the candidate translation matrix is calculated from the motion vector of the i-th frame target foreground in the target background image relative to the (i−1)-th frame target foreground, as follows:
H = [[1, 0, d_xi], [0, 1, d_yi], [0, 0, 1]]        formula (4)
where H is the candidate translation matrix, d_xi is the motion vector of the i-th frame target foreground relative to the (i−1)-th frame target foreground on the x-axis, and d_yi is the corresponding motion vector on the y-axis.
Further, the candidate translation matrix and the previous motion translation matrix corresponding to the (i−1)-th frame target foreground are substituted into formula (5) to calculate the target motion translation matrix corresponding to the i-th frame target foreground. Formula (5) is expressed as:
H_i = H · H_{i−1} = [[1, 0, d_xi], [0, 1, d_yi], [0, 0, 1]] · H_{i−1}        formula (5)
where H_i is the target motion translation matrix corresponding to the i-th frame target foreground, H_{i−1} is the previous motion translation matrix corresponding to the (i−1)-th frame target foreground, and d_xi and d_yi are the motion vectors of the i-th frame target foreground relative to the (i−1)-th frame target foreground on the x-axis and y-axis.
The motion translation matrix corresponding to the 0-th frame target foreground is determined according to the motion direction of the target foreground in the initial frame image and is expressed as:
H_0 = [[1, 0, d_0 · cos θ_0], [0, 1, d_0 · sin θ_0], [0, 0, 1]]        formula (6)
where H_0 is the motion translation matrix corresponding to the 0-th frame target foreground, d_0 is the inter-frame motion speed corresponding to the 0-th frame, and θ_0 is the motion angle of the 0-th frame target foreground in the target background image.
And step 3024, superimposing each frame of foreground on the target background image according to the moving direction of each frame of target foreground in the target background and the corresponding target motion translation matrix, to obtain a motion-blurred image.
In this embodiment, each frame of the target foreground is superimposed in the target background image according to the motion direction of that frame of the target foreground in the target background image and the corresponding target motion translation matrix, so as to obtain the motion-blurred image. The superposition result is expressed by formula (7): the result I corresponding to the motion-blurred image is obtained by superimposing the 0-th to (N−1)-th frame target foregrounds, each weighted by the foreground transparency weight, on the target background image I_b, where I_oi = I_o(H_i) is the i-th frame target foreground obtained by applying the corresponding motion translation matrix H_i to the 0-th frame target foreground I_o. That is, formula (7) superimposes the 0-th to (N−1)-th frame target foregrounds, each with a certain transparency, on the target background to obtain the motion-blurred image.
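As a sketch of the whole superposition procedure (steps 3021–3024), the code below walks a foreground along a slowly turning direction and alpha-blends each translated copy onto the background. The blending rule, parameter ranges, and names are illustrative assumptions rather than the patent's exact formulas; NumPy is assumed.

import numpy as np

def synthesize_motion_blur(fg: np.ndarray, mask: np.ndarray, bg: np.ndarray,
                           start: tuple, theta0: float, delta_theta: float,
                           speed: float, n_frames: int = 10, alpha: float = 0.3):
    """Superimpose n_frames translated copies of the foreground on the background.

    fg:    H_o x W_o x 3 target foreground (0-th frame appearance)
    mask:  H_o x W_o binary foreground mask
    start: (top, left) position of the 0-th frame foreground
    """
    blurred = bg.astype(np.float32)
    h_o, w_o = fg.shape[:2]
    top, left = float(start[0]), float(start[1])
    theta = theta0

    for i in range(n_frames):
        if i > 0:
            theta = theta + delta_theta              # formula (1): theta_i = theta_{i-1} + delta
            dx, dy = speed * np.cos(theta), speed * np.sin(theta)   # formula (3)
            left, top = left + dx, top + dy          # accumulated translation (formulas (4)-(5))
        t, l = int(round(top)), int(round(left))
        if 0 <= t <= bg.shape[0] - h_o and 0 <= l <= bg.shape[1] - w_o:
            roi = blurred[t:t + h_o, l:l + w_o]
            m = (mask > 0)[..., None]
            # Overlay this frame's foreground with transparency alpha.
            roi[:] = np.where(m, alpha * fg + (1 - alpha) * roi, roi)

    return blurred.astype(np.uint8)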
Referring to figs. 5-7, fig. 5 shows a motion-blurred image obtained by superimposing multiple frames of the target foreground on the target background image, fig. 6 shows an image directly synthesized from multiple constructed frames, and fig. 7 shows an image synthesized using SuBSENSE. In practice the background image is usually difficult to obtain directly, so the background image extracted by the SuBSENSE method is used as input. To make the background image used for model training closer to the background image extracted by SuBSENSE, the idea of the image projection motion blur model is borrowed when constructing the training data set: the background image extracted by SuBSENSE is simulated as the superposition of several sharp images along the projected motion direction, yielding the motion-blurred image. Training the model with the motion-blurred image reduces the gap between the training data and the data encountered in actual applications and thereby improves the segmentation precision of the video foreground segmentation model.
Step 303, inputting the initial frame image and the motion-blurred image as training data to the image segmentation model, and obtaining a prediction segmentation image output by the image segmentation model.
In one possible implementation manner, inputting an initial frame image and a motion-blurred image as training data to an image segmentation model, and obtaining a predicted segmentation image output by the image segmentation model, the method includes:
step 3031, inputting the initial frame image and the motion blurred image into a ResNet layer of the GNet model, respectively, obtaining a first high-resolution feature image and a first low-resolution feature image corresponding to the initial frame image and a second high-resolution feature image and a second low-resolution feature image corresponding to the motion blurred image, and obtaining a first differential feature image according to the first high-resolution feature image and the second high-resolution feature image.
The image segmentation model is a GNet model, which comprises a ResNet layer, an Encoder, and a Decoder. The initial frame image and the motion-blurred image are respectively input into the ResNet layer of the GNet model; the size of the input initial frame image and motion-blurred image may be 513 × 513 × 3. The output of the ResNet layer comprises two parts, a high-resolution feature image and a low-resolution feature image. Specifically, the ResNet layer outputs a first high-resolution feature image b_l and a first low-resolution feature image b_d corresponding to the initial frame image, and a second high-resolution feature image p_l and a second low-resolution feature image p_d corresponding to the motion-blurred image; the first and second low-resolution feature images have size 55 × 65 × 2048.
A first differential feature image is obtained from the first high-resolution feature image and the second high-resolution feature image. Specifically, the first high-resolution feature image and the second high-resolution feature image are substituted into formula (8) to calculate the first differential feature image. Formula (8) is expressed as:
d_l = |b_l − p_l|        formula (8)
where d_l is the first differential feature image, b_l is the first high-resolution feature image, and p_l is the second high-resolution feature image; the first differential feature image has size 129 × 129 × 256.
Step 3032, inputting the first low-resolution feature image and the second low-resolution feature image into an encoder of the GNet model, obtaining a first multi-scale feature image corresponding to the first low-resolution feature image and a second multi-scale feature image corresponding to the second low-resolution feature image, and obtaining a second difference feature image according to the first multi-scale feature image and the second multi-scale feature image.
In this embodiment, to enable the network to capture multi-scale information more effectively and improve the segmentation of targets at different scales, the Encoder part of GNet introduces Atrous Spatial Pyramid Pooling (ASPP), and the ASPP part adopts a depthwise separable convolution structure. The ASPP structure is shown in fig. 8: a given input is sampled by depthwise separable atrous convolutions with different dilation rates, the sampling results are concatenated, and the number of channels is adjusted by a 1 × 1 convolution to obtain the output. The first low-resolution feature image b_d and the second low-resolution feature image p_d are respectively input into the ASPP of the Encoder, where features are extracted with a convolution of kernel size 1 × 1 and with depthwise separable atrous convolutions with dilation rates of 6, 12, and 18, yielding first multi-scale feature images b_di corresponding to the first low-resolution feature image and second multi-scale feature images p_di corresponding to the second low-resolution feature image. Second differential feature images corresponding to the first and second multi-scale feature images are obtained by taking their difference, specifically using formula (9):
d_di = |b_di − p_di|        formula (9)
where d_di is the second differential feature image, b_di is the first multi-scale feature image, and p_di is the second multi-scale feature image.
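The following sketch illustrates an ASPP module built from depthwise separable atrous convolutions of the kind described above; the channel sizes and layer structure are simplified assumptions, not the patent's exact configuration. PyTorch is assumed.

import torch
import torch.nn as nn

class DepthwiseSeparableAtrousConv(nn.Module):
    """3x3 depthwise atrous convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch: int, out_ch: int, dilation: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=dilation,
                                   dilation=dilation, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))

class ASPP(nn.Module):
    """Sample the input with a 1x1 conv and atrous convs at rates 6, 12, 18,
    concatenate the results, and fuse them with a 1x1 convolution."""
    def __init__(self, in_ch: int = 2048, out_ch: int = 256):
        super().__init__()
        self.branch1 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, bias=False),
                                     nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.branch6 = DepthwiseSeparableAtrousConv(in_ch, out_ch, dilation=6)
        self.branch12 = DepthwiseSeparableAtrousConv(in_ch, out_ch, dilation=12)
        self.branch18 = DepthwiseSeparableAtrousConv(in_ch, out_ch, dilation=18)
        self.project = nn.Sequential(nn.Conv2d(4 * out_ch, out_ch, 1, bias=False),
                                     nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        feats = [self.branch1(x), self.branch6(x), self.branch12(x), self.branch18(x)]
        return self.project(torch.cat(feats, dim=1))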
Step 3033, inputting the first difference characteristic image and the second difference characteristic image into a GNet model decoder for fusion processing, and outputting a prediction segmentation image.
In the present embodiment, the multiple second differential feature images d_di are concatenated and up-sampled and, together with the first differential feature image d_l, are input into the Decoder. The Decoder thus fuses the differences between the multi-scale features of the initial frame image and the motion-blurred image, detects the pixel-level differences between the two images, segments the foreground pixels accordingly, and outputs the predicted segmentation image.
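To tie steps 3031–3033 together, here is a schematic sketch of the dual-input forward pass: both images go through a shared backbone and ASPP, absolute feature differences are taken at both levels, and a small decoder fuses them into a segmentation logit map. The backbone and decoder heads are simplified placeholders, not the exact GNet layers.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferenceSegmenter(nn.Module):
    """Dual-input segmentation sketch: predict the foreground from feature differences."""
    def __init__(self, backbone: nn.Module, aspp: nn.Module,
                 low_ch: int = 256, aspp_ch: int = 256):
        super().__init__()
        self.backbone = backbone   # returns (high_res_feat, low_res_feat) per image
        self.aspp = aspp
        self.decoder = nn.Sequential(
            nn.Conv2d(low_ch + aspp_ch, 256, 3, padding=1, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 1))   # single-channel foreground logits

    def forward(self, frame: torch.Tensor, blurred: torch.Tensor):
        b_l, b_d = self.backbone(frame)      # features of the initial frame image
        p_l, p_d = self.backbone(blurred)    # features of the motion-blurred image

        d_l = torch.abs(b_l - p_l)                        # first differential feature image
        d_d = torch.abs(self.aspp(b_d) - self.aspp(p_d))  # second differential feature image

        # Up-sample the coarse difference to the high-resolution difference and fuse.
        d_d = F.interpolate(d_d, size=d_l.shape[-2:], mode="bilinear", align_corners=False)
        logits = self.decoder(torch.cat([d_l, d_d], dim=1))
        return F.interpolate(logits, size=frame.shape[-2:], mode="bilinear", align_corners=False)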
Optionally, before inputting the initial frame image and the motion-blurred image as training data to the image segmentation model and acquiring a predicted segmentation image output by the image segmentation model, the method further includes:
rotating the target foreground in the initial frame image to obtain a rotated image; translating the target foreground in the initial frame image to obtain a translated image; adjusting the brightness of the initial frame image to obtain a brightness-adjusted image; adjusting the saturation of the initial frame image to obtain a saturation-adjusted image; and augmenting the training data with the rotated image, the translated image, the brightness-adjusted image, and the saturation-adjusted image.
In this embodiment, in an actual scene there may be illumination changes, camera motion, leaves blown by the wind, shimmering reflections on a water surface, and similar conditions, so that pixel differences other than the foreground exist between the image to be detected and its corresponding background image. To reduce the influence of these environmental factors on the model and improve its robustness, training images with a certain background position difference and noise difference are obtained by rotating the foreground or background image by different angles, translating it by different distances, adjusting brightness and saturation with different intensities, and adding random salt-and-pepper noise, and the training data are augmented with the rotated images, the translated images, the brightness-adjusted images, and the saturation-adjusted images.
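A minimal augmentation sketch along these lines is shown below; the parameter ranges are illustrative assumptions, and OpenCV/NumPy are assumed.

import cv2
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a random rotation, translation, brightness/saturation change,
    and salt-and-pepper noise to one training image."""
    h, w = image.shape[:2]

    # Rotation and translation by random amounts.
    angle = rng.uniform(-15, 15)
    tx, ty = rng.uniform(-10, 10, size=2)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    m[:, 2] += (tx, ty)
    out = cv2.warpAffine(image, m, (w, h), borderMode=cv2.BORDER_REFLECT)

    # Brightness and saturation adjustment in HSV space.
    hsv = cv2.cvtColor(out, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] *= rng.uniform(0.7, 1.3)   # saturation
    hsv[..., 2] *= rng.uniform(0.7, 1.3)   # brightness (value)
    out = cv2.cvtColor(np.clip(hsv, 0, 255).astype(np.uint8), cv2.COLOR_HSV2BGR)

    # Random salt-and-pepper noise.
    noise = rng.random((h, w))
    out[noise < 0.005] = 0
    out[noise > 0.995] = 255
    return out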
And step 304, training the image segmentation model according to the predicted segmentation image and the binarization label to obtain a trained image segmentation model.
In this embodiment, step 304 and step 204 have the same technical features, and the detailed description may refer to step 204, which is not described herein again.
With the method of this embodiment, the required training data can be generated automatically without manual collection, and training the model with the superimposed motion-blurred image improves the segmentation precision of the image segmentation model, yielding a model that segments accurately. Moreover, by processing the initial frame image, training images with a certain background position difference and noise difference can be generated automatically to augment the training data.
Fig. 9 is a schematic flowchart of an image segmentation method provided in the present application; the method is applied to an electronic device. The electronic device may be any of various forms of digital computer, such as a cellular phone, smart phone, laptop computer, desktop computer, workstation, personal digital assistant, server, blade server, mainframe computer, or other suitable computer. As shown in fig. 9, the method includes:
step 701, obtaining an image to be detected.
In this embodiment, an image to be detected is obtained, and the image to be detected includes foreground and background images.
Step 702, extracting a background image from the image to be detected.
In this embodiment, a background image is extracted from the image to be detected; specifically, the background image may be extracted from the image to be detected using the SuBSENSE algorithm.
And 703, respectively inputting the image to be detected and the background image into an image segmentation model to obtain a target segmentation image output by the image segmentation model, wherein the image segmentation model is obtained by training according to the scheme described above.
In this embodiment, the image to be detected and the background image are input to the image segmentation model, respectively, to obtain a target segmentation image output by the image segmentation model, where the image segmentation model is a GNet model.
Specifically, SuBSENSE is used to extract the background image from consecutive video frames, and the background image and the image to be detected are input into the trained GNet model as a foreground-background pair so as to extract their differential feature maps and separate the foreground. The image to be detected and the background image are respectively input into the ResNet layer of the trained GNet model to obtain a first high-resolution feature image b_l and a first low-resolution feature image b_d corresponding to the image to be detected, and a second high-resolution feature image p_l and a second low-resolution feature image p_d corresponding to the background image; a first differential feature image d_l is obtained from the first and second high-resolution feature images. The first low-resolution feature image b_d and the second low-resolution feature image p_d are input into the Encoder of the trained GNet model to obtain first multi-scale feature images b_di corresponding to the first low-resolution feature image and second multi-scale feature images p_di corresponding to the second low-resolution feature image, and second differential feature images d_di are obtained from b_di and p_di. The second differential feature images d_di are then concatenated and up-sampled and, together with the first differential feature image d_l, input into the Decoder of the trained GNet model for fusion processing, and the target segmentation image is output.
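A sketch of this inference flow is given below. SuBSENSE itself is not reimplemented here; extract_background is a hypothetical placeholder for it, and the 0.5 thresholding choice is an assumption.

import torch

def segment_frame(model, extract_background, frame_tensor: torch.Tensor) -> torch.Tensor:
    """Run foreground segmentation on one frame.

    model:              trained dual-input segmentation network (see earlier sketch)
    extract_background: callable standing in for SuBSENSE background extraction;
                        maps the frame tensor to a background tensor of the same shape
    frame_tensor:       image to be detected, shape (1, 3, H, W), values in [0, 1]
    """
    model.eval()
    with torch.no_grad():
        background = extract_background(frame_tensor)       # background image of the frame
        logits = model(frame_tensor, background)             # pixel-wise foreground logits
        mask = (torch.sigmoid(logits) > 0.5).float()          # binarized target segmentation image
    return mask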
In this embodiment, the trained image segmentation model can more accurately segment the image, so as to effectively and accurately segment the foreground, thereby obtaining a more accurate image segmentation result.
Fig. 12 is a schematic structural diagram of a training apparatus for an image segmentation model according to the present application, and as shown in fig. 12, the training apparatus 900 for an image segmentation model according to this embodiment includes an image processing unit 901 and a first processing unit 902.
The image processing unit 901 is configured to set the target foreground in the target background image to obtain an initial frame image, and obtain a binarization label of the initial frame image according to the target foreground and the target background image. The image processing unit 901 is further configured to superimpose the multiple frames of target foregrounds in the target background image according to motion information of the target foregrounds in the initial frame image, so as to obtain a motion-blurred image, where the motion information includes a motion direction and a motion translation matrix. A first processing unit 902, configured to input the initial frame image and the motion-blurred image as training data to an image segmentation model, and obtain a predicted segmentation image output by the image segmentation model. The first processing unit 902 is further configured to train the image segmentation model according to the predicted segmented image and the binarization label to obtain a trained image segmentation model.
Optionally, the image processing unit is further configured to determine a first distance from the left boundary of the target foreground to the left boundary of the target background image according to the length of the target background image and the length of the target foreground; determining a second distance from the upper boundary of the target foreground to the upper boundary of the target background image according to the height of the target background image and the height of the target foreground; determining the target position of the target foreground in the target background image according to the first distance and the second distance; and setting the target foreground at the target position to obtain an initial frame image.
Optionally, the image processing unit is further configured to select a first position arbitrarily from the positions whose distance to the left boundary of the background image is less than the first distance; select a second position arbitrarily from the positions whose distance to the upper boundary of the background image is less than the second distance; and calculate the target position of the center point of the target foreground in the target background image according to the first position, the second position, and the size of the target foreground.
Optionally, the image processing unit is further configured to determine the motion direction of the i-th frame target foreground in the target background image according to the motion direction of the (i−1)-th frame target foreground in the target background image and a preset offset angle, where i ∈ [1, N−1] and the motion direction of the 0-th frame target foreground in the target background image is determined according to the motion direction of the target foreground in the initial frame image; determine a motion vector of the i-th frame target foreground in the target background image relative to the (i−1)-th frame target foreground according to a preset inter-frame motion speed and the motion direction of the i-th frame target foreground in the target background image; determine a candidate translation matrix according to the motion vector, and determine a target motion translation matrix corresponding to the i-th frame target foreground according to the candidate translation matrix and the previous motion translation matrix corresponding to the (i−1)-th frame target foreground, where the motion translation matrix corresponding to the 0-th frame target foreground is determined according to the motion direction of the target foreground in the initial frame image; and superimpose each frame of the target foreground in the target background image according to its motion direction in the target background image and the corresponding target motion translation matrix to obtain the motion-blurred image.
Optionally, the first processing unit is further configured to input the initial frame image and the motion-blurred image into a ResNet layer of the GNet model, respectively, obtain a first high-resolution feature image and a first low-resolution feature image corresponding to the initial frame image and a second high-resolution feature image and a second low-resolution feature image corresponding to the motion-blurred image, and obtain a first difference feature image according to the first high-resolution feature image and the second high-resolution feature image; inputting the first low-resolution characteristic image and the second low-resolution characteristic image into an encoder of the GNet model, obtaining a first multi-scale characteristic image corresponding to the first low-resolution characteristic image and a second multi-scale characteristic image corresponding to the second low-resolution characteristic image, and obtaining a second difference characteristic image according to the first multi-scale characteristic image and the second multi-scale characteristic image; and inputting the first difference characteristic image and the second difference characteristic image into a decoder of the GNet model for fusion processing, and outputting a prediction segmentation image.
Optionally, the image processing unit is further configured to obtain a background image data set and a foreground data set, where the background image data set includes a plurality of background images and the foreground data set includes a plurality of foregrounds; optionally select a plurality of foregrounds from the foreground data set and take the selected foregrounds as target foregrounds; and optionally select a plurality of background images from the background image data set and take the selected background images as target background images.
Optionally, the image processing unit is further configured to perform rotation processing on the target foreground in the initial frame image to obtain a rotation-processed image; perform translation processing on the target foreground in the initial frame image to obtain a translation-processed image; adjust the brightness of the initial frame image to obtain a brightness-adjusted image; adjust the saturation of the initial frame image to obtain a saturation-adjusted image; and augment the training data using the rotation-processed image, the translation-processed image, the brightness-adjusted image, and the saturation-adjusted image.
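As an illustration of these four augmentations, a short sketch follows. The specific angle, shift and scaling factors, the BGR/HSV colour handling and the function name are assumptions; geometric transforms are applied to the binarization label as well so that image and label stay aligned.

```python
import numpy as np
import cv2

def augment(initial_frame, label, angle_deg=15, shift=(10, 5),
            brightness=1.2, saturation=0.8):
    """Return rotation-, translation-, brightness- and saturation-adjusted
    variants of one training sample (parameters are illustrative only)."""
    h, w = initial_frame.shape[:2]

    # Rotation about the image centre, applied to image and label alike.
    R = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    rotated = cv2.warpAffine(initial_frame, R, (w, h))
    rotated_label = cv2.warpAffine(label, R, (w, h), flags=cv2.INTER_NEAREST)

    # Translation, applied to image and label alike.
    T = np.float32([[1, 0, shift[0]], [0, 1, shift[1]]])
    shifted = cv2.warpAffine(initial_frame, T, (w, h))
    shifted_label = cv2.warpAffine(label, T, (w, h), flags=cv2.INTER_NEAREST)

    # Brightness adjustment: scale pixel values (label unchanged).
    brighter = np.clip(initial_frame.astype(np.float32) * brightness,
                       0, 255).astype(np.uint8)

    # Saturation adjustment: scale the S channel in HSV space (label unchanged).
    hsv = cv2.cvtColor(initial_frame, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] = np.clip(hsv[..., 1] * saturation, 0, 255)
    saturated = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

    return (rotated, rotated_label), (shifted, shifted_label), brighter, saturated
```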
Fig. 13 is a schematic structural diagram of an image segmentation apparatus provided in the present application. As shown in fig. 13, the image segmentation apparatus 1000 provided in this embodiment includes an acquiring unit 1001 and a second processing unit 1002.
The acquiring unit 1001 is configured to acquire an image to be detected. The second processing unit 1002 is configured to extract a background image from the image to be detected. The second processing unit 1002 is further configured to input the image to be detected and the background image into an image segmentation model, respectively, to obtain a target segmentation image output by the image segmentation model, where the image segmentation model is a model obtained by training according to the above method.
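A minimal sketch of this inference flow is given below. How the background image is extracted from the image to be detected is not specified in this passage, so a per-pixel temporal median over recent frames is used purely as a stand-in assumption; the model argument is any trained two-input segmentation network, for example the stand-in sketched earlier. The function name and parameters are likewise assumptions.

```python
import numpy as np
import torch

def segment(model, frame, recent_frames, device="cpu"):
    """Run a trained two-input segmentation model on one frame.

    frame: H x W x 3 uint8 image to be detected; recent_frames: list of
    previous H x W x 3 frames used only to build a stand-in background.
    """
    # Stand-in background extraction: per-pixel temporal median (assumption).
    background = np.median(np.stack(recent_frames), axis=0).astype(np.uint8)

    def to_tensor(img):
        # H x W x 3 uint8 -> 1 x 3 x H x W float tensor in [0, 1]
        t = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float() / 255.0
        return t.to(device)

    model.eval()
    with torch.no_grad():
        logits = model(to_tensor(frame), to_tensor(background))
        # Threshold the predicted probabilities to get the target segmentation.
        target_mask = (torch.sigmoid(logits) > 0.5).squeeze().cpu().numpy()
    return background, target_mask
```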
Fig. 14 is a first block diagram of an electronic device for implementing the image segmentation model training method or the image segmentation method according to an embodiment of the present application. As shown in fig. 14, the electronic device 1100 includes: a memory 1101, a processor 1102, and a transceiver 1103.
The processor 1102, the memory 1101, and the transceiver 1103 are electrically interconnected;
a transceiver 1103 for transceiving data;
the memory 1101 stores computer-executable instructions;
the processor 1102 executes the computer-executable instructions stored in the memory 1101, causing the processor 1102 to perform the method provided by any of the embodiments described above.
Fig. 15 is a second block diagram of an electronic device for implementing the image segmentation model training method or the image segmentation method according to the embodiment of the present application. As shown in fig. 15, the electronic device may be a computer, a digital broadcast terminal, a messaging device, a tablet device, a personal digital assistant, a server cluster, or the like.
The electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as a display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the electronic device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer-readable storage medium is also provided, in which computer-executable instructions are stored, the computer-executable instructions being executed by a processor to perform the method in any one of the above-mentioned embodiments.
In an exemplary embodiment, a computer program product is also provided, comprising a computer program for execution by a processor of the method in any of the above embodiments.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (13)

1. A method for training an image segmentation model, the method comprising:
setting a target foreground in a target background image to obtain an initial frame image, and obtaining a binarization label of the initial frame image according to the target foreground and the target background image;
superposing a plurality of frames of target foregrounds in the target background image according to motion information of the target foregrounds in the initial frame image to obtain a motion blurred image, wherein the motion information comprises a motion direction and a motion translation matrix;
inputting the initial frame image and the motion blurred image as training data to an image segmentation model, and obtaining a prediction segmentation image output by the image segmentation model;
and training the image segmentation model according to the predicted segmentation image and the binarization label to obtain a trained image segmentation model.
2. The method of claim 1, wherein the setting a target foreground in a target background image to obtain an initial frame image comprises:
determining a first distance from a left boundary of a target foreground to a left boundary of a target background image according to the length of the target background image and the length of the target foreground;
determining a second distance from the upper boundary of the target foreground to the upper boundary of the target background image according to the height of the target background image and the height of the target foreground;
determining the target position of the target foreground in the target background image according to the first distance and the second distance;
and setting the target foreground at the target position to obtain an initial frame image.
3. The method of claim 2, wherein determining the target position of the target foreground in the target background image based on the first distance and the second distance comprises:
optionally selecting a first position from positions whose distance to the left boundary of the target background image is smaller than the first distance;
optionally selecting a second position from positions whose distance to the upper boundary of the target background image is smaller than the second distance;
and calculating the target position of the target foreground central point in the target background image according to the first position, the second position and the size of the target foreground.
4. The method of claim 1, wherein the superposing a plurality of frames of target foregrounds in the target background image according to the motion information of the target foregrounds in the initial frame image to obtain a motion blurred image comprises:
determining a motion direction of the target foreground of the ith frame in the target background image according to a motion direction of the target foreground of the (i-1)th frame in the target background image and a preset offset angle, wherein i ∈ [1, N-1], and the motion direction of the target foreground of the 0th frame in the target background image is determined according to the motion direction of the target foreground in the initial frame image;
determining a motion vector of the target foreground of the ith frame in the target background image relative to the target foreground of the (i-1) th frame according to a preset inter-frame motion speed and the motion direction of the target foreground of the ith frame in the target background image;
determining a candidate translation matrix according to the motion vector, and determining a target motion translation matrix corresponding to the target foreground of the i-th frame according to the candidate translation matrix and a previous motion translation matrix corresponding to the target foreground of the i-1 th frame, wherein the motion translation matrix corresponding to the target foreground of the 0 th frame is determined according to the motion direction of the target foreground in the initial frame image;
and superposing each frame of target foreground in the target background image according to the motion direction of each frame of target foreground in the target background image and the corresponding target motion translation matrix to obtain a motion blurred image.
5. The method according to claim 1, wherein the image segmentation model is a GNet model, and the inputting the initial frame image and the motion-blurred image as training data into the image segmentation model and obtaining a predicted segmentation image output by the image segmentation model comprises:
respectively inputting the initial frame image and the motion blurred image into a ResNet layer of the GNet model, obtaining a first high-resolution feature image and a first low-resolution feature image corresponding to the initial frame image and a second high-resolution feature image and a second low-resolution feature image corresponding to the motion blurred image, and obtaining a first difference feature image according to the first high-resolution feature image and the second high-resolution feature image;
inputting the first low-resolution feature image and the second low-resolution feature image into an encoder of the GNet model, obtaining a first multi-scale feature image corresponding to the first low-resolution feature image and a second multi-scale feature image corresponding to the second low-resolution feature image, and obtaining a second difference feature image according to the first multi-scale feature image and the second multi-scale feature image;
and inputting the first difference feature image and the second difference feature image into a decoder of the GNet model for fusion processing, and outputting a prediction segmentation image.
6. The method of claim 1, wherein before the setting a target foreground in a target background image to obtain an initial frame image, the method further comprises:
acquiring a background image data set and a foreground data set, wherein the background image data set comprises a plurality of background images, and the foreground data set comprises a plurality of foregrounds;
optionally selecting a plurality of foregrounds from the foreground data set, and taking the selected foregrounds as target foregrounds, and optionally selecting a plurality of background images from the background image data set, and taking the selected background images as target background images.
7. The method of claim 1, wherein before the inputting the initial frame image and the motion blurred image as training data to an image segmentation model and obtaining a prediction segmentation image output by the image segmentation model, the method further comprises:
rotating the target foreground in the initial frame image to obtain a rotated image;
performing translation processing on a target foreground in the initial frame image to obtain a translation processed image;
adjusting the brightness of the initial frame image to obtain an image with adjusted brightness;
adjusting the saturation of the initial frame image to obtain a saturation-adjusted image;
the training data is augmented with the rotated image, the translation processed image, the adjusted brightness image, and the saturation-adjusted image.
8. A method of image segmentation, the method comprising:
acquiring an image to be detected;
extracting a background image from the image to be detected;
respectively inputting the image to be detected and the background image into an image segmentation model to obtain a target segmentation image output by the image segmentation model, wherein the image segmentation model is a model obtained by training according to the method of any one of claims 1 to 7.
9. An apparatus for training an image segmentation model, the apparatus comprising:
the image processing unit is configured to set a target foreground in a target background image to obtain an initial frame image, and obtain a binarization label of the initial frame image according to the target foreground and the target background image;
the image processing unit is further configured to superimpose the multiple frames of target foregrounds in the target background image according to motion information of the target foregrounds in the initial frame image to obtain a motion blurred image, wherein the motion information includes a motion direction and a motion translation matrix;
a first processing unit, configured to input the initial frame image and the motion-blurred image as training data to an image segmentation model, and obtain a predicted segmentation image output by the image segmentation model;
and the first processing unit is further used for training the image segmentation model according to the predicted segmentation image and the binarization label to obtain a trained image segmentation model.
10. An image segmentation apparatus, characterized in that the apparatus comprises:
the acquisition unit is used for acquiring an image to be detected;
the second processing unit is used for extracting a background image from the image to be detected;
the second processing unit is further configured to input the image to be detected and the background image into an image segmentation model respectively, so as to obtain a target segmentation image output by the image segmentation model, where the image segmentation model is a model trained by the method according to any one of claims 1 to 7.
11. An electronic device, comprising: a processor, a memory and a transceiver;
the processor, the memory and the transceiver are electrically interconnected;
a transceiver for transceiving data;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory, causing the processor to perform the method of any of claims 1 to 7 or 8.
12. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed by a processor, are configured to implement the method of any one of claims 1 to 7 or 8.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7 or 8.
CN202310078429.1A 2023-02-08 2023-02-08 Training method, device, equipment, storage medium and product of image segmentation model Active CN115861739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310078429.1A CN115861739B (en) 2023-02-08 2023-02-08 Training method, device, equipment, storage medium and product of image segmentation model

Publications (2)

Publication Number Publication Date
CN115861739A true CN115861739A (en) 2023-03-28
CN115861739B CN115861739B (en) 2023-07-14

Family

ID=85657720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310078429.1A Active CN115861739B (en) 2023-02-08 2023-02-08 Training method, device, equipment, storage medium and product of image segmentation model

Country Status (1)

Country Link
CN (1) CN115861739B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614983A (en) * 2018-10-26 2019-04-12 阿里巴巴集团控股有限公司 The generation method of training data, apparatus and system
CN110232696A (en) * 2019-06-20 2019-09-13 腾讯科技(深圳)有限公司 A kind of method of image region segmentation, the method and device of model training
JP2020064364A (en) * 2018-10-15 2020-04-23 オムロン株式会社 Learning device, image generating device, learning method, and learning program
CN112541867A (en) * 2020-12-04 2021-03-23 Oppo(重庆)智能科技有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN114511041A (en) * 2022-04-01 2022-05-17 北京世纪好未来教育科技有限公司 Model training method, image processing method, device, equipment and storage medium
CN114723646A (en) * 2022-02-25 2022-07-08 北京育达东方软件科技有限公司 Image data generation method with label, device, storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QING Chen; YU Jing; XIAO Chuangbai; DUAN Juan: "Research Progress on Image Semantic Segmentation Based on Deep Convolutional Neural Networks", Journal of Image and Graphics *

Also Published As

Publication number Publication date
CN115861739B (en) 2023-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant