CN111915636B - Method and device for positioning and dividing waste targets

Publication number: CN111915636B
Authority: CN (China)

Prior art keywords: segmentation, network, image, training, segmentation network
Legal status: Active
Application number: CN202010637308.2A
Other languages: Chinese (zh)
Other versions: CN111915636A
Inventors: 汪涛 (Wang Tao), 蔡远征 (Cai Yuanzheng), 温正垚 (Wen Zhengyao)
Assignee (current and original): Minjiang University
Priority: CN202010637308.2A, priority date 2020-07-03
Application filed by Minjiang University: 2020-07-03
Publication of CN111915636A: 2020-11-10
Application granted and published as CN111915636B: 2023-10-24


Classifications

    • G06T 7/143: Image analysis; Segmentation; Edge detection involving probabilistic approaches, e.g. Markov random field (MRF) modelling
    • G06F 18/241: Pattern recognition; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06T 7/50: Image analysis; Depth or shape recovery
    • G06T 2207/10004: Indexing scheme for image analysis; Image acquisition modality; Still image; Photographic image

Abstract

The invention provides a multi-level technical scheme for positioning and segmenting garbage and waste targets, which addresses the problems currently faced in garbage and waste positioning and segmentation tasks by combining a scene-level coarse segmentation network with a target-level fine segmentation network. The method specifically comprises the following steps: acquiring an image meeting a preset standard; processing the image successively with the coarse segmentation network and the fine segmentation network; and optimizing the segmentation results with a CRF model to obtain a semantic segmentation result. The invention also provides a corresponding device for positioning and segmenting waste targets. By combining the relation between the global scene and local targets, the method constructs a multi-stage network with stronger robustness and solves the problem of extreme size differences in garbage and waste target positioning and segmentation tasks.

Description

Method and device for positioning and dividing waste targets
Technical Field
The invention relates to the field of computer software, in particular to a method and a device for positioning and segmenting a waste target.
Background
With the continuous advance of urbanization worldwide, waste management has become a central concern for government departments in many countries. According to a World Bank report, the urban population will reach 4.3 billion by 2025, producing 2.2 billion tons of solid waste annually. Garbage classification not only turns waste into a resource more effectively, but also reduces the air pollution generated by landfill and incineration, and has therefore been strongly promoted by governments. However, the strict requirements of garbage classification make it difficult to popularize.
In recent years, the rapid development of computer vision technology has enabled researchers to develop robust image algorithms that assist in garbage collection and classification tasks. In particular, applying semantic segmentation algorithms to garbage images can improve the performance of related tasks such as content-based image retrieval, target pose estimation and robotic-arm grasping, and enables automatic classification of garbage and waste.
However, because the imaging size of garbage and waste targets varies extremely across image regions, general semantic segmentation models are prone to missed or false detections in this segmentation task and cannot meet the requirements of practical application scenarios.
Disclosure of Invention
Therefore, the inventor considers it necessary to devise a multi-level method for positioning and segmenting garbage and waste targets. From a bionic point of view, the inventor observes that the human visual system easily identifies garbage and waste targets with extreme size differences. Research indicates that the human eye first perceives the whole scene to obtain the position information of potential garbage and waste targets, and then carries out more detailed detection and identification of the potential areas. Accordingly, drawing on this working principle of the human visual system, the inventor devised a multi-level garbage and waste target positioning and segmentation method that addresses the problems currently faced in such tasks by combining a scene-level coarse segmentation network with a target-level fine segmentation network.
To this end, the inventors propose a method for targeting and segmenting waste, comprising the steps of:
acquiring an image meeting preset standards, wherein the preset standards comprise: the image comprises a color image;
processing the image by a first segmentation network to generate a first segmentation result and a potential target area;
performing segmentation operation on the potential target area by using a second segmentation network to generate a second segmentation result;
optimizing the first segmentation result and the second segmentation result by using a CRF model to obtain a semantic segmentation result;
the first segmentation network is a rough segmentation network of a scene level, the second segmentation network is a fine segmentation network of a target level, the first segmentation network, the second segmentation network and the CRF model are all obtained through pre-training, and the training is performed based on a garbage and waste training data set with contour labels.
Further, in the method for positioning and segmenting the waste target, the garbage and waste training data set with contour labels comes from a public database or from a non-public database whose edge contours are marked manually; a training set and a test set are determined from the training data set, and the semantic segmentation algorithm to be used is determined, the candidates including FCN, DeepLabv3, PSPNet or CCNet.
Further, in the method for positioning and segmenting the waste target, the semantic segmentation algorithm is DeepLabv3, and the DeepLabv3 model is fine-tuned with more than a preset number of training data set pictures to obtain the first segmentation network.
Further, in the method for positioning and segmenting the waste target, the step of training to obtain the second segmentation network comprises the following steps:
model prediction is carried out on the training data set based on the first segmentation network, so that a first segmentation result of the potential target is obtained;
generating and intercepting an image area of a specific target by using a connected area analysis algorithm, and acquiring a training set of a second segmentation network;
and fine-tuning the DeepLabv3 model with more than a preset number of training set pictures acquired for the second segmentation network to obtain the second segmentation network.
Further, in the method for positioning and segmenting the waste object, the step of acquiring the image meeting the preset standard specifically includes: the image type is RGBD or RGB; the image comprises the information of a color image I, and the pixel point with coordinates (i, j) in the color image I is marked with a semantic label x_ij.
The step of processing the image with the first segmentation network to generate a first segmentation result and a potential target region specifically includes:
the first segmentation network F_c outputs the feature C^0 = F_c(I, R_0), where R_0 represents the pixel information of the entire image area;
for all (i, j) ∈ R_0, the feature C^0_{i,j} is scaled by the Softmax function to obtain the probability value P^c_{i,j} of the current pixel point with respect to the category to which it belongs:
P^c_{i,j} = Σ_{k'∈Δ} δ(x_ij = k') · exp(C^0_{i,j,k'}) / Σ_{k''∈Δ} exp(C^0_{i,j,k''}),
where δ(x_ij = k') takes the value 1 if and only if x_ij = k', and 0 otherwise;
based on the probability value information of the pixel points obtained above, the first segmentation result is obtained by x̂_ij = argmax_{x_ij∈Δ} P^c_{i,j}, and a connected region analysis algorithm is used to generate the potential target regions.
Further, in the method for positioning and segmenting the waste object, the step of obtaining the first segmentation result by x̂_ij = argmax_{x_ij∈Δ} P^c_{i,j} and generating the potential target regions with a connected region analysis algorithm further comprises:
for the l-th target region R̃_l among the L potential target regions, marking the tight bounding box B_l corresponding to it, and expanding B_l outward by 20%-40% of its size to obtain R_l, which serves as the parameter for acquiring the input feature of the second segmentation network.
Further, in the method for positioning and segmenting the waste target, the step of performing the segmentation operation on the potential target area with the second segmentation network to generate a second segmentation result specifically includes: cropping the image within R_l from image I as the input feature of the second segmentation network F_f, whose output feature is C^l = F_f(I, R_l), where R_l represents the pixel information in the bounding box corresponding to the l-th connected region and C^l is a feature of size H_l*W_l*C;
for all (i, j) ∈ R_l, the feature is scaled by the Softmax function to obtain the probability value P^l_{i,j} of the current pixel point with respect to the category to which it belongs:
P^l_{i,j} = Σ_{k'∈Δ} δ(x_ij = k') · exp(C^l_{τ_h(i),τ_w(j),k'}) / Σ_{k''∈Δ} exp(C^l_{τ_h(i),τ_w(j),k''}),
where δ(x_ij = k') takes the value 1 if and only if x_ij = k', and 0 otherwise; 1 ≤ l ≤ L; and τ_h, τ_w denote the mapping of the image coordinates to the corresponding locations within the l-th target region.
Further, in the method for positioning and segmenting the waste object, the expression of the CRF model is:
E(x, I, D) = Φ_c(x; I) + α·Φ_f(x; I) + Ψ(x; I, D), where Φ_c(x; I) represents the single-point potential energy generated by the first segmentation network, Φ_f(x; I) represents the single-point potential energy generated by the second segmentation network, Ψ(x; I, D) represents the pairwise potential energy generated after integrating the classification information of the image, the classification information including color, depth or spatial position relations, and α is a weight parameter.
The inventor also provides a waste target positioning and dividing device which comprises an image input unit, a dividing unit and a training unit;
the image input unit is used for acquiring an image meeting preset standards, and the preset standards comprise: the image comprises a color image;
the segmentation unit is used for processing the image through a first segmentation network to generate a first segmentation result and a potential target area;
the segmentation unit is further used for carrying out segmentation operation on the potential target area by using a second segmentation network to generate a second segmentation result;
the segmentation unit is also used for optimizing the first segmentation result and the second segmentation result by using a CRF model to obtain a semantic segmentation result;
the first segmentation network is a rough segmentation network of a scene level, the second segmentation network is a fine segmentation network of a target level, the first segmentation network, the second segmentation network and the CRF model are all obtained through pre-training of a training unit, and the training is performed based on a garbage waste training data set with contour labeling.
Further, in the device for positioning and segmenting the waste object, the garbage and waste training data set with contour labels comes from a public database or from a non-public database whose edge contours are marked manually; a training set and a test set are determined from the training data set, and the semantic segmentation algorithm to be used is determined, the candidates including FCN, DeepLabv3, PSPNet or CCNet.
Further, in the device for positioning and segmenting the waste target, the semantic segmentation algorithm is DeepLabv3, and the training unit fine-tunes the DeepLabv3 model with more than a preset number of training data set pictures to obtain the first segmentation network.
Further, in the apparatus for positioning and dividing a waste object, the training means trains to obtain a second dividing network specifically includes:
model prediction is carried out on the training data set based on the first segmentation network, so that a first segmentation result of the potential target is obtained;
generating and intercepting an image area of a specific target by using a connected area analysis algorithm, and acquiring a training set of a second segmentation network;
and fine-tuning the DeepLabv3 model with more than a preset number of training set pictures acquired for the second segmentation network to obtain the second segmentation network.
Further, in the apparatus for positioning and segmenting the waste object, the image input unit "acquiring an image meeting a preset standard" specifically includes: the image type is RGBD or RGB; the image comprises the information of a color image I, and the pixel point with coordinates (i, j) in the color image I is marked with a semantic label x_ij.
The segmentation unit "processes the image with a first segmentation network to generate a first segmentation result and a potential target region" specifically includes:
the first segmentation network F_c outputs the feature C^0 = F_c(I, R_0), where R_0 represents the pixel information of the entire image area;
for all (i, j) ∈ R_0, the feature C^0_{i,j} is scaled by the Softmax function to obtain the probability value P^c_{i,j} of the current pixel point with respect to the category to which it belongs:
P^c_{i,j} = Σ_{k'∈Δ} δ(x_ij = k') · exp(C^0_{i,j,k'}) / Σ_{k''∈Δ} exp(C^0_{i,j,k''}),
where δ(x_ij = k') takes the value 1 if and only if x_ij = k', and 0 otherwise;
based on the probability value information of the pixel points obtained above, the first segmentation result is obtained by x̂_ij = argmax_{x_ij∈Δ} P^c_{i,j}, and a connected region analysis algorithm is used to generate the potential target regions.
Further, in the apparatus for positioning and segmenting the waste object, the segmentation unit obtaining the first segmentation result by x̂_ij = argmax_{x_ij∈Δ} P^c_{i,j} and generating the potential target regions with a connected region analysis algorithm further comprises:
for the l-th target region R̃_l among the L potential target regions, marking the tight bounding box B_l corresponding to it, and expanding B_l outward by 20%-40% of its size to obtain R_l, which serves as the parameter for acquiring the input feature of the second segmentation network.
Further, in the apparatus for positioning and segmenting the waste target, the segmentation unit performing the segmentation operation on the potential target area with the second segmentation network to generate a second segmentation result specifically includes: cropping the image within R_l from image I as the input feature of the second segmentation network F_f, whose output feature is C^l = F_f(I, R_l), where R_l represents the pixel information in the bounding box corresponding to the l-th connected region and C^l is a feature of size H_l*W_l*C;
for all (i, j) ∈ R_l, the feature is scaled by the Softmax function to obtain the probability value P^l_{i,j} of the current pixel point with respect to the category to which it belongs:
P^l_{i,j} = Σ_{k'∈Δ} δ(x_ij = k') · exp(C^l_{τ_h(i),τ_w(j),k'}) / Σ_{k''∈Δ} exp(C^l_{τ_h(i),τ_w(j),k''}),
where δ(x_ij = k') takes the value 1 if and only if x_ij = k', and 0 otherwise; 1 ≤ l ≤ L; and τ_h, τ_w denote the mapping of the image coordinates to the corresponding locations within the l-th target region.
Further, in the apparatus for positioning and segmenting waste targets, the expression of the CRF model is:
E(x, I, D) = Φ_c(x; I) + α·Φ_f(x; I) + Ψ(x; I, D), where Φ_c(x; I) represents the single-point potential energy generated by the first segmentation network, Φ_f(x; I) represents the single-point potential energy generated by the second segmentation network, Ψ(x; I, D) represents the pairwise potential energy generated after integrating the classification information of the image, the classification information including color, depth or spatial position relations, and α is a weight parameter.
According to the technical scheme, a multi-level network with stronger robustness is constructed by combining the relation between the global scene and the local target, so that the problem of extreme size difference in the garbage waste target positioning and dividing tasks is solved. The technical scheme of the invention particularly provides a rough segmentation network for perceiving the scene level of a potential target in an image and a fine segmentation network for precisely analyzing the target level of target information in a local image, and also provides a Conditional Random Field (CRF) model suitable for garbage and waste positioning so as to mine potential association information among pixels of the image. In addition, the invention further provides a method for further improving the positioning effect of the model by utilizing the depth information of the image.
Drawings
FIG. 1 is a flow chart of a method for targeting and segmentation of waste according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a waste target positioning and dividing apparatus according to an embodiment of the invention.
Reference numerals illustrate:
1-image input unit
2-splitting unit
3-training unit
Detailed Description
In order to describe the technical content, structural features, achieved objects and effects of the technical solution in detail, the following description is given with reference to specific embodiments and the accompanying drawings.
Referring to fig. 1, a flowchart of a method for positioning and dividing a waste object according to an embodiment of the invention is shown; the method comprises the following steps:
s0, pre-training a first segmentation network, a second segmentation network and a CRF model which are needed to be used subsequently;
s1, acquiring an image meeting preset standards, wherein the preset standards comprise: the image comprises a color image;
s2, processing the image by using a first segmentation network to generate a first segmentation result and a potential target area;
s3, performing segmentation operation on the potential target area by using a second segmentation network to generate a second segmentation result;
and S4, performing optimization processing on the first segmentation result and the second segmentation result by using a CRF model to obtain a semantic segmentation result.
The individual steps are described in detail below.
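Before turning to the details, the following minimal Python sketch shows how the steps fit together end to end. It is an illustration only, not the patent's implementation: it relies on the helper functions coarse_to_regions, fine_segment, pairwise_energy and mean_field defined in the sketches accompanying the individual steps below, and all of those names are invented for illustration.

```python
import torch
import torch.nn.functional as F

def locate_and_segment(coarse_net, fine_net, image, depth=None, device="cpu"):
    """End-to-end sketch of steps S1-S4. `image` is a (3, H, W) float tensor;
    `depth` would only enter the CRF stage (step S4). Helper functions are
    defined in the per-step sketches below and are illustrative names."""
    # S2: scene-level coarse segmentation over the whole image I
    coarse_net.eval().to(device)
    with torch.no_grad():
        logits = coarse_net(image.unsqueeze(0).to(device))["out"]  # (1, C, H, W)
        coarse_prob = F.softmax(logits, dim=1)[0].permute(1, 2, 0).cpu().numpy()
    coarse_labels, regions = coarse_to_regions(coarse_prob)   # expanded boxes R_l
    # S3: target-level fine segmentation inside every potential region
    fine_results = [fine_segment(fine_net, image, box, device) for box in regions]
    # S4: the CRF model would fuse coarse_prob, fine_results and depth
    # (see the pairwise_energy / mean_field sketches); omitted here for brevity.
    return coarse_labels, regions, fine_results
```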
In step S0, the training is performed on a garbage and waste training data set with contour labels. The data set comes from a public database or from a non-public database whose edge contours are marked manually; a training set and a test set are determined from it, and the semantic segmentation algorithm to be used is chosen. In this embodiment the semantic segmentation algorithm is DeepLabv3; in other embodiments, FCN, PSPNet or CCNet may also be used. The first segmentation network is a scene-level coarse segmentation network, and the second segmentation network is a target-level fine segmentation network.
The first segmentation network is obtained by fine-tuning (fine-tuning) a DeepLabv3 model on a certain number of training data set pictures. Since the quality of the training result is related to the number of pictures in the training data set, the number of pictures should be greater than or equal to a corresponding preset value to guarantee a result of acceptable quality.
Then, model prediction is performed on the training data set with the first segmentation network to obtain the first segmentation results of the potential targets; a connected region analysis algorithm is used to generate and crop the image regions of specific targets, yielding the training set of the second segmentation network; finally, the DeepLabv3 model is fine-tuned with more than a preset number of these training set pictures to obtain the second segmentation network.
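As a concrete illustration of this fine-tuning procedure, the sketch below uses the torchvision implementation of DeepLabv3; the ResNet-50 backbone, the learning rate and the optimizer are assumptions not fixed by the patent, and the cross-entropy objective follows the training description given later.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

def build_finetune_model(num_classes: int, lr: float = 1e-4):
    """Build a DeepLabv3 model for fine-tuning on the waste data set.
    The pretrained backbone is kept; only the final head is replaced."""
    model = deeplabv3_resnet50(weights="DEFAULT")         # pretrained weights
    model.classifier[4] = nn.Conv2d(256, num_classes, 1)  # new C-way head
    criterion = nn.CrossEntropyLoss()                     # CE Loss, as in the text
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    return model, criterion, optimizer

def train_one_epoch(model, criterion, optimizer, loader, device="cpu"):
    """One pass over (image, mask) pairs; masks hold per-pixel labels x_ij."""
    model.train().to(device)
    for images, masks in loader:                  # images: (N,3,H,W), masks: (N,H,W)
        logits = model(images.to(device))["out"]  # (N, num_classes, H, W)
        loss = criterion(logits, masks.to(device).long())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The same routine serves for both networks: the coarse network is fine-tuned on whole images, the fine network on the cropped target regions produced by the coarse stage.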
In step S1, an "image meeting the preset standard" is input. The preset standard is that the image must contain a color image I, and a depth image D may also be present; that is, the image is of RGBD type or is an RGB image. Suppose the width of image I is W and its height is H; all pixel coordinates in the image form the set R_0 = {(i, j) | i ∈ {1, ..., H}, j ∈ {1, ..., W}}, and the semantic label set is Δ = {1, 2, ..., C}, where the semantic label corresponding to the pixel at any position (i, j) in image I is x_ij, with x_ij ∈ Δ.
The first segmentation network described in step S2 is a scene-level coarse segmentation network, denoted here as F_c. Its main function is to acquire the position information of potential targets in the input image I; the network emphasizes capturing the correct targets and their rough locations. The output feature of the network is C^0 = F_c(I, R_0), where R_0 represents the pixel information of the entire image area, and C^0 is of size H*W*C.
For all (i, j) ∈ R_0, the feature C^0_{i,j} is scaled by the Softmax function to obtain the probability value P^c_{i,j} of the current pixel point with respect to the category to which it belongs:
P^c_{i,j} = Σ_{k'∈Δ} δ(x_ij = k') · exp(C^0_{i,j,k'}) / Σ_{k''∈Δ} exp(C^0_{i,j,k''}),
where δ(x_ij = k') takes the value 1 if and only if x_ij = k', and 0 otherwise.
Based on the probability value information of the pixel points obtained above, the first segmentation result (a coarse segmentation result) is obtained by x̂_ij = argmax_{x_ij∈Δ} P^c_{i,j}, and a connected region analysis algorithm (Connected Component Analysis) is used to generate several potential target regions. Suppose L target regions are generated; for the l-th target region R̃_l, the tight bounding box B_l corresponding to it is marked, and B_l is expanded outward by a certain proportion (20%-40%; 30% is adopted in this embodiment) to obtain R_l.
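A minimal sketch of this region-generation step follows, using scipy's connected component labelling as a stand-in for the Connected Component Analysis named above. Treating class 0 as background and splitting the 30% expansion evenly between the two sides of each axis are assumptions made for illustration.

```python
import numpy as np
from scipy import ndimage

def coarse_to_regions(prob: np.ndarray, expand: float = 0.30, background: int = 0):
    """From coarse class probabilities P^c of shape (H, W, C) to the
    expanded boxes R_l. Returns the label map x_ij and a list of boxes."""
    labels = prob.argmax(axis=-1)                    # x_ij = argmax over classes
    H, W = labels.shape
    comp, num = ndimage.label(labels != background)  # connected foreground regions
    boxes = []
    for sl in ndimage.find_objects(comp):            # tight bounding boxes B_l
        y0, y1 = sl[0].start, sl[0].stop
        x0, x1 = sl[1].start, sl[1].stop
        dy = int(round((y1 - y0) * expand / 2))      # half the expansion per side
        dx = int(round((x1 - x0) * expand / 2))
        boxes.append((max(0, y0 - dy), min(H, y1 + dy),   # expanded region R_l,
                      max(0, x0 - dx), min(W, x1 + dx)))  # clipped to the image
    return labels, boxes
```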
Then, in step S3, the image within the above bounding box R_l is cropped from image I as a new input feature. The second segmentation network proposed here is a target-level fine segmentation network, denoted F_f; it performs a detailed analysis of a specific target region and extracts information such as the contour of the target. The output feature of the network is C^l = F_f(I, R_l), where R_l represents the pixel information in the bounding box corresponding to the l-th connected region, and C^l is of size H_l*W_l*C.
For all (i, j) ∈ R_l, the feature is passed through the Softmax function to obtain the probability value P^l_{i,j} of the current pixel point with respect to the category to which it belongs:
P^l_{i,j} = Σ_{k'∈Δ} δ(x_ij = k') · exp(C^l_{τ_h(i),τ_w(j),k'}) / Σ_{k''∈Δ} exp(C^l_{τ_h(i),τ_w(j),k''}),
where δ(x_ij = k') takes the value 1 if and only if x_ij = k', and 0 otherwise; 1 ≤ l ≤ L; and τ_h, τ_w denote the mapping of the image coordinates to the corresponding locations within the l-th target region.
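The cropping and fine-segmentation step can be sketched as follows; the fine network F_f is assumed to expose the same interface as the torchvision DeepLabv3 model used earlier, and the returned box records the correspondence (the role of τ_h and τ_w) between crop coordinates and image coordinates.

```python
import torch
import torch.nn.functional as F

def fine_segment(fine_net, image: torch.Tensor, box, device="cpu"):
    """Run F_f on the crop R_l of `image` (a (3, H, W) float tensor).
    `box` = (y0, y1, x0, x1) comes from coarse_to_regions. Returns the
    per-pixel probabilities P^l of shape (C, H_l, W_l) together with the
    box, so crop coordinates can be mapped back to image coordinates."""
    y0, y1, x0, x1 = box
    crop = image[:, y0:y1, x0:x1].unsqueeze(0).to(device)  # crop R_l from I
    fine_net.eval().to(device)
    with torch.no_grad():
        logits = fine_net(crop)["out"]            # (1, C, H_l, W_l)
        prob = F.softmax(logits, dim=1)[0]        # Softmax over the C classes
    return prob.cpu(), box
```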
In step S4, the conditional random field (CRF) model proposed by the invention for garbage and waste positioning is used. The model can be expressed as:
E(x, I, D) = Φ_c(x; I) + α·Φ_f(x; I) + Ψ(x; I, D), where Φ_c(x; I) represents the single-point potential energy generated by the first segmentation network, Φ_f(x; I) represents the single-point potential energy generated by the second segmentation network, Ψ(x; I, D) represents the pairwise potential energy generated after integrating information such as the color, depth or spatial position relations of the image, and α is a weight parameter.
Φ_c(x; I) can be further expressed as
Φ_c(x; I) = -Σ_{(i,j)∈R_0} log P^c_{i,j},
and Φ_f(x; I) as
Φ_f(x; I) = -Σ_{l=1}^{L} Σ_{(i,j)∈R_l} log P^l_{i,j}.
Furthermore, Ψ(x; I, D) can be expressed as
Ψ(x; I, D) = Σ_{(i,j)≠(u,v)} δ(x_ij ≠ x_uv) · [ w^(a) exp(-‖p_ij - p_uv‖^2/(2θ_α^2) - ‖I_ij - I_uv‖^2/(2θ_β^2)) + w^(s) exp(-‖p_ij - p_uv‖^2/(2θ_γ^2)) + ψ_d(x_ij, x_uv; D) ],
with ψ_d(x_ij, x_uv; D) = w^(d) exp(-‖p_ij - p_uv‖^2/(2θ_δ^2) - (D_ij - D_uv)^2/(2θ_ε^2)),
where p_ij denotes the spatial position of pixel (i, j); δ(x_ij ≠ x_uv) takes the value 1 if and only if x_ij ≠ x_uv, and 0 otherwise; w^(a), w^(s) and w^(d) are the weights of the kernel functions of the corresponding terms; θ_α, θ_β, θ_γ, θ_δ and θ_ε are the variance values of the corresponding terms; and ψ_d(x_ij, x_uv; D) is optional, i.e. it is used when a depth image D is present in the input, in which case its introduction can improve the performance of the model to a certain extent.
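The pairwise term as reconstructed above can be written out directly. The following NumPy sketch is an O(N^2) reference implementation for clarity only; a practical system would use an efficient dense-CRF approximation, and all kernel weights and variances here are placeholders to be set by the grid search described below.

```python
import numpy as np

def pairwise_energy(x, I, D=None, w_a=1.0, w_s=1.0, w_d=1.0,
                    th_a=3.0, th_b=10.0, th_g=3.0, th_d=3.0, th_e=10.0):
    """Psi(x; I, D) with appearance, smoothness and optional depth kernels.
    x: (H, W) labels; I: (H, W, 3) colors; D: optional (H, W) depth map."""
    H, W = x.shape
    pos = np.stack(np.mgrid[0:H, 0:W], axis=-1).reshape(-1, 2).astype(float)
    lab = x.reshape(-1)
    col = I.reshape(-1, 3).astype(float)
    dep = None if D is None else D.reshape(-1).astype(float)
    E = 0.0
    for p in range(lab.size):
        d2 = ((pos - pos[p]) ** 2).sum(-1)        # squared spatial distances
        c2 = ((col - col[p]) ** 2).sum(-1)        # squared color distances
        k = (w_a * np.exp(-d2 / (2 * th_a**2) - c2 / (2 * th_b**2))  # appearance
             + w_s * np.exp(-d2 / (2 * th_g**2)))                    # smoothness
        if dep is not None:                        # optional depth kernel psi_d
            k = k + w_d * np.exp(-d2 / (2 * th_d**2)
                                 - (dep - dep[p]) ** 2 / (2 * th_e**2))
        E += ((lab != lab[p]) * k).sum()           # delta(x_ij != x_uv) gate
    return E / 2.0                                 # every pair was counted twice
```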
In the model prediction process, a completely factorized probability distribution function Q(x) = Π_{(i,j)∈R_0} Q_{i,j}(x_ij) is used to approximate the original joint probability distribution P(x) by minimizing the K-L divergence KL(Q‖P).
in the present embodiment, model learning is performed by a block learning method, and the first segmentation network F is trained c With a second dividing network F f In (2) a standard cross entropy loss function (CELoss) is used as the objective function. In learning a Conditional Random Field (CRF) model, a grid search method is employed to optimize model parameters.
The inventor also provides a waste target positioning and dividing device which comprises an image input unit 1, a dividing unit 2 and a training unit 3;
the image input unit 1 is configured to acquire an image that meets preset criteria, where the preset criteria include: the image comprises a color image;
the segmentation unit 2 is configured to process the image with a first segmentation network to generate a first segmentation result and a potential target region;
the segmentation unit 2 is further configured to perform a segmentation operation on the potential target area by using a second segmentation network, so as to generate a second segmentation result;
the segmentation unit 2 is further used for optimizing the first segmentation result and the second segmentation result by using a CRF model to obtain a semantic segmentation result;
the first segmentation network is a rough segmentation network of a scene level, the second segmentation network is a fine segmentation network of a target level, the first segmentation network, the second segmentation network and the CRF model are all obtained through the pre-training of the training unit 3, and the training is performed based on a garbage waste training data set with contour labeling.
The training performed by the training unit 3 is based on a garbage and waste training data set with contour labels. The data set comes from a public database or from a non-public database whose edge contours are marked manually; a training set and a test set are determined from it, and the semantic segmentation algorithm to be used is chosen. In this embodiment the semantic segmentation algorithm is DeepLabv3; in other embodiments, FCN, PSPNet or CCNet may also be used. The first segmentation network is a scene-level coarse segmentation network, and the second segmentation network is a target-level fine segmentation network.
The training unit 3 obtains the first segmentation network by fine-tuning (fine-tuning) a DeepLabv3 model on a certain number of training data set pictures. Since the quality of the training result is related to the number of pictures in the training data set, the number of pictures should be greater than or equal to a corresponding preset value to obtain a result of acceptable quality.
Then, the training unit 3 performs model prediction on the training data set with the first segmentation network to obtain the first segmentation results of the potential targets; a connected region analysis algorithm is used to generate and crop the image regions of specific targets, yielding the training set of the second segmentation network; finally, the DeepLabv3 model is fine-tuned with more than a preset number of these training set pictures to obtain the second segmentation network.
The image input unit 1 is used to input an "image meeting the preset standard". The preset standard mainly requires an RGBD-type image or an RGB image; it must contain a color image, and a depth image D may also be present. When the depth image is present, it provides the subsequent processing with richer information that benefits the result. Suppose the width of image I is W and its height is H; all pixel coordinates in the image form the set R_0 = {(i, j) | i ∈ {1, ..., H}, j ∈ {1, ..., W}}, and the semantic label set is Δ = {1, 2, ..., C}, where the semantic label corresponding to the pixel at any position (i, j) in image I is x_ij, with x_ij ∈ Δ.
The first segmentation network employed by the segmentation unit 2 is a scene-level coarse segmentation network, denoted here as F_c. Its main function is to obtain the position information of potential targets in the input image I; the network emphasizes capturing the correct targets and their rough locations. The output feature of the network is C^0 = F_c(I, R_0), where R_0 represents the pixel information of the entire image area, and C^0 is of size H*W*C.
For all (i, j) ∈ R_0, the feature C^0_{i,j} is scaled by the Softmax function to obtain the probability value P^c_{i,j} of the current pixel point with respect to the category to which it belongs:
P^c_{i,j} = Σ_{k'∈Δ} δ(x_ij = k') · exp(C^0_{i,j,k'}) / Σ_{k''∈Δ} exp(C^0_{i,j,k''}),
where δ(x_ij = k') takes the value 1 if and only if x_ij = k', and 0 otherwise.
Based on the probability value information of all the pixel points, the first segmentation result (a coarse segmentation result) is obtained by x̂_ij = argmax_{x_ij∈Δ} P^c_{i,j}, and a connected region analysis algorithm (Connected Component Analysis) is used to generate several potential target regions. Suppose L target regions are generated; for the l-th target region R̃_l, the tight bounding box B_l corresponding to it is marked, and B_l is expanded outward by a certain proportion (20%-40%; 30% is adopted in this embodiment) to obtain R_l.
The segmentation unit 2 then crops the image within the above bounding box R_l from image I as a new input feature. The second segmentation network used by the segmentation unit 2 is a target-level fine segmentation network, denoted F_f; it performs a detailed analysis of a specific target region and extracts information such as the contour of the target. The output feature of the network is C^l = F_f(I, R_l), where R_l represents the pixel information in the bounding box corresponding to the l-th connected region, and C^l is of size H_l*W_l*C.
For all (i, j) ∈ R_l, the feature is passed through the Softmax function to obtain the probability value P^l_{i,j} of the current pixel point with respect to the category to which it belongs:
P^l_{i,j} = Σ_{k'∈Δ} δ(x_ij = k') · exp(C^l_{τ_h(i),τ_w(j),k'}) / Σ_{k''∈Δ} exp(C^l_{τ_h(i),τ_w(j),k''}),
where δ(x_ij = k') takes the value 1 if and only if x_ij = k', and 0 otherwise; 1 ≤ l ≤ L; and τ_h, τ_w denote the mapping of the image coordinates to the corresponding locations within the l-th target region.
The segmentation unit 2 then uses the conditional random field (CRF) model proposed by the invention for garbage and waste positioning, which can be expressed as:
E(x, I, D) = Φ_c(x; I) + α·Φ_f(x; I) + Ψ(x; I, D), where Φ_c(x; I) represents the single-point potential energy generated by the first segmentation network, Φ_f(x; I) represents the single-point potential energy generated by the second segmentation network, Ψ(x; I, D) represents the pairwise potential energy generated after integrating information such as the color, depth or spatial position relations of the image, and α is a weight parameter.
Φ_c(x; I) can be further expressed as
Φ_c(x; I) = -Σ_{(i,j)∈R_0} log P^c_{i,j},
and Φ_f(x; I) as
Φ_f(x; I) = -Σ_{l=1}^{L} Σ_{(i,j)∈R_l} log P^l_{i,j}.
Furthermore, Ψ(x; I, D) can be expressed as
Ψ(x; I, D) = Σ_{(i,j)≠(u,v)} δ(x_ij ≠ x_uv) · [ w^(a) exp(-‖p_ij - p_uv‖^2/(2θ_α^2) - ‖I_ij - I_uv‖^2/(2θ_β^2)) + w^(s) exp(-‖p_ij - p_uv‖^2/(2θ_γ^2)) + ψ_d(x_ij, x_uv; D) ],
with ψ_d(x_ij, x_uv; D) = w^(d) exp(-‖p_ij - p_uv‖^2/(2θ_δ^2) - (D_ij - D_uv)^2/(2θ_ε^2)),
where p_ij denotes the spatial position of pixel (i, j); δ(x_ij ≠ x_uv) takes the value 1 if and only if x_ij ≠ x_uv, and 0 otherwise; w^(a), w^(s) and w^(d) are the weights of the kernel functions of the corresponding terms; θ_α, θ_β, θ_γ, θ_δ and θ_ε are the variance values of the corresponding terms; and ψ_d(x_ij, x_uv; D) is optional, i.e. it is used when a depth image D is present in the input, in which case its introduction can improve the performance of the model to a certain extent.
In the model prediction process, a completely factorized probability distribution function Q(x) = Π_{(i,j)∈R_0} Q_{i,j}(x_ij) is used to approximate the original joint probability distribution P(x) by minimizing the K-L divergence KL(Q‖P).
in the present embodiment, model learning is performed by a block learning method, and the first segmentation network F is trained c With a second dividing network F f In (2) a standard cross entropy loss function (CELoss) is used as the objective function. In learning a Conditional Random Field (CRF) model, a grid search method is employed to optimize model parameters.
According to the technical scheme, a multi-level network with stronger robustness is constructed by combining the relation between the global scene and the local target, so that the problem of extreme size difference in the garbage waste target positioning and dividing tasks is solved. The technical scheme of the invention particularly provides a rough segmentation network for perceiving the scene level of a potential target in an image and a fine segmentation network for precisely analyzing the target level of target information in a local image, and also provides a Conditional Random Field (CRF) model suitable for garbage and waste positioning so as to mine potential association information among pixels of the image. In addition, the invention further provides a method for further improving the positioning effect of the model by utilizing the depth information of the image.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising" and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or terminal. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or terminal device comprising the element. Further, herein, "greater than", "less than", "exceeding" and the like are understood not to include the stated number; "above", "below", "within" and the like are understood to include it.
It will be appreciated by those skilled in the art that the various embodiments described above may be provided as methods, apparatus, or computer program products. These embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. All or part of the steps in the methods according to the above embodiments may be implemented by a program for instructing related hardware, and the program may be stored in a storage medium readable by a computer device, for performing all or part of the steps in the methods according to the above embodiments. The computer device includes, but is not limited to: personal computers, servers, general purpose computers, special purpose computers, network devices, embedded devices, programmable devices, intelligent mobile terminals, intelligent home devices, wearable intelligent devices, vehicle-mounted intelligent devices and the like; the storage medium includes, but is not limited to: RAM, ROM, magnetic disk, magnetic tape, optical disk, flash memory, usb disk, removable hard disk, memory card, memory stick, web server storage, web cloud storage, etc.
The embodiments described above are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a computer device to produce a machine, such that the instructions, which execute via the processor of the computer device, create means for implementing the functions specified in the flowchart block or blocks and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer device-readable memory that can direct a computer device to function in a particular manner, such that the instructions stored in the computer device-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer apparatus to cause a series of operational steps to be performed on the computer apparatus to produce a computer implemented process such that the instructions which execute on the computer apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the embodiments have been described above, other variations and modifications may occur to those skilled in the art based on the basic inventive concept. The foregoing description and drawings illustrate only embodiments of the invention and do not limit its scope; equivalent structures or equivalent processes derived from them, or their direct or indirect use in other related fields, likewise fall within the scope of the invention.

Claims (8)

1. A method for targeting and segmenting waste, comprising the steps of:
acquiring an image meeting preset standards, wherein the preset standards comprise: the image comprises a color image; the step of acquiring the image meeting the preset standard specifically comprises: the image type is RGBD or RGB; the image comprises the information of a color image I, and the pixel point with coordinates (i, j) in the color image I is marked with a semantic label x_ij;
Processing the image by a first segmentation network to generate a first segmentation result and a potential target area; the method specifically comprises the following steps:
the first segmentation network F_c outputs the feature C^0 = F_c(I, R_0), where R_0 represents the pixel information of the entire image area;
for all (i, j) ∈ R_0, the feature C^0_{i,j} is scaled by the Softmax function to obtain the probability value P^c_{i,j} of the current pixel point with respect to the category to which it belongs:
P^c_{i,j} = Σ_{k'∈Δ} δ(x_ij = k') · exp(C^0_{i,j,k'}) / Σ_{k''∈Δ} exp(C^0_{i,j,k''}),
where δ(x_ij = k') takes the value 1 if and only if x_ij = k', and 0 otherwise;
based on the probability value information of the pixel points obtained above, the first segmentation result is obtained by x̂_ij = argmax_{x_ij∈Δ} P^c_{i,j}, and a connected region analysis algorithm is used to generate the potential target regions; and, for the l-th target region R̃_l among the L generated potential target regions, the tight bounding box B_l corresponding to it is marked and expanded outward by 20%-40% of its size to obtain R_l, which serves as the parameter for acquiring the input feature of the second segmentation network;
performing a segmentation operation on the potential target area with a second segmentation network to generate a second segmentation result; wherein training to obtain the second segmentation network comprises: performing model prediction on the training data set with the first segmentation network to obtain the first segmentation results of the potential targets; generating and cropping the image regions of specific targets with a connected region analysis algorithm to acquire the training set of the second segmentation network; and fine-tuning the DeepLabv3 model with more than a preset number of training set pictures acquired for the second segmentation network to obtain the second segmentation network; the step of performing the segmentation operation on the potential target area with the second segmentation network to generate the second segmentation result specifically includes: cropping the image within R_l from image I as the input feature of the second segmentation network F_f, whose output feature is C^l = F_f(I, R_l), where R_l represents the pixel information in the bounding box corresponding to the l-th connected region and C^l is a feature of size H_l*W_l*C;
for all (i, j) ∈ R_l, the feature is scaled by the Softmax function to obtain the probability value P^l_{i,j} of the current pixel point with respect to the category to which it belongs:
P^l_{i,j} = Σ_{k'∈Δ} δ(x_ij = k') · exp(C^l_{τ_h(i),τ_w(j),k'}) / Σ_{k''∈Δ} exp(C^l_{τ_h(i),τ_w(j),k''}),
where δ(x_ij = k') takes the value 1 if and only if x_ij = k', and 0 otherwise; 1 ≤ l ≤ L; and τ_h, τ_w denote the mapping of the image coordinates to the corresponding locations within the l-th target region;
optimizing the first segmentation result and the second segmentation result by using a CRF model to obtain a semantic segmentation result;
the first segmentation network is a rough segmentation network of a scene level, the second segmentation network is a fine segmentation network of a target level, the first segmentation network, the second segmentation network and the CRF model are all obtained through pre-training, and the training is performed based on a garbage and waste training data set with contour labels.
2. The method of claim 1, wherein the garbage and waste training data set with contour labels comes from a public database or from a non-public database whose edge contours are marked manually, and wherein a training set and a test set are determined from the training data set and the semantic segmentation algorithm to be used is determined, the semantic segmentation algorithm comprising FCN, DeepLabv3, PSPNet or CCNet.
3. The method for positioning and segmenting waste targets according to claim 2, wherein the semantic segmentation algorithm is DeepLabv3, and the DeepLabv3 model is fine-tuned with more than a preset number of training data set pictures to obtain the first segmentation network.
4. The method for positioning and segmenting waste targets as set forth in claim 1, wherein the CRF model expression is:
E(x, I, D) = Φ_c(x; I) + α·Φ_f(x; I) + Ψ(x; I, D), where Φ_c(x; I) represents the single-point potential energy generated by the first segmentation network, Φ_f(x; I) represents the single-point potential energy generated by the second segmentation network, Ψ(x; I, D) represents the pairwise potential energy generated after integrating the classification information of the image, the classification information including color, depth or spatial position relations, and α is a weight parameter.
5. The device for positioning and dividing the waste targets is characterized by comprising an image input unit, a dividing unit and a training unit;
the image input unit is used for acquiring an image meeting preset standards, the preset standards comprising: the image comprises a color image; specifically: the image type is RGBD or RGB; the image comprises the information of a color image I, and the pixel point with coordinates (i, j) in the color image I is marked with a semantic label x_ij;
The segmentation unit is used for processing the image through a first segmentation network to generate a first segmentation result and a potential target area; the method specifically comprises the following steps:
the first segmentation network F_c outputs the feature C^0 = F_c(I, R_0), where R_0 represents the pixel information of the entire image area;
for all (i, j) ∈ R_0, the feature C^0_{i,j} is scaled by the Softmax function to obtain the probability value P^c_{i,j} of the current pixel point with respect to the category to which it belongs:
P^c_{i,j} = Σ_{k'∈Δ} δ(x_ij = k') · exp(C^0_{i,j,k'}) / Σ_{k''∈Δ} exp(C^0_{i,j,k''}),
where δ(x_ij = k') takes the value 1 if and only if x_ij = k', and 0 otherwise;
based on the probability value information of the pixel points obtained above, the first segmentation result is obtained by x̂_ij = argmax_{x_ij∈Δ} P^c_{i,j}, and a connected region analysis algorithm is used to generate the potential target regions; and, for the l-th target region R̃_l among the L generated potential target regions, the tight bounding box B_l corresponding to it is marked and expanded outward by 20%-40% of its size to obtain R_l, which serves as the parameter for acquiring the input feature of the second segmentation network;
the segmentation unit is further used for performing a segmentation operation on the potential target area with a second segmentation network to generate a second segmentation result; specifically: cropping the image within R_l from image I as the input feature of the second segmentation network F_f, whose output feature is C^l = F_f(I, R_l), where R_l represents the pixel information in the bounding box corresponding to the l-th connected region and C^l is a feature of size H_l*W_l*C;
for all (i, j) ∈ R_l, the feature is scaled by the Softmax function to obtain the probability value P^l_{i,j} of the current pixel point with respect to the category to which it belongs:
P^l_{i,j} = Σ_{k'∈Δ} δ(x_ij = k') · exp(C^l_{τ_h(i),τ_w(j),k'}) / Σ_{k''∈Δ} exp(C^l_{τ_h(i),τ_w(j),k''}),
where δ(x_ij = k') takes the value 1 if and only if x_ij = k', and 0 otherwise; 1 ≤ l ≤ L; and τ_h, τ_w denote the mapping of the image coordinates to the corresponding locations within the l-th target region;
the segmentation unit is also used for optimizing the first segmentation result and the second segmentation result by using a CRF model to obtain a semantic segmentation result;
the first segmentation network is a rough segmentation network of a scene level, the second segmentation network is a fine segmentation network of a target level, the first segmentation network, the second segmentation network and the CRF model are all obtained through the pre-training of a training unit, and the training is performed based on a garbage waste training data set with contour labeling;
the training unit trains to obtain the second segmentation network specifically in the following manner: performing model prediction on the training data set with the first segmentation network to obtain the first segmentation results of the potential targets; generating and cropping the image regions of specific targets with a connected region analysis algorithm to acquire the training set of the second segmentation network; and fine-tuning the DeepLabv3 model with more than a preset number of training set pictures acquired for the second segmentation network to obtain the second segmentation network.
6. The apparatus of claim 5, wherein the garbage and waste training data set with contour labels comes from a public database or from a non-public database whose edge contours are marked manually, and wherein a training set and a test set are determined from the training data set and the semantic segmentation algorithm to be used is determined, the semantic segmentation algorithm comprising FCN, DeepLabv3, PSPNet or CCNet.
7. The apparatus for positioning and segmenting waste targets as set forth in claim 6, wherein the semantic segmentation algorithm is DeepLabv3, and the training unit fine-tunes the DeepLabv3 model with more than a preset number of training data set pictures to obtain the first segmentation network.
8. The apparatus for positioning and segmenting waste targets as set forth in claim 5, wherein the CRF model expression is:
E(x, I, D) = Φ_c(x; I) + α·Φ_f(x; I) + Ψ(x; I, D), where Φ_c(x; I) represents the single-point potential energy generated by the first segmentation network, Φ_f(x; I) represents the single-point potential energy generated by the second segmentation network, Ψ(x; I, D) represents the pairwise potential energy generated after integrating the classification information of the image, the classification information including color, depth or spatial position relations, and α is a weight parameter.
CN202010637308.2A (priority date 2020-07-03, filed 2020-07-03): Method and device for positioning and dividing waste targets. Granted as CN111915636B (Active).

Priority Applications (1)

Application Number: CN202010637308.2A; Priority Date: 2020-07-03; Filing Date: 2020-07-03; Title: Method and device for positioning and dividing waste targets

Publications (2)

Publication Number: CN111915636A, published 2020-11-10
Publication Number: CN111915636B (grant), published 2023-10-24

Family

ID=73227509

Family Applications (1)

Application Number: CN202010637308.2A; Title: Method and device for positioning and dividing waste targets; Priority Date: 2020-07-03; Filing Date: 2020-07-03; Status: Active

Country Status (1)

Country: CN; Publication: CN111915636B

Citations (6)

* Cited by examiner, † Cited by third party

Publication number; Priority date; Publication date; Assignee; Title
CN106250874A *; 2016-08-16; 2016-12-21; 东方网力科技股份有限公司; Method and device for recognizing clothing and carried articles
CN107025457A *; 2017-03-29; 2017-08-08; 腾讯科技(深圳)有限公司; Image processing method and device
CN107527351A *; 2017-08-31; 2017-12-29; 华南农业大学; Lactating sow image segmentation method fusing FCN and threshold segmentation
CN108876796A *; 2018-06-08; 2018-11-23; 长安大学; Lane segmentation system and method based on a fully convolutional neural network and a conditional random field
CN109145713A *; 2018-07-02; 2019-01-04; 南京师范大学; Small-object semantic segmentation method combining object detection
CN109255790A *; 2018-07-27; 2019-01-22; 北京工业大学; Automatic image annotation method for weakly supervised semantic segmentation

Also Published As

Publication Number: CN111915636A; Publication Date: 2020-11-10


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant