CN113705371B - Water visual scene segmentation method and device - Google Patents

Water visual scene segmentation method and device

Info

Publication number
CN113705371B
CN113705371B (application CN202110914168.3A)
Authority
CN
China
Prior art keywords
live
network
semantic
image
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110914168.3A
Other languages
Chinese (zh)
Other versions
CN113705371A (en
Inventor
肖长诗
陈芊芊
陈华龙
文元桥
张帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202110914168.3A
Publication of CN113705371A
Application granted
Publication of CN113705371B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a water visual scene segmentation method, which comprises the following steps: collecting a live-action image of a scene on water, and carrying out semantic segmentation on the live-action image by adopting a pre-training semantic segmentation network to generate a semantic label of each pixel in the live-action image; dividing the live-action image by adopting a feature clustering algorithm to obtain a plurality of super-pixel areas; counting the proportion of pixels corresponding to various semantic tags in each super-pixel region, taking the semantic tag of the pixel with the largest proportion as the semantic tag of the corresponding super-pixel region, and calculating the confidence weight of the semantic tag of the corresponding super-pixel region according to the proportion; establishing a live-action training sample set according to the live-action image marked with the semantic tag and the confidence weight; training the deep convolutional neural network through a live-action training sample set to obtain a semantic segmentation network; inputting the image to be identified into a semantic segmentation network to obtain a semantic segmentation result. The method and the device can automatically generate the semantic tags of the training samples of the semantic segmentation network.

Description

Water visual scene segmentation method and device
Technical Field
The application relates to the technical field of scene understanding on water, in particular to a method and a device for segmenting a visual scene on water and a computer storage medium.
Background
Traditional image semantic segmentation methods mainly include pixel-level thresholding, segmentation based on pixel clustering, and segmentation based on graph partitioning. These methods rely chiefly on low-dimensional visual features such as colour, texture and edges: feature extraction algorithms are used to extract visual information such as object edges and textures, and regions and objects in the image are then separated according to these low-level features. Commonly used image features include histogram of oriented gradients features, SIFT features, SURF features, local binary pattern (LBP) features and Gabor features.
With the development of neural network technology, semantic segmentation networks have also been applied to image semantic segmentation. A network such as U-Net can be trained offline on public image data sets such as ImageNet to obtain an image semantic segmentation network. However, because such training sets are not specific to waterborne navigation scenes, directly applying the resulting network to on-water semantic segmentation produces large errors, so the network must be retrained to adapt its structure and weights to the new application scene. Retraining requires a labelled live-action training data set, and generating such a data set by manual annotation is inefficient and error-prone.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a water visual scene segmentation method, apparatus and computer storage medium that address the difficulty and error-proneness of labelling live-action training data for a semantic segmentation network.
The application provides a water visual scene segmentation method, which comprises the following steps:
collecting a live-action image of a scene on water, and carrying out semantic segmentation on the live-action image by adopting a pre-training semantic segmentation network to generate a semantic label of each pixel in the live-action image;
dividing the live-action image by adopting a feature clustering algorithm to obtain a plurality of super-pixel areas;
counting the proportion of pixels corresponding to various semantic tags in each super-pixel region, taking the semantic tag of the pixel with the largest proportion as the semantic tag of the corresponding super-pixel region, and calculating the confidence weight of the semantic tag of the corresponding super-pixel region according to the proportion;
establishing a live-action training sample set according to the live-action image marked with the semantic tag and the confidence weight;
training the deep convolutional neural network through the live-action training sample set to obtain a semantic segmentation network;
inputting the image to be identified into the semantic segmentation network to obtain a semantic segmentation result.
Further, the feature clustering algorithm is adopted to segment the live-action image, so that a plurality of super-pixel areas are obtained, specifically:
distinguishing different scene areas by using life cycle semantic feature points among the live-action image sequences, modeling feature time statistics of the feature points by using Gaussian distribution of different parameters to obtain a double Gaussian model, and taking the double Gaussian model as a likelihood function of the extracted feature points;
calculating likelihood functions of other pixel points except the feature points in the live-action image by adopting a clustering algorithm;
under a Bayesian framework, taking a loss value of convolutional neural network segmentation as a priori probability, and calculating classification probabilities of all pixel points in an image by combining the likelihood functions:
P_r(X_{i,j} = o | Y) ∝ P(Y | X_{i,j} = o) × P(X_{i,j} = o);
wherein P_r(X_{i,j} = o | Y) is the classification probability, P(Y | X_{i,j} = o) is the likelihood function, and P(X_{i,j} = o) is the prior probability;
and the classification probability is in direct proportion to the product of the prior probability and the likelihood function, and the semantic segmentation of the live-action image is completed according to the classification probability, so that a plurality of super-pixel areas are obtained.
Further, a clustering algorithm is adopted to calculate likelihood functions of other pixel points except the feature points in the live-action image, and the likelihood functions are specifically as follows:
calculating the distance and gray level difference between other pixel points and the feature point based on a clustering algorithm model by taking the extracted feature point as a center, and assuming that the clustering algorithm model is as follows:
P(Y_i | X_{i,j} = o) = K · exp(−ΔI_{i,j} · Δd_{i,j});
wherein P(Y_i | X_{i,j} = o) is the likelihood function of the pixel points other than the feature points, X_{i,j} ∈ {o, w}, o indicates that the pixel belongs to an obstacle, w indicates that the pixel belongs to the water surface, Y_i is the observed value, K is a scaling factor, ΔI_{i,j} is the gray-level difference between the pixel point and the feature point, and Δd_{i,j} is the distance between the pixel point and the feature point.
Further, calculating the confidence weight of the semantic label of the corresponding superpixel region according to the proportion, wherein the confidence weight is specifically as follows:
acquiring the proportion as a first weight factor;
acquiring the feature quantity of the life cycle of the super pixel area as a second weight factor;
acquiring the coverage proportion of radar echo signals in the super-pixel area as a third weight factor;
and normalizing the first weight factor, the second weight factor and the third weight factor to obtain three corresponding probabilities, and taking the difference value of the maximum probability and the second maximum probability as the confidence weight.
Further, a live-action training sample set is established according to the live-action image marked with the semantic tag and the confidence weight, and specifically comprises the following steps:
constructing a generated countermeasure network, and training the generated countermeasure network by utilizing the live-action image;
and automatically generating training samples by using the trained generating countermeasure network, and constructing the live-action training sample set.
Further, constructing a generated countermeasure network, and training the generated countermeasure network by using the live-action image specifically comprises the following steps:
constructing a generating network by adopting a U-Net-style encoder-decoder structure without inter-layer skip connections;
constructing a discrimination network by adopting a triple network structure;
inputting the live-action image as an input image into the generation network to obtain a generated image;
the Triplet network comprises three feature extraction networks, wherein an input image, a generated image and a reference image are respectively input into the three feature extraction networks and transformed into the same deep feature space, and the distance of a feature vector is used as a loss function to calculate a loss value;
training the generation of the antagonism network by back-propagating the loss values.
Further, the loss function is:
G* = arg min_G max_D (L_CGAN) + α·L_content + β·L_environment;
wherein G* denotes the optimal generator sought by training, α and β are hyper-parameters, L_CGAN denotes the loss function of the generative adversarial network, L_content is a constraint term on the input scene, L_environment is a constraint term related to the migrated features of the reference image, max_D denotes taking the maximum over the discriminator, and min_G denotes taking the minimum over the generator.
Further, before training the generating countermeasure network by using the live-action image, the method further includes:
and pre-training the identification network for generating the countermeasure network by using the manually marked sample image set.
The application also provides a water visual scene segmentation device, which comprises a processor and a memory, wherein the memory is stored with a computer program, and the water visual scene segmentation method is realized when the computer program is executed by the processor.
The present application also provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the above-mentioned method of water visual scene segmentation.
Beneficial effects: the application first segments the input image in parallel with two segmentation methods; the regions generated by the two methods will not coincide exactly, and the differences are largest at the boundaries between different regions. The semantic label of each super-pixel is then determined from the distribution of the semantic labels of the pixels it contains. The super-pixel segmentation map, weighted with semantic labels and confidences, is used as new training data for training the semantic segmentation network online. The application automatically generates training data for the segmentation network with high efficiency and a low error rate.
Drawings
Fig. 1 is a flowchart of a method of a first embodiment of a method for segmenting a water visual scene.
Detailed Description
The following detailed description of preferred embodiments of the application is made in connection with the accompanying drawings, which form a part hereof, and together with the description of the embodiments of the application, are used to explain the principles of the application and are not intended to limit the scope of the application.
Example 1
As shown in fig. 1, embodiment 1 of the present application provides a method for segmenting a water visual scene, which is characterized by comprising the following steps:
s1, acquiring a live-action image of a scene on water, and performing semantic segmentation on the live-action image by adopting a pre-training semantic segmentation network to generate a semantic label of each pixel in the live-action image;
s2, segmenting the live-action image by adopting a feature clustering algorithm to obtain a plurality of super-pixel areas;
s3, counting the proportion of pixels corresponding to various semantic labels in each super-pixel region, taking the semantic label of the pixel with the largest proportion as the semantic label of the corresponding super-pixel region, and calculating the confidence weight of the semantic label of the corresponding super-pixel region according to the proportion;
s4, building a live-action training sample set according to the live-action image marked with the semantic tag and the confidence weight;
s5, training the deep convolutional neural network through the live-action training sample set to obtain a semantic segmentation network;
s6, inputting the image to be identified into the semantic segmentation network to obtain a semantic segmentation result.
In order to automatically generate semantic labels for live-action images, this embodiment first segments the current input image in parallel with two segmentation methods. The first method: a pre-trained semantic segmentation network assigns a semantic label to each pixel in the image; the labels may be, for example, water surface, sky and shoreline. The second method: feature-cluster segmentation, such as scale-adaptive clustering or graph-based segmentation, generates super-pixel regions without semantic information; the clustering result can be further optimised with a Markov random field method if required. The regions produced by the two segmentation methods will not coincide exactly, and the differences are largest at the boundaries between different regions, so the semantic label of each super-pixel is determined from the semantic distribution of the pixels it contains. For example, if the majority of pixels in a super-pixel region carry the label "water surface", the super-pixel region is also labelled "water surface", and the confidence of this label is proportional to the proportion of "water surface" pixels in the region. The super-pixel segmentation map, weighted with semantic labels and confidences, is used as new training data for training the semantic segmentation network online. The effect of the confidence weight appears in the loss function of network training: super-pixels with higher confidence contribute more to the loss value.
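As a concrete illustration of this label-voting step, the following minimal Python/NumPy sketch assigns each super-pixel the majority per-pixel semantic label and a confidence equal to the proportion of pixels carrying that label. All array and function names are illustrative and not taken from the patent.

```python
import numpy as np

def label_superpixels(pixel_labels, superpixel_ids, num_classes):
    """Majority-vote a semantic label for every superpixel.

    pixel_labels   : (H, W) int array, per-pixel labels from the pre-trained
                     segmentation network (e.g. 0=water, 1=sky, 2=obstacle).
    superpixel_ids : (H, W) int array, superpixel index of every pixel from
                     the feature-clustering segmentation.
    Returns dicts {superpixel_id: label} and {superpixel_id: confidence},
    where confidence is the proportion of the winning label inside the region.
    """
    sp_label, sp_conf = {}, {}
    for sp in np.unique(superpixel_ids):
        mask = superpixel_ids == sp
        counts = np.bincount(pixel_labels[mask], minlength=num_classes)
        winner = int(np.argmax(counts))
        sp_label[sp] = winner
        sp_conf[sp] = counts[winner] / counts.sum()  # proportion of majority label
    return sp_label, sp_conf
```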
In the above live-action training data generation scheme, data quality is determined by two factors: the first is the quality of the feature-cluster segmentation; the second is the calculation of the label confidence weights. This embodiment proposes improvements for both factors, described in detail below.
Preferably, the feature clustering algorithm is adopted to segment the live-action image, so as to obtain a plurality of super-pixel areas, which specifically are:
distinguishing different scene areas by using life cycle semantic feature points among the live-action image sequences, modeling feature time statistics of the feature points by using Gaussian distribution of different parameters to obtain a double Gaussian model, and taking the double Gaussian model as a likelihood function of the extracted feature points;
calculating likelihood functions of other pixel points except the feature points in the live-action image by adopting a clustering algorithm;
under a Bayesian framework, taking a loss value of convolutional neural network segmentation as a priori probability, and calculating classification probabilities of all pixel points in an image by combining the likelihood functions:
P_r(X_{i,j} = o | Y) ∝ P(Y | X_{i,j} = o) × P(X_{i,j} = o);
wherein P_r(X_{i,j} = o | Y) is the classification probability, P(Y | X_{i,j} = o) is the likelihood function, and P(X_{i,j} = o) is the prior probability;
and the classification probability is in direct proportion to the product of the prior probability and the likelihood function, and the semantic segmentation of the live-action image is completed according to the classification probability, so that a plurality of super-pixel areas are obtained.
The double Gaussian model uses Gaussian densities of the form:
f_{μ,σ}(t) = (1 / (σ·√(2π))) · exp(−(t − μ)² / (2σ²));
wherein t is the observed lifetime of a feature point, μ is the mean of the life-cycle model, σ is the standard deviation of the life-cycle model, and f_{μ,σ}(t) is the likelihood function of the extracted feature points.
And combining the double Gaussian model and the clustering algorithm model to calculate likelihood function distribution of all pixel points in the image.
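The sketch below illustrates, in Python/NumPy, how lifecycle statistics of tracked feature points could be turned into likelihoods under two Gaussians with different parameters (one for persistent obstacle features, one for short-lived water-surface features). The split into an "obstacle" and a "water" component and all parameter values are illustrative assumptions, not values from the patent.

```python
import numpy as np

def gaussian_pdf(t, mu, sigma):
    """Gaussian density over the observed feature lifetime t."""
    return np.exp(-(t - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

def feature_point_likelihoods(lifetimes, mu_obstacle=30.0, sigma_obstacle=8.0,
                              mu_water=5.0, sigma_water=3.0):
    """Double-Gaussian lifecycle model: obstacle features tend to persist
    across many frames, water-surface features are short-lived.  The means
    and standard deviations here are placeholders; in practice they would be
    fitted to tracked feature data."""
    p_obstacle = gaussian_pdf(lifetimes, mu_obstacle, sigma_obstacle)
    p_water = gaussian_pdf(lifetimes, mu_water, sigma_water)
    return p_obstacle, p_water
```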
Preferably, a clustering algorithm is adopted to calculate likelihood functions of other pixel points except the feature points in the live-action image, and the likelihood functions are specifically as follows:
and calculating the distance and gray level difference between other pixel points and the feature points based on a clustering algorithm model by taking the extracted feature points as the center, wherein the clustering algorithm model is as follows:
P(Y_i | X_{i,j} = o) = K · exp(−ΔI_{i,j} · Δd_{i,j});
wherein P(Y_i | X_{i,j} = o) is the likelihood function of the pixel points other than the feature points, X_{i,j} ∈ {o, w}, o indicates that the pixel belongs to an obstacle, w indicates that the pixel belongs to the water surface, Y_i is the observed value, K is a scaling factor, ΔI_{i,j} is the gray-level difference between the pixel point and the feature point, and Δd_{i,j} is the distance between the pixel point and the feature point.
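The following Python/NumPy sketch shows the clustering likelihood for non-feature pixels and the Bayesian combination with the CNN prior. For brevity it uses a single feature point and a simple two-class normalisation; the function names and this simplification are assumptions for illustration only.

```python
import numpy as np

def pixel_likelihood(image_gray, feat_xy, feat_gray, K=1.0):
    """Likelihood P(Y | X=o) for every pixel with respect to one extracted
    feature point: gray-level difference and spatial distance are combined
    as K * exp(-(delta_I) * (delta_d))."""
    h, w = image_gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    delta_d = np.sqrt((xs - feat_xy[0]) ** 2 + (ys - feat_xy[1]) ** 2)  # distance
    delta_i = np.abs(image_gray - feat_gray)                            # gray difference
    return K * np.exp(-delta_i * delta_d)

def posterior_obstacle(prior_o, lik_o, prior_w, lik_w):
    """Bayesian fusion: P(X=o | Y) ∝ P(Y | X=o) * P(X=o),
    normalised over the two classes o (obstacle) and w (water surface).
    The priors come from the CNN segmentation, the likelihoods from the
    feature / clustering models above."""
    num = prior_o * lik_o
    return num / (num + prior_w * lik_w + 1e-12)
```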
This embodiment also improves the initialisation of the feature-cluster segmentation algorithm. In region-growing image segmentation, the number and spatial distribution of the growing seeds are normally preset from prior knowledge, and different initial seed numbers and distributions lead to very different segmentation results. In this embodiment, the spatial clustering of features with different life cycles in the current input image is exploited to choose an appropriate number of seeds and appropriate seed positions within each region, reducing the dependence of the segmentation result on prior parameters. The specific steps are: first, perform feature extraction and tracking and accumulate a three-dimensional histogram of feature life cycles and positions in the current image; then set the number and positions of the region-growing seeds according to the number of peaks in the histogram and their positions; finally, complete the segmentation of the current image with a region-growing algorithm.
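A minimal sketch of this seed-initialisation idea follows, assuming a two-dimensional position histogram weighted by feature lifetime as a stand-in for the three-dimensional histogram described above; the bin count and peak-selection threshold are illustrative assumptions.

```python
import numpy as np

def seeds_from_lifecycle_histogram(feat_x, feat_y, feat_lifetime,
                                   img_shape, bins=16, peak_ratio=0.5):
    """Derive region-growing seeds from the spatial distribution of
    long-lived features instead of fixed prior parameters.

    feat_x, feat_y, feat_lifetime : 1-D arrays of tracked feature positions
                                    and their lifecycles (frame counts).
    Returns a list of (x, y) seed positions located at histogram peaks.
    """
    h, w = img_shape
    hist, xedges, yedges = np.histogram2d(
        feat_x, feat_y, bins=bins, range=[[0, w], [0, h]], weights=feat_lifetime)
    threshold = peak_ratio * hist.max()          # assumed peak-selection rule
    seeds = []
    for i, j in zip(*np.where(hist >= threshold)):
        cx = 0.5 * (xedges[i] + xedges[i + 1])   # bin-centre x coordinate
        cy = 0.5 * (yedges[j] + yedges[j + 1])   # bin-centre y coordinate
        seeds.append((cx, cy))
    return seeds
```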
Preferably, the confidence weight of the semantic label of the corresponding superpixel region is calculated according to the proportion, specifically:
acquiring the proportion as a first weight factor;
acquiring the feature quantity of the life cycle of the super pixel area as a second weight factor;
acquiring the coverage proportion of radar echo signals in the super-pixel area as a third weight factor;
and normalizing the first weight factor, the second weight factor and the third weight factor to obtain three corresponding probabilities, and taking the difference value of the maximum probability and the second maximum probability as the confidence weight.
In this embodiment, the confidence weight is calculated by fusing heterogeneous radar and AIS sensor information with the proportion of pixels carrying each semantic label within the super-pixel region.
Calculation of the label confidence weights of the live-action training data: in the basic generation scheme described above, the generated training data carries, besides the label of each pixel, weight information expressing the confidence of that label. The way these weights are assigned directly influences the network training effect through the loss function, so computing them reasonably is one of the key factors in training quality. This embodiment introduces three factors that affect the confidence weight: the semantic distribution characteristics inside the super-pixel regions generated by cluster segmentation, the distribution of feature life cycles inside each super-pixel region, and the distribution of back-projected radar and AIS signals inside each super-pixel region.
1. Semantic tag confidence weighting factors: the statistical proportion of each semantic pixel in the super pixel area is calculated as the probability that the super pixel belongs to a certain class of targets (water surface, sky and obstacle), and the proportion is a softmax loss value based on Convolutional Neural Network (CNN) segmentation.
2. Feature lifecycle confidence weight factor: the number of features of the life cycle of the super pixel area is counted, and the larger the number is, the larger the probability that the super pixel belongs to the obstacle area is.
3. Radar AIS signal backprojection confidence weighting factor: and (3) counting the coverage proportion of radar echo signals in the super-pixel area, wherein the probability that the super-pixel belongs to the obstacle area is larger as the proportion is larger.
And fusing the three weight influence factors for each type of scene target semantic label, and normalizing to obtain the probability of the super pixel belonging to each semantic label. The label category with the highest probability is used as the semantic label of the superpixel, and the confidence weight is determined by the difference between the maximum probability and the second highest probability.
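The fusion rule can be pictured with the short Python/NumPy sketch below: the three factors are combined per class, normalised to probabilities, and the confidence is the gap between the top two. The per-class combination rule, the class ordering and the weights w are assumptions for illustration; the patent only specifies normalisation and the "maximum minus second maximum" confidence.

```python
import numpy as np

def fuse_confidence(label_proportions, lifecycle_count, radar_coverage,
                    w=(1.0, 1.0, 1.0)):
    """Fuse the three weight factors for one superpixel.

    label_proportions : per-class pixel proportions from the semantic map.
    lifecycle_count   : number of long-lived features in the superpixel
                        (evidence for the obstacle class).
    radar_coverage    : fraction of the superpixel covered by radar echoes
                        (also evidence for the obstacle class).
    Returns (label_index, confidence) with confidence = top1 - top2 probability.
    """
    scores = np.asarray(label_proportions, dtype=float) * w[0]
    obstacle = 2                                  # assumed class order: water, sky, obstacle
    scores[obstacle] += w[1] * lifecycle_count + w[2] * radar_coverage
    probs = scores / scores.sum()                 # normalise to probabilities
    top1, top2 = np.sort(probs)[::-1][:2]
    return int(np.argmax(probs)), float(top1 - top2)
```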
Preferably, a live-action training sample set is established according to the live-action image marked with the semantic tag and the confidence weight, specifically:
constructing a generated countermeasure network, and training the generated countermeasure network by utilizing the live-action image;
and automatically generating training samples by using the trained generating countermeasure network, and constructing the live-action training sample set.
Expanding the training data set by data augmentation is a common practice in deep learning. Common data augmentation techniques fall into three categories: the first applies translation, rotation, stretching, warping, noise addition and similar transformations to existing training data, multiplying the size of the training set; the second renders virtual camera images of 3D digital scene models; the third uses a data generation network to generate specific image data from a random distribution.
This embodiment adopts and improves the third approach. The most common data generation network is the generative adversarial network (GAN). Its basic idea is: a GAN consists of a discriminator D and a generator G, both structured as CNNs. The generator G maps a random vector through a CNN to a virtual image; the virtual image, together with real images, is fed to the discriminator, which uses another CNN to judge whether its input is real or generated. During training, the generating network and the discriminating network are trained alternately: the loss function is designed so that the discriminator separates virtual data from real data as well as possible, while the generating network produces data as close to the real data as possible so as to reduce the discriminator's accuracy. Through this adversarial learning the two networks reach a Nash equilibrium and become optimal simultaneously.
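A minimal sketch of the alternating training scheme is given below in Python (PyTorch). G and D stand for any convolutional generator and discriminator with a sigmoid output, not the specific networks of this patent; the latent dimension and loss choice are assumptions.

```python
import torch
import torch.nn as nn

def gan_step(G, D, real_images, opt_g, opt_d, z_dim=128):
    """One alternating GAN update: first the discriminator, then the generator."""
    bce = nn.BCELoss()
    b = real_images.size(0)
    real_y, fake_y = torch.ones(b, 1), torch.zeros(b, 1)

    # 1) train the discriminator to separate real images from generated ones
    z = torch.randn(b, z_dim)
    fake_images = G(z).detach()
    loss_d = bce(D(real_images), real_y) + bce(D(fake_images), fake_y)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) train the generator to fool the discriminator
    z = torch.randn(b, z_dim)
    loss_g = bce(D(G(z)), real_y)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```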
Preferably, a generated countermeasure network is constructed, and the generated countermeasure network is trained by using the live-action image, specifically:
constructing a generating network by adopting a U-Net-style encoder-decoder structure without inter-layer skip connections;
constructing a discrimination network by adopting a triple network structure;
inputting the live-action image as an input image into the generation network to obtain a generated image;
the Triplet network comprises three feature extraction networks, wherein an input image, a generated image and a reference image are respectively input into the three feature extraction networks and transformed into the same deep feature space, and the distance of a feature vector is used as a loss function to calculate a loss value;
training the generation of the antagonism network by back-propagating the loss values.
This embodiment generates virtual data with an extended GAN, namely a conditional generative adversarial network (CGAN). The main idea is to use the semantic label image produced by the image segmentation network as a generation constraint, so that specific textures are generated in specific label regions: the textures of the generated virtual image stay consistent with the semantics, and on the level of intrinsic features they stay as close as possible to the original input image of the segmentation network. This embodiment focuses in particular on meteorological feature migration for natural scene images, for example generating virtual water surfaces with different wave heights from the calm water texture of a live-action image, generating virtual water-surface glare textures from water textures in shadow, adding virtual fog to a real scene, or virtually generating obstacles such as island reefs and ships. The detailed scheme is as follows:
generation network of cgan: an encoder/decoder architecture without an interlayer connection U-Net is employed. Scene meteorological feature migration is a non-linear pixel-to-pixel transformation mapping, so the final label output layer of the network needs to be modified to generate output for RGB three-dimensional virtual pixels.
Discrimination network of the CGAN: a Triplet network (an extension of the Siamese network) is adopted, in which three feature extraction networks share the same structure and parameters. The input image, the generated image and the reference image are transformed into the same deep feature space, the corresponding loss is computed from distance measures between the feature vectors, and the network is trained and optimised by back-propagating the errors. This guides the image produced by the generation network to be close to the input image in intrinsic features, close to the reference image in meteorological features, and consistent with the semantic segmentation map of the input image.
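The sketch below shows, in Python (PyTorch), how distances in a shared deep feature space could drive such a Triplet-style criterion. The equal weighting of the two distance terms and the name feat_net are illustrative assumptions; the actual combination used by the patent is given by the loss function described next.

```python
import torch
import torch.nn.functional as F

def triplet_feature_distance_loss(feat_net, input_img, generated_img, reference_img):
    """Distance-based loss in a shared deep feature space (sketch).

    feat_net maps an image batch to feature vectors; the same weights are
    applied to the input, generated and reference images, as in a Triplet
    network with shared parameters.
    """
    f_in = feat_net(input_img)
    f_gen = feat_net(generated_img)
    f_ref = feat_net(reference_img)
    # pull the generated image towards the input on intrinsic (content) features
    content = F.pairwise_distance(f_gen, f_in).mean()
    # pull it towards the reference on the migrated (meteorological) features
    environment = F.pairwise_distance(f_gen, f_ref).mean()
    return content + environment
```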
Preferably, the loss function is:
G* = arg min_G max_D (L_CGAN) + α·L_content + β·L_environment;
wherein G* denotes the optimal generator sought by training, α and β are hyper-parameters, L_CGAN denotes the loss function of the generative adversarial network, L_content is a constraint term on the input scene, L_environment is a constraint term related to the migrated features of the reference image, max_D denotes taking the maximum over the discriminator, and min_G denotes taking the minimum over the generator.
Network training loss function design: the loss function used to train the meteorological-feature-migration CGAN consists of three parts. The first part is the ordinary CGAN loss L_CGAN; the second part is the constraint L_content on the input scene, i.e. the intrinsic scene of the generated image should be as close as possible to that of the input; the third part is the constraint L_environment related to the meteorological features of the reference image of the target meteorological scene to be migrated.
The three constraint terms are calculated as follows:
L_CGAN = E_{x,y}[log D(x, y)] + E_{x,z}[log(1 − D(x, G(x, z)))];
where D (x, y) represents the ability of the discriminator to discriminate between real and virtual scenes given the semantic tag x and G (x, z) represents the ability of the generator to generate virtual scene y from random noise z given the semantic tag x.
The meteorological environment concerned in this embodiment has regional characteristics on the visual characteristics of the scene, for example, under high storm weather, the characteristics of the water surface and the characteristics of the on-shore target or sky are greatly different, so that the drawing style characteristic statistical method based on the whole image is not applicable. According to the embodiment, semantic information of the segmented regions is introduced into network input, so that the features have the characteristic of spatial region limitation, and the network is guided to learn different meteorological features of different regions in the mode, so that the purpose of generating a virtual image of a realistic scene as training data is achieved.
The three loss functions are integrated, and the training optimization target of the generator G is as follows:
G* = arg min_G max_D (L_CGAN) + α·L_content + β·L_environment.
the network optimization training adopts a random gradient descent method SGD with momentum, and the network generalization method adopts common methods such as drop-out and L2/L1 constraint.
Preferably, before training the generating countermeasure network by using the live-action image, the method further includes:
and pre-training the identification network for generating the countermeasure network by using the manually marked sample image set.
To reduce the complexity of CGAN training, the discrimination network can be pre-trained with manually labelled data sets of different meteorological scenes. In the supervised pre-training scheme, a fully connected layer and a softmax output layer are appended to the feature extraction network for semantic classification of meteorological scenes; the meteorological semantic dictionary covers common navigation and weather scenes. An alternative scheme performs supervised learning jointly with the image semantic segmentation network on different manually labelled meteorological navigation scene data sets, with the meteorological feature extraction network partially sharing the encoder of the semantic segmentation network.
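As an illustration of the pre-training head, the Python (PyTorch) sketch below appends a fully connected layer and a softmax over meteorological scene classes to an existing feature extractor. The feature dimension and the number of weather classes are assumed values.

```python
import torch.nn as nn

class WeatherClassifierHead(nn.Module):
    """Supervised pre-training head for the discriminator's feature extractor:
    fully connected layer plus softmax over meteorological scene classes."""

    def __init__(self, feature_dim=512, num_weather_classes=6):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_weather_classes)

    def forward(self, features):
        # During training one would typically feed self.fc(features) to
        # CrossEntropyLoss (which applies log-softmax internally); the
        # explicit softmax here exposes per-class probabilities.
        return nn.functional.softmax(self.fc(features), dim=-1)
```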
When training the semantic segmentation network, this embodiment generates the training data set online and in real time without manual annotation, automatically producing high-quality, scene-adapted labelled training data. To improve network accuracy, pixel regions with high semantic confidence should be emphasised during training and the influence of pixels with low semantic confidence reduced; therefore a weight term is computed for the semantic label of each pixel so that the network loss function adjusts automatically. The generated training data are obtained by extracting image features and fusing radar and AIS data, and the automatically generated labels end up with different confidences in different regions.
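One way to realise such a confidence-adjusted loss is a per-pixel cross-entropy scaled by the generated label confidence, sketched below in Python (PyTorch). This is a minimal illustration under the assumption of dense per-pixel confidence weights, not the patent's exact loss formulation.

```python
import torch
import torch.nn.functional as F

def confidence_weighted_loss(logits, target_labels, confidence_weights):
    """Per-pixel cross-entropy scaled by the confidence of each generated label.

    logits             : (B, C, H, W) raw network outputs.
    target_labels      : (B, H, W) automatically generated semantic labels.
    confidence_weights : (B, H, W) label confidences in [0, 1].
    Pixels with higher label confidence contribute more to the loss value.
    """
    per_pixel = F.cross_entropy(logits, target_labels, reduction="none")
    return (confidence_weights * per_pixel).mean()
```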
Example 2
Embodiment 2 of the present application provides a water visual scene segmentation apparatus, including a processor and a memory, where the memory stores a computer program, and when the computer program is executed by the processor, the water visual scene segmentation method provided in embodiment 1 is implemented.
The device for dividing the water visual scene provided by the embodiment of the application is used for realizing the method for dividing the water visual scene, so that the device for dividing the water visual scene has the technical effects as well and is not described in detail herein.
Example 3
Embodiment 3 of the present application provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method for water visual scene segmentation provided in embodiment 1.
The computer storage medium provided by the embodiment of the application is used for realizing the method for dividing the water visual scene, so that the technical effects of the method for dividing the water visual scene are achieved, and the computer storage medium is also provided and will not be described herein.
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application.

Claims (7)

1. The method for segmenting the water visual scene is characterized by comprising the following steps of:
collecting a live-action image of a scene on water, and carrying out semantic segmentation on the live-action image by adopting a pre-training semantic segmentation network to generate a semantic label of each pixel in the live-action image;
dividing the live-action image by adopting a feature clustering algorithm to obtain a plurality of super-pixel areas;
counting the proportion of pixels corresponding to various semantic tags in each super-pixel region, taking the semantic tag of the pixel with the largest proportion as the semantic tag of the corresponding super-pixel region, and calculating the confidence weight of the semantic tag of the corresponding super-pixel region according to the proportion;
establishing a live-action training sample set according to the live-action image marked with the semantic tag and the confidence weight;
training the deep convolutional neural network through the live-action training sample set to obtain a semantic segmentation network;
inputting the image to be identified into the semantic segmentation network to obtain a semantic segmentation result;
the live-action image is segmented by adopting a feature clustering algorithm to obtain a plurality of super-pixel areas, which are specifically as follows:
distinguishing different scene areas by using life cycle semantic feature points among the live-action image sequences, modeling feature time statistics of the feature points by using Gaussian distribution of different parameters to obtain a double Gaussian model, and taking the double Gaussian model as a likelihood function of the extracted feature points;
calculating likelihood functions of other pixel points except the feature points in the live-action image by adopting a clustering algorithm;
under a Bayesian framework, taking a loss value of convolutional neural network segmentation as a priori probability, and calculating classification probabilities of all pixel points in an image by combining the likelihood functions:
P_r(X_{i,j} = o | Y) ∝ P(Y | X_{i,j} = o) × P(X_{i,j} = o);
wherein P_r(X_{i,j} = o | Y) is the classification probability, P(Y | X_{i,j} = o) is the likelihood function, and P(X_{i,j} = o) is the prior probability;
the classification probability is in direct proportion to the product of the prior probability and the likelihood function, and the semantic segmentation of the live-action image is completed according to the classification probability, so that a plurality of super-pixel areas are obtained;
calculating likelihood functions of other pixel points except the feature points in the live-action image by adopting a clustering algorithm, wherein the likelihood functions are specifically as follows:
and calculating the distance and gray level difference between other pixel points and the feature points based on a clustering algorithm model by taking the extracted feature points as the center, wherein the clustering algorithm model is as follows:
P(Y_i | X_{i,j} = o) = K · exp(−ΔI_{i,j} · Δd_{i,j});
wherein P(Y_i | X_{i,j} = o) is the likelihood function of the pixel points other than the feature points, X_{i,j} ∈ {o, w}, o indicates that the pixel belongs to an obstacle, w indicates that the pixel belongs to the water surface, Y_i is the observed value, K is a scaling factor, ΔI_{i,j} is the gray-level difference between the pixel point and the feature point, and Δd_{i,j} is the distance between the pixel point and the feature point;
calculating the confidence weight of the semantic label of the corresponding superpixel region according to the proportion, wherein the confidence weight is specifically as follows:
acquiring the proportion as a first weight factor;
obtaining likelihood function distribution of all pixel points of the super pixel area as a second weight factor;
acquiring the coverage proportion of radar echo signals in the super-pixel area as a third weight factor;
and normalizing the first weight factor, the second weight factor and the third weight factor to obtain three corresponding probabilities, and taking the difference value of the maximum probability and the second maximum probability as the confidence weight.
2. The method for segmenting the water visual scene according to claim 1, wherein the real-scene training sample set is established according to the real-scene image marked with the semantic tag and the confidence weight, specifically:
constructing a generated countermeasure network, and training the generated countermeasure network by utilizing the live-action image;
and automatically generating training samples by using the trained generating countermeasure network, and constructing the live-action training sample set.
3. The method of segmentation of a visual scene on water according to claim 2, characterized in that a generated countermeasure network is constructed, which is trained with the live-action image, in particular:
constructing a generating network by adopting a U-Net-style encoder-decoder structure without inter-layer skip connections;
constructing a discrimination network by adopting a triple network structure;
inputting the live-action image as an input image into the generation network to obtain a generated image;
the Triplet network comprises three feature extraction networks, wherein an input image, a generated image and a reference image are respectively input into the three feature extraction networks and transformed into the same deep feature space, and the distance of a feature vector is used as a loss function to calculate a loss value;
training the generation of the antagonism network by back-propagating the loss values.
4. A method of segmentation of a visual scene in water according to claim 3, wherein said loss function is:
G* = arg min_G max_D (L_CGAN) + α·L_content + β·L_environment;
wherein G* denotes the optimal generator sought by training, α and β are hyper-parameters, L_CGAN denotes the loss function of the generative adversarial network, L_content is a constraint term on the input scene, L_environment is a constraint term related to the migrated features of the reference image, max_D denotes taking the maximum over the discriminator, and min_G denotes taking the minimum over the generator.
5. The method of claim 2, further comprising, prior to training the generating an countermeasure network using the live-action image:
and pre-training the identification network for generating the countermeasure network by using the manually marked sample image set.
6. A water visual scene segmentation apparatus comprising a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements a water visual scene segmentation method according to any of claims 1-5.
7. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method of water visual scene segmentation as defined in any one of claims 1-5.
CN202110914168.3A 2021-08-10 2021-08-10 Water visual scene segmentation method and device Active CN113705371B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110914168.3A CN113705371B (en) 2021-08-10 2021-08-10 Water visual scene segmentation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110914168.3A CN113705371B (en) 2021-08-10 2021-08-10 Water visual scene segmentation method and device

Publications (2)

Publication Number Publication Date
CN113705371A CN113705371A (en) 2021-11-26
CN113705371B true CN113705371B (en) 2023-12-01

Family

ID=78652132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110914168.3A Active CN113705371B (en) 2021-08-10 2021-08-10 Water visual scene segmentation method and device

Country Status (1)

Country Link
CN (1) CN113705371B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419321B (en) * 2022-03-30 2022-07-08 珠海市人民医院 CT image heart segmentation method and system based on artificial intelligence
CN115170481A (en) * 2022-06-20 2022-10-11 中国地质大学(武汉) Natural resource image analysis method and system based on visual saliency
CN114926463B (en) * 2022-07-20 2022-09-27 深圳市尹泰明电子有限公司 Production quality detection method suitable for chip circuit board
CN116523912B (en) * 2023-07-03 2023-09-26 四川省医学科学院·四川省人民医院 Cleanliness detection system and method based on image recognition
CN116665137B (en) * 2023-08-01 2023-10-10 聊城市彩烁农业科技有限公司 Livestock breeding wastewater treatment method based on machine vision

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764027A (en) * 2018-04-13 2018-11-06 上海大学 A kind of sea-surface target detection method calculated based on improved RBD conspicuousnesses
CN109063723A (en) * 2018-06-11 2018-12-21 清华大学 The Weakly supervised image, semantic dividing method of object common trait is excavated based on iteration
CN109740638A (en) * 2018-12-14 2019-05-10 广东水利电力职业技术学院(广东省水利电力技工学校) A kind of method and device of EM algorithm two-dimensional histogram cluster
CN109919159A (en) * 2019-01-22 2019-06-21 西安电子科技大学 A kind of semantic segmentation optimization method and device for edge image
WO2021017372A1 (en) * 2019-08-01 2021-02-04 中国科学院深圳先进技术研究院 Medical image segmentation method and system based on generative adversarial network, and electronic equipment
CN112446342A (en) * 2020-12-07 2021-03-05 北京邮电大学 Key frame recognition model training method, recognition method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yu-Ting Chang et al., "Weakly-Supervised Semantic Segmentation via Sub-category Exploration", IEEE Xplore, pp. 8991-9000 *
Wu Lü, "Image classification and recognition based on local features and weak annotation information", China Doctoral Dissertations Full-text Database, Information Science and Technology, No. 7, pp. 1-114 *

Also Published As

Publication number Publication date
CN113705371A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
CN113705371B (en) Water visual scene segmentation method and device
CN110335290B (en) Twin candidate region generation network target tracking method based on attention mechanism
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN109977918B (en) Target detection positioning optimization method based on unsupervised domain adaptation
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
CN108304873B (en) Target detection method and system based on high-resolution optical satellite remote sensing image
CN109241913B (en) Ship detection method and system combining significance detection and deep learning
CN106909902B (en) Remote sensing target detection method based on improved hierarchical significant model
CN111652321A (en) Offshore ship detection method based on improved YOLOV3 algorithm
CN111445488B (en) Method for automatically identifying and dividing salt body by weak supervision learning
CN112733614B (en) Pest image detection method with similar size enhanced identification
CN109543632A (en) A kind of deep layer network pedestrian detection method based on the guidance of shallow-layer Fusion Features
CN112418330A (en) Improved SSD (solid State drive) -based high-precision detection method for small target object
CN112734764A (en) Unsupervised medical image segmentation method based on countermeasure network
CN108596195B (en) Scene recognition method based on sparse coding feature extraction
CN110853070A (en) Underwater sea cucumber image segmentation method based on significance and Grabcut
Yuan et al. Neighborloss: a loss function considering spatial correlation for semantic segmentation of remote sensing image
CN110334656A (en) Multi-source Remote Sensing Images Clean water withdraw method and device based on information source probability weight
CN112329771B (en) Deep learning-based building material sample identification method
CN113177503A (en) Arbitrary orientation target twelve parameter detection method based on YOLOV5
CN115410081A (en) Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium
CN113554679A (en) Anchor-frame-free target tracking algorithm for computer vision application
CN110472632B (en) Character segmentation method and device based on character features and computer storage medium
Qian et al. A hybrid network with structural constraints for SAR image scene classification
CN117253044B (en) Farmland remote sensing image segmentation method based on semi-supervised interactive learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant