CN117765482B - Garbage identification method and system for garbage enrichment area of coastal zone based on deep learning - Google Patents

Garbage identification method and system for garbage enrichment area of coastal zone based on deep learning

Info

Publication number
CN117765482B
Authority
CN
China
Prior art keywords
network model
improved
garbage
window
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410195184.5A
Other languages
Chinese (zh)
Other versions
CN117765482A (en)
Inventor
于迅
彭士涛
胡健波
何建斐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Research Institute for Water Transport Engineering MOT
Original Assignee
Tianjin Research Institute for Water Transport Engineering MOT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Research Institute for Water Transport Engineering MOT filed Critical Tianjin Research Institute for Water Transport Engineering MOT
Priority to CN202410195184.5A
Publication of CN117765482A
Application granted
Publication of CN117765482B
Active legal status
Anticipated expiration legal status

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing, and discloses a garbage identification method and system for a coastal zone garbage enrichment area based on deep learning. The method comprises the following steps: acquiring an original image of the coastal zone garbage enrichment area, and preprocessing the original image to divide it into N data sets of different scales; using an improved Swin Transformer layer as the backbone layer of the Mask2Former network model to establish an improved Mask2Former network model; training the improved Mask2Former network model with the N data sets of different scales; and inputting the image to be detected into the trained improved Mask2Former network model for garbage identification. Feature extraction windows of different scales can adapt to targets of different pixel sizes, avoiding wasted computation and overfitting, enlarging the receptive field and improving the segmentation precision of the network model, and thereby improving garbage recognition precision.

Description

Garbage identification method and system for garbage enrichment area of coastal zone based on deep learning
Technical Field
The invention relates to the technical field of image processing, in particular to a garbage identification method and a garbage identification system for a coastal zone garbage enrichment zone based on deep learning.
Background
Coastal zone waste refers to persistent, man-made or processed solid waste in the coastal environment. The amount of waste in coastal zones is increasing dramatically and has become a major concern worldwide because of its significant potential impact on coastal systems, marine life and human health. The increase of garbage in the coastal zone also harms beach-city tourism, seriously damaging a city's image and reducing tourism income. Therefore, it is necessary to monitor coastal zone waste. The sources of coastal zone garbage are complex and can be roughly divided into three types: direct discarding and deposition by people around the coastal zone; transport from surrounding areas through storm drains and coastal runoff; and transport from the marine system to the coast by factors such as wind, waves and tides. At present, monitoring with unmanned aerial vehicle aerial images is a common method in the field of coastal zone garbage monitoring, but a standardized automatic method for processing the aerial images is still required to reduce monitoring costs.
Conventional methods for automatically processing such images include image thresholding and the Random Forest (RF) method, which process the image, extract features, and then use a classification algorithm to determine the presence or absence of an object. However, thresholding typically relies on a selected threshold whose choice has a large impact on the result, and it does not handle complex scenes well, such as images with varied lighting conditions, complex textures, or large variations. Random forests, in turn, have high computational resource requirements on large-scale data sets, making training and inference difficult to apply in many scenarios. In the prior art, Mask2Former with a Swin Transformer as its backbone layer performs the self-attention operation only within a window, which reduces computational complexity. However, the Swin Transformer uses a fixed, limited window size, resulting in a limited receptive field that cannot adapt well to targets of different scales during feature extraction.
Therefore, there is a need for a garbage identification method for a coastal zone garbage enrichment area based on deep learning that can adapt feature extraction to targets of different scales, avoid wasted computation and overfitting, and improve the receptive field and segmentation precision of the network model, thereby improving garbage identification precision.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a garbage identification method and system for a coastal zone garbage enrichment area based on deep learning, which can adapt feature extraction to targets of different scales, avoid wasted computation and overfitting, and improve the receptive field and segmentation precision of the network model, thereby improving garbage identification precision.
The invention provides a garbage identification method of a coastal zone garbage enrichment zone based on deep learning, which comprises the following steps:
S1, acquiring an original image of a garbage enrichment area of a coastal zone;
S2, preprocessing an original image, and dividing the preprocessed original image into N data sets with different scales;
S3, based on the Mask2Former network model, using an improved Swin Transformer layer as the backbone layer of the Mask2Former network model, and establishing an improved Mask2Former network model; the improved Swin Transformer layer comprises M feature extraction windows of different scales, wherein M=N;
S4, training the improved Mask2Former network model by utilizing N data sets with different scales to obtain a trained improved Mask2Former network model;
S5, inputting the image to be detected into a trained improved Mask2Former network model to carry out garbage identification.
Further, S2, preprocessing the original image, dividing the preprocessed original image into N data sets of different scales, and dividing the data set of each scale into a training set and a validation set, includes:
S21, performing image-cutting processing on the original image, labeling the cut original image with its corresponding real label, and taking the cut original image and its corresponding real label as a data group;
S22, dividing the data groups into N data sets of different scales according to the categories of the real labels.
Further, in S22, the data groups are divided into N data sets of different scales according to the category of the real label, wherein the categories of the real labels include: ocean, beach, garbage, vegetation, biology and background.
Further, S22, dividing the data groups into N data sets of different scales according to the category of the real label includes:
wherein N is 3;
taking the data groups whose real-label category is biology as the first data set;
taking the data groups whose real-label categories are garbage and vegetation as the second data set;
taking the data groups whose real-label categories are ocean and beach as the third data set.
Further, after S22, the method further includes:
S23, dividing the data in the data sets of the N scales into a training set and a validation set respectively according to a preset proportion.
Further, S3, based on the Mask2Former network model, using the improved Swin Transformer layer as the backbone layer of the Mask2Former network model, and establishing an improved Mask2Former network model, wherein the improved Swin Transformer layer comprises M feature extraction windows of different scales, and the M feature extraction windows of different scales include:
taking a feature extraction window with a scale within a first scale range as a first window;
taking a feature extraction window with the scale within a second scale range as a second window;
and taking the feature extraction window with the scale within the third scale range as a third window.
Further, S4, training the improved Mask2Former network model with the N data sets of different scales to obtain a trained improved Mask2Former network model, includes:
S41, inputting the data groups in the training sets into the improved Mask2Former network model for training; S41 specifically comprises:
S411, training a first window in the improved Mask2Former network model through the data groups in the training set of the first data set;
S412, training a second window in the improved Mask2Former network model through the data groups in the training set of the second data set;
S413, training a third window in the improved Mask2Former network model through the data groups in the training set of the third data set;
S42, comparing the output of the improved Mask2Former network model with the data groups in the validation set of the corresponding data set, and calculating whether the error value is smaller than a preset value;
S43, if the error value is smaller than a preset value, training the improved Mask2Former network model is completed, and a trained improved Mask2Former network model is obtained; if the error value is not smaller than the preset value, training the improved Mask2Former network model is continued.
The invention also provides a garbage recognition system for a coastal zone garbage enrichment area based on deep learning, which is used for executing the above garbage recognition method for a coastal zone garbage enrichment area based on deep learning; the system comprises:
the image acquisition module is used for acquiring an original image of the garbage enrichment area of the coastal zone;
The image processing module is used for preprocessing an original image and dividing the preprocessed original image into N data sets with different scales;
the network model building module is used for establishing an improved Mask2Former network model by taking an improved Swin Transformer layer as the backbone layer of the Mask2Former network model, based on the Mask2Former network model; the improved Swin Transformer layer comprises M feature extraction windows of different scales, wherein M=N; and for training the improved Mask2Former network model with the N data sets of different scales to obtain a trained improved Mask2Former network model;
the image recognition module is used for inputting the image to be detected into the trained improved Mask2Former network model to carry out garbage recognition.
The embodiment of the invention has the following technical effects:
the invention provides an improved network model based on Mask2Former: a Swin Transformer layer is selected as the backbone layer of the Mask2Former network model and is further improved by integrating multiple shifted windows of different scales into it, and the collected data sets are classified by scale, so that the network model can adapt to targets of different pixel sizes through feature extraction windows of different scales. This avoids wasted computation and overfitting, extracts coastal zone garbage features more accurately, and realizes accurate segmentation of the coastal zone garbage enrichment area.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a garbage identification method of a coastal zone garbage enrichment zone based on deep learning, which is provided by the embodiment of the invention;
FIG. 2 is a schematic diagram of a Mask2Former network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the Swin Transformer layer structure according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the improved Swin Transformer layer structure according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an improved Mask2Former network model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of feature extraction based on an improved Mask2Former network model according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a garbage recognition system for a coastal zone garbage enrichment area based on deep learning.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the invention, are within the scope of the invention.
Fig. 1 is a flowchart of a garbage identification method for a garbage enrichment area of a coastal zone based on deep learning, which is provided by an embodiment of the present invention, referring to fig. 1, specifically includes:
S1, acquiring an original image of the coastal zone garbage enrichment area.
Specifically, the coastal zone garbage enrichment area is photographed and sampled by an unmanned aerial vehicle to obtain an original image of the coastal zone garbage enrichment area. Only raw images taken over the garbage enrichment area are used, which reduces the complexity of model training caused by varied backgrounds.
Illustratively, the garbage enrichment area may be photographed and sampled using a DJI Phantom 4 Pro+ V2.0 drone, and the pixel size of the sampled original image may be 4864×3648.
S2, preprocessing the original image, and dividing the preprocessed original image into N data sets with different scales.
S21, performing image-cutting processing on the original image, labeling the cut original image with its corresponding real label, and taking the cut original image and its corresponding real label as a data group.
Specifically, to adapt to the input of the deep learning network, the original image is preprocessed. The preprocessing includes: cutting the original image into images with a pixel size of 1024×1024; labeling the cut images with their corresponding real labels; and taking each cut image and its corresponding real label as a data group.
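As an illustration of this step, the following sketch tiles a raw aerial image into non-overlapping 1024×1024 crops; the file layout, function name, and the choice to drop incomplete border tiles are assumptions, not part of the patent:

```python
# Illustrative sketch of the image-cutting step: split a raw aerial image
# into non-overlapping 1024x1024 tiles; border remainders that do not fill
# a whole tile are dropped (an assumption of this sketch).
from pathlib import Path
from PIL import Image

TILE = 1024

def cut_image(src: Path, dst_dir: Path) -> None:
    img = Image.open(src)
    w, h = img.size
    dst_dir.mkdir(parents=True, exist_ok=True)
    for top in range(0, h - TILE + 1, TILE):
        for left in range(0, w - TILE + 1, TILE):
            tile = img.crop((left, top, left + TILE, top + TILE))
            tile.save(dst_dir / f"{src.stem}_{top}_{left}.png")
```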
S22, dividing the data groups into N data sets of different scales according to the categories of the real labels.
Illustratively, the categories of the real labels may include: ocean, beach, garbage, vegetation, biology and background; the names and the number of the real-label categories can be customized according to the actual situation.
Illustratively, N may be set to 3; dividing the data groups into 3 data sets of different scales according to the category of the real label then includes:
taking the data groups whose real-label category is biology as the first data set;
taking the data groups whose real-label categories are garbage and vegetation as the second data set;
taking the data groups whose real-label categories are ocean and beach as the third data set.
S23, dividing the data in the data sets of the N scales into a training set and a validation set respectively according to a preset proportion.
Illustratively, the preset ratio may be set as desired, such as 5:1 or 8:2. Taking a preset ratio of 5:1 as an example: the data in the first data set is divided into a first training set and a first validation set at a ratio of 5:1; the data in the second data set is divided into a second training set and a second validation set at 5:1; the data in the third data set is divided into a third training set and a third validation set at 5:1; and so on, until the data in all N scale data sets are divided into training and validation sets at a ratio of 5:1.
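The grouping and 5:1 split described above can be sketched as follows; the category spellings, tuple layout and fixed seed are illustrative assumptions:

```python
# Illustrative sketch: group (image, mask) pairs into three scale-specific
# data sets by annotation category, then split each 5:1 into training and
# validation subsets.
import random

SCALE_GROUPS = {
    "first": {"biology"},
    "second": {"garbage", "vegetation"},
    "third": {"ocean", "beach"},
}

def split_datasets(samples, ratio=5, seed=0):
    """samples: list of (image_path, label_path, category) tuples."""
    rng = random.Random(seed)
    out = {}
    for name, cats in SCALE_GROUPS.items():
        group = [s for s in samples if s[2] in cats]
        rng.shuffle(group)
        n_val = len(group) // (ratio + 1)   # 5:1 -> one sixth for validation
        out[name] = {"val": group[:n_val], "train": group[n_val:]}
    return out
```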
S3, based on the Mask2Former network model, the improved Swin Transformer layer is used as the backbone layer of the Mask2Former network model, and the improved Mask2Former network model is built.
Specifically, FIG. 2 is a schematic diagram of the Mask2Former network model provided by an embodiment of the present invention, FIG. 3 is a schematic diagram of the Swin Transformer layer provided by an embodiment of the present invention, FIG. 4 is a schematic diagram of the improved Swin Transformer layer provided by an embodiment of the present invention, and FIG. 5 is a schematic diagram of the improved Mask2Former network model provided by an embodiment of the present invention. Referring to FIGS. 2-5, based on the Mask2Former network model, a Swin Transformer layer is selected as the backbone layer of the Mask2Former network model and is further improved: multiple windows of different scales are integrated into the Swin Transformer layer to obtain the improved Swin Transformer layer, from which the improved Mask2Former network model is established. The improved Swin Transformer layer includes M feature extraction windows of different scales, with M=N.
Further, with continued reference to FIG. 4, given the input feature map Y^{L-1}, normalization is first performed by the LayerNorm (LN) layer; the normalized feature map then passes through the TW-MSA (three-window multi-head self-attention) module to obtain the feature map Y^L. The feature extraction formulas are:

$$Y^{L} = \mathrm{TW\text{-}MSA}\big(\mathrm{LN}(Y^{L-1})\big) + Y^{L-1}$$

$$Y^{L+1} = \mathrm{MLP}\big(\mathrm{LN}(Y^{L})\big) + Y^{L}$$

wherein LN is the normalization layer, MLP is the multi-layer perceptron, TW-MSA is the three-window multi-head self-attention module, Y^{L-1} is the input feature map, Y^L is the feature map obtained from Y^{L-1} through the normalization layer and the three-window multi-head self-attention module, and Y^{L+1} is the feature map obtained from Y^L through the normalization layer and the multi-layer perceptron. The multi-layer perceptron (MLP) is a neural network structure composed of multiple fully connected layers, in which every neuron of one layer is connected to all neurons of the next layer, and information propagates forward layer by layer.
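For illustration, a minimal PyTorch-style sketch of this residual block structure follows; it mirrors the two formulas above rather than any code given in the patent, the 4× MLP width is an assumed (Swin-typical) choice, and the TWMSA module is sketched after the next formula:

```python
# Minimal sketch of the improved Swin Transformer block: LayerNorm followed
# by three-window self-attention, then LayerNorm followed by an MLP, each
# wrapped in a residual connection, mirroring the two formulas above.
import torch.nn as nn

class ImprovedSwinBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = TWMSA(dim, num_heads)   # three-window attention, sketched below
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(           # two fully connected layers
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, y):                   # y: (B, H, W, C) feature map Y^{L-1}
        y = self.attn(self.norm1(y)) + y    # Y^L     = TW-MSA(LN(Y^{L-1})) + Y^{L-1}
        y = self.mlp(self.norm2(y)) + y     # Y^{L+1} = MLP(LN(Y^L)) + Y^L
        return y
```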
The TW-MSA module is an improvement on the W-MSA (window multi-head self-attention) module, which restricts the scope of the self-attention operation to a regular window; the feature extraction windows of different scales are distinguished by different superscripts in the formula. The three-window feature extraction formulas are:

$$Z^{L-1} = \mathrm{LN}(Y^{L-1})$$

$$Z^{L} = \mathrm{TW\text{-}MSA}(Z^{L-1}) = \mathrm{W\text{-}MSA}^{w_1}(Z^{L-1}) \oplus \mathrm{W\text{-}MSA}^{w_2}(Z^{L-1}) \oplus \mathrm{W\text{-}MSA}^{w_3}(Z^{L-1})$$

wherein Y^{L-1} is the input feature map, Z^{L-1} is the second feature map obtained by normalizing Y^{L-1}, and Z^L is the third feature map obtained by passing Z^{L-1} through the TW-MSA module, which consists of three W-MSA modules; the superscripts w_1, w_2 and w_3 respectively denote the three feature extraction windows of different scales, and ⊕ denotes the fusion of the parallel branch outputs. The three windows of the TW-MSA module are arranged in parallel, and the three windows of the STW-MSA (shifted three-window multi-head self-attention) module are likewise arranged in parallel.
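A minimal sketch of this parallel three-window attention follows, with window partitioning done in the standard Swin manner; fusing the branch outputs by summation (and omitting relative position bias) are assumptions of this sketch, since the patent states only that the branches run in parallel and their features are fused:

```python
# Sketch of W-MSA over non-overlapping windows and TW-MSA as three parallel
# W-MSA branches with window sizes 5, 7 and 9.
import torch.nn as nn
import torch.nn.functional as F

class WMSA(nn.Module):
    """Window multi-head self-attention: partition the feature map into
    non-overlapping windows and attend within each window independently."""
    def __init__(self, dim: int, num_heads: int, window: int):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                         # x: (B, H, W, C)
        B, H, W, C = x.shape
        w = self.window
        ph, pw = (-H) % w, (-W) % w               # pad to a multiple of w
        x = F.pad(x, (0, 0, 0, pw, 0, ph))
        Hp, Wp = H + ph, W + pw
        # partition into (B * num_windows, w*w, C) token groups
        x = x.view(B, Hp // w, w, Wp // w, w, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(-1, w * w, C)
        x, _ = self.attn(x, x, x)                 # self-attention per window
        # undo the partition and drop the padding
        x = x.view(B, Hp // w, Wp // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return x.reshape(B, Hp, Wp, C)[:, :H, :W, :]

class TWMSA(nn.Module):
    """Three parallel W-MSA branches whose multi-receptive-field outputs
    are fused; summation is the fusion assumed for this sketch."""
    def __init__(self, dim: int, num_heads: int, windows=(5, 7, 9)):
        super().__init__()
        self.branches = nn.ModuleList(WMSA(dim, num_heads, w) for w in windows)

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)
```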
Illustratively, when N is set to 3, M is also set to 3, and the improved Swin Transformer layer includes three feature extraction windows of different scales, including:
taking a feature extraction window with a scale within a first scale range as a first window;
taking a feature extraction window with the scale within a second scale range as a second window;
and taking the feature extraction window with the scale within the third scale range as a third window.
The first, second and third scale ranges can be set according to the pixel size of the target to be detected: the larger the pixel area of the target, the larger the scale of its feature extraction window. Targets of different scales are better served by feature extraction windows of matching size, as in the mapping sketched below. For example, when the first window extracts features from the first data set, whose category is biology and whose targets are no larger than 2304 pixels, the scale of the first window may be set to 5×5; when the second window extracts features from the second data set, whose categories are garbage and vegetation and whose targets are between 2304 and 16384 pixels, the scale of the second window may be set to 7×7; and when the third window extracts features from the third data set, whose categories are ocean and beach and whose targets are no smaller than 16384 pixels, the scale of the third window may be set to 9×9.
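The threshold logic above can be stated in one function (thresholds taken from the description; the function name is illustrative):

```python
# Illustrative mapping from a target's pixel area to the feature extraction
# window scale used for it (2304 = 48*48 and 16384 = 128*128 pixel areas).
def window_for_target(pixel_area: int) -> int:
    if pixel_area <= 2304:     # biology-scale targets  -> first window
        return 5
    if pixel_area < 16384:     # garbage / vegetation   -> second window
        return 7
    return 9                   # ocean / beach regions  -> third window
```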
Further, with continued reference to fig. 5, the improved Mask2Former network model is further explained. The improved Mask2Former network model first uses a patch partition module to divide the input image into non-overlapping patches, whose size may be set to 8×8, and then converts their dimension through a Linear Embedding layer; these two transforms are implemented by a convolution operation with a kernel size of 8×8, a stride of 8 and an output dimension of 96. Next comes a stack of improved Swin Transformer blocks and Patch Merging layers, where Patch Merging is a 2× downsampling operation accomplished by recombination, concatenation and a linear layer. The Mask2Former decoder is consistent with the original network architecture.
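Under the stated hyperparameters, the input stem can be sketched as follows; realizing patch partition plus linear embedding as one strided convolution follows the description above, while the tensor layout of Patch Merging is an assumption of this sketch:

```python
# Patch partition + linear embedding as one strided convolution, and Patch
# Merging as 2x downsampling via 2x2 regrouping, LayerNorm and a linear layer.
import torch
import torch.nn as nn

patch_embed = nn.Conv2d(3, 96, kernel_size=8, stride=8)  # (B,3,1024,1024) -> (B,96,128,128)

class PatchMerging(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):                  # x: (B, H, W, C) with even H, W
        # regroup each 2x2 spatial neighborhood into the channel dimension
        x = torch.cat([x[:, 0::2, 0::2, :], x[:, 1::2, 0::2, :],
                       x[:, 0::2, 1::2, :], x[:, 1::2, 1::2, :]], dim=-1)
        return self.reduction(self.norm(x))  # (B, H/2, W/2, 2C)
```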
S4, training the improved Mask2Former network model with the N data sets of different scales to obtain a trained improved Mask2Former network model.
S41, inputting the data groups in the training sets into the improved Mask2Former network model for training.
Specifically, training is performed on feature extraction windows of different scales according to data sets of different scales, and S41 specifically includes:
S411, training the first window in the improved Mask2Former network model through the data groups in the training set of the first data set.
Specifically, the data in the training set of the first data set is input into the improved Mask2Former network model, the first window of the improved Mask2Former network model is selected to perform feature extraction on the input data, and the prediction result output by the first window is obtained.
S412, training the second window in the improved Mask2Former network model through the data groups in the training set of the second data set.
Specifically, the data in the training set of the second data set is input into the improved Mask2Former network model, the second window of the improved Mask2Former network model is selected to perform feature extraction on the input data, and the prediction result output by the second window is obtained.
S413, training the third window in the improved Mask2Former network model through the data groups in the training set of the third data set.
Specifically, the data in the training set of the third data set is input into the improved Mask2Former network model, the third window of the improved Mask2Former network model is selected to perform feature extraction on the input data, and the prediction result output by the third window is obtained.
S42, comparing the output of the improved Mask2Former network model with the data groups in the validation set of the corresponding data set, and calculating whether the error value is smaller than a preset value.
Specifically, the prediction result output by the first window is compared with the data groups in the validation set of the first data set; the prediction result output by the second window is compared with the data groups in the validation set of the second data set; and the prediction result output by the third window is compared with the data groups in the validation set of the third data set, to calculate whether the error value is smaller than the preset value.
S43, if the error value is smaller than a preset value, training the improved Mask2Former network model is completed, and a trained improved Mask2Former network model is obtained; if the error value is not smaller than the preset value, training the improved Mask2Former network model is continued.
Specifically, if the error values of the prediction results output by the first, second and third windows are all smaller than the preset value, training of the improved Mask2Former network model is complete and the trained improved Mask2Former network model is obtained; if any feature extraction window has an error value that is not smaller than the preset value, training and optimization of that window continue.
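A hedged sketch of this training routine follows; the optimizer, loss function, `window=` routing argument and pixel-wise validation error are all illustrative assumptions, since the patent does not specify them:

```python
# Sketch: train each window's branch on its scale-specific training set and
# stop once every branch's validation error falls below the preset value.
import torch

@torch.no_grad()
def validate(model, loader, window_id):
    wrong = total = 0
    for images, labels in loader:
        pred = model(images, window=window_id).argmax(dim=1)
        wrong += (pred != labels).sum().item()
        total += labels.numel()
    return wrong / max(total, 1)          # pixel-wise error rate

def train_improved_model(model, train_loaders, val_loaders,
                         preset=0.05, max_epochs=100):
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(max_epochs):
        for window_id, loader in train_loaders.items():   # first/second/third
            for images, labels in loader:
                opt.zero_grad()
                loss = loss_fn(model(images, window=window_id), labels)
                loss.backward()
                opt.step()
        errors = {w: validate(model, v, w) for w, v in val_loaders.items()}
        if all(e < preset for e in errors.values()):      # all three windows pass
            break
    return model
```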
S5, inputting the image to be detected into a trained improved Mask2Former network model to carry out garbage identification.
Specifically, fig. 6 is a schematic diagram of feature extraction based on the improved Mask2Former network model. The image to be detected is input into the trained improved Mask2Former network model, which divides the image into non-overlapping regions and independently performs the multi-head attention operation within each feature extraction window. The feature extraction windows of different scales are connected in parallel and each extracts features from the image to be detected; the feature maps of the multiple receptive fields are then fused to obtain the garbage identification result for the coastal zone garbage enrichment area.
In the embodiment of the invention, an improved network model based on Mask2Former is provided: a Swin Transformer layer is selected as the backbone layer of the Mask2Former network model and is further improved by integrating multiple shifted windows of different scales, and the acquired data sets are classified by scale, so that the network model can adapt to targets of different pixel sizes through feature extraction windows of different scales. This avoids wasted computation and overfitting, extracts coastal zone garbage features more accurately, and realizes accurate segmentation of the coastal zone garbage enrichment area.
Further, the garbage identification effect of the garbage enrichment area of the coastal zone of the improved Mask2Former network model and other network models is evaluated and verified.
The evaluation indexes include the intersection over union (IoU) and the accuracy (Acc). IoU evaluates the percentage of overlap between the ground-truth region of each category and the region classified by the network model, and is calculated as:

$$IoU = \frac{Intersection}{Union}$$

wherein Intersection is the number of pixels common to the ground-truth region and the classified region of each class, and Union is the total number of pixels covered by either the ground-truth region or the classified region. IoU ranges from 0 to 1, where 0 means no overlap and 1 means a completely overlapping segmentation.
The accuracy Acc is calculated as:

$$Acc = \frac{TP + TN}{TP + FP + FN + TN}$$

wherein Acc represents the accuracy, TP represents the number of actual positives predicted as positive, FP the number of actual negatives predicted as positive, FN the number of actual positives predicted as negative, and TN the number of actual negatives predicted as negative.
Illustratively, taking garbage as a positive category, TP represents the number of targets labeled garbage predicted as garbage, FP represents the number of targets labeled non-garbage predicted as garbage, FN represents the number of targets labeled garbage predicted as non-garbage, and TN represents the number of targets labeled non-garbage predicted as non-garbage.
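Both metrics can be computed per class from label maps as in the following sketch (the array layout and the handling of an empty union are assumptions):

```python
# Per-class IoU and Acc from predicted / ground-truth label maps
# (integer numpy arrays of class ids with identical shapes).
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray, cls: int) -> float:
    p, g = pred == cls, gt == cls
    inter = np.logical_and(p, g).sum()        # Intersection
    union = np.logical_or(p, g).sum()         # Union (pixels in either region)
    return float(inter / union) if union else 1.0

def acc(pred: np.ndarray, gt: np.ndarray, cls: int) -> float:
    p, g = pred == cls, gt == cls
    tp = np.logical_and(p, g).sum()           # TP
    tn = np.logical_and(~p, ~g).sum()         # TN
    return float((tp + tn) / pred.size)       # (TP+TN)/(TP+FP+FN+TN)
```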
The validation sets are identified using the HRNet network model and Mask2Former models with different backbone layers; the recognition results are compared and evaluated as shown in Table 1:
TABLE 1 semantic segmentation results of different models on a validation set
In Table 1, mIoU is the mean intersection over union, that is, the average of the IoU values over all categories, and mAcc is the mean accuracy, that is, the average of the accuracy over all categories. Mask2Former (Swin-T-5) is the network model in which a Swin Transformer layer is selected as the backbone layer of the Mask2Former network model and the feature extraction window size is set to 5×5; Mask2Former (Swin-T-9) is the same but with the feature extraction window size set to 9×9; Mask2Former (Swin-S-5) is the network model with more Swin modules selected as the backbone layer and the feature extraction window size set to 5×5; Mask2Former (Three-Swin-T-5,7,9) is the improved Mask2Former network model of this scheme with three feature extraction windows of different scales, whose sizes are set to 5×5, 7×7 and 9×9 respectively.
According to the evaluation results in Table 1, the improved Mask2Former network model provided by this scheme achieves the best recognition precision on the validation sets, indicating that it can extract coastal zone garbage features more accurately and realize accurate segmentation of the coastal zone garbage enrichment area.
Fig. 7 is a schematic structural diagram of a garbage recognition system for a coastal zone garbage enrichment area based on deep learning according to an embodiment of the present invention; the system is used to execute the garbage recognition method for a coastal zone garbage enrichment area based on deep learning of any one of the above embodiments. As shown in fig. 7, the system includes:
the image acquisition module is used for acquiring an original image of the garbage enrichment area of the coastal zone.
Specifically, the image acquisition module may include an unmanned aerial vehicle, and is configured to take a photograph of the coastal zone garbage enrichment region, and obtain an original image of the coastal zone garbage enrichment region.
The image processing module is used for preprocessing the original image and dividing the preprocessed original image into N data sets with different scales.
The network model building module is used for establishing an improved Mask2Former network model by taking an improved Swin Transformer layer as the backbone layer of the Mask2Former network model, based on the Mask2Former network model; the improved Swin Transformer layer comprises M feature extraction windows of different scales, wherein M=N; and for training the improved Mask2Former network model with the N data sets of different scales to obtain the trained improved Mask2Former network model.
The image recognition module is used for inputting the image to be detected into the trained improved Mask2Former network model to carry out garbage recognition.
In the embodiment of the invention, the network model building module, based on the Mask2Former network model, selects a Swin Transformer layer as the backbone layer of the Mask2Former network model and further improves it by integrating multiple shifted windows of different scales into the Swin Transformer layer; the collected data sets are classified by scale through the image processing module, so that the network model can adapt to targets of different pixel sizes through feature extraction windows of different scales, avoiding wasted computation and overfitting, extracting coastal zone garbage features more accurately, and realizing accurate segmentation of the coastal zone garbage enrichment area.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application. As used in this specification, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, or apparatus that includes the element.
It should also be noted that the positional or positional relationship indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the positional or positional relationship shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or element in question must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Unless specifically stated or limited otherwise, the terms "mounted," "connected," and the like are to be construed broadly and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the essence of the corresponding technical solutions from the technical solutions of the embodiments of the present invention.

Claims (5)

1. The garbage identification method for the coastal zone garbage enrichment zone based on deep learning is characterized by comprising the following steps of:
S1, acquiring an original image of a garbage enrichment area of a coastal zone;
S2, preprocessing the original image, and dividing the preprocessed original image into N data sets with different scales;
Wherein N is 3;
taking the data groups whose real-label category is biology as a first data set;
taking the data groups whose real-label categories are garbage and vegetation as a second data set;
taking the data groups whose real-label categories are ocean and beach as a third data set;
S3, based on a Mask2Former network model, using an improved Swin Transformer layer as the backbone layer of the Mask2Former network model, and establishing an improved Mask2Former network model; wherein the improved Swin Transformer layer comprises M feature extraction windows of different scales, and M=N;
The M feature extraction windows with different scales comprise:
taking a feature extraction window with a scale within a first scale range as a first window;
taking a feature extraction window with the scale within a second scale range as a second window;
taking a feature extraction window with the scale within a third scale range as a third window;
given an input feature map Y^{L-1}, normalization is performed by the LN layer, and the normalized feature map passes through the three-window multi-head attention mechanism module to obtain the feature map Y^L; the feature extraction formulas are:

$$Y^{L} = \mathrm{TW\text{-}MSA}\big(\mathrm{LN}(Y^{L-1})\big) + Y^{L-1}$$

$$Y^{L+1} = \mathrm{MLP}\big(\mathrm{LN}(Y^{L})\big) + Y^{L}$$

wherein LN is the normalization layer, MLP is the multi-layer perceptron, TW-MSA is the three-window multi-head attention mechanism module, Y^{L-1} is the input feature map, Y^L is the feature map obtained from Y^{L-1} through the normalization layer and the three-window multi-head attention mechanism module, and Y^{L+1} is the feature map obtained from Y^L through the normalization layer and the multi-layer perceptron;
the three-window multi-head attention mechanism module is an improvement on the window multi-head attention mechanism module, which restricts the scope of the self-attention operation to a regular window; the feature extraction windows of different scales are described by different superscripts in the formula, and the three-window feature extraction formulas are:

$$Z^{L-1} = \mathrm{LN}(Y^{L-1})$$

$$Z^{L} = \mathrm{TW\text{-}MSA}(Z^{L-1}) = \mathrm{W\text{-}MSA}^{w_1}(Z^{L-1}) \oplus \mathrm{W\text{-}MSA}^{w_2}(Z^{L-1}) \oplus \mathrm{W\text{-}MSA}^{w_3}(Z^{L-1})$$

wherein Y^{L-1} is the input feature map, Z^{L-1} is the second feature map obtained by normalizing Y^{L-1}, and Z^L is the third feature map obtained by passing Z^{L-1} through the TW-MSA module, which consists of three W-MSA modules; W-MSA is the window multi-head attention mechanism module, the superscripts w_1, w_2 and w_3 respectively represent the three feature extraction windows of different scales, and ⊕ denotes the fusion of the parallel branch outputs; the three windows of the TW-MSA module are arranged in parallel, and the three windows of the three-scale window movement mechanism module STW-MSA are likewise arranged in parallel;
S4, training the improved Mask2Former network model by utilizing the N data sets with different scales to obtain a trained improved Mask2Former network model;
S41, inputting the data groups in the training sets into the improved Mask2Former network model for training; S41 specifically comprises:
S411, training the first window in the improved Mask2Former network model through the data groups in the training set of the first data set;
S412, training the second window in the improved Mask2Former network model through the data groups in the training set of the second data set;
S413, training the third window in the improved Mask2Former network model through the data groups in the training set of the third data set;
S42, comparing the output of the improved Mask2Former network model with the data groups in the validation set of the corresponding data set, and calculating whether the error value is smaller than a preset value;
S43, if the error value is smaller than a preset value, training the improved Mask2Former network model is completed, and a trained improved Mask2Former network model is obtained; if the error value is not smaller than the preset value, continuing to train the improved Mask2Former network model;
S5, inputting the image to be detected into a trained improved Mask2Former network model to carry out garbage identification.
2. The deep learning-based garbage identification method for a coastal zone garbage enrichment area according to claim 1, wherein S2, preprocessing the original image, dividing the preprocessed original image into N data sets of different scales, and dividing the data set of each scale into a training set and a validation set, comprises:
S21, performing image cutting processing on the original image, performing labeling processing on the original image after image cutting, labeling a corresponding real label of the original image after image cutting, and taking the original image after image cutting and the corresponding real label as a group of data sets;
S22, dividing the data groups into N data sets of different scales according to the category of the real label.
3. The deep learning-based garbage identification method for a coastal zone garbage enrichment area according to claim 2, wherein in S22 the data groups are divided into N data sets of different scales according to the category of the real label, and the categories of the real labels include: ocean, beach, garbage, vegetation, biology and background.
4. The deep learning based coastal zone waste identification method of claim 3, further comprising, after S22:
S23, dividing the data in the data sets of the N scales into a training set and a validation set respectively according to a preset proportion.
5. A deep learning-based garbage identification system for a garbage enrichment zone of a coastal zone for performing the deep learning-based garbage identification method of the garbage enrichment zone of the coastal zone as claimed in any one of the preceding claims 1 to 4, characterized in that the system comprises:
the image acquisition module is used for acquiring an original image of the garbage enrichment area of the coastal zone;
the image processing module is used for preprocessing the original image and dividing the preprocessed original image into N data sets with different scales;
the network model building module is used for establishing an improved Mask2Former network model by taking an improved Swin Transformer layer as the backbone layer of the Mask2Former network model, based on the Mask2Former network model; wherein the improved Swin Transformer layer comprises M feature extraction windows of different scales, and M=N; and for training the improved Mask2Former network model with the N data sets of different scales to obtain a trained improved Mask2Former network model;
the image recognition module is used for inputting the image to be detected into the trained improved Mask2Former network model to carry out garbage recognition.
CN202410195184.5A 2024-02-22 2024-02-22 Garbage identification method and system for garbage enrichment area of coastal zone based on deep learning Active CN117765482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410195184.5A CN117765482B (en) 2024-02-22 2024-02-22 Garbage identification method and system for garbage enrichment area of coastal zone based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410195184.5A CN117765482B (en) 2024-02-22 2024-02-22 Garbage identification method and system for garbage enrichment area of coastal zone based on deep learning

Publications (2)

Publication Number Publication Date
CN117765482A CN117765482A (en) 2024-03-26
CN117765482B true CN117765482B (en) 2024-05-14

Family

ID=90314751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410195184.5A Active CN117765482B (en) 2024-02-22 2024-02-22 Garbage identification method and system for garbage enrichment area of coastal zone based on deep learning

Country Status (1)

Country Link
CN (1) CN117765482B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259809B (en) * 2020-01-17 2021-08-17 五邑大学 Unmanned aerial vehicle coastline floating garbage inspection system based on DANet

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023092813A1 (en) * 2021-11-25 2023-06-01 苏州大学 Swin-transformer image denoising method and system based on channel attention
WO2023154320A1 (en) * 2022-02-08 2023-08-17 Senem Velipasalar Thermal anomaly identification on building envelopes as well as image classification and object detection
CN114648707A (en) * 2022-03-22 2022-06-21 交通运输部天津水运工程科学研究所 Coastline typical garbage rapid positioning and checking method based on unmanned aerial vehicle aerial photography technology
CN115205781A (en) * 2022-06-23 2022-10-18 成都民航空管科技发展有限公司 Transformer-based trans-scale target detection method and system
CN115272887A (en) * 2022-07-20 2022-11-01 广东工业大学 Coastal zone garbage identification method, device and equipment based on unmanned aerial vehicle detection
CN115661507A (en) * 2022-09-22 2023-01-31 北京建筑大学 Building garbage classification method and device based on optimized Swin Transformer network
CN115470828A (en) * 2022-09-23 2022-12-13 华东师范大学 Multi-lead electrocardiogram classification and identification method based on convolution and self-attention mechanism
CN116168197A (en) * 2023-01-28 2023-05-26 北京交通大学 Image segmentation method based on Transformer segmentation network and regularization training
CN117392638A (en) * 2023-10-13 2024-01-12 苏州煋海图科技有限公司 Open object class sensing method and device for serving robot scene
CN117523403A (en) * 2023-11-29 2024-02-06 中国农业科学院农业信息研究所 Method, system, equipment and medium for detecting spot change of residence map

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Masked-attention Mask Transformer for Universal Image Segmentation; Bowen Cheng et al.; arXiv; 2022-06-15; full text *
T-RODNet: Transformer for Vehicular Millimeter-Wave Radar Object Detection; Tiezhen Jiang et al.; IEEE Transactions on Instrumentation and Measurement; 2023-12-31; full text *
Household garbage classification*** based on Swin Transformer; 瞿定垚 et al.; 《电子制作》 (Electronic Production); 2023-01-31; full text *

Also Published As

Publication number Publication date
CN117765482A (en) 2024-03-26

Similar Documents

Publication Publication Date Title
CN112270347B (en) Medical waste classification detection method based on improved SSD
CN110033002B (en) License plate detection method based on multitask cascade convolution neural network
CN111368690B (en) Deep learning-based video image ship detection method and system under influence of sea waves
CN110796009A (en) Method and system for detecting marine vessel based on multi-scale convolution neural network model
CN109685765B (en) X-ray film pneumonia result prediction device based on convolutional neural network
CN112819748B (en) Training method and device for strip steel surface defect recognition model
CN114255403A (en) Optical remote sensing image data processing method and system based on deep learning
CN116385958A (en) Edge intelligent detection method for power grid inspection and monitoring
CN113887472A (en) Remote sensing image cloud detection method based on cascade color and texture feature attention
CN115937659A (en) Mask-RCNN-based multi-target detection method in indoor complex environment
CN113052215A (en) Sonar image automatic target identification method based on neural network visualization
CN115661649A (en) Ship-borne microwave radar image oil spill detection method and system based on BP neural network
CN112949510A (en) Human detection method based on fast R-CNN thermal infrared image
CN111539931A (en) Appearance abnormity detection method based on convolutional neural network and boundary limit optimization
CN114882204A (en) Automatic ship name recognition method
CN113850151A (en) Method, device, terminal and storage medium for identifying distraction behavior of driver
CN113327253A (en) Weak and small target detection method based on satellite-borne infrared remote sensing image
KR101334858B1 (en) Automatic butterfly species identification system and method, and portable terminal having automatic butterfly species identification function using the same
CN117765482B (en) Garbage identification method and system for garbage enrichment area of coastal zone based on deep learning
CN117333669A (en) Remote sensing image semantic segmentation method, system and equipment based on useful information guidance
CN116844055A (en) Lightweight SAR ship detection method and system
CN113947780B (en) Sika face recognition method based on improved convolutional neural network
CN115546668A (en) Marine organism detection method and device and unmanned aerial vehicle
CN114927236A (en) Detection method and system for multiple target images
CN113095265A (en) Fungal target detection method based on feature fusion and attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant