CN113887517B - Crop remote sensing image semantic segmentation method based on parallel attention mechanism


Info

Publication number: CN113887517B (granted patent); earlier published as application CN113887517A
Application number: CN202111272099.7A, filed 2021-10-29 (priority date 2021-10-29)
Authority: CN (China)
Legal status: Active (granted)
Other languages: Chinese (zh)
Inventors: 董荣胜, 马雨琪, 刘意, 李凤英
Original and current assignee: Guilin University of Electronic Technology

Classifications

    • G06F18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 — Classification based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/2431 — Classification techniques relating to the number of classes; multiple classes
    • G06N3/045 — Neural network architectures; combinations of networks
    • G06N3/048 — Activation functions
    • G06N3/08 — Learning methods
    • Y02A40/10 — Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture


Abstract

The invention discloses a crop remote sensing image semantic segmentation method based on a parallel attention mechanism. The method first preprocesses the crop remote sensing images of a crop remote sensing image dataset; a parallel-attention-based semantic segmentation network for crop remote sensing images is then built and trained on the preprocessed dataset; finally, the crop remote sensing image to be semantically segmented is fed into the trained network, which produces an accurate segmentation result. The semantic segmentation network constructed in the invention comprehensively addresses the inaccurate boundary segmentation in existing crop remote sensing images caused by large intra-class differences, small inter-class differences, complex and diverse ground-feature information, and abundant interference information, and improves the performance of crop remote sensing image semantic segmentation networks.

Description

Crop remote sensing image semantic segmentation method based on parallel attention mechanism
Technical Field
The invention relates to the technical field of crop remote sensing image semantic segmentation, in particular to a crop remote sensing image semantic segmentation method based on a parallel attention mechanism.
Background
A high-resolution crop remote sensing image typically contains rich detail and the distribution characteristics of ground objects such as buildings, trees and crops. Semantic segmentation of crop remote sensing images aims to classify such an image at the pixel level and partition it into regions with different semantic labels. It makes it possible to obtain the area and distribution of crops accurately and rapidly, and is of great significance for monitoring crop area, growth or disasters, identifying crop types, estimating crop yield, and the like.
In the past few decades, many scholars at home and abroad have studied image segmentation techniques. Traditional image segmentation methods include threshold segmentation, region segmentation and edge detection; they can only segment objects in simple scenes, and for massive remote sensing images with complicated ground features they are time-consuming and their results unsatisfactory. In recent years, with the appearance of large-scale datasets, deep learning has become increasingly advantageous in the field of remote sensing image semantic segmentation. Deep-learning-based semantic segmentation classifies each pixel and, compared with traditional methods, balances speed and accuracy. In addition, some researchers introduce attention mechanisms into segmentation networks to capture long-range correlations among pixels and extract the more important parts of the global information, achieving a better segmentation effect. Hu et al. use the SE (Squeeze-and-Excitation) attention module to learn correlations between channel features and assign a different weight to each channel, emphasizing useful channel features and suppressing irrelevant ones. The CBAM (Convolutional Block Attention Module) proposed by Woo et al. combines spatial and channel attention mechanisms, covering richer image features. The CA (Coordinate Attention) module of Hou et al. embeds position information into channel attention to obtain cross-channel direction and position information, enabling the network to locate and identify the target area more accurately.
Compared with the semantic segmentation of natural images, the semantic segmentation of crop remote sensing images mainly faces two challenges. 1. Because different crops can look alike and are photographed from different angles, crop remote sensing images exhibit large intra-class differences and high inter-class similarity. For example, the original image and label in fig. 1 (a) show corn; because corn resembles coix seed in appearance, the network misclassifies the corn as coix seed, and the visualization shows coix seed. In fig. 1 (b), the original image and label show coix seed, which is misclassified as corn and visualized as corn. The same crop can also differ greatly in shape and appearance, so the network may misclassify it as another crop: in fig. 1 (c), the original image and label show coix seed, and the network misclassifies part of it as flue-cured tobacco. 2. Crops on agricultural land are generally adjacent to one another, the ground-feature information within the land is complex and diverse, and interference is abundant, so the boundaries between adjacent crops are segmented inaccurately, as shown in fig. 1 (d). However, most existing semantic segmentation methods target natural images, and research on semantic segmentation of crop images is scarce, so developing a semantic segmentation method suited to crop remote sensing images is urgent.
Disclosure of Invention
The invention aims to solve problems such as the high similarity between some crops, the large intra-class variability, and the unclear boundaries between adjacent crops in crop remote sensing images, and provides a crop remote sensing image semantic segmentation method based on a parallel attention mechanism.
In order to solve the problems, the invention is realized by the following technical scheme:
the crop remote sensing image semantic segmentation method based on the parallel attention mechanism comprises the following steps:
step 1, acquiring a crop remote sensing image dataset, and preprocessing a crop remote sensing image of the crop remote sensing image dataset to obtain a preprocessed crop remote sensing image dataset;
step 2, building a crop remote sensing image semantic segmentation network based on parallel attention;
the crop remote sensing image semantic segmentation network based on parallel attention consists of an input layer, an initial module, 4 residual modules, 6 upsampling modules, 3 adding modules, 3 CA attention modules and an output layer; the output end of the input layer is connected with the input end of the initial module, the output end of the initial module is connected with the input end of the first residual module, the output end of the first residual module is connected with the input end of the second residual module, one output end of the second residual module is connected with the input end of the third residual module, and one output end of the third residual module is connected with the input end of the fourth residual module; the output end of the fourth residual module is connected with the input end of the first upsampling module, one output end of the first upsampling module is connected with one input end of the first adding module, and the other output end of the first upsampling module is connected with the first CA attention module; the output end of the third residual module is connected with the other input end of the first adding module, the output end of the first adding module is connected with the input end of the second upsampling module, one output end of the second upsampling module is connected with one input end of the second adding module, and the other output end of the second upsampling module is connected with the second CA attention module; the output end of the second residual module is connected with the other input end of the second adding module, the output end of the second adding module is connected with the input end of the third upsampling module, and the output end of the third upsampling module is connected with the third CA attention module; the output ends of the first, second and third CA attention modules are connected with the input ends of the fourth, fifth and sixth upsampling modules respectively; the output ends of the fourth, fifth and sixth upsampling modules are connected with the three input ends of the third adding module; and the output end of the third adding module is connected with the input end of the output layer;
step 3, training the crop remote sensing image semantic segmentation network based on the parallel attention constructed in the step 2 by utilizing the preprocessed crop remote sensing image dataset obtained in the step 1 to obtain a trained crop remote sensing image semantic segmentation network based on the parallel attention;
and step 4, sending the crop remote sensing image to be semantically segmented into the trained crop remote sensing image semantic segmentation network based on the parallel attention obtained in the step 3 for semantic segmentation, so as to obtain an accurate segmentation result of the crop remote sensing image to be semantically segmented.
In the crop remote sensing image semantic segmentation network based on parallel attention, the initial module consists of a convolution layer and a pooling layer; the input end of the convolution layer forms the input end of the initial module, the output end of the convolution layer is connected with the input end of the pooling layer, and the output end of the pooling layer forms the output end of the initial module.
In the crop remote sensing image semantic segmentation network based on parallel attention, the first residual module consists of 3 residual layers, the second residual module of 4 residual layers, the third residual module of 6 residual layers, and the fourth residual module of 3 residual layers. For each residual module, all residual layers are sequentially connected in series; the input end of the first residual layer forms the input end of the residual module, and the output end of the last residual layer forms the output end of the residual module.
In the crop remote sensing image semantic segmentation network based on parallel attention, a residual layer consists of 2 convolution layers, 2 batch normalization layers, a ReLU activation function layer and an addition layer. The input end of the first convolution layer forms the input end of the residual layer and is also connected with one input end of the addition layer; the output end of the first convolution layer is connected with the input end of the first batch normalization layer, the output end of the first batch normalization layer is connected with the input end of the ReLU activation function layer, the output end of the ReLU activation function layer is connected with the input end of the second convolution layer, the output end of the second convolution layer is connected with the input end of the second batch normalization layer, and the output end of the second batch normalization layer is connected with the other input end of the addition layer; the output end of the addition layer forms the output end of the residual layer.
In the crop remote sensing image semantic segmentation network based on parallel attention, a CA attention module consists of 4 convolution layers, 2 average pooling layers, a splicing layer, a batch normalization layer, a ReLU activation function layer, 2 sigmoid activation function layers and a multiplication layer. The input end of the first convolution layer forms the input end of the CA attention module; one output end of the first convolution layer is connected with one input end of the multiplication layer, another output end is connected with the input end of the first average pooling layer, and another output end is connected with the input end of the second average pooling layer. The output ends of the first and second average pooling layers are connected with the two input ends of the splicing layer, the output end of the splicing layer is connected with the input end of the second convolution layer, the output end of the second convolution layer is connected with the input end of the batch normalization layer, and the output end of the batch normalization layer is connected with the input end of the ReLU activation function layer. One output end of the ReLU activation function layer is connected with the input end of the first sigmoid activation function layer through the third convolution layer, and the other output end is connected with the input end of the second sigmoid activation function layer through the fourth convolution layer. The output ends of the first and second sigmoid activation function layers are connected with the other two input ends of the multiplication layer; the output end of the multiplication layer forms the output end of the CA attention module.
Compared with the prior art, the method comprehensively addresses the inaccurate boundary segmentation in existing crop remote sensing images caused by large intra-class differences, small inter-class differences, complex and diverse ground-feature information and abundant interference information, and improves the performance of the crop remote sensing image semantic segmentation network.
Drawings
FIG. 1 shows visual examples: (a) and (b) show misclassification between similar classes, (c) shows misclassification of the same class caused by large intra-class differences, and (d) shows boundary ambiguity.
Fig. 2 is a flow chart of a crop remote sensing image semantic segmentation method based on a parallel attention mechanism.
Fig. 3 is a schematic diagram of the overall structure of a crop remote sensing image semantic segmentation network based on parallel attention.
Fig. 4 is a schematic diagram of the structure of the initial module.
Fig. 5 is a schematic diagram of the structure of the residual modules: (a) the first residual module, (b) the second residual module, (c) the third residual module, (d) the fourth residual module, and (e) a residual layer.
Fig. 6 is a schematic diagram of the structure of the CA attention module.
FIG. 7 is a graph showing the comparison of the segmentation results of different methods.
Detailed Description
The present invention will be further described in detail with reference to specific examples in order to make the objects, technical solutions and advantages of the present invention more apparent.
The invention provides a crop remote sensing image semantic segmentation method based on a parallel attention mechanism which, as shown in fig. 2, comprises the following steps:
step 1: the method comprises the steps of obtaining a crop remote sensing image dataset, and preprocessing a crop remote sensing image of the crop remote sensing image dataset to obtain a preprocessed crop remote sensing image dataset.
Preprocessing the crop remote sensing images comprises image cropping and data enhancement: first, the images are cropped with a sliding window into 512 × 512-pixel sub-images, and sub-images whose invalid area exceeds 7/8 are filtered out; then arbitrary data enhancement operations such as horizontal flipping, vertical flipping, scaling, brightness adjustment and contrast adjustment are applied to the cropped sub-images and their labels.
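A minimal sketch of this preprocessing step, in Python with numpy, is given below; the 512 × 512 window and the 7/8 threshold come from the text, while the window stride and the invalid-pixel value 0 are illustrative assumptions:

    import numpy as np

    def sliding_window_crop(image, label, size=512, stride=512, invalid_value=0):
        """Crop image/label into size x size sub-images, filtering out tiles
        whose invalid area exceeds 7/8 of the tile (threshold from the text)."""
        tiles = []
        h, w = label.shape[:2]
        for y in range(0, h - size + 1, stride):
            for x in range(0, w - size + 1, stride):
                lab = label[y:y + size, x:x + size]
                if np.mean(lab == invalid_value) > 7 / 8:
                    continue  # mostly invalid area: skip this tile
                tiles.append((image[y:y + size, x:x + size], lab))
        return tiles

    def augment(img, lab, rng=np.random):
        """Random horizontal/vertical flips applied jointly to a sub-image and
        its label; scaling, brightness and contrast jitter work analogously."""
        if rng.rand() < 0.5:
            img, lab = img[:, ::-1], lab[:, ::-1]  # horizontal flip
        if rng.rand() < 0.5:
            img, lab = img[::-1, :], lab[::-1, :]  # vertical flip
        return img.copy(), lab.copy()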
The present embodiment takes as its example the crop remote sensing dataset provided by the 2019 County Agricultural Brain AI Challenge; the ground-object categories of the dataset comprise 5 classes: corn, coix seed, flue-cured tobacco, artificial buildings, and others.
Step 2: and (3) building a crop remote sensing image semantic segmentation network based on parallel attention, and setting network parameters.
The crop remote sensing image semantic segmentation network based on parallel attention, as shown in fig. 3, comprises a trunk structure, a feature pyramid structure, a parallel attention structure and an upsampling structure.
1) Trunk structure
In the trunk structure, the output end of the input layer is connected with the input end of the initial module, the output end of the initial module is connected with the input end of the first residual module, the output end of the first residual module is connected with the input end of the second residual module, one output end of the second residual module is connected with the input end of the third residual module, and one output end of the third residual module is connected with the input end of the fourth residual module.
Referring to fig. 4, the initial module consists of 1 convolution layer (kernel 7 × 7, stride 2) and 1 pooling layer (kernel 2 × 2, stride 2); the channel number of the initial module is 64, and it is mainly used to raise the channel dimension.
Referring to fig. 5, the structures of the 4 residual modules are respectively: the first residual module is formed by connecting 3 residual layers in series (fig. 5 (a)), the second by 4 residual layers (fig. 5 (b)), the third by 6 residual layers (fig. 5 (c)), and the fourth by 3 residual layers (fig. 5 (d)). For each residual module, the input end of the first residual layer forms the input end of the residual module and the output end of the last residual layer forms the output end of the residual module. The residual layers of all residual modules share the same main structure; the only differences are the size of the input and output feature planes and the number of channels. The channel number C of the residual layers in the first, second, third and fourth residual modules is 64, 128, 256 and 512 respectively. As shown in fig. 5 (e), the residual layers are used for feature extraction; each consists of 2 convolution layers (kernel 3 × 3, stride 1), 2 batch normalization layers, 1 ReLU activation function layer and 1 addition layer. The input end of the first convolution layer forms the input end of the residual layer and is also connected with one input end of the addition layer; the output end of the first convolution layer is connected with the input end of the first batch normalization layer, the output end of the first batch normalization layer is connected with the input end of the ReLU activation function layer, the output end of the ReLU activation function layer is connected with the input end of the second convolution layer, the output end of the second convolution layer is connected with the input end of the second batch normalization layer, and the output end of the second batch normalization layer is connected with the other input end of the addition layer; the output end of the addition layer forms the output end of the residual layer. The activation function is ReLU(x) = max(0, x), where x is the output of the first batch normalization layer.
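As a sketch, one residual layer can be written in PyTorch roughly as follows; the identity shortcut is shown, while the downsampling/projection used when the feature-plane size or channel number changes between modules is not spelled out in the text and is omitted here:

    import torch.nn as nn

    class ResidualLayer(nn.Module):
        """2 convolutions (kernel 3x3, stride 1), 2 batch normalization layers,
        1 ReLU and an addition layer, in the order described in the text."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)
            self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
            return out + x  # the addition layer fuses input and residual branch

A residual module is then simply a series of such layers, e.g. nn.Sequential(*[ResidualLayer(64) for _ in range(3)]) for the first module.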
2) Feature pyramid structure
In the feature pyramid structure, the output end of the fourth residual error module is connected with the input end of the first up-sampling module, one output end of the first up-sampling module is connected with one input end of the first adding module, and the other output end of the first up-sampling module is connected with the first CA attention module; the output end of the third residual error module is connected with the other input end of the first adding module, the output end of the first adding module is connected with the input end of the second up-sampling module, one output end of the second up-sampling module is connected with one input end of the second adding module, and the other output end of the second up-sampling module is connected with the second CA attention module; the output end of the second residual error module is connected with the other input end of the second addition module, the output end of the second addition module is connected with the input end of the third up-sampling module, and the output end of the third up-sampling module is connected with the third CA attention module.
The output of the fourth residual module is upsampled by a factor of 2 with bilinear interpolation and added to the output of the third residual module; the sum is upsampled by a factor of 2 and added to the output of the second residual module; and that sum is in turn upsampled by a factor of 2. The purpose of the additions is to fuse the features of different layers.
3) Parallel attention structure
In the parallel attention structure, the output end of the first CA attention module is connected with the input end of the fourth upsampling module; the output end of the second CA attention module is connected with the input end of the fifth upsampling module; the output end of the third CA attention module is connected with the input end of the sixth upsampling module.
The three feature maps output by the feature pyramid structure are fed into the CA attention modules. A CA attention module encodes spatial information along the vertical and horizontal directions and weights it over the channels to obtain cross-channel direction and position information, enabling the network to locate and identify the target area more accurately.
Referring to fig. 6, the CA attention module consists of 4 convolution layers, 2 average pooling layers, a splicing layer, a batch normalization layer, a ReLU activation function layer, 2 sigmoid activation function layers and a multiplication layer. The input end of the first convolution layer forms the input end of the CA attention module; one output end of the first convolution layer is connected with one input end of the multiplication layer, another output end is connected with the input end of the first average pooling layer, and another output end is connected with the input end of the second average pooling layer. The output ends of the first and second average pooling layers are connected with the two input ends of the splicing layer, the output end of the splicing layer is connected with the input end of the second convolution layer, the output end of the second convolution layer is connected with the input end of the batch normalization layer, and the output end of the batch normalization layer is connected with the input end of the ReLU activation function layer. One output end of the ReLU activation function layer is connected with the input end of the first sigmoid activation function layer through the third convolution layer, and the other output end is connected with the input end of the second sigmoid activation function layer through the fourth convolution layer. The output ends of the first and second sigmoid activation function layers are connected with the other two input ends of the multiplication layer; the output end of the multiplication layer forms the output end of the CA attention module.
Each CA attention module contains 9 layers of operations. The first layer is a 1 × 1 convolution layer that reduces the channel dimension to 5; it can be regarded as a classifier that maps the global features into 5 channels corresponding one-to-one to the classification categories, so that each channel can represent the features of one category. The second layer is an average pooling layer that compresses the H and W directions of each feature map to 1, giving two feature maps of sizes C × H × 1 and C × 1 × W. The third layer is a splicing layer that concatenates the two maps along the spatial direction into a C × 1 × (H + W) feature map. The fourth layer is a 1 × 1 convolution layer used to reduce the dimension. The fifth layer is a batch normalization layer, which accelerates convergence and prevents overfitting. The sixth layer is the activation function ReLU(x) = max(0, x), where x is the output feature of the fifth layer. The seventh layer is a convolution stage: the output feature map of the sixth layer is divided (split) back into the two directions, and each part passes through a 1 × 1 convolution that raises the dimension again. The eighth layer is the sigmoid activation function, computed as sigmoid(y) = 1 / (1 + e^(−y)), where y is the output feature of the seventh layer. The ninth layer multiplies the feature maps on the three branches.
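A hedged PyTorch sketch of this module follows; the width mid of the fourth-layer dimension-reducing convolution is an assumption, since the text does not specify it:

    import torch
    import torch.nn as nn

    class CAAttention(nn.Module):
        """The 9-layer CA attention module described above (a sketch)."""
        def __init__(self, in_channels, num_classes=5, mid=8):
            super().__init__()
            self.conv1 = nn.Conv2d(in_channels, num_classes, 1)  # layer 1: map to 5 class channels
            self.pool_h = nn.AdaptiveAvgPool2d((None, 1))        # layer 2: C x H x 1
            self.pool_w = nn.AdaptiveAvgPool2d((1, None))        # layer 2: C x 1 x W
            self.conv2 = nn.Conv2d(num_classes, mid, 1)          # layer 4: dimension reduction
            self.bn = nn.BatchNorm2d(mid)                        # layer 5
            self.relu = nn.ReLU(inplace=True)                    # layer 6
            self.conv_h = nn.Conv2d(mid, num_classes, 1)         # layer 7: raise dimension back
            self.conv_w = nn.Conv2d(mid, num_classes, 1)

        def forward(self, x):
            x = self.conv1(x)
            n, c, h, w = x.shape
            ah = self.pool_h(x).permute(0, 1, 3, 2)              # C x 1 x H
            aw = self.pool_w(x)                                  # C x 1 x W
            y = torch.cat([ah, aw], dim=3)                       # layer 3: C x 1 x (H + W)
            y = self.relu(self.bn(self.conv2(y)))
            yh, yw = torch.split(y, [h, w], dim=3)               # layer 7: split the two directions
            att_h = torch.sigmoid(self.conv_h(yh)).permute(0, 1, 3, 2)  # layer 8: C x H x 1
            att_w = torch.sigmoid(self.conv_w(yw))                      # layer 8: C x 1 x W
            return x * att_h * att_w                             # layer 9: multiply the three branches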
4) Upsampling structure
In the upsampling structure, the output end of the fourth upsampling module is connected with one input end of the third adding module, the output ends of the fifth and sixth upsampling modules are connected with its other two input ends, and the output end of the third adding module is connected with the input end of the output layer.
The outputs of the fourth, fifth and sixth upsampling modules are added, and the result is upsampled by 2× bilinear interpolation to finally obtain an output map of 512 × 512 pixels.
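Putting the four structures together, the forward pass can be sketched as follows; the module names, the 1 × 1 channel-aligning convolution inside the upsampling module, and the per-branch scale factors are illustrative assumptions chosen so that the three branches reach a common resolution (the text itself specifies only 2× bilinear upsampling and the final 512 × 512 output):

    import torch.nn as nn
    import torch.nn.functional as F

    class UpsampleModule(nn.Module):
        """Bilinear upsampling; the 1x1 conv aligning channel numbers between
        pyramid levels is an assumption, not stated in the text."""
        def __init__(self, in_ch, out_ch, scale=2):
            super().__init__()
            self.scale = scale
            self.proj = nn.Conv2d(in_ch, out_ch, 1)

        def forward(self, x):
            x = F.interpolate(x, scale_factor=self.scale,
                              mode='bilinear', align_corners=False)
            return self.proj(x)

    def forward_sketch(net, x):
        """Data flow through the four structures; `net` is assumed to hold the
        modules named below."""
        # 1) trunk structure
        f = net.initial(x)                       # 7x7 conv + pool, 64 channels
        r1 = net.res1(f); r2 = net.res2(r1)
        r3 = net.res3(r2); r4 = net.res4(r3)
        # 2) feature pyramid structure: upsample, then add to fuse features
        u1 = net.up1(r4)                         # to the size of r3
        a1 = u1 + r3                             # first adding module
        u2 = net.up2(a1)                         # to the size of r2
        a2 = u2 + r2                             # second adding module
        u3 = net.up3(a2)
        # 3) parallel attention structure: CA on each pyramid branch
        c1, c2, c3 = net.ca1(u1), net.ca2(u2), net.ca3(u3)
        # 4) upsampling structure: bring the branches to a common size and add
        out = net.up4(c1) + net.up5(c2) + net.up6(c3)   # third adding module
        return F.interpolate(out, scale_factor=2, mode='bilinear',
                             align_corners=False)        # 512 x 512 output map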
After the network is built, initial parameters of the network need to be set. In this embodiment, the network parameters are: a batch size of 10 for all samples and 100 iterations; a weight decay of 0.0005 and an initial learning rate of 0.0005. Meanwhile, a poly decay strategy is introduced to adjust the learning rate, computed as:

lr = base_lr × (1 − epoch / num_epoch)^power

where lr is the learning rate of the current round, base_lr is the initial learning rate, epoch is the current iteration number, num_epoch is the maximum iteration number, and power is 0.9.
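For instance, the schedule can be computed per epoch with a small helper:

    def poly_lr(base_lr, epoch, num_epoch, power=0.9):
        """Poly decay: lr = base_lr * (1 - epoch / num_epoch) ** power."""
        return base_lr * (1 - epoch / num_epoch) ** power

    # With base_lr = 0.0005 and num_epoch = 100 (the values above):
    # poly_lr(0.0005, 0, 100)  -> 0.0005
    # poly_lr(0.0005, 50, 100) -> ~0.000268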
Step 3: training the crop remote sensing image semantic segmentation network based on parallel attention constructed in the step 2 by utilizing the preprocessed crop remote sensing image dataset obtained in the step 1, carrying out counter propagation by using a class balance loss function to update network parameters, and obtaining the trained crop remote sensing image semantic segmentation network based on parallel attention by optimizing the loss function through random gradient descent.
The class-balanced loss function is:

CB(p, y) = (1 − β) / (1 − β^(n_y)) · L(p, y)

where n_y is the number of samples labelled with class y (there are 5 classes here), p is the predicted class probability, and L(p, y) is the classification loss. β = 0 corresponds to no re-weighting, and β → 1 corresponds to re-weighting by inverse class frequency.
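A minimal PyTorch sketch under these definitions, using cross-entropy as the underlying loss L; the value of β and the normalization of the weights are illustrative assumptions, not taken from the text:

    import torch
    import torch.nn.functional as F

    def class_balanced_ce(logits, targets, samples_per_class, beta=0.9999):
        """Cross-entropy re-weighted per class by (1 - beta) / (1 - beta ** n_y),
        following the class-balanced formulation above."""
        n = torch.as_tensor(samples_per_class, dtype=torch.float,
                            device=logits.device)
        weights = (1.0 - beta) / (1.0 - beta ** n)   # per-class weight
        weights = weights / weights.sum() * len(n)   # normalize (optional)
        # logits: (N, C, H, W) segmentation scores; targets: (N, H, W) class ids
        return F.cross_entropy(logits, targets, weight=weights)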
Step 4: send the crop remote sensing image to be semantically segmented into the trained crop remote sensing image semantic segmentation network based on parallel attention obtained in step 3 for semantic segmentation, obtaining an accurate segmentation result of the crop remote sensing image to be semantically segmented.
The effect of the present invention is described below using the intersection over union (IoU) and the mean intersection over union (MIoU) as evaluation indexes for crop image segmentation.

The intersection over union IoU measures the overlap between the segmentation result and its ground truth:

IoU_i = p_ii / (Σ_j p_ij + Σ_j p_ji − p_ii)

The mean intersection over union MIoU is the standard measure for semantic segmentation:

MIoU = (1/k) Σ_{i=1..k} p_ii / (Σ_j p_ij + Σ_j p_ji − p_ii)

where the test dataset has k classes, p_ii is the number of pixels of class i predicted as class i, p_ij is the number of pixels of class i predicted as class j, and p_ji is the number of pixels of class j predicted as class i.
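Both measures can be computed from a confusion matrix, as in this sketch; pred and gt are assumed to be integer class maps with indices 0 .. k − 1:

    import numpy as np

    def confusion_matrix(pred, gt, k):
        """k x k matrix whose entry (i, j) counts pixels of true class i
        predicted as class j (p_ij in the formulas above)."""
        mask = (gt >= 0) & (gt < k)
        return np.bincount(k * gt[mask] + pred[mask],
                           minlength=k * k).reshape(k, k)

    def iou_per_class(cm):
        # IoU_i = p_ii / (sum_j p_ij + sum_j p_ji - p_ii)
        return np.diag(cm) / (cm.sum(1) + cm.sum(0) - np.diag(cm))

    def miou(cm):
        return np.nanmean(iou_per_class(cm))  # mean over classes, skipping empty ones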
Table 1 compares IoU and MIoU for the different methods. As can be seen from Table 1, the method of the invention improves on MIoU over the classical LinkNet, PSPNet, DeepLab V3+ and FPN networks: the proposed network reaches 65.94% MIoU, which is 4.47%, 1.91%, 1.62% and 0.68% higher than LinkNet, PSPNet, DeepLab V3+ and FPN respectively, and its IoU on corn is the best of all the methods.

Table 1. Comparison of IoU and MIoU for the different methods (the table itself is not reproduced here; in the original, the best value in each column is set in bold).
Fig. 7 compares the segmentation results of the different methods. As can be seen from fig. 7, compared with LinkNet, PSPNet, DeepLab V3+ and FPN, the segmentation result of the proposed method is closest to the real labels: it can distinguish different crops with similar appearance, identify the same crop despite large internal differences, and segment complete, clear boundaries.
It should be noted that although the examples described above are illustrative, they do not limit the invention, which is therefore not restricted to the specific embodiments described. Other embodiments that are apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein are considered to fall within the scope of the invention as claimed.

Claims (5)

1. The crop remote sensing image semantic segmentation method based on the parallel attention mechanism is characterized by comprising the following steps of:
step 1, acquiring a crop remote sensing image dataset, and preprocessing a crop remote sensing image of the crop remote sensing image dataset to obtain a preprocessed crop remote sensing image dataset;
step 2, building a crop remote sensing image semantic segmentation network based on parallel attention;
the crop remote sensing image semantic segmentation network based on parallel attention consists of an input layer, an initial module, 4 residual modules, 6 upsampling modules, 3 adding modules, 3 CA attention modules and an output layer;
the output end of the input layer is connected with the input end of the initial module, the output end of the initial module is connected with the input end of the first residual module, the output end of the first residual module is connected with the input end of the second residual module, one output end of the second residual module is connected with the input end of the third residual module, and one output end of the third residual module is connected with the input end of the fourth residual module;
the output end of the fourth residual module is connected with the input end of the first upsampling module, one output end of the first upsampling module is connected with one input end of the first adding module, and the other output end of the first upsampling module is connected with the first CA attention module; the output end of the third residual module is connected with the other input end of the first adding module, the output end of the first adding module is connected with the input end of the second upsampling module, one output end of the second upsampling module is connected with one input end of the second adding module, and the other output end of the second upsampling module is connected with the second CA attention module; the output end of the second residual module is connected with the other input end of the second adding module, the output end of the second adding module is connected with the input end of the third upsampling module, and the output end of the third upsampling module is connected with the third CA attention module;
the output end of the first CA attention module is connected with the input end of the fourth upsampling module; the output end of the second CA attention module is connected with the input end of the fifth upsampling module; the output end of the third CA attention module is connected with the input end of the sixth upsampling module;
the output end of the fourth upsampling module is connected with one input end of the third adding module, and the output ends of the fifth and sixth upsampling modules are connected with its other two input ends; the output end of the third adding module is connected with the input end of the output layer;
step 3, training the crop remote sensing image semantic segmentation network based on the parallel attention constructed in the step 2 by utilizing the preprocessed crop remote sensing image dataset obtained in the step 1 to obtain a trained crop remote sensing image semantic segmentation network based on the parallel attention;
and step 4, sending the crop remote sensing image to be semantically segmented into the trained crop remote sensing image semantic segmentation network based on the parallel attention obtained in the step 3 for semantic segmentation, so as to obtain an accurate segmentation result of the crop remote sensing image to be semantically segmented.
2. The crop remote sensing image semantic segmentation method based on the parallel attention mechanism as set forth in claim 1, wherein the initial module consists of a convolution layer and a pooling layer; the input end of the convolution layer forms the input end of the initial module, the output end of the convolution layer is connected with the input end of the pooling layer, and the output end of the pooling layer forms the output end of the initial module.
3. The crop remote sensing image semantic segmentation method based on the parallel attention mechanism as set forth in claim 1, wherein the first residual module consists of 3 residual layers, the second residual module of 4 residual layers, the third residual module of 6 residual layers, and the fourth residual module of 3 residual layers; for each residual module, all residual layers are sequentially connected in series, the input end of the first residual layer forms the input end of the residual module, and the output end of the last residual layer forms the output end of the residual module.
4. The crop remote sensing image semantic segmentation method based on the parallel attention mechanism according to claim 3, wherein a residual layer consists of 2 convolution layers, 2 batch normalization layers, a ReLU activation function layer and an addition layer; the input end of the first convolution layer forms the input end of the residual layer and is also connected with one input end of the addition layer; the output end of the first convolution layer is connected with the input end of the first batch normalization layer, the output end of the first batch normalization layer is connected with the input end of the ReLU activation function layer, the output end of the ReLU activation function layer is connected with the input end of the second convolution layer, the output end of the second convolution layer is connected with the input end of the second batch normalization layer, and the output end of the second batch normalization layer is connected with the other input end of the addition layer; the output end of the addition layer forms the output end of the residual layer.
5. The crop remote sensing image semantic segmentation method based on the parallel attention mechanism according to claim 1, wherein the CA attention module consists of 4 convolution layers, 2 average pooling layers, a splicing layer, a batch normalization layer, a ReLU activation function layer, 2 sigmoid activation function layers and a multiplication layer;
the input end of the first convolution layer forms the input end of the CA attention module; one output end of the first convolution layer is connected with one input end of the multiplication layer, another output end of the first convolution layer is connected with the input end of the first average pooling layer, and another output end of the first convolution layer is connected with the input end of the second average pooling layer; the output ends of the first and second average pooling layers are connected with the two input ends of the splicing layer, the output end of the splicing layer is connected with the input end of the second convolution layer, the output end of the second convolution layer is connected with the input end of the batch normalization layer, and the output end of the batch normalization layer is connected with the input end of the ReLU activation function layer; one output end of the ReLU activation function layer is connected with the input end of the first sigmoid activation function layer through the third convolution layer, and the other output end of the ReLU activation function layer is connected with the input end of the second sigmoid activation function layer through the fourth convolution layer; the output ends of the first and second sigmoid activation function layers are connected with the other two input ends of the multiplication layer; the output end of the multiplication layer forms the output end of the CA attention module.

Publications (2)

Publication number — Publication date
CN113887517A (application publication) — 2022-01-04
CN113887517B (granted patent) — 2024-04-09

Family

ID=79015020




Legal Events

Code — Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant