CN116091770A - Grape leaf lesion image segmentation method based on cross-resolution Transformer model - Google Patents

Info

Publication number
CN116091770A
CN116091770A CN202310045185.7A CN202310045185A CN116091770A CN 116091770 A CN116091770 A CN 116091770A CN 202310045185 A CN202310045185 A CN 202310045185A CN 116091770 A CN116091770 A CN 116091770A
Authority
CN
China
Prior art keywords
resolution
decoder
information
Transformer
attention mechanism
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310045185.7A
Other languages
Chinese (zh)
Inventor
穆维松
张馨心
张慧
徐子睿
邹彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University
Priority to CN202310045185.7A
Publication of CN116091770A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of agricultural information and relates in particular to a grape leaf lesion segmentation method based on a cross-resolution Transformer model, which addresses the segmentation of grape leaf lesions against complex backgrounds in natural field environments. Whereas earlier Transformer-based models pursued either high-resolution features or semantic information alone, the cross-resolution Transformer model retains both high-resolution features and strong semantic information. The model has an encoder-decoder structure: the encoder consists of 4 stages, with i multi-resolution Transformer blocks connected in parallel in the i-th stage, arranged in a pyramid shape; the decoder is a hamburger decoder. Each Transformer block includes a large-kernel mining attention mechanism, a multi-path feed-forward network, and a cross-resolution fusion strategy. Finally, the hamburger decoder fuses feature information of different scales and resolutions.

Description

Grape leaf lesion image segmentation method based on cross-resolution Transformer model
Technical Field
The invention belongs to the technical field of agricultural information and relates in particular to a grape leaf lesion image segmentation method based on a cross-resolution Transformer model.
Background
Plant leaf lesions have become a major obstacle to the development of grape agriculture, directly reducing yield and quality. Monitoring disease information and taking appropriate measures at an early stage of disease can effectively limit agricultural economic losses. Automatic segmentation is an important basis for plant disease detection and identification, so automatic segmentation of grape leaf lesions helps prevent disease spread. However, images of grape leaf disease taken in the field have complex backgrounds, the small diseased areas have rich edge textures, and the symptoms of different diseases look similar, all of which seriously reduce segmentation accuracy. The common segmentation approaches to this challenge are as follows:
(1) Convolutional neural network
Convolutional neural networks are widely used in the agricultural field, for example the DeepLab series, U-Net, and PSPNet. These architectures mainly aim to improve segmentation performance by increasing network depth or introducing residual learning. Although these networks have been very successful at extracting lesion features, several problems limit their performance: (1) the fixed convolution kernel constrains the size of the receptive field, so segmentation of natural images with complex backgrounds is poor; (2) local connectivity neglects the crucial interaction of long-range semantic information; (3) small diseased target areas are difficult to segment precisely. To solve these problems, it is necessary to extract local features while gaining a deep understanding of global information, and to explore more scene-level semantic information across the entire natural scene.
(2) Transformer
The Transformer has further advanced the vision field and has outperformed models based on convolutional neural networks in segmentation tasks. Thanks to the strong representational capability of its self-attention mechanism, the Transformer can explicitly model global context information when the input is a high-resolution natural image with a complex background. Some researchers have improved plant lesion segmentation by modifying the Transformer. In addition, many works extract low-resolution lesion features by introducing convolution operators, downsampling feature maps, employing pyramid hierarchies, and redesigning tokens. However, these models still follow a serial topology in which each stage outputs features at a single resolution scale, and they neglect the information interaction between different resolutions within the same stage, so they cannot generate high-quality segmented images. When identifying grape leaf lesions with complex backgrounds and small targets, segmentation performance drops and edge detail is lost. Empirically, high-resolution feature maps provide finer-grained information, especially for the edges of grape leaf lesions, while low-resolution feature maps typically contain stronger semantic representations, especially for small diseased target areas that are difficult to segment. For this reason, maintaining both high-resolution feature maps and deep semantic information is critical for handling segmentation tasks effectively.
(3) Attention mechanism
The goal of the attention mechanism is to focus the network on the small, most important part of the data by increasing the weight of some parts of the input and decreasing the weight of others. Attention mechanisms fall roughly into two branches, spatial attention and channel attention, which serve different functions. Spatial attention enhances the spatial representation of critical areas, while channel attention models the correlation between different channels. However, currently popular Transformer architectures ignore the importance of adaptability in the channel dimension. The challenges of grape leaf lesion image segmentation can be addressed by combining the advantages of both channel attention and spatial attention.
Disclosure of Invention
The invention aims to tailor a model to the grape leaf lesion image segmentation task, namely a cross-resolution Transformer model, to solve the problem of segmenting grape leaf lesion images with complex backgrounds in natural field environments. Whereas previous Transformer-based models pursued either high-resolution features or semantic information alone, the cross-resolution Transformer model retains both high-resolution features and strong semantic information.
In particular, the cross-resolution Transformer model has an "encoder-decoder" structure. The encoder consists of 4 stages, with i multi-resolution Transformer blocks connected in parallel in the i-th stage, arranged in a pyramid shape; the decoder is a hamburger decoder.
Following the design concept of parallel Transformers, the input is first downsampled with a CONV-BN-ReLU block to effectively extract the low-resolution feature map.
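The stage layout described above can be sketched as a small enumeration. This is only an illustration: the function name `encoder_layout` and the per-branch downsampling factors (4, 8, 16, 32, halving the resolution with each new branch) are assumptions that the text does not state; only the "4 stages, i parallel blocks in stage i" structure comes from the description.

```python
def encoder_layout(num_stages=4, base_stride=4):
    """Return, for each stage, the assumed downsampling factor of every parallel branch.

    Stage i holds i parallel multi-resolution blocks (from the description).
    The strides themselves (base_stride * 2**b) are a hypothetical choice.
    """
    return [
        [base_stride * 2 ** branch for branch in range(stage)]
        for stage in range(1, num_stages + 1)
    ]


layout = encoder_layout()
for stage, strides in enumerate(layout, start=1):
    # One line per stage: how many parallel blocks and at which assumed strides.
    print(f"stage {stage}: {len(strides)} block(s), strides {strides}")
```

Summing the branch counts gives 1 + 2 + 3 + 4 = 10 blocks in total, which is where the pyramid shape comes from.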
In particular, the network architecture of the multi-resolution Transformer block includes a large-kernel mining attention mechanism, a multi-path feed-forward network, and a cross-resolution fusion strategy.
Specifically, the convolution kernel size is set to 11×11 to enlarge the receptive field; the input data undergoes overlapping feature embedding and reshaping and is then passed into the large-kernel mining attention mechanism, and the output features of the attention mechanism are computed with the Hadamard operator.
Specifically, given input data X (formula image BDA0004055070260000031), where H×W is the number of input tokens and C is the channel feature dimension, the input tokens undergo overlapping feature embedding and reshaping and are then passed into the large-kernel mining attention mechanism. The similarity-based score matrix A is computed by a large-kernel depthwise convolution with kernel size k×k. In the present invention, the kernel size is set to k = 11 to effectively enlarge the receptive field. As in the self-attention mechanism, the value V is obtained by multiplying the embedded input by a weight matrix (formula image BDA0004055070260000041). In stage i, the details are formulated as follows:
$A = \mathrm{DWConv}_{k \times k}(W_i X)$
Figure BDA0004055070260000042
where the matrices shown in formula images BDA0004055070260000043 and BDA0004055070260000044 are weight matrices derived from linear projection, the matrix A represents the similarity or correlation between each pair of input tokens, and DWConv(·) denotes the depthwise separable convolution operation. The Hadamard (element-wise) product of A and V is taken as the attention output:
$\mathrm{Attention}(X) = A \odot V$
Compared with the self-attention mechanism, whose computation grows quadratically, the large-kernel convolution is fully convolutional, so its complexity and parameter count grow only linearly. The large-kernel mining attention mechanism enables effective information interaction between channels. When a high-resolution image such as a grape leaf disease image is used as input, the weight matrix generated by the mechanism, which encodes spatial and channel information, adapts to the input.
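The computation Attention(X) = A ⊙ V can be sketched in plain Python on nested lists. This is a minimal illustration under assumptions, not the patented implementation: the weight names `w_a` and `w_v`, the zero padding at the borders, and the H × W × C list layout are all hypothetical choices, and the exact projection used for V is given only as a formula image in the source.

```python
def linear(x, weight):
    """Per-token linear projection. x: H x W x C_in lists; weight: C_in x C_out."""
    c_in, c_out = len(weight), len(weight[0])
    return [[[sum(px[c] * weight[c][o] for c in range(c_in)) for o in range(c_out)]
             for px in row] for row in x]


def depthwise_conv(x, kernels):
    """Zero-padded depthwise convolution; kernels[c] is the k x k kernel of channel c."""
    h, w, channels = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    pad = k // 2
    out = [[[0.0] * channels for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for c in range(channels):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += kernels[c][di][dj] * x[ii][jj][c]
                out[i][j][c] = acc
    return out


def large_kernel_attention(x, w_a, w_v, kernels):
    """A = DWConv_kxk(W X), V = projected input (assumed linear), output = A * V element-wise."""
    a = depthwise_conv(linear(x, w_a), kernels)
    v = linear(x, w_v)
    return [[[a_c * v_c for a_c, v_c in zip(a_px, v_px)]
             for a_px, v_px in zip(a_row, v_row)]
            for a_row, v_row in zip(a, v)]
```

Every output element costs one k × k window per channel, so the work scales with H·W·C·k² rather than with the square of the token count.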
In particular, the multi-path feed-forward network takes into account the importance of multi-level semantic mining and is implemented by directly applying two-branch k×1 and 1×k convolution pairs (k = 3, 5) with an expansion ratio r (r = 4). The multi-path feed-forward network can be expressed as:
$x_3 = \mathrm{Conv}_{3 \times 1}(\mathrm{Conv}_{1 \times 3}(\mathrm{Linear}(x_{in})))$
$x_5 = \mathrm{Conv}_{5 \times 1}(\mathrm{Conv}_{1 \times 5}(\mathrm{Linear}(x_{in})))$
$x_{out} = \mathrm{Cat}(x_3, x_5) + x_{in}$
where layer normalization (LN) and the activation layer (GELU) are omitted, $x_{in}$ denotes the features output by the large-kernel mining attention mechanism, and $x_{out}$ denotes the features output by the multi-path feed-forward network. This design further captures receptive fields of different scales and improves the capacity for multi-scale information aggregation.
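One way to see why paired 1×k and k×1 convolutions widen the receptive field cheaply: applied in sequence they cover a k×k window with 2k weights instead of k². The sketch below pushes a single impulse through each pair and counts the pixels it reaches. The single channel, the all-ones kernels, and the helper names `conv2d_same` and `footprint` are illustrative assumptions, not the trained layers of the invention.

```python
def conv2d_same(image, kernel):
    """Zero-padded 'same' 2-D cross-correlation on nested lists (single channel)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    ii, jj = i + di - kh // 2, j + dj - kw // 2
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di][dj] * image[ii][jj]
            out[i][j] = acc
    return out


def footprint(k, size=9):
    """Count pixels reached by an impulse after a 1 x k then a k x 1 convolution."""
    impulse = [[0.0] * size for _ in range(size)]
    impulse[size // 2][size // 2] = 1.0
    row_kernel = [[1.0] * k]                 # shape 1 x k
    col_kernel = [[1.0] for _ in range(k)]   # shape k x 1
    response = conv2d_same(conv2d_same(impulse, row_kernel), col_kernel)
    return sum(1 for row in response for value in row if value != 0.0)


for k in (3, 5):
    # Weights used by the pair (2k) versus a dense k x k kernel (k * k).
    print(k, footprint(k), 2 * k, k * k)
```

The two branches (k = 3 and k = 5) therefore see 3×3 and 5×5 neighbourhoods respectively, which is the multi-scale behaviour the text describes.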
Experiments show that exchanging information between resolutions of different scales helps generate high-quality high-resolution images. The invention therefore adopts a cross-resolution fusion strategy to transfer semantic information between two consecutive stages, realizing information interaction between adjacent stages.
Specifically, under the cross-resolution fusion strategy, a static two-dimensional matrix is constructed with a double (nested) loop, and semantic information is fused through upsampling or downsampling. The semantic information of a low-resolution feature map branch is upsampled to a high-resolution feature map branch to extract semantic features with a larger receptive field, and the high-resolution feature map is downsampled to the low-resolution feature map to retain more image detail, so that small diseased targets with complex backgrounds are segmented accurately.
More specifically, let j denote the resolution level of the input branch feature and n that of the output feature. To obtain high-level features with larger receptive fields, low-resolution features are upsampled and merged into the high-resolution features: when j > n, a 1×1 convolution keeps the number of channels of layers j and n the same, while the spatial dimensions are upsampled by nearest-neighbor interpolation. To let the low-resolution features retain more image detail, they are combined with downsampled high-resolution features: when j < n, a depthwise separable convolution with a step size of 2 (j-n) +1 reduces the high-resolution spatial dimensions and matches the number of output channels. When j = n, a skip connection outputs the feature directly. The cross-resolution semantic fusion strategy inherits both the high-resolution representation and the richer semantic information of the low-resolution features, which helps achieve accurate segmentation of small lesions with complex backgrounds.
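The three cases (j > n, j < n, j = n) can be sketched as a single dispatch function. Several simplifications are assumed here and are not from the source: a factor of 2 between adjacent resolution levels, a single channel (so the 1×1 channel-matching convolution is omitted), and a fixed 3×3 mean filter with stride 2 standing in for the learned strided depthwise separable convolution. The function names are hypothetical.

```python
def upsample_nearest(feature, factor):
    """Nearest-neighbor upsampling of a 2-D nested list by an integer factor."""
    return [[value for value in row for _ in range(factor)]
            for row in feature for _ in range(factor)]


def downsample_stride2(feature):
    """3 x 3 mean filter with stride 2 and zero padding (stand-in for a learned conv)."""
    h, w = len(feature), len(feature[0])
    out = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            window = [feature[ii][jj]
                      for ii in range(i - 1, i + 2)
                      for jj in range(j - 1, j + 2)
                      if 0 <= ii < h and 0 <= jj < w]
            row.append(sum(window) / 9.0)
        out.append(row)
    return out


def resample(feature, j, n):
    """Bring a feature map from input level j to output level n."""
    if j > n:                       # lower resolution in, higher resolution out
        return upsample_nearest(feature, 2 ** (j - n))
    if j < n:                       # higher resolution in, lower resolution out
        for _ in range(n - j):
            feature = downsample_stride2(feature)
        return feature
    return feature                  # j == n: skip connection


def fuse(branches, n):
    """Sum all branches after resampling each one to level n; branches maps level -> map."""
    resampled = [resample(fm, j, n) for j, fm in sorted(branches.items())]
    return [[sum(values) for values in zip(*rows)] for rows in zip(*resampled)]
```

For example, `fuse({1: high_res, 2: low_res}, n=1)` upsamples the level-2 map and adds it to the level-1 map, while `n=2` downsamples the level-1 map instead.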
In particular, the hamburger decoder models global spatial information using a matrix decomposition method and aggregates the context information of the last three stages to fuse feature information of different scales and resolutions. The first stage is left out because it contains mostly low-level features and aggregating it would incur a higher computational cost; aggregating only the last three stages still combines information from both low-resolution and high-resolution features.
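The text names only "a matrix decomposition method" for the decoder. As one possible instance, the sketch below uses non-negative matrix factorization (NMF) with multiplicative updates to replace a non-negative feature matrix by a low-rank reconstruction. The choice of NMF, the deterministic initialization, and the function names are assumptions made for illustration.

```python
def matmul(a, b):
    """Matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(x, rank, steps=50, eps=1e-9):
    """Factor a non-negative matrix x (n x m) as w (n x rank) times h (rank x m)."""
    n, m = len(x), len(x[0])
    # Deterministic positive initialization (an arbitrary choice for this sketch).
    w = [[1.0 + 0.1 * ((i + r) % 3) for r in range(rank)] for i in range(n)]
    h = [[1.0 + 0.1 * ((r + j) % 3) for j in range(m)] for r in range(rank)]
    for _ in range(steps):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)]
             for r in range(rank)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)]
             for i in range(n)]
    return w, h


def low_rank_context(features, rank):
    """Return the low-rank reconstruction w @ h as a global-context estimate."""
    w, h = nmf(features, rank)
    return matmul(w, h)
```

In a decoder of this kind, `features` would be the flattened, concatenated maps of the last three stages (channels × pixels), and the reconstruction would be added back to the input as global context.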
The invention has the following advantages. Whereas previous Transformer-based models pursued either high-resolution features or semantic information alone, the cross-resolution Transformer model retains both high-resolution features and strong semantic information. A novel cross-resolution architecture is provided that passes information across resolutions in parallel, using the advantages of cross-resolution processing to improve representation learning and extract robust semantic information. A large-kernel mining attention mechanism is introduced, in which large-kernel convolution reshapes the pixel weight matrix, adapts to channel and spatial information without increasing the computational cost, and mines context information from the whole scene. A multi-path feed-forward network and a hamburger decoder are designed to further expand the multi-scale receptive field and strengthen multi-scale information aggregation. The invention can effectively solve the problem of segmenting grape leaf lesions against complex backgrounds in natural fields.
Drawings
FIG. 1 is a diagram of the overall architecture of the cross-resolution Transformer model;
FIG. 2 is a diagram of the Transformer block framework;
FIG. 3 is a schematic diagram of a hamburger decoder;
FIG. 4 is a schematic diagram of a cross-resolution fusion strategy.
Detailed Description
The overall framework of the grape leaf lesion image segmentation method based on the cross-resolution Transformer model is shown in FIG. 1; the model is an encoder-decoder model. FIG. 2 is a diagram of the Transformer block framework of the invention; FIG. 3 is a schematic diagram of the decoder of the invention, i.e., the hamburger decoder; FIG. 4 is a schematic diagram of the cross-resolution fusion strategy employed by the model. In the training stage, the experiments of the invention and the comparison experiments were implemented with PyTorch and the mmsegmentation library for semantic segmentation. All models were trained on an NVIDIA Tesla V100 GPU. For a fair comparison, the invention follows the same training strategy as previous work. Specifically, training images are cropped to a resolution of 1024×1024. The model was optimized during training using AdamW with a weight decay of 0.01. The learning rate follows the "poly" strategy, lr = baselr × (1 − epoch/maxiter)^power, where the power factor is set to 1, the initial learning rate is 6×10^-5, and training runs for a total of 160,000 iterations.
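The "poly" schedule with the stated hyperparameters can be written out directly. The function and argument names below are hypothetical; the base learning rate, power, and iteration count are the values given in the text.

```python
def poly_lr(step, base_lr=6e-5, max_steps=160_000, power=1.0):
    """Polynomial learning-rate decay: base_lr * (1 - step / max_steps) ** power."""
    return base_lr * (1 - step / max_steps) ** power


for step in (0, 40_000, 80_000, 160_000):
    # With power = 1 the decay is linear from base_lr down to zero.
    print(step, poly_lr(step))
```

With a power of 1 the schedule reduces to a straight line, so halfway through training the learning rate is half its initial value.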
The invention is trained and evaluated on three datasets: the Field-PV dataset, the Plant Village dataset, and the synthetic-PV (Syn-PV) dataset. The Field-PV dataset was acquired by the Institute of Forestry and Pomology of the Beijing Academy of Agriculture and Forestry Sciences, China, using an OLYMPUS OM-D camera, and contains 400 original images of natural scenes showing grape gray mold disease. The Plant Village dataset is a public dataset dedicated to identifying crop diseases and pests; it consists of 54,303 high-resolution images acquired in a controlled laboratory, covering different disease types and healthy leaves of 38 plants. Of these, 1,383 grape black measles images and 1,180 grape black rot images are used and manually annotated. The Syn-PV dataset consists of natural field images synthesized from the Plant Village segmentation images obtained in a controlled laboratory; a background replacement method is used to synthesize grape disease images with complex backgrounds.
In all datasets, the diseased areas and leaf areas are manually annotated with the labelme tool; the annotations are saved in JSON format and converted to the PASCAL VOC 2012 data format with foreground and background semantic labels. The invention uses Augmentor modules to perform augmentations such as random left/right flipping, random cropping, random sampling, and color and brightness enhancement or reduction. During training, the data augmentation methods provided by the semantic segmentation library are also applied.
To evaluate the effectiveness of the cross-resolution Transformer (Cross-Resolution Transformer, CRFormer) model, the invention compares it with other image segmentation methods. Four metrics, precision, IoU, recall, and Dice, are used to measure model performance, with bold indicating the best result and underlining the second-best. The number of parameters (Params) and the giga floating-point operations (FLOPs) of each model were also analyzed. The results are shown in Tables 1 to 5.
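For reference, the four metrics can be computed from binary masks as in the sketch below. It uses the standard definitions for a single foreground class; the function name and the flat 0/1 list representation are assumptions, and the patent does not say how its evaluation code is organized.

```python
def segmentation_metrics(pred, target):
    """Precision, recall, IoU and Dice for two flat lists of 0/1 labels."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)

    def ratio(num, den):
        # Return 0.0 when the class is absent from both masks.
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
    }


scores = segmentation_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(scores)
```

Dice and IoU are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank methods identically on a single image but can differ once averaged over a dataset.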
Table 1 Quantitative comparison of CRFormer with other segmentation methods for black measles and black rot segmentation on the Plant Village dataset
Figure BDA0004055070260000081
Table 2 Quantitative comparison of CRFormer with other segmentation methods for background and grape leaf segmentation on the Plant Village dataset
Figure BDA0004055070260000091
Table 3 Quantitative comparison of CRFormer with other segmentation methods for black measles and black rot segmentation on the Syn-PV dataset
Figure BDA0004055070260000092
Table 4 Quantitative comparison of CRFormer with other segmentation methods for background and grape leaf segmentation on the Syn-PV dataset
Figure BDA0004055070260000093
Table 5 Quantitative comparison of CRFormer with other segmentation methods for background and gray mold segmentation on the Field-PV dataset
Figure BDA0004055070260000101
The experimental results show that CRFormer segments grape leaf lesions better than current state-of-the-art Transformer methods and other deep-learning-based methods. Taking segmentation performance together with training and running cost, the invention achieves the best performance in complex grape leaf lesion segmentation tasks.

Claims (6)

1. A grape leaf lesion image segmentation method based on a cross-resolution Transformer model, characterized in that the cross-resolution Transformer model has an encoder-decoder structure, the encoder consists of 4 stages, i multi-resolution Transformer blocks connected in parallel are distributed in the i-th stage, and the multi-resolution Transformer blocks are arranged in a pyramid shape; the decoder is a hamburger decoder.
2. The multi-resolution Transformer block of claim 1, wherein the network architecture comprises a large-kernel mining attention mechanism, a multi-path feed-forward network, and a cross-resolution fusion strategy.
3. The large-kernel mining attention mechanism of claim 2, wherein the convolution kernel size is set to 11×11 to enlarge the receptive field, the input data undergoes overlapping feature embedding and reshaping and is then passed into the large-kernel mining attention mechanism, and the output features of the attention mechanism are computed with the Hadamard operator.
4. The multi-path feed-forward network of claim 2, wherein the multi-path feed-forward network takes into account the importance of multi-level semantic mining and is implemented using two-branch k×1 and 1×k convolution pairs and an expansion ratio r, wherein k is set to 3 and 5 and r is set to 4, the specific formulas being:
$x_3 = \mathrm{Conv}_{3 \times 1}(\mathrm{Conv}_{1 \times 3}(\mathrm{Linear}(x_{in})))$
$x_5 = \mathrm{Conv}_{5 \times 1}(\mathrm{Conv}_{1 \times 5}(\mathrm{Linear}(x_{in})))$
$x_{out} = \mathrm{Cat}(x_3, x_5) + x_{in}$
wherein layer normalization (LN) and the activation layer (GELU) are omitted, $x_{in}$ denotes the features output by the large-kernel mining attention mechanism, and $x_{out}$ denotes the features output by the multi-path feed-forward network.
5. The cross-resolution fusion strategy of claim 2, wherein a static two-dimensional matrix is constructed with a double (nested) loop and semantic information is fused by upsampling or downsampling; the semantic information of low-resolution feature map branches is upsampled to high-resolution feature map branches to extract semantic features with larger receptive fields, and high-resolution feature maps are downsampled to low-resolution feature maps to retain more image detail, thereby achieving accurate segmentation of small diseased targets with complex backgrounds.
6. The hamburger decoder of claim 1, wherein the decoder models global spatial information using a matrix decomposition method and aggregates the context information of the last three stages to fuse feature information of different scales and resolutions.
CN202310045185.7A 2023-01-30 2023-01-30 Grape leaf lesion image segmentation method based on cross-resolution Transformer model Pending CN116091770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310045185.7A CN116091770A (en) 2023-01-30 2023-01-30 Grape leaf lesion image segmentation method based on cross-resolution Transformer model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310045185.7A CN116091770A (en) 2023-01-30 2023-01-30 Grape leaf lesion image segmentation method based on cross-resolution Transformer model

Publications (1)

Publication Number Publication Date
CN116091770A true CN116091770A (en) 2023-05-09

Family

ID=86198826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310045185.7A Pending CN116091770A (en) 2023-01-30 2023-01-30 Grape leaf lesion image segmentation method based on cross-resolution Transformer model

Country Status (1)

Country Link
CN (1) CN116091770A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116993756A (en) * 2023-07-05 2023-11-03 石河子大学 Method for dividing verticillium wilt disease spots of field cotton

Similar Documents

Publication Publication Date Title
CN111259898B (en) Crop segmentation method based on unmanned aerial vehicle aerial image
CN108734659B (en) Sub-pixel convolution image super-resolution reconstruction method based on multi-scale label
CN109241972B (en) Image semantic segmentation method based on deep learning
Liebel et al. Single-image super resolution for multispectral remote sensing data using convolutional neural networks
Cheng et al. SESR: Single image super resolution with recursive squeeze and excitation networks
CN105046276B (en) Hyperspectral image band selection method based on low-rank representation
Yue et al. Deep recursive super resolution network with Laplacian Pyramid for better agricultural pest surveillance and detection
CN113902915A (en) Semantic segmentation method and system based on low-illumination complex road scene
CN110648331B (en) Detection method for medical image segmentation, medical image segmentation method and device
CN116091770A (en) Grape leaf lesion image segmentation method based on cross-resolution Transformer model
JP7344987B2 (en) Convolutional neural network construction method and system based on farmland images
CN112183448B (en) Method for dividing pod-removed soybean image based on three-level classification and multi-scale FCN
CN116109947A (en) Unmanned aerial vehicle image target detection method based on large-kernel equivalent convolution attention mechanism
Jiang et al. Forest-CD: Forest change detection network based on VHR images
CN117058367A (en) Semantic segmentation method and device for high-resolution remote sensing image building
CN115546466A (en) Weak supervision image target positioning method based on multi-scale significant feature fusion
CN116311186A (en) Plant leaf lesion identification method based on improved Transformer model
Liu et al. CASR-Net: A color-aware super-resolution network for panchromatic image
CN114549538A (en) Brain tumor medical image segmentation method based on spatial information and characteristic channel
CN111832508B (en) DIE _ GA-based low-illumination target detection method
CN117876386A (en) Heavy parameterized lightweight grape leaf lesion image segmentation method
CN116071239B (en) CT image super-resolution method and device based on mixed attention model
Huang et al. Dense labeling of large remote sensing imagery with convolutional neural networks: a simple and faster alternative to stitching output label maps
CN116206210A (en) NAS-Swin-based remote sensing image agricultural greenhouse extraction method
Lu et al. Dense U-net for super-resolution with shuffle pooling layer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination