CN116977863A - Tomato plant disease and pest detection method and detection system - Google Patents


Publication number
CN116977863A
CN116977863A (application CN202311079899.6A)
Authority
CN
China
Prior art keywords
convolution
channels
output
lightweight
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311079899.6A
Other languages
Chinese (zh)
Inventor
陈宁夏
李滕科
罗智杰
杨灵
吴霆
陈锐涛
林晓燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongkai University of Agriculture and Engineering
Original Assignee
Zhongkai University of Agriculture and Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongkai University of Agriculture and Engineering
Priority to CN202311079899.6A
Publication of CN116977863A


Classifications

    • G06V 20/188: Scenes; scene-specific elements; terrestrial scenes; vegetation
    • G06N 3/0464: Computing arrangements based on biological models; neural networks; architecture; convolutional networks [CNN, ConvNet]
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning; using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Complex Calculations (AREA)

Abstract

A tomato plant disease and pest detection method and detection system. The method comprises: reducing the number of channels in the YOLOv7 model and replacing the Conv modules in the YOLOv7 backbone network with XSepConv modules to obtain a YOLOv7-XSepConv model as the target detection model; preprocessing the picture to be detected and inputting it into the trained target detection model to obtain the detection result. The target detection model is deployed behind a WeChat applet, whose front-end and back-end services enable rapid identification and classification of tomato plant diseases and insect pests. The detection method achieves fast, real-time target detection under limited computing resources and is therefore suitable for deployment in applets; the detection system is simple to use, and a user need only upload a plant photo to view the detection result through the front-end interface anytime and anywhere.

Description

Tomato plant disease and pest detection method and detection system
Technical Field
The invention relates to the technical field of target identification, in particular to a tomato plant disease and pest detection method and a detection system.
Background
Agricultural planting often faces serious pest and disease problems, such as powdery mildew, early blight, late blight, gray mold, leaf mold, root-knot nematode disease and fruit cracking, which seriously affect the normal growth and harvest of crops and pose great challenges to farmers and gardening enthusiasts. To address this, some plant disease and pest detection products are already on the market: some are based on traditional image processing techniques, while others use deep learning for image recognition and classification. Traditional image processing methods generally rely on manually designed feature extractors and classifiers; they have low accuracy and reliability, require a large amount of manual intervention and tuning, and struggle to adapt to complex and changing real-world conditions. Deep-learning-based methods can extract features and classify automatically, with higher accuracy and robustness, but they require large amounts of labeled data and computing resources, and their models have too many parameters to achieve fast, real-time detection on mobile devices.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a lightweight, high-performance tomato plant disease and pest detection method, and a tomato plant disease and pest detection system that implements this method, so as to achieve rapid detection and accurate identification of tomato plant diseases and pests and to provide fast, efficient technical support for agricultural production.
The invention is realized by the following technical scheme:
a tomato plant disease and pest detection method comprises the following steps:
s1, reducing the number of channels in a YOLOv7 model on the basis of YOLOv7, and replacing a Conv module in a YOLOv7 backbone network with an XSepConv module to obtain the YOLOv7-XSepConv model as a target detection model;
s2, collecting tomato plant pictures with different kinds of plant diseases and insect pests, forming a training data set, setting parameters, training a target detection model until the model converges and storing model parameters;
s3, inputting the picture to be detected into a trained target detection model after preprocessing to obtain a detection result, and identifying and classifying the plant diseases and insect pests.
Further, the XSepConv module comprises, connected in order: a 1×1 expansion convolution for expanding the number of input channels to a higher dimension; batch normalization and an activation function; a 2×2 DW convolution for performing a per-channel convolution operation; batch normalization and an activation function; a 1×K DW convolution, which performs a per-channel convolution operation and has a stride in the spatial dimension; batch normalization and an activation function; a K×1 DW convolution, which performs a per-channel convolution operation with the complementary kernel orientation in the spatial dimension; batch normalization; a Squeeze-and-Excitation (SE) module for feature recalibration in the channel dimension; a 1×1 output convolution for reducing the number of channels of the feature map to the desired output size; and batch normalization.
Further, K=3 in the XSepConv module, and the backbone network of the YOLOv7-XSepConv model comprises 41 lightweight convolution blocks XSepConv, wherein each convolution block has the following structure:
The 1st lightweight convolution block XSepConv1: the 1×1 expansion convolution has 3 input channels and 16 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 16 input channels and 16 output channels; the batch normalization layers have 16 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 2nd lightweight convolution block XSepConv2: the 1×1 expansion convolution has 16 input channels and 32 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 32 input channels and 32 output channels; the batch normalization layers have 32 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 3rd lightweight convolution block XSepConv3: the 1×1 expansion convolution has 32 input channels and 32 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 32 input channels and 32 output channels; the batch normalization layers have 32 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 4th lightweight convolution block XSepConv4: the 1×1 expansion convolution has 32 input channels and 64 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 64 input channels and 64 output channels; the batch normalization layers have 64 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 5th and 6th lightweight convolution blocks XSepConv5 and XSepConv6: the 1×1 expansion convolution has 64 input channels and 32 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 32 input channels and 32 output channels; the batch normalization layers have 32 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 7th to 10th lightweight convolution blocks XSepConv7 to XSepConv10: the 1×1 expansion convolution has 32 input channels and 32 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 32 input channels and 32 output channels; the batch normalization layers have 32 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 11th lightweight convolution block XSepConv11: the 1×1 expansion convolution has 128 input channels and 128 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 128 input channels and 128 output channels; the batch normalization layers have 128 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 12th and 13th lightweight convolution blocks XSepConv12 and XSepConv13: the 1×1 expansion convolution has 128 input channels and 64 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 64 input channels and 64 output channels; the batch normalization layers have 64 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 14th lightweight convolution block XSepConv14: the 1×1 expansion convolution has 64 input channels and 64 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 64 input channels and 64 output channels; the batch normalization layers have 64 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 15th and 16th lightweight convolution blocks XSepConv15 and XSepConv16: the 1×1 expansion convolution has 128 input channels and 64 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 64 input channels and 64 output channels; the batch normalization layers have 64 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 17th to 20th lightweight convolution blocks XSepConv17 to XSepConv20: the 1×1 expansion convolution has 64 input channels and 64 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 64 input channels and 64 output channels; the batch normalization layers have 64 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 21st lightweight convolution block XSepConv21: the 1×1 expansion convolution has 256 input channels and 256 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 256 input channels and 256 output channels; the batch normalization layers have 256 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 22nd and 23rd lightweight convolution blocks XSepConv22 and XSepConv23: the 1×1 expansion convolution has 256 input channels and 128 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 128 input channels and 128 output channels; the batch normalization layers have 128 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 24th lightweight convolution block XSepConv24: the 1×1 expansion convolution has 128 input channels and 128 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 128 input channels and 128 output channels; the batch normalization layers have 128 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 25th and 26th lightweight convolution blocks XSepConv25 and XSepConv26: the 1×1 expansion convolution has 256 input channels and 128 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 128 input channels and 128 output channels; the batch normalization layers have 128 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 27th to 30th lightweight convolution blocks XSepConv27 to XSepConv30: the 1×1 expansion convolution has 128 input channels and 128 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 128 input channels and 128 output channels; the batch normalization layers have 128 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 31st lightweight convolution block XSepConv31: the 1×1 expansion convolution has 512 input channels and 512 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 512 input channels and 512 output channels; the batch normalization layers have 512 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 32nd and 33rd lightweight convolution blocks XSepConv32 and XSepConv33: the 1×1 expansion convolution has 512 input channels and 256 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 256 input channels and 256 output channels; the batch normalization layers have 256 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 34th lightweight convolution block XSepConv34: the 1×1 expansion convolution has 256 input channels and 256 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 256 input channels and 256 output channels; the batch normalization layers have 256 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 35th and 36th lightweight convolution blocks XSepConv35 and XSepConv36: the 1×1 expansion convolution has 512 input channels and 128 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 128 input channels and 128 output channels; the batch normalization layers have 128 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 37th to 40th lightweight convolution blocks XSepConv37 to XSepConv40: the 1×1 expansion convolution has 128 input channels and 128 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 128 input channels and 128 output channels; the batch normalization layers have 128 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1;
The 41st lightweight convolution block XSepConv41: the 1×1 expansion convolution has 512 input channels and 512 output channels; the DW convolutions, the SE module and the 1×1 output convolution have 512 input channels and 512 output channels; the batch normalization layers have 512 channels, and the activation function layers are LeakyReLU functions with a negative slope of 0.1.
Further, the number of channels of the YOLOv7-XSepConv model backbone network is half of the number of channels of the backbone network in the conventional YOLOv7 model.
Further, the step of preprocessing the picture in the step S3 includes: image denoising, image enhancement, image segmentation, image graying, image normalization and image binarization.
Further, the image enhancement uses histogram equalization or filtering techniques; the image segmentation uses color segmentation or threshold segmentation techniques to separate the tomato plants in the image from the background; and the image normalization unifies the image sizes.
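Two of the preprocessing steps just named, normalization and binarization, can be sketched in miniature. The following pure-Python fragment is only an illustration on a toy grayscale row; the 0.5 threshold is an assumed example value, not one specified by the patent:

```python
def normalize(pixels):
    """Min-max normalize grayscale values to the range [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    return [(p - lo) / (hi - lo) for p in pixels]

def binarize(pixels, threshold=0.5):
    """Threshold segmentation: 1 for foreground, 0 for background."""
    return [1 if p >= threshold else 0 for p in pixels]

gray = [10, 60, 110, 210]          # toy grayscale row
norm = normalize(gray)             # [0.0, 0.25, 0.5, 1.0]
mask = binarize(norm)              # [0, 0, 1, 1]
```

In a real pipeline these operations would run per pixel over a 2-D image, typically via a library such as OpenCV or NumPy.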
Further, the picture to be detected in step S3 is obtained by cropping and rotating a captured photo; in step S2, the diseases and pests in the tomato plant pictures include cold injury, insect damage, early blight, fruit cracking, TY virus, potassium deficiency, powdery mildew and the like.
Further, the learning rate during model training in step S2 is automatically adjusted by cosine annealing, with the initial learning rate set to 0.001 and the number of iterations set to 300.
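As a rough illustration of the schedule this describes, the standard cosine-annealing formula with the stated initial learning rate of 0.001 over 300 iterations might look like the sketch below. Decaying to exactly zero is an assumption; the patent does not state a minimum learning rate:

```python
import math

def cosine_annealed_lr(step, total_steps=300, lr0=1e-3, lr_min=0.0):
    """Cosine-annealed learning rate: lr0 at step 0, decaying to lr_min
    at total_steps (assumed minimum; the source gives only lr0 and steps)."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * step / total_steps))
```

This matches the shape implemented by common schedulers such as PyTorch's CosineAnnealingLR, where `total_steps` corresponds to `T_max` and `lr_min` to `eta_min`.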
A tomato plant disease and pest detection system, used for implementing the above tomato plant disease and pest detection method, comprises:
a front-end user interface section: used for shooting, processing and uploading the picture to be detected, and for displaying the detection result;
an image preprocessing section: the method comprises the steps of preprocessing an uploaded picture to be detected;
an object detection section: performing target detection on the preprocessed picture to be detected based on a target detection model algorithm, so as to realize identification and classification of plant diseases and insect pests of the tomato plant;
a database section: for storing the detection result data.
Further, the front-end user interface part is configured on an application program of the intelligent terminal, the image preprocessing part and the target detection part are configured on a background server of the application program, and the background server is used for data interaction and communication of the front end and the back end, including data interaction with a database.
Further, the intelligent terminal is a mobile phone, and the application program is a WeChat applet.
The detection system is simple to use: the user only needs to upload a plant photo, the back-end server automatically performs image preprocessing and pest detection and feeds the result back to the user, and the user can view the detection result through the front-end interface. The whole system can be developed as a WeChat applet and used anytime and anywhere, which is convenient and practical. Its advantages are mainly:
high efficiency and accuracy: by adopting a target detection algorithm in deep learning and combining various feature extraction technologies and classification algorithms, the plant diseases and insect pests on tomato plants can be efficiently and accurately detected, and more timely prevention and control measures are provided for peasants.
Simple and easy to use: the user only needs to upload the photo of the tomato plant, the system can automatically perform pretreatment and detection, and the result is displayed to the user in real time, so that the operation is simple and easy to use.
Data visualization: through the Web server technology, the detection result can be transmitted to the browser end of the user in real time, and the user can conveniently check and analyze the detection result.
And (3) data management: by adopting the relational database technology, a large amount of data can be efficiently stored and managed, and support is provided for subsequent inquiry and statistical analysis.
The invention adopts an improved YOLOv7 model algorithm. By reducing the model parameters and replacing the Conv modules in the YOLOv7 backbone network with XSepConv modules, it shrinks the model size and lowers the computational load, which improves the model's running speed and enables fast, real-time target detection under limited computing resources, making target detection in an applet more suitable and efficient. This optimization is significant for the WeChat applet application scenario: it meets users' needs for fast, real-time target detection and improves the user experience.
By designing a lightweight, high-performance disease and pest detector and combining it with technical means such as the WeChat applet, the invention further improves detection accuracy and reliability, offers low cost and simple operation, and makes the tomato plant disease and pest detection system intelligent and convenient. Compared with traditional image processing methods and other deep-learning-based methods, it has advantages in accuracy, reliability, cost and ease of operation, and can better meet users' needs.
Drawings
Fig. 1 is a structural diagram of the Backbone network of the target detection model YOLOv7-XSepConv in an embodiment of the invention.
FIG. 2 is a block diagram illustrating an exemplary architecture of an XSepConv module in a target detection model according to an embodiment of the present invention.
Fig. 3 shows examples of diseased and pest-affected tomato plant pictures collected during model training in an embodiment of the invention.
Fig. 4 is a block diagram of a detection system according to an embodiment of the present invention.
Fig. 5 is a detection flow chart of the detection system according to an embodiment of the invention.
FIG. 6 is a diagram of a front end user interface portion of a detection system according to an embodiment of the present invention.
Detailed Description
A tomato plant disease and pest detection method comprises the following steps:
s1, constructing a target detection model
On the basis of YOLOv7, the number of channels in the YOLOv7 model is reduced, and, as shown in Fig. 1, the Conv modules in the YOLOv7 backbone network (Backbone) are replaced with XSepConv modules, so that the YOLOv7-XSepConv model is obtained as the target detection model.
The invention aims at lightweight, high-performance pest detection, and improves the YOLOv7 model in order to reduce its size. YOLOv7 is a popular target detection model, but when applied on a terminal such as a mobile device, its model size and computational resource requirements are limiting factors. The invention therefore reduces the number of channels in the YOLOv7 model, cutting the number of model parameters so that the model becomes suitable for real-time detection on mobile devices. However, reducing the number of channels causes problems such as insufficient acquisition of global information, retention of redundant image information, and insufficient attention. To solve these problems, the invention replaces the Conv modules in the YOLOv7 backbone network with XSepConv modules. A concrete procedure is: write the XSepConv module code into the common.py configuration file of the YOLOv7 model, then build a yaml file that, relative to the original yolov7.yaml file, replaces the Conv structures in the Backbone with XSepConv structures.
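The parameter saving from reducing channel counts (the description elsewhere states the YOLOv7-XSepConv backbone uses half the channels of standard YOLOv7) follows directly from the weight count of a standard convolution. The layer sizes below are hypothetical, chosen only to show the roughly fourfold reduction:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution layer (bias omitted)."""
    return k * k * c_in * c_out

# Halving both the input and output channel counts of a layer cuts its
# weights to one quarter (illustrative 3x3 layer, not a layer from the patent):
full = conv_params(128, 256, 3)   # standard-width layer
half = conv_params(64, 128, 3)    # the same layer at half width
```

Since every standard convolution's weight count is proportional to the product of its input and output channels, halving the whole backbone's channel widths shrinks those layers' parameters by about 4x.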
XSepConv is a lightweight convolution composed of multiple convolution layers, with fewer parameters and less computation. As shown in Fig. 2, the structure of the XSepConv module specifically comprises:
a 1×1 expansion convolution, for expanding the number of input channels to a higher dimension;
batch normalization (Batch Normalization) and an activation function, to enhance the representational capacity of the features;
a 2×2 DW convolution (Depthwise Convolution), for performing a per-channel convolution operation, reducing the computational load;
batch normalization and an activation function, further improving the feature expression capacity;
a 1×K DW convolution, which performs a per-channel convolution operation and has a stride in the spatial dimension, reducing the size of the feature map; K is the convolution kernel size and may be set according to the specific case, for example K=3, giving a 1×3 DW convolution. A DW (depthwise) convolution convolves each channel independently without changing the number of channels; this kind of convolution is widely used in lightweight models and, compared with ordinary convolution, has the advantages of fewer parameters and less computation;
batch normalization and an activation function;
a K×1 DW convolution, which performs a per-channel convolution operation with the complementary kernel orientation in the spatial dimension;
batch normalization;
a Squeeze-and-Excitation (SE) module, used for feature recalibration in the channel dimension and for strengthening the expression of important features;
a 1×1 output convolution, for reducing the number of channels of the feature map to the desired output size;
batch normalization, to improve the accuracy of the feature representation.
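To make the "fewer parameters" claim for the DW convolutions above concrete, the comparison below counts weights for a depthwise versus a standard convolution. The 64-channel width and 3×3 kernel are illustrative example values, not figures from the patent:

```python
def standard_conv_params(c_in, c_out, k):
    # every output filter spans all input channels
    return k * k * c_in * c_out

def depthwise_conv_params(c, k):
    # one k x k filter per channel; channels are not mixed
    return k * k * c

# At 64 channels, a 3x3 depthwise layer uses 1/64 of the weights of a
# standard 3x3 convolution with the same channel count:
dw = depthwise_conv_params(64, 3)
std = standard_conv_params(64, 64, 3)
```

The saving factor equals the channel count, which is why depthwise convolutions dominate lightweight backbones like the one described here.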
Specifically, k=3 in the XSepConv module, and the backbone network of the YOLOv7-XSepConv model includes 41 lightweight convolution blocks XSepConv, wherein each convolution block has the following structure:
The 1st lightweight convolution block XSepConv1 comprises:
the 1×1 expansion convolution, with 3 input channels, 16 output channels and a 1×1 convolution kernel, followed by a batch normalization layer with 16 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
the 2×2 DW convolution, with 16 input channels, 16 output channels and a 2×2 DW convolution kernel, followed by a batch normalization layer with 16 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
the 1×K DW convolution, with 16 input channels, 16 output channels and a 1×3 DW convolution kernel, followed by a batch normalization layer with 16 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
the K×1 DW convolution, with 16 input channels, 16 output channels and a 3×1 DW convolution kernel, followed by a batch normalization layer with 16 channels;
the Squeeze-and-Excitation (SE) module, with 16 input channels and 16 output channels, followed by a batch normalization layer with 16 channels;
the 1×1 output convolution, with 16 input channels, 16 output channels and a 1×1 convolution kernel, followed by a batch normalization layer with 16 channels.
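Summing the layers just listed gives an approximate weight count for one XSepConv block. The sketch below follows the stated layer sequence; the SE bottleneck ratio is an assumption (the patent does not give one), and batch-normalization parameters and biases are omitted:

```python
def xsepconv_params(c_in, c_exp, c_out, K=3, se_ratio=4):
    """Approximate weight count of one XSepConv block as described in the text.
    se_ratio is an assumed SE bottleneck ratio; BN weights and biases omitted."""
    expand = 1 * 1 * c_in * c_exp            # 1x1 expansion convolution
    dw_2x2 = 2 * 2 * c_exp                   # 2x2 depthwise convolution
    dw_1xk = 1 * K * c_exp                   # 1xK depthwise convolution
    dw_kx1 = K * 1 * c_exp                   # Kx1 depthwise convolution
    se = 2 * c_exp * (c_exp // se_ratio)     # SE squeeze/excite pair (assumed ratio)
    project = 1 * 1 * c_exp * c_out          # 1x1 output convolution
    return expand + dw_2x2 + dw_1xk + dw_kx1 + se + project

# XSepConv1 from the text: 3 input channels, 16 expanded, 16 output, K = 3
p1 = xsepconv_params(3, 16, 16)
```

Under these assumptions the block stays in the hundreds of weights at this width, versus thousands for a standard 3×3 convolution at the same channel counts.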
The 2nd lightweight convolution block XSepConv2 comprises:
the 1×1 expansion convolution, with 16 input channels, 32 output channels and a 1×1 convolution kernel, followed by a batch normalization layer with 32 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
the 2×2 DW convolution, with 32 input channels, 32 output channels and a 2×2 DW convolution kernel, followed by a batch normalization layer with 32 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
the 1×K DW convolution, with 32 input channels, 32 output channels and a 1×3 DW convolution kernel, followed by a batch normalization layer with 32 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
the K×1 DW convolution, with 32 input channels, 32 output channels and a 3×1 DW convolution kernel, followed by a batch normalization layer with 32 channels;
the Squeeze-and-Excitation (SE) module, with 32 input channels and 32 output channels, followed by a batch normalization layer with 32 channels;
the 1×1 output convolution, with 32 input channels, 32 output channels and a 1×1 convolution kernel, followed by a batch normalization layer with 32 channels.
The 3rd lightweight convolution block XSepConv3 comprises:
a 1×1 expansion convolution with 32 input channels and 32 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 32 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 32 input channels and 32 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 32 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 32 input channels and 32 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 32 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 32 input channels and 32 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 32 channels;
a Squeeze-and-Excitation (SE) module with 32 input channels and 32 output channels, followed by a batch normalization layer with 32 channels;
a 1×1 output convolution with 32 input channels and 32 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 32 channels.
The 4th lightweight convolution block XSepConv4 comprises:
a 1×1 expansion convolution with 32 input channels and 64 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 64 channels;
a Squeeze-and-Excitation (SE) module with 64 input channels and 64 output channels, followed by a batch normalization layer with 64 channels;
a 1×1 output convolution with 64 input channels and 64 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 64 channels.
The structures of the 5th lightweight convolution block XSepConv5 and the 6th lightweight convolution block XSepConv6 each comprise:
a 1×1 expansion convolution with 64 input channels and 32 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 32 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 32 input channels and 32 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 32 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 32 input channels and 32 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 32 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 32 input channels and 32 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 32 channels;
a Squeeze-and-Excitation (SE) module with 32 input channels and 32 output channels, followed by a batch normalization layer with 32 channels;
a 1×1 output convolution with 32 input channels and 32 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 32 channels.
The structures of the 7th lightweight convolution block XSepConv7, the 8th lightweight convolution block XSepConv8, the 9th lightweight convolution block XSepConv9, and the 10th lightweight convolution block XSepConv10 each comprise:
a 1×1 expansion convolution with 32 input channels and 32 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 32 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 32 input channels and 32 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 32 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 32 input channels and 32 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 32 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 32 input channels and 32 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 32 channels;
a Squeeze-and-Excitation (SE) module with 32 input channels and 32 output channels, followed by a batch normalization layer with 32 channels;
a 1×1 output convolution with 32 input channels and 32 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 32 channels.
The 11th lightweight convolution block XSepConv11 comprises:
a 1×1 expansion convolution with 128 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 128 channels;
a Squeeze-and-Excitation (SE) module with 128 input channels and 128 output channels, followed by a batch normalization layer with 128 channels;
a 1×1 output convolution with 128 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels.
The structures of the 12th lightweight convolution block XSepConv12 and the 13th lightweight convolution block XSepConv13 each comprise:
a 1×1 expansion convolution with 128 input channels and 64 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 64 channels;
a Squeeze-and-Excitation (SE) module with 64 input channels and 64 output channels, followed by a batch normalization layer with 64 channels;
a 1×1 output convolution with 64 input channels and 64 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 64 channels.
The 14th lightweight convolution block XSepConv14 comprises:
a 1×1 expansion convolution with 64 input channels and 64 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 64 channels;
a Squeeze-and-Excitation (SE) module with 64 input channels and 64 output channels, followed by a batch normalization layer with 64 channels;
a 1×1 output convolution with 64 input channels and 64 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 64 channels.
The structures of the 15th lightweight convolution block XSepConv15 and the 16th lightweight convolution block XSepConv16 each comprise:
a 1×1 expansion convolution with 128 input channels and 64 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 64 channels;
a Squeeze-and-Excitation (SE) module with 64 input channels and 64 output channels, followed by a batch normalization layer with 64 channels;
a 1×1 output convolution with 64 input channels and 64 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 64 channels.
The structures of the 17th lightweight convolution block XSepConv17, the 18th lightweight convolution block XSepConv18, the 19th lightweight convolution block XSepConv19, and the 20th lightweight convolution block XSepConv20 each comprise:
a 1×1 expansion convolution with 64 input channels and 64 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 64 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 64 input channels and 64 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 64 channels;
a Squeeze-and-Excitation (SE) module with 64 input channels and 64 output channels, followed by a batch normalization layer with 64 channels;
a 1×1 output convolution with 64 input channels and 64 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 64 channels.
The 21st lightweight convolution block XSepConv21 comprises:
a 1×1 expansion convolution with 256 input channels and 256 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 256 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 256 input channels and 256 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 256 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 256 input channels and 256 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 256 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 256 input channels and 256 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 256 channels;
a Squeeze-and-Excitation (SE) module with 256 input channels and 256 output channels, followed by a batch normalization layer with 256 channels;
a 1×1 output convolution with 256 input channels and 256 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 256 channels.
The structures of the 22nd lightweight convolution block XSepConv22 and the 23rd lightweight convolution block XSepConv23 each comprise:
a 1×1 expansion convolution with 256 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 128 channels;
a Squeeze-and-Excitation (SE) module with 128 input channels and 128 output channels, followed by a batch normalization layer with 128 channels;
a 1×1 output convolution with 128 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels.
The 24th lightweight convolution block XSepConv24 comprises:
a 1×1 expansion convolution with 128 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 128 channels;
a Squeeze-and-Excitation (SE) module with 128 input channels and 128 output channels, followed by a batch normalization layer with 128 channels;
a 1×1 output convolution with 128 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels.
The structures of the 25th lightweight convolution block XSepConv25 and the 26th lightweight convolution block XSepConv26 each comprise:
a 1×1 expansion convolution with 256 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 128 channels;
a Squeeze-and-Excitation (SE) module with 128 input channels and 128 output channels, followed by a batch normalization layer with 128 channels;
a 1×1 output convolution with 128 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels.
The structures of the 27th lightweight convolution block XSepConv27, the 28th lightweight convolution block XSepConv28, the 29th lightweight convolution block XSepConv29, and the 30th lightweight convolution block XSepConv30 each comprise:
a 1×1 expansion convolution with 128 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 128 channels;
a Squeeze-and-Excitation (SE) module with 128 input channels and 128 output channels, followed by a batch normalization layer with 128 channels;
a 1×1 output convolution with 128 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels.
The 31st lightweight convolution block XSepConv31 comprises:
a 1×1 expansion convolution with 512 input channels and 512 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 512 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 512 input channels and 512 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 512 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 512 input channels and 512 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 512 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 512 input channels and 512 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 512 channels;
a Squeeze-and-Excitation (SE) module with 512 input channels and 512 output channels, followed by a batch normalization layer with 512 channels;
a 1×1 output convolution with 512 input channels and 512 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 512 channels.
The structures of the 32nd lightweight convolution block XSepConv32 and the 33rd lightweight convolution block XSepConv33 each comprise:
a 1×1 expansion convolution with 512 input channels and 256 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 256 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 256 input channels and 256 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 256 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 256 input channels and 256 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 256 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 256 input channels and 256 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 256 channels;
a Squeeze-and-Excitation (SE) module with 256 input channels and 256 output channels, followed by a batch normalization layer with 256 channels;
a 1×1 output convolution with 256 input channels and 256 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 256 channels.
The 34th lightweight convolution block XSepConv34 comprises:
a 1×1 expansion convolution with 256 input channels and 256 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 256 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 256 input channels and 256 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 256 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 256 input channels and 256 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 256 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 256 input channels and 256 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 256 channels;
a Squeeze-and-Excitation (SE) module with 256 input channels and 256 output channels, followed by a batch normalization layer with 256 channels;
a 1×1 output convolution with 256 input channels and 256 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 256 channels.
The structures of the 35th lightweight convolution block XSepConv35 and the 36th lightweight convolution block XSepConv36 each comprise:
a 1×1 expansion convolution with 512 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 128 channels;
a Squeeze-and-Excitation (SE) module with 128 input channels and 128 output channels, followed by a batch normalization layer with 128 channels;
a 1×1 output convolution with 128 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels.
The structures of the 37th lightweight convolution block XSepConv37, the 38th lightweight convolution block XSepConv38, the 39th lightweight convolution block XSepConv39, and the 40th lightweight convolution block XSepConv40 each comprise:
a 1×1 expansion convolution with 128 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 128 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 128 input channels and 128 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 128 channels;
a Squeeze-and-Excitation (SE) module with 128 input channels and 128 output channels, followed by a batch normalization layer with 128 channels;
a 1×1 output convolution with 128 input channels and 128 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 128 channels.
The 41st lightweight convolution block XSepConv41 comprises:
a 1×1 expansion convolution with 512 input channels and 512 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 512 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 2×2 DW convolution with 512 input channels and 512 output channels and a DW convolution kernel size of 2×2, followed by a batch normalization layer with 512 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a 1×K DW convolution with 512 input channels and 512 output channels and a DW convolution kernel size of 1×3, followed by a batch normalization layer with 512 channels and a LeakyReLU activation function layer with a negative slope of 0.1;
a K×1 DW convolution with 512 input channels and 512 output channels and a DW convolution kernel size of 3×1, followed by a batch normalization layer with 512 channels;
a Squeeze-and-Excitation (SE) module with 512 input channels and 512 output channels, followed by a batch normalization layer with 512 channels;
a 1×1 output convolution with 512 input channels and 512 output channels and a convolution kernel size of 1×1, followed by a batch normalization layer with 512 channels.
The number of backbone channels of the YOLOv7-XSepConv model may be set to half the number of backbone channels in the conventional YOLOv7 model. Reducing the number of channels lowers the complexity and parameter count of the model, and therefore the computational resources required to run it in the WeChat applet; this optimization helps increase the model's inference speed, making target detection in the applet more efficient.
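The channel halving described above can be expressed as a simple width multiplier applied to the backbone's channel widths. The sketch below is illustrative only: the concrete channel list is hypothetical, and rounding to a multiple of a divisor is a common convention in lightweight networks rather than something stated in the text:

```python
def scale_channels(channels, width_mult=0.5, divisor=8):
    """Scale a list of channel counts by width_mult, rounding each result
    to the nearest multiple of `divisor` (assumed convention), with a
    floor of `divisor` so no stage collapses to zero channels."""
    scaled = []
    for c in channels:
        s = max(divisor, int(c * width_mult + divisor / 2) // divisor * divisor)
        scaled.append(s)
    return scaled

# Hypothetical YOLOv7 backbone stage widths, halved for YOLOv7-XSepConv
print(scale_channels([64, 128, 256, 512, 1024], width_mult=0.5))
```

Because both the convolution weights and the activations shrink with the channel count, a 0.5 width multiplier cuts the per-layer weight count of a dense convolution roughly fourfold (both input and output widths are halved).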
The lightweight design of the XSepConv module reduces the number of model parameters and the computational complexity while maintaining a strong feature representation capability, thereby enabling higher-performance pest and disease detection. The Conv modules in the YOLOv7 model are replaced by XSepConv modules, which extract features using operations such as 1×1 expansion convolution, 2×2 DW convolution, and 1×K and K×1 DW convolutions. This replacement further reduces the model size and the computational load, achieving a faster target detection speed; at the same time, because the XSepConv module retains a strong feature representation capability, detection accuracy and reliability are also improved, realizing higher-performance pest and disease detection.
Therefore, by reducing model parameters and replacing the Conv modules, the accuracy and reliability of detection are further improved, with the additional advantages of low cost and simple operation. Compared with traditional image processing methods and other methods based on deep learning, the present method is superior in accuracy, reliability, cost and ease of operation, and better meets users' needs.
In this embodiment, the YOLOv7-XSepConv model employs the Adam optimizer, and the activation functions are the FReLU activation function and the sigmoid function. YOLOv7 is anchor-based, and its positive/negative sample assignment strategy combines the strategies of YOLOv5 and YOLOX, as follows:
(1) YOLOv5: assign positive samples using the YOLOv5 positive/negative sample assignment strategy.
(2) YOLOX: compute the reg + cls loss (loss aware) of each sample for each GT.
(3) YOLOX: for each GT, use its predicted samples to determine the number of positive samples it needs (dynamic k).
(4) YOLOX: for each GT, take the dynamic-k samples with the lowest loss as positive samples.
(5) YOLOX: remove positive samples in which the same sample is assigned to multiple GTs (global information).
The first step essentially replaces the "center prior" in simOTA with the YOLOv5 strategy. Compared with using the YOLOv5 strategy alone, fusing it with the simOTA strategy of YOLOX adds loss-aware fine screening that exploits the current behavior of the model; compared with using simOTA alone as in YOLOX, the fused strategy provides more accurate prior knowledge.
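Steps (2) through (5) above can be sketched with NumPy. This is a simplified illustration of the dynamic-k idea, not the exact YOLOX implementation: the dynamic k is estimated here as the (integer) sum of each GT's top IoUs, and conflicts are resolved by keeping only the lowest-cost GT per sample.

```python
import numpy as np

def dynamic_k_assign(cost, ious, topk=10):
    """Minimal simOTA-style assignment sketch.

    cost: (num_gt, num_cand) combined reg + cls loss per GT/candidate pair.
    ious: (num_gt, num_cand) IoU of each predicted box with each GT.
    Returns a (num_gt, num_cand) 0/1 matching matrix.
    """
    num_gt, num_cand = cost.shape
    matching = np.zeros_like(cost, dtype=int)
    # (3) dynamic k: the sum of the top IoUs estimates how many positives a GT needs
    n = min(topk, num_cand)
    dynamic_ks = np.maximum(
        np.sort(ious, axis=1)[:, -n:].sum(axis=1).astype(int), 1)
    # (4) take the k candidates with the lowest loss for each GT
    for g in range(num_gt):
        idx = np.argsort(cost[g])[:dynamic_ks[g]]
        matching[g, idx] = 1
    # (5) a candidate matched to several GTs keeps only its lowest-cost GT
    for c in np.where(matching.sum(axis=0) > 1)[0]:
        best = np.argmin(cost[:, c])
        matching[:, c] = 0
        matching[best, c] = 1
    return matching
```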
S2, model training
Collect tomato plant pictures with different kinds of diseases and pests, the kinds including cold injury, insect pests, early blight, leaf mold, late blight, fruit cracking, TY virus, potassium deficiency, powdery mildew and the like. As shown in fig. 3, fig. 3 (a) is a plant picture of powdery mildew, fig. 3 (b) of insect pests, fig. 3 (c) of cold injury, fig. 3 (d) of fruit cracking, fig. 3 (e) of potassium deficiency, and fig. 3 (f) of early blight; the rest are not listed. These pictures form a training data set, and the pictures in the data set can be labeled with tools such as LabelImg. Input the pictures into the model, set the parameters, and train the target detection model until it converges, then save the model parameters.
The learning rate for model training is adjusted automatically; the initial value lr can be set to 0.001, with a cosine annealing scheduler as the preset learning-rate strategy. A cosine annealing scheduler is a common learning-rate scheduling strategy that adjusts the learning rate at each training step (or epoch) according to a predetermined schedule: a larger learning rate is used at the start of training and is then gradually decreased. The number of iterations may be set to 300.
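The cosine annealing schedule with the values given here (initial rate 0.001, 300 iterations) can be written down directly; the closed form below is the standard cosine-annealing formula, with a minimum learning rate of 0 assumed. In PyTorch the same effect is obtained with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)`.

```python
import math

def cosine_lr(step, total_steps=300, lr_init=0.001, lr_min=0.0):
    """Cosine-annealed learning rate at a given training step (or epoch)."""
    return lr_min + 0.5 * (lr_init - lr_min) * (
        1 + math.cos(math.pi * step / total_steps))

# starts at 0.001, halves at the midpoint, decays to ~0 at step 300
for s in (0, 150, 300):
    print(s, cosine_lr(s))
```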
S3, target detection
Input the picture to be detected, after preprocessing, into the trained target detection model to obtain the detection result, realizing identification and classification of diseases and pests.
The picture to be detected is taken in real time or obtained in another way (e.g. from a local file); depending on the acquisition method and picture quality, it can be cropped, rotated and so on before being uploaded.
The image preprocessing step mainly comprises image enhancement, image segmentation and image normalization; image denoising and similar techniques can also be applied as needed, converting the input image into a form acceptable to the model.
a. Image enhancement: enhance the contrast and sharpness of the image with histogram equalization, filtering and similar techniques, improving image quality so the model recognizes more accurately;
b. Image segmentation: separate the tomato plant in the image from the background with color segmentation, threshold segmentation and similar techniques, reducing background interference;
c. Image normalization: unify image sizes, e.g. converting all images to 640 x 640, so that images of different sizes do not affect target detection.
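The three preprocessing steps a-c can be sketched with NumPy alone. These are minimal stand-ins for illustration: real pipelines would typically use OpenCV or Pillow, the global threshold value is an assumption, and the resize uses simple nearest-neighbour sampling.

```python
import numpy as np

def equalize_hist(gray):
    """Step a: histogram equalization on a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    lut = (cdf - cdf.min()) * 255 / max(cdf.max() - cdf.min(), 1)
    return lut.astype(np.uint8)[gray]

def threshold_segment(gray, t=128):
    """Step b: binary foreground mask by global thresholding (t is illustrative)."""
    return (gray > t).astype(np.uint8)

def resize_nearest(img, size=640):
    """Step c: nearest-neighbour resize to size x size (size normalization)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]
```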
S4, store the detection result data for convenient subsequent display, statistical analysis and tracing.
Input the test data set into the trained model for detection. In this embodiment, the 182 pictures in the test set are annotated with 544 targets; the test results are shown in Table 1.
Table 1 evaluation table of object detection model
As can be seen from Table 1, the target recognition model performs well: the detection precision for the various diseases and pests is generally above 0.75, some classes even reaching 0.85; the mean precision over all classes is 0.775, and the mean average precision mAP@0.5 over all classes is 0.713; the mean recall reaches 0.671, exceeding 0.75 for some diseases and pests. The recognition rate of the model is high, the types of tomato diseases and pests can be accurately identified, the usage requirements are met, and a powerful technical reference can be provided for growers.
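The per-class AP values behind a figure such as mAP@0.5 come from the area under the precision-recall curve. A compact all-point-interpolation sketch (one common convention; some toolkits use 101-point interpolation instead):

```python
import numpy as np

def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation).

    recalls, precisions: arrays sorted by detection confidence,
    as accumulated when sweeping over ranked detections.
    """
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # make the precision envelope monotonically decreasing
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

mAP@0.5 is then the mean of this AP over all classes, with a detection counted as correct when its IoU with a ground-truth box is at least 0.5.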
A tomato plant disease and pest detection system for implementing the above detection method comprises, as shown in fig. 4, a front-end user interface part, an image preprocessing part, a target detection part, a database part and a server part. The front-end user interface part is configured in an application on an intelligent terminal; the intelligent terminal can be a mobile phone, a tablet, a computer or an embedded terminal, and the application can be a WeChat applet. WeChat applets typically run on mobile devices with limited computing resources and processing power, so to ensure smooth operation and a good user experience, the model used by the target detection part must be as small as possible and consume as few computing resources as possible. The image preprocessing part and the target detection part sit at the back end of the application and can be deployed on a local or cloud server. Halving the model channels of the target detection part reduces the parameters and speeds up inference, and also widens the choice of servers: a server with low computing performance suffices, which both makes reasonable use of computing resources and saves cost.
The front-end user interface part is mainly responsible for shooting, processing and uploading pictures to be detected, and for displaying and outputting detection results. Taking the WeChat applet as an example, as shown in fig. 6, this part mainly comprises the page design and function implementation in the applet. Through the interface, the user can select a local picture or shoot one with the camera, upload the picture of the tomato plant to be detected (fig. 6 (c)), and crop and rotate it before uploading to improve the accuracy of subsequent detection. Specifically, the front-end interface provides a button and an image preview area through which the user uploads and previews a picture. This part also displays and outputs the tomato disease and pest detection results (fig. 6 (d) and 6 (e)).
After uploading, the system will transmit the picture to the image preprocessing section. The image preprocessing part is mainly used for preprocessing the picture to be detected uploaded by the user so as to improve the accuracy of subsequent detection. The part mainly comprises preprocessing technologies such as image enhancement, image segmentation, image normalization and the like, and converts an input image into a form acceptable by a model.
The target detection part performs target detection on the preprocessed picture based on the target detection model algorithm, realizing identification and classification of tomato plant diseases and pests. The algorithm adopts the YOLOv7-XSepConv model, which performs deep learning on the image through a neural network and, combining various feature extraction techniques and classification algorithms, takes the preprocessed image as input and outputs the disease and pest detection result.
The YOLOv7-XSepConv model can realize rapid and real-time target detection under limited computing resources by reducing the size and the parameter number of the model and adopting a lightweight convolution module. The use scenario of the WeChat applet is usually that the user performs target detection after taking a photo of a tomato plant in the field, and the user may need to perform pest detection using the WeChat applet in the open air, in a greenhouse or other environments. In this case, rapid target detection can help the user quickly understand the health of the tomato plant and take control measures in time. Therefore, the YOLOv7-XSepConv model is particularly suitable for the application scene of the WeChat applet, and can ensure the real-time performance of detection and the convenience of users.
The database part mainly stores the result data of tomato plant disease and pest detection. It can use a relational database such as MySQL, which efficiently stores and manages large amounts of data and makes it convenient for users to query and statistically analyze the data when needed.
The background server mainly handles database management and display: it stores detection results in the database, displays the results to the user as a list or pictures, and realizes data interaction and communication between the front end and the back end. This part can use the HTTP protocol to transmit detection results to the user's browser in real time over the network, making them convenient to view and analyze, and connecting the front-end user interface with the back-end data processing parts.
Fig. 4 and 5 show the connection relationship and interaction flow between the above parts. After uploading the photo of the tomato plant through the front-end interface part, the image preprocessing part preprocesses the image, and then transmits the preprocessed image to the target detection part for detecting the plant diseases and insect pests. The detection result is stored in a database and is displayed to the user through a server part. The whole system realizes the functions of user uploading, image processing, pest detection, result storage, display and the like, and the parts work cooperatively, so that the complete tomato plant pest detection flow is realized.
Throughout the system, data transfer and interaction between the parts take place through API interfaces. After the picture uploaded by the user is previewed at the front end, it is transmitted through an API interface to the image preprocessing part; after preprocessing, the processed image is transmitted through an API interface to the target detection part for disease and pest detection. The detection result is transmitted through an API interface to the server and stored in the database, and is returned through an API interface to the front-end interface for display.
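The upload-preprocess-detect-store flow above could be exposed as a single HTTP endpoint. The sketch below uses Flask, which is an assumption (the patent only specifies HTTP and API interfaces); the endpoint name, the in-memory `DB` list and the stubbed `preprocess`/`model_predict` functions are all hypothetical placeholders for the real parts.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def preprocess(data):
    """Stub for the image preprocessing part (enhancement, segmentation, resize)."""
    return data

def model_predict(data):
    """Stub for YOLOv7-XSepConv inference; returns illustrative detections."""
    return [{"class": "powdery mildew", "conf": 0.85}]

DB = []  # stand-in for the MySQL database part

@app.route("/api/detect", methods=["POST"])
def detect():
    img = request.files["image"].read()          # picture uploaded by the applet
    results = model_predict(preprocess(img))     # preprocessing + detection parts
    DB.append(results)                           # store for later query/statistics
    return jsonify(results)                      # returned to the front end
```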
In summary, by integrating the front-end interface, image preprocessing, target detection and server parts, the invention realizes rapid detection and accurate identification of tomato plant diseases and pests, providing convenient and efficient technical support for agricultural production.
It is worth noting that the disease and pest detection method and system of the invention are applicable not only to tomato diseases and pests but also to those of other crops, such as vegetable crops like beans, and melon and fruit crops like cucumbers and eggplants.
The foregoing detailed description is directed to embodiments of the invention and is not intended to limit its scope; all modifications and variations within the scope of the invention are covered.

Claims (10)

1. The tomato plant disease and pest detection method is characterized by comprising the following steps:
s1, reducing the number of channels in a YOLOv7 model on the basis of YOLOv7, and replacing a Conv module in a YOLOv7 backbone network with an XSepConv module to obtain the YOLOv7-XSepConv model as a target detection model;
s2, collecting tomato plant pictures with different kinds of plant diseases and insect pests, forming a training data set, setting parameters, training a target detection model until the model converges and storing model parameters;
s3, inputting the picture to be detected into a trained target detection model after preprocessing to obtain a detection result, and identifying and classifying the plant diseases and insect pests.
2. The tomato plant disease and pest detection method according to claim 1, wherein the XSepConv module comprises, in order: a 1×1 expansion convolution for expanding the number of input channels to a higher dimension; batch normalization and an activation function; a 2×2 DW convolution for performing convolution operations within each channel group; batch normalization and an activation function; a 1×K DW convolution, which performs a convolution operation in the channel dimension and has a stride in the spatial dimension; batch normalization and an activation function; a K×1 DW convolution, which performs a convolution operation in the channel dimension with a different convolution kernel size in the spatial dimension; batch normalization; an SE module for performing feature recalibration in the channel dimension; a 1×1 output convolution for reducing the number of channels of the feature map to the desired output size; and batch normalization.
3. The tomato plant disease and pest detection method according to claim 2, wherein K=3 in the XSepConv module, and the backbone network of the YOLOv7-XSepConv model comprises 41 lightweight convolution blocks XSepConv, wherein each convolution block has the following structure:
the 1st lightweight convolution block XSepConv1: the number of input channels of the 1×1 expansion convolution is 3 and the number of output channels is 16; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 16 and the number of output channels is 16; the number of batch normalization channels is 16, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 2nd lightweight convolution block XSepConv2: the number of input channels of the 1×1 expansion convolution is 16 and the number of output channels is 32; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 32 and the number of output channels is 32; the number of batch normalization channels is 32, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 3rd lightweight convolution block XSepConv3: the number of input channels of the 1×1 expansion convolution is 32 and the number of output channels is 32; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 32 and the number of output channels is 32; the number of batch normalization channels is 32, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 4th lightweight convolution block XSepConv4: the number of input channels of the 1×1 expansion convolution is 32 and the number of output channels is 64; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 64 and the number of output channels is 64; the number of batch normalization channels is 64, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 5th lightweight convolution block XSepConv5 and the 6th lightweight convolution block XSepConv6: the number of input channels of the 1×1 expansion convolution is 64 and the number of output channels is 32; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 32 and the number of output channels is 32; the number of batch normalization channels is 32, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 7th lightweight convolution block XSepConv7, the 8th lightweight convolution block XSepConv8, the 9th lightweight convolution block XSepConv9 and the 10th lightweight convolution block XSepConv10: the number of input channels of the 1×1 expansion convolution is 32 and the number of output channels is 32; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 32 and the number of output channels is 32; the number of batch normalization channels is 32, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 11th lightweight convolution block XSepConv11: the number of input channels of the 1×1 expansion convolution is 128 and the number of output channels is 128; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 128 and the number of output channels is 128; the number of batch normalization channels is 128, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 12th lightweight convolution block XSepConv12 and the 13th lightweight convolution block XSepConv13: the number of input channels of the 1×1 expansion convolution is 128 and the number of output channels is 64; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 64 and the number of output channels is 64; the number of batch normalization channels is 64, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 14th lightweight convolution block XSepConv14: the number of input channels of the 1×1 expansion convolution is 64 and the number of output channels is 64; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 64 and the number of output channels is 64; the number of batch normalization channels is 64, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 15th lightweight convolution block XSepConv15 and the 16th lightweight convolution block XSepConv16: the number of input channels of the 1×1 expansion convolution is 128 and the number of output channels is 64; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 64 and the number of output channels is 64; the number of batch normalization channels is 64, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 17th lightweight convolution block XSepConv17, the 18th lightweight convolution block XSepConv18, the 19th lightweight convolution block XSepConv19 and the 20th lightweight convolution block XSepConv20: the number of input channels of the 1×1 expansion convolution is 64 and the number of output channels is 64; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 64 and the number of output channels is 64; the number of batch normalization channels is 64, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 21st lightweight convolution block XSepConv21: the number of input channels of the 1×1 expansion convolution is 256 and the number of output channels is 256; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 256 and the number of output channels is 256; the number of batch normalization channels is 256, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 22nd lightweight convolution block XSepConv22 and the 23rd lightweight convolution block XSepConv23: the number of input channels of the 1×1 expansion convolution is 256 and the number of output channels is 128; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 128 and the number of output channels is 128; the number of batch normalization channels is 128, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 24th lightweight convolution block XSepConv24: the number of input channels of the 1×1 expansion convolution is 128 and the number of output channels is 128; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 128 and the number of output channels is 128; the number of batch normalization channels is 128, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 25th lightweight convolution block XSepConv25 and the 26th lightweight convolution block XSepConv26: the number of input channels of the 1×1 expansion convolution is 256 and the number of output channels is 128; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 128 and the number of output channels is 128; the number of batch normalization channels is 128, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 27th lightweight convolution block XSepConv27, the 28th lightweight convolution block XSepConv28, the 29th lightweight convolution block XSepConv29 and the 30th lightweight convolution block XSepConv30: the number of input channels of the 1×1 expansion convolution is 128 and the number of output channels is 128; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 128 and the number of output channels is 128; the number of batch normalization channels is 128, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 31st lightweight convolution block XSepConv31: the number of input channels of the 1×1 expansion convolution is 512 and the number of output channels is 512; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 512 and the number of output channels is 512; the number of batch normalization channels is 512, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 32nd lightweight convolution block XSepConv32 and the 33rd lightweight convolution block XSepConv33: the number of input channels of the 1×1 expansion convolution is 512 and the number of output channels is 256; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 256 and the number of output channels is 256; the number of batch normalization channels is 256, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 34th lightweight convolution block XSepConv34: the number of input channels of the 1×1 expansion convolution is 256 and the number of output channels is 256; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 256 and the number of output channels is 256; the number of batch normalization channels is 256, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 35th lightweight convolution block XSepConv35 and the 36th lightweight convolution block XSepConv36: the number of input channels of the 1×1 expansion convolution is 512 and the number of output channels is 128; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 128 and the number of output channels is 128; the number of batch normalization channels is 128, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 37th lightweight convolution block XSepConv37, the 38th lightweight convolution block XSepConv38, the 39th lightweight convolution block XSepConv39 and the 40th lightweight convolution block XSepConv40: the number of input channels of the 1×1 expansion convolution is 128 and the number of output channels is 128; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 128 and the number of output channels is 128; the number of batch normalization channels is 128, and the activation function layer is a LeakyReLU function with a negative slope of 0.1;
the 41st lightweight convolution block XSepConv41: the number of input channels of the 1×1 expansion convolution is 512 and the number of output channels is 512; the number of input channels of the DW convolutions, the SE module and the 1×1 output convolution is 512 and the number of output channels is 512; the number of batch normalization channels is 512, and the activation function layer is a LeakyReLU function with a negative slope of 0.1.
4. The tomato plant disease and pest detection method according to claim 1, wherein the number of backbone network channels of the YOLOv7-XSepConv model is half the number of backbone network channels in the conventional YOLOv7 model.
5. The tomato plant disease and pest detection method according to claim 1, wherein preprocessing the picture in step S3 comprises: image denoising, image enhancement, image segmentation, image graying, image normalization and image binarization;
image enhancement adopts histogram equalization or filtering; image segmentation adopts color segmentation or threshold segmentation to separate the tomato plant in the image from the background; image normalization performs size unification on the image.
6. The tomato plant disease and pest detection method according to claim 1, wherein the picture to be detected in step S3 is obtained by cropping and rotating a photographed picture;
the kinds of diseases and pests in the tomato plant pictures of step S2 include cold injury, insect pests, early blight, fruit cracking, TY virus, potassium deficiency and powdery mildew.
7. The tomato plant disease and pest detection method according to claim 1, wherein the learning rate in the model training of step S2 is automatically adjusted by cosine annealing, the initial learning rate is set to 0.001, and the number of iterations is set to 300.
8. A tomato plant pest detection system for implementing a tomato plant pest detection method according to any one of claims 1-7, comprising:
Front end user interface section: the method is used for shooting, processing and uploading the picture to be detected, and displaying and outputting the detection result;
an image preprocessing section: the method comprises the steps of preprocessing an uploaded picture to be detected;
an object detection section: performing target detection on the preprocessed picture to be detected based on a target detection model algorithm, so as to realize identification and classification of plant diseases and insect pests of the tomato plant;
a database section: for storing the detection result data.
9. A tomato plant disease and pest detection system as defined in claim 8, wherein the front end user interface portion is configured on an application of the intelligent terminal, and the image preprocessing portion and the target detection portion are configured on a background server of the application.
10. The tomato plant disease and pest detection system of claim 9, wherein the intelligent terminal is a mobile phone and the application program is a WeChat applet.
CN202311079899.6A 2023-08-24 2023-08-24 Tomato plant disease and pest detection method and detection system Pending CN116977863A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311079899.6A CN116977863A (en) 2023-08-24 2023-08-24 Tomato plant disease and pest detection method and detection system


Publications (1)

Publication Number Publication Date
CN116977863A true CN116977863A (en) 2023-10-31

Family

ID=88481573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311079899.6A Pending CN116977863A (en) 2023-08-24 2023-08-24 Tomato plant disease and pest detection method and detection system

Country Status (1)

Country Link
CN (1) CN116977863A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117969771A (en) * 2024-04-01 2024-05-03 广东海洋大学 Tomato plant disease monitoring method
CN117969771B (en) * 2024-04-01 2024-05-28 广东海洋大学 Tomato plant disease monitoring method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination