CN114066920B - Harvester visual navigation method and system based on improved Segnet image segmentation - Google Patents


Info

Publication number
CN114066920B
CN114066920B
Authority
CN
China
Prior art keywords
feature map
segnet
image
output
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111394892.4A
Other languages
Chinese (zh)
Other versions
CN114066920A (en)
Inventor
卢柱
齐亮
李邦昱
张永韡
宋英磊
***
暴琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN202111394892.4A priority Critical patent/CN114066920B/en
Publication of CN114066920A publication Critical patent/CN114066920A/en
Application granted granted Critical
Publication of CN114066920B publication Critical patent/CN114066920B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10052 Images from lightfield camera
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20076 Probabilistic image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20092 Interactive image processing based on input by user
    • G06T2207/20104 Interactive definition of region of interest [ROI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a harvester visual navigation method and system based on improved Segnet image segmentation, comprising the following steps: (1) collecting a field crop image to be segmented and preprocessing it; (2) performing semantic segmentation on the obtained field crop image with an improved Segnet model to generate a target feature map, where the improved Segnet model uses a ShuffleNetV2 network as the encoder part of the Segnet model; (3) obtaining boundary pixel points and position information of the target in the feature map with an edge detection algorithm; (4) taking the obtained boundary pixel points and position information as input to the progressive probabilistic Hough transform (PPHT) algorithm, which, by adjusting its threshold, minLineLength and maxLineGap parameters, finally outputs a straight line segment as the target straight-line path for navigation. The invention can rapidly perform semantic segmentation on the acquired image, and calculates and screens out a suitable path from the segmented feature map for navigation of the harvester.

Description

Harvester visual navigation method and system based on improved Segnet image segmentation
Technical Field
The invention relates to the technical field of deep learning, computer vision and image analysis, in particular to a harvester vision navigation method and system based on improved Segnet image segmentation.
Background
At present, the technique of segmenting images by deep learning and extracting information from the segmented images has been applied to agricultural production. Patent application CN202110287975.7 uses a Segnet network model to segment images of lodged sorghum, and adopts a MobileNet network in the encoding stage to lighten the Segnet network and achieve fast segmentation. However, such segmentation speeds are inadequate for harvesters: a harvester moves dynamically in real time while harvesting crops, so the acquired images must be segmented even more rapidly, and a suitable path for navigation must be calculated and screened out from the segmented images.
Disclosure of Invention
The invention aims to: addressing the above defects, the invention provides a harvester visual navigation method based on improved Segnet image segmentation, which rapidly performs semantic segmentation on images acquired by the harvester, and calculates and screens out a suitable harvesting path from the segmented feature map for navigation of the harvester. The invention also provides a harvester visual navigation system based on improved Segnet image segmentation, which can acquire a suitable harvesting path in real time for navigation of the harvester.
The technical scheme is as follows: to solve the above problems, the invention provides a harvester visual navigation method based on improved Segnet image segmentation, characterized by comprising the following steps:
(1) Collecting a field crop image to be segmented and preprocessing it;
(2) Performing semantic segmentation on the obtained field crop image with an improved Segnet model to generate a target feature map; the improved Segnet model uses a ShuffleNetV2 network as the encoder part of the Segnet model;
(3) Obtaining boundary pixel points and position information of the target in the feature map with an edge detection algorithm;
(4) Taking the obtained boundary pixel points and position information as input to the progressive probabilistic Hough transform (PPHT) algorithm, which, by adjusting the threshold, minLineLength and maxLineGap parameters, finally outputs a straight line segment as the target straight-line path for navigation.
The beneficial effects are that: compared with the prior art, the invention has the following remarkable advantages: the encoder part of the Segnet network adopts a ShuffleNetV2 network, which is simplified mainly through depth separable convolutions and markedly improves computational efficiency at the cost of a small loss of accuracy, thereby lightening the Segnet network and raising the segmentation speed on field crop images; boundary pixel points and position information are obtained by detecting the edges of the segmented image, and a suitable straight-line path is selected with the PPHT algorithm for navigation of the harvester.
Further, generating the target feature map in step (2) with the improved Segnet model specifically comprises:
(2.1) acquiring field crop images, setting a region of interest for annotation, and dividing the annotated data set into a training set and a test set; performing image enhancement processing on the annotated images in the training set;
(2.2) performing iterative model training by adjusting the weight coefficient vector ω and the learning rate α of the improved Segnet model, and selecting the model with the largest mean intersection-over-union (MIoU) as the optimal model, where

MIoU = (1/(k+1)) · Σ_{i=0}^{k} p_ii / (Σ_{j=0}^{k} p_ij + Σ_{j=0}^{k} p_ji − p_ii)

wherein k represents the number of categories; k+1 represents the number of categories including the empty category; p_ij denotes the number of false positive samples; p_ji denotes the number of false negative samples; p_ii denotes the number of true samples;
(2.3) performing image segmentation on the field crop image to be segmented with the optimal model to obtain the target feature map.
Further, the ShuffleNetV2 network in step (2) is further improved; the construction of the basic unit in the ShuffleNetV2 network specifically comprises the following steps:
(a) Dividing the input into two groups of feature maps by channel count through a Channel Split operation, and then passing the two groups of feature maps into two branches respectively;
(b) The feature map passed to one branch first undergoes a 1×1 convolution, is then fed into an ASPP structure based on depth separable convolution for convolution and concatenation, and is output after another 1×1 convolution; the ASPP structure consists of three depth separable convolution layer branches of different sizes;
(c) Concatenating the feature map output by this branch with the feature map output by the other branch, and finally outputting the feature map after a Channel Shuffle operation.
The ASPP structure based on depth separable convolution replaces the single 3×3 convolution kernel in the lightweight ShuffleNetV2 network to enlarge the receptive field of the network, thereby capturing multi-scale spatial information and improving model performance.
Further, the ShuffleNetV2 network in step (2) is further improved; the construction of the spatial downsampling unit in the ShuffleNetV2 network specifically comprises the following steps:
(d) Copying the feature map output by the preceding layer as input to two branches, where one branch passes sequentially through a 3×3 depth separable convolution layer and a 1×1 convolution and then outputs a feature map; the other branch first undergoes a 1×1 convolution, is then fed into the ASPP structure based on depth separable convolution for convolution and concatenation, and outputs a feature map after another 1×1 convolution;
(e) Concatenating the feature map output by one branch with the feature map output by the other branch, and finally outputting the feature map after a Channel Shuffle operation.
Further, the method further comprises optimizing the loss function of the improved Segnet model with stochastic gradient descent (SGD) and L1 regularization; the SGD update is formulated as follows:

ω ← ω − α ∇_ω J(ω)

wherein ω represents the weight coefficient vector, α represents the learning rate, and J(ω) represents the loss function;
The L1 regularization formula is as follows:

J̃(ω; X, y) = J(ω; X, y) + λ Ω(ω),  Ω(ω) = ‖ω‖₁

wherein X is a training sample, y is the label corresponding to X, Ω(ω) is the penalty term, λ is the regularization coefficient, and J̃(ω; X, y) is the objective function.
Further, the three different-sized depth separable convolution layers in the ASPP structure are 3×3, 5×5, and 7×7, respectively.
Further, the stride of the depth separable convolutional layer in the ASPP structure in the base unit is set to 1, and the stride of the depth separable convolutional layer in the ASPP structure in the downsampling unit is set to 2.
The invention also provides a harvester visual navigation system based on improved Segnet image segmentation, comprising:
an image acquisition module, used for collecting a field crop image to be segmented and preprocessing it;
an image processing module, used for performing semantic segmentation on the obtained field crop image with the improved Segnet model to generate a target feature map, the improved Segnet model using a ShuffleNetV2 network as the encoder part of the Segnet model, and for obtaining boundary pixel points and position information of the target in the feature map with an edge detection algorithm;
a path decision unit, which takes the obtained boundary pixel points and position information as input to the progressive probabilistic Hough transform (PPHT) algorithm, which, by adjusting the threshold, minLineLength and maxLineGap parameters, finally outputs a straight line segment as the target straight-line path for navigation.
The beneficial effects are that: compared with the prior art, the system has the remarkable advantages that the system can acquire a proper harvesting path in real time for the harvester to navigate.
Furthermore, a computer readable storage medium is provided, comprising a stored computer program, wherein the device on which the computer readable storage medium resides is controlled to perform the above method when the computer program runs.
Furthermore, a device is provided comprising a memory, a processor and a program stored on and executable from the memory, the program realizing the steps of the above method when executed by the processor.
Drawings
FIG. 1 is a flow chart of a method for visual navigation of a harvester based on improved Segnet image segmentation according to the present invention;
FIG. 2 is a flow chart of obtaining the target feature map with the improved Segnet model;
FIG. 3 is a block diagram of the further improved ShuffleNetV2 basic unit and downsampling unit of the present invention;
FIG. 4 is a schematic diagram of the acceleration process of the inference accelerator TensorRT of the present invention;
FIG. 5 is a diagram showing the composition of a visual navigation system for a harvester based on improved Segnet image segmentation in accordance with the present invention;
FIG. 6 is a graph showing the effect of detecting the boundaries of rice according to the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, the harvester visual navigation method based on improved Segnet image segmentation disclosed by the invention comprises the following steps:
(1) Collecting target data and preprocessing;
A Logitech C1000e high-definition monocular camera is selected, with the real-time frame rate set to 60 f/s, to collect field crop images in real time. In the camera's continuous capture mode, the field images are resized and a region of interest is set in advance, so that rich detail information is obtained while compressing the image size improves the real-time performance of the algorithm. Meanwhile, to reduce the influence of field illumination and false edge detection, preprocessing such as graying and bilateral filtering is applied to the images;
(2) Semantic segmentation is carried out on the preprocessed image data by using an improved Segnet model, and characteristics are extracted to generate a target characteristic diagram;
The Segnet network consists of two parts, an encoder and a decoder. The Segnet encoder adopts the first 13 layers of the VGG network, each layer comprising convolution + BN (batch normalization) + ReLU, followed by downsampling. Each layer of the decoder corresponds one-to-one with the encoder to upsample and recover the pixels, and the final output layer uses the softmax function to classify each pixel and output its classification probability. However, when the Segnet network is deployed on an embedded platform, its parameter count is too large and the required computation too high to meet the real-time requirements of a harvester harvesting crops; therefore the ShuffleNetV2 network is used as the encoder part of the Segnet model to form the improved Segnet model;
The ShuffleNetV2 network mainly uses depth separable convolutions to simplify the network, markedly improving computational efficiency at the cost of a small loss of accuracy, and thus serves well as the basis of a lightweight image semantic segmentation model. To achieve better processing speed, the ShuffleNetV2 network is further improved. As shown in fig. 3(a), the construction of the basic unit in the ShuffleNetV2 network specifically comprises the following steps:
(a) Dividing the input into two groups of feature maps by channel count through a Channel Split operation, and then passing the two groups of feature maps into two branches respectively;
(b) The feature map passed to one branch first undergoes a 1×1 convolution, is then fed into the ASPP structure based on depth separable convolution for convolution and concatenation, and is output after another 1×1 convolution. The ASPP structure comprises three branches of depth separable convolution layers of sizes 3×3, 5×5 and 7×7, each with stride 1. Replacing the single 3×3 convolution kernel of the original ShuffleNetV2 model with this ASPP (spatial pyramid pooling) structure based on depth separable convolution enlarges the receptive field of the convolution operation, thereby capturing multi-scale spatial information.
(c) Concatenating the feature maps output by the two branches, and finally outputting the feature map after a Channel Shuffle operation.
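The Channel Split and Channel Shuffle operations in steps (a) and (c) are pure tensor rearrangements. A minimal NumPy sketch, assuming a single (C, H, W) feature map (the function names here are illustrative, not from the patent):

```python
import numpy as np

def channel_split(x):
    """Split a (C, H, W) feature map into two halves along the channel
    axis, as the Channel Split operation at the entry of the basic unit."""
    c = x.shape[0] // 2
    return x[:c], x[c:]

def channel_shuffle(x, groups=2):
    """Interleave channels across groups so information mixes between the
    two branches after concatenation (the Channel Shuffle operation)."""
    c, h, w = x.shape
    assert c % groups == 0
    # reshape to (groups, C/groups, H, W), swap the first two axes, flatten back
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

# toy feature map with 4 channels
x = np.arange(4 * 2 * 2).reshape(4, 2, 2)
a, b = channel_split(x)                           # two 2-channel halves
y = channel_shuffle(np.concatenate([a, b], axis=0))
# channel order after shuffling: 0, 2, 1, 3
```

The shuffle is what lets the two branches exchange information in the next unit despite the split, without any extra convolution cost.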
As shown in fig. 3(b), the construction of the spatial downsampling unit in the ShuffleNetV2 network specifically comprises the following steps:
(d) Copying the feature map output by the preceding layer as input to two branches, where one branch passes sequentially through a 3×3 depth separable convolution layer and a 1×1 convolution and then outputs a feature map; the other branch first undergoes a 1×1 convolution, is then fed into the ASPP structure based on depth separable convolution for convolution and concatenation, and outputs a feature map after another 1×1 convolution. Here the ASPP structure again comprises three branches of depth separable convolution layers of sizes 3×3, 5×5 and 7×7, but each with stride 2.
(e) Concatenating the feature map output by one branch with the feature map output by the other branch, and finally outputting the feature map after a Channel Shuffle operation.
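Both units are built on depth separable convolution, which factors a standard convolution into a per-channel (depthwise) filter followed by a 1×1 (pointwise) channel mixer. A rough NumPy sketch of the two stages and of the parameter saving (the loops and names are illustrative, not the patent's implementation):

```python
import numpy as np

def depthwise_conv(x, k):
    """Depthwise step: each input channel convolved with its own kernel
    (valid padding, stride 1). x: (C, H, W), k: (C, kh, kw)."""
    c, h, w = x.shape
    kh, kw = k.shape[1], k.shape[2]
    out = np.zeros((c, h - kh + 1, w - kw + 1))
    for ci in range(c):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[ci, i, j] = np.sum(x[ci, i:i + kh, j:j + kw] * k[ci])
    return out

def pointwise_conv(x, w):
    """Pointwise step: a 1x1 convolution mixing channels; w: (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))

x = np.random.rand(4, 8, 8)
dw = np.random.rand(4, 3, 3)
pw = np.random.rand(8, 4)
y = pointwise_conv(depthwise_conv(x, dw), pw)   # shape (8, 6, 6)

# parameter comparison for C_in=4, C_out=8, 3x3 kernels:
standard = 3 * 3 * 4 * 8          # 288 weights in a standard convolution
separable = 3 * 3 * 4 + 4 * 8     # 68 weights in the separable form
```

The separable form here needs 68 weights versus 288 for the standard convolution, which is the source of ShuffleNetV2's efficiency gain.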
In addition, stochastic gradient descent and L1 regularization can be adopted to optimize the loss function of the improved Segnet model; the SGD update is formulated as follows:

ω ← ω − α ∇_ω J(ω)

wherein ω represents the weight coefficient vector, α represents the learning rate, and J(ω) represents the loss function;
The L1 regularization formula is as follows:

J̃(ω; X, y) = J(ω; X, y) + λ Ω(ω),  Ω(ω) = ‖ω‖₁

wherein X is a training sample, y is the label corresponding to X, Ω(ω) is the penalty term, λ is the regularization coefficient, and J̃(ω; X, y) is the objective function;
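A minimal numeric sketch of one such update, assuming a toy squared-error loss on a single sample and the standard sign subgradient of the L1 penalty (the variable names and values are ours, not the patent's):

```python
import numpy as np

def sgd_l1_step(w, grad, alpha=0.01, lam=1e-3):
    """One SGD step on the L1-regularized objective
    J~(w) = J(w) + lam * ||w||_1, using sign(w) as the penalty subgradient."""
    return w - alpha * (grad + lam * np.sign(w))

# toy linear-regression loss J(w) = 0.5 * ||X w - y||^2 on one sample
X = np.array([[1.0, 2.0]])
y = np.array([3.0])
w = np.array([0.5, -0.5])
grad = X.T @ (X @ w - y)                          # gradient of J at w
w_new = sgd_l1_step(w, grad, alpha=0.1, lam=0.01)  # -> [0.849, 0.201]
```

The L1 term pushes small weights toward zero, which complements the lightweight network design by keeping the model sparse.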
As shown in fig. 2, the specific steps of processing an image and generating the target feature map with the improved Segnet model include:
(2.1) The region of interest in the collected farmland data set (2000 images in total) is finely annotated with the labelme annotation tool, and the annotated data set is divided into a training set and a test set at a ratio of 7:3. The labeled data are converted into the TFRecord format, which is convenient for TensorFlow to read. In the TensorFlow environment, each image is augmented by geometric transformations such as flipping, rotation, scaling and shifting, which reduces overfitting of the network and improves the training effect.
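The geometric augmentation described above can be sketched in a few lines of NumPy (flips and 90° rotations only; the scaling and shifting the text also mentions are omitted here for brevity):

```python
import numpy as np

def augment(img):
    """Yield simple geometric variants of one image: the original,
    horizontal/vertical flips, and 90-degree rotations, as used to
    enlarge the training set and reduce overfitting."""
    yield img
    yield np.fliplr(img)     # horizontal flip
    yield np.flipud(img)     # vertical flip
    for k in (1, 2, 3):
        yield np.rot90(img, k)

img = np.arange(9).reshape(3, 3)
variants = list(augment(img))    # 6 images produced from 1
```

In practice these transforms would be applied on the fly during training rather than stored, so the TFRecord files stay small.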
(2.2) 30000 to 50000 iterations of training are performed while adjusting the weight coefficient vector ω and the learning rate α of the improved Segnet model, yielding training results with different MIoU values; the model with the largest mean intersection-over-union MIoU is finally selected as the optimal model, where the MIoU formula is as follows:

MIoU = (1/(k+1)) · Σ_{i=0}^{k} p_ii / (Σ_{j=0}^{k} p_ij + Σ_{j=0}^{k} p_ji − p_ii)

wherein k represents the number of categories; k+1 represents the number of categories including the empty category; p_ij denotes the number of false positive samples; p_ji denotes the number of false negative samples; p_ii denotes the number of true samples;
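Under the usual reading of these definitions, MIoU is computed from a confusion matrix as below (a NumPy sketch; the two-class example numbers are made up for illustration):

```python
import numpy as np

def mean_iou(conf):
    """MIoU from a (k+1)x(k+1) confusion matrix whose entry [i, j] counts
    pixels of true class i predicted as class j; the diagonal holds the
    true samples p_ii."""
    tp = np.diag(conf).astype(float)
    # per-class union: predicted-as-class + belonging-to-class - true positives
    denom = conf.sum(axis=1) + conf.sum(axis=0) - tp
    return np.mean(tp / denom)

# illustrative two-class case (e.g. crop vs background)
conf = np.array([[40, 10],
                 [ 5, 45]])
miou = mean_iou(conf)
```

Each class's IoU divides its true positives by the union of pixels predicted as or belonging to that class, and MIoU averages over the k+1 classes.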
(2.3) Image segmentation is performed on the test set samples with the optimal model to obtain the target feature map; likewise, the field crop image obtained in step (1) can be segmented to obtain the target feature map;
To improve the inference speed of the Segnet model, the obtained optimal model can be deployed on an embedded terminal, where NVIDIA's inference accelerator TensorRT performs operations such as layer merging, precision reduction and parallel optimization on the deployed model. As shown in FIG. 4, TensorRT applies layer merging, precision calibration, dynamic memory, automatic kernel tuning, parallel optimization and similar operations to the trained network, thereby improving the inference speed of the model.
(3) For the generated target feature map, an edge detection algorithm is used, in combination with the edge-harvesting operating characteristics of the combine harvester, to obtain the boundary pixel points and position information of the target in the feature map. Specifically, a Sobel operator edge detection followed by a Canny operator edge detection can be applied, and the superposition of the two algorithms enhances the detection effect.
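In practice the cv2.Sobel and cv2.Canny functions would do this work; the following is only a minimal NumPy illustration of the Sobel stage on a synthetic step edge (the kernel values are the standard Sobel masks, the helper name is ours):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T

def sobel_magnitude(img):
    """Gradient magnitude via the Sobel operator (valid padding):
    high values mark boundary pixels between crop and stubble regions."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            out[i, j] = np.hypot(np.sum(patch * SOBEL_X),
                                 np.sum(patch * SOBEL_Y))
    return out

# vertical step edge: left half 0, right half 255
img = np.zeros((8, 8))
img[:, 4:] = 255
mag = sobel_magnitude(img)
# the response peaks on the columns straddling the step and is zero elsewhere
```

Canny then thins and thresholds such a gradient map into the one-pixel-wide boundary that feeds the Hough step.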
(4) The obtained boundary pixel points and position information are taken as input to the progressive probabilistic Hough transform (PPHT) algorithm: the HoughLinesP() function in OpenCV is called to obtain a number of straight line segments, and its parameters are tuned as follows: the threshold parameter (the minimum number of curve intersections required to detect a line) is adjusted in the interval of 150-200; the minLineLength parameter (the minimum number of points forming a line) is adjusted in the interval of 80-120 pixels; and the maxLineGap parameter (the maximum allowed gap between points on the same line) is adjusted in the interval of 0-10 pixels, so that the PPHT algorithm finally outputs a straight line segment as the target straight-line path for navigation.
As shown in fig. 6, the target path of the rice harvesting boundary is finally output by the above method. Combined with the size and heading information of the harvester, the edge of the harvester is made to travel over and along the acquired straight boundary path.
In addition, the invention also provides a harvester visual navigation system based on improved Segnet image segmentation, comprising:
an image acquisition module, used for collecting a field crop image to be segmented and preprocessing it;
an image processing module, used for performing semantic segmentation on the obtained field crop image with the improved Segnet model to generate a target feature map, the improved Segnet model using a ShuffleNetV2 network as the encoder part of the Segnet model, and for obtaining boundary pixel points and position information of the target in the feature map with an edge detection algorithm;
a path decision unit, which takes the obtained boundary pixel points and position information as input to the progressive probabilistic Hough transform (PPHT) algorithm, which, by adjusting the threshold, minLineLength and maxLineGap parameters, finally outputs a straight line segment as the target straight-line path for navigation.
As shown in fig. 5, the image acquisition module adopts a monocular high-definition camera, and the image processing module and path decision unit are built into a high-performance vision computer; the monocular high-definition camera and the high-performance vision computer are mounted on the harvester, and the image collected by the camera is transmitted into the vision computer for image processing and path decision.
Furthermore, a computer readable storage medium is provided, comprising a stored computer program, wherein the device on which the computer readable storage medium resides is controlled to perform the above method when the computer program runs.
Furthermore, a device is provided comprising a memory, a processor and a program stored on and executable from the memory, the program realizing the steps of the above method when executed by the processor.

Claims (9)

1. A harvester visual navigation method based on improved Segnet image segmentation, characterized by comprising the following steps:
(1) Collecting a field crop image to be segmented and preprocessing it;
(2) Performing semantic segmentation on the obtained field crop image with an improved Segnet model to generate a target feature map; the improved Segnet model uses a ShuffleNetV2 network as the encoder part of the Segnet model; the construction of the basic unit in the ShuffleNetV2 network specifically comprises the following steps:
(a) Dividing the input into two groups of feature maps by channel count through a Channel Split operation, and then passing the two groups of feature maps into two branches respectively;
(b) The feature map passed to one branch first undergoes a 1×1 convolution, is then fed into an ASPP structure based on depth separable convolution for convolution and concatenation, and is output after another 1×1 convolution; the ASPP structure consists of three depth separable convolution layer branches of different sizes;
(c) Concatenating the feature map output by this branch with the feature map output by the other branch, and finally outputting the feature map after a Channel Shuffle operation;
(3) Obtaining boundary pixel points and position information of the target in the target feature map with an edge detection algorithm;
(4) Taking the obtained boundary pixel points and position information as input to the progressive probabilistic Hough transform (PPHT) algorithm, which, by adjusting the threshold, minLineLength and maxLineGap parameters, finally outputs a straight line segment as the target straight-line path for navigation.
2. The harvester visual navigation method based on improved Segnet image segmentation of claim 1, wherein generating the target feature map in step (2) with the improved Segnet model comprises:
(2.1) acquiring field crop images, setting a region of interest for annotation, and dividing the annotated data set into a training set and a test set; performing image enhancement processing on the annotated images in the training set;
(2.2) performing iterative model training by adjusting the weight coefficient vector ω and the learning rate α of the improved Segnet model, and selecting the model with the largest mean intersection-over-union MIoU as the optimal model, where

MIoU = (1/(k+1)) · Σ_{i=0}^{k} p_ii / (Σ_{j=0}^{k} p_ij + Σ_{j=0}^{k} p_ji − p_ii)

wherein k represents the number of categories; k+1 represents the number of categories including the empty category; p_ij denotes the number of false positive samples; p_ji denotes the number of false negative samples; p_ii denotes the number of true samples;
(2.3) performing image segmentation on the field crop image to be segmented with the optimal model to obtain the target feature map.
3. The method of claim 1, wherein the step (2) of constructing a spatial downsampling unit for a ShuffleNetV network, and the step (ShuffleNetV) of constructing a spatial downsampling unit for a ShuffleNetV network, comprises the steps of:
(d) Copying the feature map output by the upper network as input to two branches, wherein the input of one branch sequentially passes through a 3×3 depth separable convolution layer and a1×1 convolution operation and then outputs the feature map; the input of the other branch is firstly subjected to convolution of 1 multiplied by 1, then is input into an ASPP structure based on depth separable convolution to be subjected to convolution operation and then is spliced and output, and then is subjected to convolution operation of 1 multiplied by 1 and then is subjected to feature map output;
(e) Concatenating the feature map output by one branch with the feature map output by the other branch, and finally outputting the feature map after a Channel Shuffle operation.
4. The method of claim 1, further comprising optimizing the loss function of the improved Segnet model using the stochastic gradient descent (SGD) method and L1 regularization; the SGD update formula is as follows:

ω = ω − α · ∂J(ω)/∂ω

wherein ω represents the weight coefficient vector, α represents the learning rate, and J(ω) represents the loss function;
The L1 regularization formula is as follows:

J̃(ω; X, y) = J(ω; X, y) + λ · Ω(ω), with Ω(ω) = ‖ω‖₁

wherein X is a training sample, y is the label corresponding to X, Ω(ω) is the penalty term, λ is the regularization coefficient, and J̃(ω; X, y) is the objective function.
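The SGD update of claim 4 can be sketched on a one-dimensional least-squares toy problem; the L1 penalty enters the gradient through its subgradient sign(ω). The function name, data and the λ value are illustrative, not the claimed training procedure:

```python
def sgd_l1(samples, alpha=0.05, lam=0.01, epochs=200):
    """Minimise J(w) = (w*x - y)^2 + lam*|w| one sample at a time:
    w <- w - alpha * dJ/dw, with sign(w) as the L1 subgradient."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            grad = 2 * (w * x - y) * x                      # squared-error term
            grad += lam * (1 if w > 0 else -1 if w < 0 else 0)  # L1 penalty term
            w -= alpha * grad
    return w

# Data generated from y = 2x; the penalty shrinks w slightly below 2
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = sgd_l1(data)
```

The penalty biases the weight toward zero, which is what drives the sparsity that L1 regularization is used for in the claim.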
5. The harvester visual navigation method based on improved Segnet image segmentation according to claim 3, wherein the three depth-separable convolution layers of different sizes in the ASPP structure are 3×3, 5×5 and 7×7, respectively.
6. The harvester visual navigation method based on improved Segnet image segmentation according to claim 3, wherein the depth-separable convolution layers in the ASPP structure of the basic unit have a stride of 1, and the depth-separable convolution layers in the ASPP structure of the downsampling unit have a stride of 2.
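To illustrate the depth-separable convolutions from which claims 3, 5 and 6 build the ASPP branches, the following is a minimal 1-D sketch with hypothetical names and toy data: a depthwise stage filters each channel with its own kernel, a pointwise 1×1 stage then mixes channels, and a stride of 2 in the depthwise stage halves the spatial length, as in the downsampling unit:

```python
def depthwise_separable_1d(x, dw_kernels, pw_weights, stride=1):
    """x: list of channels (each a list of values).
    Stage 1 (depthwise): each channel is filtered by its own kernel.
    Stage 2 (pointwise 1x1): output channels are weighted sums across
    the depthwise results at each position."""
    dw = []
    for ch, k in zip(x, dw_kernels):
        n = len(k)
        dw.append([sum(ch[i + j] * k[j] for j in range(n))
                   for i in range(0, len(ch) - n + 1, stride)])
    length = len(dw[0])
    return [[sum(w[c] * dw[c][i] for c in range(len(dw)))
             for i in range(length)]
            for w in pw_weights]

# Two input channels, 3-tap box filters, one output channel, stride 2
x = [[1, 2, 3, 4, 5, 6, 7], [1, 1, 1, 1, 1, 1, 1]]
k = [[1, 1, 1], [1, 1, 1]]
out = depthwise_separable_1d(x, k, pw_weights=[[1, 0]], stride=2)
```

Splitting the spatial filtering (per channel) from the channel mixing (1×1) is what makes these layers far cheaper than standard convolutions, which is why lightweight encoders such as ShuffleNetV2 rely on them.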
7. A harvester visual navigation system based on improved Segnet image segmentation, comprising:
the image acquisition module is used for acquiring a field crop image to be segmented for preprocessing;
The image processing module is used for performing semantic segmentation on the obtained field crop image by using the improved Segnet model to generate a target feature map; the improved Segnet model refers to using the ShuffleNetV2 network as the encoder part of the Segnet model; boundary pixel points and position information of the target in the feature map are obtained by using an edge detection algorithm; the construction of the basic unit in the ShuffleNetV2 network specifically comprises the following steps:
(a) Dividing the input into two groups of feature maps according to the number of channels through a Channel Split operation, and then transmitting the two groups of feature maps into two branches respectively;
(b) The feature map transmitted to one branch first undergoes a 1×1 convolution, is then input into the ASPP structure based on depth-separable convolution, where the convolution outputs are concatenated, and then passes through another 1×1 convolution to output a feature map; the ASPP structure is formed by three depth-separable convolution layer branches of different sizes;
(c) The feature map output by this branch is concatenated with the feature map output by the other branch, and the final feature map is output after a Channel Shuffle operation;
The path decision unit takes the obtained boundary pixel points and position information as the input of the progressive probabilistic Hough transform (PPHT) algorithm; by adjusting the threshold parameter, the minLineLength parameter and the maxLineGap parameter, the PPHT algorithm finally outputs a straight line segment as the target straight-line path for navigation.
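Steps (a) through (c) follow the ShuffleNetV2 basic-unit pattern. The Channel Split and Channel Shuffle operations can be sketched in pure Python on a list of channel labels (illustrative only; real implementations operate on tensors): shuffle reshapes the channels into (groups, n), transposes, and flattens, so channels from the two branches interleave and information mixes across groups in the next unit.

```python
def channel_split(channels):
    """Split the channel list into two halves, one per branch (step a)."""
    half = len(channels) // 2
    return channels[:half], channels[half:]

def channel_shuffle(channels, groups=2):
    """Reshape (groups, n) -> transpose -> flatten, interleaving groups (step c)."""
    n = len(channels) // groups
    grouped = [channels[g * n:(g + 1) * n] for g in range(groups)]
    return [grouped[g][i] for i in range(n) for g in range(groups)]

# Eight channels; 'a*' stay in one branch, 'b*' in the other
chans = ["a0", "a1", "a2", "a3", "b0", "b1", "b2", "b3"]
left, right = channel_split(chans)
mixed = channel_shuffle(left + right)   # after the branch outputs are concatenated
```

Without the shuffle, each branch would only ever see its own half of the channels; the interleaving is what lets the split-branch design stay cheap without isolating channel groups.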
8. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the method according to any one of claims 1 to 6.
9. A debugging device, characterized by comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method according to any one of claims 1 to 6.
CN202111394892.4A 2021-11-23 2021-11-23 Harvester visual navigation method and system based on improved Segnet image segmentation Active CN114066920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111394892.4A CN114066920B (en) 2021-11-23 2021-11-23 Harvester visual navigation method and system based on improved Segnet image segmentation

Publications (2)

Publication Number Publication Date
CN114066920A CN114066920A (en) 2022-02-18
CN114066920B true CN114066920B (en) 2024-07-05

Family

ID=80279478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111394892.4A Active CN114066920B (en) 2021-11-23 2021-11-23 Harvester visual navigation method and system based on improved Segnet image segmentation

Country Status (1)

Country Link
CN (1) CN114066920B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372327A (en) * 2023-07-07 2024-01-09 浙江大学 Paddy field navigation datum line detection method, system and equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111104962A (en) * 2019-11-05 2020-05-05 北京航空航天大学青岛研究院 Semantic segmentation method and device for image, electronic equipment and readable storage medium
CN111950349A (en) * 2020-06-22 2020-11-17 华中农业大学 Semantic segmentation based field navigation line extraction method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10769771B2 (en) * 2018-06-22 2020-09-08 Cnh Industrial Canada, Ltd. Measuring crop residue from imagery using a machine-learned semantic segmentation model

Similar Documents

Publication Publication Date Title
Jia et al. Detection and segmentation of overlapped fruits based on optimized mask R-CNN application in apple harvesting robot
CN107316307B (en) Automatic segmentation method of traditional Chinese medicine tongue image based on deep convolutional neural network
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN111222396B (en) All-weather multispectral pedestrian detection method
CN111046880A (en) Infrared target image segmentation method and system, electronic device and storage medium
CN109492706B (en) Chromosome classification prediction device based on recurrent neural network
CN112598713A (en) Offshore submarine fish detection and tracking statistical method based on deep learning
CN112036231B (en) Vehicle-mounted video-based lane line and pavement indication mark detection and identification method
CN108520203B (en) Multi-target feature extraction method based on fusion of self-adaptive multi-peripheral frame and cross pooling feature
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN112529090B (en) Small target detection method based on improved YOLOv3
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN112749675A (en) Potato disease identification method based on convolutional neural network
CN112464933B (en) Intelligent identification method for weak and small target through foundation staring infrared imaging
CN110969182A (en) Convolutional neural network construction method and system based on farmland image
CN116883650A (en) Image-level weak supervision semantic segmentation method based on attention and local stitching
CN114066920B (en) Harvester visual navigation method and system based on improved Segnet image segmentation
CN116342894A (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN113610035A (en) Rice tillering stage weed segmentation and identification method based on improved coding and decoding network
CN113128476A (en) Low-power consumption real-time helmet detection method based on computer vision target detection
CN116205879A (en) Unmanned aerial vehicle image and deep learning-based wheat lodging area estimation method
CN116342536A (en) Aluminum strip surface defect detection method, system and equipment based on lightweight model
CN115619719A (en) Pine wood nematode infected wood detection method based on improved Yolo v3 network model
CN114120359A (en) Method for measuring body size of group-fed pigs based on stacked hourglass network
CN117576038A (en) Fabric flaw detection method and system based on YOLOv8 network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant