CN116958053B - Yolov 4-tiny-based bamboo stick counting method - Google Patents
- Publication number
- CN116958053B (application CN202310743049.5A)
- Authority
- CN
- China
- Prior art keywords
- feature map
- layer
- network
- bamboo
- sticks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Image Analysis (AREA)
Abstract
A YOLOv4-tiny-based bamboo stick counting method comprises the following steps. S1: collect a number of pictures, each showing a certain quantity of bamboo stick cross sections, and resize the pictures to a uniform size; S2: label each original image acquired in step S1, drawing a rectangular bounding box around each bamboo stick with a labeling tool and saving the corresponding annotation file; S3: divide the annotated data set into a training set, a validation set and a test set; S4: construct a YOLOv4-tiny-based network; S5: feed the annotated training set into the constructed YOLOv4-tiny network for multi-round iterative training, saving network weight files during training; S6: feed the validation set and the test set into the network and evaluate its performance using the weight files saved in S5; S7: load the saved optimal weights, detect the input bamboo stick picture to be counted, and display the exact position of each stick and the counting result on the detection result image. The bamboo sticks are counted through these steps.
Description
Technical Field
The invention relates to the technical field of computer vision and deep learning, and in particular to a YOLOv4-tiny-based bamboo stick counting method.
Background
The most commonly used bamboo stick counting methods at present are manual counting, weighing-based counting and image-processing-based counting.
The manual counting method relies on a person counting the bamboo sticks one by one; fatigue and blurred vision introduce errors, so the error rate is comparatively high.
The weighing-based counting method weighs the bamboo sticks and then estimates their number from the weight of a single stick. In practice, stick weight can vary considerably between batches, which affects counting accuracy. Sticks of different lengths also differ in weight and must be sorted and counted by length, which increases the computational burden and the risk of error.
The counting method based on image processing mainly binarizes the image of the bamboo sticks and then counts them using morphological processing, region growing and similar techniques. This approach requires extensive parameter tuning in advance, counts unstably in complex scenes, and struggles with sticks of varying shapes and sizes.
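As a concrete illustration of this classical pipeline, the following toy sketch (pure Python; `count_blobs` is our own hypothetical helper, and a real system would use an image library such as OpenCV) thresholds a small grayscale grid and counts bright connected regions by flood fill, which is the essence of binarization followed by region growing:

```python
from collections import deque

def count_blobs(image, threshold=128):
    """Count bright connected regions (e.g. stick cross-sections) in a
    2-D grayscale image given as a list of lists. Toy stand-in for the
    binarization + region-growing pipeline described above."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if image[y][x] >= threshold and not seen[y][x]:
                count += 1                      # new region found
                q = deque([(y, x)])             # flood-fill it
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                           and image[ny][nx] >= threshold and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return count

# Two bright blobs on a dark background:
img = [
    [0, 200, 0,   0],
    [0, 200, 0, 255],
    [0,   0, 0, 255],
]
print(count_blobs(img))  # → 2
```

The fragility the text mentions is visible even here: the result depends entirely on a hand-picked threshold, which is why such methods need heavy per-scene tuning.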
Patent document CN114291593A discloses a bamboo stick counting device that conveys the sticks with a grooved ratchet wheel and counts them one by one from the sensing signals of a counter. This approach can count bamboo sticks, but it requires a physical mechanical device, its implementation is relatively complex, and its counting speed and accuracy depend on the ratchet rotation speed and the response time of the sensor, leaving room for improvement in both efficiency and accuracy.
With the continuous development of artificial intelligence, object detection algorithms are widely applied across industries and offer great advantages in detection precision and efficiency. The YOLOv4-tiny object detection algorithm performs notably well on small targets and is therefore suitable for dense small-object detection tasks such as bamboo sticks.
The applicant therefore proposes a YOLOv4-tiny-based bamboo stick counting method.
Disclosure of Invention
The invention aims to solve the technical problems of low accuracy, low efficiency and poor stability in existing bamboo stick counting techniques, and provides a YOLOv4-tiny-based bamboo stick counting method.
In order to solve the technical problems, the invention adopts the following technical scheme:
A YOLOv4-tiny-based bamboo stick counting method comprises the following steps:
S1: first, collect a number of pictures, each showing a certain quantity of bamboo stick cross sections, and resize the pictures to a uniform size;
S2: label each original image acquired in step S1, drawing a rectangular bounding box around each bamboo stick in the image with a labeling tool and saving the corresponding annotation file;
S3: divide the annotated data set into a training set, a validation set and a test set, so that the network model can be trained and evaluated;
S4: construct a YOLOv4-tiny-based network;
S5: feed the annotated training set into the constructed YOLOv4-tiny network for multi-round iterative training, saving network weight files during the training process;
S6: feed the validation set and the test set into the network, evaluate its performance using the weight files saved in step S5, select the weight file with the highest counting precision, recall and F1 value as the optimal weight, and save it for later use;
S7: load the saved optimal weights, detect the input bamboo stick picture to be counted, and display the exact position of each stick and the counting result on the detection result image;
The bamboo sticks are counted through the above steps.
In step S1, each acquired bamboo stick cross-section picture is an RGB three-channel color image, and the pictures are uniformly resized to 416×416 pixels;
in step S2, the annotation file stores the upper-left and lower-right corner coordinates of the rectangular bounding box of each bamboo stick in each picture;
in step S3, the data are split as (training set + validation set) : test set = 9:1, and then training set : validation set = 9:1;
In step S4, the constructed network is the improved YOLOv4-tiny network adopted by the invention; the main improvements are as follows:
(1) In the backbone feature-extraction part, the original ordinary convolutions are replaced with deformable convolutions. By introducing learnable offsets and weights, a deformable convolution lets the sampling positions of the convolution kernel vary, which strengthens the model's perception of irregularly shaped bamboo sticks.
(2) A CBAM attention module is added to the Neck feature-fusion part to enhance the features of the regions of interest. By adaptively adjusting the channel and spatial weights of the feature map, it improves the feature representation of the bamboo stick regions.
In step S5, the network is trained for 300 iterations, and a weight file is saved every 10 iterations;
in step S6, the optimal weight is identified from the evaluation results on the validation set, and the overall performance of the network model is reported from the evaluation results on the test set;
in step S7, the optimal weight and the designed network model are loaded, inference is run on the input picture to be detected, the detection result is displayed, and the number of detected bamboo sticks is shown as the counting result.
In step S4, the constructed YOLOv4-tiny network structure is as follows:
Backbone feature-extraction module: first layer Input → second layer basic_DCN → third layer basic_DCN → fourth layer CSP_Blocks → fifth layer CSP_Blocks → sixth layer CSP_Blocks → seventh layer basic_DCN;
backbone fifth layer CSP_Blocks → attention module CBAM → Neck feature-fusion fourth layer Concat → YOLO Head detection head;
backbone seventh layer CSP_Blocks → attention module CBAM → Neck feature-fusion first layer basic_Conv → YOLO Head detection head;
Neck feature-fusion first layer basic_Conv → attention module CBAM → feature-fusion third layer Upsample → feature-fusion fourth layer Concat.
The constructed basic_DCN module structure is:
Input feature map → deformable convolution (DCN) → BatchNorm2d → LeakyReLU → Output feature map.
The deformable convolution (DCN) is constructed as follows:
Step 1) the input feature map is passed through an ordinary convolution (conv) to learn the offset of each convolution kernel sampling point;
Step 2) each offset has components along the x and y axes, so step 1) yields an offset field of size 2N, where N is the number of convolution kernel sampling points;
Step 3) the offset field contains the offsets learned for each kernel position; these offsets are mapped back onto the original feature map, the offset points are sampled, and the sampled feature map undergoes an ordinary convolution with the kernel;
Step 4) the resulting output feature map can serve as input to the next layer for subsequent network computation.
The attention module CBAM mainly consists of two branches connected in series, a spatial attention branch and a channel attention branch, structured as follows:
Spatial attention branch: input feature map → MaxPool and AvgPool → pooled feature maps → convolution, BatchNorm and ReLU → Concat → Sigmoid → spatial attention weights → multiplication with the input feature map → spatial-attention output feature map;
Channel attention branch: input feature map → MaxPool → convolution, BatchNorm and ReLU → Sigmoid → channel attention weights → multiplication with the input feature map → channel-attention output feature map.
In step S6, the evaluation index used is:
1) Recall: the proportion of the actual bamboo sticks that the model detects.
Formula: Recall = TP/(TP + FN)
where TP (True Positive) is the number of sticks correctly detected by the model and FN (False Negative) is the number of sticks the model fails to detect.
2) Precision: the proportion of correct detections among the bamboo sticks detected by the model.
Formula: Precision = TP/(TP + FP)
where TP (True Positive) is the number of sticks correctly detected by the model and FP (False Positive) is the number of sticks incorrectly detected by the model.
3) mAP (mean average precision): the average precision of the model at a confidence threshold of 0.5.
Formula: mAP = (AP_1 + AP_2 + … + AP_n)/n
where AP_i (Average Precision) is the average precision of the i-th category and n is the total number of categories.
4) F1 score: the harmonic mean of precision and recall, used to jointly assess the accuracy and completeness of the model.
Formula: F1 = 2 × (Precision × Recall)/(Precision + Recall).
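The indices above can be computed directly from the raw detection counts. The sketch below is a minimal illustration (the helper names are ours, not from the patent):

```python
def detection_metrics(tp, fp, fn):
    """Recall, Precision and F1 from detection counts, following the
    formulas above (TP/FP/FN as defined for stick detections)."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return recall, precision, f1

def mean_ap(aps):
    """mAP as the plain mean of the per-category APs."""
    return sum(aps) / len(aps)

# 90 sticks found correctly, 10 false boxes, 10 sticks missed:
r, p, f1 = detection_metrics(tp=90, fp=10, fn=10)
print(r, p, f1)  # → 0.9 0.9 0.9
```

For single-class detection like this bamboo stick task, n = 1 and mAP reduces to the AP of the one category.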
Compared with the prior art, the invention has the following technical effects:
1) The invention provides a bamboo stick counting method based on an improved YOLOv4-tiny model; by introducing deformable convolution and adding an attention mechanism, it achieves more accurate and efficient bamboo stick counting.
2) By combining deformable convolution with an attention mechanism, the invention detects and counts bamboo sticks more accurately. The improved YOLOv4-tiny model learns detailed characteristics such as stick shape and texture better, which improves counting accuracy, while its small model size and fast inference speed allow it to process large numbers of bamboo stick images quickly, improving counting efficiency.
3) The counting method automates the counting process and reduces the time and labor cost of manual operation; given only images of the bamboo sticks, it completes the count automatically and displays the result.
4) The method suits any scenario that requires counting bamboo sticks, such as bamboo stick processors and vendors or the catering industry; whether the sticks are being produced, sold or used, these fields can all benefit from this counting method.
Drawings
The invention is further illustrated by the following examples in conjunction with the accompanying drawings:
FIG. 1 is a flow chart of an overall implementation of the present invention;
FIG. 2 is a diagram of the YOLOv4-tiny network employed in the present invention;
FIG. 3 is a block diagram of a basic_DCN module used in the present invention;
FIG. 4 is a schematic diagram of a deformable convolution;
FIG. 5 is a schematic diagram of an attention mechanism module CBAM;
FIG. 6 is a diagram showing the result of identifying and counting bamboo sticks according to the present invention.
Detailed Description
Taking an unknown number of bamboo sticks as the objects to be detected, the invention provides a YOLOv4-tiny-based bamboo stick counting method. FIG. 1 shows the overall implementation flow, which comprises four stages: the first prepares the data set, the second constructs the network model, the third trains the model and saves the weight files produced during training, and the fourth demonstrates the results.
Physical bamboo sticks were prepared: sticks 2-3 mm in diameter were selected at random, their cross sections photographed and saved, and the process repeated until 375 pictures had been taken.
The 375 RGB three-channel color bamboo stick images were uniformly cropped to 416×416 pixels.
The pictures were imported into the labeling tool labelImg. During labeling, one rectangular box marks the cross section of one bamboo stick, and every stick cross section in each picture was labeled. After all pictures were labeled, the corresponding annotation files were saved; each file stores the upper-left and lower-right corner coordinates of the rectangular bounding box of every bamboo stick in its picture.
The annotated data set was divided into a training set, a validation set and a test set:
(training set + validation set) : test set = 9:1;
training set : validation set = 9:1;
after division there are 303 training images, 34 validation images and 38 test images.
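The embodiment does not state how fractional image counts are rounded; the sketch below (our assumption: truncation at each stage) reproduces the reported 303 / 34 / 38 split from 375 images:

```python
def split_counts(total, test_ratio=0.1, val_ratio=0.1):
    """Two-stage 9:1 split as described: first (train+val) vs test,
    then train vs val, truncating fractions at each stage."""
    trainval = int(total * (1 - test_ratio))   # 375 -> 337
    test = total - trainval                    #     -> 38
    train = int(trainval * (1 - val_ratio))    # 337 -> 303
    val = trainval - train                     #     -> 34
    return train, val, test

print(split_counts(375))  # → (303, 34, 38)
```

Other rounding conventions (e.g. `round()`) would give a 337/38 or 338/37 first split; truncation is the one consistent with the counts reported here.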
As shown in fig. 2, the YOLOv4-tiny network model is constructed as follows:
Backbone feature-extraction module: first layer Input → second layer basic_DCN → third layer basic_DCN → fourth layer CSP_Blocks → fifth layer CSP_Blocks → sixth layer CSP_Blocks → seventh layer basic_DCN;
backbone fifth layer CSP_Blocks → attention module CBAM → Neck feature-fusion fourth layer Concat → YOLO Head detection head;
backbone seventh layer CSP_Blocks → attention module CBAM → Neck feature-fusion first layer basic_Conv → YOLO Head detection head;
Neck feature-fusion first layer basic_Conv → attention module CBAM → feature-fusion third layer Upsample → feature-fusion fourth layer Concat.
As shown in fig. 3, the basic_DCN module structure is:
Input feature map → deformable convolution (DCN, Deformable Convolution Network) → BatchNorm2d → LeakyReLU → Output feature map.
As shown in fig. 4, the DCN operates as follows:
Step 1) the input feature map is passed through an ordinary convolution (conv) to learn the offset of each convolution kernel sampling point;
Step 2) each offset has components along the x and y axes, so step 1) yields an offset field of size 2N, where N is the number of convolution kernel sampling points;
Step 3) the offset field contains the offsets learned for each kernel position; these offsets are mapped back onto the original feature map, the offset points are sampled, and the sampled feature map undergoes an ordinary convolution with the kernel;
Step 4) the resulting output feature map can serve as input to the next layer for subsequent network computation.
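Steps 1)-4) can be illustrated with a toy, pure-Python deformable sampling routine. This is only a sketch of the idea: the offsets here are supplied by hand rather than learned by a convolution, bilinear interpolation handles fractional positions, and a production implementation would instead use something like `torchvision.ops.DeformConv2d`. With all offsets set to zero, the operation reduces to an ordinary 3×3 convolution:

```python
def bilinear(fm, y, x):
    """Bilinearly sample 2-D feature map fm at fractional (y, x)."""
    h, w = len(fm), len(fm[0])
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (fm[y0][x0] * (1 - dy) * (1 - dx) + fm[y0][x1] * (1 - dy) * dx
            + fm[y1][x0] * dy * (1 - dx) + fm[y1][x1] * dy * dx)

def deform_conv_at(fm, kernel, offsets, oy, ox):
    """One output value of a 3x3 deformable convolution at (oy, ox):
    each of the 9 kernel taps reads the feature map at its regular grid
    position shifted by a (dy, dx) offset — two numbers per tap, i.e.
    a 2N-sized offset field for N sampling points (steps 2-3)."""
    out = 0.0
    grid = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]
    for i, (ky, kx) in enumerate(grid):
        dy, dx = offsets[i]
        out += kernel[i] * bilinear(fm, oy + ky + dy, ox + kx + dx)
    return out

fm = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
kernel = [1.0] * 9                  # all-ones 3x3 kernel
zero = [(0.0, 0.0)] * 9             # zero offsets -> ordinary convolution
print(deform_conv_at(fm, kernel, zero, 1, 1))  # → 45.0 (sum of the 3x3 patch)
```

Shifting every offset by (0, +1) makes the same kernel read the patch one column to the right, which is exactly the "variable sampling position" behavior the improvement relies on.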
As shown in fig. 5, the attention module CBAM mainly consists of two branches connected in series, a spatial attention branch and a channel attention branch, structured as follows:
Spatial attention branch: input feature map → MaxPool and AvgPool → pooled feature maps → convolution, BatchNorm and ReLU → Concat → Sigmoid → spatial attention weights → multiplication with the input feature map → spatial-attention output feature map;
Channel attention branch: input feature map → MaxPool → convolution, BatchNorm and ReLU → Sigmoid → channel attention weights → multiplication with the input feature map → channel-attention output feature map.
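The weighting idea behind the channel branch can be sketched in a few lines of pure Python. This is a toy in the spirit of CBAM, not the module itself: each channel is pooled to a single statistic, squashed through a sigmoid into a weight in (0, 1), and the channel is rescaled by that weight (the real branch also passes the pooled values through a small shared network, omitted here):

```python
import math

def sigmoid(z):
    """Logistic squashing used to map pooled activations to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def channel_attention(fmap):
    """Toy channel-attention step: MaxPool each channel to one value,
    turn it into a weight, and rescale the channel by that weight."""
    out = []
    for ch in fmap:                            # fmap: C x H x W nested lists
        pooled = max(max(row) for row in ch)   # channel-wise max pooling
        w = sigmoid(pooled)                    # attention weight for channel
        out.append([[v * w for v in row] for row in ch])
    return out

fmap = [[[0.0, 4.0], [2.0, 1.0]],              # strongly activated channel
        [[-3.0, -1.0], [-2.0, -4.0]]]          # weakly activated channel
scaled = channel_attention(fmap)
print(round(scaled[0][0][1], 3), round(scaled[1][0][1], 3))  # → 3.928 -0.269
```

The strongly activated channel is passed through almost unchanged while the weak one is damped, which is how the module raises the relative contribution of stick-like features.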
After the model is built, the training stage begins: the prepared training data set is fed into the model and trained for 300 rounds, with a training weight file saved every 10 rounds.
The trained model was evaluated using Recall, Precision, mAP (mean average precision) and F1 score as evaluation indices.
Recall: the proportion of the actual bamboo sticks that the model detects.
Formula: Recall = TP/(TP + FN)
where TP (True Positive) is the number of sticks correctly detected by the model and FN (False Negative) is the number of sticks the model fails to detect.
Precision: the proportion of correct detections among the bamboo sticks detected by the model.
Formula: Precision = TP/(TP + FP)
where TP (True Positive) is the number of sticks correctly detected by the model and FP (False Positive) is the number of sticks incorrectly detected by the model.
mAP (mean average precision): the average precision of the model at a confidence threshold of 0.5.
Formula: mAP = (AP_1 + AP_2 + … + AP_n)/n
where AP_i (Average Precision) is the average precision of the i-th category and n is the total number of categories.
F1 score: the harmonic mean of precision and recall, used to jointly assess the accuracy and completeness of the model. The larger the F1 value, the better the overall performance of the network model.
Formula: F1 = 2 × (Precision × Recall)/(Precision + Recall)
Using these evaluation indices, the validation data set was fed into the YOLOv4-tiny network while the weight files saved during training were loaded in turn; the weight file with the highest evaluation index was selected as the optimal weight and saved for later use.
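The checkpoint-selection loop described above amounts to picking the maximum of the per-checkpoint validation scores. A minimal sketch (file names and the use of F1 as the single ranking index are our assumptions for illustration):

```python
def pick_best_weight(results):
    """Given {weight_file: validation F1} scores, return the checkpoint
    with the highest F1 — the 'optimal weight' kept for inference."""
    return max(results, key=results.get)

# Hypothetical scores for three of the 30 saved checkpoints:
val_f1 = {"epoch_280.pth": 0.91, "epoch_290.pth": 0.95, "epoch_300.pth": 0.93}
print(pick_best_weight(val_f1))  # → epoch_290.pth
```

Note that the best checkpoint need not be the last one: validation performance can peak before training ends, which is exactly why every tenth epoch is saved.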
For comparison, ablation experiments were also run on other improved variants of YOLOv4-tiny, and the test set was fed into each network, giving the results shown in Table 1:
In the table, yolov4-tiny denotes the original network; model names containing dcn replace the ordinary convolution in basic_Conv of the yolov4-tiny network with deformable convolution, and names containing SE, CA, ECA or CBAM add the corresponding attention mechanism at the same location in the network.
The experimental results show that the adopted yolov4-tiny_dcn_CBAM network model improves on the original yolov4-tiny model in F1, Recall, Precision and mAP: the F1 value rises by 0.06, recall by 3.91%, precision by 7.32% and mAP by 2.74%, verifying that the yolov4-tiny_dcn_CBAM model achieves a clear improvement in bamboo stick detection.
A result-demonstration inference program was designed and implemented: it loads the saved optimal weight file, detects and counts the input bamboo stick pictures with the yolov4-tiny_dcn_CBAM network model, and displays the total number of sticks on the image.
A separate set of bamboo stick images not used in training was prepared and passed to the demonstration program, with the results shown in fig. 6. The Count in the top-left corner of each picture shows the total number of sticks in that image, and the position of each stick cross section is marked with a rectangular box.
In summary, the invention provides a bamboo stick counting method based on the YOLOv4-tiny network; by introducing deformable convolution and adding an attention mechanism it counts bamboo sticks more accurately and efficiently. Compared with the traditional manual, weighing-based and image-processing methods, this counting method offers higher precision, accuracy and stability.
Claims (5)
1. A YOLOv4-tiny-based bamboo stick counting method, characterized by comprising the following steps:
S1: first, collect a number of pictures, each showing a certain quantity of bamboo stick cross sections, and resize the pictures to a uniform size;
S2: label each original image acquired in step S1, drawing a rectangular bounding box around each bamboo stick in the image with a labeling tool and saving the corresponding annotation file;
S3: divide the annotated data set into a training set, a validation set and a test set, so that the network model can be trained and evaluated;
S4: construct a YOLOv4-tiny-based network;
S5: feed the annotated training set into the constructed YOLOv4-tiny network for multi-round iterative training, saving network weight files during the training process;
S6: feed the validation set and the test set into the network, evaluate its performance using the weight files saved in step S5, select the weight file with the highest counting precision, recall and F1 value as the optimal weight, and save it for later use;
S7: load the saved optimal weights, detect the input bamboo stick picture to be counted, and display the exact position of each stick and the counting result on the detection result image;
through the above steps, the bamboo sticks are counted;
in step S4, the constructed YOLOv4-tiny network structure is as follows:
backbone feature-extraction module: first layer Input (1) → second layer basic_DCN (2) → third layer basic_DCN (3) → fourth layer CSP_Blocks (4) → fifth layer CSP_Blocks (5) → sixth layer CSP_Blocks (6) → seventh layer basic_DCN (7);
backbone fifth layer CSP_Blocks (5) → attention module CBAM (8) → Neck feature-fusion fourth layer Concat (10) → YOLO Head detection head (14);
backbone seventh layer CSP_Blocks (7) → attention module CBAM (9) → Neck feature-fusion first layer basic_Conv (13) → YOLO Head detection head (15);
Neck feature-fusion first layer basic_Conv (13) → attention module CBAM (12) → feature-fusion third layer Upsample (11) → feature-fusion fourth layer Concat (10).
2. The method of claim 1, wherein the constructed basic_DCN module structure is:
Input feature map → deformable convolution (DCN) → BatchNorm2d → LeakyReLU → Output feature map.
3. The method according to claim 2, characterized in that the deformable convolution (DCN) is constructed as follows:
Step 1) the input feature map is passed through an ordinary convolution (conv) to learn the offset of each convolution kernel sampling point;
Step 2) each offset has components along the x and y axes, so step 1) yields an offset field of size 2N, where N is the number of convolution kernel sampling points;
Step 3) the offset field contains the offsets learned for each kernel position; these offsets are mapped back onto the original feature map, the offset points are sampled, and the sampled feature map undergoes an ordinary convolution with the kernel;
Step 4) the resulting output feature map can serve as input to the next layer for subsequent network computation.
4. The method of claim 1, wherein the attention module CBAM used mainly consists of two branches connected in series, a spatial attention branch and a channel attention branch, structured as follows:
Spatial attention branch: input feature map → MaxPool and AvgPool → pooled feature maps → convolution, BatchNorm and ReLU → Concat → Sigmoid → spatial attention weights → multiplication with the input feature map → spatial-attention output feature map;
Channel attention branch: input feature map → MaxPool → convolution, BatchNorm and ReLU → Sigmoid → channel attention weights → multiplication with the input feature map → channel-attention output feature map.
5. The method according to claim 1, wherein in step S6 the evaluation indices used are:
1) Recall: the proportion of the actual bamboo sticks that the model detects;
formula: Recall = TP/(TP + FN)
where TP (True Positive) is the number of sticks correctly detected by the model and FN (False Negative) is the number of sticks the model fails to detect;
2) Precision: the proportion of correct detections among the bamboo sticks detected by the model;
formula: Precision = TP/(TP + FP)
where TP (True Positive) is the number of sticks correctly detected by the model and FP (False Positive) is the number of sticks incorrectly detected by the model;
3) mAP (mean average precision): the average precision of the model at a confidence threshold of 0.5;
formula: mAP = (AP_1 + AP_2 + … + AP_n)/n
where AP_i (Average Precision) is the average precision of the i-th category and n is the total number of categories;
4) F1 score: the harmonic mean of precision and recall, used to jointly assess the accuracy and completeness of the model;
formula: F1 = 2 × (Precision × Recall)/(Precision + Recall).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310743049.5A CN116958053B (en) | 2023-06-21 | 2023-06-21 | Yolov 4-tiny-based bamboo stick counting method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116958053A CN116958053A (en) | 2023-10-27 |
CN116958053B true CN116958053B (en) | 2024-05-14 |
Family
ID=88459475
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109242826A (en) * | 2018-08-07 | 2019-01-18 | 高龑 | Method and system for counting stick-shaped objects on mobile devices based on target detection |
CN113192040A (en) * | 2021-05-10 | 2021-07-30 | 浙江理工大学 | Fabric flaw detection method based on YOLO v4 improved algorithm |
CN113222991A (en) * | 2021-06-16 | 2021-08-06 | 南京农业大学 | Deep learning network-based field ear counting and wheat yield prediction |
CN114821102A (en) * | 2022-04-18 | 2022-07-29 | 中南民族大学 | Intensive citrus quantity detection method, equipment, storage medium and device |
CN115457395A (en) * | 2022-09-22 | 2022-12-09 | 南京信息工程大学 | Lightweight remote sensing target detection method based on channel attention and multi-scale feature fusion |
Non-Patent Citations (2)
Title |
---|
"Part target detection based on improved YOLOv4-tiny"; Yin Yuxiang et al.; Computer and Digital Engineering; pp. 1945-1949 * |
Lightweight target detection algorithm based on improved YOLOv4; Lili Wang; IET Image Processing; pp. 3805-3813 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112818988B (en) | Automatic identification reading method and system for pointer instrument | |
CN107408211A (en) | Method for object re-identification | |
CN110032954A (en) | A kind of reinforcing bar intelligent recognition and method of counting and system | |
CN111899296B (en) | Log volume detection method and device based on computer vision | |
CN113435282B (en) | Unmanned aerial vehicle image ear recognition method based on deep learning | |
CN108615229B (en) | Collision detection optimization method based on curvature point clustering and decision tree | |
CN111724355A (en) | Image measuring method for abalone body type parameters | |
CN111062260B (en) | Automatic generation method of face-beautifying recommendation scheme | |
CN104036493A (en) | No-reference image quality evaluation method based on multifractal spectrum | |
CN110659637A (en) | Electric energy meter number and label automatic identification method combining deep neural network and SIFT features | |
CN111563458A (en) | Target detection and positioning method based on YOLOv3 and OpenCV | |
CN114066848A (en) | FPCA appearance defect visual inspection system | |
CN107256421B (en) | Method for rapidly counting rice and wheat seeds | |
CN111429761A (en) | Artificial intelligent simulation teaching system and method for bone marrow cell morphology | |
CN108537329B (en) | Method and device for performing operation by using Volume R-CNN neural network | |
CN116958053B (en) | Yolov 4-tiny-based bamboo stick counting method | |
CN107292340A (en) | Lateral line scales recognition methods based on convolutional neural networks | |
CN112132137A (en) | FCN-SPP-Focal Net-based method for identifying correct direction of abstract picture image | |
CN111353412A (en) | End-to-end 3D-CapsNet flame detection method and device | |
CN116228708A (en) | Industrial defect detection method and system based on visual cognition calculation | |
CN115861213A (en) | Method for acquiring region and confirming area of overlapped tobacco shred mask image | |
CN114463740A (en) | Food nutrition assessment method and system based on visual analysis | |
CN113724255A (en) | Counting method for abalones in seedling raising period | |
CN113128292A (en) | Image identification method, storage medium and terminal equipment | |
CN110728222A (en) | Pose estimation method for target object in mechanical arm grabbing system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||