CN116363138B - Lightweight integrated identification method for garbage sorting images - Google Patents


Info

Publication number
CN116363138B
Authority
CN
China
Prior art keywords
lightweight integrated
training
lightweight
training set
classifier unit
Prior art date
Legal status
Active
Application number
CN202310638350.XA
Other languages
Chinese (zh)
Other versions
CN116363138A (en)
Inventor
梁桥康
邓淞允
秦海
邹坤霖
肖海华
方乐缘
汤琳
Current Assignee
Hunan University
Original Assignee
Hunan University
Priority date
Filing date
Publication date
Application filed by Hunan University
Priority to CN202310638350.XA
Publication of CN116363138A
Application granted
Publication of CN116363138B
Active legal status
Anticipated expiration

Classifications

    • G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
    • B07C 5/00 — Sorting according to a characteristic or feature of the articles or material being sorted
    • G06V 10/764 — Image or video recognition or understanding using pattern recognition or machine learning: classification, e.g. of video objects
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • B07C 2501/0054 — Sorting of waste or refuse
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]


Abstract

The application discloses a lightweight integrated recognition method for garbage sorting images. A lightweight integrated recognition network is constructed, comprising a basic prediction model and two lightweight integrated classifier units B1 and B2; the basic prediction model is divided into a front-end structure and an end classifier unit B0. B1 and B2 share the same backbone structure as B0, with a channel attention mechanism added between the last convolution layer and the global pooling layer. The basic prediction model is trained first; the structural parameters of B1 and B2 before the channel attention mechanism are then fixed to be consistent with B0, and the remaining structural parameters of B1 and B2 are trained on the current training set. The training sets of B1 and B2 are updated according to the category of mispredicted results so as to differentiate the two units, and the corresponding classifier units are retrained. Finally, a voting module combines B0, B1 and B2 to perform lightweight integrated recognition of garbage sorting images. The method can greatly improve the classification precision of garbage sorting images.

Description

Lightweight integrated identification method for garbage sorting images
Technical Field
The application belongs to the field of image processing, and particularly relates to a lightweight integrated identification method for garbage sorting images.
Background
With the rapid development of science and technology and the acceleration of urbanization, the quality of life of urban residents continues to improve, which has led to a rapid increase in the output of household garbage. How to treat this garbage, and how to suppress the generation of more of it, is an urgent social problem. To build a better living environment, sorting is an essential means of treating household garbage, and how to sort garbage better is a current research hotspot.
The rapid classification of garbage images offers great convenience for sorting garbage with high quality and at high speed, and with the development of artificial intelligence, more and more deep learning methods, such as ResNet, VGGNet and GarbageNet, are applied to garbage sorting. However, household garbage is generated over wide areas and classification scenes differ, and at most treatment and collection sites it is difficult to use high-power equipment in place of manual sorting. Existing conventional deep learning models are often difficult to deploy on low-power devices because of their large parameter counts and computational complexity. Meanwhile, lightweight models struggle to reach a high level of classification precision and have poor robustness and generalization capability, so high-quality garbage image classification under low computing power remains an open problem.
Ensemble learning, a traditional machine learning method, is now widely applied to deep learning models at the frontier of image recognition: different deep learning models are fused to improve overall classification accuracy. However, fusing ensemble learning with deep learning greatly increases the overall computational complexity and parameter count of the model, which makes it difficult to apply in low-computing-power scenarios.
Disclosure of Invention
The application provides a lightweight integrated recognition method for garbage sorting images, in which two lightweight integrated classifier units are differentially retrained on sample sets augmented with different classes of prediction errors, and multiple prediction results are fused together with a basic prediction model, so that the sorting precision of garbage sorting images is greatly improved at the cost of only a small increase in parameters and computational complexity.
In order to achieve the technical purpose, the application adopts the following technical scheme:
a lightweight integrated recognition method for a trash sorting image, comprising:
step 1, obtaining a large number of marked garbage images, and constructing an original training set, a verification set and a test set; the original training set is used as the current training set of each lightweight integrated classifier unit;
step 2, constructing a lightweight integrated recognition network, which comprises a basic prediction model, two lightweight integrated classifier units B1 and B2 and a voting fusion module; the basic prediction model is divided into a front-end structure and a tail-end classifier unit B0, and initial parameters of the model are obtained by pre-training an image classification data set ImageNet; each lightweight integrated classifier unit adopts the same backbone structure as B0, and a channel attention mechanism is added between the last convolution layer and the subsequent global pooling layer; b0, B1 and B2 share the front end structure of the same basic training model during training and testing;
step 3, training the basic prediction model again by using the original training set;
step 4, fixing the structural parameters before the channel attention mechanism of the lightweight integrated classifier unit to be consistent with the corresponding structural parameters in the basic prediction model after retraining, and training the rest structural parameters of the lightweight integrated classifier unit by using the current training set; then using the verification set to calculate the prediction precision of the lightweight integrated classifier unit obtained in the training;
step 5, judging whether the prediction precision of the lightweight integrated classifier unit reaches a given precision;
if the prediction precision of the two lightweight integrated classifier units reaches the given precision, executing the step 6;
if the prediction precision of a certain lightweight integrated classifier unit does not reach a given precision, discarding the training of the lightweight integrated classifier unit in the step 4 in the current cycle, modifying the random factors of the lightweight integrated classifier unit and the random factors of the image input sequence, and executing the operations of the step 4 and the step 5 on the lightweight integrated classifier unit again;
step 6, the lightweight integrated recognition network obtained by the current training is used to perform integrated recognition on the validation set, and the integrated recognition precision curve is observed; if the curve converges, the network is used to perform integrated recognition on the test set, completing the lightweight integrated recognition of the test-set garbage sorting images; if the curve does not converge, continue with step 7;
step 7, inputting the current training sets of the two lightweight integrated classifier units into the lightweight integrated recognition network and the basic prediction model to obtain prediction results; for lightweight integrated classifier unit B1, screening the training-set samples whose prediction results belong to the first type of error, enhancing them, and adding them to the current training set of B1; for lightweight integrated classifier unit B2, screening the training-set samples whose prediction results belong to the second type of error, enhancing them, and adding them to its current training set; and re-executing steps 4-6 until the lightweight integrated recognition of the test-set garbage sorting images is completed.
Further, the basic prediction model adopts MobileNet V2.
Further, the first type of error and the second type of error are defined as: the method comprises the steps of randomly dividing the label category of the garbage sorting image into two major categories A1 and A2, defining the condition that the actual label in the current training set belongs to A1 and the predicted result belongs to A2 as a first type error, and defining the condition that the actual label in the current training set belongs to A2 and the predicted result belongs to A1 as a second type error.
Further, the label categories of the garbage images include: cardboard, metal, plastic, glass, paper, other waste.
Further, in the sample enhancement processing of step 7, each screened sample is replicated m times and enhancement processing is applied to n further copies, i.e. m + n new related samples are added for each screened sample;
for training-set samples whose prediction results belong to the first type of error, the offline enhancement mode comprises: (1) random horizontal and vertical flipping, (2) random rotation by a multiple of 90°, (3) random addition of Gaussian noise with 50% probability, (4) random filtering of the image with 50% probability;
for training set samples with prediction results belonging to the second class of errors, the offline enhancement mode comprises the following steps: (1) color equalization processing with 50% probability, (2) dithering color saturation with 50% probability, (3) changing image contrast over a random range interval.
Further, when the current training set obtained after the enhancement processing of step 7 is used to return to step 4 and retrain the lightweight integrated classifier unit, the online enhancement complexity applied to the garbage sorting images is reduced within a given amplitude range; the factors determining online enhancement complexity comprise: the range of random image cropping, the range of random color jitter, and the number of online enhancements applied.
Further, the smaller the range of random image cropping, the greater the online enhancement complexity; the larger the range of random color jitter, the greater the online enhancement complexity; and the greater the number of online enhancements applied, the greater the online enhancement complexity.
The method greatly improves the recognition accuracy of the model while only slightly increasing the computational complexity of the original basic prediction model, thereby further improving the recognition accuracy of existing lightweight garbage sorting image recognition models; it also overcomes, to a certain extent, the class imbalance of the original dataset, and enables real-time recognition on low-power, low-computing-power equipment. Compared with existing lightweight recognition techniques for garbage sorting images, the method has the following advantages:
(1) The training methods for the basic prediction model and the lightweight integrated classifier units can be built on any deep-learning-based image classification and recognition model, and are therefore universal.
(2) Compared with the prior art, the training method for the basic prediction model can effectively improve the recognition precision of the original model. In the embodiment, it effectively improves the classification performance of MobileNet V2 on garbage sorting images, achieving 94.42% test precision on the TrashNet dataset, a 3% improvement over the prior art.
(3) The application uses a voting strategy to fuse the basic prediction model and the two lightweight integrated classifier units. Through this end-side ensemble learning method, the classification precision of the original recognition model is effectively improved on a lightweight basis, and the robustness of the overall model is improved. In the embodiment, the method improves the test precision of the original model by 1.16% while increasing computational complexity by only 0.207 GFLOPs.
(4) The method is highly practical: based on lightweight techniques, it can conveniently be used to recognize garbage images on low-power, low-computing-power equipment, with higher recognition precision and generalization capability than the prior art.
Drawings
FIG. 1 is a training flow chart of the lightweight integrated unit according to the present application.
Fig. 2 is a schematic diagram of an overall recognition flow of the lightweight integrated recognition method for garbage sorting images according to the present application.
Fig. 3 is a schematic diagram of a channel attention module according to an embodiment of the application.
Fig. 4 is an iteration result of integrating a model on a verification set after training a lightweight integrated classifier unit according to an embodiment of the present application.
FIG. 5 is an iteration result of the integrated model on the test set after training the lightweight integrated classifier unit according to the embodiment of the present application.
Fig. 6 is a class activation diagram of different classifiers of a final model of an embodiment of the application for different classes of pictures on a TrashNet dataset.
Detailed Description
The following describes the embodiments of the present application in detail. They are developed on the basis of the technical solution of the application, with detailed implementations and specific operation procedures shown in the drawings, and further explain the technical solution of the application.
This example provides a lightweight integrated recognition method that can be applied in experiments or engineering using programming languages such as Python or C/C#/C++. As shown in fig. 1, it includes the following steps:
step 1, obtaining a large number of marked garbage images, and constructing an original training set, a verification set and a test set; the original training set is used as the current training set of each lightweight integrated classifier unit.
This example adopts the recoverable-garbage classification dataset TrashNet proposed by Stanford University, one of the most widely used openly available datasets in recoverable-garbage image classification. It consists of 2527 pictures covering garbage images of the following 6 label categories: cardboard, metal, plastic, glass, paper, other waste — 403 cardboard images, 482 plastic images, 501 glass images, 410 metal images, 594 paper images and 137 other-garbage images, with no overlap between categories. All images have a pixel size of 513×384.
Each image is explicitly assigned a garbage category, and the images are divided at a ratio of 70:13:17 to construct the training, validation and test sets, respectively.
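As a minimal illustration of this split (an assumed implementation — the patent does not specify the splitting code or the random seed), the 70:13:17 partition of the 2527 TrashNet images can be sketched as:

```python
import random

# Sketch (assumed implementation): split 2527 labeled garbage images
# into training / validation / test index sets at the 70:13:17 ratio.
def split_indices(n_total, ratios=(0.70, 0.13, 0.17), seed=0):
    """Return (train, val, test) index lists; the seed is an assumed choice."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)
    n_train = int(n_total * ratios[0])
    n_val = int(n_total * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(2527)
print(len(train_idx), len(val_idx), len(test_idx))  # 1768 328 431
```

The index lists can then be used to build the three image subsets with any data-loading framework.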
Step 2, constructing a lightweight integrated recognition network, which comprises a basic prediction model, two lightweight integrated classifier units B1 and B2 and a voting fusion module; the basic prediction model is divided into a front-end structure and a tail-end classifier unit B0, and initial parameters of the model are obtained by pre-training an image classification data set ImageNet; each lightweight integrated classifier unit adopts the same backbone structure as B0, and a channel attention mechanism is added between the last convolution layer and the subsequent global pooling layer; b0, B1 and B2 all share the front end structure of the same basic training model during training and testing. As shown in fig. 2 and 3.
The basic prediction model can be derived from an existing lightweight convolutional network model or built anew. In this embodiment, MobileNet V2 is adopted as the basic prediction model, the number of output categories of the final fully connected layer is set to 6, and the model is pre-trained on the image classification dataset ImageNet. The back-end part of MobileNet V2 comprises a convolution layer, a global pooling layer and a fully connected layer.
The two lightweight integrated classifier units B1, B2 have the same backbone structure as the end classifier unit B0 and add a channel attention mechanism between the last convolutional layer and the following global pooling layer.
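A squeeze-and-excitation style block is one common realization of such a channel attention mechanism. The patent does not disclose the exact gating design, so the reduction ratio and the sigmoid gate in this sketch are assumptions:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (a sketch; the
    reduction ratio r and the sigmoid gate are assumptions, since the
    patent does not specify the internal design)."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) feature map from the last convolution layer
        w = x.mean(dim=(2, 3))                       # squeeze: (N, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)   # excitation: (N, C, 1, 1)
        return x * w                                 # reweight channels

feat = torch.randn(2, 1280, 7, 7)   # MobileNet V2's last conv output shape
out = ChannelAttention(1280)(feat)
print(out.shape)                    # torch.Size([2, 1280, 7, 7])
```

The block is inserted between the last convolution layer and the global pooling layer of each classifier unit, so it leaves the feature-map shape unchanged.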
For an input sample x, the end classifier unit B0 of the basic prediction model and the two lightweight integrated classifier units B1 and B2 each output a C-dimensional vector (here C = 6 label categories), i.e.

y_i = [y_i(1), y_i(2), ..., y_i(C)], i = 0, 1, 2,

where y_i is the output vector of lightweight integrated classifier Bi (i = 1, 2) or of the basic prediction model's end classifier B0, and y_i(c) denotes the output in the c-th dimension after the network input sample x is applied to classifier Bi.

Because the adopted voting strategy needs to convert class probabilities into class labels, and in order to avoid ordering errors, each output is first normalized to the range 0-1, i.e.

p_i(c) = exp(y_i(c)) / Σ_j exp(y_i(j)).

The maximum probability is then set to 1 and the remaining probabilities are set to 0, i.e.

v_i(c) = 1 if c = argmax_j p_i(j), and v_i(c) = 0 otherwise.

The final voting result y* can then be expressed as

y* = argmax_c Σ_i v_i(c), i = 0, 1, 2.
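The probability-to-label conversion and voting fusion just described can be sketched in plain Python (tie-breaking toward the lowest class index is an assumption, as the source does not specify it):

```python
import math

def softmax(v):
    """Normalize raw outputs to class probabilities in the range 0-1."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def one_hot_vote(logits):
    """Convert one classifier's raw outputs to a one-hot vote:
    the maximum probability becomes 1, the rest become 0."""
    p = softmax(logits)
    k = p.index(max(p))
    return [1 if i == k else 0 for i in range(len(p))]

def ensemble_vote(all_logits):
    """Sum the one-hot votes of B0, B1 and B2 and return the winning class."""
    votes = [one_hot_vote(l) for l in all_logits]
    totals = [sum(col) for col in zip(*votes)]
    return totals.index(max(totals))

# Three classifiers over 6 garbage classes; B0 and B2 agree on class 3.
b0 = [0.1, 0.2, 0.0, 2.1, 0.3, 0.1]
b1 = [0.0, 1.9, 0.2, 0.4, 0.1, 0.0]
b2 = [0.2, 0.1, 0.3, 1.7, 0.0, 0.1]
print(ensemble_vote([b0, b1, b2]))  # 3
```

Because each classifier contributes exactly one vote, the fused prediction follows the majority even when one unit disagrees, which is the source of the robustness gain described below.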
And step 3, training the basic prediction model again by using the original training set.
The basic prediction model is trained on the constructed original training set, and within a certain iteration range the model with the highest prediction precision on the validation set is adopted as the base model for the subsequent iterative training of the integrated classifiers.
In this example, ubuntu 18.04.5 LTS and beyond is required for training, python3.8.5, pythoch 1.10.1, tonchvision 0.11.2, CUDA11.1 and beyond is required for system environment. The hardware platform needs to meet the requirements that the display card is NVIDIA GeForce RTX 3090 (24G), more than 16G exists in the hardware platform, and a hard disk with the capacity not lower than 256G is adopted. When model training is carried out, 600 epochs are trained in total, an optimizer is Adam, a loss function is a cross entropy loss function, a learning rate scheduler is oneyclelr, an initial learning rate is 0.0009, a maximum learning rate is 0.005, a learning rate modification frequency is 1, a learning rate rising round is 120, a final learning rate is 0.000001, and 256 pictures are loaded each time.
This example applies online enhancement to the images during training, using random horizontal and vertical flipping together with color jitter (ColorJitter) and random-range cropping of the image. The color jitter parameter is 0.12 and the random-range cropping parameter is 0.07; after online enhancement the image is resized to 224×224 and then input into the model for training.
In this example, the training procedure is repeated 5 times, and the model with the highest precision on the validation set is selected as the basic prediction model for the subsequent steps; if several models have the same precision, the one with the lowest loss is selected. The highest validation precision of the basic prediction model in this example is 95.40%, with a corresponding validation-set loss of 0.26508 and test-set precision of 94.42%, reached at epoch 548.
Step 4, fixing the structural parameters before the channel attention mechanism of the lightweight integrated classifier unit to be consistent with the corresponding structural parameters in the basic prediction model after retraining, and training the rest structural parameters of the lightweight integrated classifier unit by using the current training set; and then using the verification set to calculate the prediction precision of the lightweight integrated classifier unit obtained in the training.
In the application, fixing the structural parameters before the channel attention mechanism to be consistent with the corresponding structural parameters of the retrained basic prediction model provides a lower training cost for the subsequent integrated iterative training to reach its iteration condition.
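Fixing the structural parameters before the channel attention mechanism amounts to freezing the shared parameters and training only the attention mechanism and the layers after it. A sketch in PyTorch, where the module names `backbone`, `attention` and `fc` are hypothetical placeholders for the actual modules:

```python
import torch.nn as nn

def freeze_shared_backbone(unit: nn.Module, trainable_names=("attention", "fc")):
    """Freeze every parameter of a classifier unit except those of the
    channel attention mechanism and the layers after it.  The names
    'attention' and 'fc' are hypothetical; substitute the real module names."""
    for name, p in unit.named_parameters():
        p.requires_grad = any(name.startswith(t) for t in trainable_names)
    return [n for n, p in unit.named_parameters() if p.requires_grad]

# Toy unit: 'backbone' is frozen; 'attention' and 'fc' remain trainable.
unit = nn.Sequential()
unit.add_module("backbone", nn.Linear(8, 8))
unit.add_module("attention", nn.Linear(8, 8))
unit.add_module("fc", nn.Linear(8, 6))
print(freeze_shared_backbone(unit))
```

Only the parameters with `requires_grad=True` are then passed to the optimizer, which is what keeps the per-iteration training cost low.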
When training the lightweight integrated classifier unit B1 in this embodiment, the initial color jitter parameter is set to 0.28 and the random-range cropping parameter to 0.069, and after each iteration these parameters are adjusted according to the ratios 0.0997 and 1.01, respectively. Each round trains the training set for 150 epochs with fixed training parameters: initial learning rate 0.00024, maximum learning rate 0.0024, learning rate modification frequency 1, learning-rate rise of 60 epochs, and final learning rate 0.0000016; the remaining parameters are consistent with those used to train the basic prediction model.
When training the lightweight integrated classifier unit B2, the initial color jitter parameter is set to 0.24 and the random-range cropping parameter to 0.065, and after each iteration these parameters are adjusted according to the ratios 0.0997 and 1.01, respectively. Each round trains the training set for 150 epochs with fixed training parameters: initial learning rate 0.00029, maximum learning rate 0.002, learning rate modification frequency 1, learning-rate rise of 60 epochs, and final learning rate 0.0000013; the remaining parameters are consistent with those used to train the basic prediction model.
Step 5, judging whether the prediction precision of the lightweight integrated classifier unit reaches a given precision;
if the prediction precision of both lightweight integrated classifier units reaches the given precision, both units have low training error, so the increased difference between the classifiers can improve the recognition precision of the overall model, and step 6 can proceed;
if a certain lightweight integrated classifier unit cannot reach the given precision on the validation set by the maximum number of iterations, it would adversely affect the overall recognition precision of the integrated model; the training of that unit in step 4 of the current cycle is therefore abandoned, the random factors used to initialize the unfixed modules of the unit and the random factors of the image input order are modified, and steps 4 and 5 are executed again for that unit.
This embodiment sets the given precision to 95.40%, the highest validation precision reached when training the basic prediction model.
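The accept-or-retry logic of steps 4 and 5 can be summarized as a loop that re-trains with fresh random factors until the 95.40% threshold is met. A minimal runnable sketch, where `train_unit` is a hypothetical stand-in for one round of step-4 training and its simulated accuracies are illustrative only:

```python
import random

def train_unit(seed):
    """Hypothetical stand-in for one round of step-4 training that
    returns a simulated validation accuracy."""
    return random.Random(seed).uniform(0.90, 0.97)

def train_until_accurate(target=0.9540, max_tries=100, seed0=0):
    """Re-run training with fresh random factors (a new seed per attempt)
    until the unit reaches the given precision, as step 5 describes."""
    for attempt in range(max_tries):
        acc = train_unit(seed0 + attempt)  # new seed = new random factors
        if acc >= target:
            return attempt, acc
    raise RuntimeError("no classifier unit reached the given precision")

attempt, acc = train_until_accurate()
```

In the real method the "fresh random factors" are the initialization of the unfixed modules and the image input order, not merely a seed.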
Step 6, the lightweight integrated recognition network obtained by the current training is used for carrying out integrated recognition on the verification set, an integrated recognition precision curve is observed, if the curve converges, the lightweight integrated recognition network obtained by the current training is used for carrying out integrated recognition on the test set, and the lightweight integrated recognition on the garbage sorting image of the test set is completed; if the curve does not converge, the step 7 is continued.
Step 7, inputting the current training sets of the two lightweight integrated classifier units into the lightweight integrated recognition network and the basic prediction model to obtain a prediction result; aiming at a lightweight integrated classifier unit B1, screening training set samples in which a predicted result belongs to a first type of error, enhancing the training set samples, and adding the training set samples to a current training set of the lightweight integrated classifier unit B1; aiming at a lightweight integrated classifier unit B2, screening training set samples in which a predicted result belongs to a second class of errors, enhancing the training set samples, and adding the training set samples to a current training set; and re-executing the step 4-6 until the light integrated identification of the test set garbage sorting image is completed.
The present embodiment defines the first type of error and the second type of error as: the method comprises the steps that the label categories of the garbage sorting images are randomly divided into two major categories A1 and A2, wherein the major category A1 comprises three label categories of cardboard, metal and plastic, and the major category A2 comprises three label categories of glass, paper and other garbage; and then defining the condition that the actual label in the current training set belongs to A1 and the predicted result belongs to A2 as a first type error, and defining the condition that the actual label in the current training set belongs to A2 and the predicted result belongs to A1 as a second type error.
For the lightweight integrated classifier unit B1, training-set samples whose prediction results belong to the first type of error are screened and, after enhancement processing, added to its current training set. For the lightweight integrated classifier unit B2, training-set samples whose prediction results belong to the second type of error are screened and, after enhancement processing, added to its current training set. The enhancement processing replicates each screened sample m times and applies offline enhancement to n further copies, i.e. adds m + n new related samples for each screened sample. This embodiment sets m = 0, i.e. the erroneous samples are augmented only by offline image enhancement rather than by copying the original image.
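The screening rule of this embodiment can be sketched directly from the A1/A2 definition above (the label strings are illustrative placeholders for the six TrashNet categories):

```python
# Sketch of the step-7 screening rule; A1 and A2 follow the embodiment's
# division of the six label categories into two major classes.
A1 = {"cardboard", "metal", "plastic"}
A2 = {"glass", "paper", "other"}

def error_type(true_label, predicted_label):
    """Return 1 for a first-type error (actual in A1, predicted in A2),
    2 for a second-type error (actual in A2, predicted in A1), else 0."""
    if true_label in A1 and predicted_label in A2:
        return 1
    if true_label in A2 and predicted_label in A1:
        return 2
    return 0

# B1's training set grows with first-type errors, B2's with second-type.
samples = [("metal", "glass"), ("paper", "plastic"), ("glass", "glass")]
extra_b1 = [s for s in samples if error_type(*s) == 1]
extra_b2 = [s for s in samples if error_type(*s) == 2]
print(extra_b1, extra_b2)
```

Each screened sample would then pass through the offline enhancement operations of its unit before being appended to that unit's current training set.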
During offline enhancement, every item in the corresponding column is applied to each sample, i.e. each row of the corresponding column in Table 1 is enhanced once, and finally the enhancement result is output. The offline enhancement modes are detailed in Table 1 below.

Table 1. Offline enhancement modes
B1 (first-type errors): (1) random horizontal and vertical flipping; (2) random rotation by a multiple of 90°; (3) random addition of Gaussian noise with 50% probability; (4) random filtering of the image with 50% probability.
B2 (second-type errors): (1) color equalization processing with 50% probability; (2) color-saturation jitter with 50% probability; (3) changing image contrast over a random range interval.
The lightweight integrated classifier units B1 and B2 use different training parameter settings in order to achieve more iteration rounds and more differentiated training; their training iterations can be performed in parallel.
The method finally completes 14 iterations within 100 trials, reaching an integration accuracy of 96.01% on the validation set and 95.58% on the test set, 1.16% higher on the test set than the basic prediction model. The total computational complexity of the final integrated model is 0.527 GFLOPs with 12.9M parameters, only 0.207 GFLOPs and 9.4M more than the basic prediction model, respectively.
No existing integrated model on the TrashNet dataset adopts a lightweight ensemble learning method. Table 2 compares the method of the present application with other existing integrated models trained on the TrashNet dataset, showing that the present method is ahead of their training results in accuracy, computational complexity, or parameter count.
Existing garbage sorting image recognition methods based on a single deep network model struggle to balance training accuracy and model computation. Table 3 compares the performance of the method of the present application with current advanced recognition methods on the TrashNet dataset; the present method achieves extremely high recognition accuracy at extremely low computational complexity. Compared with ViT-B/32, the most accurate method in the table, the present method obtains nearly the same recognition accuracy with more than 10 times lower computational complexity; meanwhile, its recognition accuracy is 4.16% higher than that of the existing method using MobileNetV2, with only 0.207 GFLOPs more computation.
In the embodiment of the present application, the iteration results on the validation set and the test set are shown in Figs. 4 and 5 respectively, and the recognition effect of the final model is shown in Fig. 6, which uses the Grad-CAM method to depict the attention regions of the three classifiers of the final integrated model on different types of images. The different models attend to different regions of the same picture, which increases the attention difference among them, gives the models better diversity, and can overcome the class imbalance of the original dataset to a certain extent. Meanwhile, the techniques proposed by the method during integrated model training, such as the training accuracy threshold, guarantee the basic prediction capability of the different lightweight integrated classifier units, so that the final recognition accuracy is greatly improved.
In the present application, error samples of different error types in the training set are iteratively enhanced by a differentiated enhancement method and then used to retrain the different lightweight integrated classifier units, yielding lightweight integrated classifier units with differentiated recognition capabilities. The different models thus have better diversity and can overcome the class imbalance of the original dataset to a certain extent; combined with the original classifier, the classification accuracy is greatly improved at the cost of only a slight increase in parameter count and computational complexity.
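The overall iterative scheme of steps 4-7 can be condensed into a control-loop skeleton; all callables, the convergence tolerance, and the iteration cap are illustrative assumptions standing in for the components described in the text:

```python
def iterative_ensemble_training(train_set, val_set, train_unit, ensemble_acc,
                                screen_errors, enhance, given_acc,
                                max_iters=100, tol=1e-3):
    """Skeleton of the iterative scheme: retrain each classifier unit until
    it clears the accuracy threshold (steps 4-5), check whether the ensemble
    accuracy curve on the validation set has converged (step 6), and if not,
    grow each unit's training set with its own enhanced error samples (step 7)."""
    train_sets = {"B1": list(train_set), "B2": list(train_set)}
    history = []
    for _ in range(max_iters):
        for name in ("B1", "B2"):
            # each call to train_unit retrains with fresh random factors (step 5)
            while train_unit(name, train_sets[name], val_set) < given_acc:
                pass
        history.append(ensemble_acc(val_set))
        if len(history) >= 2 and abs(history[-1] - history[-2]) < tol:
            break  # integrated accuracy curve has converged (step 6)
        for name, err_type in (("B1", 1), ("B2", 2)):
            errs = screen_errors(train_sets[name], err_type)
            train_sets[name].extend(enhance(s) for s in errs)  # step 7
    return history
```

In this sketch B1 and B2 are trained sequentially for simplicity; as noted above, the two units' training iterations can also run in parallel since they share only the fixed front-end structure.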
The above embodiments are preferred embodiments of the present application; those skilled in the art may make various changes or modifications thereto without departing from the general inventive concept, and such changes or modifications should be construed as falling within the scope of protection claimed by the present application.

Claims (5)

1. A lightweight integrated recognition method for a trash sorting image, comprising:
step 1, obtaining a large number of marked garbage images, and constructing an original training set, a verification set and a test set; the original training set is used as the current training set of each lightweight integrated classifier unit;
step 2, constructing a lightweight integrated recognition network, which comprises a basic prediction model, two lightweight integrated classifier units B1 and B2 and a voting fusion module; the basic prediction model is divided into a front-end structure and a tail-end classifier unit B0, and initial parameters of the model are obtained by pre-training an image classification data set ImageNet; each lightweight integrated classifier unit adopts the same backbone structure as B0, and a channel attention mechanism is added between the last convolution layer and the subsequent global pooling layer; b0, B1 and B2 share the front end structure of the same basic training model during training and testing;
step 3, training the basic prediction model again by using the original training set;
step 4, fixing the structural parameters before the channel attention mechanism of the lightweight integrated classifier unit to be consistent with the corresponding structural parameters in the basic prediction model after retraining, and training the rest structural parameters of the lightweight integrated classifier unit by using the current training set; then using the verification set to calculate the prediction precision of the lightweight integrated classifier unit obtained in the training;
step 5, judging whether the prediction precision of the lightweight integrated classifier unit reaches a given precision;
if the prediction precision of the two lightweight integrated classifier units reaches the given precision, executing the step 6;
if the prediction precision of a certain lightweight integrated classifier unit does not reach a given precision, discarding the training of the lightweight integrated classifier unit in the step 4 in the current cycle, modifying the random factors of the lightweight integrated classifier unit and the random factors of the image input sequence, and executing the operations of the step 4 and the step 5 on the lightweight integrated classifier unit again;
step 6, the lightweight integrated recognition network obtained by the current training is used for carrying out integrated recognition on the verification set, an integrated recognition precision curve is observed, if the curve converges, the lightweight integrated recognition network obtained by the current training is used for carrying out integrated recognition on the test set, and the lightweight integrated recognition on the garbage sorting image of the test set is completed; if the curve is not converged, continuing the step 7;
step 7, inputting the current training sets of the two lightweight integrated classifier units into the lightweight integrated recognition network and the basic prediction model to obtain prediction results; for the lightweight integrated classifier unit B1, screening the training set samples whose prediction results belong to the first type of error, enhancing them, and adding them to the current training set of B1; for the lightweight integrated classifier unit B2, screening the training set samples whose prediction results belong to the second type of error, enhancing them, and adding them to the current training set; re-executing steps 4-6 until the lightweight integrated identification of the test set garbage sorting images is completed;
the first type of error and the second type of error are defined as follows: the label categories of the garbage sorting images are randomly divided into two major categories A1 and A2; a case where the actual label in the current training set belongs to A1 but the prediction result belongs to A2 is defined as a first-type error, and a case where the actual label in the current training set belongs to A2 but the prediction result belongs to A1 is defined as a second-type error.
2. The lightweight integrated recognition method for garbage sorting images according to claim 1, wherein the base prediction model employs MobileNetV2.
3. The lightweight integrated recognition method for garbage sorting images according to claim 1, wherein the label categories of the garbage images comprise: cardboard, metal, plastic, glass, paper, and other garbage.
4. The lightweight integrated recognition method for garbage sorting images according to claim 1, wherein the sample enhancement processing in step 7 specifically comprises copying each screened sample n1 times and generating n2 enhanced copies, i.e., adding n1 + n2 related new samples for each screened sample;
for training set samples whose prediction results belong to the first type of error, the offline enhancement modes comprise: (1) random horizontal and vertical flipping, (2) random rotation by a multiple of 90°, (3) random addition of Gaussian noise with 50% probability, (4) random filtering of the image with 50% probability;
for training set samples whose prediction results belong to the second type of error, the offline enhancement modes comprise: (1) color equalization processing with 50% probability, (2) color saturation jittering with 50% probability, (3) changing the image contrast over a random range interval.
5. The lightweight integrated recognition method for garbage sorting images according to claim 1, wherein, when returning to step 4 to retrain the lightweight integrated classifier unit using the current training set obtained after the enhancement processing of step 7, the online enhancement complexity of the garbage sorting images is reduced within a given magnitude range; the online enhancement complexity is determined by: the range of random image cropping, the range of random color jittering, and the number of online enhancement modes applied;
the smaller the random cropping range of the image, the greater the online enhancement complexity; the larger the random color jittering range, the greater the online enhancement complexity; and the greater the number of online enhancement modes applied, the greater the online enhancement complexity.
CN202310638350.XA 2023-06-01 2023-06-01 Lightweight integrated identification method for garbage sorting images Active CN116363138B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310638350.XA CN116363138B (en) 2023-06-01 2023-06-01 Lightweight integrated identification method for garbage sorting images

Publications (2)

Publication Number Publication Date
CN116363138A CN116363138A (en) 2023-06-30
CN116363138B true CN116363138B (en) 2023-08-22

Family

ID=86923401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310638350.XA Active CN116363138B (en) 2023-06-01 2023-06-01 Lightweight integrated identification method for garbage sorting images

Country Status (1)

Country Link
CN (1) CN116363138B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553227A (en) * 2020-04-21 2020-08-18 东南大学 Lightweight face detection method based on task guidance
CN111714118A (en) * 2020-06-08 2020-09-29 北京航天自动控制研究所 Brain cognition model fusion method based on ensemble learning
CN113065558A (en) * 2021-04-21 2021-07-02 浙江工业大学 Lightweight small target detection method combined with attention mechanism
CN114120019A (en) * 2021-11-08 2022-03-01 贵州大学 Lightweight target detection method
CN115471704A (en) * 2022-09-21 2022-12-13 云南大学 Skin disease image identification and classification method and system
CN115512399A (en) * 2021-06-04 2022-12-23 长沙理工大学 Face fusion attack detection method based on local features and lightweight network
CN115909011A (en) * 2022-12-27 2023-04-04 中国科学院国家天文台南京天文光学技术研究所 Astronomical image automatic classification method based on improved SE-inclusion-v 3 network model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230011635A1 (en) * 2021-07-09 2023-01-12 Viettel Group Method of face expression recognition
US20230076575A1 (en) * 2021-09-03 2023-03-09 Nec Laboratories America, Inc. Model personalization system with out-of-distribution event detection in dialysis medical records

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Garbage image classification algorithm based on improved MobileNet v2"; Chen Zhichao; Journal of Zhejiang University (Engineering Science); pp. 1-10 *

Also Published As

Publication number Publication date
CN116363138A (en) 2023-06-30

Similar Documents

Publication Publication Date Title
Gu et al. Stack-captioning: Coarse-to-fine learning for image captioning
CN109271522B (en) Comment emotion classification method and system based on deep hybrid model transfer learning
CN111126386B (en) Sequence domain adaptation method based on countermeasure learning in scene text recognition
Zhang et al. Fine-grained scene graph generation with data transfer
CN101968853B (en) Improved immune algorithm based expression recognition method for optimizing support vector machine parameters
CN110033008B (en) Image description generation method based on modal transformation and text induction
Lee et al. Query-efficient and scalable black-box adversarial attacks on discrete sequential data via bayesian optimization
CN109344884A (en) The method and device of media information classification method, training picture classification model
CN108459999B (en) Font design method, system, equipment and computer readable storage medium
CN112699247A (en) Knowledge representation learning framework based on multi-class cross entropy contrast completion coding
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN114092742B (en) Multi-angle-based small sample image classification device and method
CN112819063B (en) Image identification method based on improved Focal loss function
CN109670559A (en) Recognition methods, device, equipment and the storage medium of handwritten Chinese character
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
Pan et al. A Novel Combinational Convolutional Neural Network for Automatic Food-Ingredient Classification.
CN115909011A (en) Astronomical image automatic classification method based on improved SE-inclusion-v 3 network model
Das et al. Determining attention mechanism for visual sentiment analysis of an image using svm classifier in deep learning based architecture
Xu et al. Resilient binary neural network
CN110287981B (en) Significance detection method and system based on biological heuristic characterization learning
Kong et al. 3lpr: A three-stage label propagation and reassignment framework for class-imbalanced semi-supervised learning
Lukic et al. Galaxy classifications with deep learning
CN112883930A (en) Real-time true and false motion judgment method based on full-connection network
CN116363138B (en) Lightweight integrated identification method for garbage sorting images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant