CN117689731A - Lightweight new energy heavy-duty truck battery pack identification method based on improved YOLOv5 model

Lightweight new energy heavy-duty truck battery pack identification method based on improved YOLOv5 model

Info

Publication number
CN117689731A
Authority
CN
China
Prior art keywords
output
input
battery pack
module
cfnet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410149737.3A
Other languages
Chinese (zh)
Other versions
CN117689731B (en)
Inventor
郭佳豪
晋军
刘一霏
魏雨辰
谷霄月
邓雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Dechuang Digital Industrial Intelligent Technology Co ltd
Original Assignee
Shaanxi Dechuang Digital Industrial Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Dechuang Digital Industrial Intelligent Technology Co ltd
Priority to CN202410149737.3A
Publication of CN117689731A
Application granted
Publication of CN117689731B
Active legal-status Current
Anticipated expiration legal-status

Links

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a lightweight new energy heavy-duty truck battery pack identification method based on an improved YOLOv5 model, which comprises the following steps: collecting an image dataset of the battery pack; preprocessing the image dataset to construct a battery pack dataset, and dividing the battery pack dataset into a training set, a verification set and a test set in proportion; constructing an LFNet model; repeatedly adjusting parameters of and training the LFNet model with the training set to obtain an optimal new energy heavy-duty truck battery pack detection model; and predicting the test set with the optimal detection model to identify the specific position of the battery pack. Abstract and high-level features of the battery pack are extracted by deep learning, and a deep convolutional neural network is trained repeatedly to obtain a deep learning model for battery pack identification, so that the detection precision and working efficiency can be greatly improved in the actual battery-swapping process.

Description

Lightweight new energy heavy-duty truck battery pack identification method based on improved YOLOv5 model
Technical Field
The invention relates to the field of battery pack identification, in particular to a lightweight new energy heavy-duty truck battery pack identification method based on an improved YOLOv5 model.
Background
In the traffic field, the shift toward new energy and intelligent vehicles keeps strengthening, and heavy trucks are no exception. Because a new energy heavy truck is driven by electric energy, it offers higher energy utilization and lower operating costs than a conventional heavy truck, and it is gradually becoming an important choice in industries such as ports, mining areas and logistics.
A new energy heavy truck is usually powered by a battery pack mounted on the back of the truck body; when the electric energy of the on-board battery pack is consumed to a certain extent, it must be replenished to keep the truck running. At present, battery pack swapping is one of the main ways of replenishing the energy of a new energy heavy truck: a camera at the swapping station detects the depleted battery pack, and a swapping robot replaces it with a fully charged one. Because the battery pack of a new energy heavy truck has a large capacity and a large mass, the depleted pack must be detected before swapping, after which the swapping robot responds and performs the swap. The existing detection method relies on non-contact measurement technologies such as sensors; on the one hand, sensors are easily disturbed by many factors, which reduces detection precision and creates potential safety hazards; on the other hand, the detection equipment is complex and costly, and professionals must spend a great deal of time on operation and maintenance, so efficiency is low.
Disclosure of Invention
Aiming at the above problems, the invention provides a lightweight new energy heavy-duty truck battery pack identification method based on an improved YOLOv5 model. Abstract and high-level features of the battery pack are extracted by deep learning, and a deep convolutional neural network is trained repeatedly to obtain a deep learning model for battery pack identification, so that the detection precision and working efficiency can be greatly improved in the actual battery-swapping process.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a light new energy heavy-duty battery pack identification method based on an improved YOLOv5 model comprises the following steps:
s1, acquiring an image data set of the battery pack;
s2, preprocessing the image dataset to construct a battery pack dataset, and dividing the battery pack dataset into a training set, a verification set and a test set in proportion;
s3, constructing an LFNet (lightweight rapid detection network) model;
s4, repeatedly adjusting parameters of and training the LFNet model with the training set to obtain an optimal new energy heavy-duty truck battery pack detection model;
s5, predicting the test set with the optimal new energy heavy-duty truck battery pack detection model, and identifying the specific position of the battery pack.
Preferably, S2 specifically includes:
s21, converting the image dataset in S1 into the YOLOv5 format to obtain, for each image, a txt file containing the upper-left and lower-right corner coordinates of the battery pack;
s22, creating three sibling folders under a root folder, and creating two subfolders under each of the three sibling folders, the two subfolders storing the images and the corresponding txt files respectively, so as to form the battery pack dataset;
s23, dividing all the images and txt files in the battery pack dataset at a ratio of 8:1:1 to generate the training set, the verification set and the test set, automatically creating a classes.txt file in the label folder under the folder where each training set is located, and writing the English word "pack" into the classes.txt file.
Preferably, in S3, the LFNet model includes a backbone feature extraction network and a neck feature processing network; the input of the LFNet model is the input of the backbone feature extraction network, and the output of the backbone feature extraction network is the input of the neck feature processing network.
Preferably, the backbone feature extraction network includes four feature extraction blocks and an SPPF (spatial pyramid pooling - fast) module. Among the four feature extraction blocks, the output of each feature extraction block serves as the input of the next one; the input of the backbone feature extraction network is the input of the first feature extraction block, the output of the last feature extraction block is the input of the SPPF module, and the output of the SPPF module is the input of the neck feature processing network.
Preferably, each feature extraction block consists of a convolution combination and a CFNet (cross-stage lightweight network)-1 module; the input of the feature extraction block passes through the convolution combination, the output of the convolution combination serves as the input of the CFNet-1 module, and the output of the CFNet-1 module is the output of the feature extraction block.
Preferably, the CFNet-1 module includes a convolution combination, multiple layers of CFBlock (lightweight block) and an SCABlock (spatial-channel dual-attention block), where the output of each CFBlock layer serves as the input of the next CFBlock layer; the input of the CFNet-1 module passes through the convolution combination and then serves as the input of the first CFBlock layer, the output of the last CFBlock layer is channel-concatenated with the input of the first CFBlock layer and then serves as the input of the SCABlock, and the output of the SCABlock is the output of the CFNet-1 module;
the CFBlock applies PConv (partial convolution) to extract features from part of its input channels; the extracted features pass sequentially through two 1×1 convolutions with different channel numbers, and the sum of the output of the second 1×1 convolution and the input of the CFBlock is the output of the CFBlock.
Preferably, the SCABlock includes a CABlock (channel attention block) and a SABlock (spatial attention block); the input of the SCABlock serves as the input of both the CABlock and the SABlock, and the sum of the output of the CABlock and the output of the SABlock is the output of the SCABlock;
the input of the CABlock passes through a convolution combination and then feeds both a global average pooling and a global max pooling; the global average pooling output and the global max pooling output are added and passed through a sigmoid function, and the output of the sigmoid function is weighted onto the input of the CABlock to obtain the output of the CABlock;
the input of the SABlock passes through two parallel convolution combinations; the product of one convolution combination's output after a reshape (size change) and the other convolution combination's output after a reshape and a transpose is fed into a softmax function, and the output of the softmax function is weighted onto the input of the SABlock to obtain the output of the SABlock.
Preferably, the input of the SPPF module passes sequentially through a convolution combination and three max-pooling layers; the output of the convolution combination and the output of each max-pooling layer are channel-concatenated and then passed through a convolution combination to obtain the output of the SPPF module.
Preferably, the neck feature processing network includes an FPN (feature pyramid network) structure and a PAN (path aggregation network) structure, each containing two CFNet-2 modules; compared with the CFNet-1 module, each CFBlock layer in the CFNet-2 module has no residual edge. The output of the SPPF module serves as the input of the FPN structure: it is channel-concatenated with the output of the third feature extraction block of the backbone feature extraction network to obtain a first feature image, which is the input of the first CFNet-2 module of the FPN structure; the output of the first CFNet-2 module of the FPN structure is channel-concatenated with the output of the second feature extraction block of the backbone feature extraction network to obtain a second feature image, which is the input of the second CFNet-2 module of the FPN structure. The output of the second CFNet-2 module of the FPN structure serves as the input of the PAN structure: it is channel-concatenated with the second feature image and then fed into the first CFNet-2 module of the PAN structure; the output of the first CFNet-2 module of the PAN structure is channel-concatenated with the first feature image and then fed into the second CFNet-2 module of the PAN structure, whose output is the output of the neck feature processing network.
Preferably, the convolution combination includes a 1×1 convolution, BN normalization and a SiLU activation function; the input of the convolution combination passes sequentially through the 1×1 convolution, the BN normalization and the SiLU activation function, and the output of the SiLU activation function is the output of the convolution combination.
Compared with the prior art, the invention has the beneficial effects that:
(1) According to the invention, the CFNet module is introduced into the backbone feature extraction network and the neck feature processing network of the LFNet model, which greatly reduces the overall parameter count of the model and lowers its memory access and computing resource demands; meanwhile, the SCABlock is introduced into the backbone feature extraction network of the LFNet model, which strengthens the spatial perception capability and channel correlation of the model and greatly enhances its battery pack recognition capability. The introduction of the CFNet module and the SCABlock preserves detection precision while accelerating the inference speed of the LFNet model;
(2) The invention can automatically detect, in real time, the battery pack of a new energy heavy-duty truck after it enters the swapping station, and offers high detection precision, high detection speed, high efficiency and high safety;
(3) The invention effectively applies a deep learning model to the battery pack detection of the new energy heavy-duty truck intelligent battery swapping system, so its application scenarios are wide: it can be used in ports, mining areas, logistics and the like, and it is an effective route toward intelligent battery swapping technology.
Drawings
FIG. 1 is a flow chart of the lightweight new energy heavy-duty truck battery pack identification method based on an improved YOLOv5 model;
FIG. 2 is a diagram illustrating placement of data set files in accordance with the present invention;
FIG. 3 is a block diagram of the LFNet model of the present invention;
FIG. 4 is a block diagram of the CFNet-1 module of the present invention;
FIG. 5 is a block diagram of the CFBlock of the present invention;
FIG. 6 is a block diagram of the SCABlock module of the present invention;
FIG. 7 is a block diagram of an SPPF module of the present invention;
FIG. 8 is a graph of the final loss values of the LFNet model and existing lightweight models trained for the same number of epochs;
FIG. 9 is a graph comparing the LFNet model and the YOLOv5 models on lightweight indexes;
FIG. 10 is a visualization of the predictions of the LFNet model and the YOLOv5 model.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings; obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the protection scope of the present invention.
Referring to fig. 1, the invention provides a light new energy heavy-duty truck battery pack identification method based on an improved YOLOv5 model, which comprises the following steps:
s1, acquiring an image data set of the battery pack; the data set contains rich information and rules, is a basic stone for model learning, and is a material for model learning and optimization, so that the acquisition of the data set is important. The new energy heavy-duty battery pack is black in color and cubic in shape, and the battery pack is singly collected and used for training a deep learning model, so that the model detection effect is poor, and the model generalization capability is also poor. Therefore, in order to improve the detection precision of the model and enhance the generalization capability of the model, the battery pack image data are collected from different visual angles, different brightness, different distance sizes and the like, the collected battery pack image data are effectively expanded by using a data enhancement mode, mainly the operations such as fuzzy processing, sharpening processing, brightness change and the like are carried out on an image data set, and the collection and processing mode can enrich the battery pack data on one hand, so that the model obtains abundant information and insight; on the other hand, the model can be simulated and adapted to different scene changes in a training stage so as to be used in complex and changeable practical application scenes.
S2, referring to FIG. 2, preprocessing the image dataset to construct a battery pack dataset, and dividing the battery pack dataset into a training set, a verification set and a test set in proportion;
the method for preprocessing the image dataset in S1 is as follows: the image dataset is converted into the YOLOv5 format to obtain, for each image, a txt file containing the upper-left and lower-right corner coordinates of the battery pack;
the method for constructing the battery pack dataset is as follows: three sibling folders are created under a root folder, and two subfolders are created under each of the three sibling folders; the two subfolders store the images and the corresponding txt files respectively, forming the battery pack dataset;
all the images and txt files in the battery pack dataset are divided at a ratio of 8:1:1 to generate the training set, the verification set and the test set; a classes.txt file is automatically created in the label folder under the folder where each training set is located, and the English word "pack" is written into the classes.txt file.
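For illustration, a minimal sketch of this 8:1:1 split and folder layout follows; the .jpg extension, the helper name and the random seed are assumptions, not details disclosed in the patent:

```python
import random
import shutil
from pathlib import Path

def split_dataset(src_images: Path, src_labels: Path, root: Path, seed: int = 0) -> None:
    """Copy image/label pairs into train, val and test folders at an 8:1:1 ratio."""
    pairs = sorted(src_images.glob("*.jpg"))
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    splits = {"train": pairs[:int(0.8 * n)],
              "val": pairs[int(0.8 * n):int(0.9 * n)],
              "test": pairs[int(0.9 * n):]}
    for name, files in splits.items():
        img_dir, lbl_dir = root / name / "images", root / name / "labels"
        img_dir.mkdir(parents=True, exist_ok=True)
        lbl_dir.mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy(img, img_dir / img.name)
            shutil.copy(src_labels / (img.stem + ".txt"), lbl_dir / (img.stem + ".txt"))
        # single-class dataset: the class file holds only the English word "pack"
        (lbl_dir / "classes.txt").write_text("pack\n")
```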
S3, referring to FIG. 3, constructing the LFNet model:
the LFNet model comprises a backbone feature extraction network and a neck feature processing network; the input of the LFNet model is the input of the backbone feature extraction network, and the output of the backbone feature extraction network is the input of the neck feature processing network;
the backbone feature extraction network comprises four feature extraction blocks and an SPPF module; among the four feature extraction blocks, the output of each feature extraction block serves as the input of the next one, the input of the backbone feature extraction network is the input of the first feature extraction block, the output of the last feature extraction block is the input of the SPPF module, and the output of the SPPF module is the input of the neck feature processing network;
each feature extraction block consists of a convolution combination and a CFNet-1 module; the input of the feature extraction block passes through the convolution combination, the output of the convolution combination serves as the input of the CFNet-1 module, and the output of the CFNet-1 module is the output of the feature extraction block;
referring to FIG. 4, the CFNet-1 module includes a convolution combination, multiple layers of CFBlock and an SCABlock; the number of CFBlock layers can be set to 3, 6 and 9 in turn as the model depth increases. Among the CFBlock layers, the output of each CFBlock serves as the input of the next; every CFBlock layer has a residual edge, the residual edge being an edge added directly from the input to the output. The input of the CFNet-1 module passes through the convolution combination and then serves as the input of the first CFBlock layer; the output of the last CFBlock layer is channel-concatenated with the input of the first CFBlock layer and serves as the input of the SCABlock, and the output of the SCABlock is the output of the CFNet-1 module. The CFNet-1 module is built on the CSPNet (cross-stage partial network) module, with the original partial core module of CSPNet replaced by the CFBlock, which preserves the advantages of CSPNet's cross-stage structure while further strengthening the model's learning capacity and reducing memory cost;
the convolution combination comprises a 1×1 convolution, BN normalization and a SiLU activation function, wherein the input of the convolution combination sequentially passes through the 1×1 convolution, the BN normalization and the SiLU activation function, and the output of the SiLU activation function is the output of the convolution combination.
Referring to FIG. 5, the CFBlock applies PConv to extract features from part of its input channels; the extracted features pass sequentially through two 1×1 convolutions with different channel numbers, and the sum of the output of the second 1×1 convolution and the input of the CFBlock is the output of the CFBlock.
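For illustration, a minimal PyTorch sketch of the CFBlock follows; the PConv channel fraction (1/4) and the 2x hidden-channel expansion of the first 1×1 convolution are assumed values, and the `shortcut` flag models the residual edge (present in CFNet-1, absent in CFNet-2):

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: convolve only the first 1/ratio of the channels, pass the rest through."""
    def __init__(self, channels: int, ratio: int = 4):
        super().__init__()
        self.c_conv = channels // ratio
        self.conv = nn.Conv2d(self.c_conv, self.c_conv, kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        head, tail = x[:, :self.c_conv], x[:, self.c_conv:]
        return torch.cat((self.conv(head), tail), dim=1)

class CFBlock(nn.Module):
    """PConv followed by two 1x1 convolutions with different channel numbers, plus an optional residual edge."""
    def __init__(self, channels: int, expansion: int = 2, shortcut: bool = True):
        super().__init__()
        hidden = channels * expansion
        self.pconv = PConv(channels)
        self.conv1 = nn.Sequential(nn.Conv2d(channels, hidden, 1, bias=False),
                                   nn.BatchNorm2d(hidden), nn.SiLU())
        self.conv2 = nn.Conv2d(hidden, channels, 1, bias=False)
        self.shortcut = shortcut  # True in CFNet-1, False in CFNet-2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv2(self.conv1(self.pconv(x)))
        return x + y if self.shortcut else y
```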
Referring to FIG. 6, the SCABlock includes a CABlock and a SABlock. The input of the SCABlock serves as the input of both the CABlock and the SABlock, and the sum of the output of the CABlock and the output of the SABlock is the output of the SCABlock; introducing the CABlock and the SABlock strengthens the model's perception of spatial features and also strengthens the correlation between channels within the features.
The input of the CABlock passes through a convolution combination and then feeds both a global average pooling and a global max pooling; the global average pooling output and the global max pooling output are added and passed through a sigmoid function, which outputs a weighted channel vector; the output of the sigmoid function is weighted onto the input of the CABlock to obtain the output of the CABlock, completing the recalibration of the SCABlock input in the channel dimension;
in this embodiment, an input feature map of the SCABlock is taken as an example: the input feature map enters the CABlock and passes through a convolution combination to obtain a feature map A of the same size as the input; feature map A then passes through a global average pooling layer and a global max pooling layer to obtain two global feature maps; the two global feature maps are added, and a weighted channel vector is obtained through a sigmoid function; finally, the channel vector weights are applied to the input feature map channel by channel and output by the CABlock, completing the recalibration of the input feature map in the channel dimension.
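For illustration, a minimal PyTorch sketch of the CABlock as described follows; the class name and layer hyperparameters are assumptions:

```python
import torch
import torch.nn as nn

class CABlock(nn.Module):
    """Channel attention: conv combination -> global average/max pooling -> sigmoid channel weights."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(channels, channels, 1, bias=False),
                                  nn.BatchNorm2d(channels), nn.SiLU())
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.conv(x)                                        # feature map A, same size as input
        w = torch.sigmoid(self.avg_pool(a) + self.max_pool(a))  # weighted channel vector
        return x * w                                            # recalibrate the input channel by channel
```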
The SABlock mainly consists of two parallel convolution combinations together with reshape, transpose and softmax operations. The input of the SABlock passes through the two parallel convolution combinations; the product of one convolution combination's output after a reshape and the other convolution combination's output after a reshape and a transpose is fed into a softmax function, and the output of the softmax function is weighted onto the input of the SABlock to obtain the output of the SABlock, completing the recalibration of the SCABlock input in the spatial dimension;
in this embodiment, an input feature map of the SCABlock is taken as an example: the input feature map enters the SABlock and passes through the two parallel convolution combinations to obtain a feature map A and a feature map B of the same size as the input. To let feature map A and feature map B satisfy the matrix multiplication condition and thus generate a spatial weight matrix, a reshape operation is applied to both, giving a matrix R ∈ ℝ^((H×W)×I), where H is the height, W the width, and I the number of channels of feature map A after the reshape; feature map B is additionally transposed, giving a matrix R ∈ ℝ^(I×(H×W)). The two transformed matrices are multiplied, and the product passes through a softmax function to obtain a weight feature matrix in which every point carries a certain spatial relationship; finally, the spatial-relationship weight matrix is restored to the size of the input feature map, weighted onto the input feature map and output by the SABlock, completing the recalibration of the input feature map in the spatial dimension.
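For illustration, a minimal PyTorch sketch of the SABlock and the combining SCABlock follows, reusing the CABlock class from the sketch above. The reduced channel width of the two parallel convolution combinations is an assumption, and the final weighting is implemented as a position-attention matrix product, which is one reading of the reshape-and-restore description in the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SABlock(nn.Module):
    """Spatial attention: two parallel conv combinations -> softmax spatial weight matrix -> weighting."""
    def __init__(self, channels: int, reduced: int = 8):
        super().__init__()
        self.conv_a = nn.Sequential(nn.Conv2d(channels, reduced, 1, bias=False),
                                    nn.BatchNorm2d(reduced), nn.SiLU())
        self.conv_b = nn.Sequential(nn.Conv2d(channels, reduced, 1, bias=False),
                                    nn.BatchNorm2d(reduced), nn.SiLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        a = self.conv_a(x).flatten(2).transpose(1, 2)      # reshape A: (N, H*W, I)
        b = self.conv_b(x).flatten(2)                      # reshape + transpose B: (N, I, H*W)
        attn = F.softmax(torch.bmm(a, b), dim=-1)          # (N, H*W, H*W) spatial weight matrix
        y = torch.bmm(attn, x.flatten(2).transpose(1, 2))  # weight the input spatially
        return y.transpose(1, 2).reshape(n, c, h, w)

class SCABlock(nn.Module):
    """Spatial-channel dual attention: element-wise sum of the CABlock and SABlock outputs."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = CABlock(channels)  # CABlock as sketched earlier
        self.sa = SABlock(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ca(x) + self.sa(x)
```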
Referring to FIG. 7, the input of the SPPF module passes sequentially through a convolution combination and three max-pooling layers; the output of the convolution combination and the output of each max-pooling layer are channel-concatenated and then passed through a convolution combination to obtain the output of the SPPF module.
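For illustration, a minimal PyTorch sketch of the SPPF module follows; the 5×5 pooling kernel and the halved hidden width are assumptions carried over from common YOLOv5 practice, not values stated in the patent:

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Conv combination -> three successive max-pooling layers -> concat of all four -> conv combination."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        hidden = c_in // 2
        self.conv1 = nn.Sequential(nn.Conv2d(c_in, hidden, 1, bias=False),
                                   nn.BatchNorm2d(hidden), nn.SiLU())
        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
        self.conv2 = nn.Sequential(nn.Conv2d(hidden * 4, c_out, 1, bias=False),
                                   nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y0 = self.conv1(x)
        y1 = self.pool(y0)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.conv2(torch.cat((y0, y1, y2, y3), dim=1))
```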
The neck feature processing network comprises an FPN structure and a PAN structure, each containing two CFNet-2 modules. Compared with the CFNet-1 module, each CFBlock layer in the CFNet-2 module has no residual edge; that is, in the CFNet-2 module the output of the second 1×1 convolution is directly the output of the CFBlock. The output of the SPPF module serves as the input of the FPN structure: it is channel-concatenated with the output of the third feature extraction block of the backbone feature extraction network to obtain a first feature image, which is the input of the first CFNet-2 module of the FPN structure; the output of the first CFNet-2 module of the FPN structure is channel-concatenated with the output of the second feature extraction block of the backbone feature extraction network to obtain a second feature image, which is the input of the second CFNet-2 module of the FPN structure. The output of the second CFNet-2 module of the FPN structure serves as the input of the PAN structure: it is channel-concatenated with the second feature image and then fed into the first CFNet-2 module of the PAN structure; the output of the first CFNet-2 module of the PAN structure is channel-concatenated with the first feature image and then fed into the second CFNet-2 module of the PAN structure, whose output is the output of the neck feature processing network;
in this embodiment, an input feature image of the FPN structure is taken as an example: the input feature image of the FPN structure is channel-concatenated with the output of the third feature extraction block of the backbone feature extraction network to obtain a feature image C; feature image C passes through the first CFNet-2 module of the FPN structure, which outputs a feature image D; feature image D is channel-concatenated with the output of the second feature extraction block of the backbone feature extraction network to obtain a feature image E; feature image E passes through the second CFNet-2 module of the FPN structure, which outputs a feature image F, and feature image F serves as the input of the PAN structure. The FPN structure aims to make full use of the features at every level, so that each level contains both rich semantic information and rich detail information.
The feature fusion direction of the PAN structure is opposite to that of the FPN: the input feature image F of the PAN structure is channel-concatenated with feature image E and then passes through the first CFNet-2 module of the PAN structure, which outputs a feature image G; feature image G is channel-concatenated with feature image C and serves as the input of the second CFNet-2 module of the PAN structure, and the output of the second CFNet-2 module of the PAN structure is the output of the neck feature processing network. The PAN structure aims to further cascade and integrate the feature information and improve the expression capacity of the model.
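For illustration, a minimal PyTorch sketch of the CFNet-1/CFNet-2 modules follows, reusing the CFBlock and SCABlock classes from the sketches above; the half-width inner branch is an assumption borrowed from typical cross-stage designs so that the channel concatenation restores the output width, and `shortcut=False` yields the residual-free CFNet-2 variant:

```python
import torch
import torch.nn as nn

class CFNet(nn.Module):
    """Conv combination -> n CFBlocks -> channel concat with the first CFBlock's input -> SCABlock.
    shortcut=True sketches CFNet-1; shortcut=False sketches CFNet-2 (CFBlocks without residual edges)."""
    def __init__(self, c_in: int, c_out: int, n: int = 3, shortcut: bool = True):
        super().__init__()
        hidden = c_out // 2  # assumed half-width branch so the concat restores c_out channels
        self.stem = nn.Sequential(nn.Conv2d(c_in, hidden, 1, bias=False),
                                  nn.BatchNorm2d(hidden), nn.SiLU())
        self.blocks = nn.Sequential(*[CFBlock(hidden, shortcut=shortcut) for _ in range(n)])
        self.sca = SCABlock(c_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.stem(x)                            # input of the first CFBlock
        z = torch.cat((self.blocks(y), y), dim=1)   # last CFBlock output ++ first CFBlock input
        return self.sca(z)
```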
S4, repeatedly adjusting parameters of and training the LFNet model with the training set to obtain an optimal new energy heavy-duty truck battery pack detection model;
the pictures in the training set are input into the LFNet model, which is repeatedly tuned and retrained; in this embodiment, the loss value obtained after the model has been trained for 300 epochs and the lightweight indexes of the model (number of floating-point operations and parameter count) serve as the evaluation indexes of model performance.
The loss values obtained after training the LFNet model and the existing lightweight models (YOLOv3, PP-YOLO, YOLOv4, YOLOv5s, YOLOX-s) for 300 epochs are shown in FIG. 8. As FIG. 8 shows, the LFNet model, which uses the CFNet module and the SCABlock, converges significantly faster than the original YOLOv5 model and reaches a lower final loss value, indicating that the LFNet model of the invention infers faster and detects more accurately.
The lightweight indexes of the LFNet model and the existing lightweight models (YOLOv5s, YOLOv5n) are shown in FIG. 9, where the horizontal axis is the parameter count (Params) and the vertical axis is the number of floating-point operations (FLOPs). The LFNet-n model is the LFNet model with 1.2M parameters and the LFNet-s model is the LFNet model with 5.4M parameters; the parameter count of LFNet-n is 36.8% lower than that of YOLOv5n, the parameter count of LFNet-s is 25% lower than that of YOLOv5s, and the LFNet models also have lower FLOPs than the corresponding YOLOv5n and YOLOv5s, i.e. relatively lower computational complexity.
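Both lightweight indexes can be measured with a short PyTorch snippet; the 640×640 input size and the third-party thop profiler are assumptions, not details from the patent:

```python
import torch

def count_params_m(model: torch.nn.Module) -> float:
    """Parameter count in millions (the Params axis of FIG. 9)."""
    return sum(p.numel() for p in model.parameters()) / 1e6

# FLOPs (the vertical axis of FIG. 9) can be estimated with a profiler such as thop:
#   from thop import profile
#   flops, _ = profile(model, inputs=(torch.zeros(1, 3, 640, 640),))
```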
FIGS. 8 and 9 verify that the LFNet models are lighter than the existing lightweight models.
S5, inputting the pictures in the test set into the optimal new energy heavy-duty truck battery pack detection model; the model predicts the pictures in the test set and identifies the specific positions of the battery packs.
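For illustration, a minimal prediction-loop sketch follows; the 640×640 input size is an assumption, and box decoding/NMS are left out because they depend on the detection head, which the patent does not detail:

```python
from pathlib import Path

import cv2
import torch

def predict_folder(model: torch.nn.Module, test_dir: Path) -> None:
    """Run a trained detector over every test-set image and print the raw output shapes."""
    model.eval()
    with torch.no_grad():
        for img_path in sorted(test_dir.glob("*.jpg")):
            img = cv2.resize(cv2.imread(str(img_path)), (640, 640))  # assumed input size
            x = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            pred = model(x)  # raw head outputs; decoding to corner boxes is model-specific
            shapes = [tuple(p.shape) for p in pred] if isinstance(pred, (list, tuple)) else tuple(pred.shape)
            print(img_path.name, shapes)
```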
Referring to FIG. 10, to further demonstrate the performance of LFNet, this embodiment compares visualizations of the detection results of the LFNet model and the YOLOv5 model on part of the images in the battery pack dataset. The upper row of pictures in FIG. 10 visualizes the YOLOv5n detection results and the lower row visualizes the LFNet-n detection results. As can be seen from the first column from the left, the overall detection effect of LFNet-n (accuracy 0.9) is better than that of YOLOv5n (accuracy 0.8). As can be seen from the second column from the left, YOLOv5n misses a detection (only one target detected) whereas LFNet-n does not (two targets detected). As can be seen from the third column from the left, the small-target detection effect of LFNet-n (small-target accuracy 0.9) is better than that of YOLOv5n (small-target accuracy 0.8). As can be seen from the fourth column from the left, both LFNet-n and YOLOv5n can detect multiple targets, but LFNet-n detects them better; YOLOv5n also detects targets repeatedly, with many detection boxes appearing on the same target, whereas LFNet-n does not. Therefore, the LFNet model can effectively detect whole battery packs, small battery pack targets and multiple battery pack targets, without missed or repeated detections.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.

Claims (6)

1. The lightweight new energy heavy-duty truck battery pack identification method based on the improved YOLOv5 model is characterized by comprising the following steps:
s1, acquiring an image data set of the battery pack;
s2, preprocessing the image dataset to construct a battery pack dataset, and dividing the battery pack dataset into a training set, a verification set and a test set in proportion;
s3, constructing an LFNet model;
the LFNet model comprises a backbone feature extraction network and a neck feature processing network; the input of the LFNet model is the input of the backbone feature extraction network, and the output of the backbone feature extraction network is the input of the neck feature processing network;
the backbone feature extraction network comprises four feature extraction blocks and an SPPF module; among the four feature extraction blocks, the output of each feature extraction block serves as the input of the next one, the input of the backbone feature extraction network is the input of the first feature extraction block, the output of the last feature extraction block is the input of the SPPF module, and the output of the SPPF module is the input of the neck feature processing network;
each feature extraction block consists of a convolution combination and a CFNet-1 module; the input of the feature extraction block passes through the convolution combination, the output of the convolution combination serves as the input of the CFNet-1 module, and the output of the CFNet-1 module is the output of the feature extraction block;
the CFNet-1 module comprises a convolution combination, multiple layers of CFBlock and an SCABlock; among the CFBlock layers, the output of each CFBlock serves as the input of the next; the input of the CFNet-1 module passes through the convolution combination and then serves as the input of the first CFBlock layer, the output of the last CFBlock layer is channel-concatenated with the input of the first CFBlock layer and then serves as the input of the SCABlock, and the output of the SCABlock is the output of the CFNet-1 module;
the SCABlock comprises a CABlock and a SABlock; the input of the SCABlock serves as the input of both the CABlock and the SABlock, and the sum of the output of the CABlock and the output of the SABlock is the output of the SCABlock;
the neck feature processing network comprises an FPN structure and a PAN structure, each containing two CFNet-2 modules; compared with the CFNet-1 module, each CFBlock layer in the CFNet-2 module has no residual edge. The output of the SPPF module serves as the input of the FPN structure: it is channel-concatenated with the output of the third feature extraction block of the backbone feature extraction network to obtain a first feature image, which is the input of the first CFNet-2 module of the FPN structure; the output of the first CFNet-2 module of the FPN structure is channel-concatenated with the output of the second feature extraction block of the backbone feature extraction network to obtain a second feature image, which is the input of the second CFNet-2 module of the FPN structure. The output of the second CFNet-2 module of the FPN structure serves as the input of the PAN structure: it is channel-concatenated with the second feature image and then fed into the first CFNet-2 module of the PAN structure; the output of the first CFNet-2 module of the PAN structure is channel-concatenated with the first feature image and then fed into the second CFNet-2 module of the PAN structure, whose output is the output of the neck feature processing network;
s4, repeatedly adjusting parameters of and training the LFNet model with the training set to obtain an optimal new energy heavy-duty truck battery pack detection model;
s5, predicting the test set with the optimal new energy heavy-duty truck battery pack detection model, and identifying the specific position of the battery pack.
2. The method for identifying the lightweight new energy heavy-duty truck battery pack based on the improved YOLOv5 model according to claim 1, wherein S2 specifically comprises:
s21, converting the image dataset in S1 into the YOLOv5 format to obtain, for each image, a txt file containing the upper-left and lower-right corner coordinates of the battery pack;
s22, creating three sibling folders under a root folder, and creating two subfolders under each of the three sibling folders, the two subfolders storing the images and the corresponding txt files respectively, so as to form the battery pack dataset;
s23, dividing all the images and txt files in the battery pack dataset at a ratio of 8:1:1 to generate the training set, the verification set and the test set, automatically creating a classes.txt file in the label folder under the folder where each training set is located, and writing the English word "pack" into the classes.txt file.
3. The method for identifying the lightweight new energy heavy-duty truck battery pack based on the improved YOLOv5 model according to claim 1, wherein the CFBlock applies PConv to extract features from part of its input channels; the extracted features pass sequentially through two 1×1 convolutions with different channel numbers, and the sum of the output of the second 1×1 convolution and the input of the CFBlock is the output of the CFBlock.
4. The method for identifying the lightweight new energy heavy-duty truck battery pack based on the improved YOLOv5 model according to claim 1, wherein the input of the CABlock passes through a convolution combination and then feeds both a global average pooling and a global max pooling; the global average pooling output and the global max pooling output are added and passed through a sigmoid function, and the output of the sigmoid function is weighted onto the input of the CABlock to obtain the output of the CABlock;
and the input of the SABlock passes through two parallel convolution combinations; the product of one convolution combination's output after a reshape and the other convolution combination's output after a reshape and a transpose is fed into a softmax function, and the output of the softmax function is weighted onto the input of the SABlock to obtain the output of the SABlock.
5. The method for identifying the lightweight new energy heavy-duty truck battery pack based on the improved YOLOv5 model according to claim 1, wherein the input of the SPPF module passes sequentially through a convolution combination and three max-pooling layers, and the output of the convolution combination and the output of each max-pooling layer are channel-concatenated and then passed through a convolution combination to obtain the output of the SPPF module.
6. The method for identifying the lightweight new energy heavy-duty truck battery pack based on the improved YOLOv5 model according to any one of claims 1 to 5, wherein the convolution combination comprises a 1×1 convolution, BN normalization and a SiLU activation function; the input of the convolution combination passes sequentially through the 1×1 convolution, the BN normalization and the SiLU activation function, and the output of the SiLU activation function is the output of the convolution combination.
CN202410149737.3A 2024-02-02 2024-02-02 Lightweight new energy heavy-duty battery pack identification method based on improved YOLOv5 model Active CN117689731B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410149737.3A CN117689731B (en) 2024-02-02 2024-02-02 Lightweight new energy heavy-duty battery pack identification method based on improved YOLOv5 model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410149737.3A CN117689731B (en) 2024-02-02 2024-02-02 Lightweight new energy heavy-duty battery pack identification method based on improved YOLOv5 model

Publications (2)

Publication Number Publication Date
CN117689731A true CN117689731A (en) 2024-03-12
CN117689731B CN117689731B (en) 2024-04-26

Family

ID=90137509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410149737.3A Active CN117689731B (en) 2024-02-02 2024-02-02 Lightweight new energy heavy-duty battery pack identification method based on improved YOLOv5 model

Country Status (1)

Country Link
CN (1) CN117689731B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230351573A1 (en) * 2021-03-17 2023-11-02 Southeast University Intelligent detection method and unmanned surface vehicle for multiple type faults of near-water bridges
CN114565900A (en) * 2022-01-18 2022-05-31 广州软件应用技术研究院 Target detection method based on improved YOLOv5 and binocular stereo vision
CN115187921A (en) * 2022-05-13 2022-10-14 华南理工大学 Power transmission channel smoke detection method based on improved YOLOv3
CN114897857A (en) * 2022-05-24 2022-08-12 河北工业大学 Solar cell defect detection method based on light neural network
CN115457395A (en) * 2022-09-22 2022-12-09 南京信息工程大学 Lightweight remote sensing target detection method based on channel attention and multi-scale feature fusion
US11810366B1 (en) * 2022-09-22 2023-11-07 Zhejiang Lab Joint modeling method and apparatus for enhancing local features of pedestrians
CN116188849A (en) * 2023-02-02 2023-05-30 苏州大学 Target identification method and system based on lightweight network and sweeping robot
CN116206185A (en) * 2023-02-27 2023-06-02 山东浪潮科学研究院有限公司 Lightweight small target detection method based on improved YOLOv7
CN116343150A (en) * 2023-03-24 2023-06-27 湖南师范大学 Road sign target detection method based on improved YOLOv7
CN116681962A (en) * 2023-05-05 2023-09-01 江苏宏源电气有限责任公司 Power equipment thermal image detection method and system based on improved YOLOv5
CN116863539A (en) * 2023-07-20 2023-10-10 吴剑飞 Fall figure target detection method based on optimized YOLOv8s network structure

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
VINA: "Adding an attention mechanism to YOLOv5", Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/573870094> *
今天炼丹了吗: "Learn-to-use-in-three-minutes series (YOLOv5) | CBAM attention mechanism, a score-boosting tool!", Retrieved from the Internet <URL:https://blog.csdn.net/StopAndGoyyy/article/details/135873724> *
卡卡猡特: "Partial Convolution explained (part 1) | a classic work in image inpainting | computation mechanism and model structure", Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/519446359> *
嘿♚: "[YOLOv5] Detailed explanation of the Backbone, Neck and Head modules", Retrieved from the Internet <URL:https://blog.csdn.net/qq_44878985/article/details/129287587> *
小酒馆燃着灯: "YOLOv5 series (part 1): analyzing the YOLOv5 network structure (in detail)", Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/669301460> *
ZHAO Yongqiang et al.: "A survey of deep learning object detection methods", Journal of Image and Graphics, vol. 25, no. 4, 15 April 2020 (2020-04-15) *
WEI Runchen et al.: "YOLO-Person: pedestrian detection in road areas", Computer Engineering and Applications, vol. 56, no. 19, 9 June 2020 (2020-06-09) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117975173A * 2024-04-02 2024-05-03 华侨大学 Children's cult-style picture identification method and device based on a lightweight vision transformer

Also Published As

Publication number Publication date
CN117689731B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN117689731B (en) Lightweight new energy heavy-duty battery pack identification method based on improved YOLOv5 model
CN111860693A (en) Lightweight visual target detection method and system
CN111563507A (en) Indoor scene semantic segmentation method based on convolutional neural network
CN113052254B (en) Multi-attention ghost residual fusion classification model and classification method thereof
CN109919084B (en) Pedestrian re-identification method based on depth multi-index hash
CN114092697B (en) Building facade semantic segmentation method with attention fused with global and local depth features
CN115077556A (en) Unmanned vehicle field operation path planning method based on multi-dimensional map
CN113095251B (en) Human body posture estimation method and system
CN115100238A (en) Knowledge distillation-based light single-target tracker training method
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN113870160A (en) Point cloud data processing method based on converter neural network
CN114758129A (en) RandLA-Net outdoor scene semantic segmentation method based on local feature enhancement
CN117576149A (en) Single-target tracking method based on attention mechanism
CN114120046B (en) Lightweight engineering structure crack identification method and system based on phantom convolution
CN115937594A (en) Remote sensing image classification method and device based on local and global feature fusion
CN115641469A GYOLOv5-based X-ray dangerous goods detection and identification method
CN115424012A (en) Lightweight image semantic segmentation method based on context information
CN115063352A (en) Salient object detection device and method based on multi-graph neural network collaborative learning architecture
CN113887536A (en) Multi-stage efficient crowd density estimation method based on high-level semantic guidance
CN113313030A (en) Human behavior identification method based on motion trend characteristics
CN117553807B (en) Automatic driving navigation method and system based on laser radar
CN117557857B (en) Detection network light weight method combining progressive guided distillation and structural reconstruction
CN117935031B (en) Saliency target detection method integrating mixed attention
Cong et al. Object Detection and Image Segmentation for Autonomous Vehicles
Lun et al. Dual-Branch Point Cloud Feature Learning for 3D Object Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant