CN117689731B - Lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model


Info

Publication number
CN117689731B
CN117689731B (application CN202410149737.3A)
Authority
CN
China
Prior art keywords
output
input
battery pack
module
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410149737.3A
Other languages
Chinese (zh)
Other versions
CN117689731A (en)
Inventor
郭佳豪
晋军
刘一霏
魏雨辰
谷霄月
邓雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Dechuang Digital Industrial Intelligent Technology Co ltd
Original Assignee
Shaanxi Dechuang Digital Industrial Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Dechuang Digital Industrial Intelligent Technology Co ltd
Priority to CN202410149737.3A
Publication of CN117689731A
Application granted
Publication of CN117689731B
Legal status: Active


Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model, which comprises the following steps: collecting an image dataset of the battery pack; preprocessing the image dataset to construct a battery pack dataset, and dividing it proportionally into a training set, a validation set and a test set; constructing an LFNet model; repeatedly adjusting parameters and training the LFNet model with the training set to obtain an optimal new energy heavy truck battery pack detection model; and predicting on the test set with the optimal detection model to identify the specific position of the battery pack. Abstract and high-level features of the battery pack are extracted by deep learning, and a deep convolutional neural network is trained repeatedly to obtain a battery pack identification model, so that detection accuracy and working efficiency can be greatly improved in the actual battery swapping process.

Description

Lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model
Technical Field
The invention relates to the field of battery pack identification, in particular to a lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model.
Background
In the transportation field, the shift toward new energy and intelligent vehicles continues to strengthen. New energy heavy trucks are driven by electric power, so compared with conventional heavy trucks they offer higher energy utilization and lower operating costs, and they are gradually becoming an important choice in industries such as ports, mining areas and logistics.
A new energy heavy truck is usually powered by a battery pack mounted on the back of the truck body; when the electric energy of the on-board battery pack is consumed to a certain extent, it must be recharged to keep the truck running. At present, battery pack replacement is one of the main ways of replenishing the energy of new energy heavy trucks: a camera at the battery swapping station detects the depleted battery pack, and a swapping robot replaces it with a fully charged one. Because the battery pack of a new energy heavy truck has large capacity and large mass, the depleted pack must be detected before swapping so that the swapping robot can respond and perform the replacement. The existing detection method relies on non-contact measurement technologies such as sensors; on the one hand, sensors are easily disturbed in many ways, which lowers detection accuracy and creates safety hazards; on the other hand, the detection equipment is complex and costly, requires professionals to spend considerable time on operation and maintenance, and is inefficient.
Disclosure of Invention
In view of these problems, the invention provides a lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model. Abstract and high-level features of the battery pack are extracted by deep learning, and a deep convolutional neural network is trained repeatedly to obtain a battery pack identification model, so that detection accuracy and working efficiency can be greatly improved in the actual battery swapping process.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model comprises the following steps:
S1, acquiring an image dataset of the battery pack;
S2, preprocessing the image dataset, constructing a battery pack dataset, and dividing the battery pack dataset proportionally into a training set, a validation set and a test set;
S3, constructing an LFNet (lightweight fast detection network) model;
S4, repeatedly adjusting parameters and training the LFNet model with the training set to obtain an optimal new energy heavy truck battery pack detection model;
S5, predicting on the test set with the optimal new energy heavy truck battery pack detection model, and identifying the specific position of the battery pack.
Preferably, S2 specifically includes:
S21, converting the image dataset in S1 into YOLOv5 format, obtaining for each image a .txt file containing the top-left and bottom-right corner coordinates of the battery pack;
S22, generating three sibling folders under a root folder, and generating two subfolders under each of the three sibling folders, the two subfolders storing the images and the corresponding .txt files respectively, forming the battery pack dataset;
S23, dividing the images and .txt files in the battery pack dataset in an 8:1:1 ratio to generate the training set, validation set and test set, automatically creating a classes.txt file in the labels folder under the folder where each set is located, and writing the English word for the battery pack class into the classes.txt file.
Preferably, the LFNet model in S3 includes a backbone feature extraction network and a neck feature processing network; the input of the LFNet model is the input of the backbone feature extraction network, and the output of the backbone feature extraction network is the input of the neck feature processing network.
Preferably, the backbone feature extraction network includes four layers of feature extraction blocks and an SPPF (fast spatial pyramid pooling) module. Among the four layers of feature extraction blocks, the output of each layer serves as the input of the next; the input of the backbone feature extraction network is the input of the first-layer feature extraction block, the output of the last-layer feature extraction block is the input of the SPPF module, and the output of the SPPF module is the input of the neck feature processing network.
Preferably, each feature extraction block consists of a convolution combination and a CFNet (cross-stage lightweight network)-1 module: the input of the feature extraction block passes through the convolution combination, the output of the convolution combination serves as the input of the CFNet-1 module, and the output of the CFNet-1 module is the output of the feature extraction block.
Preferably, the CFNet-1 module includes a convolution combination, multiple CFBlock (lightweight block) layers and one SCABlock (spatial-channel dual-attention block); among the CFBlock layers, the output of each layer serves as the input of the next. The input of the CFNet-1 module passes through the convolution combination and serves as the input of the first CFBlock layer; the output of the last CFBlock layer is channel-concatenated with the input of the first CFBlock layer to form the input of the SCABlock, and the output of the SCABlock is the output of the CFNet-1 module;
The CFBlock performs feature extraction on part of its input channels via PConv (partial convolution); the extracted features pass sequentially through two 1×1 convolutions with different channel numbers, and the sum of the output of the second 1×1 convolution and the input of the CFBlock is the output of the CFBlock.
Preferably, the SCABlock includes a CABlock (channel attention block) and a SABlock (spatial attention block); the input of the SCABlock serves as the input of both the CABlock and the SABlock, and the sum of the CABlock output and the SABlock output is the output of the SCABlock;
The input of the CABlock passes through a convolution combination and then serves as the input of both a global average pooling and a global max pooling; the global average pooling output and the global max pooling output are added and fed into a sigmoid function, and the sigmoid output is weighted onto the input of the CABlock to obtain the CABlock output;
The input of the SABlock passes through two parallel convolution combinations; the product of one convolution combination's output after reshape and the other's output after reshape and transpose is fed into a softmax function, and the softmax output is weighted onto the input of the SABlock to obtain the SABlock output.
Preferably, the input of the SPPF module passes sequentially through a convolution combination and three max pooling layers; the output of the convolution combination and the outputs of the three max pooling layers are channel-concatenated and then pass through another convolution combination to obtain the output of the SPPF module.
Preferably, the neck feature processing network includes an FPN (feature pyramid network) structure and a PAN (path aggregation network) structure, each containing two CFNet-2 modules; compared with the CFNet-1 module, each CFBlock layer in a CFNet-2 module has no residual edge. The output of the SPPF module serves as the input of the FPN structure: it is channel-concatenated with the output of the third-layer feature extraction block of the backbone feature extraction network to obtain a first feature image, which is the input of the first CFNet-2 module of the FPN structure; the output of the first CFNet-2 module of the FPN structure is channel-concatenated with the output of the second-layer feature extraction block of the backbone to obtain a second feature image, which is the input of the second CFNet-2 module of the FPN structure. The output of the second CFNet-2 module of the FPN structure serves as the input of the PAN structure: it is channel-concatenated with the second feature image to form the input of the first CFNet-2 module of the PAN structure, the output of which is channel-concatenated with the first feature image to form the input of the second CFNet-2 module of the PAN structure; the output of the second CFNet-2 module of the PAN structure is the output of the neck feature processing network.
Preferably, the convolution combination comprises a 1×1 convolution, BN normalization and a SiLU activation function; the input of the convolution combination passes sequentially through the 1×1 convolution, BN normalization and the SiLU activation function, and the output of the SiLU activation function is the output of the convolution combination.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention introduces the CFNet module into the backbone feature extraction network and the neck feature processing network of the LFNet model, which greatly reduces the overall parameter count of the model and its memory access and computing requirements; it also introduces the SCABlock into the backbone feature extraction network, which enhances the model's spatial perception and channel correlation and greatly strengthens battery pack recognition. Together, the CFNet and SCABlock modules accelerate the inference speed of the LFNet model while preserving detection accuracy;
(2) The invention can automatically detect, in real time, the battery pack of a new energy heavy truck after it enters the swapping station, with high detection accuracy, high detection speed, high efficiency and high safety;
(3) The invention effectively applies a deep learning model to the battery pack detection of a new energy heavy truck intelligent battery swapping system, so its application scenarios are broad (ports, mining areas, logistics, etc.), providing an effective route for intelligent battery swapping technology.
Drawings
FIG. 1 is a flow chart of the lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model;
FIG. 2 is a diagram illustrating placement of data set files in accordance with the present invention;
FIG. 3 is a diagram of a model structure of the present invention LFNet;
FIG. 4 is a block diagram of a CFNet-1 module according to the present invention;
FIG. 5 is a block diagram of the present invention CFBlock;
FIG. 6 is a block diagram of a SCABlock module according to the present invention;
FIG. 7 is a block diagram of an SPPF module of the present invention;
FIG. 8 is a graph of the loss values of LFNet and existing lightweight models trained for the same number of epochs;
FIG. 9 is a graph comparing the lightweight metrics of the LFNet model and YOLOv5 models;
FIG. 10 is a visualization of the prediction results of the LFNet and YOLOv5 models.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the invention provides a lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model, which comprises the following steps:
S1, acquiring an image dataset of the battery pack. A dataset contains rich information and regularities; it is the cornerstone of model learning and the material on which the model learns and is optimized, so its acquisition is essential. The new energy heavy truck battery pack is black and cuboid; collecting battery pack images under a single condition for training a deep learning model yields poor detection performance and poor generalization. Therefore, to improve detection accuracy and enhance generalization, battery pack images are collected from different viewing angles, brightness levels, distances and scales, and the collected images are effectively expanded by data augmentation, mainly blurring, sharpening and brightness changes. On the one hand, this collection and processing enriches the battery pack data so the model gains abundant information and insight; on the other hand, it lets the model simulate and adapt to different scene changes during training, so it can be used in complex and variable practical application scenarios.
S2, referring to FIG. 2, preprocessing the image dataset to construct the battery pack dataset, and dividing the battery pack dataset proportionally into a training set, a validation set and a test set;
The method for preprocessing the image dataset in S1 is: converting the image dataset into YOLOv5 format, obtaining for each image a .txt file containing the top-left and bottom-right corner coordinates of the battery pack;
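As an illustrative sketch only (the patent gives no code), the conversion can be written as follows. Standard YOLO-series label files store a class id plus normalized centre coordinates and box size, so the hypothetical helper to_yolo_line below derives that form from the corner coordinates described above:

```python
# Hedged sketch: convert battery-pack corner coordinates (pixels) into a
# YOLO-style label line. The normalized centre/size convention and the
# class id 0 are assumptions, not values stated in the patent.
def to_yolo_line(x1, y1, x2, y2, img_w, img_h, cls_id=0):
    xc = (x1 + x2) / 2 / img_w   # normalized centre x
    yc = (y1 + y2) / 2 / img_h   # normalized centre y
    bw = (x2 - x1) / img_w       # normalized box width
    bh = (y2 - y1) / img_h       # normalized box height
    return f"{cls_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}"

# e.g. one battery pack at (120, 80)-(560, 400) in a 640x480 image:
print(to_yolo_line(120, 80, 560, 400, 640, 480))  # -> "0 0.531250 0.500000 0.687500 0.666667"
```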
The method for constructing the battery pack dataset is: generating three sibling folders under a root folder, and generating two subfolders under each of the three sibling folders, the two subfolders storing the images and the corresponding .txt files respectively, forming the battery pack dataset;
The images and .txt files in the battery pack dataset are divided in an 8:1:1 ratio to generate the training set, validation set and test set; a classes.txt file is automatically created in the labels folder under the folder where each set is located, and the English word for the battery pack class is written into the classes.txt file.
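A minimal script sketch of this split is given below; the folder names, the image extension and the class word "pack" are illustrative assumptions rather than values fixed by the patent:

```python
# Hedged sketch: split paired image/.txt label files 8:1:1 into
# train/val/test folders (Fig. 2 layout assumed) and write classes.txt.
import random
import shutil
from pathlib import Path

random.seed(0)
images = sorted(Path("dataset/images").glob("*.jpg"))  # assumed source layout
random.shuffle(images)
n = len(images)
splits = {"train": images[:int(0.8 * n)],
          "val":   images[int(0.8 * n):int(0.9 * n)],
          "test":  images[int(0.9 * n):]}

for name, files in splits.items():
    img_dir = Path("dataset") / name / "images"
    lbl_dir = Path("dataset") / name / "labels"
    img_dir.mkdir(parents=True, exist_ok=True)
    lbl_dir.mkdir(parents=True, exist_ok=True)
    for img in files:
        shutil.copy(img, img_dir / img.name)      # copy image
        lbl = img.with_suffix(".txt")             # matching label file
        shutil.copy(lbl, lbl_dir / lbl.name)
    (lbl_dir / "classes.txt").write_text("pack\n")  # assumed class word
```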
S3, referring to FIG. 3, constructing the LFNet model:
The LFNet model comprises a backbone feature extraction network and a neck feature processing network; the input of the LFNet model is the input of the backbone feature extraction network, and the output of the backbone feature extraction network is the input of the neck feature processing network;
The backbone feature extraction network comprises four layers of feature extraction blocks and an SPPF module. Among the four layers of feature extraction blocks, the output of each layer serves as the input of the next; the input of the backbone feature extraction network is the input of the first-layer feature extraction block, the output of the last-layer feature extraction block is the input of the SPPF module, and the output of the SPPF module is the input of the neck feature processing network;
Each feature extraction block consists of a convolution combination and a CFNet-1 module: the input of the feature extraction block passes through the convolution combination, the output of the convolution combination serves as the input of the CFNet-1 module, and the output of the CFNet-1 module is the output of the feature extraction block;
Referring to fig. 4, the CFNet-1 module includes a convolution combination, multiple CFBlock layers and one SCABlock; the number of CFBlock layers can be set to 3, 6 and 9 in turn as the model depth increases. Among the CFBlock layers, the output of each layer serves as the input of the next, and each CFBlock layer has a residual edge, i.e., an edge added directly from its input to its output. The input of the CFNet-1 module passes through the convolution combination and serves as the input of the first CFBlock layer; the output of the last CFBlock layer is channel-concatenated with the input of the first CFBlock layer to form the input of the SCABlock, and the output of the SCABlock is the output of the CFNet-1 module. The CFNet-1 module is constructed on the basis of the CSPNet (cross-stage partial network) module, with the original partial core module of CSPNet replaced by CFBlocks, which keeps the advantages of CSPNet's cross-stage structure while further enhancing the model's learning capacity and reducing memory cost;
The convolution combination comprises a 1×1 convolution, BN normalization and a SiLU activation function; the input of the convolution combination passes sequentially through the 1×1 convolution, BN normalization and the SiLU activation function, and the output of the SiLU activation function is the output of the convolution combination.
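In PyTorch terms, the convolution combination could be sketched as below; the class name ConvBNSiLU is our own label, not the patent's:

```python
# Hedged sketch of the "convolution combination": 1x1 Conv -> BN -> SiLU.
import torch
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # input -> 1x1 convolution -> BN normalization -> SiLU, in order
        return self.act(self.bn(self.conv(x)))
```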
Referring to fig. 5, the CFBlock performs feature extraction on part of its input channels via PConv; the extracted features pass sequentially through two 1×1 convolutions with different channel numbers, and the sum of the output of the second 1×1 convolution and the input of the CFBlock is the output of the CFBlock.
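A hedged PyTorch sketch of this structure follows; the fraction of channels touched by PConv (1/4), its 3×3 kernel and the expansion factor of the first 1×1 convolution are assumptions, since the patent specifies only the topology:

```python
# Hedged sketch of CFBlock: PConv over part of the channels, two 1x1
# convolutions of different widths, and a residual edge from input to output.
import torch
import torch.nn as nn

class CFBlock(nn.Module):
    def __init__(self, ch: int, expand: int = 2, part_ratio: float = 0.25):
        super().__init__()
        self.part = max(1, int(ch * part_ratio))  # channels PConv operates on
        self.pconv = nn.Conv2d(self.part, self.part, 3, padding=1, bias=False)
        hidden = ch * expand
        self.pw1 = nn.Sequential(nn.Conv2d(ch, hidden, 1, bias=False),
                                 nn.BatchNorm2d(hidden), nn.SiLU())
        self.pw2 = nn.Conv2d(hidden, ch, 1, bias=False)  # second 1x1, different width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = x[:, :self.part], x[:, self.part:]
        y = torch.cat([self.pconv(a), b], dim=1)  # PConv: untouched channels pass through
        y = self.pw2(self.pw1(y))                 # two 1x1 convolutions in sequence
        return x + y                              # residual edge (CFNet-1 variant)
```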
Referring to fig. 6, the SCABlock includes a CABlock and a SABlock. The input of the SCABlock serves as the input of both the CABlock and the SABlock, and the sum of the CABlock output and the SABlock output is the output of the SCABlock; introducing the CABlock and SABlock enhances the model's perception of spatial features and also strengthens the correlation between channels in the features.
The input of the CABlock passes through a convolution combination and then serves as the input of both a global average pooling and a global max pooling; the global average pooling output and the global max pooling output are added and fed into a sigmoid function, which outputs a weighted channel vector; the sigmoid output is weighted onto the input of the CABlock to obtain the CABlock output, completing the recalibration of the SCABlock input in the channel dimension;
In this embodiment, taking the input feature map of the SCABlock as an example: the input feature map enters the CABlock and passes through a convolution combination to obtain a feature map A of the same size as the input; feature map A then passes through a global average pooling layer and a global max pooling layer to obtain two global feature maps; the two global feature maps are added, and a sigmoid function yields a weighted channel vector; finally, the channel-vector weights are applied to the input feature map channel by channel to produce the CABlock output, completing the recalibration of the input feature map in the channel dimension.
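The channel-attention path can be sketched in PyTorch as below (a reading of the description, not the patent's code; the 1×1 kernel of the leading convolution combination is an assumption):

```python
# Hedged sketch of CABlock: conv combination -> parallel global average/max
# pooling -> sigmoid over the sum -> channel-wise reweighting of the input.
import torch
import torch.nn as nn

class CABlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(ch, ch, 1, bias=False),
                                  nn.BatchNorm2d(ch), nn.SiLU())
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.conv(x)                                        # feature map A
        w = torch.sigmoid(self.avg_pool(a) + self.max_pool(a))  # weighted channel vector
        return x * w                                            # channel-wise recalibration
```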
The SABlock mainly consists of two parallel convolution combinations, reshape and transpose operations, and a softmax function. The input of the SABlock passes through the two parallel convolution combinations; the product of one combination's output after reshape and the other's output after reshape and transpose is fed into the softmax function, and the softmax output is weighted onto the input of the SABlock to obtain the SABlock output, completing the recalibration of the SCABlock input in the spatial dimension;
In this embodiment, taking the input feature map of the SCABlock as an example: the input feature map enters the SABlock and passes through two parallel convolution combinations to obtain feature maps A and B of the same size as the input. To let feature maps A and B satisfy the matrix multiplication condition and thus generate a spatial weight matrix, both are reshaped into a matrix R ∈ R^((H×W)×I), where R denotes the resulting matrix, H the height, W the width and I the number of channels of feature map A after the reshape operation; feature map B is then transposed into a matrix R ∈ R^(I×(H×W)). The two transformed matrices are multiplied, and the product passes through a softmax function to obtain a weight feature matrix in which each point carries a certain spatial relationship. Finally, the spatial-relationship weight matrix is restored to the size of the input feature map and weighted onto it to produce the SABlock output, completing the recalibration of the input feature map in the spatial dimension.
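Under one standard reading of this description (a non-local-style attention), the SABlock could be sketched as follows; the reduced channel count I (ch // 8 here) and the residual-style application of the weights at the end are assumptions:

```python
# Hedged sketch of SABlock: two parallel 1x1 convolutions, reshape/transpose,
# matrix product -> softmax spatial weights, applied back onto the input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SABlock(nn.Module):
    def __init__(self, ch: int, reduce: int = 8):
        super().__init__()
        i = max(1, ch // reduce)                  # assumed reduced channel count I
        self.branch_a = nn.Conv2d(ch, i, 1, bias=False)
        self.branch_b = nn.Conv2d(ch, i, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        a = self.branch_a(x).flatten(2).transpose(1, 2)  # R in ((H*W) x I)
        b = self.branch_b(x).flatten(2)                  # R in (I x (H*W))
        attn = F.softmax(torch.bmm(a, b), dim=-1)        # (H*W) x (H*W) spatial weights
        v = x.flatten(2)                                 # (N, C, H*W)
        out = torch.bmm(v, attn.transpose(1, 2)).view(n, c, h, w)
        return x + out                                   # weight back onto the input
```

The SCABlock itself would then simply sum the two branches: SCABlock(x) = CABlock(x) + SABlock(x).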
Referring to fig. 7, the input of the SPPF module passes sequentially through a convolution combination and three max pooling layers; the output of the convolution combination and the outputs of the three max pooling layers are channel-concatenated and then pass through another convolution combination to obtain the output of the SPPF module.
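A hedged sketch of this module follows; the 5×5 pooling kernel and the halved hidden width follow common YOLOv5 practice and are assumptions here:

```python
# Hedged sketch of SPPF: conv combination -> three stacked max-pool layers,
# concatenation of all four tensors, then a final conv combination.
import torch
import torch.nn as nn

class SPPF(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 5):
        super().__init__()
        hidden = in_ch // 2
        self.cv1 = nn.Sequential(nn.Conv2d(in_ch, hidden, 1, bias=False),
                                 nn.BatchNorm2d(hidden), nn.SiLU())
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.cv2 = nn.Sequential(nn.Conv2d(hidden * 4, out_ch, 1, bias=False),
                                 nn.BatchNorm2d(out_ch), nn.SiLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y0 = self.cv1(x)
        y1 = self.pool(y0)   # first max-pool layer
        y2 = self.pool(y1)   # second
        y3 = self.pool(y2)   # third
        return self.cv2(torch.cat([y0, y1, y2, y3], dim=1))
```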
The neck feature processing network comprises an FPN structure and a PAN structure, each containing two CFNet-2 modules. Compared with the CFNet-1 module, each CFBlock layer in a CFNet-2 module has no residual edge, i.e., in the CFNet-2 module the output of the second 1×1 convolution is directly the output of the CFBlock. The output of the SPPF module serves as the input of the FPN structure: it is channel-concatenated with the output of the third-layer feature extraction block of the backbone feature extraction network to obtain a first feature image, which is the input of the first CFNet-2 module of the FPN structure; the output of the first CFNet-2 module of the FPN structure is channel-concatenated with the output of the second-layer feature extraction block of the backbone to obtain a second feature image, which is the input of the second CFNet-2 module of the FPN structure. The output of the second CFNet-2 module of the FPN structure serves as the input of the PAN structure: it is channel-concatenated with the second feature image to form the input of the first CFNet-2 module of the PAN structure, the output of which is channel-concatenated with the first feature image to form the input of the second CFNet-2 module of the PAN structure; the output of the second CFNet-2 module of the PAN structure is the output of the neck feature processing network;
In this embodiment, the input feature image of the FPN structure is taken as an example. The input feature image of the FPN structure is channel-concatenated with the output of the third-layer feature extraction block of the backbone feature extraction network to obtain feature image C; feature image C passes through the first CFNet-2 module of the FPN structure to output feature image D; feature image D is channel-concatenated with the output of the second-layer feature extraction block of the backbone to obtain feature image E; feature image E passes through the second CFNet-2 module of the FPN structure to output feature image F, which serves as the input of the PAN structure. The FPN structure aims to make full use of the features of every level, so that each level contains both rich semantic information and rich detail information.
The feature fusion direction of the PAN structure is opposite to that of the FPN. The input feature image F of the PAN structure is channel-concatenated with feature image E and passes through the first CFNet-2 module of the PAN structure to output feature image G; feature image G is channel-concatenated with feature image C and serves as the input of the second CFNet-2 module of the PAN structure, whose output is the output of the neck feature processing network. The PAN structure aims to further cascade and integrate feature information and improve the expressive capability of the model.
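The FPN/PAN data flow of the neck can be sketched schematically as below. The up/down-sampling steps needed to align spatial sizes before concatenation, all channel widths, and the single-convolution stand-in for CFNet-2 are our assumptions; the patent text specifies only the concatenation order:

```python
# Hedged, schematic sketch of the neck (FPN top-down pass then PAN bottom-up).
import torch
import torch.nn as nn

def cfnet2(in_ch: int, out_ch: int) -> nn.Module:
    # stand-in for a CFNet-2 module (CFBlocks without residual edges)
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, bias=False),
                         nn.BatchNorm2d(out_ch), nn.SiLU())

class Neck(nn.Module):
    def __init__(self, c2: int = 128, c3: int = 256, c4: int = 512):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.down = nn.MaxPool2d(2)              # assumed downsampling op
        self.fpn1 = cfnet2(c4 + c3, c3)
        self.fpn2 = cfnet2(c3 + c2, c2)
        self.pan1 = cfnet2(c2 + c3 + c2, c3)
        self.pan2 = cfnet2(c3 + c4 + c3, c4)

    def forward(self, p2, p3, sppf):
        c = torch.cat([self.up(sppf), p3], 1)    # feature image C (SPPF ++ stage 3)
        d = self.fpn1(c)                         # feature image D
        e = torch.cat([self.up(d), p2], 1)       # feature image E (D ++ stage 2)
        f = self.fpn2(e)                         # feature image F, input of the PAN
        g = self.pan1(torch.cat([f, e], 1))      # feature image G (F ++ E)
        return self.pan2(torch.cat([self.down(g), c], 1))  # G ++ C -> neck output

# e.g. with stage-2/stage-3/SPPF maps of strides 8/16/32:
# Neck()(torch.rand(1,128,80,80), torch.rand(1,256,40,40), torch.rand(1,512,20,20))
```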
S4, repeatedly adjusting parameters and training the LFNet model with the training set to obtain the optimal new energy heavy truck battery pack detection model;
The pictures in the training set are input to the LFNet model, and the LFNet model is repeatedly tuned and trained. In this embodiment, the loss value after 300 training epochs and the lightweight metrics of the model (floating point operations and parameter count) are used as the evaluation indexes of model performance.
The loss values obtained by training the LFNet model and existing lightweight models (YOLOv5, PP-YOLO, YOLOv4, YOLOv5s, YOLOX-s) for 300 epochs are shown in fig. 8. As fig. 8 shows, the LFNet model using CFNet and SCABlock converges significantly faster than the original YOLOv5 model, and its final loss value is also lower, indicating that the LFNet model of the invention has faster inference speed and higher detection accuracy.
The lightweight metrics of the LFNet models and the existing lightweight models (YOLOv5s, YOLOv5n) are shown in fig. 9, where the horizontal axis is the parameter count (Params) and the vertical axis is FLOPs (floating point operations). The LFNet-n model is the LFNet model with 1.2M parameters and the LFNet-s model is the LFNet model with 5.4M parameters. As fig. 9 shows, the parameter count of the LFNet-n model is 36.8% lower than that of the YOLOv5n model, and that of the LFNet-s model is 25% lower than that of the YOLOv5s model; the LFNet models also have lower FLOPs than the corresponding YOLOv5n and YOLOv5s, i.e., relatively lower computational complexity.
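The parameter-count side of these lightweight metrics can be reproduced with a few lines of PyTorch, as sketched below (the model variables are placeholders for the actual LFNet-n / YOLOv5n definitions, which the patent does not list; FLOPs would additionally require a profiler):

```python
# Hedged sketch: count trainable parameters, in millions, for any nn.Module.
import torch.nn as nn

def count_params_m(model: nn.Module) -> float:
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# e.g. with real model objects loaded as lfnet_n and yolov5n:
# print(f"LFNet-n: {count_params_m(lfnet_n):.1f}M vs YOLOv5n: {count_params_m(yolov5n):.1f}M")
```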
Figs. 8 and 9 verify that the LFNet model is lighter than existing lightweight models.
S5, inputting the pictures in the test set into the optimal new energy heavy truck battery pack detection model, which predicts on them and identifies the specific positions of the battery packs.
Referring to fig. 10, to further demonstrate the performance of LFNet, this embodiment compares visualizations of the detection results of the LFNet and YOLOv5 models on some images of the battery pack dataset; the upper row of fig. 10 shows the YOLOv5n detection results and the lower row shows the LFNet-n detection results. The first column from the left shows that the overall detection of LFNet-n (accuracy 0.9) is better than YOLOv5n (accuracy 0.8). The second column shows that YOLOv5n misses a detection (only one object detected) while LFNet-n does not (two objects detected). The third column shows that LFNet-n detects small targets better (small-target accuracy 0.9) than YOLOv5n (0.8). The fourth column shows that both LFNet-n and YOLOv5n can detect multiple targets, but LFNet detects them better: YOLOv5n also produces repeated detections, with many detection boxes appearing repeatedly on the same target, whereas LFNet-n does not. Therefore, the LFNet model can effectively detect whole battery packs, small battery pack targets and multiple battery pack targets, without missed or repeated detections.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.

Claims (4)

1. A lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model, characterized by comprising the following steps:
S1, acquiring an image data set of the battery pack;
S2, preprocessing the image dataset, constructing a battery pack dataset, and dividing the battery pack dataset proportionally into a training set, a validation set and a test set;
S3, constructing an LFNet model;
The LFNet model comprises a backbone feature extraction network and a neck feature processing network; the input of the LFNet model is the input of the backbone feature extraction network, and the output of the backbone feature extraction network is the input of the neck feature processing network;
The backbone feature extraction network comprises four layers of feature extraction blocks and an SPPF module. Among the four layers of feature extraction blocks, the output of each layer serves as the input of the next; the input of the backbone feature extraction network is the input of the first-layer feature extraction block, the output of the last-layer feature extraction block is the input of the SPPF module, and the output of the SPPF module is the input of the neck feature processing network;
Each feature extraction block consists of a convolution combination and a CFNet-1 module: the input of the feature extraction block passes through the convolution combination, the output of the convolution combination serves as the input of the CFNet-1 module, and the output of the CFNet-1 module is the output of the feature extraction block;
The CFNet-1 module includes a convolution combination, multiple CFBlock layers and one SCABlock; among the CFBlock layers, the output of each layer serves as the input of the next. The input of the CFNet-1 module passes through the convolution combination and serves as the input of the first CFBlock layer; the output of the last CFBlock layer is channel-concatenated with the input of the first CFBlock layer to form the input of the SCABlock, and the output of the SCABlock is the output of the CFNet-1 module;
The CFBlock performs feature extraction on part of its input channels via PConv; the extracted features pass sequentially through two 1×1 convolutions with different channel numbers, and the sum of the output of the second 1×1 convolution and the input of the CFBlock is the output of the CFBlock;
The SCABlock comprises a CABlock and a SABlock; the input of the SCABlock serves as the input of both the CABlock and the SABlock, and the sum of the CABlock output and the SABlock output is the output of the SCABlock;
The input of the CABlock passes through a convolution combination and then serves as the input of both a global average pooling and a global max pooling; the global average pooling output and the global max pooling output are added and fed into a sigmoid function, and the sigmoid output is weighted onto the input of the CABlock to obtain the CABlock output;
The input of the SABlock passes through two parallel convolution combinations; the product of one combination's output after reshape and the other's output after reshape and transpose is fed into a softmax function, and the softmax output is weighted onto the input of the SABlock to obtain the SABlock output;
The neck feature processing network comprises an FPN structure and a PAN structure, each containing two CFNet-2 modules; compared with the CFNet-1 module, each CFBlock layer in a CFNet-2 module has no residual edge. The output of the SPPF module serves as the input of the FPN structure: it is channel-concatenated with the output of the third-layer feature extraction block of the backbone feature extraction network to obtain a first feature image, which is the input of the first CFNet-2 module of the FPN structure; the output of the first CFNet-2 module of the FPN structure is channel-concatenated with the output of the second-layer feature extraction block of the backbone to obtain a second feature image, which is the input of the second CFNet-2 module of the FPN structure. The output of the second CFNet-2 module of the FPN structure serves as the input of the PAN structure: it is channel-concatenated with the second feature image to form the input of the first CFNet-2 module of the PAN structure, the output of which is channel-concatenated with the first feature image to form the input of the second CFNet-2 module of the PAN structure; the output of the second CFNet-2 module of the PAN structure is the output of the neck feature processing network;
S4, repeatedly adjusting parameters and training the LFNet model with the training set to obtain an optimal new energy heavy truck battery pack detection model;
S5, predicting on the test set with the optimal new energy heavy truck battery pack detection model, and identifying the specific position of the battery pack.
2. The lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model according to claim 1, wherein S2 specifically comprises:
S21, converting the image dataset in S1 into YOLOv5 format, obtaining for each image a .txt file containing the top-left and bottom-right corner coordinates of the battery pack;
S22, generating three sibling folders under a root folder, and generating two subfolders under each of the three sibling folders, the two subfolders storing the images and the corresponding .txt files respectively, forming the battery pack dataset;
S23, dividing the images and .txt files in the battery pack dataset in an 8:1:1 ratio to generate the training set, validation set and test set, automatically creating a classes.txt file in the labels folder under the folder where each set is located, and writing the English word for the battery pack class into the classes.txt file.
3. The lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model according to claim 1, wherein the input of the SPPF module passes sequentially through a convolution combination and three max pooling layers; the output of the convolution combination and the outputs of the three max pooling layers are channel-concatenated and then pass through another convolution combination to obtain the output of the SPPF module.
4. The lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model according to any one of claims 1 to 3, wherein the convolution combination comprises a 1×1 convolution, BN normalization and a SiLU activation function; the input of the convolution combination passes sequentially through the 1×1 convolution, BN normalization and the SiLU activation function, and the output of the SiLU activation function is the output of the convolution combination.
CN202410149737.3A 2024-02-02 2024-02-02 Lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model Active CN117689731B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410149737.3A CN117689731B (en) 2024-02-02 2024-02-02 Lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410149737.3A CN117689731B (en) 2024-02-02 2024-02-02 Lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model

Publications (2)

Publication Number Publication Date
CN117689731A CN117689731A (en) 2024-03-12
CN117689731B true CN117689731B (en) 2024-04-26

Family

ID=90137509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410149737.3A Active CN117689731B (en) 2024-02-02 2024-02-02 Lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model

Country Status (1)

Country Link
CN (1) CN117689731B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117975173B (en) * 2024-04-02 2024-06-21 华侨大学 Child evil dictionary picture identification method and device based on light-weight visual converter


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884760B (en) * 2021-03-17 2023-09-26 东南大学 Intelligent detection method for multi-type diseases of near-water bridge and unmanned ship equipment

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114565900A (en) * 2022-01-18 2022-05-31 广州软件应用技术研究院 Target detection method based on improved YOLOv5 and binocular stereo vision
CN115187921A (en) * 2022-05-13 2022-10-14 华南理工大学 Power transmission channel smoke detection method based on improved YOLOv3
CN114897857A (en) * 2022-05-24 2022-08-12 河北工业大学 Solar cell defect detection method based on light neural network
CN115457395A (en) * 2022-09-22 2022-12-09 南京信息工程大学 Lightweight remote sensing target detection method based on channel attention and multi-scale feature fusion
US11810366B1 (en) * 2022-09-22 2023-11-07 Zhejiang Lab Joint modeling method and apparatus for enhancing local features of pedestrians
CN116188849A (en) * 2023-02-02 2023-05-30 苏州大学 Target identification method and system based on lightweight network and sweeping robot
CN116206185A (en) * 2023-02-27 2023-06-02 山东浪潮科学研究院有限公司 Lightweight small target detection method based on improved YOLOv7
CN116343150A (en) * 2023-03-24 2023-06-27 湖南师范大学 Road sign target detection method based on improved YOLOv7
CN116681962A (en) * 2023-05-05 2023-09-01 江苏宏源电气有限责任公司 Power equipment thermal image detection method and system based on improved YOLOv5
CN116863539A (en) * 2023-07-20 2023-10-10 吴剑飞 Fall figure target detection method based on optimized YOLOv8s network structure

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YOLO-Person: Pedestrian detection in road areas; Wei Runchen et al.; Computer Engineering and Applications; 2020-06-09; 56(19); full text *
A survey of deep learning object detection methods; Zhao Yongqiang et al.; Journal of Image and Graphics; 2020-04-15; Vol. 25, No. 4; full text *

Also Published As

Publication number Publication date
CN117689731A (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN117689731B (en) Lightweight new energy heavy truck battery pack identification method based on an improved YOLOv5 model
CN111860693A (en) Lightweight visual target detection method and system
CN114255238A (en) Three-dimensional point cloud scene segmentation method and system fusing image features
CN112069868A (en) Unmanned aerial vehicle real-time vehicle detection method based on convolutional neural network
CN111563507A (en) Indoor scene semantic segmentation method based on convolutional neural network
CN109919084B (en) Pedestrian re-identification method based on depth multi-index hash
CN113095251B (en) Human body posture estimation method and system
CN115100238A (en) Knowledge distillation-based light single-target tracker training method
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN113870160A (en) Point cloud data processing method based on converter neural network
CN113988164B (en) Lightweight point cloud target detection method for representative point self-attention mechanism
CN116453033A (en) Crowd density estimation method with high precision and low calculation amount in video monitoring scene
CN117576149A (en) Single-target tracking method based on attention mechanism
CN113887536B (en) Multi-stage efficient crowd density estimation method based on high-level semantic guidance
CN114120046B (en) Lightweight engineering structure crack identification method and system based on phantom convolution
Nacir et al. YOLO V5 for traffic sign recognition and detection using transfer learning
CN115937594A (en) Remote sensing image classification method and device based on local and global feature fusion
CN113313030B (en) Human behavior identification method based on motion trend characteristics
CN115424012A (en) Lightweight image semantic segmentation method based on context information
CN115272814B (en) Long-distance space self-adaptive multi-scale small target detection method
CN117553807B (en) Automatic driving navigation method and system based on laser radar
CN117935031B (en) Saliency target detection method integrating mixed attention
CN117557857B (en) Detection network light weight method combining progressive guided distillation and structural reconstruction
CN115841585B (en) Method for carrying out knowledge distillation on point cloud segmentation network
CN113313091B (en) Density estimation method based on multiple attention and topological constraints under warehouse logistics

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant