CN115082688A - Multi-scale feature fusion method based on target detection - Google Patents

Multi-scale feature fusion method based on target detection

Info

Publication number
CN115082688A
Authority
CN
China
Prior art keywords
network
scale
fusion
feature
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210620848.9A
Other languages
Chinese (zh)
Other versions
CN115082688B (en)
Inventor
闫连山 (Yan Lianshan)
董高照 (Dong Gaozhao)
姚涛 (Yao Tao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yantai New Generation Information Technology Research Institute Of Southwest Jiaotong University
Aidian Shandong Technology Co ltd
Original Assignee
Yantai New Generation Information Technology Research Institute Of Southwest Jiaotong University
Aidian Shandong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yantai New Generation Information Technology Research Institute Of Southwest Jiaotong University, Aidian Shandong Technology Co ltd filed Critical Yantai New Generation Information Technology Research Institute Of Southwest Jiaotong University
Priority to CN202210620848.9A priority Critical patent/CN115082688B/en
Publication of CN115082688A publication Critical patent/CN115082688A/en
Application granted granted Critical
Publication of CN115082688B publication Critical patent/CN115082688B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/34Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-scale feature fusion method based on target detection. Computer vision image samples are collected over a network to establish a multi-scale target detection data set, which is divided into a training set and a test set. The YOLOv5 algorithm, as a representative one-stage detector, is responsible for detecting target objects in images. Multi-scale image features are extracted through the multi-stage, multi-level convolution operations of a backbone network. One branch is connected to the neck network in the conventional feature fusion manner, another branch is connected by a shortcut to the neck network layer with the same sampling rate, and the last branch is connected by a shortcut to the prediction structure with the same sampling rate. The three-branch backbone structure is trained by deep learning, and feature maps of different scales in the backbone network are propagated backwards through the three branches, realizing forward and backward propagation of the neural network. The method offers high target detection accuracy, is easily applied to large-scale data sets and various network model structures, and is simple to implement, so it has broad application prospects and great market value.

Description

Multi-scale feature fusion method based on target detection
Technical Field
The invention relates to the field of deep learning, in particular to a multi-scale feature fusion method based on target detection.
Background
Before the various feature fusion networks appeared, most large network structures adopted a one-way, one-dimensional, end-to-end design, such as the earliest AlexNet, VGGNet and ResNet; the early YOLOv1 and YOLOv2 also adopted this structure. It was not until the FPN feature pyramid network structure was published at CVPR 2017 that people gradually realized that, besides continuously stacking backbone structures simply to pursue feature extraction gains, the connection mode, stacking mode and overall topology of a network could also be changed, and the overall structure could be presented in a two-dimensional form; the later YOLOv3 adopted this structure, and the Neck network appeared as an independent component. Later, at CVPR 2018, The Chinese University of Hong Kong together with Tencent YouTu proposed PANet, a path aggregation network improved on the basis of FPN, which adds a one-dimensional Bottom-up Path Augmentation structure to the FPN fusion scheme from the perspective of network output; the main consideration is that the shallow features of the network contain a large number of fine-grained features that play a vital role in pixel-level classification tasks, namely target detection at different scales and instance segmentation. Then the Google Brain team published NAS-FPN, a feature pyramid network based on neural architecture search, at CVPR 2019, applying AutoML (automatic machine learning) to the PAN network structure, i.e., automatically searching for the optimal connection mode and parameters based on the PAN structure through machine learning. However, these three network structures are all constrained to a two-dimensional plane, so they cannot achieve good feature transfer and information fusion with the backbone network when the network model folds back and forth many times; in particular, after multiple foldbacks, the connection between the feature information of the deep network and the backbone network is weakened. In addition, AutoML approaches such as NAS-FPN are extremely demanding in computation: even with good GPUs, the search time can reach several hundred days. Later, at CVPR 2020, the Google Brain team published BiFPN, the bidirectional feature pyramid network structure, which treats each layer module of the FPN network model as a node and introduces a three-dimensional stereo connection mode; the feature transfer and feature fusion of the whole network were improved from a three-dimensional perspective, making the whole network model jump off the original two-dimensional planar (on-paper) connection mode and adding stereo, third-dimension connections.
Currently, the YOLO algorithm only uses the FPN and PAN network structures: YOLOv3 adopts the FPN network structure, while YOLOv4 and YOLOv5, published at about the same time, adopt the PAN network structure; both structures date from CVPR 2018 or earlier. Next, the latest BiFPN connection mode is adopted on YOLOv5, the performance improvement this mode brings to YOLOv5 is analyzed, the remaining defects are improved, and a new network structure, AS-BiFPN, is designed.
Disclosure of Invention
The present invention aims to provide a multi-scale feature fusion method based on target detection to solve the problems proposed in the background art.
In order to achieve the purpose, the invention provides the following technical scheme:
The invention discloses a multi-scale feature fusion method based on target detection. Computer vision image samples are collected over a network to establish a multi-scale target detection data set, which is divided into a training set and a test set. The YOLOv5 (You Only Look Once version 5) algorithm, as a representative one-stage detector, is responsible for detecting target objects in images. Multi-scale image features are extracted through the multi-stage, multi-level convolution operations of a backbone network. One branch is connected to the neck network in the conventional feature fusion manner, another branch is connected by a shortcut to the neck network layer with the same sampling rate, and the last branch is connected by a shortcut to the prediction structure with the same sampling rate. The three-branch backbone structure is trained by deep learning, and feature maps of different scales in the backbone network are propagated backwards through the three branches, realizing forward and backward propagation of the neural network. The method has the characteristics of high target detection accuracy, ease of application to large-scale data sets and to various network model structures, ease of implementation, and better fidelity to feature image information of different scales, and therefore has broad application prospects and great market value.
The multi-scale feature fusion method based on target detection disclosed by the invention is characterized in that it comprises the following steps:
step S1, collecting computer vision image samples through a network to establish a multi-scale target detection data set, and dividing the data set into a training set and a test set;
step S2, extracting the characteristics of the image by using a Backbone network (Backbone) of a YOLOv5 target detection algorithm;
step S3, realizing multi-scale fusion by using a three-branch feature fusion method of a Backbone network (Backbone), a Neck network (Neck) and a Prediction structure (Prediction), repeatedly learning weight parameters on each structural branch through deep learning, and continuously reducing the difference between the target value and the predicted value during training in the deep-learning training manner, i.e., obtaining an optimized network structure under the target domain data set by taking the minimized loss function as the learning criterion, the fusion mode being based on FPN (Feature Pyramid Networks);
step S4, improving FPN to form PAN, and using the accurate localization signals stored in low-level features to enhance the feature pyramid architecture;
and step S5, improving and forming BiFPN on the basis of PAN, and enabling the network to learn the weights of different input features by itself through BiFPN.
Further, the FPN in step S3 is decomposed into three progressive stages, which includes the following steps:
step S31, the Backbone feature generation stage: the task in the deep learning computer vision field is to generate abstract semantic features based on a commonly used, pre-trained Backbone network, and then to adapt the image morphological features extracted by the Backbone to different application scenarios; the features generated by the Backbone network are divided by stage and denoted C_n, where the natural number n equals the stage number and indicates how many stages of downsampling the image morphological features have undergone, i.e., how many times the resolution has been halved; for example, C_2 denotes the feature map output by stage 2, whose resolution is 1/4 of the input picture, and C_5 denotes the feature map output by stage 5, whose resolution is 1/32 of the input picture;
Step S32, feature fusion stage, FPN using the different resolution features generated in step S31 as input, outputting the fused features, the output features being marked with P as number, the input of FPN being
Figure 906911DEST_PATH_IMAGE002
Figure 137035DEST_PATH_IMAGE006
Figure 994133DEST_PATH_IMAGE007
Figure 626103DEST_PATH_IMAGE004
Figure 277664DEST_PATH_IMAGE008
After being fused, the output is
Figure 260663DEST_PATH_IMAGE009
Figure 655873DEST_PATH_IMAGE010
Figure 142349DEST_PATH_IMAGE011
Figure 230390DEST_PATH_IMAGE012
Figure 497424DEST_PATH_IMAGE013
Expressed by a mathematical formula:
Figure 368428DEST_PATH_IMAGE014
and step S33, outputting bounding boxes through the detection head: the fused features output by the FPN are input to the detection head for specific object detection.
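To make the three stages of steps S31 to S33 concrete, the following PyTorch-style sketch shows a minimal FPN-style fusion of C_2 to C_5 into P_2 to P_5. The class name SimpleFPN, the channel widths and the nearest-neighbour upsampling are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Minimal FPN-style fusion: takes backbone features C2..C5 (fine to coarse)
    and returns fused maps P2..P5. Channel widths here are only an assumption."""
    def __init__(self, in_channels=(64, 128, 256, 512), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions align every C_n to a common channel width
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        # 3x3 output convolutions smooth the fused maps
        self.output = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):            # feats = [C2, C3, C4, C5]
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # top-down pass: upsample the coarser map and add it to the finer one
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [conv(p) for conv, p in zip(self.output, laterals)]  # [P2..P5]

if __name__ == "__main__":
    c2 = torch.randn(1, 64, 160, 160)    # 1/4 of a 640x640 input
    c3 = torch.randn(1, 128, 80, 80)
    c4 = torch.randn(1, 256, 40, 40)
    c5 = torch.randn(1, 512, 20, 20)     # 1/32 of the input
    for p in SimpleFPN()([c2, c3, c4, c5]):
        print(p.shape)
```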
Further, the fusion strategies used by the BiFPN in step S5 specifically comprise the following steps:
step S51, the unbounded fusion strategy, whose formula is:

O = Σ_i ( w_i · I_i )

This formula is the first strategy of deep learning feature fusion, where w_i are learnable weight parameters representing the weight proportion of data among the nodes of a single deep learning neural network, and I_i denotes the image morphological feature matrix input to the neural network in the computer vision field, used to input feature information;
in step S52, the Softmax-based fusion strategy, whose formula is:

O = Σ_i ( e^{w_i} / Σ_j e^{w_j} ) · I_i

This formula is the second strategy of deep learning feature fusion, where w_i and w_j are learnable weight parameters representing the weight proportion of data among multiple deep learning neural network nodes, and I_i denotes the image morphological feature matrix input to the neural network in the computer vision field, used to input feature information;
in step S53, the fast normalized fusion strategy, whose formula is:

O = Σ_i ( w_i / (ε + Σ_j w_j) ) · I_i

This formula is the third strategy of deep learning feature fusion, where w_i and w_j are learnable weight parameters representing the weight proportion of data among multiple deep learning neural network nodes, ε is a very small number to ensure that the denominator is not 0, and I_i denotes the image morphological feature matrix input to the neural network in the computer vision field, used to input feature information;
step S54, integrating bidirectional cross-scale connection and fast normalized fusion, taking level 6 as an example:

P6^td = Conv( (w1 · P6^in + w2 · Resize(P7^in)) / (w1 + w2 + ε) )

P6^out = Conv( (w1' · P6^in + w2' · P6^td + w3' · Resize(P5^out)) / (w1' + w2' + w3' + ε) )

In the formula, P^td denotes the intermediate nodes on the top-down and bottom-up edges, P^out denotes the lateral output nodes on those edges, P6^td is the intermediate node at level 6 in the top-down path, and P6^out is the lateral node at level 6 in the bottom-up path; all other feature nodes are constructed in a similar manner.
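A minimal sketch of the three fusion strategies of steps S51 to S53 follows. The function names and the example tensors are illustrative assumptions; keeping the weights non-negative with a ReLU in the fast normalized variant is the usual practice for this strategy and is likewise an assumption here.

```python
import torch

def unbounded_fusion(weights, inputs):
    # step S51: O = sum_i w_i * I_i, with unconstrained learnable scalars w_i
    return sum(w * x for w, x in zip(weights, inputs))

def softmax_fusion(weights, inputs):
    # step S52: O = sum_i (e^{w_i} / sum_j e^{w_j}) * I_i
    norm = torch.softmax(torch.stack(list(weights)), dim=0)
    return sum(w * x for w, x in zip(norm, inputs))

def fast_normalized_fusion(weights, inputs, eps=1e-4):
    # step S53: O = sum_i (w_i / (eps + sum_j w_j)) * I_i
    w = torch.relu(torch.stack(list(weights)))      # keep fusion weights non-negative
    return sum(wi * x for wi, x in zip(w, inputs)) / (eps + w.sum())

if __name__ == "__main__":
    feats = [torch.randn(1, 256, 40, 40) for _ in range(2)]   # two same-scale inputs
    ws = [torch.tensor(0.7), torch.tensor(1.3)]               # learnable parameters in practice
    print(fast_normalized_fusion(ws, feats).shape)
```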
Further, the method also comprises the following steps:
step S61, adding an out-of-plane (off-the-paper) skip connection structure on the lateral path to span the bottom-up path, so that the backbone network and the prediction structure fuse information directly, and improving the network by updating the weight ratios of the different paths during training, so as to enhance the feature information acquisition capability and the information fusion capability of the prediction structure;
step S62, retaining the edge feature layers and the head and tail nodes on the basis of BiFPN, exerting weight parameter influence on each structural fusion path during training, avoiding the weakening of the edge feature fusion structures that occurs in BiFPN, and connecting all feature layers that need to be used across channels in the same way;
step S63, integrating bidirectional cross-scale connection and fast normalization fusion based on steps S61 and S62:
Taking level 6 as an example, and fusing the additional cross-structure shortcut inputs of steps S61 and S62 with the same fast-normalized weighting:

P6^td = Conv( (w1 · P6^in + w2 · Resize(P7^in)) / (w1 + w2 + ε) )

P6^out = Conv( (w1' · P6^in + w2' · P6^td + w3' · Resize(P5^out)) / (w1' + w2' + w3' + ε) )

In the formula, P^td denotes the intermediate nodes on the top-down and bottom-up edges, P^out denotes the lateral output nodes on those edges, P6^td is the intermediate node at level 6 in the top-down path, and P6^out is the lateral node at level 6 in the bottom-up path; all other feature nodes are constructed in a similar manner;
in step S64, to further improve efficiency, the two-dimensional image tensor convolution operations may use depthwise separable convolution for feature fusion, with batch normalization and activation added after each convolution operation. Whether this step is used depends on the application scenario and is unrelated to the structure of the present invention.
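Step S64 mentions depthwise separable convolution followed by batch normalization and activation; a minimal PyTorch sketch of such a block is given below. The SiLU activation is an assumption (YOLOv5 commonly uses it), since the patent does not fix the activation function.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv + 1x1 pointwise conv, each followed by BN and activation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()   # activation choice is an assumption, not fixed by the patent

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

if __name__ == "__main__":
    y = DepthwiseSeparableConv(256, 256)(torch.randn(1, 256, 40, 40))
    print(y.shape)   # torch.Size([1, 256, 40, 40])
```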
Further, step S1 includes obtaining multi-scale target images from channels on the network such as personal multi-scale images and the Kaggle target detection competitions. Following the image target scale division standard given by MS COCO, a small target of the image morphology has an area smaller than 32×32 pixels, a medium target has an area between 32×32 and 96×96 pixels, and a large target has an area larger than 96×96 pixels. When all images are scaled to one size by a Resize function as they are input into the network structure, scale features of different target sizes are formed within images of uniform size, thereby establishing an image data set of multi-scale information.
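Following the MS COCO convention referenced in step S1, a small helper can bucket ground-truth boxes by area while building the multi-scale data set. The 32×32 and 96×96 thresholds come from the standard quoted above; the function names and the 640×640 resize target are only illustrative assumptions.

```python
def scale_bucket(box_w, box_h):
    """Classify a target by area using the MS COCO thresholds (in pixels)."""
    area = box_w * box_h
    if area < 32 ** 2:
        return "small"
    if area <= 96 ** 2:
        return "medium"
    return "large"

def resize_box(box, src_size, dst_size=(640, 640)):
    """Rescale an (x, y, w, h) box when the whole image is resized to a uniform size."""
    sx, sy = dst_size[0] / src_size[0], dst_size[1] / src_size[1]
    x, y, w, h = box
    return (x * sx, y * sy, w * sx, h * sy)

print(scale_bucket(24, 30))                             # small
print(resize_box((100, 50, 200, 120), (1920, 1080)))    # box after resizing to 640x640
```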
Further, in step S2, a one-stage target detection algorithm, YOLOv5 (You Only Look Once version 5), is adopted as the basic research model; the multi-scale feature fusion method is a hot-pluggable modular method that is effectively migrated to and used on different models; for different models, the process described in step S3 is adopted, i.e., the structural models of the target detection algorithm are improved in turn through the evolution FPN → PAN → BiFPN → AS-BiFPN, with each improvement step going deeper than the last.
Compared with the prior art, the invention has the advantages that:
the multi-scale cross-structure node connection mode is utilized to realize effective fusion of multi-scale characteristic information, and deep specialized semantic information loss caused by the increase of the depth of a deep learning network structure is avoided. If learning weight parameters are introduced to the feature fusion structures, the multi-scale feature information fusion can be influenced by different iteration times and learning rates in the training process of the network model, and the weight proportion in the multi-scale feature fusion can be learned in the training process. Since the extracted upstream feature information is used without generating deeper feature information, the network structure has a computational complexity of
Figure 536497DEST_PATH_IMAGE031
I.e. by
Figure 758531DEST_PATH_IMAGE032
And furthermore, the feature fusion capability of the traditional network structure represented by FPN and PAN is enhanced under the condition of not changing the computational complexity of the network structure. The invention has the modular design mode of hot plugging, simple realization mode and easy application to various typesThe large-scale data set is easy to be applied in practice, so that the method has wide application prospect and huge market value.
Drawings
Fig. 1 is a diagram of an improved basic BiFPN network structure of the present invention.
FIG. 2 is a schematic diagram of the AS-BiFPN structure of the present invention.
FIG. 3 is a graph of the mean average precision (mAP) experimental results of the AS-BiFPN structure.
Fig. 4 is a comparison of experimental results of the optimized network model on large-scale target detection.
Fig. 5 is a comparison of experimental results of the optimized network model on small-scale target detection.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
Example 1
The invention relates to a multi-scale feature fusion method based on target detection. Computer vision image samples are collected over a network to establish a multi-scale target detection data set, which is divided into a training set and a test set. The YOLOv5 (You Only Look Once version 5) algorithm, as a representative one-stage detector, is responsible for detecting target objects in images. Multi-scale image features are extracted through the multi-stage, multi-level convolution operations of a backbone network. One branch is connected to the neck network in the conventional feature fusion manner, another branch is connected by a shortcut to the neck network layer with the same sampling rate, and the last branch is connected by a shortcut to the prediction structure with the same sampling rate. The three-branch backbone structure is trained by deep learning, and feature maps of different scales in the backbone network are propagated backwards through the three branches, realizing forward and backward propagation of the neural network. The method has the characteristics of high target detection accuracy, ease of application to large-scale data sets and to various network model structures, ease of implementation, and better fidelity to feature image information of different scales, and therefore has broad application prospects and great market value.
The invention realizes the following steps through a computer device:
step S1, collecting computer vision image samples through a network to establish a multi-scale target detection data set, and dividing the data set into a training set and a test set;
Step S1 includes obtaining multi-scale target images from channels on the network such as personal multi-scale images and the Kaggle target detection competitions. Following the image target scale division standard given by MS COCO, a small target of the image morphology has an area smaller than 32×32 pixels, a medium target has an area between 32×32 and 96×96 pixels, and a large target has an area larger than 96×96 pixels. When all images are scaled to one size by a Resize function as they are input into the network structure, scale features of different target sizes are formed within images of uniform size, thereby establishing an image data set of multi-scale information.
Step S2, extracting the morphological matrix characteristics of the input image by using a Backbone network (Backbone) of a YOLOv5 target detection algorithm;
In step S2, a one-stage target detection algorithm, YOLOv5 (You Only Look Once version 5), is adopted as the basic research model; the multi-scale feature fusion method is a hot-pluggable modular method that can be effectively migrated to and used on different models; for different models, the process described in the following step S3 is adopted, i.e., the structural models of the target detection algorithm are improved in turn through the evolution FPN → PAN → BiFPN → AS-BiFPN, with each improvement step going deeper than the last.
Step S3, realizing multi-scale fusion by using a three-branch feature fusion method of a Backbone network (Backbone), a Neck network (Neck) and a Prediction structure (Prediction), repeatedly learning weight parameters on each structural branch through deep learning, and obtaining an optimized network structure under the target domain data set by taking the minimized loss function as the learning criterion, the fusion mode being based on FPN (Feature Pyramid Networks);
FPN can be decomposed into three progressive stages, which include the following steps:
Step S31, the Backbone feature generation stage: the task in the deep learning computer vision field is to generate abstract semantic features based on a commonly used, pre-trained Backbone network, and then to perform fine-tuning for the specific task; the features generated by the Backbone are generally divided by stage and denoted C_n, where the natural number n equals the stage number and indicates how many stages of downsampling the image morphological features have undergone, i.e., how many times the resolution has been halved; for example, C_2 denotes the feature map output by stage 2, whose resolution is 1/4 of the input picture, and C_5 denotes the feature map output by stage 5, whose resolution is 1/32 of the input picture.
Step S32, the feature fusion stage, which is specific to the FPN: the FPN generally takes the features of different resolutions generated in the previous step S31 as input and outputs the fused features. The output features are generally numbered with P. The input of FPN is C_2, C_3, C_4, C_5, and after fusion the output is P_2, P_3, P_4, P_5, P_6; expressed as a mathematical formula:

(P_2, P_3, P_4, P_5, P_6) = FPN(C_2, C_3, C_4, C_5)
Step S33, the detection head outputs bounding boxes: after the fused features are output by the FPN, they can be input to the detection head for specific object detection.
Step S4, on the basis of FPN, PAN (Path Aggregation Network) is formed as an improvement, creating a bottom-up augmentation path to shorten the information path and using the accurate localization signals stored in low-level features to enhance the feature pyramid architecture.
Step S5, on the basis of PAN, BiFPN is formed as an improvement. BiFPN is a new architecture that, compared with PANet, adds cross-layer links; one major characteristic of BiFPN is weighted feature fusion, i.e., giving different scales different weight values. The traditional method directly stacks features of different scales, whereas BiFPN lets the network learn the weights of the different input features;
BiFPN mainly uses three fusion strategies, specifically comprising the following steps:
step S51, the unbounded fusion strategy, whose formula is:

O = Σ_i ( w_i · I_i )

This formula is the first strategy of deep learning feature fusion, where w_i are learnable weight parameters representing the weight proportion of data among the nodes of a single deep learning neural network, and I_i denotes the image morphological feature matrix input to the neural network in the computer vision field, used to input feature information;
step S52, the Softmax-based fusion strategy, whose formula is:

O = Σ_i ( e^{w_i} / Σ_j e^{w_j} ) · I_i

This formula is the second strategy of deep learning feature fusion, where w_i and w_j are learnable weight parameters representing the weight proportion of data among multiple deep learning neural network nodes, and I_i denotes the image morphological feature matrix input to the neural network in the computer vision field, used to input feature information;
step S53, the fast normalized fusion strategy, whose formula is:

O = Σ_i ( w_i / (ε + Σ_j w_j) ) · I_i

This formula is the third strategy of deep learning feature fusion, where w_i and w_j are learnable weight parameters representing the weight proportion of data among multiple deep learning neural network nodes, ε is a very small number to ensure that the denominator is not 0, and I_i denotes the image morphological feature matrix input to the neural network in the computer vision field, used to input feature information;
Step S54, integrating bidirectional cross-scale connection and fast normalized fusion, as shown in fig. 1, taking level 6 as an example:

P6^td = Conv( (w1 · P6^in + w2 · Resize(P7^in)) / (w1 + w2 + ε) )

P6^out = Conv( (w1' · P6^in + w2' · P6^td + w3' · Resize(P5^out)) / (w1' + w2' + w3' + ε) )

According to the node relationship in the figure, P^td denotes the intermediate nodes on the top-down and bottom-up edges, P^out denotes the lateral output nodes on those edges, P6^td is the intermediate node at level 6 in the top-down path, and P6^out is the lateral node at level 6 in the bottom-up path; all other feature nodes are constructed in a similar manner.
Step S6, based on the design ideas of the two network structures BiFPN and AS-BiFPN, specifically includes the following steps:
step S61, adding an out-of-plane (off-the-paper) skip connection structure on the lateral path to span the bottom-up path, so that the backbone network and the prediction structure fuse information directly, and improving the network by updating the weight ratios of the different paths during training, so as to enhance the feature information acquisition capability and the information fusion capability of the prediction structure;
step S62, retaining the edge feature layers and the head and tail nodes on the basis of BiFPN, exerting weight parameter influence on each structural fusion path during training, avoiding the weakening of the edge feature fusion structures that occurs in BiFPN, and connecting all feature layers that need to be used across channels in the same way;
Step S63, integrating bidirectional cross-scale connection and fast normalized fusion on the basis of steps S61 and S62, as shown in fig. 2; taking level 6 as an example, and fusing the additional cross-structure shortcut inputs of steps S61 and S62 with the same fast-normalized weighting:

P6^td = Conv( (w1 · P6^in + w2 · Resize(P7^in)) / (w1 + w2 + ε) )

P6^out = Conv( (w1' · P6^in + w2' · P6^td + w3' · Resize(P5^out)) / (w1' + w2' + w3' + ε) )

According to the node relationship in the figure, P^td denotes the intermediate nodes on the top-down and bottom-up edges, P^out denotes the lateral output nodes on those edges, P6^td is the intermediate node at level 6 in the top-down path, and P6^out is the lateral node at level 6 in the bottom-up path; all other feature nodes are constructed in a similar manner;
in step S64, to further improve efficiency, the two-dimensional image tensor convolution operations may use depthwise separable convolution for feature fusion, with batch normalization and activation added after each convolution operation. Whether this step is used depends on the application scenario and is unrelated to the structure of the present invention.
In the attached figure 1, the basic BiFPN network structure is improved. The BiFPN network structure makes the front-to-back feature fusion of the model jump off the two-dimensional paper plane, adding cross-structure connections where the network transmission folds back; these connections can effectively transmit and fuse the features of the front and back structures and contribute multi-scale feature information. Since the YOLOv5 network adopts a two-dimensional PAN structure, i.e., multiple feature transfers and foldbacks are formed during the upsampling and downsampling feature processing, a three-dimensional network-structure feature processing method based on BiFPN is applied to the YOLOv5 network model.
Because of the bottom-up path in the YOLOv5 neck network, an information gap is formed between the backbone network, which uses a top-down path, and the prediction structure, and feature transfer and feature fusion do not interact well. Therefore, an out-of-plane skip connection structure is added on the lateral path to span the bottom-up path, so that the backbone network and the prediction structure fuse information directly, and the weight ratios of the different paths are updated during training to improve the network and enhance the feature information acquisition and fusion capabilities of the prediction structure. It should be noted that BiFPN was designed to enhance the feature information processing capability of the EfficientDet network; it therefore emphasizes the feature information of the main feature layers, weakens the edge feature layers, and does not consider the fusion contribution of the head and tail node layers. BiFPN needs to be stacked many times to form the reinforced feature network structure that strengthens EfficientDet, but repeated stacking brings a huge amount of computation, so the contribution of the head and tail nodes and the feature information of the edge feature layers are not considered, and head-and-tail shortcut operations are adopted in BiFPN. EfficientDet calls this practice an efficiency trade-off, whose purpose is to balance computational resource consumption against network performance improvement. The EfficientDet network stacks such modules many times to form the reinforced feature network structure, which brings more model parameters, more GFLOPs and higher model complexity, and therefore the network structure has to be simplified.
Fig. 2 is a schematic structural diagram of this embodiment. The network model does not need to stack the network structure multiple times; the network itself realizes the PAN structure only once. The YOLOv5 network with the BiFPN structure is therefore further improved: based on the characteristics of the YOLOv5 structure and the multi-scale characteristics of the smoke-and-flame detection data set, the edge feature layers and the head and tail nodes are retained on the basis of BiFPN, and weight parameter influence is exerted on each structural fusion path during training. This modification takes into account the edge feature fusion structures omitted by BiFPN and connects all feature layers that need to be used across channels in the same way, which is more beneficial to multi-scale feature fusion. The starting point of this structural improvement is therefore, on the one hand, the need for a more comprehensive range of scales for the detection targets under study and, on the other hand, to make up for the scale coverage that BiFPN loses by selectively skipping some nodes.
In addition, since the YOLOv5 network does not stack the structure multiple times, the network structure shows no particularly large change in model parameters, GFLOPs, model complexity and the like; it is named AS-BiFPN, i.e., the All-Scale Bidirectional Feature Pyramid Network structure.
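To illustrate the cross-structure shortcut of steps S61 and S62, the sketch below fuses, at one prediction level, the backbone feature delivered by the out-of-plane shortcut, the neck feature and the same-level path feature with fast-normalized learnable weights. The module name ASShortcutFusion and the three-input layout are illustrative assumptions, not the literal AS-BiFPN implementation.

```python
import torch
import torch.nn as nn

class ASShortcutFusion(nn.Module):
    """Illustrative cross-structure fusion node: weights one backbone shortcut
    feature, one neck feature and one same-level path feature (all at the same
    sampling rate) with fast-normalized learnable weights."""
    def __init__(self, channels, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(3))        # one learnable weight per input path
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.eps = eps

    def forward(self, backbone_feat, neck_feat, path_feat):
        w = torch.relu(self.w)                      # keep the fusion weights non-negative
        fused = (w[0] * backbone_feat + w[1] * neck_feat + w[2] * path_feat) / (w.sum() + self.eps)
        return self.conv(fused)

if __name__ == "__main__":
    x = [torch.randn(1, 256, 40, 40) for _ in range(3)]
    print(ASShortcutFusion(256)(*x).shape)
```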
The YOLOv5 target detection algorithm was improved with the BiFPN and AS-BiFPN network structures respectively and compared with the original FPN network structure. The experimental results are shown in Table 1, and the mean-average-precision graph of the experiments is shown in fig. 3. From the experimental results, the YOLOv5s using the BiFPN network structure is 0.8% higher in mAP than the original YOLOv5s, rising from 78.4% to 79.2% mAP, with only a slight loss in precision; the number of network layers does not change, the number of network parameters stays at the same level, and the GFLOPs of the model increase by 0.2. In general, the YOLOv5s using the BiFPN network structure obtains a small improvement in capability.
[Table 1: experimental comparison results, presented as an image in the original publication]
Figs. 3, 4 and 5 show the experimental results of the original YOLOv5 and of YOLOv5 with the AS-BiFPN network structure on multi-scale target detection, showing respectively the mean-average-precision graph, the large-scale target detection results and the small-scale target detection results. Combining the experimental graphs, the YOLOv5s network model using the AS-BiFPN all-scale bidirectional feature pyramid network structure improves further on the experimental performance of BiFPN: the mAP of the network using AS-BiFPN is 1% higher than that of the BiFPN network and 1.5% higher than that of the original YOLOv5s network, and the AP values on the smoke and flame targets rise by 1.4% and 0.6% respectively; however, the network inference time increases from 7.6 ms with BiFPN to 8.4 ms, an increase of 0.8 ms. Compared with the original YOLOv5s and the YOLOv5s using BiFPN, the network model using AS-BiFPN has 213 layers overall, which is unchanged; only the parameters and GFLOPs of the AS-BiFPN structure change slightly relative to BiFPN: the parameter count changes but stays at the same level, and the GFLOPs rise by 0.2 compared with BiFPN and by 0.4 compared with the original YOLOv5s. In short, with the optimized AS-BiFPN network structure, the network improves the capability of the original network in multi-scale target detection without changing the network depth, the algorithm modules or the network infrastructure.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment contains only a single independent technical solution; this manner of description is only for clarity, and those skilled in the art should take the description as a whole; the technical solutions in the embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.

Claims (6)

1. A multi-scale feature fusion method based on target detection is characterized in that,
step S1, collecting computer vision image samples through a network to establish a multi-scale target detection data set, and dividing the data set into a training set and a test set;
step S2, extracting the morphological matrix characteristics of the input image by using a Backbone network (Backbone) of the YOLOv5 target detection algorithm;
step S3, realizing multi-scale fusion by using a three-branch feature fusion method of a Backbone network Backbone, a Neck network Neck and a Prediction structure Prediction, repeatedly learning weight parameters on each structure branch through deep learning, continuously reducing the difference between a target value and a predicted value during training according to a deep learning training mode, namely obtaining an optimized network structure under a target domain data set by taking a minimized loss function as a learning criterion, wherein the fusion mode is based on FPN;
step S4, improving and forming PAN on the basis of FPN, and utilizing accurate positioning signals stored in low-level features to promote a feature pyramid framework;
and step S5, improving and forming BiFPN on the basis of PAN, and enabling the network to learn the weights of different input features by itself through BiFPN.
2. The method of claim 1, wherein the FPN decomposition in step S3 is in three progressive stages, and comprises the following steps:
step S31, the Backbone feature generation stage: the task in the deep learning computer vision field is to generate abstract semantic features based on a commonly used, pre-trained Backbone network, and then to adapt the image morphological features extracted by the Backbone to different application scenarios; the features generated by the Backbone network are divided by stage and denoted C_n, where the natural number n equals the stage number and indicates how many stages of downsampling the image morphological features have undergone, i.e., how many times the resolution has been halved; for example, C_2 denotes the feature map output by stage 2, whose resolution is 1/4 of the input picture, and C_5 denotes the feature map output by stage 5, whose resolution is 1/32 of the input picture;
Step S32, feature fusion stage, FPN using the different resolution features generated in step S31 as input, outputting the fused features, the output features being marked with P as number, the input of FPN being
Figure 961529DEST_PATH_IMAGE004
Figure DEST_PATH_IMAGE012
Figure DEST_PATH_IMAGE014
Figure 940986DEST_PATH_IMAGE008
Figure DEST_PATH_IMAGE016
After being fused, the output is
Figure DEST_PATH_IMAGE018
Figure DEST_PATH_IMAGE020
Figure DEST_PATH_IMAGE022
Figure DEST_PATH_IMAGE024
Figure DEST_PATH_IMAGE026
Expressed by a mathematical formula:
Figure DEST_PATH_IMAGE028
and step S33, outputting bounding boxes through the detection head: the fused features output by the FPN are input to the detection head for specific object detection.
3. The method of claim 2, wherein the fusion strategies used by the BiFPN in step S5 specifically comprise the following steps:
step S51, the unbounded fusion strategy, whose formula is:

O = Σ_i ( w_i · I_i )

This formula is the first strategy of deep learning feature fusion, where w_i are learnable weight parameters representing the weight proportion of data among the nodes of a single deep learning neural network, and I_i denotes the image morphological feature matrix input to the neural network in the computer vision field, used to input feature information;
step S52, the Softmax-based fusion strategy, whose formula is:

O = Σ_i ( e^{w_i} / Σ_j e^{w_j} ) · I_i

This formula is the second strategy of deep learning feature fusion, where w_i and w_j are learnable weight parameters representing the weight proportion of data among multiple deep learning neural network nodes, and I_i denotes the image morphological feature matrix input to the neural network in the computer vision field, used to input feature information;
step S53, the fast normalized fusion strategy, whose formula is:

O = Σ_i ( w_i / (ε + Σ_j w_j) ) · I_i

This formula is the third strategy of deep learning feature fusion, where w_i and w_j are learnable weight parameters representing the weight proportion of data among multiple deep learning neural network nodes, ε is a very small number to ensure that the denominator is not 0, and I_i denotes the image morphological feature matrix input to the neural network in the computer vision field, used to input feature information;
step S54, integrating bidirectional cross-scale connection and fast normalized fusion, taking level 6 as an example:

P6^td = Conv( (w1 · P6^in + w2 · Resize(P7^in)) / (w1 + w2 + ε) )

P6^out = Conv( (w1' · P6^in + w2' · P6^td + w3' · Resize(P5^out)) / (w1' + w2' + w3' + ε) )

In the formula, P^td denotes the intermediate nodes on the top-down and bottom-up edges, P^out denotes the lateral output nodes on those edges, P6^td is the intermediate node at level 6 in the top-down path, and P6^out is the lateral node at level 6 in the bottom-up path; all other feature nodes are constructed in a similar manner.
4. The method of claim 3, further comprising the steps of:
step S61, adding an out-of-plane (off-the-paper) skip connection structure on the lateral path to span the bottom-up path, so that the backbone network and the prediction structure fuse information directly, and improving the network by updating the weight ratios of the different paths during training, so as to enhance the feature information acquisition capability and the information fusion capability of the prediction structure;
step S62, retaining the edge feature layers and the head and tail nodes on the basis of BiFPN, exerting weight parameter influence on each structural fusion path during training, avoiding the weakening of the edge feature fusion structures that occurs in BiFPN, and connecting all feature layers that need to be used across channels in the same way;
step S63, integrating bidirectional cross-scale connection and fast normalization fusion based on steps S61 and S62:
Taking level 6 as an example, and fusing the additional cross-structure shortcut inputs of steps S61 and S62 with the same fast-normalized weighting:

P6^td = Conv( (w1 · P6^in + w2 · Resize(P7^in)) / (w1 + w2 + ε) )

P6^out = Conv( (w1' · P6^in + w2' · P6^td + w3' · Resize(P5^out)) / (w1' + w2' + w3' + ε) )

In the formula, P^td denotes the intermediate nodes on the top-down and bottom-up edges, P^out denotes the lateral output nodes on those edges, P6^td is the intermediate node at level 6 in the top-down path, and P6^out is the lateral node at level 6 in the bottom-up path; all other feature nodes are constructed in a similar manner.
5. The multi-scale feature fusion method based on target detection according to claim …, characterized in that step S1 includes obtaining multi-scale target images from channels on the network such as personal multi-scale images and the Kaggle target detection competitions; following the image target scale division standard given by MS COCO, a small target of the image morphology has an area smaller than 32×32 pixels, a medium target has an area between 32×32 and 96×96 pixels, and a large target has an area larger than 96×96 pixels; when all images are scaled to one size by a Resize function as they are input into the network structure, scale features of different target sizes are formed within images of uniform size, thereby establishing an image data set of multi-scale information.
6. The multi-scale feature fusion method based on target detection according to claim 5, characterized in that in step S2 a one-stage target detection algorithm, YOLOv5, is adopted as the basic research model; the multi-scale feature fusion method is a hot-pluggable modular method that is effectively migrated to and used on different models; for different models, the process described in step S3 is adopted, i.e., the structural models of the target detection algorithm are improved in turn through the evolution FPN → PAN → BiFPN → AS-BiFPN, with each improvement step going deeper than the last.
CN202210620848.9A 2022-06-02 2022-06-02 Multi-scale feature fusion method based on target detection Active CN115082688B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210620848.9A CN115082688B (en) 2022-06-02 2022-06-02 Multi-scale feature fusion method based on target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210620848.9A CN115082688B (en) 2022-06-02 2022-06-02 Multi-scale feature fusion method based on target detection

Publications (2)

Publication Number Publication Date
CN115082688A true CN115082688A (en) 2022-09-20
CN115082688B CN115082688B (en) 2024-07-05

Family

ID=83249674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210620848.9A Active CN115082688B (en) 2022-06-02 2022-06-02 Multi-scale feature fusion method based on target detection

Country Status (1)

Country Link
CN (1) CN115082688B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084124A (en) * 2019-03-28 2019-08-02 北京大学 Feature based on feature pyramid network enhances object detection method
CN110766098A (en) * 2019-11-07 2020-02-07 中国石油大学(华东) Traffic scene small target detection method based on improved YOLOv3
US20210224581A1 (en) * 2020-09-25 2021-07-22 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, apparatus, and device for fusing features applied to small target detection, and storage medium
AU2021104243A4 (en) * 2021-07-16 2021-09-09 Ziteng Li Method of Pedestrian detection based on multi-layer feature fusion
CN114078209A (en) * 2021-10-27 2022-02-22 南京航空航天大学 Lightweight target detection method for improving small target detection precision
CN114399633A (en) * 2022-01-19 2022-04-26 北京石油化工学院 Mobile electronic equipment position detection method based on YOLOv5s model


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JUN CHEN ET AL.: "Effective Feature Fusion Network in BiFPN for Small Object Detection", 2021 IEEE International Conference on Image Processing (ICIP), 22 September 2021 (2021-09-22) *
QUAN Yu; LI Zhixin; ZHANG Canlong; MA Huifang: "An object detection model fusing deep dilated networks and lightweight networks", Acta Electronica Sinica, no. 02, 15 February 2020 (2020-02-15) *

Also Published As

Publication number Publication date
CN115082688B (en) 2024-07-05

Similar Documents

Publication Publication Date Title
CN113011499B (en) Hyperspectral remote sensing image classification method based on double-attention machine system
CN101315663B (en) Nature scene image classification method based on area dormant semantic characteristic
CN108764281A (en) A kind of image classification method learning across task depth network based on semi-supervised step certainly
CN111860693A (en) Lightweight visual target detection method and system
CN108256636A (en) A kind of convolutional neural networks algorithm design implementation method based on Heterogeneous Computing
CN103116766A (en) Increment neural network and sub-graph code based image classification method
CN114118012B (en) Personalized font generation method based on CycleGAN
CN111046917B (en) Object-based enhanced target detection method based on deep neural network
CN113486190A (en) Multi-mode knowledge representation method integrating entity image information and entity category information
CN114973011A (en) High-resolution remote sensing image building extraction method based on deep learning
CN114330516A (en) Small sample logo image classification based on multi-graph guided neural network model
WO2023173552A1 (en) Establishment method for target detection model, application method for target detection model, and device, apparatus and medium
CN114693929A (en) Semantic segmentation method for RGB-D bimodal feature fusion
CN113836319B (en) Knowledge completion method and system for fusion entity neighbors
CN112308089A (en) Attention mechanism-based capsule network multi-feature extraction method
CN113902753A (en) Image semantic segmentation method and system based on dual-channel and self-attention mechanism
CN117454971A (en) Projection type knowledge distillation method based on self-adaptive mask weighting
CN115082688A (en) Multi-scale feature fusion method based on target detection
CN115375922B (en) Light-weight significance detection method based on multi-scale spatial attention
CN115049786B (en) Task-oriented point cloud data downsampling method and system
CN116420174A (en) Full scale convolution for convolutional neural networks
Fang et al. Sketch recognition based on attention mechanism and improved residual network
CN113793627B (en) Attention-based multi-scale convolution voice emotion recognition method and device
Huynh et al. Light-weight Sketch Recognition with Knowledge Distillation
Chen et al. FPAN: fine-grained and progressive attention localization network for data retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant