CN116682076A - Multi-scale target detection method, system and equipment for ship safety supervision - Google Patents

Multi-scale target detection method, system and equipment for ship safety supervision

Info

Publication number
CN116682076A
Authority
CN
China
Prior art keywords
target
convolution
scale
module
different scales
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310653227.5A
Other languages
Chinese (zh)
Inventor
蔡东升
黄琦
黄鑫
李坚
井实
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Technology
Original Assignee
Chengdu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Technology
Priority to CN202310653227.5A priority Critical patent/CN116682076A/en
Publication of CN116682076A publication Critical patent/CN116682076A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a multi-scale target detection method, system and equipment for ship safety supervision, and relates to the field of ship supervision. The method comprises the following steps: acquiring a ship image, performing enhancement processing on the ship image, inputting the enhanced ship image into the feature extraction part of an improved YOLOv5s-MS model, and generating a multi-scale feature map, where the improved YOLOv5s-MS model is based on the YOLOv5s model with the 4th layer directly connected to the 23rd layer and the 6th layer directly connected to the 26th layer; inputting the multi-scale feature map into the feature fusion part of the improved YOLOv5s-MS model to generate a fused feature map; and inputting the fused feature map into the positioning and classifying part of the improved YOLOv5s-MS model for positioning and classifying processing, generating targets with different scales and the detection heads, prediction heads, classification heads and regression heads for the targets with different scales. The invention detects small or very small targets more accurately and improves the detection precision of small or very small targets.

Description

Multi-scale target detection method, system and equipment for ship safety supervision
Technical Field
The invention relates to the field of ship supervision, in particular to a multi-scale target detection method, system and equipment for ship safety supervision.
Background
In the practical application scenario of crew and target detection in the complex environment of a large tanker, the collected images are often extremely complex: multiple targets may be present and may be crowded into a small area. Because the targets differ in size and shape and may occlude one another, the target detection algorithm faces multi-scale detection targets. As a result, when many commonly used target detection algorithms are applied to target detection tasks in such complex environments, small or very small targets are often missed or falsely detected, resulting in low detection accuracy.
Disclosure of Invention
The invention aims to provide a multi-scale target detection method, system and equipment for ship safety supervision, so as to solve the problem of low detection precision caused by missed or false detection of small or very small targets when common target detection algorithms are applied to target detection tasks in complex environments.
In order to achieve the above object, the present invention provides the following solutions:
A multi-scale target detection method for ship safety supervision comprises the following steps:
acquiring a ship image, and performing enhancement processing on the ship image to generate an enhanced ship image;
inputting the enhanced ship image to the feature extraction part of an improved YOLOv5s-MS model, extracting features of the enhanced ship image, and generating a multi-scale feature map; the feature extraction part comprises a convolution module, a convolutional neural network module and a pyramid pooling module; the improved YOLOv5s-MS model is based on the YOLOv5s model, with the 4th layer directly connected to the 23rd layer and the 6th layer directly connected to the 26th layer;
inputting the multi-scale feature map to a feature fusion part of the improved YOLOv5s-MS model, and performing fusion processing on feature information of different scales and different layers in the multi-scale feature map by using a multi-scale fusion and horizontal cross-layer connection technology and adopting a bilinear interpolation up-sampling method to generate a fused feature map;
inputting the fused feature map to the positioning and classifying part of the improved YOLOv5s-MS model, and performing positioning and classifying processing on the fused feature map to generate targets with different scales and the detection heads, prediction heads, classification heads and regression heads for the targets with different scales; the targets with different scales comprise a large target, a medium target, a small target and a very small target; the detection head is used for detecting targets with different scales; the prediction head is used for detecting the positions of targets with different scales; the classification head is used for predicting the categories of targets with different scales; the regression head is used for adjusting the positions and sizes of the target frames with different scales output by the prediction head.
Optionally, the inputting the enhanced ship image to the feature extraction part in the improved YOLOv5s-MS model, extracting features of the enhanced ship image, and generating a multi-scale feature map specifically includes:
dividing the enhanced ship image into two parts of images by utilizing the convolutional neural network module, and respectively carrying out convolutional operation on the two parts of images;
merging the images after the convolution operation to generate feature images with different scales;
and carrying out pooling treatment on the feature graphs with different scales by using a pyramid pooling module to acquire multi-scale feature information and generate a multi-scale feature graph.
Optionally, the convolutional neural network module specifically includes: the system comprises a first convolution computing unit, a bottleneck layer unit, a splicing operation unit and a second convolution computing unit which are connected in sequence; the splicing operation unit is used for splicing the characteristic graph subjected to convolution calculation by the first convolution calculation unit and the characteristic graph subjected to processing by the bottleneck layer unit.
Optionally, the bottleneck layer unit specifically includes: two bottleneck convolution units connected in sequence and an addition operation unit; the adding operation unit is used for adding the channel dimension of the feature map of the input bottleneck layer and the channel dimension of the feature map of the output bottleneck layer.
Optionally, the pyramid pooling module specifically includes: the system comprises a first convolution module, three maximum pooling operation units, a splicing operation unit and a second convolution module which are connected in sequence; the splicing operation is used for splicing the feature graphs subjected to the three maximum pooling operations after being processed by the three maximum pooling operation units.
Optionally, the feature fusion portion specifically includes: three convolution and up-sampling modules and three target output modules which are connected in sequence; the convolution and up-sampling module comprises a convolution module, an up-sampling operation unit, a splicing unit and a bottleneck layer unit which are connected in sequence; the target output module comprises a convolution module, a splicing operation unit and a bottleneck layer unit which are connected in sequence; the convolution module in the first convolution and up-sampling module is connected with the splicing unit in the third target output module; the convolution module in the second convolution and up-sampling module is connected with the splicing unit in the second target output module; and the bottleneck layer unit in the second convolution and up-sampling module is connected with the convolution module in the first target output module.
Optionally, when detecting the very small target, the detection head adopts a 4-time downsampling mode for detection;
when the small target is detected, the detection head adopts an 8-time downsampling mode for detection;
when the middle target is detected, the detection head adopts a 16-time downsampling mode for detection;
when the large target is detected, the detection head adopts a 32-time downsampling mode for detection.
A multi-scale target detection system for ship safety supervision, comprising:
acquiring a ship image, and performing enhancement processing on the ship image to generate an enhanced ship image;
inputting the enhanced ship image to a feature extraction part in an improved YOLOv5s-MS model, extracting features of the enhanced ship image, and generating a multi-scale feature map; the feature extraction part comprises a convolution module, a convolution neural network module and a pyramid pooling module; the improved YOLOv5s-MS model is based on the YOLOv5s model, and is characterized in that the 4 th layer is directly connected with the 23 rd layer, and the 6 th layer is directly connected with the 26 th layer;
inputting the multi-scale feature map to a feature fusion part of the improved YOLOv5s-MS model, and performing fusion processing on feature information of different scales and different layers in the multi-scale feature map by using a multi-scale fusion and horizontal cross-layer connection technology and adopting a bilinear interpolation up-sampling method to generate a fused feature map;
Inputting the fused feature map to the positioning and classifying part of the improved YOLOv5s-MS model, and performing positioning and classifying processing on the fused feature map to generate targets with different scales and the detection heads, prediction heads, classification heads and regression heads for the targets with different scales; the targets with different scales comprise a large target, a medium target, a small target and a very small target; the detection head is used for detecting targets with different scales; the prediction head is used for detecting the positions of targets with different scales; the classification head is used for predicting the categories of targets with different scales; the regression head is used for adjusting the positions and sizes of the target frames with different scales output by the prediction head.
An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the multi-scale target detection method for ship-oriented safety supervision of any one of the above.
A computer readable storage medium storing a computer program which when executed by a processor implements the multi-scale object detection method for vessel safety supervision according to any one of the preceding claims.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects: the invention provides a multi-scale target detection method, system and equipment for ship safety supervision, in which, on the basis of the YOLOv5s model, the 4th layer is directly connected with the 23rd layer and the 6th layer is directly connected with the 26th layer to construct the improved YOLOv5s-MS model, so that more feature information is fused and the problem of information loss is alleviated; the improved YOLOv5s-MS model is therefore better adapted to targets with different scales and can detect small or very small targets more accurately. In addition, the invention adds a smaller-scale detection head for targets with different scales, so that the features of small or very small targets are handled better, further improving the detection precision of small or very small targets.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a multi-scale target detection method for ship safety supervision provided by the invention;
FIG. 2 is a diagram of the improved YOLOv5s-MS model structure provided by the present invention;
FIG. 3 is a schematic structural diagram of a feature extraction part provided by the present invention;
FIG. 4 is a detailed view of a CBL module provided by the present invention;
FIG. 5 is a detailed view of the BottleneckGSCSP provided by the invention;
FIG. 6 is a detail view of BottleneckGS provided by the present invention;
FIG. 7 is a detail view of GSConv provided by the invention;
FIG. 8 is a detail view of SPPF modules provided by the present invention;
FIG. 9 is a schematic structural view of a feature fusion part provided by the present invention;
FIG. 10 is a schematic diagram of the positioning and classifying part provided by the present invention;
FIG. 11 is a simplified node diagram of YOLOv5s provided by the present invention;
fig. 12 is a schematic diagram of bilinear interpolation provided by the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a multi-scale target detection method, a system and equipment for ship safety supervision, which can more accurately detect small targets or very small targets and improve the detection precision of the small targets or the very small targets.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1
As shown in fig. 1, the invention provides a multi-scale target detection method for ship safety supervision, which comprises the following steps:
step 101: and acquiring a ship image, and performing enhancement processing on the ship image to generate an enhanced ship image.
Step 102: inputting the enhanced ship image to a feature extraction part in an improved YOLOv5s-MS model, extracting features of the enhanced ship image, and generating a multi-scale feature map; the feature extraction part comprises a convolution module, a convolution neural network module and a pyramid pooling module; the improved YOLOv5s-MS model is based on the YOLOv5s model, wherein the 4 th layer is directly connected with the 23 rd layer, and the 6 th layer is directly connected with the 26 th layer.
Although YOLOv5s has achieved good results for multi-scale target detection and small target detection, there are still some problems.
For multi-scale object detection, YOLOv5s employs a Feature Pyramid Network (FPN) and a Path Aggregation Network (PANet) to process feature maps of different scales, but this approach may result in reduced detection accuracy or increased computational cost. In addition, multi-scale detection must cope with variations in the size and shape of the target at different scales, which creates difficulties for the detector.
For small target detection, YOLOv5s adopts multi-scale feature fusion to increase the detection precision of small targets, but in practical applications small targets are still missed or falsely detected. Since small objects occupy only a few pixels, their features may be lost in the convolution layers, and small objects are often occluded by large objects in the image, which increases the detection difficulty.
In order to solve these problems, the invention adds a smaller-scale feature detection head, so that the detection heads can detect targets at four scales, namely very small, small, medium and large targets. By adding the smaller-scale feature detection head, the features of small targets are processed better and the detection precision is improved. Meanwhile, by increasing the number and scales of the detection heads, variations in target size and shape across scales are handled better, improving the robustness of the detector.
In the feature pyramid structure of YOLOv5s, the feature map is downsampled layer by layer and its scale gradually shrinks, so the information describing small targets in the high-level feature maps becomes weak and small targets are difficult to detect accurately.
The improved YOLOv5s-MS model is shown in fig. 2, where Conv[6,2] denotes a convolution with a 6×6 kernel and stride 2, Conv[3,2] a convolution with a 3×3 kernel and stride 2, and Conv[1,1] a convolution with a 1×1 kernel and stride 1; GSCSP1_1 denotes the module repeated once in the feature extraction stage (Backbone), GSCSP1_2 the module repeated twice in the Backbone, GSCSP1_3 the module repeated once in the feature fusion stage (Neck), GSCSP2_2 the module repeated twice in the Neck, and Upsampling denotes the bilinear interpolation up-sampling operation. This structure allows the model to adapt better to targets of different scales and to detect small targets better.
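For illustration only, the following PyTorch-style sketch shows how such a direct (cross-layer) connection amounts to concatenating a backbone feature map with the neck feature map arriving at the corresponding layer; the tensor shapes, channel counts and layer indices are assumptions for demonstration, not values taken from this disclosure.

```python
import torch

# Hedged sketch of the cross-layer connections "layer 4 -> layer 23" and
# "layer 6 -> layer 26": each one simply splices a backbone feature map with the
# neck feature map of the same spatial size along the channel dimension.
b4  = torch.randn(1, 128, 104, 104)   # assumed backbone layer-4 feature (high resolution)
n23 = torch.randn(1, 128, 104, 104)   # assumed neck feature arriving at layer 23
fused_23 = torch.cat([n23, b4], dim=1)   # direct connection: layer 4 -> layer 23

b6  = torch.randn(1, 256, 52, 52)     # assumed backbone layer-6 feature
n26 = torch.randn(1, 256, 52, 52)     # assumed neck feature arriving at layer 26
fused_26 = torch.cat([n26, b6], dim=1)   # direct connection: layer 6 -> layer 26

print(fused_23.shape, fused_26.shape)  # torch.Size([1, 256, 104, 104]) torch.Size([1, 512, 52, 52])
```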
As shown in fig. 3, the input image of the feature extraction part has size H×W×3; CBL is a convolution module, the "1" in BottleneckGSCSP1_X denotes the feature extraction stage, and X denotes the number of repetitions of the module.
The feature extraction part combines the SPPF module with the BottleneckGSCSP module to extract the features of the input image. BottleneckGSCSP is a convolutional neural network module that effectively improves the performance of the model while reducing the number of model parameters; by dividing the input feature map into two parts, performing convolution operations on the two parts respectively, and finally combining them, efficient feature extraction is achieved. The SPPF module is a pyramid pooling module that pools feature maps at different scales to obtain multi-scale feature information; it improves the detection capability of the model for targets of different sizes while reducing the computation of the model.
In practical applications, the step 102 specifically includes: dividing the enhanced ship image into two parts of images by utilizing the convolutional neural network module, and respectively carrying out convolutional operation on the two parts of images; merging the images after the convolution operation to generate feature images with different scales; and carrying out pooling treatment on the feature graphs with different scales by using a pyramid pooling module to acquire multi-scale feature information and generate a multi-scale feature graph.
As shown in FIG. 4, the input is a feature map with C1 channels and the output is a feature map with C2 channels; Conv is a convolution calculation unit, BN is a batch normalization unit, and LeakyReLU is the activation function. The convolution module specifically comprises a convolution calculation unit, a batch normalization unit and an activation function connected in sequence.
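A minimal PyTorch sketch of such a Conv-BN-LeakyReLU module is given below; the kernel size, stride and LeakyReLU slope are illustrative assumptions, not values stated in this disclosure.

```python
import torch.nn as nn

# Hedged sketch of the CBL convolution module of FIG. 4 (Conv + BN + LeakyReLU).
class CBL(nn.Module):
    def __init__(self, c1, c2, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, padding=k // 2, bias=False)  # convolution calculation unit
        self.bn = nn.BatchNorm2d(c2)                                     # batch normalization unit
        self.act = nn.LeakyReLU(0.1, inplace=True)                       # activation function

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```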
As shown in FIG. 5, the input is a feature map with C1 channels and the output is a feature map with C2 channels; Conv is a convolution calculation unit and BottleneckGS is a bottleneck layer unit. The convolutional neural network module specifically comprises a first convolution calculation unit, a bottleneck layer unit, a splicing operation unit and a second convolution calculation unit connected in sequence; the splicing operation unit splices the feature map produced by the first convolution calculation unit with the feature map processed by the bottleneck layer unit.
As shown in FIG. 6, the input is a feature map with C1 channels and the output is a feature map with C2 channels; Add denotes the addition operation in the feature map channel dimension, and C2/2 indicates that the number of channels of the intermediate feature map is half the number of output channels. The bottleneck layer unit specifically comprises two bottleneck convolution units connected in sequence and an addition operation unit; the addition operation unit adds the feature map input to the bottleneck layer and the feature map output by the bottleneck layer along the channel dimension. The bottleneck convolution unit is shown in FIG. 7; its input is a feature map with C1 channels and its output is a feature map with C2 channels; DWConv is a depthwise separable convolution operation, and Shuffle randomly shuffles the feature vectors along the feature map channel dimension.
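The following PyTorch sketch illustrates one plausible reading of the GSConv and BottleneckGS structure described above; kernel sizes, the depthwise-convolution form and the two-group channel shuffle are assumptions, not details taken from the figures.

```python
import torch
import torch.nn as nn

# Hedged sketch: GSConv = standard convolution + depthwise convolution + concatenation
# + channel shuffle; BottleneckGS = two GSConv units in series plus a shortcut.
class GSConv(nn.Module):
    def __init__(self, c1, c2, k=3, s=1):
        super().__init__()
        c_ = c2 // 2
        self.cv1 = nn.Sequential(nn.Conv2d(c1, c_, k, s, k // 2, bias=False),
                                 nn.BatchNorm2d(c_), nn.LeakyReLU(0.1, True))
        # depthwise (separable) convolution applied to the first branch's output
        self.cv2 = nn.Sequential(nn.Conv2d(c_, c_, 5, 1, 2, groups=c_, bias=False),
                                 nn.BatchNorm2d(c_), nn.LeakyReLU(0.1, True))

    def forward(self, x):
        y1 = self.cv1(x)
        y2 = self.cv2(y1)
        y = torch.cat((y1, y2), dim=1)              # concatenate to c2 channels
        b, c, h, w = y.shape                        # shuffle the channel dimension (2 groups)
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)

class BottleneckGS(nn.Module):
    def __init__(self, c1, c2):
        super().__init__()
        self.g1 = GSConv(c1, c2 // 2)               # first GSConv outputs half the channels
        self.g2 = GSConv(c2 // 2, c2)
        self.add = (c1 == c2)                       # shortcut only when shapes match

    def forward(self, x):
        y = self.g2(self.g1(x))
        return x + y if self.add else y

blk = BottleneckGS(64, 64)
print(blk(torch.randn(1, 64, 52, 52)).shape)        # torch.Size([1, 64, 52, 52])
```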
In practical application, as shown in FIG. 8, the input is a feature map with C1 channels and the output is a feature map with C2 channels; CBL is a convolution module, Maxpool is a max-pooling operation, and Concat is a splicing operation in the feature map channel dimension. The pyramid pooling module specifically comprises a first convolution module, three max-pooling operation units, a splicing operation unit and a second convolution module connected in sequence; the splicing operation unit splices the feature maps produced by the three max-pooling operation units.
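A minimal PyTorch sketch of such an SPPF module is shown below; the 5×5 pooling kernel, the halved intermediate width and the sequential application of the three poolings (as in the standard SPPF design) are assumptions.

```python
import torch
import torch.nn as nn

# Hedged sketch of the SPPF pyramid pooling module of FIG. 8: a first convolution,
# three max-pooling operations, concatenation, then a second convolution.
def cbl(c1, c2, k=1, s=1):
    return nn.Sequential(nn.Conv2d(c1, c2, k, s, k // 2, bias=False),
                         nn.BatchNorm2d(c2), nn.LeakyReLU(0.1, True))

class SPPF(nn.Module):
    def __init__(self, c1, c2, k=5):
        super().__init__()
        c_ = c1 // 2
        self.cv1 = cbl(c1, c_)                               # first convolution module
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)
        self.cv2 = cbl(c_ * 4, c2)                           # second convolution module

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)                                   # three max-pooling operations
        return self.cv2(torch.cat((x, y1, y2, y3), dim=1))   # splice, then convolve

m = SPPF(256, 256)
print(m(torch.randn(1, 256, 13, 13)).shape)                  # torch.Size([1, 256, 13, 13])
```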
Step 103: inputting the multi-scale feature map to a feature fusion part of the improved YOLOv5s-MS model, and performing fusion processing on feature information of different scales and different layers in the multi-scale feature map by using a multi-scale fusion and horizontal cross-layer connection technology and adopting a bilinear interpolation up-sampling method to generate a fused feature map.
In practical application, as shown in fig. 9, the input is the feature map produced by the feature extraction part and the outputs are the detection heads for large, medium, small and very small targets. The feature fusion part specifically includes: three convolution and up-sampling modules and three target output modules connected in sequence; each convolution and up-sampling module comprises a convolution module, an up-sampling operation unit, a splicing unit and a bottleneck layer unit connected in sequence; each target output module comprises a convolution module, a splicing operation unit and a bottleneck layer unit connected in sequence; the convolution module in the first convolution and up-sampling module is connected with the splicing unit in the third target output module; the convolution module in the second convolution and up-sampling module is connected with the splicing unit in the second target output module; and the bottleneck layer unit in the second convolution and up-sampling module is connected with the convolution module in the first target output module.
The feature fusion part uses multi-scale fusion and horizontal cross-layer connection technology, and can process feature information of different scales and different layers at the same time, so that the performance and efficiency of the model are improved. And the feature images of different layers are fused by using multi-scale fusion, so that the detection capability of the model on targets of different scales is improved. Up-sampling and down-sampling operations are used in the model to up-sample the low resolution feature map to the same scale as the high resolution feature map, and then splice them together to obtain a multi-scale feature map. The fusion method can improve the performance of the model without increasing the calculation amount. By using cross-layer connection, the information of the low-layer feature map can be transferred to the high-layer feature map, so that the detection capability of the model on small targets is improved. The cross-layer connection technology can improve the detection capability of the model to targets with different scales and reduce the calculation amount of the model.
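For illustration, the sketch below shows one fusion step of this kind: a low-resolution feature map is up-sampled by bilinear interpolation and spliced with a higher-resolution feature map arriving through a cross-layer connection. The tensor shapes are assumptions chosen only to make the example concrete.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of one up-sample-and-splice step of the feature fusion part.
low  = torch.randn(1, 256, 26, 26)    # low-resolution, semantically strong neck feature
high = torch.randn(1, 128, 52, 52)    # higher-resolution feature from a cross-layer connection
up   = F.interpolate(low, size=high.shape[-2:], mode="bilinear", align_corners=False)
fused = torch.cat([up, high], dim=1)  # multi-scale feature map fed to the next bottleneck unit
print(fused.shape)                    # torch.Size([1, 384, 52, 52])
```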
Step 104: inputting the fused feature map to the positioning and classifying part of the improved YOLOv5s-MS model, and performing positioning and classifying processing on the fused feature map to generate targets with different scales and the detection heads, prediction heads, classification heads and regression heads for the targets with different scales; the targets with different scales comprise a large target, a medium target, a small target and a very small target; the detection head is used for detecting targets with different scales; the prediction head is used for detecting the positions of targets with different scales; the classification head is used for predicting the categories of targets with different scales; the regression head is used for adjusting the positions and sizes of the target frames with different scales output by the prediction head.
As shown in fig. 10, assuming that the input image size is 416×416, the detection heads use 4-, 8-, 16- and 32-fold downsampling, and the output feature map sizes are 104×104, 52×52, 26×26 and 13×13 respectively, detecting the very small, small, medium and large targets in the image. When detecting a very small target, the detection head uses 4-fold downsampling; when detecting a small target, it uses 8-fold downsampling; when detecting a medium target, it uses 16-fold downsampling; and when detecting a large target, it uses 32-fold downsampling.
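The quoted feature-map sizes follow directly from the downsampling factors; a quick check:

```python
# Feature-map sizes for a 416x416 input at the four detection-head strides.
input_size = 416
for stride, target in zip((4, 8, 16, 32), ("very small", "small", "medium", "large")):
    side = input_size // stride
    print(f"stride {stride:2d}: {side:3d} x {side:<3d} -> {target} targets")
```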
The target positioning and classifying part comprises a prediction head, a classification head and a regression head; through the combination of these heads, the model completes the positioning and classification of targets simultaneously. The prediction head is used to detect the position of the target: the input feature map is mapped by a convolution operation onto output feature maps whose number equals the number of target frames, and the centre coordinates, width and height of each target frame and the confidence that a target is present are predicted. These predictions are used to locate and screen target frames. The classification head is used to predict the class of the target: it takes the feature map output by the prediction head as input and maps it by a convolution operation onto output feature maps whose number equals the number of target categories; the feature vector of each target frame is then used to predict the class to which the target belongs. The regression head is used to fine-tune the target frames output by the prediction head: it takes the feature map output by the prediction head as input and maps it by a convolution operation onto an output feature map of the same size; the feature vector of each target frame is then used to fine-tune the position and size of the target frame.
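The sketch below illustrates a generic YOLO-style output head of this kind at one scale; the anchor count, class count and channel width are assumptions for demonstration, not values from this disclosure.

```python
import torch
import torch.nn as nn

# Hedged sketch: a 1x1 convolution maps the fused feature map to, per anchor,
# 4 box-regression values, 1 objectness confidence and num_classes class scores.
num_anchors, num_classes = 3, 6
head = nn.Conv2d(128, num_anchors * (5 + num_classes), kernel_size=1)

feat = torch.randn(1, 128, 104, 104)          # fused feature map at the 4x-downsampled scale
pred = head(feat).view(1, num_anchors, 5 + num_classes, 104, 104)
box, obj, cls = pred[:, :, :4], pred[:, :, 4:5], pred[:, :, 5:]  # regression / confidence / classification
print(box.shape, obj.shape, cls.shape)
```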
The invention merges the repeated substructures of the model structure of fig. 2 into single nodes; the simplified node diagram is shown in fig. 11. In fig. 11, Bi denotes the i-th level feature extraction layer of the Backbone, Pi and Ni denote the bottom-up and top-down feature pyramid feature extraction layers respectively, and Hi denotes the feature maps output at different depths, i = 1, 2, 3 or 4.
According to the invention, the Bottleneck module structure of the Neck part of YOLOv5s is redesigned, and GSConv is used to replace some of the conventional convolution blocks, which keeps the model lightweight while better preserving its feature information. The redesigned bottleneck layer unit is shown in figs. 5-6. The modified BottleneckGS consists of two GSConv units in series and one shortcut connection, where the number of output channels of the first GSConv is half the number of output channels of the BottleneckGS. The BottleneckCSP module of the Neck part of YOLOv5s is then redesigned with BottleneckGS, and the new module is named BottleneckGSCSP: BottleneckGS replaces Bottleneck, the input feature map first passes through a conventional convolution block and is then divided into two branches, the first branch passes through several BottleneckGS modules and is then spliced with the second branch, and finally the feature map is output through a conventional convolution block.
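A hedged PyTorch sketch of such a BottleneckGSCSP block is given below. It builds on the GSConv/BottleneckGS sketch shown earlier, and the division "into two branches" is read here as an equal channel split, which is one plausible interpretation of the description rather than a detail confirmed by the figures.

```python
import torch
import torch.nn as nn

# Hedged sketch of BottleneckGSCSP: conventional convolution, split into two branches,
# n BottleneckGS modules on the first branch, splice, then a final convolution.
# Assumes the BottleneckGS class sketched earlier in this document.
class BottleneckGSCSP(nn.Module):
    def __init__(self, c1, c2, n=1):
        super().__init__()
        c_ = c2 // 2
        self.cv_in = nn.Sequential(nn.Conv2d(c1, c2, 1, bias=False),
                                   nn.BatchNorm2d(c2), nn.LeakyReLU(0.1, True))
        self.branch = nn.Sequential(*[BottleneckGS(c_, c_) for _ in range(n)])
        self.cv_out = nn.Sequential(nn.Conv2d(c2, c2, 1, bias=False),
                                    nn.BatchNorm2d(c2), nn.LeakyReLU(0.1, True))

    def forward(self, x):
        x = self.cv_in(x)
        a, b = x.chunk(2, dim=1)                      # split into two branches (assumed channel split)
        a = self.branch(a)                            # BottleneckGS modules on the first branch
        return self.cv_out(torch.cat((a, b), dim=1))  # splice, then convolve

m = BottleneckGSCSP(128, 128, n=1)
print(m(torch.randn(1, 128, 52, 52)).shape)           # torch.Size([1, 128, 52, 52])
```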
The Neck part of YOLOv5 uses an FPN to perform feature pyramid processing and improves the resolution of the feature maps through upsampling, so as to better capture objects of different scales and the features in the feature maps, thereby improving target detection accuracy. YOLOv5 uses nearest neighbor sampling for this upsampling, which is simple, fast and cheap to compute and works well for densely distributed targets on the feature map. However, nearest neighbor sampling tends to produce jagged edges, because it simply sets the value of the target pixel to the value of its nearest neighbor rather than taking a weighted average of the surrounding pixels; it is also prone to introducing noise, because a strong change in an input pixel produces an equally abrupt change in the output pixel values. It is therefore proposed herein to use bilinear interpolation sampling instead of nearest neighbor sampling: although bilinear interpolation is slightly slower to compute, it performs better at preserving and processing the details of the image context. The basic idea of bilinear interpolation is to find the four nearest neighboring pixels around the target pixel in the original image and then calculate the gray value of the target pixel from the gray values of these four pixels. Compared with nearest neighbor sampling, bilinear interpolation better preserves the details and textures of the image, improving the quality of the feature map and hence the accuracy of target detection.
Bilinear interpolation is a common method of image upsampling that can enhance low resolution images to high resolution. In image upsampling, since the number of pixels in a low-resolution image is smaller than that in a high-resolution image, an interpolation method is required to estimate the missing pixel values.
The bilinear interpolation method is based on two assumptions: 1) The variation between pixels in the image is continuous; 2) The value of one pixel can be estimated by the values of 4 pixels around it.
Specifically, for each pixel on the image to be enlarged, the bilinear interpolation method finds its nearest four pixels in the low resolution image, and then uses the values of these four pixels to estimate the value of the target pixel. The method has simple calculation and good effect, can enhance details while keeping the image smooth, and has the function of improving the resolution and definition of the image by increasing the number of pixels in image processing.
Assume a low-resolution image whose pixel values are stored in a matrix of size M×N, and that the image is to be up-sampled by a factor p to a high-resolution image of size pM×pN; the value of each pixel in the new image must then be calculated.
For a pixel in the new image, its corresponding position in the low-resolution image is found first. For convenience, assume this position is (x, y), where x and y are real numbers. Next, the 4 pixels closest to this position in the low-resolution image are found, denoted Q11, Q12, Q21 and Q22 respectively. The pixel value in the new image is then calculated as a weighted average of these 4 pixels.
As shown in fig. 12, let the target pixel point be P(x, y) and let f(·) denote the pixel value at a point, with the four surrounding pixel points Q11(x1, y1), Q12(x1, y2), Q21(x2, y1) and Q22(x2, y2) known. Interpolating first in the x direction at y1 and y2 gives

f(R1) ≈ ((x2 - x)/(x2 - x1))·f(Q11) + ((x - x1)/(x2 - x1))·f(Q21), with R1 = (x, y1)   (1)

f(R2) ≈ ((x2 - x)/(x2 - x1))·f(Q12) + ((x - x1)/(x2 - x1))·f(Q22), with R2 = (x, y2)   (2)

and interpolating then in the y direction gives

f(P) ≈ ((y2 - y)/(y2 - y1))·f(R1) + ((y - y1)/(y2 - y1))·f(R2)   (3)

Bringing formula (1) and formula (2) into formula (3) gives:

f(P) ≈ [f(Q11)(x2 - x)(y2 - y) + f(Q21)(x - x1)(y2 - y) + f(Q12)(x2 - x)(y - y1) + f(Q22)(x - x1)(y - y1)] / [(x2 - x1)(y2 - y1)]   (4)
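As a small numerical check of formula (4), the following sketch evaluates it by hand for a 2×2 cell and compares it with PyTorch's bilinear up-sampling (align_corners=True so the sampling grid coincides with the corner pixels); the pixel values are illustrative only.

```python
import torch
import torch.nn.functional as F

def bilinear_point(q11, q21, q12, q22, x, y, x1=0.0, x2=1.0, y1=0.0, y2=1.0):
    """Evaluate f(P) per formula (4) from the four corner values."""
    return (q11 * (x2 - x) * (y2 - y) + q21 * (x - x1) * (y2 - y) +
            q12 * (x2 - x) * (y - y1) + q22 * (x - x1) * (y - y1)) / ((x2 - x1) * (y2 - y1))

# Manual evaluation at the cell centre.
print(bilinear_point(q11=10.0, q21=20.0, q12=30.0, q22=40.0, x=0.5, y=0.5))  # 25.0

# The same value produced by bilinear up-sampling of the 2x2 image to 3x3.
img = torch.tensor([[10.0, 20.0], [30.0, 40.0]]).view(1, 1, 2, 2)
up = F.interpolate(img, size=(3, 3), mode="bilinear", align_corners=True)
print(up[0, 0, 1, 1].item())                                                  # 25.0
```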
because the target detection task needs to regress on the whole image, some situations of small target, shielding or illumination and the like may exist in the image, and the image in the situations is often difficult to detect. At the same time, there may be some targets in the dataset that are less common in categories or that appear less in specific scenarios, which are also difficult to detect correctly. Such imbalance can lead to insufficient detection capability of the model for small targets, and is prone to missed detection. In order to solve the problem of sample unbalance, the detection capability of the model for few categories can be effectively improved by adopting a weighting strategy in the loss function, and meanwhile, the accuracy and recall rate of the model can be improved.
The classification loss is calculated in YOLOv5s using binary cross entropy (BCELoss), as follows:

BCE_loss = -[y_i·log(p(y_i)) + (1 - y_i)·log(1 - p(y_i))]   (5)

In formula (5), y_i represents the sample label and p(y_i) is the predicted value for label y_i. When y_i = 1 and p(y_i) approaches 1, the loss value approaches 0; when y_i = 0 and p(y_i) approaches 0, the loss value also approaches 0. In order to control the relative weight of positive and negative samples and of easy and hard samples, adjustment factors are added on the basis of BCELoss, so that the losses of positive and negative samples are multiplied by different weights and the magnitude of the loss is adjusted according to the difficulty of the samples, as follows:
FL_loss = -[α·(1 - p)^γ·y·log(p) + (1 - α)·p^γ·(1 - y)·log(1 - p)]   (6)

Formula (6) may be rewritten:

FL_loss = -α_t·(1 - p_t)^γ·log(p_t)   (7)
in the loss function, α t And gamma are used for solving the imbalance problem of the positive and negative samples and the difficult and easy samples respectively. Wherein the similarity between the predicted value and the true label yThe degree is p t And (3) representing. P is p t The larger the confidence that the predicted value is the correct class, the higher the confidence. When the model generates misclassification, the gamma value is fixed, and p is the time t The smaller (1-p) t ) γ The closer to 1, the worse the classification prediction effect, the more difficult the sample is to classify, with relatively less impact on the loss function; conversely, when p t Approaching 1, (1-p) t ) γ The closer to 0, the better the classification prediction is, the more simple the sample is, reducing the impact on the loss function. By introducing alpha t And gamma value, can make the model focus on the unbalanced problem of positive and negative samples and sample difficult to classify, thus improve generalization ability and detection performance of the model.
Example two
In order to execute the corresponding method of the above embodiment to achieve the corresponding functions and technical effects, a multi-scale target detection system for ship safety supervision is provided below.
A multi-scale target detection system for ship safety supervision, comprising:
and acquiring a ship image, and performing enhancement processing on the ship image to generate an enhanced ship image.
Inputting the enhanced ship image to a feature extraction part in an improved YOLOv5s-MS model, extracting features of the enhanced ship image, and generating a multi-scale feature map; the feature extraction part comprises a convolution module, a convolution neural network module and a pyramid pooling module; the improved YOLOv5s-MS model is based on the YOLOv5s model, wherein the 4 th layer is directly connected with the 23 rd layer, and the 6 th layer is directly connected with the 26 th layer.
Inputting the multi-scale feature map to a feature fusion part of the improved YOLOv5s-MS model, and performing fusion processing on feature information of different scales and different layers in the multi-scale feature map by using a multi-scale fusion and horizontal cross-layer connection technology and adopting a bilinear interpolation up-sampling method to generate a fused feature map.
Inputting the fused feature map to the positioning and classifying part of the improved YOLOv5s-MS model, and performing positioning and classifying processing on the fused feature map to generate targets with different scales and the detection heads, prediction heads, classification heads and regression heads for the targets with different scales; the targets with different scales comprise a large target, a medium target, a small target and a very small target; the detection head is used for detecting targets with different scales; the prediction head is used for detecting the positions of targets with different scales; the classification head is used for predicting the categories of targets with different scales; the regression head is used for adjusting the positions and sizes of the target frames with different scales output by the prediction head.
Example III
The embodiment of the invention provides an electronic device which comprises a memory and a processor, wherein the memory is used for storing a computer program, and the processor runs the computer program to enable the electronic device to execute the multi-scale target detection method facing ship safety supervision.
In practical applications, the electronic device may be a server.
In practical applications, the electronic device includes: at least one processor (processor), memory (memory), bus, and communication interface (Communications Interface).
Wherein: the processor, communication interface, and memory communicate with each other via a communication bus.
And the communication interface is used for communicating with other devices.
And a processor, configured to execute a program, and specifically may execute the method described in the foregoing embodiment.
In particular, the program may include program code including computer-operating instructions.
The processor may be a central processing unit, CPU, or specific integrated circuit ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement embodiments of the present application. The one or more processors included in the electronic device may be the same type of processor, such as one or more CPUs; but may also be different types of processors such as one or more CPUs and one or more ASICs.
And the memory is used for storing programs. The memory may comprise high-speed RAM memory or may further comprise non-volatile memory, such as at least one disk memory.
Based on the description of the above embodiments, an embodiment of the present application provides a storage medium having stored thereon computer program instructions executable by a processor to implement the method of any of the above embodiments.
The multi-scale target detection system facing ship safety supervision provided by the embodiment of the application exists in various forms, including but not limited to:
(1) A mobile communication device: such devices are characterized by mobile communication capabilities and are primarily aimed at providing voice, data communications. Such terminals include: smart phones (e.g., iPhone), multimedia phones, functional phones, and low-end phones, etc.
(2) Ultra mobile personal computer device: such devices are in the category of personal computers, having computing and processing functions, and generally having mobile internet access capabilities. Such terminals include: PDA, MID, and UMPC devices, etc., such as iPad.
(3) Portable entertainment device: such devices may display and play multimedia content. The device comprises: audio, video players (e.g., iPod), palm game consoles, electronic books, and smart toys and portable car navigation devices.
(4) Other electronic devices with data interaction functions.
Thus, particular embodiments of the present subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may be advantageous.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in the same piece or pieces of software and/or hardware when implementing the present application. It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, and may be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular transactions or implement particular abstract data types. The application may also be practiced in distributed computing environments where transactions are performed by remote processing devices that are connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (10)

1. The multi-scale target detection method for ship safety supervision is characterized by comprising the following steps of:
acquiring a ship image, and performing enhancement processing on the ship image to generate an enhanced ship image;
inputting the enhanced ship image to a feature extraction part in an improved YOLOv5s-MS model, extracting features of the enhanced ship image, and generating a multi-scale feature map; the feature extraction part comprises a convolution module, a convolution neural network module and a pyramid pooling module; the improved YOLOv5s-MS model is based on the YOLOv5s model, and is characterized in that the 4 th layer is directly connected with the 23 rd layer, and the 6 th layer is directly connected with the 26 th layer;
Inputting the multi-scale feature map to a feature fusion part of the improved YOLOv5s-MS model, and performing fusion processing on feature information of different scales and different layers in the multi-scale feature map by using a multi-scale fusion and horizontal cross-layer connection technology and adopting a bilinear interpolation up-sampling method to generate a fused feature map;
inputting the fused feature map to the positioning and classifying part of the improved YOLOv5s-MS model, and performing positioning and classifying processing on the fused feature map to generate targets with different scales and the detection heads, prediction heads, classification heads and regression heads for the targets with different scales; the targets with different scales comprise a large target, a medium target, a small target and a very small target; the detection head is used for detecting targets with different scales; the prediction head is used for detecting the positions of targets with different scales; the classification head is used for predicting the categories of targets with different scales; the regression head is used for adjusting the positions and sizes of the target frames with different scales output by the prediction head.
2. The multi-scale object detection method for ship safety supervision according to claim 1, wherein the inputting the enhanced ship image into the feature extraction part in the improved YOLOv5s-MS model extracts features of the enhanced ship image to generate a multi-scale feature map, specifically comprises:
Dividing the enhanced ship image into two parts of images by utilizing the convolutional neural network module, and respectively carrying out convolutional operation on the two parts of images;
merging the images after the convolution operation to generate feature images with different scales;
and carrying out pooling treatment on the feature graphs with different scales by using a pyramid pooling module to acquire multi-scale feature information and generate a multi-scale feature graph.
3. The multi-scale target detection method for ship safety supervision according to claim 2, wherein the convolutional neural network module specifically comprises: the system comprises a first convolution computing unit, a bottleneck layer unit, a splicing operation unit and a second convolution computing unit which are connected in sequence; the splicing operation unit is used for splicing the characteristic graph subjected to convolution calculation by the first convolution calculation unit and the characteristic graph subjected to processing by the bottleneck layer unit.
4. The multi-scale target detection method for ship safety supervision according to claim 3, wherein the bottleneck layer unit specifically comprises: two bottleneck convolution units connected in sequence and an addition operation unit; the adding operation unit is used for adding the channel dimension of the feature map of the input bottleneck layer and the channel dimension of the feature map of the output bottleneck layer.
5. The multi-scale target detection method for ship safety supervision according to claim 2, wherein the pyramid pooling module specifically comprises a first convolution module, three max-pooling units, a concatenation unit and a second convolution module connected in sequence; the concatenation unit is used for concatenating the feature maps produced by the three max-pooling units.
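The pyramid pooling module of claim 5 follows the familiar SPPF pattern: one reducing convolution, three chained max-pooling operations, concatenation, and a second convolution. A minimal sketch, assuming a 5x5 pooling kernel and illustrative channel widths:

```python
# Sketch of the pyramid pooling module of claim 5, assuming an SPPF-style layout.
import torch
import torch.nn as nn

class PyramidPooling(nn.Module):
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_mid, 1)                   # first convolution module
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)  # applied three times in a chain
        self.cv2 = nn.Conv2d(4 * c_mid, c_out, 1)              # second convolution module

    def forward(self, x):
        x = self.cv1(x)
        p1 = self.pool(x)
        p2 = self.pool(p1)
        p3 = self.pool(p2)
        # Concatenate the reduced map with the three pooled maps to gather multi-scale context.
        return self.cv2(torch.cat((x, p1, p2, p3), dim=1))

if __name__ == "__main__":
    print(PyramidPooling(256, 256)(torch.zeros(1, 256, 20, 20)).shape)  # torch.Size([1, 256, 20, 20])
```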
6. The multi-scale target detection method for ship safety supervision according to claim 1, wherein the feature fusion part specifically comprises three convolution and up-sampling modules and three target output modules connected in sequence; each convolution and up-sampling module comprises a convolution module, an up-sampling unit, a concatenation unit and a bottleneck layer unit connected in sequence; each target output module comprises a convolution module, a concatenation unit and a bottleneck layer unit connected in sequence; the convolution module in the first convolution and up-sampling module is connected to the concatenation unit in the third target output module; the convolution module in the second convolution and up-sampling module is connected to the concatenation unit in the second target output module; and the bottleneck layer unit in the second convolution and up-sampling module is connected to the convolution module in the first target output module.
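One "convolution and up-sampling" step of claim 6, combined with the bilinear interpolation up-sampling named in claim 1, can be sketched as follows; the channel widths are assumptions and the final refinement convolution merely stands in for the bottleneck layer unit.

```python
# Sketch of one fusion step from claim 6: 1x1 convolution, bilinear up-sampling,
# concatenation with a shallower map, then refinement; sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvUpsampleFuse(nn.Module):
    def __init__(self, c_deep, c_shallow, c_out):
        super().__init__()
        self.reduce = nn.Conv2d(c_deep, c_out, 1)          # convolution module
        self.refine = nn.Sequential(                       # stands in for the bottleneck layer unit
            nn.Conv2d(c_out + c_shallow, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.SiLU(),
        )

    def forward(self, deep, shallow):
        x = self.reduce(deep)
        # Bilinear interpolation up-sampling to the shallower map's resolution.
        x = F.interpolate(x, size=shallow.shape[-2:], mode="bilinear", align_corners=False)
        return self.refine(torch.cat((x, shallow), dim=1))  # concatenation unit

if __name__ == "__main__":
    deep = torch.zeros(1, 256, 20, 20)     # e.g. a 32x-downsampled map
    shallow = torch.zeros(1, 128, 40, 40)  # e.g. a 16x-downsampled map
    print(ConvUpsampleFuse(256, 128, 128)(deep, shallow).shape)  # torch.Size([1, 128, 40, 40])
```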
7. The multi-scale target detection method for ship safety supervision according to claim 1, wherein the detection head detects the very small target at 4-fold downsampling;
when detecting the small target, the detection head uses 8-fold downsampling;
when detecting the medium target, the detection head uses 16-fold downsampling;
when detecting the large target, the detection head uses 32-fold downsampling.
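As a quick arithmetic check of claim 7, assuming a 640x640 input image, the four downsampling rates give the following detection-grid sizes:

```python
# Grid sizes implied by claim 7 for an assumed 640x640 input.
strides = {"very small": 4, "small": 8, "medium": 16, "large": 32}
for target, stride in strides.items():
    cells = 640 // stride
    print(f"{target} targets: stride {stride} -> {cells}x{cells} detection grid")
# very small targets: stride 4 -> 160x160 detection grid
# small targets: stride 8 -> 80x80 detection grid
# medium targets: stride 16 -> 40x40 detection grid
# large targets: stride 32 -> 20x20 detection grid
```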
8. A multi-scale target detection system for ship safety supervision, comprising:
acquiring a ship image, and performing enhancement processing on the ship image to generate an enhanced ship image;
inputting the enhanced ship image to a feature extraction part of an improved YOLOv5s-MS model, extracting features of the enhanced ship image, and generating a multi-scale feature map; the feature extraction part comprises a convolution module, a convolutional neural network module and a pyramid pooling module; the improved YOLOv5s-MS model is obtained from the YOLOv5s model by directly connecting the 4th layer to the 23rd layer and the 6th layer to the 26th layer;
inputting the multi-scale feature map to a feature fusion part of the improved YOLOv5s-MS model, and fusing feature information of different scales and different layers in the multi-scale feature map by means of multi-scale fusion, horizontal cross-layer connections and bilinear interpolation up-sampling, to generate a fused feature map;
inputting the fused feature map to a positioning and classification part of the improved YOLOv5s-MS model, and performing positioning and classification on the fused feature map to generate detection heads, prediction heads, classification heads and regression heads for targets of different scales; the targets of different scales comprise a large target, a medium target, a small target and a very small target; the detection head is used for detecting targets of different scales; the prediction head is used for detecting the positions of targets of different scales; the classification head is used for predicting the categories of targets of different scales; the regression head is used for adjusting the positions and sizes of the target boxes of different scales output by the prediction head.
9. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the multi-scale target detection method for ship safety supervision according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the multi-scale target detection method for ship safety supervision according to any one of claims 1-7.
CN202310653227.5A 2023-06-05 2023-06-05 Multi-scale target detection method, system and equipment for ship safety supervision Pending CN116682076A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310653227.5A CN116682076A (en) 2023-06-05 2023-06-05 Multi-scale target detection method, system and equipment for ship safety supervision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310653227.5A CN116682076A (en) 2023-06-05 2023-06-05 Multi-scale target detection method, system and equipment for ship safety supervision

Publications (1)

Publication Number Publication Date
CN116682076A true CN116682076A (en) 2023-09-01

Family

ID=87788478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310653227.5A Pending CN116682076A (en) 2023-06-05 2023-06-05 Multi-scale target detection method, system and equipment for ship safety supervision

Country Status (1)

Country Link
CN (1) CN116682076A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117911679A (en) * 2024-03-15 2024-04-19 青岛国实科技集团有限公司 Hull identification system and method based on image enhancement and tiny target identification
CN117911679B (en) * 2024-03-15 2024-05-31 青岛国实科技集团有限公司 Hull identification system and method based on image enhancement and tiny target identification

Similar Documents

Publication Publication Date Title
US11361546B2 (en) Action recognition in videos using 3D spatio-temporal convolutional neural networks
US10943145B2 (en) Image processing methods and apparatus, and electronic devices
TWI773189B (en) Method of detecting object based on artificial intelligence, device, equipment and computer-readable storage medium
US20240112035A1 (en) 3d object recognition using 3d convolutional neural network with depth based multi-scale filters
CN111402130B (en) Data processing method and data processing device
CN112016475B (en) Human body detection and identification method and device
CN111079739A (en) Multi-scale attention feature detection method
CN113807361B (en) Neural network, target detection method, neural network training method and related products
CN110414593B (en) Image processing method and device, processor, electronic device and storage medium
CN110782430A (en) Small target detection method and device, electronic equipment and storage medium
WO2022072199A1 (en) Sparse optical flow estimation
CN116682076A (en) Multi-scale target detection method, system and equipment for ship safety supervision
CN114742799A (en) Industrial scene unknown type defect segmentation method based on self-supervision heterogeneous network
CN115049927A (en) SegNet-based SAR image bridge detection method and device and storage medium
CN114821823A (en) Image processing, training of human face anti-counterfeiting model and living body detection method and device
CN114202473A (en) Image restoration method and device based on multi-scale features and attention mechanism
CN115880516A (en) Image classification method, image classification model training method and related equipment
CN112633066A (en) Aerial small target detection method, device, equipment and storage medium
CN115393868B (en) Text detection method, device, electronic equipment and storage medium
CN112825116A (en) Method, device, medium and equipment for detecting and tracking face of monitoring video image
CN116883770A (en) Training method and device of depth estimation model, electronic equipment and storage medium
CN113723352B (en) Text detection method, system, storage medium and electronic equipment
CN115311542B (en) Target detection method, device, equipment and medium
CN114972090B (en) Training method of image processing model, image processing method and device
CN115952830A (en) Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination