CN114677558A - Target detection method based on direction gradient histogram and improved capsule network - Google Patents

Target detection method based on direction gradient histogram and improved capsule network Download PDF

Info

Publication number
CN114677558A
CN114677558A (application CN202210234245.5A)
Authority
CN
China
Prior art keywords
image
convolution
feature
network
parallel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210234245.5A
Other languages
Chinese (zh)
Inventor
史振宁
章妍妍
倪宇昊
余晨晨
林晨曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN202210234245.5A priority Critical patent/CN114677558A/en
Publication of CN114677558A publication Critical patent/CN114677558A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method based on a direction gradient histogram and an improved capsule network. The improved capsule network extracts comprehensive features with a parallel convolution network, forms feature vectors through a redundancy-removing capsule network, reconstructs the image with a deconvolution image reconstruction network, and trains the network model. Finally, a 3 × 3 × 256 convolution layer and two parallel 1 × 1 convolution kernels extract a feature map of the detection frame center point and a feature map of the detection frame scale, from which the corresponding target frames are formed and the detection result is output.

Description

Target detection method based on direction gradient histogram and improved capsule network
Technical Field
The invention relates to the technical field of target detection, in particular to a target detection method based on a direction gradient histogram and an improved capsule network.
Background
A traditional convolutional neural network extracts target features through convolution operations and learns by back propagation in order to perform target detection, and it has achieved remarkable results on target detection tasks. However, the traditional convolutional neural network pays insufficient attention to information such as the relative positions and directions of elements in an image, and its pooling layers lose information, so factors such as occlusion by obstacles and severe weather can seriously affect the detection and recognition of a target. The capsule neural network introduces the concept of the capsule to replace some of the neurons of a convolutional neural network, expanding the scalars of the traditional neural network into vectors, and thereby effectively overcomes these shortcomings of the traditional neural network. When applied to target detection, the capsule neural network therefore achieves a higher overall recognition rate, with the accuracy and robustness needed for target detection under different influencing factors.
However, a conventional capsule network allows only one capsule of a given type at a given location, so two objects of the same type cannot be detected if they are too close to each other. The paper "Vehicle type classification with a capsule network using gradient-histogram convolution features under traffic monitoring" preprocesses the original image with a histogram of oriented gradients, which alleviates this problem to some extent; but because the structure of the capsule network itself is not improved, problems such as insufficient extraction of original-image information, a redundant capsule structure and high algorithm complexity remain.
Disclosure of Invention
The invention aims to solve the technical problems in the prior art that extraction of original-image information is not rich enough, the capsule structure is redundant, and the algorithm complexity is high, and provides a target detection method based on a direction gradient histogram and an improved capsule network.
A target detection method based on a direction gradient histogram and an improved capsule network comprises the following steps:
1) obtaining target original images, marking the target positions with a labeling tool, and then randomly selecting different images as a training set;
2) fusing the direction gradient histogram (histogram of oriented gradients, HOG) of the original image with a convolution feature map in parallel, so as to combine the edge contour features of the image with the visual-field features of the convolution kernels, and using the result as the input of the improved capsule network;
3) the improved capsule network extracts comprehensive features with a parallel convolution network, forms feature vectors through a redundancy-removing capsule network, and reconstructs the image with a deconvolution image reconstruction network;
4) extracting a feature map of the detection frame center point and a feature map of the detection frame scale with a 3 × 3 × 256 convolution layer and two parallel 1 × 1 convolution kernels, forming the corresponding target frames, and outputting the detection result.
Preferably, the parallel fusion algorithm of the Histogram of Oriented Gradients (HOG) and the convolution feature map in the step 2) specifically comprises the following steps:
2-1) normalization; firstly, dividing a target image into 4 cells, dividing each cell into 9 blocks, carrying out Gamma normalization processing, and carrying out parameter optimization on a Gamma correction value; the normalization formula is as follows:
τ_ij′ = τ_ij / √(‖τ_ij‖² + ε²)
f = [τ_11′, τ_12′, …, τ_ij′, …]
where τ denotes the feature vector of a block, ε is a small constant, τ_ij denotes the i-th block in the j-th cell of the target image, and f denotes the target image after all block feature vectors have been normalized.
2-2) selecting a detection window from the normalized image; the detection window has the same aspect ratio as the image and does not exceed one half of the image size.
2-3) selecting blocks from the window; rectangular blocks of equal length and width are selected according to the detection window.
2-4) dividing cells within a block; within each rectangular block, square cells of 8 × 8 pixels are used as the minimum units of feature extraction within the block.
2-5) performing directional projection within the cell. The cell is divided into 9 directions, and direction information is extracted for each 20-degree angle range. The direction information is obtained by convolving the gray-level image I with the gradient template U in the horizontal x direction and the vertical y direction; the mathematical formulas are as follows:
G_x(x, y) = H(x+1, y) - H(x-1, y)
G_y(x, y) = H(x, y+1) - H(x, y-1)
G(x, y) = √(G_x(x, y)² + G_y(x, y)²)
α(x, y) = arctan(G_y(x, y) / G_x(x, y))
where H(x, y) denotes the gray value at the corresponding coordinate, G_x the horizontal gradient value, G_y the vertical gradient value, G the gradient magnitude, and α the gradient direction.
2-6) normalization within the cell. The number of gradient directions falling into each direction-angle range is counted within the cell to obtain a direction histogram, and the direction angle in which the directions are most concentrated is selected as the direction of the cell.
2-7) building HOG features within the block. The number of directions in each cell's direction-angle range is counted within the block to obtain a direction histogram, and the direction angle in which the directions are most concentrated is selected as the direction of the block.
2-8) if the last block is not reached, returning to step 2-3).
2-9) if the last window is not reached, returning to the step 2-2), otherwise, obtaining a direction gradient histogram.
2-10) single-line convolution. The original image is input and features are extracted with a single-line convolution layer to obtain a convolution feature map.
2-11) feature splicing and parallel fusion. The convolution feature map is dimension-connected with the direction gradient histogram, and the two feature maps are connected in the third dimension to obtain a 28 × 28 × 1 image.
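As a concrete illustration of steps 2-4) to 2-6) and 2-11), the following sketch computes one 9-bin direction histogram per 8 × 8 cell and then concatenates the HOG features with a convolution feature map along the third dimension. It is a minimal NumPy sketch, not the patented implementation; the function names (hog_cell_histograms, fuse_hog_and_conv) and the per-cell output layout are assumptions made for illustration.

```python
import numpy as np

def hog_cell_histograms(gray, cell=8, bins=9):
    """One 9-bin direction histogram per 8x8 cell (steps 2-4 to 2-6); gray is a 2-D float array."""
    # Centred differences, as in Gx(x,y) = H(x+1,y) - H(x-1,y).
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    mag = np.sqrt(gx ** 2 + gy ** 2)                      # gradient magnitude G
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0          # gradient direction alpha (unsigned)
    h, w = gray.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    bin_idx = np.minimum((ang // (180.0 / bins)).astype(int), bins - 1)  # 20-degree ranges
    for i in range(ch):
        for j in range(cw):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            np.add.at(hist[i, j], bin_idx[sl].ravel(), mag[sl].ravel())
    return hist

def fuse_hog_and_conv(hog_map, conv_map):
    """Step 2-11): connect the HOG features and the convolution feature map in the third (channel) dimension."""
    assert hog_map.shape[:2] == conv_map.shape[:2]
    return np.concatenate([hog_map, conv_map], axis=2)
```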
The pooling-layer structure used in the traditional convolutional neural network does not take spatial relationships into account, which causes part of the valuable information in that layer to be lost. To solve this problem, the capsule network uses a digital capsule layer in place of the pooling layer and introduces the vector neuron, which stores information such as direction and position in vector form and passes it on through the network, so that the capsule network is sensitive to the position and direction information of elements in the image.
The improved capsule network based on the direction gradient histogram extracts features through a convolution layer and the HOG in parallel, and its image preprocessing part adopts HOG-C features. While retaining the advantages of the original capsule network, the introduction of the direction gradient histogram strengthens the extraction of edge feature information from the target detection image, and the gradient direction and magnitude increase the distinguishability between two objects that are too close to each other, which alleviates to a certain extent the problem that two objects of the same type cannot be detected.
Preferably, the step 3) of improving the capsule network algorithm specifically comprises the following steps:
3-1) extracting comprehensive characteristics by utilizing a parallel convolution network.
3-2) generating the feature vector by utilizing the redundancy removing capsule network.
3-3) restoring the original image with the deconvolution image reconstruction network and evaluating the network loss.
Preferably, the parallel convolution network algorithm in step 3-1) specifically comprises the following steps:
3-1-1) First, a parallel convolutional neural network is used as the feature extraction network. The parallel-fused image, of size 28 × 28 × 1, is used as input. The parallel convolutional neural network uses 4 convolution kernel sizes in its convolution layer, namely 3, 5, 7 and 9; the number of kernels of each size is 32 and the stride is 2.
3-1-2) Boundary filling. The padding size is adjusted to pad the boundary of the original matrix.
3-1-3) Feature extraction. The nonlinear function of the feature extraction layer is the PReLU function, with the mathematical formula:
PReLU(x) = max(0, x) + α * min(0, x)
where α is a learnable coefficient.
3-1-4) Feature tensor concatenation. The feature tensors are connected in the third dimension, giving a 14 × 14 × 128 feature tensor.
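As an illustration of step 3-1), the sketch below builds four parallel convolution branches with kernel sizes 3, 5, 7 and 9 (32 kernels each, stride 2) on the fused 28 × 28 × 1 input and concatenates their outputs in the channel dimension to obtain a 14 × 14 × 128 feature tensor. It is a minimal PyTorch sketch, not the patented implementation; the padding choice (k - 1) // 2 is an assumption used to keep all branches at 14 × 14.

```python
import torch
import torch.nn as nn

class ParallelConv(nn.Module):
    """Parallel convolution feature extractor (step 3-1), sketched."""
    def __init__(self, in_ch=1, out_ch=32, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # padding=(k-1)//2 keeps every branch at 14x14 for a 28x28 input with stride 2
                nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=2, padding=(k - 1) // 2),
                nn.PReLU(),   # PReLU(x) = max(0, x) + alpha * min(0, x), alpha learned
            )
            for k in kernel_sizes
        ])

    def forward(self, x):
        # Concatenate the four 32-channel branch outputs along the channel dimension.
        return torch.cat([branch(x) for branch in self.branches], dim=1)

feats = ParallelConv()(torch.randn(1, 1, 28, 28))   # -> torch.Size([1, 128, 14, 14])
```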
Preferably, the redundancy removing capsule network algorithm in the step 3-2) specifically comprises the following steps:
3-2-1) Input redundancy removal. The output of the parallel convolution network is used as the input of the redundancy-removing main capsule network, and redundant capsules are removed with a 1 × 1 convolution kernel, so that the 16 × 16 feature map obtained after feature extraction is converted into a 14 × 14 feature map and the number of capsules is reduced to 196.
3-2-2) Input the capsule u_i.
3-2-3) Multiply the input capsule vector by the transformation matrix to obtain the prediction vector û_(j|i); the mathematical formula is as follows:
û_(j|i) = W_ij · u_i
where W_ij is the transformation matrix.
3-2-4) Carry out a weighted summation of the vectors û_(j|i) with the coupling coefficients c_ij to obtain the weighted sum s_j; the mathematical formula is as follows:
s_j = Σ_i c_ij · û_(j|i)
3-2-5) Compress s_j with a nonlinear function and propagate it forward; the mathematical formula is as follows:
v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖)
where s_j denotes the weighted sum and v_j denotes its nonlinearly compressed output.
3-2-6) Update the coupling coefficients c_ij with the softmax function; the mathematical formula is as follows:
c_ij = exp(b_ij) / Σ_k exp(b_ik),  with  b_ij ← b_ij + û_(j|i) · v_j
where b_ij is the routing logit updated by the dynamic routing and is initialized to 0, v_j denotes the nonlinearly compressed output, and û_(j|i) is the vector obtained by multiplying the capsule vector by the transformation matrix.
3-2-7) Repeat the update of v_j; when the number of updates k reaches the number of capsules, the finally output v_j is the feature vector characterizing the j-th class; otherwise, return to step 3-2-2).
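As an illustration of steps 3-2-4) to 3-2-7), the sketch below implements the weighted summation, the nonlinear compression and the softmax update of the coupling coefficients. It is a minimal PyTorch sketch under assumptions not fixed by the description: a fixed number of routing iterations is used instead of iterating until the number of updates equals the number of capsules, and the output capsule count and dimension (10 and 16) are chosen only for illustration.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Step 3-2-5): v = (|s|^2 / (1 + |s|^2)) * s / |s|."""
    sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

def dynamic_routing(u_hat, iterations=3):
    """Steps 3-2-4) to 3-2-7): route prediction vectors u_hat = W_ij u_i
    of shape (batch, num_in_caps, num_out_caps, out_dim) to output capsules v_j."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # routing logits b_ij, initialized to 0
    for _ in range(iterations):
        c = F.softmax(b, dim=2)                              # coupling coefficients c_ij
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)             # weighted sum s_j
        v = squash(s)                                        # compressed output v_j
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)         # agreement update of b_ij
    return v

# 196 input capsules (the 14 x 14 map after 1 x 1 redundancy removal),
# routed to 10 output capsules of dimension 16.
u_hat = torch.randn(1, 196, 10, 16)
v = dynamic_routing(u_hat)    # -> torch.Size([1, 10, 16])
```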
Preferably, the deconvolution image reconstruction network algorithm in step 3-3) specifically comprises the following steps:
3-3-1) Image input. The input image size is 14 × 14, the feature input is 5 × 5, the convolution kernel sizes are 3, 5, 7 and 9, the number of kernels of each size is 32, the stride is 2, and the output image is made 28 × 28 by adjusting the padding. The output image size after the deconvolution is given by the following formula:
n_out = s · (n_in - 1) + k - 2p
where n_out denotes the output image size, s the stride, n_in the input image size, k the convolution kernel size, and p the padding, i.e. the fill size.
3-3-2) Splitting the capsule information. The 6272 neurons of the capsule layer are converted by a fully connected layer into a 14 × 14 × 32 tensor; this tensor is combined with the parallel convolution network outputs by connection in the third dimension, so that the 14 × 14 × 32 tensor becomes a 14 × 14 × 160 tensor.
3-3-3) Redundancy-removal deconvolution. Redundancy is removed from the 14 × 14 × 160 tensor with a 1 × 1 convolution kernel to obtain a 14 × 14 × 32 tensor, and a deconvolution operation with the kernels corresponding to the parallel convolution layer is performed, generating the final reconstructed image.
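As an illustration of steps 3-3-1) to 3-3-3), the sketch below applies the 1 × 1 redundancy-removing convolution and a transposed convolution that brings the 14 × 14 map back to 28 × 28, together with the output-size formula n_out = s(n_in - 1) + k - 2p. It is a minimal PyTorch sketch; the single-kernel branch, the padding (k - 1) // 2 and the output_padding of 1 are assumptions that stand in for "adjusting the padding" in the description.

```python
import torch
import torch.nn as nn

def deconv_out_size(n_in, k, s, p, output_padding=0):
    """Output size of a transposed convolution: n_out = s*(n_in - 1) + k - 2p (+ output_padding)."""
    return s * (n_in - 1) + k - 2 * p + output_padding

class ReconstructionHead(nn.Module):
    """Steps 3-3-2) and 3-3-3), sketched: 1x1 redundancy removal, then deconvolution."""
    def __init__(self, in_ch=160, mid_ch=32, k=3):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1)      # 14x14x160 -> 14x14x32
        self.deconv = nn.ConvTranspose2d(mid_ch, 1, kernel_size=k, stride=2,
                                         padding=(k - 1) // 2, output_padding=1)

    def forward(self, x):
        return self.deconv(self.reduce(x))

img = ReconstructionHead()(torch.randn(1, 160, 14, 14))        # -> torch.Size([1, 1, 28, 28])
print(deconv_out_size(14, k=3, s=2, p=1, output_padding=1))    # 28
```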
Preferably, the step 4) of extracting the feature map of the center point of the detection frame and the feature map of the scale of the detection frame, forming the corresponding target frame and outputting the detection result specifically comprises the steps of:
4-1) Image input. The reconstructed image is used as the input of a convolution layer with kernel size 3 × 3 and 256 output channels.
4-2) Extracting the center feature map. The feature map of the detection frame center point is extracted with a 1 × 1 convolution kernel; the coordinates of object center points are marked as positive values and those of non-object points as negative values.
4-3) Extracting the scale feature map. The detection frame scale feature map is extracted with a parallel 1 × 1 convolution kernel; the scales in the scale feature map are the length and width of the detection frame.
4-4) Outputting the image. The center feature map and the scale feature map are fused to complete object target detection.
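As an illustration of steps 4-1) to 4-3), the sketch below passes the reconstructed image through a 3 × 3 convolution with 256 output channels and then through two parallel 1 × 1 convolutions, one producing the center-point feature map and one producing the detection frame scale (length and width) map. It is a minimal PyTorch sketch; the padding of the 3 × 3 layer, the ReLU and the channel counts of the two heads are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Step 4): 3x3x256 trunk convolution plus two parallel 1x1 heads, sketched."""
    def __init__(self, in_ch=1, num_classes=1):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, 256, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.center = nn.Conv2d(256, num_classes, kernel_size=1)  # positive at object center points
        self.scale = nn.Conv2d(256, 2, kernel_size=1)             # detection frame length and width

    def forward(self, x):
        f = self.trunk(x)
        return self.center(f), self.scale(f)

center_map, scale_map = DetectionHead()(torch.randn(1, 1, 28, 28))
# center_map: torch.Size([1, 1, 28, 28]); scale_map: torch.Size([1, 2, 28, 28])
```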
The invention has the beneficial effects that:
the invention aims at the problem of insufficient extraction of original image information in the prior art, improves the capsule network structure, adds a parallel convolution network layer in front of a main capsule layer, and after the parallel convolution network extracts convolution characteristics, the convolution characteristic graph and the direction gradient histogram of an original image are directly fused in a dimension vector connection mode instead of a network mode, so that the input information of the network is richer and closer to the original image information.
The invention aims at the problems of redundancy and high algorithm complexity of the existing capsule structure, and performs redundancy removal operation on the main capsule. The redundancy-removing main capsule network simplifies the network structure, reduces the parameters compared with the original main capsule network, optimizes the time required by network training and improves the algorithm efficiency.
Drawings
FIG. 1 is a schematic diagram of the overall structure of a target detection model according to the present invention;
FIG. 2 is a flow chart of a parallel model of Histogram of Oriented Gradients (HOG) and a convolution feature map in accordance with the present invention;
FIG. 3 is a diagram of the improved capsule network architecture of the present invention;
FIG. 4 is a schematic diagram of a parallel convolutional network in the present invention;
FIG. 5 is a schematic diagram of a redundancy elimination capsule network according to the present invention;
FIG. 6 is a schematic diagram of a deconvolution image reconstruction network in accordance with the present invention;
FIG. 7 is an original image;
FIG. 8 is a schematic diagram of HOG histogram processing images;
FIG. 9 is a schematic diagram of a detection result of a pedestrian image by the method.
Detailed Description
The technical solution of the present invention is described in detail below, but the scope of the present invention is not limited to the embodiments.
As shown in fig. 1, a target detection method based on histogram of oriented gradients and improved capsule network mainly includes the following steps:
1) the target original image is obtained as shown in fig. 7.
2) As shown in fig. 2, the histogram of the directional gradient of the original image is fused in parallel with the convolution feature map to combine the edge contour feature of the image with the view feature of the convolution kernel, and then the result is used as the input of the improved capsule network;
2-1) normalization. Firstly, dividing a target image into 4 cells, dividing each cell into 9 blocks, carrying out Gamma normalization processing, and carrying out parameter optimization on a Gamma correction value; the normalization formula is as follows:
τ_ij′ = τ_ij / √(‖τ_ij‖² + ε²)
f = [τ_11′, τ_12′, …, τ_ij′, …]
where τ denotes the feature vector of a block, ε is a small constant, τ_ij denotes the i-th block in the j-th cell of the target image, and f denotes the target image after all block feature vectors have been normalized.
2-2) selecting a detection window from the normalized image; the detection window has the same aspect ratio as the image and does not exceed one half of the image size.
2-3) selecting blocks from the window; rectangular blocks of equal length and width are selected according to the detection window.
2-4) dividing cells within a block; within each rectangular block, square cells of 8 × 8 pixels are used as the minimum units of feature extraction within the block.
2-5) performing directional projection within the cell. The cell is divided into 9 directions, and direction information is extracted for each 20-degree angle range. The direction information is obtained by convolving the gray-level image I with the gradient template U in the horizontal x direction and the vertical y direction; the mathematical formulas are as follows:
G_x(x, y) = H(x+1, y) - H(x-1, y)
G_y(x, y) = H(x, y+1) - H(x, y-1)
G(x, y) = √(G_x(x, y)² + G_y(x, y)²)
α(x, y) = arctan(G_y(x, y) / G_x(x, y))
where H(x, y) denotes the gray value at the corresponding coordinate, G_x the horizontal gradient value, G_y the vertical gradient value, G the gradient magnitude, and α the gradient direction.
2-6) normalization within the cell. The number of gradient directions falling into each direction-angle range is counted within the cell to obtain a direction histogram, as shown in fig. 8, and the direction angle in which the directions are most concentrated is selected as the direction of the cell.
2-7) building HOG features within the block. The number of directions in each cell's direction-angle range is counted within the block to obtain a direction histogram, and the direction angle in which the directions are most concentrated is selected as the direction of the block.
2-8) if the last block has not been reached, return to step 2-3).
2-9) if the last window has not been reached, return to step 2-2); otherwise, the direction gradient histogram is obtained.
2-10) single-line convolution. The original image is input and features are extracted with a single-line convolution layer to obtain a convolution feature map.
2-11) feature splicing and parallel fusion. The convolution feature map is dimension-connected with the direction gradient histogram, and the two feature maps are connected in the third dimension to obtain a 28 × 28 × 1 image.
3) As shown in fig. 3 and 4, the improved capsule network extracts comprehensive features by using a parallel convolution network, forms feature vectors by using a redundancy-removing capsule network, and realizes image reconstruction by using a deconvolution image reconstruction network;
3-1) extracting comprehensive characteristics by using a parallel convolution network;
3-1-1) First, a parallel convolutional neural network is used as the feature extraction network. The parallel-fused image, of size 28 × 28 × 1, is used as input. The parallel convolutional neural network uses 4 convolution kernel sizes in its convolution layer, namely 3, 5, 7 and 9; the number of kernels of each size is 32 and the stride is 2.
3-1-2) Boundary filling. The padding size is adjusted to pad the boundary of the original matrix.
3-1-3) Feature extraction. The nonlinear function of the feature extraction layer is the PReLU function, with the mathematical formula:
PReLU(x) = max(0, x) + α * min(0, x)
where α is a learnable coefficient.
3-1-4) Feature tensor concatenation. The feature tensors are connected in the third dimension, giving a 14 × 14 × 128 feature tensor.
3-2) generating feature vectors using the redundancy removal capsule network as in FIG. 5;
3-2-1) Input redundancy removal. The output of the parallel convolution network is used as the input of the redundancy-removing main capsule network, and redundant capsules are removed with a 1 × 1 convolution kernel, so that the 16 × 16 feature map obtained after feature extraction is converted into a 14 × 14 feature map and the number of capsules is reduced to 196.
3-2-2) Input the capsule u_i.
3-2-3) Multiply the input capsule vector by the transformation matrix to obtain the prediction vector û_(j|i); the mathematical formula is as follows:
û_(j|i) = W_ij · u_i
where W_ij is the transformation matrix.
3-2-4) Carry out a weighted summation of the vectors û_(j|i) with the coupling coefficients c_ij to obtain the weighted sum s_j; the mathematical formula is as follows:
s_j = Σ_i c_ij · û_(j|i)
3-2-5) Compress s_j with a nonlinear function and propagate it forward; the mathematical formula is as follows:
v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖)
where s_j denotes the weighted sum and v_j denotes its nonlinearly compressed output.
3-2-6) Update the coupling coefficients c_ij with the softmax function; the mathematical formula is as follows:
c_ij = exp(b_ij) / Σ_k exp(b_ik),  with  b_ij ← b_ij + û_(j|i) · v_j
where b_ij is the routing logit updated by the dynamic routing and is initialized to 0, v_j denotes the nonlinearly compressed output, and û_(j|i) is the vector obtained by multiplying the capsule vector by the transformation matrix.
3-2-7) Repeat the update of v_j; when the number of updates k reaches the number of capsules, the finally output v_j is the feature vector characterizing the j-th class; otherwise, return to step 3-2-2).
3-3) as in FIG. 6, the original image is restored using the deconvolution image reconstruction network.
3-3-1) Image input. The input image size is 14 × 14, the feature input is 5 × 5, the convolution kernel sizes are 3, 5, 7 and 9, the number of kernels of each size is 32, the stride is 2, and the output image is made 28 × 28 by adjusting the padding. The output image size after the deconvolution is given by the following formula:
n_out = s · (n_in - 1) + k - 2p
where n_out denotes the output image size, s the stride, n_in the input image size, k the convolution kernel size, and p the padding, i.e. the fill size.
3-3-2) Splitting the capsule information. The 6272 neurons of the capsule layer are converted by a fully connected layer into a 14 × 14 × 32 tensor; this tensor is combined with the parallel convolution network outputs by connection in the third dimension, so that the 14 × 14 × 32 tensor becomes a 14 × 14 × 160 tensor.
3-3-3) Redundancy-removal deconvolution. Redundancy is removed from the 14 × 14 × 160 tensor with a 1 × 1 convolution kernel to obtain a 14 × 14 × 32 tensor, and a deconvolution operation with the kernels corresponding to the parallel convolution layer is performed, generating the final reconstructed image.
4) Extracting a feature map of the detection frame center point and a feature map of the detection frame scale with a 3 × 3 × 256 convolution layer and two parallel 1 × 1 convolution kernels, forming the corresponding target frames, and outputting the detection result.
4-1) Image input. The reconstructed image is used as the input of a convolution layer with kernel size 3 × 3 and 256 output channels.
4-2) Center feature map. The feature map of the detection frame center point is extracted with a 1 × 1 convolution kernel; the coordinates of object center points are marked as positive values and those of non-object points as negative values.
4-3) Scale feature map. The detection frame scale feature map is extracted with a parallel 1 × 1 convolution kernel; the scales in the scale feature map are the length and width of the detection frame.
4-4) Outputting the image. The center feature map and the scale feature map are fused to complete object target detection, as shown in fig. 9.
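Finally, forming the target frames from the two feature maps of steps 4-2) and 4-3) can be illustrated by the short decoding sketch below: positions with positive values in the center-point map are taken as object centers, and the length and width read from the scale map at those positions define the detection frames. The decoding routine itself is not spelled out in the description; the thresholding at zero and the function name decode_boxes are assumptions made for illustration.

```python
import numpy as np

def decode_boxes(center_map, scale_map, threshold=0.0):
    """Fuse the center-point map (H, W) and the scale map (H, W, 2) into detection frames."""
    boxes = []
    ys, xs = np.where(center_map > threshold)     # positive values mark object center points
    for y, x in zip(ys, xs):
        w, h = scale_map[y, x]                    # detection frame length and width
        boxes.append((float(x - w / 2.0), float(y - h / 2.0),
                      float(x + w / 2.0), float(y + h / 2.0)))
    return boxes

# Toy usage: one positive center at (x=10, y=12) with a 6 x 8 frame.
cm = np.full((28, 28), -1.0); cm[12, 10] = 1.0
sm = np.zeros((28, 28, 2)); sm[12, 10] = (6.0, 8.0)
print(decode_boxes(cm, sm))    # [(7.0, 8.0, 13.0, 16.0)]
```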

Claims (7)

1. A target detection method based on a direction gradient histogram and an improved capsule network is characterized by comprising the following steps:
1) obtaining a target original image, marking the target position by using a marking tool, and then randomly selecting different images as a training set;
2) fusing the direction gradient histogram of the original image with the convolution feature map in parallel, so as to combine the edge contour features of the image with the visual-field features of the convolution kernels, and using the result as the input of the improved capsule network;
3) the improved capsule network extracting comprehensive features with a parallel convolution network, forming feature vectors through a redundancy-removing capsule network, and realizing image reconstruction with a deconvolution image reconstruction network;
4) extracting a feature map of the detection frame center point and a feature map of the detection frame scale with a 3 × 3 × 256 convolution layer and two parallel 1 × 1 convolution kernels, forming the corresponding target frames, and outputting the detection result.
2. The histogram of oriented gradients and improved capsule network based object detection method of claim 1, wherein: the algorithm for parallel fusion of the direction gradient histogram and the convolution feature map in the step 2) comprises the following specific steps:
2-1) normalization; firstly, dividing a target image into 4 cells, dividing each cell into 9 blocks, carrying out Gamma normalization processing, and carrying out parameter optimization on a Gamma correction value; the normalization formula is as follows:
τ_ij′ = τ_ij / √(‖τ_ij‖² + ε²)
f = [τ_11′, τ_12′, …, τ_ij′, …]
in the formulas, τ denotes the feature vector of a block, ε is a small constant, τ_ij denotes the i-th block in the j-th cell of the target image, and f denotes the target image after the normalization of all block feature vectors is completed;
2-2) selecting a detection window from the normalized image, selecting a detection window having an aspect ratio equal to that of the image and not more than half the size of the image;
2-3) selecting blocks from the window, and selecting rectangular blocks with the same length and width according to the detection window;
2-4) dividing cell units in the block, and dividing the block by using square cells with 8 × 8 pixel size as minimum units for feature extraction in the block in a rectangular block;
2-5) performing directional projection in the cell, dividing 9 directions in the cell, extracting direction information in an angle range of every 20 degrees, wherein the direction information is obtained by performing convolution operation on the gray image I and the gradient template U in the x horizontal direction and the y vertical direction, and the mathematical formula is as follows:
G_x(x, y) = H(x+1, y) - H(x-1, y)
G_y(x, y) = H(x, y+1) - H(x, y-1)
G(x, y) = √(G_x(x, y)² + G_y(x, y)²)
α(x, y) = arctan(G_y(x, y) / G_x(x, y))
in the formulas, H(x, y) denotes the gray value at the corresponding coordinate, G_x denotes the horizontal gradient value, G_y denotes the vertical gradient value, G denotes the gradient magnitude, and α denotes the gradient direction;
2-6) carrying out normalization in the cell, counting the actual direction angle quantity of each direction angle range in the cell to obtain a direction histogram, and selecting the direction angle with the most concentrated angle direction as the direction of the cell;
2-7) constructing HOG characteristics in the block, counting the actual direction angle quantity of the direction angle range of each cell in the block to obtain a direction histogram, and selecting the direction angle with the most concentrated angle direction as the direction of the block;
2-8) if the last block is not reached, returning to the step 2-3);
2-9) if the last window is not reached, returning to the step 2-2), otherwise, obtaining a directional gradient histogram;
2-10) performing single-line convolution, inputting an original image, and performing feature extraction on the original image by using a single-line convolution layer to obtain a convolution feature map;
2-11) feature splicing and parallel fusion, performing dimension connection of the convolution feature map with the direction gradient histogram, and connecting the two feature maps in the third dimension to obtain a 28 × 28 × 1 image.
3. The histogram of oriented gradients and improved capsule network based object detection method of claim 1, wherein: the specific steps of improving the capsule network algorithm in the step 3) are as follows:
3-1) extracting comprehensive characteristics by using a parallel convolution network;
3-2) generating a feature vector by utilizing a redundancy removing capsule network;
and 3-3) restoring the original image by using a deconvolution image reconstruction network, and evaluating the network loss.
4. The histogram of oriented gradients and improved capsule network based object detection method of claim 3, wherein: the parallel convolution network algorithm in the step 3-1) comprises the following specific steps:
3-1-1) firstly, using a parallel convolutional neural network as the feature extraction network, using the parallel-fused image, of size 28 × 28 × 1, as input, wherein the parallel convolutional neural network uses 4 convolution kernel sizes in its convolution layer, namely 3, 5, 7 and 9, the number of kernels of each size is 32, and the stride is 2;
3-1-2) filling the boundary, adjusting the padding size to pad the boundary of the original matrix;
3-1-3) feature extraction, wherein the nonlinear function of the feature extraction layer is the PReLU function, with the mathematical formula:
PReLU(x) = max(0, x) + α * min(0, x)
in the formula, α is a learnable coefficient;
3-1-4) connecting the feature tensors in the third dimension to obtain a 14 × 14 × 128 feature tensor.
5. The histogram of oriented gradients and improved capsule network based object detection method of claim 3, wherein: the redundancy removing capsule network algorithm in the step 3-2) comprises the following specific steps:
3-2-1) inputting redundancy removal, adopting the output of the parallel convolution network as the input of a redundancy removal main capsule network, removing redundant capsules by using a 1 x 1 convolution kernel so that a 16 x 16 feature diagram after feature extraction is converted into 14 x 14 feature images, and reducing the number of capsules to 196;
3-2-2) inputting the capsule u_i;
3-2-3) multiplying the input capsule vector by the transformation matrix to obtain the prediction vector û_(j|i), with the mathematical formula:
û_(j|i) = W_ij · u_i
in the formula, W_ij is the transformation matrix;
3-2-4) carrying out a weighted summation of the vectors û_(j|i) with the coupling coefficients c_ij to obtain the weighted sum s_j, with the mathematical formula:
s_j = Σ_i c_ij · û_(j|i)
3-2-5) compressing s_j with a nonlinear function and propagating it forward, with the mathematical formula:
v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖)
in the formula, s_j denotes the weighted sum and v_j denotes its nonlinearly compressed output;
3-2-6) updating the coupling coefficients c_ij with the softmax function, with the mathematical formula:
c_ij = exp(b_ij) / Σ_k exp(b_ik),  with  b_ij ← b_ij + û_(j|i) · v_j
in the formula, b_ij is the routing logit updated by the dynamic routing and is initialized to 0, v_j denotes the nonlinearly compressed output, and û_(j|i) is the vector obtained by multiplying the capsule vector by the transformation matrix;
3-2-7) repeating the update of v_j; when the number of updates k reaches the number of capsules, the finally output v_j is the feature vector characterizing the j-th class; otherwise, returning to step 3-2-2).
6. The histogram of oriented gradients and improved capsule network based object detection method of claim 3, wherein: the deconvolution image reconstruction network algorithm in the step 3-3) comprises the following specific steps:
3-3-1) image input, wherein the input image size is 14 × 14, the feature input is 5 × 5, the convolution kernel sizes are 3, 5, 7 and 9, the number of kernels of each size is 32, the stride is 2, and the output image is made 28 × 28 by adjusting the padding, the output image size after the deconvolution being:
n_out = s · (n_in - 1) + k - 2p
in the formula, n_out denotes the output image size, s the stride, n_in the input image size, k the convolution kernel size, and p the padding, i.e. the fill size;
3-3-2) splitting the capsule information, wherein the 6272 neurons of the capsule layer are converted by a fully connected layer into a 14 × 14 × 32 tensor, and this tensor is combined with the parallel convolution network outputs by connection in the third dimension such that the 14 × 14 × 32 tensor becomes a 14 × 14 × 160 tensor;
3-3-3) redundancy-removal deconvolution, wherein redundancy is removed from the 14 × 14 × 160 tensor with a 1 × 1 convolution kernel to obtain a 14 × 14 × 32 tensor, and a deconvolution operation with the kernels corresponding to the parallel convolution layer is performed, thereby generating the final reconstructed image.
7. The histogram of oriented gradients and improved capsule network based object detection method of claim 1, wherein: the step 4) of extracting the feature map of the center point of the detection frame and the feature map of the scale of the detection frame to form the corresponding target frame and outputting the detection result specifically comprises the following steps:
4-1) image input, using the reconstructed image as the input of the convolution layer with the size of 3 x 3 and the output channel of 256;
4-2) extracting a central feature map, extracting a feature map of the central point of the detection frame by using a 1 x 1 convolution kernel, marking the coordinate of the central point of the object as a positive value, and marking the coordinate of the central point of the non-object as a negative value;
4-3) extracting a scale feature map, and extracting a detection frame scale feature map by using parallel 1-by-1 convolution kernels, wherein scales in the scale feature map are the length and width of a detection frame;
4-4) outputting an image, fusing the center feature map and the scale feature map to complete object target detection.
CN202210234245.5A 2022-03-10 2022-03-10 Target detection method based on direction gradient histogram and improved capsule network Pending CN114677558A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210234245.5A CN114677558A (en) 2022-03-10 2022-03-10 Target detection method based on direction gradient histogram and improved capsule network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210234245.5A CN114677558A (en) 2022-03-10 2022-03-10 Target detection method based on direction gradient histogram and improved capsule network

Publications (1)

Publication Number Publication Date
CN114677558A true CN114677558A (en) 2022-06-28

Family

ID=82073153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210234245.5A Pending CN114677558A (en) 2022-03-10 2022-03-10 Target detection method based on direction gradient histogram and improved capsule network

Country Status (1)

Country Link
CN (1) CN114677558A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024065536A1 (en) * 2022-09-29 2024-04-04 Intel Corporation Methods and apparatus for image segmentation on small datasets
CN117953316A (en) * 2024-03-27 2024-04-30 湖北楚天龙实业有限公司 Image quality inspection method and system based on artificial intelligence


Similar Documents

Publication Publication Date Title
CN111461291B (en) Long-distance pipeline inspection method based on YOLOv3 pruning network and deep learning defogging model
CN110059698B (en) Semantic segmentation method and system based on edge dense reconstruction for street view understanding
CN110070091B (en) Semantic segmentation method and system based on dynamic interpolation reconstruction and used for street view understanding
CN111914838B (en) License plate recognition method based on text line recognition
CN113344806A (en) Image defogging method and system based on global feature fusion attention network
CN111507275B (en) Video data time sequence information extraction method and device based on deep learning
CN109753959B (en) Road traffic sign detection method based on self-adaptive multi-scale feature fusion
CN109886159B (en) Face detection method under non-limited condition
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN110969171A (en) Image classification model, method and application based on improved convolutional neural network
CN114022770A (en) Mountain crack detection method based on improved self-attention mechanism and transfer learning
CN113449691A (en) Human shape recognition system and method based on non-local attention mechanism
CN111612024A (en) Feature extraction method and device, electronic equipment and computer-readable storage medium
CN116052016A (en) Fine segmentation detection method for remote sensing image cloud and cloud shadow based on deep learning
CN111640116A (en) Aerial photography graph building segmentation method and device based on deep convolutional residual error network
CN113269224A (en) Scene image classification method, system and storage medium
CN117197763A (en) Road crack detection method and system based on cross attention guide feature alignment network
CN113298817A (en) High-accuracy semantic segmentation method for remote sensing image
CN115294356A (en) Target detection method based on wide area receptive field space attention
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
CN115393928A (en) Face recognition method and device based on depth separable convolution and additive angle interval loss
CN115222754A (en) Mirror image segmentation method based on knowledge distillation and antagonistic learning
CN114677558A (en) Target detection method based on direction gradient histogram and improved capsule network
CN111027472A (en) Video identification method based on fusion of video optical flow and image space feature weight
CN114519819A (en) Remote sensing image target detection method based on global context awareness

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination