CN114677558A - Target detection method based on direction gradient histogram and improved capsule network - Google Patents
- Publication number
- CN114677558A (publication number); application number CN202210234245.5A
- Authority
- CN
- China
- Prior art keywords
- image
- convolution
- feature
- network
- parallel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a target detection method based on a histogram of oriented gradients and an improved capsule network. The improved capsule network extracts comprehensive features with a parallel convolution network, forms feature vectors through a redundancy-removing capsule network, realizes image reconstruction with a deconvolution image reconstruction network, and trains the network model. Finally, a 3 × 3 × 256 convolution layer and two parallel 1 × 1 convolution kernels are used to extract a feature map of the detection-frame center points and a feature map of the detection-frame scales, forming the corresponding target frames and outputting the detection result.
Description
Technical Field
The invention relates to the technical field of target detection, in particular to a target detection method based on a direction gradient histogram and an improved capsule network.
Background
Traditional convolutional neural networks extract target features through convolution operations and learn through back propagation to accomplish target detection, and they have achieved remarkable results on target detection tasks. However, conventional convolutional neural networks pay insufficient attention to information such as the relative positions and orientations of elements in an image and lose information in the pooling layers, while factors such as occlusion by obstacles and severe weather can seriously affect target detection and recognition. The capsule neural network introduces the concept of the capsule to replace the neurons of part of a convolutional neural network, expanding the scalars of a traditional neural network into vectors and effectively overcoming these shortcomings. Applied to target detection, the capsule neural network therefore achieves a higher overall recognition rate, with accuracy and robustness sufficient for detection tasks under different influencing factors.
However, a conventional capsule network allows only one capsule of a given type at a given location, so two objects of the same type cannot be detected if they are too close to each other. The paper "Vehicle type classification of a capsule network with gradient histogram convolution features under traffic monitoring" preprocesses the original image with a histogram of oriented gradients, which alleviates this problem to some extent; but because the structure of the capsule network itself is not improved, problems remain such as insufficiently rich extraction of original-image information, redundancy in the capsule structure, and high algorithm complexity.
Disclosure of Invention
The invention aims to solve the technical problems of the prior art, namely that original-image information is not extracted richly enough, the capsule structure contains redundancy, and algorithm complexity is high, and provides a target detection method based on a histogram of oriented gradients and an improved capsule network.
A target detection method based on a direction gradient histogram and an improved capsule network comprises the following steps:
1) obtaining a target original image, marking the target position by using a marking tool, and then randomly selecting different images as a training set;
2) fusing the histogram of oriented gradients (HOG) of the original image with a convolution feature map in parallel, so as to combine the edge-contour features of the image with the receptive-field features of the convolution kernel, and using the fused result as the input of the improved capsule network;
3) the improved capsule network extracts comprehensive features by using a parallel convolution network, forms feature vectors by removing redundant capsule networks, and realizes image reconstruction by using a deconvolution image reconstruction network;
4) extracting a feature map of the detection-frame center points and a feature map of the detection-frame scales using a 3 × 3 × 256 convolution layer and two parallel 1 × 1 convolution kernels, forming the corresponding target frames and outputting the detection result.
Preferably, the parallel fusion algorithm of the Histogram of Oriented Gradients (HOG) and the convolution feature map in the step 2) specifically comprises the following steps:
2-1) Normalization. First divide the target image into 4 cells, divide each cell into 9 blocks, apply Gamma normalization processing, and optimize the Gamma correction value. The normalization formula is as follows:

f = τ / sqrt(‖τ‖² + ε²)

where τ represents the feature vector of a block, ε takes a small constant value, τ_ij denotes the ith block in the jth cell of the target image, and f represents the target image after all block feature vectors have been normalized.
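As an illustration, the block normalization above can be sketched in a few lines of NumPy (the function name and the value of ε are assumptions; the patent does not fix them):

```python
import numpy as np

def normalize_block(tau, eps=1e-5):
    """Normalize one block feature vector: f = tau / sqrt(||tau||^2 + eps^2).

    `tau` is the feature vector of a block; `eps` is the small constant
    from the formula above (its exact value is an illustrative assumption).
    """
    tau = np.asarray(tau, dtype=np.float64)
    return tau / np.sqrt(np.sum(tau ** 2) + eps ** 2)

block = np.array([3.0, 4.0])   # toy block feature vector with norm 5
f = normalize_block(block)     # result has (near-)unit norm
```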
2-2) selecting a detection window from the normalized image. A detection window is selected that is equal to the image aspect ratio and does not exceed one-half the image size.
2-3) selecting a block from the window. And selecting rectangular blocks with the same length and width according to the detection window.
2-4) dividing cell cells within a block. The block is divided within a rectangular block using square cells of 8 × 8 pixel size as the minimum unit for feature extraction within the block.
2-5) performing directional projection in the cell. 9 directions are divided in the cell, and direction information is extracted every 20 degrees as an angle range. The direction information is obtained by performing convolution operation on the gray level image I and the gradient template U in the x horizontal direction and the y vertical direction, and the mathematical formula is as follows:
Gx(x,y) = H(x+1,y) − H(x−1,y)

Gy(x,y) = H(x,y+1) − H(x,y−1)

G(x,y) = sqrt(Gx(x,y)² + Gy(x,y)²)

α(x,y) = arctan(Gy(x,y) / Gx(x,y))

where H(x,y) represents the gray value at the corresponding coordinate, Gx represents the horizontal gradient value, Gy represents the vertical gradient value, G represents the gradient magnitude, and α represents the gradient direction.
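A minimal NumPy sketch of the central-difference gradients, magnitude, and direction defined above (treating H[y, x] as the gray image and leaving a one-pixel border at zero; all names are illustrative):

```python
import numpy as np

def gradients(H):
    """Gx(x,y) = H(x+1,y) - H(x-1,y), Gy(x,y) = H(x,y+1) - H(x,y-1),
    with gradient magnitude G and direction alpha (in degrees).
    H is indexed as H[y, x]; border pixels are left at zero."""
    H = np.asarray(H, dtype=np.float64)
    Gx = np.zeros_like(H)
    Gy = np.zeros_like(H)
    Gx[:, 1:-1] = H[:, 2:] - H[:, :-2]   # horizontal central difference
    Gy[1:-1, :] = H[2:, :] - H[:-2, :]   # vertical central difference
    G = np.hypot(Gx, Gy)                 # gradient magnitude
    alpha = np.degrees(np.arctan2(Gy, Gx))
    return Gx, Gy, G, alpha

ramp = np.tile(np.arange(5.0), (5, 1))  # intensity grows left to right
Gx, Gy, G, alpha = gradients(ramp)
```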
2-6) normalization within the cell. And counting the actual direction angle quantity of each direction angle range in the cell to obtain a direction histogram, and selecting the direction angle with the most concentrated angle direction as the direction of the cell.
2-7) building HOG features within a block. And counting the actual direction angle number of the direction angle range of each cell in the block to obtain a direction histogram, and selecting the direction angle with the most concentrated angle direction as the direction of the block.
2-8) if the last block is not reached, returning to step 2-3).
2-9) if the last window is not reached, returning to the step 2-2), otherwise, obtaining a direction gradient histogram.
2-10) single row convolution. Inputting an original image, and performing feature extraction on the original image by using a single-row convolution layer to obtain a convolution feature map.
2-11) Feature splicing and parallel fusion. The convolution feature map is dimension-connected with the histogram of oriented gradients: the two feature maps are concatenated along the third dimension, giving a fused image of size 28 × 28 × 2.
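The third-dimension splice in step 2-11) amounts to a channel concatenation; a sketch with hypothetical 28 × 28 single-channel stand-ins for the two maps:

```python
import numpy as np

# Stand-ins for the single-layer convolution feature map and the HOG map
# (random values; in the method these come from steps 2-1) through 2-10)).
conv_map = np.random.rand(28, 28, 1)
hog_map = np.random.rand(28, 28, 1)

# Connect the two feature maps along the third (channel) dimension.
fused = np.concatenate([conv_map, hog_map], axis=2)
```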
The pooling-layer structure adopted in traditional convolutional neural networks does not consider spatial relative relations, causing partial loss of valuable information in that layer. To solve this problem, the capsule network uses a digital capsule layer in place of the pooling layer and proposes the vector neuron, which stores information such as direction and position in vector form and propagates it through the network, making the capsule network sensitive to the position and orientation of elements in an image.
The improved capsule network based on the histogram of oriented gradients extracts features through a convolution layer and the HOG in parallel; its image preprocessing part adopts HOG-C features. While retaining the advantages of the original capsule network, the introduction of the histogram of oriented gradients strengthens the extraction of edge-feature information from the target detection image, and the gradient direction and magnitude increase the discriminability between two objects that are too close to each other, solving to some extent the problem that two objects of the same type cannot be detected.
Preferably, the step 3) of improving the capsule network algorithm specifically comprises the following steps:
3-1) extracting comprehensive characteristics by utilizing a parallel convolution network.
3-2) generating the feature vector by utilizing the redundancy removing capsule network.
3-3) restoring the original image by using the deconvolution image reconstruction network, and evaluating the network loss.
Preferably, the parallel convolution network algorithm in step 3-1) specifically comprises the following steps:
3-1-1) First, a parallel convolutional neural network is used as the feature extraction network. The parallel-fused image is used as input, with a size of 28 × 28 × 2. The convolution layer of the parallel convolutional neural network adopts 4 convolution kernel sizes, namely 3, 5, 7 and 9, with 32 kernels of each size and a stride of 2.
3-1-2) boundary filling. And adjusting padding size to carry out boundary filling on the original matrix.
3-1-3) feature extraction. The nonlinear function of the feature extraction layer adopts a PReLU function, and the mathematical formula is as follows:
PReLU(x)=max(0,x)+α*min(0,x)
where α is a learnable slope coefficient.
3-1-4) Feature tensor concatenation. The feature tensors are connected along the third dimension, giving a feature tensor of size 14 × 14 × 128.
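The PReLU nonlinearity and the channel bookkeeping of the parallel branch can be checked with a small sketch (α = 0.1 is an illustrative value; in the network α is learned):

```python
import numpy as np

def prelu(x, alpha=0.1):
    """PReLU(x) = max(0, x) + alpha * min(0, x)."""
    x = np.asarray(x, dtype=np.float64)
    return np.maximum(0.0, x) + alpha * np.minimum(0.0, x)

# Four parallel kernel sizes, 32 filters each, stride 2 on a 28 x 28 input:
kernel_sizes = (3, 5, 7, 9)
out_channels = 32 * len(kernel_sizes)   # branches concatenated along dim 3
out_spatial = 28 // 2                   # stride 2 halves the spatial size
y = prelu(np.array([-2.0, 0.0, 3.0]))
```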
Preferably, the redundancy removing capsule network algorithm in the step 3-2) specifically comprises the following steps:
3-2-1) Input redundancy removal. The output of the parallel convolution network is adopted as the input of the redundancy-removing primary capsule network, and redundant capsules are removed with a 1 × 1 convolution kernel, so that the 16 × 16 feature map after feature extraction is converted into a 14 × 14 feature image and the number of capsules is reduced to 196.
3-2-2) Input the capsule vector u_i.
3-2-3) The vector û_j|i is obtained by multiplying the input capsule vector by the transformation matrix. The mathematical formula is as follows:

û_j|i = W_ij · u_i

where W_ij is the transformation matrix.
3-2-4) The vectors û_j|i are weighted by the coupling coefficients c_ij and summed to obtain the weighted sum s_j. The mathematical formula is as follows:

s_j = Σ_i c_ij · û_j|i
3-2-5) s_j is compressed with a nonlinear function and propagated forward. The mathematical formula is as follows:

v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖)

where s_j represents the weighted sum and v_j represents the output of the nonlinear compression (squashing) function.
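A sketch of the squashing function in step 3-2-5), which maps any s_j to a vector of length strictly below 1 (the small eps guard against division by zero is an implementation assumption):

```python
import numpy as np

def squash(s, eps=1e-9):
    """v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||): short vectors shrink
    toward zero, long vectors approach (but never reach) unit length."""
    s = np.asarray(s, dtype=np.float64)
    norm_sq = np.sum(s ** 2)
    return (norm_sq / (1.0 + norm_sq)) * s / (np.sqrt(norm_sq) + eps)

v = squash(np.array([0.0, 3.0, 4.0]))   # ||s|| = 5  ->  ||v|| = 25/26
```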
3-2-6) The coupling coefficient c_ij is updated using the softmax formula. The mathematical formula is as follows:

c_ij = exp(b_ij) / Σ_k exp(b_ik), with b_ij ← b_ij + û_j|i · v_j

where b_ij denotes the routing logit updated by dynamic routing and is initialized to 0, v_j represents the output of the nonlinear compression function, and the vector û_j|i is the product of the capsule vector and the transformation matrix.
3-2-7) v_j is updated until the number of updates reaches k, the number of capsules; the finally output v_j is the feature vector characterizing the jth class. Otherwise, return to step 3-2-2).
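Putting steps 3-2-2) through 3-2-7) together, routing-by-agreement can be sketched end to end (toy sizes; the actual network routes 196 primary capsules, and 3 iterations is a common choice, not specified here):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / (np.sqrt(norm_sq) + eps)

def dynamic_routing(u_hat, iterations=3):
    """u_hat[i, j, :] is the prediction of input capsule i for output
    capsule j (i.e. W_ij u_i). Returns the output vectors v_j."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                      # logits, initialized to 0
    for _ in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over j
        s = (c[:, :, None] * u_hat).sum(axis=0)      # s_j = sum_i c_ij u_hat
        v = squash(s)                                # v_j
        b = b + (u_hat * v[None, :, :]).sum(axis=-1) # b_ij += u_hat . v_j
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 8))   # 6 input capsules, 3 classes, 8-D
v = dynamic_routing(u_hat)
```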
Preferably, the deconvolution image reconstruction network algorithm in step 3-3) specifically comprises the following steps:
3-3-1) Image input. The input image size is 14 × 14, the feature input is 5 × 5, the convolution kernel sizes are 3, 5, 7 and 9 respectively, the number of kernels is 32, the stride is 2, and the output image is made 28 × 28 by adjusting the padding. The output image size after deconvolution is given by the following formula:

o = s · (i − 1) + k − 2p

where o denotes the output image size, s denotes the stride, i denotes the input image size, k denotes the convolution kernel size, and p denotes the padding, i.e., the fill size.
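The transposed-convolution size relation above is easy to verify numerically (the parameter combinations below are illustrative; the patent adjusts padding per kernel size so that each branch outputs 28 × 28):

```python
def deconv_output_size(i, k, s, p):
    """o = s * (i - 1) + k - 2 * p for a transposed convolution with
    input size i, kernel size k, stride s and padding p."""
    return s * (i - 1) + k - 2 * p

# Two ways to go from 14 x 14 back to 28 x 28 with stride 2:
o1 = deconv_output_size(14, 4, 2, 1)   # kernel 4, padding 1
o2 = deconv_output_size(14, 2, 2, 0)   # kernel 2, padding 0
```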
3-3-2) Splitting the capsule information. The 6272 neurons of the capsule layer are reshaped by the fully connected layer into a 14 × 14 × 32 tensor; combined with the third-dimension connection used in the parallel convolutional network, this 14 × 14 × 32 tensor becomes a 14 × 14 × 160 tensor.
3-3-3) Redundancy-removing deconvolution. Redundancy is removed from the 14 × 14 × 160 tensor using a 1 × 1 convolution kernel, finally giving a 14 × 14 × 32 tensor, and a deconvolution operation is performed with the convolution kernels corresponding to the parallel convolution layer, thereby generating the final reconstructed image.
Preferably, the step 4) of extracting the feature map of the center point of the detection frame and the feature map of the scale of the detection frame, forming the corresponding target frame and outputting the detection result specifically comprises the steps of:
4-1) Image input. The reconstructed image is used as the input of a convolution layer with kernel size 3 × 3 and 256 output channels.
4-2) Extracting the center feature map. A 1 × 1 convolution kernel extracts the feature map of the detection-frame center points, where the coordinates of object center points are marked as positive values and the coordinates of non-object center points are marked as negative values.
4-3) Extracting the scale feature map. A parallel 1 × 1 convolution kernel extracts the detection-frame scale feature map, in which the scale is the length and width of the detection frame.
4-4) Output the image. The center feature map and the scale feature map are fused to complete object target detection.
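As a sketch of how the two head outputs combine, the following hypothetical decoder turns a center-point heatmap plus a two-channel (width, height) scale map into target frames (the function name, threshold, and map layouts are assumptions; the patent does not specify decoding in this detail):

```python
import numpy as np

def decode_boxes(center_map, scale_map, thresh=0.0):
    """Collect (x, y, w, h) frames at every location whose center score
    exceeds `thresh` (positions marked positive per step 4-2))."""
    boxes = []
    ys, xs = np.where(center_map > thresh)
    for y, x in zip(ys, xs):
        w, h = scale_map[y, x]
        boxes.append((float(x), float(y), float(w), float(h)))
    return boxes

center = np.full((28, 28), -1.0)     # non-object centers: negative values
center[10, 12] = 1.0                 # one object center: positive value
scales = np.zeros((28, 28, 2))
scales[10, 12] = (8.0, 16.0)         # detection-frame width and height
boxes = decode_boxes(center, scales)
```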
The invention has the beneficial effects that:
the invention aims at the problem of insufficient extraction of original image information in the prior art, improves the capsule network structure, adds a parallel convolution network layer in front of a main capsule layer, and after the parallel convolution network extracts convolution characteristics, the convolution characteristic graph and the direction gradient histogram of an original image are directly fused in a dimension vector connection mode instead of a network mode, so that the input information of the network is richer and closer to the original image information.
The invention aims at the problems of redundancy and high algorithm complexity of the existing capsule structure, and performs redundancy removal operation on the main capsule. The redundancy-removing main capsule network simplifies the network structure, reduces the parameters compared with the original main capsule network, optimizes the time required by network training and improves the algorithm efficiency.
Drawings
FIG. 1 is a schematic diagram of the overall structure of a target detection model according to the present invention;
FIG. 2 is a flow chart of a parallel model of Histogram of Oriented Gradients (HOG) and a convolution feature map in accordance with the present invention;
FIG. 3 is a diagram of the improved capsule network architecture of the present invention;
FIG. 4 is a schematic diagram of a parallel convolutional network in the present invention;
FIG. 5 is a schematic diagram of a redundancy elimination capsule network according to the present invention;
FIG. 6 is a schematic diagram of a deconvolution image reconstruction network in accordance with the present invention;
FIG. 7 is an original image;
FIG. 8 is a schematic diagram of HOG histogram processing images;
FIG. 9 is a schematic diagram of a detection result of a pedestrian image by the method.
Detailed Description
The technical solution of the present invention is described in detail below, but the scope of the present invention is not limited to the embodiments.
As shown in fig. 1, a target detection method based on histogram of oriented gradients and improved capsule network mainly includes the following steps:
1) the target original image is obtained as shown in fig. 7.
2) As shown in fig. 2, the histogram of the directional gradient of the original image is fused in parallel with the convolution feature map to combine the edge contour feature of the image with the view feature of the convolution kernel, and then the result is used as the input of the improved capsule network;
2-1) Normalization. First divide the target image into 4 cells, divide each cell into 9 blocks, apply Gamma normalization processing, and optimize the Gamma correction value. The normalization formula is as follows:

f = τ / sqrt(‖τ‖² + ε²)

where τ represents the feature vector of a block, ε takes a small constant value, τ_ij denotes the ith block in the jth cell of the target image, and f represents the target image after all block feature vectors have been normalized.
2-2) selecting a detection window from the normalized image. A detection window is selected that is equal to the image aspect ratio and does not exceed one-half the image size.
2-3) Select blocks from the window. Rectangular blocks with equal length and width are selected according to the detection window.
2-4) Divide cell units within a block. The block is divided within the rectangular block using square cells of 8 × 8 pixels as the minimum unit of feature extraction within the block.
2-5) Perform directional projection within the cell. 9 directions are divided in the cell, and direction information is extracted with an angle range of 20 degrees each. The direction information is obtained by convolving the gray image I with the gradient template U in the horizontal x and vertical y directions, and the mathematical formula is as follows:
Gx(x,y) = H(x+1,y) − H(x−1,y)

Gy(x,y) = H(x,y+1) − H(x,y−1)

G(x,y) = sqrt(Gx(x,y)² + Gy(x,y)²)

α(x,y) = arctan(Gy(x,y) / Gx(x,y))

where H(x,y) represents the gray value at the corresponding coordinate, Gx represents the horizontal gradient value, Gy represents the vertical gradient value, G represents the gradient magnitude, and α represents the gradient direction.
2-6) normalization within the cell. The actual number of direction angles in each direction angle range is counted in the cell to obtain a direction histogram, as shown in fig. 8. And selecting the direction angle with the most concentrated angle directions as the direction of the cell.
2-7) building HOG features within blocks. And counting the actual direction angle number of the direction angle range of each cell in the block to obtain a direction histogram, and selecting the direction angle with the most concentrated angle direction as the direction of the block.
2-8) If the last block has not been reached, return to step 2-3).
2-9) If the last window has not been reached, return to step 2-2); otherwise, the histogram of oriented gradients is obtained.
2-10) single row convolution. Inputting an original image, and performing feature extraction on the original image by using a layer of single-line convolution layer to obtain a convolution feature map.
2-11) Feature splicing and parallel fusion. The convolution feature map is dimension-connected with the histogram of oriented gradients: the two feature maps are concatenated along the third dimension, giving a fused image of size 28 × 28 × 2.
3) As shown in fig. 3 and 4, the improved capsule network extracts comprehensive features by using a parallel convolution network, forms feature vectors by using a redundancy-removing capsule network, and realizes image reconstruction by using a deconvolution image reconstruction network;
3-1) extracting comprehensive characteristics by using a parallel convolution network;
3-1-1) First, a parallel convolutional neural network is used as the feature extraction network. The parallel-fused image is used as input, with a size of 28 × 28 × 2. The convolution layer of the parallel convolutional neural network adopts 4 convolution kernel sizes, namely 3, 5, 7 and 9, with 32 kernels of each size and a stride of 2.
3-1-2) boundary filling. And adjusting padding size to carry out boundary filling on the original matrix.
3-1-3) feature extraction. The nonlinear function of the feature extraction layer adopts a PReLU function, and the mathematical formula of the PReLU function is as follows:
PReLU(x)=max(0,x)+α*min(0,x)
where α represents a learnable slope coefficient.
3-1-4) Feature tensor connection. The feature tensors are connected along the third dimension, giving a feature tensor of size 14 × 14 × 128.
3-2) generating feature vectors using the redundancy removal capsule network as in FIG. 5;
3-2-1) Input redundancy removal. The output of the parallel convolution network is adopted as the input of the redundancy-removing primary capsule network, and redundant capsules are removed with a 1 × 1 convolution kernel, so that the 16 × 16 feature map after feature extraction is converted into a 14 × 14 feature image and the number of capsules is reduced to 196.
3-2-2) Input the capsule vector u_i.
3-2-3) The vector û_j|i is obtained by multiplying the input capsule vector by the transformation matrix. The mathematical formula is as follows:

û_j|i = W_ij · u_i

where W_ij is the transformation matrix.
3-2-4) The vectors û_j|i are weighted by the coupling coefficients c_ij and summed to obtain the weighted sum s_j. The mathematical formula is as follows:

s_j = Σ_i c_ij · û_j|i
3-2-5) s_j is compressed with a nonlinear function and propagated forward. The mathematical formula is as follows:

v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖)

where s_j represents the weighted sum and v_j represents the output of the nonlinear compression (squashing) function.
3-2-6) The coupling coefficient c_ij is updated using the softmax formula. The mathematical formula is as follows:

c_ij = exp(b_ij) / Σ_k exp(b_ik), with b_ij ← b_ij + û_j|i · v_j

where b_ij denotes the routing logit updated by dynamic routing and is initialized to 0, v_j represents the output of the nonlinear compression function, and the vector û_j|i is the product of the capsule vector and the transformation matrix.
3-2-7) v_j is updated until the number of updates reaches k, the number of capsules; the finally output v_j is the feature vector characterizing the jth class. Otherwise, return to step 3-2-2).
3-3) as in FIG. 6, the original image is restored using the deconvolution image reconstruction network.
3-3-1) Image input. The input image size is 14 × 14, the feature input is 5 × 5, the convolution kernel sizes are 3, 5, 7 and 9 respectively, the number of kernels is 32, the stride is 2, and the output image is made 28 × 28 by adjusting the padding. The output image size after deconvolution is given by the following formula:

o = s · (i − 1) + k − 2p

where o denotes the output image size, s denotes the stride, i denotes the input image size, k denotes the convolution kernel size, and p denotes the padding, i.e., the fill size.
3-3-2) Splitting the capsule information. The 6272 neurons of the capsule layer are reshaped by the fully connected layer into a 14 × 14 × 32 tensor; combined with the third-dimension connection used in the parallel convolutional network, this 14 × 14 × 32 tensor becomes a 14 × 14 × 160 tensor.
3-3-3) Redundancy-removing deconvolution. Redundancy is removed from the 14 × 14 × 160 tensor using a 1 × 1 convolution kernel, finally giving a 14 × 14 × 32 tensor, and a deconvolution operation is performed with the convolution kernels corresponding to the parallel convolution layer, thereby generating the final reconstructed image.
4) Extracting a feature map of the detection-frame center points and a feature map of the detection-frame scales using a 3 × 3 × 256 convolution layer and two parallel 1 × 1 convolution kernels, forming the corresponding target frames and outputting the detection result.
4-1) Image input. The reconstructed image is used as the input of a convolution layer with kernel size 3 × 3 and 256 output channels.
4-2) Center feature map. A 1 × 1 convolution kernel extracts the feature map of the detection-frame center points, marking the coordinates of object center points as positive values and the coordinates of non-object center points as negative values.
4-3) Scale feature map. A parallel 1 × 1 convolution kernel extracts the detection-frame scale feature map, in which the scale is the length and width of the detection frame.
4-4) Output the image. The center feature map and the scale feature map are fused to complete object target detection, as shown in fig. 9.
Claims (7)
1. A target detection method based on a direction gradient histogram and an improved capsule network is characterized by comprising the following steps:
1) obtaining a target original image, marking the target position by using a marking tool, and then randomly selecting different images as a training set;
2) Merging the direction gradient histogram of the original image and the convolution characteristic graph in parallel to combine the edge contour characteristic of the image and the visual field characteristic of the convolution kernel, and taking the edge contour characteristic and the visual field characteristic as the input of an improved capsule network;
3) the improved capsule network extracts comprehensive features by using a parallel convolution network, forms feature vectors by removing redundant capsule networks, and realizes image reconstruction by using a deconvolution image reconstruction network;
4) extracting a feature map of the detection-frame center points and a feature map of the detection-frame scales by using a 3 × 3 × 256 convolution layer and two parallel 1 × 1 convolution kernels to form the corresponding target frames and output the detection result.
2. The histogram of oriented gradients and improved capsule network based object detection method of claim 1, wherein: the algorithm for parallel fusion of the direction gradient histogram and the convolution feature map in the step 2) comprises the following specific steps:
2-1) normalization; firstly dividing the target image into 4 cells, dividing each cell into 9 blocks, carrying out Gamma normalization processing, and carrying out parameter optimization on the Gamma correction value; the normalization formula is as follows:

f = τ / sqrt(‖τ‖² + ε²)

where τ represents the feature vector of a block, ε takes a small constant value, τ_ij represents the ith block in the jth cell of the target image, and f represents the target image after the normalization of all block feature vectors is completed;
2-2) selecting a detection window from the normalized image, selecting a detection window having an aspect ratio equal to that of the image and not more than half the size of the image;
2-3) selecting blocks from the window, and selecting rectangular blocks with the same length and width according to the detection window;
2-4) dividing cell units in the block, and dividing the block by using square cells with 8 × 8 pixel size as minimum units for feature extraction in the block in a rectangular block;
2-5) performing directional projection in the cell, dividing 9 directions in the cell, extracting direction information in an angle range of every 20 degrees, wherein the direction information is obtained by performing convolution operation on the gray image I and the gradient template U in the x horizontal direction and the y vertical direction, and the mathematical formula is as follows:
Gx(x,y) = H(x+1,y) − H(x−1,y)

Gy(x,y) = H(x,y+1) − H(x,y−1)

G(x,y) = sqrt(Gx(x,y)² + Gy(x,y)²)

α(x,y) = arctan(Gy(x,y) / Gx(x,y))

in the formulas, H(x,y) represents the gray value at the corresponding coordinate, Gx represents the horizontal gradient value, Gy represents the vertical gradient value, G represents the gradient magnitude, and α represents the gradient direction;
2-6) carrying out normalization in the cell, counting the actual direction angle quantity of each direction angle range in the cell to obtain a direction histogram, and selecting the direction angle with the most concentrated angle direction as the direction of the cell;
2-7) constructing HOG characteristics in the block, counting the actual direction angle quantity of the direction angle range of each cell in the block to obtain a direction histogram, and selecting the direction angle with the most concentrated angle direction as the direction of the block;
2-8) if the last block is not reached, returning to the step 2-3);
2-9) if the last window is not reached, returning to the step 2-2), otherwise, obtaining a directional gradient histogram;
2-10) performing single-line convolution, inputting an original image, and performing feature extraction on the original image by using a single-line convolution layer to obtain a convolution feature map;
2-11) feature splicing and parallel fusion: performing dimension connection on the convolution feature map and the histogram of oriented gradients, and connecting the two feature maps along the third dimension to obtain an image of 28 × 28 × 2.
3. The histogram of oriented gradients and improved capsule network based object detection method of claim 1, wherein: the specific steps of improving the capsule network algorithm in the step 3) are as follows:
3-1) extracting comprehensive characteristics by using a parallel convolution network;
3-2) generating a feature vector by utilizing a redundancy removing capsule network;
and 3-3) restoring the original image by using a deconvolution image reconstruction network, and evaluating the network loss.
4. The histogram of oriented gradients and improved capsule network based object detection method of claim 3, wherein: the parallel convolution network algorithm in the step 3-1) comprises the following specific steps:
3-1-1) firstly, a parallel convolutional neural network is used as the feature extraction network, taking the parallel-fused image of size 28 × 28 × 2 as input; the convolution layer of the parallel convolutional neural network adopts 4 convolution kernel sizes, namely 3, 5, 7 and 9, with the number of kernels of each size selected as 32 and a stride of 2;
3-1-2) filling the boundary, and adjusting the padding size to fill the boundary of the original matrix;
3-1-3) feature extraction, wherein the nonlinear function of the feature extraction layer adopts a PReLU function, and the mathematical formula is as follows:
PReLU(x)=max(0,x)+α*min(0,x)
in the formula, α is a learnable coefficient controlling the slope for negative inputs;
3-1-4) the feature tensors are connected: the connection is realized along the third dimension, yielding a 14 × 14 × 128 feature tensor.
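The shape arithmetic of steps 3-1-1) to 3-1-4) can be checked with a short sketch. The padding choice `p = k // 2` is an assumption (the claim only says padding is adjusted to fill the boundary), and the PReLU here uses a single scalar α for illustration rather than a per-channel parameter:

```python
import numpy as np

def conv_out_size(i, k, s, p):
    """Standard strided-convolution output size: floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1

def prelu(x, alpha=0.1):
    """PReLU(x) = max(0, x) + alpha * min(0, x); alpha is the learnable
    negative-slope coefficient (a scalar here for illustration)."""
    return np.maximum(0, x) + alpha * np.minimum(0, x)

# four parallel branches, kernel sizes 3/5/7/9, 32 kernels each, stride 2;
# padding k // 2 keeps every branch's output at 14 x 14
branch_shapes = []
for k in (3, 5, 7, 9):
    side = conv_out_size(28, k, s=2, p=k // 2)
    branch_shapes.append((side, side, 32))

# concatenating the branches on the channel axis gives 14 x 14 x 128
channels = sum(c for _, _, c in branch_shapes)
```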
5. The histogram of oriented gradients and improved capsule network based object detection method of claim 3, wherein: the redundancy removing capsule network algorithm in the step 3-2) comprises the following specific steps:
3-2-1) redundancy-removal input: the output of the parallel convolution network is adopted as the input of the redundancy-removing primary capsule network, and redundant capsules are removed with a 1 × 1 convolution kernel so that the 16 × 16 feature map obtained after feature extraction is converted into a 14 × 14 feature map, reducing the number of capsules to 196;
3-2-2) inputting the capsule vector ui;
3-2-3) the prediction vector û(j|i) is obtained by multiplying the input capsule vector by a transformation matrix; the mathematical formula is as follows:
û(j|i) = Wij · ui
in the formula, Wij is the transformation matrix;
3-2-4) the prediction vectors û(j|i) are weighted by the coupling coefficients cij and summed to obtain the weighted sum sj; the mathematical formula is as follows:
sj = Σi cij û(j|i)
3-2-5) sj is compressed with a non-linear function and propagated forward; the mathematical formula is as follows:
vj = (‖sj‖² / (1 + ‖sj‖²)) · (sj / ‖sj‖)
in the formula, sj represents the weighted sum and vj represents the output of the non-linear compression function;
3-2-6) the coupling coefficients cij are updated using the softmax function; the mathematical formulas are as follows:
cij = exp(bij) / Σk exp(bik)
bij ← bij + û(j|i) · vj
in the formulas, bij represents the routing logit updated by dynamic routing and is initialized to 0, vj represents the output of the non-linear compression function, and û(j|i) is the prediction vector obtained by multiplying the capsule vector by the transformation matrix.
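Steps 3-2-3) to 3-2-6) describe the standard dynamic-routing loop of a capsule network. A minimal NumPy sketch, assuming prediction vectors û(j|i) are already computed and stored as a `(num_in, num_out, dim)` array (the three routing iterations and the function names are illustrative assumptions):

```python
import numpy as np

def squash(s):
    """Non-linear compression v = (|s|^2 / (1 + |s|^2)) * (s / |s|)."""
    norm2 = np.sum(s * s, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + 1e-9)

def routing(u_hat, iterations=3):
    """Dynamic routing over prediction vectors u_hat of shape
    (num_in, num_out, dim); the logits b start at 0."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))
    for _ in range(iterations):
        # c_ij = softmax of b_ij over the output capsules j
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        # s_j = sum_i c_ij * u_hat(j|i)
        s = np.einsum('ij,ijd->jd', c, u_hat)
        v = squash(s)
        # agreement update: b_ij <- b_ij + u_hat(j|i) . v_j
        b = b + np.einsum('ijd,jd->ij', u_hat, v)
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(196, 10, 16))  # 196 primary capsules, 10 outputs assumed
v = routing(u_hat)
# the squash function keeps every output capsule's length strictly below 1
```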
6. The histogram of oriented gradients and improved capsule network based object detection method of claim 3, wherein: the deconvolution image reconstruction network algorithm in the step 3-3) comprises the following specific steps:
3-3-1) image input: the size of the input image is 14 × 14 and the feature input is 5 × 5; the convolution kernel sizes are 3, 5, 7 and 9 respectively, the number of kernels is selected as 32 and the stride is 2, and the output image is made 28 × 28 by adjusting the padding; the output image size after deconvolution is:
o = s × (i − 1) + k − 2p
in the formula, o represents the output image size, s represents the stride, i represents the input image size, k represents the convolution kernel size, and p represents the padding size;
3-3-2) splitting of capsule information: the 6272 neurons in the capsule layer are transformed into a 14 × 14 × 32 tensor through a fully connected layer, and this tensor is connected in the third dimension with the corresponding tensor in the parallel convolution network, so that the 14 × 14 × 32 tensor becomes a 14 × 14 × 160 tensor;
3-3-3) redundancy-removing deconvolution: redundancy removal is performed on the 14 × 14 × 160 tensor using a 1 × 1 convolution kernel to obtain a 14 × 14 × 32 tensor, and a deconvolution operation is performed with the corresponding kernels of the parallel convolution layer to generate the final reconstructed image.
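The output-size formula of step 3-3-1) can be checked numerically. Note that with odd kernels, stride 2 and integer padding, the formula alone cannot reach exactly 28 from 14; deep-learning frameworks resolve this with an extra output-padding term, which is the assumption made here (the padding choice `p = k // 2` with one pixel of output padding is illustrative, not stated in the claim):

```python
def deconv_out_size(i, k, s, p, output_padding=0):
    """Transposed-convolution output size o = s*(i - 1) + k - 2p, plus the
    output-padding term frameworks add to disambiguate strided shapes."""
    return s * (i - 1) + k - 2 * p + output_padding

# with stride 2, padding k // 2 and one pixel of output padding, every
# branch (k = 3, 5, 7, 9) maps the 14 x 14 tensor back to 28 x 28
sides = [deconv_out_size(14, k, s=2, p=k // 2, output_padding=1)
         for k in (3, 5, 7, 9)]
```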
7. The histogram of oriented gradients and improved capsule network based object detection method of claim 1, wherein: the step 4) of extracting the feature map of the center point of the detection frame and the feature map of the scale of the detection frame to form the corresponding target frame and outputting the detection result specifically comprises the following steps:
4-1) image input, using the reconstructed image as the input of the convolution layer with the size of 3 x 3 and the output channel of 256;
4-2) extracting a central feature map, extracting a feature map of the central point of the detection frame by using a 1 x 1 convolution kernel, marking the coordinate of the central point of the object as a positive value, and marking the coordinate of the central point of the non-object as a negative value;
4-3) extracting a scale feature map, and extracting a detection frame scale feature map by using parallel 1 × 1 convolution kernels, wherein the scales in the scale feature map are the length and width of the detection frame;
And 4-4) outputting an image, and fusing the central characteristic diagram and the scale characteristic diagram to finish object target detection.
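Steps 4-2) to 4-4) amount to decoding boxes from the two head outputs. A minimal sketch, assuming a single-channel centre-point map where positive responses mark object centres and a 2-channel scale map holding (width, height) at each location; the threshold, function name and array layout are assumptions for illustration:

```python
import numpy as np

def decode_boxes(center_map, scale_map, threshold=0.5):
    """Treat above-threshold responses in the centre-point map as object
    centres and read the box width/height from the scale map at the same
    location, returning (x1, y1, x2, y2) boxes."""
    boxes = []
    ys, xs = np.where(center_map > threshold)
    for y, x in zip(ys, xs):
        w, h = scale_map[y, x]
        boxes.append((x - w / 2, y - h / 2, x + w / 2, y + h / 2))
    return boxes

center = np.zeros((28, 28))
center[10, 12] = 0.9                # one detected centre at (x=12, y=10)
scale = np.zeros((28, 28, 2))
scale[10, 12] = (8.0, 6.0)          # width 8, height 6 at that centre
boxes = decode_boxes(center, scale)
```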
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210234245.5A CN114677558A (en) | 2022-03-10 | 2022-03-10 | Target detection method based on direction gradient histogram and improved capsule network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210234245.5A CN114677558A (en) | 2022-03-10 | 2022-03-10 | Target detection method based on direction gradient histogram and improved capsule network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114677558A true CN114677558A (en) | 2022-06-28 |
Family
ID=82073153
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210234245.5A Pending CN114677558A (en) | 2022-03-10 | 2022-03-10 | Target detection method based on direction gradient histogram and improved capsule network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114677558A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024065536A1 (en) * | 2022-09-29 | 2024-04-04 | Intel Corporation | Methods and apparatus for image segmentation on small datasets |
CN117953316A (en) * | 2024-03-27 | 2024-04-30 | 湖北楚天龙实业有限公司 | Image quality inspection method and system based on artificial intelligence |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111461291B (en) | Long-distance pipeline inspection method based on YOLOv3 pruning network and deep learning defogging model | |
CN110059698B (en) | Semantic segmentation method and system based on edge dense reconstruction for street view understanding | |
CN110070091B (en) | Semantic segmentation method and system based on dynamic interpolation reconstruction and used for street view understanding | |
CN111914838B (en) | License plate recognition method based on text line recognition | |
CN113344806A (en) | Image defogging method and system based on global feature fusion attention network | |
CN111507275B (en) | Video data time sequence information extraction method and device based on deep learning | |
CN109753959B (en) | Road traffic sign detection method based on self-adaptive multi-scale feature fusion | |
CN109886159B (en) | Face detection method under non-limited condition | |
CN113160062B (en) | Infrared image target detection method, device, equipment and storage medium | |
CN110969171A (en) | Image classification model, method and application based on improved convolutional neural network | |
CN114022770A (en) | Mountain crack detection method based on improved self-attention mechanism and transfer learning | |
CN113449691A (en) | Human shape recognition system and method based on non-local attention mechanism | |
CN111612024A (en) | Feature extraction method and device, electronic equipment and computer-readable storage medium | |
CN116052016A (en) | Fine segmentation detection method for remote sensing image cloud and cloud shadow based on deep learning | |
CN111640116A (en) | Aerial photography graph building segmentation method and device based on deep convolutional residual error network | |
CN113269224A (en) | Scene image classification method, system and storage medium | |
CN117197763A (en) | Road crack detection method and system based on cross attention guide feature alignment network | |
CN113298817A (en) | High-accuracy semantic segmentation method for remote sensing image | |
CN115294356A (en) | Target detection method based on wide area receptive field space attention | |
CN113378812A (en) | Digital dial plate identification method based on Mask R-CNN and CRNN | |
CN115393928A (en) | Face recognition method and device based on depth separable convolution and additive angle interval loss | |
CN115222754A (en) | Mirror image segmentation method based on knowledge distillation and antagonistic learning | |
CN114677558A (en) | Target detection method based on direction gradient histogram and improved capsule network | |
CN111027472A (en) | Video identification method based on fusion of video optical flow and image space feature weight | |
CN114519819A (en) | Remote sensing image target detection method based on global context awareness |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||