CN117876716B - Image processing method using grouping wavelet packet transformation - Google Patents

Publication number: CN117876716B (granted publication of application CN202410282726.2A; earlier published as CN117876716A)
Authority: CN (China)
Legal status: Active
Prior art keywords: frequency domain, feature map, wavelet packet, grouping, convolution
Inventors: 郭锴凌 (Guo Kailing), 龚楷钧 (Gong Kaijun), 徐向民 (Xu Xiangmin)
Original language: Chinese (zh)
Assignee (original and current): South China University of Technology (SCUT)
Application filed by South China University of Technology (SCUT)


Abstract

The invention discloses an image processing method using a grouped wavelet packet transform, relating to general image data processing or generation. A grouped wavelet packet transform module replaces convolution operations in a convolutional neural network model and transforms the input feature map from the spatial domain to the frequency domain to obtain a frequency-domain feature map. Depthwise separable convolution is performed on the frequency-domain feature map with convolution kernels in the frequency domain, and the outputs are added and concatenated to obtain a target frequency-domain feature map. A grouped inverse wavelet packet transform module restores the target frequency-domain feature map to the spatial domain to obtain an output feature map. The output feature map is forward-propagated through the convolutional neural network model, which is trained iteratively by gradient descent to obtain an image classification model for classifying images. The method can markedly improve image classification accuracy while effectively reducing the parameter count and computation of the convolutional neural network, enhancing channel information interaction, and improving the expressive capacity of the network.

Description

Image processing method using grouping wavelet packet transformation
Technical Field
The present invention relates to image data processing or generation in general, and more particularly to an image processing method using a grouped wavelet packet transform.
Background
In recent years, as large-scale deep neural networks keep growing, lightweight neural network research has attracted attention and has been applied across computer vision. Network lightweighting removes low-importance parts of a network while retaining the high-importance parts, increasing computation speed and saving computational resources while maintaining inference accuracy; it is widely studied and applied to improve the performance of modern deep neural networks.
Some techniques construct lightweight neural networks with hand-designed structures [1][2], building networks from lightweight convolutions such as depthwise separable convolution and 1×1 convolution to reduce parameter and computational redundancy, but they still suffer from the limited channel-information extraction capability of convolution. Other techniques construct lightweight networks by weight pruning [3][4], cutting low-importance weights in the neural network to reduce computational cost while preserving performance; however, the irregular network structure after pruning hurts processing speed, and neither the compression rate nor the inference accuracy is strictly guaranteed.
The wavelet packet transform [5] decomposes a signal into information at different scales. Some techniques use the two-dimensional wavelet transform to extract image spatial information at different frequencies [6][7]: preprocessing the feature map with a wavelet transform lets the network extract feature information more easily, so the same expressive capacity is achieved with fewer convolution operations, making the whole network lighter. Although the wavelet transform excels at processing spatial information, it is not effective for processing channel information.
[1]Wang X, Chu X, Han C, et al. SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs and Transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 731-741.
[2]Han K, Wang Y, Tian Q, et al. Ghostnet: More features from cheap operations[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 1580-1589.
[3]Ye Y, You G, Fwu J K, et al. Channel pruning via optimal thresholding[C]//Neural Information Processing: 27th International Conference, ICONIP 2020, Bangkok, Thailand, November 18–22, 2020, Proceedings, Part V 27. Springer International Publishing, 2020: 508-516.
[4]Guo Y, Yuan H, Tan J, et al. Gdp: Stabilized neural network pruning via gates with differentiable polarization[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 5239-5250.
[5]Shensa M J. The discrete wavelet transform: wedding the a trous and Mallat algorithms[J]. IEEE Transactions on signal processing, 1992, 40(10): 2464-2482.
[6]Ramamonjisoa M, Firman M, Watson J, et al. Single image depth prediction with wavelet decomposition[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 11089-11098.
[7]Liu P, Zhang H, Lian W, et al. Multi-level wavelet convolutional neural networks[J]. IEEE Access, 2019, 7: 74973-74985.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides an image processing method using a grouped wavelet packet transform, which effectively reduces the parameter count and computation of a convolutional neural network, enhances channel information interaction, and improves the expressive capacity of the network.
The image processing method using a grouped wavelet packet transform according to the invention comprises the following steps:
Step one, constructing an image data set, determining a channel compression rate, and determining the group length r of a grouped wavelet packet transform module according to the channel compression rate;
Step two, inputting an input image into a convolutional neural network in which some convolution operations of the convolutional neural network model are replaced by the grouped wavelet packet transform module, wherein in the grouped wavelet packet transform module, the input feature map of the replaced convolution operation is denoted X, and the input feature map X is transformed from the spatial domain to the frequency domain to obtain a frequency-domain feature map Y;
Step three, performing depthwise separable convolution on the frequency-domain feature map Y with convolution kernels in the frequency domain, and adding and concatenating the outputs to obtain a target frequency-domain feature map M aggregating channel and spatial information;
Step four, restoring the target frequency-domain feature map M to the spatial domain with a grouped inverse wavelet packet transform module to obtain an output feature map Q;
Step five, forward-propagating the output feature map Q through the convolutional neural network model and training iteratively by gradient descent to obtain an image classification model for classifying images.
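The five steps can be sketched end to end for the smallest case r = 2, where one Haar level is the whole transform (a toy NumPy sketch with 1×1 depthwise kernels; the helper name `gwpt_module` and the `kernels` container are illustrative, not the patent's):

```python
import numpy as np

def gwpt_module(X, kernels, r=2):
    """Toy forward pass: grouped DWT -> frequency-domain depthwise + aggregation
    -> grouped inverse DWT. X is (Cin, H, W); `kernels` holds Cout/r lists of
    Cin/r per-channel 1x1 weights (a simplification of the k x k case)."""
    c_in, H, W = X.shape
    Xg = X.reshape(c_in // r, r, H, W)
    # Step two: one Haar level per pair of channels (mean, half-difference)
    Y = np.stack([(Xg[:, 0] + Xg[:, 1]) / 2, (Xg[:, 0] - Xg[:, 1]) / 2], axis=1)
    # Step three: depthwise (here 1x1) conv per group, sum over groups, concat sets
    parts = [sum(Y[i] * k[i][:, None, None] for i in range(c_in // r))
             for k in kernels]
    M = np.stack(parts)                      # (Cout/r, r, H, W)
    # Step four: inverse Haar level restores the spatial domain
    Q = np.stack([M[:, 0] + M[:, 1], M[:, 0] - M[:, 1]], axis=1)
    return Q.reshape(-1, H, W)               # (Cout, H, W)
```

With identity-like all-ones 1×1 kernels, the module reduces to DWT followed by its inverse and returns the input unchanged, which is a quick sanity check of the round trip.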
In a further improvement, the channel compression rate in step one is determined according to the parameter-count target of the model.
In a further improvement, transforming the input feature map X from the spatial domain to the frequency domain in step two specifically comprises:
setting the input dimension of the replaced convolution operation to Cin; dividing the input feature map X into Cin/r groups along the channel dimension according to the group length r of the grouped wavelet packet transform module to obtain a grouped input feature map X'; transforming the vectors of the grouped input feature map X' to the frequency domain by the grouped wavelet packet transform; and implementing the grouped wavelet packet transform of the input feature map X as a convolution operation.
Further, the vectors of the grouped input feature map X' are transformed to the frequency domain by the grouped wavelet packet transform as follows:
taking the vector X'_i^(h,w) at each spatial position (h, w) of the grouped input feature map X' as the input vector of the grouped wavelet packet transform module, the vectors of X' are transformed to the frequency domain by the following formula:

Y_i^(h,w) = DWT(X'_i^(h,w))

where Y_i^(h,w) denotes the wavelet packet transform result of the h-th row, w-th column vector of the i-th grouped input feature map X'_i; i = 1, ..., Cin/r; H and W denote the two spatial dimensions; DWT(·) denotes the wavelet packet transform.
Further, the convolution operation is specifically as follows:
the feature-extraction operation of each level of the grouped wavelet packet transform module is represented by a matrix, and the matrices of all levels are multiplied in level order to obtain a transform matrix A; the transform matrix is shared by the Cin/r grouped input feature maps X', and the shared copies are concatenated along the first dimension to obtain a 1×1 group-convolution kernel W_1;
the input feature map X is group-convolved with the kernel W_1:

Y = Concat( A * X_(1:r), ..., A * X_(Cin-r+1:Cin) ) = W_1 *_(Cin/r) X

where *_(Cin/r) denotes group convolution with Cin/r groups; Concat(·) denotes concatenation along the channel dimension; * denotes the convolution operation; X_(r·i+1 : r·i+r) denotes the channels of the input feature map X with subscripts from r·i+1 to r·i+r, taken along the channel dimension; and Y denotes the frequency-domain feature map.
Further, step three specifically comprises:
sub-step one: grouping the frequency-domain feature map Y into Cin/r groups, and performing depthwise separable convolution between each group Y_i and a convolution kernel W_i by the following formula:

Z_i = Depthwise(Y_i, W_i)

where Z_i denotes the output frequency-domain feature map of the depthwise separable convolution between the i-th group of the frequency-domain feature map Y_i and the convolution kernel W_i; Depthwise(·) denotes depthwise separable convolution of the feature map with the convolution kernel;
sub-step two: adding all output frequency-domain feature maps Z_i to obtain a frequency-domain feature map Z aggregating the channel information;
sub-step three: repeating sub-steps one and two Cout/r times to obtain Cout/r channel-aggregated frequency-domain feature maps Z, and concatenating the Cout/r feature maps Z along the channel dimension to obtain the target frequency-domain feature map M;
where Cout denotes the output dimension of the replaced convolution operation.
Further, step four specifically comprises:
dividing the target frequency-domain feature map M into Cout/r groups along the first dimension according to the convolution group length r of the grouped wavelet packet transform module, to obtain a grouped target frequency-domain feature map M';
taking the vector M'_j^(h,w) at each spatial position (h, w) of the grouped target frequency-domain feature map M' as the input vector of the grouped inverse wavelet packet transform module, and transforming the vectors of M' back to the spatial domain by the inverse wavelet packet transform:

P_j^(h,w) = IDWT(M'_j^(h,w))

where IDWT(·) denotes the inverse Haar wavelet packet transform, and P_j^(h,w) denotes the spatial-domain result of the inverse wavelet packet transform of the h-th row, w-th column vector of the j-th group of the target frequency-domain feature map M'_j;
representing the feature-extraction operation of each level of the grouped inverse wavelet packet transform module by a matrix and multiplying the matrices of all levels in level order to obtain an inverse transform matrix B; sharing the inverse transform matrix B among the Cout/r groups of the target frequency-domain feature map M', and concatenating the shared copies along the first dimension to obtain a 1×1 group-convolution kernel W_2;
and group-convolving the target frequency-domain feature map M with the kernel W_2:

Q = Concat( B * M_(1:r), ..., B * M_(Cout-r+1:Cout) ) = W_2 *_(Cout/r) M

where M_(r·i+1 : r·i+r) denotes the channels of the frequency-domain feature map M with subscripts from r·i+1 to r·i+r, taken along the channel dimension.
Further, the group length r is set according to the parameter-count target of the image classification model; the parameter count of the grouped wavelet packet transform module is 1/r of that of the replaced convolution operation.
Still further, the group length r is set to a common divisor of the input and output dimensions of the convolution operation replaced by the grouped wavelet packet transform module.
In a further improvement, the input image is a preprocessed image; the preprocessing includes padding, cropping, flipping, and normalizing the image.
Advantageous effects
The invention has the following advantages:
(1) The invention replaces convolution operations in the network with the grouped wavelet packet transform module; the parameter count and computation of the module are 1/r of those of the replaced convolution operation, reducing network redundancy and yielding a lightweight network.
(2) Existing weight-pruning methods produce irregular weights while constructing a lightweight network, which hinders deployment in a hardware environment; the grouped wavelet packet transform module proposed by the invention is a structured operation that produces no irregular weights and is convenient for hardware use.
(3) The invention extracts channel information with the wavelet packet transform, improving channel-information extraction over convolution operations. The module can be easily inserted into classical deep neural networks.
Drawings
FIG. 1 is a schematic diagram of the integration of the grouped wavelet packet transform module with the convolutional neural network ResNet according to the present invention;
FIG. 2 is a schematic diagram of the group-convolution kernel construction flow in the grouped wavelet packet transform according to the present invention;
FIG. 3 is a schematic flow diagram of channel and spatial information aggregation in the frequency domain according to the present invention;
FIG. 4 is a schematic diagram of the group-convolution kernel construction flow in the grouped inverse wavelet packet transform according to the present invention.
Detailed Description
The invention is further described below in connection with the examples, which are not to be construed as limiting the invention in any way; the scope of protection is defined by the claims.
The wavelet packet transform has spatial feature-extraction capability, but its channel-information extraction capability remains to be explored. The image processing method using a grouped wavelet packet transform according to the invention transforms the input feature map to the frequency domain by a channel-wise grouped wavelet packet transform, performs depthwise separable convolution on the frequency-domain feature map with convolution kernels in the frequency domain to aggregate spatial and channel information, restores the frequency-domain feature map to the spatial domain by the grouped inverse wavelet packet transform, and inserts the grouped wavelet packet transform module into a convolutional neural network for joint training, finally obtaining a lightweight convolutional neural network.
Referring specifically to FIGS. 1-4, the image processing method using a grouped wavelet packet transform according to the present invention includes the following steps.
S1: construct a data set, determine the channel compression rate, and determine the group length r of the grouped wavelet packet transform module from the channel compression rate.
In the invention, a public image classification data set is divided into a training set and a test set, and the images in the data set are preprocessed by padding, cropping, flipping, normalization, and the like. The training set is used to train the model weights and structural parameters, and the effect of the method is evaluated on the test set. Meanwhile, the group length r of the grouped wavelet packet transform module is set according to the parameter-count target of the image classification model to be built. The group length r must be set to a common divisor of the input and output dimensions of the convolution operation replaced by the module; correspondingly, the parameter count and computation of the module are compressed to 1/r of those of the replaced convolution operation. The smaller the group length r, the lower the compression rate, the larger the parameter count, and the higher the accuracy.
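The 1/r compression can be checked with a short parameter-count calculation (an illustrative sketch; it assumes, as described below, that only the frequency-domain depthwise kernels are learned while the fixed Haar 1×1 transform kernels carry no learned parameters):

```python
def param_counts(c_in, c_out, k, r):
    """Parameters of a standard conv vs. the grouped wavelet packet module."""
    conv = c_in * c_out * k * k  # replaced standard k x k convolution
    # Frequency-domain depthwise kernels: Cout/r output groups, each holding
    # Cin/r kernels of shape (r, k, k); the DWT/IDWT 1x1 kernels are fixed Haar.
    module = (c_out // r) * (c_in // r) * r * k * k
    return conv, module

conv, module = param_counts(64, 64, 3, 4)
print(conv, module, conv // module)  # -> 36864 9216 4, i.e. 1/r of the parameters
```

For a 3×3 convolution with 64 input and 64 output channels and r = 4, the module uses 9216 instead of 36864 parameters, exactly the 1/r ratio stated above.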
S2: build a convolutional neural network model and replace some of its convolution operations with the grouped wavelet packet transform module. In the module, denote the input feature map of the replaced convolution operation by X; transform X from the spatial domain to the frequency domain by the grouped wavelet packet transform module; perform grouped depthwise separable convolution on the feature map with convolution kernels in the frequency domain to aggregate channel and spatial information; and restore the frequency-domain feature map to the spatial domain by the grouped inverse wavelet packet transform. The specific flow is shown in FIGS. 2, 3 and 4.
In the present invention, taking the classical convolutional neural network ResNet as an example, the grouped wavelet packet transform module replaces the convolution operation of each basic block or bottleneck block. The basic block and bottleneck block structures are shown in FIG. 1.
In S2, transforming the input feature map X from the spatial domain to the frequency domain by the channel-wise grouped wavelet packet transform module specifically comprises: given the input dimension Cin and output dimension Cout of the replaced convolution operation, the input feature map is X ∈ R^(Cin×H×W), where H and W denote the two spatial dimensions. According to the group length r of the grouped wavelet packet transform module, divide X into Cin/r groups along the channel dimension to obtain the grouped input feature map X' ∈ R^((Cin/r)×r×H×W), and transform X' to the frequency domain along the channel dimension with the grouped wavelet packet transform module.
In this embodiment, the vector X'_i^(h,w) corresponding to each spatial position (h, w) of the channel-grouped input feature map X' is taken as the input vector of the grouped wavelet packet transform module, where h = 1, ..., H and w = 1, ..., W. The wavelet packet transform result Y_i^(h,w) of the h-th row, w-th column vector of the i-th grouped input feature map X'_i can be expressed as:

Y_i^(h,w) = DWT(X'_i^(h,w))

where DWT(·) denotes the grouped wavelet packet transform.
The wavelet packet transform used in the present invention is the Haar wavelet packet transform, defined as follows: divide a vector of length N into its odd-indexed and even-indexed parts, and compute the element-wise mean and difference of the two parts to obtain two new vectors of length N/2. Recursively repeat this step on the new vectors until each sub-vector has length 1, then arrange the sub-vectors to obtain the frequency-domain wavelet coefficients.
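The recursive mean/difference definition above can be sketched directly (a minimal illustrative implementation; the 1/2 scaling of the difference mirrors the mean and is an assumption about the exact normalization):

```python
import numpy as np

def haar_wpt(v):
    """Full Haar wavelet packet transform of a vector whose length is a power of 2."""
    v = np.asarray(v, dtype=float)
    if v.size == 1:
        return v
    even, odd = v[0::2], v[1::2]
    mean = (even + odd) / 2.0   # low-pass part: pairwise mean
    diff = (even - odd) / 2.0   # high-pass part: pairwise half-difference
    # recurse on both parts until every sub-vector has length 1
    return np.concatenate([haar_wpt(mean), haar_wpt(diff)])

def haar_iwpt(c):
    """Inverse transform: restore each level from its means and differences."""
    c = np.asarray(c, dtype=float)
    if c.size == 1:
        return c
    half = c.size // 2
    mean, diff = haar_iwpt(c[:half]), haar_iwpt(c[half:])
    v = np.empty(c.size)
    v[0::2] = mean + diff  # even-indexed samples
    v[1::2] = mean - diff  # odd-indexed samples
    return v
```

For example, `haar_wpt([1, 2, 3, 4])` gives `[2.5, -1.0, -0.5, 0.0]`, and `haar_iwpt` recovers the original vector exactly.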
In the present invention, the grouped wavelet packet transform of the input feature map X is implemented with a 1×1 group convolution. The convolution kernel is constructed as shown in FIG. 2: the feature-extraction operation of each level of the grouped wavelet packet transform module is represented by a matrix, and the Haar wavelet packet transform matrices of all levels are multiplied in level order to obtain the transform matrix A ∈ R^(r×r). The transform matrix is shared by the Cin/r grouped input feature maps X', and the shared copies are concatenated along the first dimension to obtain the 1×1 group-convolution kernel W_1:

W_1 = Concat_1(A, ..., A)  (Cin/r copies)

where Concat_1(·) denotes concatenation along the first dimension.
Finally, the constructed kernel W_1 performs a 1×1 group convolution on the input feature map, with group length equal to the group length r of the grouped wavelet packet transform module and Cin/r groups, yielding the frequency-domain feature map Y:

Y = Concat( A * X_(1:r), ..., A * X_(Cin-r+1:Cin) ) = W_1 *_(Cin/r) X

where *_(Cin/r) denotes group convolution with Cin/r groups; Concat(·) denotes concatenation along the channel dimension; * denotes the convolution operation; X_(r·i+1 : r·i+r) denotes the channels of X with subscripts from r·i+1 to r·i+r, taken along the channel dimension; and the frequency-domain feature map Y ∈ R^(Cin×H×W).
The depthwise separable convolution of the frequency-domain feature map Y with convolution kernels in the frequency domain, aggregating channel and spatial information, is specifically as follows: using the property that spatial-domain convolution equals frequency-domain multiplication, each group of the frequency-domain feature map Y_i is depthwise-separably convolved with a convolution kernel in the frequency domain, and the results are added and concatenated, aggregating spatial and channel information. Compared with spatial-domain convolution, the parameter count and computation are reduced.
In the invention, the mechanism for aggregating spatial and channel information is shown in FIG. 3. The input frequency-domain feature map Y ∈ R^(Cin×H×W) is divided into Cin/r groups along the channel dimension according to the group length r of the grouped wavelet packet transform module, yielding the grouped frequency-domain feature maps Y_i ∈ R^(r×H×W). Each Y_i is depthwise-separably convolved with a convolution kernel W_i to aggregate spatial information; the output frequency-domain feature map of the depthwise separable convolution between the i-th group Y_i and the kernel W_i can be expressed as:

Z_i = Depthwise(Y_i, W_i)

where Depthwise(·) denotes depthwise separable convolution of the feature map with the convolution kernel, in which the spatial size, stride, and zero padding of the kernel W_i are the same as those of the convolution operation replaced by the grouped wavelet packet transform module; Z_i ∈ R^(r×H'×W'), where H' and W' denote the two spatial dimensions after the depthwise separable convolution.
Each group of output frequency-domain feature maps is then added to aggregate the channel information, yielding the channel-aggregated frequency-domain feature map Z = Σ_i Z_i.
The above channel-information processing and aggregation is repeated Cout/r times to obtain Cout/r channel-aggregated frequency-domain feature maps. All Cout/r frequency-domain feature maps are concatenated along the channel dimension to obtain the target frequency-domain feature map M, with the same dimensions as the output feature map of the replaced convolution:

M = Concat(Z^(1), Z^(2), ..., Z^(Cout/r)), M ∈ R^(Cout×H'×W')

where Concat(·) denotes concatenation along the first dimension.
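The aggregation in FIG. 3 can be sketched as follows (a naive NumPy sketch with stride 1 and zero padding that keeps the spatial size; the `kernels` container, a list of Cout/r kernel sets, is a naming assumption):

```python
import numpy as np

def depthwise_conv(Y, K):
    """Depthwise conv of Y (r, H, W) with per-channel kernels K (r, k, k),
    stride 1, zero padding to keep the spatial size."""
    r, H, W = Y.shape
    k = K.shape[1]
    p = k // 2
    Yp = np.pad(Y, ((0, 0), (p, p), (p, p)))
    Z = np.zeros_like(Y)
    for c in range(r):
        for h in range(H):
            for w in range(W):
                Z[c, h, w] = np.sum(Yp[c, h:h + k, w:w + k] * K[c])
    return Z

def target_map(Y, kernels, r):
    """Target frequency-domain map M: for each of the Cout/r kernel sets,
    depthwise-convolve every input group and sum (channel aggregation),
    then concatenate the Cout/r results along the channel dimension."""
    groups = Y.reshape(Y.shape[0] // r, r, *Y.shape[1:])
    parts = [sum(depthwise_conv(g, Ki) for g, Ki in zip(groups, kset))
             for kset in kernels]
    return np.concatenate(parts, axis=0)
```

With 1×1 all-ones kernels the depthwise step is the identity, so the output group is just the sum of the input groups, which makes the channel-aggregation step easy to verify.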
Restoring the frequency-domain feature map to the spatial domain by the grouped inverse wavelet packet transform is specifically as follows: divide the target frequency-domain feature map M ∈ R^(Cout×H'×W') into Cout/r groups along the first dimension according to the convolution group length r of the grouped wavelet packet transform, obtaining the grouped target frequency-domain feature map M' ∈ R^((Cout/r)×r×H'×W'), and apply the inverse wavelet packet transform to M' along the channel dimension to restore the spatial domain. In this embodiment, the vector M'_j^(h,w) corresponding to each spatial position (h, w) of the grouped target frequency-domain feature map M' is taken as the input vector of the grouped inverse wavelet packet transform module. The spatial-domain result P_j^(h,w) of the inverse wavelet packet transform of the h-th row, w-th column vector of the j-th group M'_j can be expressed as:

P_j^(h,w) = IDWT(M'_j^(h,w))

where IDWT(·) denotes the inverse Haar wavelet packet transform. That is, the inverse wavelet packet transform in the invention is the inverse Haar wavelet packet transform, defined as follows: mirroring the way the Haar wavelet transform decomposes a vector level by level from the top level down, the inverse Haar wavelet packet transform uses the means and differences of the lower level to restore the upper-level vector, level by level from the bottom up.
Finally, the spatial-domain results are concatenated in the first dimension to obtain the output feature map Q.
In the present invention, the grouped inverse wavelet packet transform of the target frequency-domain feature map M is implemented with a 1×1 group convolution. The convolution kernel is constructed as shown in FIG. 4: the feature-extraction operation of each level of the grouped inverse wavelet packet transform module is represented by a matrix, and the inverse Haar wavelet packet transform matrices of all levels are multiplied in level order to obtain the inverse transform matrix B ∈ R^(r×r). The inverse transform matrix is shared by all Cout/r feature maps, and the shared copies are concatenated along the first dimension to obtain the 1×1 group-convolution kernel W_2:

W_2 = Concat_1(B, ..., B)  (Cout/r copies)

where Concat_1(·) denotes concatenation along the first dimension. Finally, the constructed kernel W_2 performs a 1×1 group convolution on the target frequency-domain feature map M, with group length equal to the group length r of the grouped wavelet transform and Cout/r groups:

Q = Concat( B * M_(1:r), ..., B * M_(Cout-r+1:Cout) ) = W_2 *_(Cout/r) M

where *_(Cout/r) denotes group convolution with Cout/r groups; Concat(·) denotes concatenation along the channel dimension; * denotes the convolution operation; M_(r·i+1 : r·i+r) denotes the channels of M with subscripts from r·i+1 to r·i+r, taken along the channel dimension; and the output spatial-domain feature map Q ∈ R^(Cout×H'×W').
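The inverse kernel construction mirrors FIG. 2: per-level synthesis matrices restore each level from its means and differences, and their product inverts the forward transform matrix exactly (an illustrative sketch consistent with the mean/half-difference Haar scaling assumed above):

```python
import numpy as np

def haar_level(size, r, inverse=False):
    """One Haar level within each length-`size` band: analysis (mean/half-diff)
    or synthesis (even = mean + diff, odd = mean - diff)."""
    half = size // 2
    L = np.zeros((size, size))
    for i in range(half):
        if inverse:
            L[2 * i, i], L[2 * i, half + i] = 1.0, 1.0
            L[2 * i + 1, i], L[2 * i + 1, half + i] = 1.0, -1.0
        else:
            L[i, 2 * i] = L[i, 2 * i + 1] = 0.5
            L[half + i, 2 * i], L[half + i, 2 * i + 1] = 0.5, -0.5
    return np.kron(np.eye(r // size), L)

def transform_matrices(r):
    """Forward matrix A (per-level products in level order) and inverse matrix B
    (synthesis levels applied in the reverse order)."""
    A, B, size = np.eye(r), np.eye(r), r
    while size > 1:
        A = haar_level(size, r) @ A
        B = B @ haar_level(size, r, inverse=True)
        size //= 2
    return A, B
```

Since each synthesis level undoes the matching analysis level, B @ A is the identity, so stacking shared copies of B into the 1×1 group-convolution kernel W_2 exactly undoes the forward kernel W_1 built from A.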
Forward propagation of the model continues from the output feature map Q, and the model parameters are updated by back-propagation. After training the model with gradient descent, a general and efficient image classification network is obtained. With the grouped wavelet packet transform module added to the reference model, image classification accuracy can be improved while greatly reducing the model's parameter count and computation.
The present invention also provides a storage medium, such as a ROM, RAM, magnetic disk, or optical disk, storing one or more programs that, when executed by a processor, implement the image processing method using the grouped wavelet packet transform provided in the above embodiment.
The invention also provides a device, which may be a desktop computer, notebook computer, smartphone, PDA handheld terminal, tablet computer, or other terminal device with a display function. The computing device comprises a processor and a memory storing one or more programs; when the processor executes the programs stored in the memory, the image processing method using the grouped wavelet packet transform provided by the embodiment is implemented.
While only the preferred embodiments of the present invention have been described above, it should be noted that those skilled in the art can make modifications and improvements without departing from the structure of the present invention, and these do not affect the effect of implementing the invention or the applicability of the patent.

Claims (4)

1. An image processing method using a grouped wavelet packet transform, comprising the steps of:
Step one, constructing an image data set, determining a channel compression rate, and determining the group length r of a grouped wavelet packet transform module according to the channel compression rate;
Step two, inputting an input image into a convolutional neural network, and replacing a convolution operation in the convolutional neural network model with the grouped wavelet packet transform module, wherein in the grouped wavelet packet transform module, the input feature map of the replaced convolution operation is denoted X, and the input feature map X is transformed from the spatial domain to the frequency domain to obtain a frequency-domain feature map Y;
Step three, performing depthwise separable convolution on the frequency-domain feature map Y with convolution kernels in the frequency domain, and adding and concatenating the outputs to obtain a target frequency-domain feature map M aggregating channel and spatial information;
Step four, restoring the target frequency-domain feature map M to the spatial domain with a grouped inverse wavelet packet transform module to obtain an output feature map Q;
Step five, performing forward propagation of the convolutional neural network model on the output feature map Q and iterative training by gradient descent to obtain an image classification model for classifying images;
In step two, the input feature map X is transformed from the spatial domain to the frequency domain, specifically comprising,
setting the input dimension of the replaced convolution operation as Cin; dividing the input feature map X into Cin/r groups along the channel dimension according to the group length r of the grouping wavelet packet transformation module to obtain a grouped input feature map X′; transforming the vectors of the grouped input feature map X′ to the frequency domain through the grouping wavelet packet transformation, which is implemented as a convolution operation on the input feature map X;
the vectors of the grouped input feature map X′ are transformed to the frequency domain by the grouping wavelet packet transformation, specifically,

taking the channel vector X′_i^{(h,w)} at each spatial position (h, w) of the grouped input feature map X′ as an input vector of the grouping wavelet packet transformation module, the vectors of the grouped input feature map X′ are transformed into the frequency domain by the following formula:

Y_i^{(h,w)} = DWT(X′_i^{(h,w)})

wherein Y_i^{(h,w)} represents the wavelet packet transformation result of the h-th row, w-th column vector of the i-th group input feature map X′_i; i = 1, ..., Cin/r; h = 1, ..., H; w = 1, ..., W; H and W represent the two spatial dimensions; DWT(·) represents the wavelet packet transform;
the convolution operation is specifically described as,

the feature extraction operation of each level of the grouping wavelet packet transformation module is represented by a matrix, and the matrices of the levels are multiplied in level order to obtain a transformation matrix T; the transformation matrix T is shared across the Cin/r grouped input feature maps X′, and the shared transformation matrices are concatenated in the first dimension to obtain a 1×1 group convolution kernel W_1;

the input feature map X is subjected to a group convolution with the convolution kernel W_1:

Y = GConv_{Cin/r}(X, W_1) = Concat(T * X_{[r×i+1 : r×i+r]}), i = 0, 1, ..., Cin/r − 1

wherein GConv_{Cin/r}(·) represents a group convolution with Cin/r groups; Concat(·) represents a concatenation operation along the channel dimension; * represents a convolution operation; X_{[r×i+1 : r×i+r]} represents the portion of the input feature map X taken along the channel dimension from subscript (r×i+1) to (r×i+r); Y represents the frequency domain feature map;
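As a rough illustration (not part of the patent text), the level-by-level matrix construction described above can be sketched in NumPy for the Haar basis; the function name `haar_packet_matrix`, the choices r = 4, Cin = 8, and all shapes are assumptions made for the sketch:

```python
import numpy as np

def haar_packet_matrix(r):
    """Full wavelet packet transform matrix for a length-r vector (Haar
    basis), built by composing one pairwise average/difference stage per
    level, i.e. multiplying the level matrices in level order."""
    assert r >= 1 and (r & (r - 1)) == 0, "r must be a power of two"
    T = np.eye(r)
    size = r
    while size > 1:
        level = np.zeros((r, r))
        for b in range(r // size):            # every band at this level
            s = b * size
            for k in range(size // 2):
                # low-pass row: pairwise average (orthonormal scaling)
                level[s + k, s + 2 * k] = level[s + k, s + 2 * k + 1] = 1 / np.sqrt(2)
                # high-pass row: pairwise difference
                level[s + size // 2 + k, s + 2 * k] = 1 / np.sqrt(2)
                level[s + size // 2 + k, s + 2 * k + 1] = -1 / np.sqrt(2)
        T = level @ T                          # compose in level order
        size //= 2
    return T

r, Cin = 4, 8
T = haar_packet_matrix(r)
# sharing T across the Cin/r groups gives a 1x1 grouped-conv kernel W1
W1 = np.stack([T] * (Cin // r))                # (Cin/r, r, r), one block per group
X = np.random.randn(Cin, 5, 5)                 # input feature map, (C, H, W)
Xg = X.reshape(Cin // r, r, 5, 5)
Y = np.einsum('gab,gbhw->gahw', W1, Xg).reshape(Cin, 5, 5)  # frequency map
```

Because each level matrix is orthogonal, the composed transformation matrix is orthogonal as well, which is what later makes the inverse step a simple transpose.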
the third step specifically comprises,

sub-step one, grouping the frequency domain feature map Y into Cin/r groups, and performing a depth separable convolution on the frequency domain feature map Y_i with the convolution kernel W_i by the following formula:

Z_i = Depthwise(Y_i, W_i)

wherein Z_i represents the output frequency domain feature map obtained by the depth separable convolution of the i-th group frequency domain feature map Y_i with the convolution kernel W_i; Depthwise(·) represents the depth separable convolution of the feature map with the convolution kernel;

sub-step two, adding all the output frequency domain feature maps Z_i to obtain a frequency domain feature map Z with the channel information aggregated;

sub-step three, repeating sub-steps one and two Cout/r times to obtain Cout/r frequency domain feature maps Z with the channel information aggregated, and concatenating the Cout/r frequency domain feature maps Z along the channel dimension to obtain the target frequency domain feature map M,

wherein Cout represents the output dimension of the replaced convolution operation;
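A minimal sketch of the third step, assuming small illustrative shapes (Cin = Cout = 8, r = 4, 3×3 kernels) and a naive same-padded depthwise convolution; none of the names or sizes come from the patent:

```python
import numpy as np

def depthwise_conv(x, k):
    """'Same'-padded depthwise convolution: one kernel per channel.
    x: (C, H, W), k: (C, kh, kw)."""
    C, H, W = x.shape
    kh, kw = k.shape[1:]
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + kh, j:j + kw] * k[c])
    return out

Cin, Cout, r, H, W, ks = 8, 8, 4, 5, 5, 3
Y = np.random.randn(Cin, H, W)                  # frequency domain feature map
# one depthwise kernel per input group, repeated Cout/r times
kernels = np.random.randn(Cout // r, Cin // r, r, ks, ks)

blocks = []
for o in range(Cout // r):                      # sub-step three: Cout/r repeats
    Z = np.zeros((r, H, W))
    for i in range(Cin // r):                   # sub-steps one and two
        Yi = Y[i * r:(i + 1) * r]               # i-th group of Y
        Z += depthwise_conv(Yi, kernels[o, i])  # sum Z_i: channel aggregation
    blocks.append(Z)
M = np.concatenate(blocks, axis=0)              # target frequency map, (Cout, H, W)
```

The summation over groups is what mixes channel information across groups, while the depthwise kernels mix spatial information within each channel.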
the fourth step specifically comprises,

dividing the target frequency domain feature map M into Cout/r groups in the first dimension according to the group length r of the grouping wavelet packet transformation module, to obtain a grouped target frequency domain feature map M′;

taking the channel vector M′_j^{(h,w)} at each spatial position (h, w) of the grouped target frequency domain feature map M′ as an input vector of the grouping wavelet packet inverse transformation module, the vectors of the grouped target frequency domain feature map M′ are transformed back to the spatial domain by the following formula:

Q_j^{(h,w)} = IDWT(M′_j^{(h,w)})

wherein IDWT(·) represents the inverse Haar wavelet packet transform; Q_j^{(h,w)} represents the spatial domain result of the wavelet packet inverse transform of the h-th row, w-th column vector of the j-th group target frequency domain feature map M′_j; h = 1, ..., H′; w = 1, ..., W′; H′ and W′ represent the two spatial dimensions after the depth separable convolution;

the feature extraction operation of each level of the grouping wavelet packet inverse transformation module is represented by a matrix, and the matrices of the levels are multiplied in level order to obtain an inverse transformation matrix T′; the inverse transformation matrix T′ is shared across the Cout/r grouped target frequency domain feature maps M′, and the shared matrices are concatenated in the first dimension to obtain a 1×1 group convolution kernel W_2;

the target frequency domain feature map M is subjected to a group convolution with the convolution kernel W_2:

Q = GConv_{Cout/r}(M, W_2) = Concat(T′ * M_{[r×i+1 : r×i+r]}), i = 0, 1, ..., Cout/r − 1

wherein M_{[r×i+1 : r×i+r]} represents the portion of the target frequency domain feature map M taken along the channel dimension from subscript (r×i+1) to (r×i+r);
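The per-position inverse transform of the fourth step can be illustrated with a recursive Haar packet pair; `dwt_packet`/`idwt_packet` and the shapes are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def dwt_packet(v):
    """Full Haar wavelet packet transform of a length-2^k vector."""
    if len(v) == 1:
        return v
    lo = (v[0::2] + v[1::2]) / np.sqrt(2)      # pairwise averages
    hi = (v[0::2] - v[1::2]) / np.sqrt(2)      # pairwise differences
    return np.concatenate([dwt_packet(lo), dwt_packet(hi)])

def idwt_packet(y):
    """Inverse Haar wavelet packet transform (perfect reconstruction)."""
    n = len(y)
    if n == 1:
        return y
    lo = idwt_packet(y[:n // 2])
    hi = idwt_packet(y[n // 2:])
    v = np.empty(n)
    v[0::2] = (lo + hi) / np.sqrt(2)           # recover even samples
    v[1::2] = (lo - hi) / np.sqrt(2)           # recover odd samples
    return v

# per spatial position (h, w), the channel vector of one group of the
# target frequency domain feature map M is restored to the spatial domain:
r, H, W = 4, 3, 3
M = np.random.randn(r, H, W)                   # one group of the target map
Q = np.zeros_like(M)
for h in range(H):
    for w in range(W):
        Q[:, h, w] = idwt_packet(M[:, h, w])
```

Perfect reconstruction (IDWT(DWT(v)) = v) is what allows the module to return to the spatial domain without losing information.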
the group length r is set according to the parameter amount target of the image classification model, and the parameter amount of the grouping wavelet packet transformation module is 1/r of that of the replaced convolution operation.
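The claimed 1/r parameter ratio can be checked with a quick count of the learned depthwise kernels (the wavelet transform matrices themselves are fixed rather than learned, which this count assumes); the sizes below are arbitrary examples:

```python
Cin, Cout, k, r = 64, 128, 3, 4

# replaced standard convolution: one k x k filter per (in, out) channel pair
standard = Cin * Cout * k * k

# grouping module: Cout/r repeats x Cin/r groups x r depthwise k x k filters
module = (Cout // r) * (Cin // r) * r * k * k

assert standard // module == r  # module uses 1/r of the parameters
```

The same factor of r also applies to the multiply-accumulate count, since each depthwise kernel touches one channel instead of all Cin.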
2. The image processing method using grouping wavelet packet transformation according to claim 1, wherein the channel compression rate in step one is determined based on a parameter amount target of the model.
3. The image processing method using grouping wavelet packet transformation according to claim 1, wherein the group length r is set to be a common divisor of the input and output dimensions of the convolution operation replaced by the grouping wavelet packet transformation module.
4. The image processing method using grouping wavelet packet transformation according to claim 1, wherein the input image is a preprocessed image; the preprocessing includes filling, cropping, flipping, and normalizing the image.
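A hypothetical version of the preprocessing named in claim 4 (filling, cropping, flipping, normalizing), sketched in NumPy; the pad size, crop size, and per-channel normalization are assumptions, since the patent does not specify them:

```python
import numpy as np

def preprocess(img, pad=4, crop=32, rng=None):
    """Pad, random-crop, random horizontal flip, then normalize.
    img: (H, W, C) uint8 array; all parameters are illustrative."""
    rng = rng or np.random.default_rng(0)
    x = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode='reflect')  # filling
    top = rng.integers(0, x.shape[0] - crop + 1)
    left = rng.integers(0, x.shape[1] - crop + 1)
    x = x[top:top + crop, left:left + crop]                            # cropping
    if rng.random() < 0.5:
        x = x[:, ::-1]                                                 # flipping
    x = x.astype(np.float32) / 255.0
    # normalizing: zero mean, unit variance per channel
    return (x - x.mean(axis=(0, 1))) / (x.std(axis=(0, 1)) + 1e-8)

out = preprocess(np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8))
```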
CN202410282726.2A 2024-03-13 2024-03-13 Image processing method using grouping wavelet packet transformation Active CN117876716B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410282726.2A CN117876716B (en) 2024-03-13 2024-03-13 Image processing method using grouping wavelet packet transformation


Publications (2)

Publication Number Publication Date
CN117876716A CN117876716A (en) 2024-04-12
CN117876716B true CN117876716B (en) 2024-06-18





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant