CN110309837B - Data processing method and image processing method based on convolutional neural network feature map - Google Patents


Info

Publication number
CN110309837B
CN110309837B (application CN201910603672.4A)
Authority
CN
China
Prior art keywords
rearranged
feature map
convolution
feature
rule
Prior art date
Legal status
Active
Application number
CN201910603672.4A
Other languages
Chinese (zh)
Other versions
CN110309837A (en)
Inventor
Ma Ningning (马宁宁)
Current Assignee
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201910603672.4A priority Critical patent/CN110309837B/en
Publication of CN110309837A publication Critical patent/CN110309837A/en
Application granted granted Critical
Publication of CN110309837B publication Critical patent/CN110309837B/en

Classifications

    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T 3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G06V 10/457 Local feature extraction by analysing connectivity, e.g. edge linking, connected component analysis or slices
    • G06T 2200/32 Indexing scheme for image data processing or generation involving image mosaicing
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The invention provides a data processing method, a data processing apparatus and an image processing method based on feature maps of a convolutional neural network. The method comprises the following steps: acquiring data to be processed and performing a first convolution operation on it to obtain an original feature map; rearranging the elements of the original feature map along its length and width dimensions according to a first rule and a second rule, respectively; dividing the rearranged feature map into regions to obtain N feature sub-maps; concatenating the N feature sub-maps along the channel dimension; performing a convolution operation on the concatenated rearrangement tensor structure to obtain a rearranged convolution result; restoring the rearranged convolution result so that each element returns to its arrangement in the length and width dimensions of the original feature map, yielding rearranged feature data; and performing data processing according to the rearranged feature data. A new convolution structure is thus formed, improving network performance.

Description

Data processing method and image processing method based on convolutional neural network feature map
Technical Field
The present invention relates to the technical field of deep learning, and in particular to a data processing method and apparatus based on convolutional neural network feature maps, and to an image processing method.
Background
With the continual growth of computing power and the ongoing improvement of data processing methods, the application of neural network technology, especially deep learning, has become a research hotspot in recent years and has enjoyed great commercial success. Among these applications, computer vision and image processing algorithms based on deep learning are the most technically mature and the most widely deployed. Convolutional Neural Networks (CNNs) are among the most widely and successfully used networks in current computer vision tasks, owing to the excellent performance of their network structure in image data processing and pattern recognition.
When selecting a CNN for an image processing task, three criteria are typically considered: accuracy, inference speed, and memory consumption. These performance indicators are directly determined by the model structure, and different CNN architectures balance them differently. As the processing capability of mobile terminals grows, CNN models deployed on mobile devices have gone from nonexistent to widespread, and demand is increasing daily. For mobile application scenarios, the two indicators of inference speed and memory consumption are the more important.
To suit the various computer vision and image processing applications of mobile terminals, several schemes for redesigning CNNs have been proposed in recent years that shorten CNN inference time and reduce memory consumption without losing too much accuracy, making the networks suitable for lightweight applications. Such lightweight schemes include: MobileNets, which use depthwise separable convolution; XNOR-Net, which uses binary convolution (i.e., the convolution kernel takes only the two values -1 and 1); and Network Pruning, which removes part of a CNN model's structure to reduce inference time and memory consumption.
In particular, the ShuffleNet proposed by Sun Jian et al. uses pointwise group convolution and channel shuffle techniques to greatly reduce computational cost, while achieving better accuracy than MobileNets. However, current channel rearrangement techniques rearrange only along the channel dimension, an approach whose contribution to computational performance has reached a bottleneck and is difficult to improve further. New ideas are needed to further improve the model structure of CNNs.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art, and provides a data processing method based on a convolutional neural network feature map, which includes:
acquiring data to be processed, and performing a first convolution operation on the data to be processed to obtain an original feature map;
rearranging the elements of the original feature map along its length and width dimensions according to a first rule and a second rule, respectively, to obtain a rearranged feature map;
dividing the rearranged feature map into regions to obtain N feature sub-maps, where N is a natural number;
concatenating the N feature sub-maps along the channel dimension according to a third rule to obtain a rearrangement tensor structure with an increased number of channels;
performing a convolution operation on the rearrangement tensor structure to obtain a rearranged convolution result;
restoring the rearranged convolution result according to the first rule, the second rule and the third rule, returning each element of the rearranged convolution result to its arrangement in the length and width dimensions of the original feature map, to obtain rearranged feature data;
and performing data processing according to the rearranged feature data.
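The rearrangement, division, concatenation and restoration steps above can be sketched end to end in NumPy. This is an illustrative sketch only, not the patented implementation: the parity grouping with Mh = Mw = 2 and the 2 × 2 region division are the example values used later in the description, and the convolution between concatenation and restoration is omitted so that the round trip can be checked for exactness:

```python
import numpy as np

def rearrange(x):
    """Parity regrouping along h and w (Mh = Mw = 2): odd-index rows/columns first."""
    n, c, h, w = x.shape
    return x[:, :, np.r_[0:h:2, 1:h:2], :][:, :, :, np.r_[0:w:2, 1:w:2]]

def split_concat(x):
    """Divide into 2x2 quadrants and concatenate along the channel dimension."""
    n, c, h, w = x.shape
    quads = [x[:, :, i * h // 2:(i + 1) * h // 2, j * w // 2:(j + 1) * w // 2]
             for i in (0, 1) for j in (0, 1)]
    return np.concatenate(quads, axis=1)          # shape (n, 4c, h/2, w/2)

def restore(y):
    """Invert split_concat and rearrange, recovering the original layout."""
    n, c4, hh, ww = y.shape
    c = c4 // 4
    x = np.empty((n, c, 2 * hh, 2 * ww), dtype=y.dtype)
    for k, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
        x[:, :, i * hh:(i + 1) * hh, j * ww:(j + 1) * ww] = y[:, k * c:(k + 1) * c]
    inv_h = np.argsort(np.r_[0:2 * hh:2, 1:2 * hh:2])  # inverse parity permutation
    inv_w = np.argsort(np.r_[0:2 * ww:2, 1:2 * ww:2])
    return x[:, :, inv_h, :][:, :, :, inv_w]

x = np.arange(1 * 2 * 8 * 8, dtype=np.float32).reshape(1, 2, 8, 8)
assert np.array_equal(restore(split_concat(rearrange(x))), x)  # lossless round trip
```

The round-trip assertion illustrates the key property of the restoration step: because the first, second and third rules are pure permutations, they are exactly invertible.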
In some embodiments, in the step of rearranging the elements of the original feature map along its length and width dimensions according to a first rule and a second rule, respectively, the first rule includes: in the length dimension of the original feature map, elements are divided into Mh groups according to their coordinates modulo the natural number Mh, and the elements are then arranged group by group.
In some embodiments, the natural number Mh is an integer multiple of 2; in the length dimension of the original feature map the elements are divided into at least one odd group and at least one even group, and the odd and even groups are arranged in a staggered manner.
In some embodiments, the second rule includes: in the width dimension of the original feature map, elements are divided into Mw groups according to their coordinates modulo the natural number Mw, and the elements are then arranged group by group.
In some embodiments, the natural number Mw is an integer multiple of 2; in the width dimension of the original feature map the elements are divided into at least one odd group and at least one even group, and the odd and even groups are arranged in a staggered manner.
In some embodiments, dividing the rearranged feature map into regions to obtain N feature sub-maps includes: dividing the rearranged feature map into regions of equal area to obtain N feature sub-maps of equal area.
In some embodiments, dividing the rearranged feature map into regions to obtain N feature sub-maps includes: determining the region division mode of the rearranged feature map and the number N of feature sub-maps according to the first rule and the second rule.
In some embodiments, performing a convolution operation on the rearrangement tensor structure to obtain a rearranged convolution result includes: extracting the features of each feature sub-map with a depthwise separable convolution; and performing a pointwise convolution with a 1 × 1 kernel across every N corresponding points of the features of the N feature sub-maps extracted by the depthwise separable convolution, to obtain the rearranged convolution result.
In some embodiments, extracting the features of each feature sub-map with a depthwise separable convolution includes: extracting the features of each feature sub-map using a depthwise separable convolution with a 3 × 3 kernel.
To achieve the above object, an embodiment of the first aspect of the present invention further provides a data processing apparatus based on a convolutional neural network feature map, which includes:
a data acquisition module, configured to acquire data to be processed and perform a first convolution operation on it to obtain an original feature map;
a feature map rearrangement module, configured to rearrange the elements of the original feature map along its length and width dimensions according to a first rule and a second rule, respectively, to obtain a rearranged feature map;
a feature map division module, configured to divide the rearranged feature map into regions to obtain N feature sub-maps, where N is a natural number;
a channel concatenation module, configured to concatenate the N feature sub-maps along the channel dimension according to a third rule to obtain a rearrangement tensor structure with an increased number of channels;
a convolution operation module, configured to perform a convolution operation on the rearrangement tensor structure to obtain a rearranged convolution result;
a structure restoration module, configured to restore the rearranged convolution result according to the first rule, the second rule and the third rule, returning each element of the rearranged convolution result to its arrangement order in the length and width dimensions of the original feature map, to obtain rearranged feature data;
and a data processing module, configured to perform data processing according to the rearranged feature data.
With the data processing method or apparatus based on convolutional neural network feature maps, rearranging the feature map forms a new convolution structure that replaces the traditional one, which can improve recognition accuracy and network performance without increasing the number of neural network computations (FLOPs). Most importantly, the method of the invention does not require rewriting convolution kernels; only a simple feature map rearrangement operation on the tensor is needed, so the network structure of the invention can be applied in the field of image recognition, has good generalization capability, and can be rapidly applied to any other existing convolutional neural network structure.
To achieve the above object, an embodiment of the second aspect of the present invention provides an image processing method based on a convolutional neural network, where the convolutional neural network includes at least one feature map rearrangement convolutional layer, and the feature map rearrangement convolutional layer includes at least one computing unit implementing the data processing method based on convolutional neural network feature maps according to the first aspect of the present invention.
In some embodiments, the convolutional neural network is a deep learning network with a ShuffleNet-style structure, and the feature map processing method is applied in combination with the ShuffleNet channel rearrangement.
With the image processing method based on a convolutional neural network of the present invention, not only can the channel dimension be rearranged, but a rearrangement operation can also be performed on the feature map itself, forming a new convolution structure that replaces the traditional one; this can improve recognition accuracy and network performance without increasing the number of neural network computations (FLOPs). Most importantly, the method does not require rewriting convolution kernels; only a simple feature map rearrangement operation on the tensor is needed, so the network structure has good generalization capability in the field of image recognition and can be rapidly applied to any other existing convolutional neural network structure.
To achieve the above object, an embodiment of the third aspect of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements a data processing method based on a feature map of a convolutional neural network according to the first aspect of the present invention or implements an image processing method based on a convolutional neural network according to the second aspect of the present invention.
To achieve the above object, an embodiment of a fourth aspect of the present invention provides a computing device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the program to implement the data processing method based on the feature map of the convolutional neural network according to the first aspect of the present invention, or to implement the image processing method based on the convolutional neural network according to the second aspect of the present invention.
The non-transitory computer-readable storage medium and computing device of the present invention provide advantageous effects similar to those of the data processing method based on convolutional neural network feature maps of the first aspect and the image processing method based on a convolutional neural network of the second aspect, which are not repeated here.
Drawings
FIG. 1 is a schematic diagram of the operation principle of ShuffleNet;
FIG. 2 is a schematic diagram of a tensor data structure (n, c, h, w);
FIG. 3 is a schematic flow chart diagram of a feature map processing method according to an embodiment of the invention;
FIG. 4 is a diagram illustrating grouping and reordering of feature maps in length and width dimensions according to an embodiment of the present invention;
FIG. 5 is a schematic illustration of channel splicing according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the restoration step of the feature sub-map rearrangement convolution according to an embodiment of the invention;
FIG. 7 is a block diagram of a feature map processing apparatus according to an embodiment of the present invention;
FIG. 8a is a schematic structural diagram of a channel rearrangement calculation unit of ShuffleNet in the related art;
FIG. 8b is a schematic structural diagram of a computing unit combining the ShuffleNet channel rearrangement with the feature map processing of an embodiment of the present invention;
FIG. 9 is a schematic diagram of a computing device according to an embodiment of the invention.
Detailed Description
Embodiments of the present invention will now be described in detail with reference to the drawings, in which like reference numerals refer to the same or similar elements throughout the different views unless otherwise specified. Note that the embodiments described below do not represent all embodiments of the present invention; they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the claims, and the scope of the present disclosure is not limited in these respects. Features of the various embodiments of the invention may be combined with each other without departing from the scope of the invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
To better explain the technical solution of the present invention, ShuffleNet is first briefly described. The main idea behind the ShuffleNet design is to replace dense 1 × 1 convolution with pointwise group convolution to reduce computational complexity, and at the same time to use a channel rearrangement (channel shuffle) technique to overcome the problem that, after group convolution, each output relates only to the channels within its own group, thereby helping information flow between feature channels.
Fig. 1 is a schematic diagram of the operating principle of ShuffleNet, taking two stacked convolutional layers as an example, each being a group convolution. Sub-figure 110 shows the model structure of an ordinary group convolution, where the two convolutional layers have the same number of groups: each output channel relates only to the input channels in its own group and not to those of other groups, so there is no information exchange between channels, which can seriously affect learning accuracy. Sub-figure 120 shows the input of the second group convolution being drawn from each of the different groups of the first group convolution layer, so that input and output channels become correlated. Sub-figure 130 shows how the channel shuffle technique achieves this correlation. Channel rearrangement involves only a transposition (dimension shuffle) of the rearranged dimensions and requires almost no extra computation, so it does not increase the total computational load of the neural network (usually measured in floating-point operations, FLOPs).
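The channel shuffle just described reduces to a reshape, transpose, reshape on the channel dimension. A minimal NumPy sketch (illustrative; `groups` denotes the number of convolution groups):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups: (n, c, h, w) -> (n, c, h, w)."""
    n, c, h, w = x.shape
    assert c % groups == 0
    # reshape to (n, groups, c // groups, h, w), swap the two group axes, flatten back
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))

x = np.arange(6).reshape(1, 6, 1, 1)      # channels 0..5 in 3 groups of 2
shuffled = channel_shuffle(x, groups=3)   # channel order becomes 0, 2, 4, 1, 3, 5
```

Because the operation is a fixed permutation, it costs no multiply-accumulate operations, matching the FLOPs argument above.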
At present, the channel shuffle technique is increasingly applied in lightweight model designs, particularly in mobile scenarios with fixed computational budgets such as image recognition, and in terminal applications such as biometric recognition and face unlocking. Various improved schemes continue to be proposed, but so far most improvements to network structures are made in the channel dimension, improving recognition accuracy by increasing the diversity of the different modules of the neural network.
The inventors have noted that rearrangement techniques are currently not applied to feature maps. On the one hand, it is generally believed in the industry that operating on the feature map is unlikely to improve results; on the other hand, there has been no efficient processing method for such operations. In addition, the receptive field is one of the important issues of convolutional neural networks: it is key to ensuring that the prediction of the last layer reflects as much of the information of the whole original image as possible, and it is constrained by parameters such as kernel size and stride. The receptive field in a CNN is therefore often limited, and the problem is especially serious in grouped convolution structures. Current channel rearrangement techniques offer no effective means of alleviating the limited receptive field caused by group convolution and other factors.
Based on the analysis, the invention provides a data processing method based on a feature map of a convolutional neural network, which solves the problems in the related art to a certain extent through rearrangement of the feature map.
Referring to Fig. 2, Fig. 2 is a schematic diagram of the tensor data structure commonly used in the related art to describe a CNN, which may be expressed as tensor(n, c, h, w), where n denotes the number of feature maps, c the number of channels, h the feature map height and w the feature map width; this data format is NCHW. Taking the data structure of Fig. 2 as an example, the first element of the feature map is, say, 000; the second element lies along the w direction, i.e. 001, followed by 002 and 003, and the sequence then continues along the h direction, i.e. 004, 005, 006, 007.
For convenience of explanation, the embodiments of the present invention are described using the four-dimensional tensor form (n, c, h, w). Those skilled in the art will understand that the description form of a CNN is not limited to this, and the scope of the present invention is not limited to the four-dimensional tensor data structure. The existing ShuffleNet and its modified structures, represented by Fig. 1, rearrange along the c dimension, i.e. the channel dimension; the present invention creatively proposes a rearrangement in the feature map dimensions, i.e. the height dimension h and the width dimension w.
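The NCHW element ordering of Fig. 2 (w varies fastest, then h, then c) can be verified with a small NumPy check (illustrative, using zero-based indices):

```python
import numpy as np

# tensor(n=1, c=2, h=4, w=4) filled with its own flat indices, NCHW layout
t = np.arange(2 * 4 * 4).reshape(1, 2, 4, 4)

# within a channel, consecutive elements advance along w first, then h
assert t[0, 0, 0, 1] == 1    # one step along w
assert t[0, 0, 1, 0] == 4    # one step along h skips a whole row of w = 4
assert t[0, 1, 0, 0] == 16   # one step along c skips h * w = 16 elements
```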
Embodiments of the first aspect of the present invention provide a data processing method and apparatus based on convolutional neural network feature maps, where the data processing method may cover processing of various specific data objects: for example, image data, language data, speech data or knowledge data. Any data type that can be modeled and computed with a convolutional neural network can have its intermediate data processed using the data processing method of the present invention.
Referring to Fig. 3, Fig. 3 is a schematic flow chart of a data processing method based on a convolutional neural network feature map according to an embodiment of the present invention. The method comprises steps S100 to S700.
Step S100: acquiring data to be processed, and performing a first convolution operation on the data to be processed to obtain an original feature map.
Step S200: rearranging the elements of the original feature map along its length and width dimensions according to a first rule and a second rule to obtain a rearranged feature map.
The main purpose of the rearrangement is to bring non-adjacent parts of the original feature map together and thereby increase their correlation, so that convolution after rearrangement provides more information-exchange paths between different channels; any rearrangement rule that achieves this purpose can be applied in this step.
For example, in some embodiments a computationally simple approach of equidistant grouping and rearrangement is proposed: under the first rule, in the length dimension of the original feature map, elements are divided into Mh groups according to their coordinates modulo the natural number Mh and arranged group by group. The computation required is extremely small, the groups contain essentially the same number of elements (differing by at most 1), and later operations such as channel concatenation are facilitated.
Preferably, the natural number Mh is chosen as an integer multiple of 2, the elements are divided into at least one odd group and at least one even group in the length dimension of the original feature map, and the odd and even groups are arranged in a staggered manner. This is simple to perform and yields a rearrangement transformation with good symmetry.
In some embodiments, the second rule likewise divides elements into Mw groups according to their coordinates modulo the natural number Mw in the width dimension of the original feature map, and arranges them group by group.
Preferably, the natural number Mw is an integer multiple of 2, the elements are divided into at least one odd group and at least one even group in the width dimension of the original feature map, and the odd and even groups are arranged in a staggered manner.
Referring to Fig. 4, a feature map rearrangement is shown taking h = w = 8 as an example. The (h, w) dimensions of the input feature map are grouped by parity: the h dimension is divided into odd and even groups, and so is the w dimension. That is, the feature map is divided in the length and width dimensions by taking coordinates modulo 2, i.e. a grouping with Mw = 2 and Mh = 2. After grouping, in the h dimension the odd group (1, 3, 5, 7) is arranged on the left and the even group (2, 4, 6, 8) on the right; in the w dimension the odd group (1, 3, 5, 7) is arranged on top and the even group (2, 4, 6, 8) on the bottom. Clearly, the rearrangement of Fig. 4 is only one example chosen for ease of implementation; other groupings and other arrangements may also be used.
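The Fig. 4 rearrangement (Mh = Mw = 2, h = w = 8) can be written as two fancy-index operations on a single channel. This is a sketch using zero-based indices, so the one-based odd rows 1, 3, 5, 7 of the figure correspond to indices 0, 2, 4, 6:

```python
import numpy as np

h = w = 8
fm = np.arange(h * w).reshape(h, w)   # one channel of the original feature map

rows = np.r_[0:h:2, 1:h:2]            # odd group first, then even group
cols = np.r_[0:w:2, 1:w:2]
rearranged = fm[rows, :][:, cols]

# first row of the rearranged map holds original columns 0, 2, 4, 6, 1, 3, 5, 7
```

Because both index arrays are permutations, the operation copies every element exactly once and is trivially invertible, which the later restoration step relies on.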
Step S300: dividing the rearranged feature map into regions to obtain N feature sub-maps, where N is a natural number.
The purpose of dividing into multiple sub-maps is to concatenate them along the channel dimension, so consistent sub-map sizes facilitate the subsequent operations. The rearranged feature map can therefore be divided into regions of equal area to obtain N equal-area feature sub-maps, for example by dividing equally in both the w and h directions.
Note that "equal area" covers both the case where the sub-map shapes and areas are exactly identical and the case where the difference is smaller than a predetermined threshold, for example when the number of elements per side differs by no more than 1 or by a specified threshold. When w and h of the feature map are not evenly divisible, roughly equal division yields N feature sub-maps of approximately equal area; operations such as edge padding can then produce sub-maps of identical area for the subsequent channel concatenation.
Preferably, the region division mode of the rearranged feature map and the number N of feature sub-maps are determined according to the first rule and the second rule. For example, referring to Fig. 4, the rearranged feature map is divided using the same 2 × 2 equal-area form as the first and second rules used during rearrangement; referring to Fig. 5, the h × w plane is divided 2 × 2 to obtain 4 feature sub-maps, each 1/4 the size of the original map. In other embodiments, any other integer number of feature sub-maps may be divided. After division, the 4 feature sub-maps are placed along dimension c for concatenation, and the number of channels becomes 4 times the original, i.e. 4c.
Step S400: splicing the N feature sub-maps in the channel dimension according to a third rule to obtain a rearrangement tensor structure with an increased number of channels.
Since the subsequent convolution operates on each feature sub-map relatively independently, there is no special requirement on the third rule: the N feature sub-maps can be spliced in the channel dimension in any order.
Referring to FIG. 5, a new tensor (n, 4c, h/2, w/2) is formed: the channel dimension becomes 4 times larger, and the length and width of each feature sub-map are 1/2 those of the original feature map. In the embodiment of fig. 5, the 4 feature sub-maps are placed along the c dimension in top-left, top-right, bottom-left, bottom-right order.
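Steps S300 and S400 can be sketched as follows — a minimal NumPy sketch assuming the 2 × 2 equal-area division and the top-left, top-right, bottom-left, bottom-right splicing order of fig. 5 (the function name `split_and_stack` is illustrative, not from the specification):

```python
import numpy as np

def split_and_stack(x):
    # x: (n, c, h, w) with h and w even. 2x2 equal-area division of the
    # rearranged feature map, then splicing along the channel dimension
    # in top-left, top-right, bottom-left, bottom-right order.
    n, c, h, w = x.shape
    tl = x[:, :, :h // 2, :w // 2]
    tr = x[:, :, :h // 2, w // 2:]
    bl = x[:, :, h // 2:, :w // 2]
    br = x[:, :, h // 2:, w // 2:]
    return np.concatenate([tl, tr, bl, br], axis=1)  # (n, 4c, h/2, w/2)

x = np.zeros((1, 3, 8, 8))
print(split_and_stack(x).shape)  # (1, 12, 4, 4)
```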
Step S500: performing a convolution operation on the rearrangement tensor structure to obtain a rearranged convolution result. In this step, the convolution operation corresponding to the model structure of the CNN can be performed.
In some embodiments, to obtain a better receptive field, the invention provides a convolution method combining depth-separation convolution with pointwise convolution. First, the features of each feature sub-map are extracted with a depth-separation convolution; because the rearrangement has made originally non-adjacent elements adjacent, an ordinary convolution over a sub-map enlarges the receptive field. Then, a pointwise convolution with a convolution kernel of 1 is applied across every N element points of the N feature sub-maps that correspond to the same original position, yielding the rearranged convolution result; this preserves the small receptive field.
Thus, by extracting two different receptive fields, the combination of the two operations enlarges the receptive field while preserving the small one, forming a convolution structure that retains both large and small receptive fields. This alleviates the problem of a limited receptive field to a certain extent, and the improved fusion of information from large and small receptive fields raises the accuracy of the neural network.
In some embodiments, for common image processing and similar application scenarios, a depth-separation convolution with a 3 × 3 convolution kernel can be used to extract the features of each feature sub-map, which gives good results.
After the 3 × 3 depth-separation convolution is applied to each feature sub-map (for example, with stride 1 and padding 1), a pointwise convolution with a convolution kernel of 1 is applied across every 4 channels; since the 4 elements at the same position correspond to 4 originally adjacent elements, the small receptive field is preserved. For example, the pointwise convolution defaults to stride 1 and padding 0. The tensor dimensions are unchanged by these convolutions, remaining tensor (n, 4c, h/2, w/2).
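A minimal NumPy sketch of this two-stage convolution (stride 1, zero padding 1 for the depthwise stage; identity kernels are used only to make the example checkable — kernel values and function names are illustrative assumptions):

```python
import numpy as np

def depthwise3x3(x, k):
    # x: (c, h, w); k: (c, 3, 3). One 3x3 filter per channel,
    # stride 1, zero padding 1 -- the depth-separation stage.
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c, h, w))
    for ch in range(c):
        for i in range(h):
            for j in range(w):
                out[ch, i, j] = np.sum(xp[ch, i:i + 3, j:j + 3] * k[ch])
    return out

def pointwise(x, w1):
    # w1: (c_out, c_in). 1x1 convolution mixing, at each position, the
    # channels that hold the originally adjacent elements.
    return np.tensordot(w1, x, axes=([1], [0]))

x = np.arange(4 * 4 * 4, dtype=float).reshape(4, 4, 4)
k = np.zeros((4, 3, 3)); k[:, 1, 1] = 1.0   # identity depthwise kernel
w1 = np.eye(4)                               # identity pointwise kernel
print(np.allclose(pointwise(depthwise3x3(x, k), w1), x))  # True
```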
The selection of the above parameters is based on the inventors' findings for constructing an efficient network architecture: (1) equal input and output channel widths minimize the Memory Access Cost (MAC), so a "balanced" convolution (equal channel widths) is used; (2) excessive group convolution increases MAC, so the cost of group convolution must be weighed; (3) network fragmentation should be reduced; (4) element-wise operations should be reduced. Using a 3 × 3 depth-separation convolution together with a 1 × 1 pointwise convolution follows these design principles and achieves a good balance between computational cost and network effect.
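The computational saving of this pairing can be illustrated by counting multiply operations — a rough back-of-the-envelope sketch that ignores bias, activation, and memory effects (the channel count and feature-map size below are arbitrary example values, not from the specification):

```python
def mults_standard(c, h, w, k=3):
    # Standard convolution: c input and c output channels, k x k kernel.
    return h * w * c * c * k * k

def mults_dw_pw(c, h, w, k=3):
    # Depthwise k x k stage plus pointwise 1 x 1 stage.
    return h * w * c * k * k + h * w * c * c

c, h, w = 128, 56, 56
print(mults_standard(c, h, w))   # 462422016
print(mults_dw_pw(c, h, w))      # 54992896  (about 8.4x fewer)
```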
Step S600: restoring the rearranged convolution result according to the first rule, the second rule, and the third rule, returning each element of the rearranged convolution result to the arrangement of the length and width dimensions of the original feature map, to obtain the rearranged feature data.
This step is the inverse application of the first, second, and third rules. First, an inverse merging operation is performed according to the division of the third rule: the rearranged convolution result is merged back so that h and w return to the dimensions of the original feature map. Then the elements in the h and w dimensions are reset according to the first and second rules, restoring their original arrangement.
Referring to fig. 6, the 4 feature sub-maps processed in fig. 5 are first merged back into one feature map, and the h and w dimensions are then restored to their original order. That is, the sub-maps are merged in their splitting order (top-left, top-right, bottom-left, bottom-right) into tensor (n, c, h, w), and the h and w dimensions are restored to the original sequence (1, 2, 3, 4, 5, 6, 7, 8).
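The rearrangement (steps S200–S400, without the convolution) and the restoration of step S600 should compose to the identity. A compact NumPy round-trip sketch, assuming Mh = Mw = 2 with the residue-0 class placed first, a 2 × 2 division, and the top-left, top-right, bottom-left, bottom-right splicing order (all function names are illustrative):

```python
import numpy as np

def _order(n, m=2):
    # First/second rule: indices grouped by residue modulo m.
    return [i for r in range(m) for i in range(r, n, m)]

def rearrange(x):  # steps S200-S400, without the convolution
    n, c, h, w = x.shape
    x = np.take(np.take(x, _order(h), axis=2), _order(w), axis=3)
    return np.concatenate(
        [x[:, :, :h // 2, :w // 2], x[:, :, :h // 2, w // 2:],
         x[:, :, h // 2:, :w // 2], x[:, :, h // 2:, w // 2:]], axis=1)

def restore(y):  # step S600
    n, c4, hh, ww = y.shape
    c = c4 // 4
    x = np.empty((n, c, 2 * hh, 2 * ww), dtype=y.dtype)
    x[:, :, :hh, :ww], x[:, :, :hh, ww:] = y[:, :c], y[:, c:2 * c]
    x[:, :, hh:, :ww], x[:, :, hh:, ww:] = y[:, 2 * c:3 * c], y[:, 3 * c:]
    # Invert the modulo-m permutations of the first/second rules.
    return np.take(np.take(x, np.argsort(_order(2 * hh)), axis=2),
                   np.argsort(_order(2 * ww)), axis=3)

x = np.arange(2 * 8 * 8, dtype=float).reshape(1, 2, 8, 8)
print(np.array_equal(restore(rearrange(x)), x))  # True
```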
Step S700: performing data processing according to the rearranged feature data.
It is noted that any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing steps of a custom logic function or process; and the scope of the preferred embodiments of the present invention includes alternative implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those skilled in the art.
With the data processing method based on the convolutional-neural-network feature map, rearranging the feature map forms a new convolution structure that replaces the traditional one, improving recognition performance without increasing the number of neural network computations (FLOPs). Importantly, the method requires no rewriting of convolution kernels — only a simple feature map rearrangement of the tensor — so the network structure can be applied in the field of image recognition, has good generalization capability, and can be quickly incorporated into any other existing convolutional neural network structure.
Thus, the data processing method and apparatus according to the embodiments of the present invention can be applied in various deep learning fields, not only image processing. For example, CNNs have begun to show promise in Natural Language Processing (NLP) tasks, and feature map rearrangement would help identify correlations between words that are not contiguous in word order, particularly distant ones. Applying it to NLP can improve a neural network's understanding of long sentences, paragraphs, and even full passages.
In addition, although the data processing method based on the convolutional-neural-network feature map is particularly suitable for lightweight networks on mobile terminals, it can also be used on the server side to significantly reduce the amount of computation.
The invention also provides a data processing apparatus 200 based on the convolutional-neural-network feature map. Referring to fig. 7, the apparatus 200 includes a data acquisition module 210, a feature map rearrangement module 220, a feature map division module 230, a channel splicing module 240, a convolution operation module 250, a structure restoration module 260, and a data processing module 270.
The data obtaining module 210 is configured to obtain data to be processed, and perform a first convolution operation on the data to be processed to obtain an original feature map.
The feature map rearranging module 220 is configured to rearrange, according to a first rule and a second rule, elements of the original feature map in the length dimension and the width dimension of the original feature map, respectively, to obtain a rearranged feature map.
In some embodiments, the first rule may include: in the length dimension of the original feature map, the elements are divided into Mh groups according to the principle that coordinates are congruent modulo the natural number Mh, and the elements are arranged in units of these Mh groups. The natural number Mh may be an integer multiple of 2, in which case the elements are divided, in the length dimension of the original feature map, into at least one odd group and at least one even group, and the odd and even groups are arranged alternately.
In some embodiments, the second rule may include: in the width dimension of the original feature map, the elements are divided into Mw groups according to the principle that coordinates are congruent modulo the natural number Mw, and the elements are arranged in units of these Mw groups. The natural number Mw may be an integer multiple of 2, in which case the elements are divided, in the width dimension of the original feature map, into at least one odd group and at least one even group, and the odd and even groups are staggered.
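For Mh = Mw = 2, one natural reading of these rules places the residue-0 class first and then the residue-1 class along each axis. A pure-Python sketch of that assumed grouping (the ordering within and between groups is an assumption for illustration):

```python
def rearrange_indices(n, m=2):
    # Group the indices 0..n-1 by residue modulo m: residue class 0
    # first, then class 1, and so on.
    return [i for r in range(m) for i in range(r, n, m)]

def rearrange_axis(values, m=2):
    return [values[i] for i in rearrange_indices(len(values), m)]

# Elements 1..8 along the h (or w) axis: odd positions, then even ones.
print(rearrange_axis([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 5, 7, 2, 4, 6, 8]
```

After this grouping, a 2 × 2 split along h and w places all same-residue elements in one sub-map, so elements adjacent within a sub-map were originally two positions apart — which is what lets the depth-separation convolution see an enlarged receptive field.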
The feature map dividing module 230 is configured to perform region division on the rearranged feature map to obtain N feature sub-maps, where N is a natural number. For example, the rearranged feature map may be divided into equal areas to obtain N feature sub-maps with equal areas. Or, the manner of performing region division on the rearranged feature graph and the number N of feature subgraphs can be determined according to the first rule and the second rule.
The channel splicing module 240 is configured to splice the N feature sub-graphs in a channel dimension according to a third rule, so as to obtain a rearrangement tensor structure with an increased channel number.
The convolution operation module 250 is configured to perform a convolution operation on the rearrangement tensor structure to obtain the rearranged convolution result. The features of each feature sub-map can be extracted with a depth-separation convolution, after which a pointwise convolution with a convolution kernel of 1 is applied across every N corresponding element points of the extracted features of the N feature sub-maps, yielding the rearranged convolution result. In some embodiments, the depth-separation convolution may use a 3 × 3 convolution kernel.
The structure restoring module 260 is configured to restore the rearranged convolution result according to the first rule, the second rule, and the third rule, and restore each element in the rearranged convolution result to an arrangement sequence corresponding to the length and width dimensions of the original feature map, so as to obtain the feature data after the rearrangement processing.
The data processing module 270 is configured to perform data processing according to the rearranged feature data.
For a more specific implementation manner of each module of the data processing apparatus, reference may be made to the description of the data processing method of the present invention, and similar beneficial effects are obtained, and details are not repeated here.
An embodiment of the second aspect of the invention provides an image processing method based on a convolutional neural network, to make full use of the good effect of the above data processing method in image processing. The convolutional neural network includes at least one feature-map-rearrangement convolutional layer, and that layer includes at least one computing unit implementing the data processing method based on the convolutional-neural-network feature map.
The image processing method based on the convolutional neural network can be a deep learning method comprising a plurality of convolutional layers. The feature map rearrangement can be applied in one or more hidden layers, for purposes such as increasing data association among different channels and reducing the amount of computation.
In some embodiments, the convolutional neural network may be a deep learning network with a ShuffleNet structure, in which the feature map processing method is combined with ShuffleNet's channel rearrangement (channel shuffle) to achieve a better overall effect.
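For reference, ShuffleNet's channel shuffle — the channel-dimension counterpart of the spatial rearrangement described here — can be sketched in a few lines of NumPy (the group count and shapes below are example values):

```python
import numpy as np

def channel_shuffle(x, groups):
    # Reshape (n, g, c//g, h, w), swap the two channel axes, flatten:
    # channels from different groups become interleaved.
    n, c, h, w = x.shape
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))

x = np.arange(6).reshape(1, 6, 1, 1)
print(channel_shuffle(x, 2).ravel().tolist())  # [0, 3, 1, 4, 2, 5]
```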
The effects of the present invention are described below with reference to a specific example. Referring to figs. 8a and 8b, fig. 8a is a schematic structural diagram of a feature map computing unit of ShuffleNet V2 (see document 2), and fig. 8b is a schematic structural diagram of a computing unit that applies the feature map computing method of the present invention in the second branch. The entire neural network may comprise a stack of a plurality of such computing units.
For further details on ShuffleNet, reference may be made to the relevant literature, for example:
Document 1: Xiangyu Zhang et al., ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices, arXiv:1707.01083v2;
Document 2: Ningning Ma et al., ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design, arXiv:1807.11164v1.
Experiments show that replacing the convolution operation in the second branch of fig. 8a with the data processing method based on the convolutional-neural-network feature map yields higher accuracy. Taking the ImageNet dataset as an example, the original ShuffleNet V2 of fig. 8a has a Top-1 error rate of 30.5, which drops to 30.2 after the replacement. Higher-precision image recognition is thus achieved without increasing the amount of computation; moreover, no convolution kernel needs to be rewritten — only the feature map is rearranged — demonstrating good generalization capability.
With the image processing method based on the convolutional neural network, not only the channel dimension but also the feature map itself can be rearranged, forming a new convolution structure that replaces the traditional one and improves recognition performance without increasing the number of neural network computations (FLOPs). Most importantly, the method requires no rewriting of convolution kernels — only a simple feature map rearrangement of the tensor — so the network structure can be applied in the field of image recognition, has good generalization capability, and can be quickly incorporated into any other existing convolutional neural network structure.
An embodiment of the third aspect of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements a data processing method based on a feature map of a convolutional neural network according to an embodiment of the first aspect of the present invention; or to implement the image processing method based on the convolutional neural network according to the embodiment of the second aspect of the present invention.
Generally, computer instructions for carrying out the methods of the present invention may be carried using any combination of one or more computer-readable storage media. A non-transitory computer-readable storage medium may include any computer-readable medium except a transitory propagating signal itself.
A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar languages; in particular, the Python language, well suited to neural network computing, may be used together with platform frameworks such as TensorFlow and PyTorch. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
Embodiments of the fourth aspect of the present invention provide a computer program product, wherein when the instructions of the computer program product are executed by a processor, the data processing method based on the feature map of the convolutional neural network according to embodiments of the first aspect of the present invention is implemented; or implementing the convolutional neural network-based image processing method according to an embodiment of the second aspect of the present invention.
An embodiment of a fifth aspect of the present invention provides a computing device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the program to implement the data processing method based on the feature map of the convolutional neural network according to the first aspect of the present invention; or implementing the convolutional neural network-based image processing method according to an embodiment of the second aspect of the present invention.
The non-transitory computer-readable storage medium, the computer program product, and the computing device according to the third to fifth aspects of the present invention may be implemented with reference to the content specifically described in the embodiments according to the first aspect of the present invention, and have similar beneficial effects to the data processing method based on the feature map of the convolutional neural network according to the first aspect of the present invention and the image processing method based on the convolutional neural network according to the second aspect of the present invention, and will not be described herein again.
FIG. 9 illustrates a block diagram of an exemplary computing device suitable for use to implement embodiments of the present disclosure. The computing device 12 shown in FIG. 9 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the disclosure.
As shown in FIG. 9, computing device 12 may be implemented in the form of a general purpose computing device. Components of computing device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. Such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computing device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computing device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile Memory, such as Random Access Memory (RAM) 30 and/or cache Memory 32. Computing device 12 may further include other removable/non-removable, volatile/nonvolatile computer-readable storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown, but commonly referred to as a "hard drive"). Although not shown in FIG. 9, a disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact disk Read Only Memory (CD-ROM), a Digital versatile disk Read Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally perform the functions and/or methodologies of the embodiments described in this disclosure.
Computing device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24), with one or more devices that enable a user to interact with computing device 12, and/or with any device (e.g., a network card or modem) that enables computing device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Computing device 12 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computing device 12 via bus 18. Although not shown, other hardware and/or software modules may be used in conjunction with computing device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 16 executes various functional applications and data processing, for example, implementing the methods mentioned in the foregoing embodiments, by executing programs stored in the system memory 28.
The computing device of the invention may be a server or a terminal device with limited computing power; the lightweight network architecture of the invention is particularly suitable for the latter. Specific implementations of the terminal device include, but are not limited to: smart mobile communication terminals, unmanned aerial vehicles, robots, portable image processing devices, security devices, etc.
Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are illustrative and not to be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (12)

1. An image data processing method based on a convolutional neural network feature map is characterized by comprising the following steps:
acquiring image data to be processed, and performing first convolution operation on the image data to be processed to obtain an original characteristic diagram;
rearranging the elements of the original feature map according to a first rule and a second rule in the length dimension and the width dimension of the original feature map, respectively, to obtain a rearranged feature map, wherein the first rule comprises: in the length dimension of the original feature map, dividing elements into Mh groups according to the principle that coordinates are congruent modulo a natural number Mh, and arranging the elements in units of these Mh groups; and the second rule comprises: in the width dimension of the original feature map, dividing elements into Mw groups according to the principle that coordinates are congruent modulo a natural number Mw, and arranging the elements in units of these Mw groups;
carrying out region division on the rearranged feature graph to obtain N feature subgraphs, wherein N is a natural number;
splicing the N characteristic subgraphs in a channel dimension according to a third rule to obtain a rearrangement tensor structure with increased channel number;
performing convolution operation according to the rearrangement tensor structure to obtain a rearranged convolution result;
restoring the rearranged convolution result according to the first rule, the second rule and the third rule, and restoring each element in the rearranged convolution result into the arrangement mode of the length dimension and the width dimension of the original feature map to obtain rearranged feature data;
and processing image data according to the rearranged characteristic data.
2. The method according to claim 1, wherein the natural number Mh is an integer multiple of 2, the elements are divided into at least one odd group and at least one even group in the length dimension of the original feature map, and the odd group and the even group are arranged alternately.
3. The convolutional neural network feature map-based image data processing method as claimed in claim 1, wherein the natural number Mw is an integer multiple of 2, the elements are divided into at least one odd group and at least one even group in the width dimension of the original feature map, and the odd group and the even group are staggered.
4. The image data processing method based on the convolutional neural network feature map of claim 1, wherein the region division of the rearranged feature map to obtain N feature sub-maps comprises:
and carrying out equal-area division on the rearranged feature graph to obtain N feature subgraphs with equal areas.
5. The image data processing method based on the convolutional neural network feature map of claim 1, wherein the region division of the rearranged feature map to obtain N feature sub-maps comprises:
and determining the region division mode of the rearranged feature graph and the number N of feature subgraphs according to the first rule and the second rule.
6. The method according to claim 1, wherein the performing convolution operations according to the rearrangement tensor structure to obtain the rearranged convolution result comprises:
extracting the features of each feature subgraph by using depth separation convolution;
and performing point convolution with convolution kernel of 1 on every N points of the features of the N feature subgraphs extracted by the deep separation convolution to obtain a rearranged convolution result.
7. The method for processing image data based on feature maps of convolutional neural networks as claimed in claim 6, wherein said extracting features of each of said feature sub-maps with depth-separation convolution comprises:
extracting features of each of the feature sub-graphs using a depth-separated convolution with a convolution kernel of 3 x 3.
8. An image data processing apparatus based on a convolutional neural network feature map, comprising:
the data acquisition module is used for acquiring image data to be processed and performing first convolution operation on the image data to be processed to obtain an original characteristic diagram;
a feature map rearrangement module, configured to rearrange, in the length dimension and the width dimension of the original feature map, the elements of the original feature map according to a first rule and a second rule, respectively, to obtain a rearranged feature map, wherein the first rule comprises: in the length dimension of the original feature map, dividing elements into Mh groups according to the principle that coordinates are congruent modulo a natural number Mh, and arranging the elements in units of these Mh groups; and the second rule comprises: in the width dimension of the original feature map, dividing elements into Mw groups according to the principle that coordinates are congruent modulo a natural number Mw, and arranging the elements in units of these Mw groups;
the characteristic diagram dividing module is used for carrying out region division on the rearranged characteristic diagram to obtain N characteristic sub-diagrams, wherein N is a natural number;
the channel splicing module is used for splicing the N characteristic subgraphs in a channel dimension according to a third rule to obtain a rearrangement tensor structure with increased channel number;
the convolution operation module is used for carrying out convolution operation according to the rearrangement tensor structure to obtain a rearranged convolution result;
the structure reduction module is used for reducing the rearranged convolution result according to the first rule, the second rule and the third rule, and reducing each element in the rearranged convolution result into an arrangement sequence corresponding to the length dimension and the width dimension of the original feature map to obtain rearranged feature data;
and the data processing module is used for processing the image data according to the rearranged characteristic data.
9. An image processing method based on a convolutional neural network, which is characterized in that the convolutional neural network comprises at least one feature map rearrangement convolutional layer, and the feature map rearrangement convolutional layer comprises at least one computing unit for implementing the image data processing method based on the feature map of the convolutional neural network according to any one of claims 1 to 7.
10. The convolutional neural network-based image processing method as claimed in claim 9, wherein the structure of the convolutional neural network is a deep learning network of a rearranged network structure, and the image data processing method of the feature map is applied in conjunction with channel rearrangement of the rearranged network.
11. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method for image data processing based on the feature map of the convolutional neural network according to any one of claims 1 to 7, or implements the method for image processing based on the convolutional neural network according to any one of claims 9 to 10.
12. A computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements a method of image data processing based on a feature map of a convolutional neural network as defined in any one of claims 1 to 7, or implements a method of image processing based on a convolutional neural network as defined in any one of claims 9 to 10.
CN201910603672.4A 2019-07-05 2019-07-05 Data processing method and image processing method based on convolutional neural network characteristic diagram Active CN110309837B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910603672.4A CN110309837B (en) 2019-07-05 2019-07-05 Data processing method and image processing method based on convolutional neural network characteristic diagram

Publications (2)

Publication Number Publication Date
CN110309837A CN110309837A (en) 2019-10-08
CN110309837B true CN110309837B (en) 2021-07-06

Family

ID=68078359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910603672.4A Active CN110309837B (en) 2019-07-05 2019-07-05 Data processing method and image processing method based on convolutional neural network characteristic diagram

Country Status (1)

Country Link
CN (1) CN110309837B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110633700B (en) * 2019-10-21 2022-03-25 深圳市商汤科技有限公司 Video processing method and device, electronic equipment and storage medium
CN110826694B (en) * 2019-10-30 2021-06-11 瀚博半导体(上海)有限公司 Image processing method and device based on convolutional neural network
CN111008924B (en) * 2019-12-02 2023-09-12 西安交通大学深圳研究院 Image processing method and device, electronic equipment and storage medium
CN111161195B (en) * 2020-01-02 2023-10-13 重庆特斯联智慧科技股份有限公司 Feature map processing method and device, storage medium and terminal
CN111210006B (en) * 2020-01-02 2023-07-28 北京科技大学 Heterogeneous deep neural network structure combination method and system for multi-feature graph
CN111723662B (en) * 2020-05-18 2023-07-11 南京师范大学 Human body posture recognition method based on convolutional neural network
CN111784555B (en) * 2020-06-16 2023-08-25 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment
CN113919405B (en) * 2020-07-07 2024-01-19 华为技术有限公司 Data processing method and device and related equipment
CN111985618B (en) * 2020-08-14 2024-03-05 杭州海康威视数字技术股份有限公司 Processing method and device of 3D convolutional neural network on neural network processor
CN112215754B (en) * 2020-10-26 2024-01-26 北京达佳互联信息技术有限公司 Image amplifying method, device, electronic equipment and storage medium
CN112766462A (en) * 2021-01-18 2021-05-07 苏州浪潮智能科技有限公司 Data processing method, device and computer readable storage medium
CN113011384B (en) * 2021-04-12 2022-11-25 重庆邮电大学 Anchor-frame-free target detection method based on lightweight convolution
CN112990370B (en) * 2021-04-26 2021-09-10 腾讯科技(深圳)有限公司 Image data processing method and device, storage medium and electronic equipment
CN113221896A (en) * 2021-05-31 2021-08-06 北京灵汐科技有限公司 Target detection method, target detection device, neuromorphic device, and medium
CN113554095B (en) * 2021-07-26 2022-08-19 湖南国科微电子股份有限公司 Feature map processing method and device and computer equipment
CN114115296B (en) * 2022-01-21 2022-05-20 南京理工大学 Intelligent inspection and early warning system and method for key area
CN114648107A (en) * 2022-03-10 2022-06-21 北京宏景智驾科技有限公司 Method and circuit for improving efficiency of calculation of neural network input image point cloud convolution layer

Citations (3)

Publication number Priority date Publication date Assignee Title
CN108256544A (en) * 2016-12-29 2018-07-06 深圳光启合众科技有限公司 Picture classification method and device, robot
CN109426858A (en) * 2017-08-29 2019-03-05 京东方科技集团股份有限公司 Neural network, training method, image processing method and image processing apparatus
CN109643396A (en) * 2016-06-17 2019-04-16 诺基亚技术有限公司 Construct convolutional neural networks

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US10984315B2 (en) * 2017-04-28 2021-04-20 Microsoft Technology Licensing, Llc Learning-based noise reduction in data produced by a network of sensors, such as one incorporated into loose-fitting clothing worn by a person
US20180358003A1 (en) * 2017-06-09 2018-12-13 Qualcomm Incorporated Methods and apparatus for improving speech communication and speech interface quality using neural networks
CN109934236A (en) * 2019-01-24 2019-06-25 杰创智能科技股份有限公司 A kind of multiple dimensioned switch target detection algorithm based on deep learning
CN109948794A (en) * 2019-02-28 2019-06-28 清华大学 Neural network structure pruning method, pruning device and electronic equipment


Non-Patent Citations (3)

Title
"RESIDUAL SHUFFLING CONVOLUTIONAL NEURAL NETWORKS FOR DEEP SEMANTIC IMAGE SEGMENTATION USING MULTI-MODAL DATA";Kaiqiang Chen等;《ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences》;20180607;第4卷(第2期);第65-72页 *
"ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design";Ningning Ma等;《https://link.springer.com/conference/eccv》;20181231;第1-16页 *
"Image retrieval based on CNN feature weighting and region integration"; Yuan Hui et al.; Computer Engineering and Science; 20190131; Vol. 41, No. 1; pp. 113-121 *


Similar Documents

Publication Publication Date Title
CN110309837B (en) Data processing method and image processing method based on convolutional neural network characteristic diagram
CN111402143B (en) Image processing method, device, equipment and computer readable storage medium
CN109492627B (en) Scene text erasing method based on depth model of full convolution network
CN110517278A (en) Image segmentation and the training method of image segmentation network, device and computer equipment
CN110473137A (en) Image processing method and device
CN111274994B (en) Cartoon face detection method and device, electronic equipment and computer readable medium
CN107679539B (en) Single convolution neural network local information and global information integration method based on local perception field
CN112954399B (en) Image processing method and device and computer equipment
CN112330684A (en) Object segmentation method and device, computer equipment and storage medium
JP7085600B2 (en) Similar area enhancement method and system using similarity between images
CN116310667B (en) Self-supervision visual characterization learning method combining contrast loss and reconstruction loss
CN111985597A (en) Model compression method and device
CN114519667A (en) Image super-resolution reconstruction method and system
CN111008631A (en) Image association method and device, storage medium and electronic device
Li et al. Lightweight single image super-resolution with dense connection distillation network
CN113393371A (en) Image processing method and device and electronic equipment
CN112052865A (en) Method and apparatus for generating neural network model
CN114332484A (en) Key point detection method and device, computer equipment and storage medium
CN112862023A (en) Object density determination method and device, computer equipment and storage medium
CN113537187A (en) Text recognition method and device, electronic equipment and readable storage medium
CN114091648A (en) Image classification method and device based on convolutional neural network and convolutional neural network
CN111626212A (en) Method and device for identifying object in picture, storage medium and electronic device
CN110956599A (en) Picture processing method and device, storage medium and electronic device
CN113591838B (en) Target detection method, device, electronic equipment and storage medium
CN115439848A (en) Scene recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant