CN111797972A - Method, device and electronic system for processing data by using convolutional neural network - Google Patents

Method, device and electronic system for processing data by using convolutional neural network

Info

Publication number
CN111797972A
Authority
CN
China
Prior art keywords
convolution
data
target
input data
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010465620.8A
Other languages
Chinese (zh)
Inventor
马宁宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN202010465620.8A
Publication of CN111797972A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method, a device and an electronic system for processing data by using a convolutional neural network. After collected data of a target object are acquired, the data can be input into a pre-trained convolutional neural network for convolution processing, wherein the convolutional neural network comprises at least one target convolutional layer, and the convolution step size corresponding to a target convolution kernel of the target convolutional layer is larger than 1. When the target convolutional layer receives input data, convolution operation and capacity expansion processing are performed on the input data with the target convolution kernel, and target data are output; the target data are input to the next convolutional layer after the target convolutional layer, and subsequent operations continue until the convolutional neural network outputs a final result. The invention can reduce the time complexity while keeping the space complexity unchanged, thereby improving the calculation rate of the network.

Description

Method, device and electronic system for processing data by using convolutional neural network
Technical Field
The invention relates to the technical field of image processing, in particular to a method, a device and an electronic system for processing data by using a convolutional neural network.
Background
Convolutional neural networks have been widely used in the fields of text recognition, speech recognition, and the like. In practical application, in order to improve the accuracy of the output result of the convolutional neural network, a commonly used method is to increase the parameter amount, but the time complexity is inevitably increased due to the increase of the parameter amount, so that the calculation rate of the network is reduced.
Disclosure of Invention
Accordingly, the present invention is directed to a method, an apparatus and an electronic system for processing data by using a convolutional neural network, so as to alleviate the above technical problems.
In a first aspect, an embodiment of the present invention provides a method for processing data by using a convolutional neural network, where the method includes: acquiring collected data of a target object; inputting the collected data into a pre-trained convolutional neural network for convolution processing; the convolutional neural network comprises at least one target convolutional layer, and the convolution step length corresponding to a target convolution kernel of the target convolutional layer is larger than 1; when the target convolution layer receives input data, carrying out convolution operation and capacity expansion processing on the input data and a target convolution kernel, and outputting target data; the target data comprises convolution result units and capacity expansion unit groups inserted into adjacent positions of the convolution result units, so that the matrix size of the target data is the same as that of the input data; the data in the capacity expansion unit group is the same as the input data corresponding to the convolution result unit adjacent to the capacity expansion unit group; and inputting the target data to the next convolution layer of the target convolution layer, and continuing subsequent operation until the convolution neural network outputs a final result.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where the step of performing convolution operation and capacity expansion processing on the input data and the target convolution kernel, and outputting the target data includes: taking each data in the input data as convolution center data one by one, and for each convolution center data, performing the following operations to obtain target data: taking the convolution center data as a convolution center, and performing convolution operation on the convolution center data and a target convolution kernel to obtain a current convolution result; reading insertion data corresponding to convolution center data from input data based on the convolution step of the target convolution kernel; inserting the insertion data into a position corresponding to the current convolution result according to the position relation of the insertion data and the convolution center data; the position of the current convolution result is a convolution result unit of the target data, the position of the insertion data is a capacity expansion unit of the target data, and the capacity expansion unit corresponding to each current convolution result is a capacity expansion unit group.
With reference to the first possible implementation manner of the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the target convolution kernel is an N × N convolution kernel; n is an odd number greater than or equal to 3; a step of reading insertion data corresponding to convolution center data from input data based on a convolution step of a target convolution kernel, including: determining a spanning data area corresponding to convolution center data in input data based on the convolution step of a target convolution kernel; the data in the span data area is taken as insertion data corresponding to the convolution center data.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the step of performing convolution operation and capacity expansion processing on the input data and the target convolution kernel, and outputting the target data includes: carrying out convolution operation on input data and a target convolution kernel to obtain a convolution result matrix; based on the convolution step length of a target convolution kernel, splitting the position of each convolution result in a convolution result matrix into a convolution result unit and an expansion unit group, so that the matrix size of the split convolution result matrix is the same as the matrix size of input data; wherein the convolution result unit stores a convolution result; taking each convolution result unit as a target unit one by one, and executing the following operations on each target unit to obtain target data: reading insertion data corresponding to the target unit from the input data based on the convolution step of the target convolution kernel; and adding the insertion data to the capacity expansion unit group corresponding to the target unit.
With reference to the third possible implementation manner of the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where the target convolution kernel is an N × N convolution kernel; wherein N is an odd number greater than or equal to 3; the step of reading insertion data corresponding to the target cell from the input data based on the convolution step size of the target convolution kernel includes: determining a spanning data area corresponding to a target unit in input data based on the convolution step of a target convolution kernel; the data in the span data area is taken as insertion data corresponding to the target unit.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where the step of acquiring the acquired data of the target object includes: acquiring image data of a target object through a camera; wherein the target object is one of the following objects: a human body, a human body part, a vehicle or a license plate.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation manner of the first aspect, where the convolutional neural network is used to perform a detection or identification process on a target object.
In a second aspect, an embodiment of the present invention further provides an apparatus for processing data by using a convolutional neural network, where the apparatus includes: the acquisition module is used for acquiring the acquisition data of the target object; the processing module is used for inputting the acquired data into a pre-trained convolutional neural network for convolution processing; the convolutional neural network comprises at least one target convolutional layer, and the convolution step length corresponding to a target convolution kernel of the target convolutional layer is larger than 1; the operation module is used for performing convolution operation and capacity expansion processing on the input data and the target convolution kernel when the target convolution layer receives the input data and outputting the target data; the target data comprises convolution result units and capacity expansion unit groups inserted into adjacent positions of the convolution result units, so that the matrix size of the target data is the same as that of the input data; the data in the capacity expansion unit group is the same as the input data corresponding to the convolution result unit adjacent to the capacity expansion unit group; and the output module is used for inputting the target data to the next convolution layer of the target convolution layer and continuing subsequent operation until the convolution neural network outputs a final result.
In a third aspect, an embodiment of the present invention further provides an electronic system, where the electronic system includes: the device comprises an image acquisition device, a processing device and a storage device; the image acquisition equipment is used for acquiring an image to be detected; the storage means has stored thereon a computer program which, when run by a processing device, performs the above-described method of processing data using a convolutional neural network.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processing device to perform the above-mentioned steps of the method for processing data by applying a convolutional neural network.
The embodiment of the invention has the following beneficial effects:
The method, the device and the electronic system for processing data by using a convolutional neural network provided by the embodiments of the invention can, after acquiring the collected data of a target object, input the data into a pre-trained convolutional neural network for convolution processing, wherein the convolutional neural network comprises at least one target convolutional layer and the convolution step size corresponding to the target convolution kernel of the target convolutional layer is larger than 1; when the target convolutional layer receives input data, convolution operation and capacity expansion processing are performed on the input data with the target convolution kernel, and target data are output; the target data are input to the next convolutional layer after the target convolutional layer, and subsequent operations continue until the convolutional neural network outputs a final result. In this data processing mode, after the convolution operation and capacity expansion processing are performed on the input data by the target convolution kernel whose convolution step size is larger than 1, the matrix size of the target data is the same as the matrix size of the input data, and the data added by the capacity expansion processing do not require convolution calculation; therefore, the time complexity is reduced while the space complexity remains unchanged, thereby improving the calculation rate of the network.
Additional features and advantages of the disclosure will be set forth in the description which follows, or in part may be learned by the practice of the above-described techniques of the disclosure, or may be learned by practice of the disclosure.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a method for processing data by using a convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a convolution result according to an embodiment of the present invention;
FIG. 3 is a flow chart of another method for processing data using a convolutional neural network according to an embodiment of the present invention;
FIG. 4 is a flow chart of another method for processing data using a convolutional neural network according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating an alternative convolution structure according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a structure of another convolution result according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a resnet network according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an apparatus for processing data by using a convolutional neural network according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic system according to an embodiment of the invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A Convolutional Neural Network (CNN) is a class of feedforward neural networks that contain convolution computations and have a deep structure, and is one of the representative algorithms of deep learning. Because a convolutional neural network has the capability of feature learning and can perform translation-invariant classification of input information according to its hierarchical structure, convolutional neural networks are widely applied to target detection and recognition processing. To measure the performance of a trained convolutional neural network, time complexity and space complexity are generally used as indexes for evaluating its performance.
The time complexity determines the training/prediction time of the model; if the complexity is too high, a great deal of time is consumed for model training and prediction, so that ideas cannot be verified quickly, the model cannot be improved quickly, and predictions cannot be made quickly. The space complexity determines the number of parameters of the model; owing to the limitation of dimensionality, the more parameters the model has, the larger the amount of data needed to train it, while data sets in real life are usually not that large, which makes the model more prone to over-fitting during training. Therefore, the above time complexity and space complexity are often used as criteria for evaluating a neural network.
In actual use, in order to improve the accuracy of the output result of a convolutional neural network, a common method is to increase the parameter amount, but the increase of the parameter amount inevitably increases the time complexity, so the parameter amount cannot be increased without increasing the time complexity. For ease of understanding, the time complexity and the space complexity are described by taking as an example input data of c × h × w (1 × 6 × 6) and a convolution kernel of k × k (3 × 3), with padding of 1 and a step size of 1 and 2 respectively, where w represents the width of the input data, h represents the height of the input data, and c represents the number of channels. When the step size is 1, because the matrix of the output data has the same size as the matrix of the input data, the time complexity is c × h′ × w′ × k × k = c × h × w × k × k = 324, and the space complexity, that is, the parameter quantity, is k × k × c = 9, where h′ is the height of the output data and w′ is the width of the output data. When the step size is 2, the matrix of the output data is half the size of the matrix of the input data in each dimension; since the time complexity is proportional to the size of the output data matrix, if the output data matrix at a step size of 2 is required to reach the same size as the output data matrix at a step size of 1, the time complexity at a step size of 2 becomes greater than the time complexity at a step size of 1.
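For reference, the figures quoted above can be checked with a short calculation; this is a sketch with our own variable names, not text from the patent:

    # Quick check of the complexity example above (illustrative only).
    c, h, w = 1, 6, 6          # channels, height and width of the input data
    k = 3                      # k x k convolution kernel

    # Step size 1, padding 1: the output matrix has the same size as the input,
    # so the time complexity is c * h' * w' * k * k = c * h * w * k * k.
    time_complexity = c * h * w * k * k    # 324
    space_complexity = k * k * c           # 9 (the parameter quantity)

    print(time_complexity, space_complexity)   # 324 9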
In summary, increasing the parameter amount also increases the time complexity. In order to reduce the time complexity and increase the network computation rate while keeping the space complexity unchanged, embodiments of the present invention provide a method, an apparatus and an electronic system for processing data by using a convolutional neural network, which can alleviate the above technical problems and are described below by way of embodiments.
The present embodiment provides a method for processing data by using a convolutional neural network, and referring to a flowchart of a method for processing data by using a convolutional neural network shown in fig. 1, the method specifically includes the following steps:
step S102, acquiring acquisition data of a target object;
the collected data is image data of a target object obtained by shooting through a camera, wherein the target object is an object to be identified or detected and can be one of the following objects: a human body, a human body part, a vehicle or a license plate, but not limited to the above.
Step S104, inputting the collected data into a pre-trained convolutional neural network for convolution processing; the convolutional neural network comprises at least one target convolutional layer, and the convolution step length corresponding to a target convolution kernel of the target convolutional layer is larger than 1;
the pre-trained convolutional neural network is used for detecting or identifying a target object, generally, attributes such as brightness and contrast of an image have a very large influence on the image, and the same object has a very large difference between different brightness and contrast, however, in many problems of image identification and detection, these factors should not influence the final identification and detection result, and by preprocessing the image, the convolutional neural network can be prevented from being influenced by irrelevant factors as much as possible.
The pre-trained convolutional neural network is obtained by training a convolutional neural network by using a training data sample of a target object, a convolutional layer where a convolutional kernel with a convolutional step length larger than 1 is located in the convolutional neural network is set as a target convolutional layer, and the convolutional kernel of the target convolutional layer is called a target convolutional kernel; in practical use, a plurality of stacked convolutional layers can be arranged in the convolutional neural network in order to sufficiently extract the data features, and convolution operation is carried out on the data by utilizing the convolution kernel of the convolutional layer to realize the extraction of the data features.
Step S106, when the target convolution layer receives the input data, carrying out convolution operation and capacity expansion processing on the input data and the target convolution kernel, and outputting the target data; the target data comprises convolution result units and capacity expansion unit groups inserted into adjacent positions of the convolution result units, so that the matrix size of the target data is the same as that of the input data; the data in the capacity expansion unit group is the same as the input data corresponding to the convolution result unit adjacent to the capacity expansion unit group;
the input data is data input into the target convolutional layer, and if the target convolutional layer is on the first layer of the convolutional neural network, the input data is the preprocessed acquired data; if the target convolutional layer is not in the first layer of the convolutional neural network, the input data is the output result of the layer above the target convolutional layer, and the layer above the target convolutional layer may be a convolutional layer, a pooling layer, or the like.
After the target convolutional layer receives the input data, convolution operation is performed on the input data with the target convolution kernel to obtain convolution results, and the target convolutional layer then performs capacity expansion processing on the convolution results to obtain the target data processed by the target convolutional layer. The capacity expansion processing expands the matrix of the convolution result data to the same size as the matrix of the input data: because the convolution step size corresponding to the target convolution kernel is larger than 1, the dimension of the data matrix after the convolution operation is lower than the dimension of the matrix of the input data, and the capacity expansion processing directly inserts capacity expansion data at positions adjacent to the convolution results, the capacity expansion data being input data on which the convolution kernel has not performed the convolution operation. Therefore, the target data comprises convolution result units and capacity expansion unit groups inserted at positions adjacent to the convolution result units, where the convolution result units store convolution results and the capacity expansion unit groups store capacity expansion data.
In the process of identifying or detecting the target object by using the convolutional neural network containing the target convolutional layer, the time complexity can be reduced and the calculation rate of the network can be improved. For ease of understanding, fig. 2 shows a schematic structural diagram of a convolution result. As shown in fig. 2, the matrix on the left side of the arrow represents the matrix of the input data input to the target convolutional layer, and the matrix on the right side of the arrow represents the matrix of the target data after convolution and capacity expansion by the target convolutional layer; the matrix of the target data and the matrix of the input data are both 8 × 8 matrices. The diagonal-line units in the matrix of the target data represent convolution result units, and the blank units represent capacity expansion units; each blank unit can be regarded as one capacity expansion unit, and the set of capacity expansion units located in the same area forms a capacity expansion unit group. The target data in this example is the result of processing the input data with a 3 × 3 target convolution kernel with a step size of 2 and padding of 1.
As can be seen from fig. 2, only 1/4 of the units in the target data are results calculated by the target convolutional layer, and the other 3/4 of the units are input data from the left matrix inserted directly into the target data. Taking the convolution of X(1,1) to obtain Y(1,1) as an example, the position of Y(1,1) is a convolution result unit, and the adjacent positions of this convolution result unit form its capacity expansion unit group; the data filled into each capacity expansion unit in the group are the input data corresponding to Y(1,1), i.e. the data related to X(1,1), specifically the data at positions adjacent to X(1,1). As shown in fig. 2, the data in the capacity expansion unit group corresponding to Y(1,1) are X(1,2), X(2,1) and X(2,2), respectively.
As can be seen from fig. 2, taking a convolution kernel step size of 2 as an example, the time complexity of the target convolutional layer becomes c × h′ × w′ × k × k = c × h × w × k × k / (s × s) = 1/4 × c × h × w × k × k, where h′ = h/s and w′ = w/s are the numbers of rows and columns at which the convolution is actually computed; that is, the time complexity is the same as that of an ordinary convolution with a step size of 2, so the convolutional neural network increases the expression capability of the model while maintaining the operation speed.
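As a rough illustration of this 1/4 factor, the multiply-accumulate counts for the 8 × 8 example of fig. 2 can be compared as follows (a Python sketch; the variable names are ours, not the patent's):

    # Rough operation count for the 8 x 8 example of fig. 2 (illustrative only).
    c, h, w, k, s = 1, 8, 8, 3, 2

    # Ordinary convolution with step size 1 producing an 8 x 8 output:
    macs_stride1 = c * h * w * k * k               # 576 multiply-accumulates

    # Target convolutional layer: the convolution is computed only at the
    # (h/s) x (w/s) convolution result units; the expansion cells are copies
    # of the input data and need no convolution calculation.
    macs_target = c * (h // s) * (w // s) * k * k  # 144

    print(macs_target / macs_stride1)              # 0.25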
And step S108, inputting the target data to the next convolution layer of the target convolution layer, and continuing subsequent operation until the convolution neural network outputs a final result.
If the next convolution layer is also the target convolution layer, the above process is also executed, which is not described herein again, and if the next convolution layer is not the target convolution layer, only convolution operation is executed without capacity expansion processing, and the whole convolution neural network is utilized to realize processing of acquired data, so as to realize identification or detection of the target object.
The method for processing data by using the convolutional neural network can input the data into the pre-trained convolutional neural network for convolutional processing after acquiring the acquired data of a target object, wherein the convolutional neural network comprises at least one target convolutional layer, the convolution step length corresponding to a target convolution kernel of the target convolutional layer is larger than 1, and when the target convolutional layer receives the input data, the input data and the target convolution kernel are subjected to convolution operation and capacity expansion processing to output the target data; and inputting the target data to the next convolution layer of the target convolution layer, and continuing subsequent operation until the convolution neural network outputs a final result. According to the data processing mode, after convolution operation and capacity expansion processing are carried out on input data by using a target convolution kernel with the convolution step length larger than 1, the size of a matrix of the target data is the same as that of the matrix of the input data, wherein the capacity expansion processed data does not need convolution calculation, so that time complexity is reduced under the condition that space complexity is not changed, and the calculation rate of a network is improved.
The embodiment provides another method for processing data by applying a convolutional neural network, which is implemented on the basis of the above embodiments; this embodiment mainly describes a specific implementation manner of performing convolution operation and expansion processing on input data and a target convolution kernel to output target data, that is, a manner of performing convolution while expanding capacity. As shown in fig. 3, a flow chart of another method for processing data by using a convolutional neural network, the method for processing data in this embodiment includes the following steps:
step S302, acquiring the collected data of a target object;
step S304, inputting the collected data into a pre-trained convolutional neural network for convolution processing; the convolutional neural network comprises at least one target convolutional layer, and the convolution step length corresponding to a target convolution kernel of the target convolutional layer is larger than 1;
step S306, taking each data in the input data as convolution center data one by one, taking the convolution center data as a convolution center for each convolution center data, and performing convolution operation on the convolution center data and a target convolution kernel to obtain a current convolution result;
the convolution center data is used as the convolution center, which is the input data corresponding to the center position of the convolution kernel when the input data is convolved by the convolution kernel, for example, the convolution kernel is 3 × 3, the convolution step is 2, and when the input data in fig. 2 is convolved for the first time, the middle position of the convolution kernel is just opposite to the X of the input data1,1When the convolution is performed for the second time, the median position of the convolution kernel is just opposite to the X of the input data1,3By analogy, the description is omitted here; the convolution operation in the embodiment of the present invention is similar to the conventional convolution operation, and is not described in detail here.
Step S308, reading insertion data corresponding to convolution center data from input data based on the convolution step length of the target convolution kernel;
the insertion data is determined based on convolution step corresponding to the target convolution kernel, taking the target convolution kernel as a convolution kernel of N × N as an example, where N is an odd number greater than or equal to 3; a step of reading insertion data corresponding to convolution center data from input data based on a convolution step of a target convolution kernel, including: determining a spanning data area corresponding to convolution center data in input data based on the convolution step of a target convolution kernel; the data in the span data area is taken as insertion data corresponding to the convolution center data.
As can be seen from the above, since N of the target convolution kernel is greater than or equal to 3, and the convolution step size of the target convolution kernel is greater than 1, spanning data that does not participate in the convolution operation exists during the convolution of the input data, and the positions corresponding to these spanning data are referred to as spanning data areas.
If the convolution step size of the target convolution kernel is 2, the area where the N data adjacent to the convolution center data in the input data are located is taken as the spanning data area, and the data in the spanning data area are taken as the insertion data corresponding to the convolution center data. For ease of understanding, take the 3 × 3 target convolution kernel with a convolution step size of 2 in fig. 2 as an example: padding is set to 1 for the 3 × 3 convolution kernel, so one circle of 0 is added around the matrix of the input data on the left side of fig. 2, and as shown in fig. 2 the zero-padded positions are the dashed-line cells. When the 3 × 3 convolution kernel performs the first convolution on the input data, the convolution operation is performed with X(1,1) of the input data matrix as the convolution center, and Y(1,1) in the right matrix is obtained as the convolution result; since the data at the three positions X(1,2), X(2,1) and X(2,2) in the input data matrix are not used as convolution centers in the first convolution, X(1,2), X(2,1) and X(2,2) serve as the spanning data area corresponding to the convolution center data X(1,1).
Because the convolution step size corresponding to the target convolution kernel is 2, when the 3 × 3 convolution kernel performs the second convolution on the input data, the convolution operation is performed with X(1,3) of the input data matrix as the convolution center to obtain the convolution result Y(1,3) in the right matrix, and the data at the three positions X(1,4), X(2,3) and X(2,4) serve as the spanning data area corresponding to the convolution center data X(1,3).
Similarly, the insertion data adjacent to the other convolution results are determined by the same process, which is not repeated here, so that the size of the matrix of the target data is the same as the size of the matrix of the input data. For the process of determining insertion data for convolution kernels of other sizes, and for determining insertion data when no zero-padding operation is performed on the input data, reference is made to the above processes, and details are not repeated here.
The above description is made with respect to the convolution step of the target convolution kernel being 2, and if the convolution step of the target convolution kernel is 3, N data adjacent to the convolution center data and N +2 data adjacent to the N data in the input data are data spanning the data area and are taken as insertion data;
taking an example in which the matrix of the input data is 9 × 9 and padding is 1, when the convolution kernel of 3 × 3 convolves the input data, a convolution result matrix of 3 × 3 can be generated; in this embodiment, when the convolution kernel of 3X3 is to input the data X1,1When the first convolution is carried out as the convolution center, Y in the target data matrix is obtained1,1Y is the same as1,1For the convolution result, since the convolution step corresponding to the target convolution kernel is 3, when the convolution kernel of 3 × 3 performs the second convolution on the input data, it is essential that X in the matrix of the input data is used1,4Performing convolution operation as convolution center to obtain Y in target data matrix1,4Convolution result, due to X in the matrix of data not to be input during convolution as described above1,2、X1,3、X2,1、X2,2、X2,3、X3,1、X3,2、X3,3These 8 pieces of input data serve as convolution centers, and therefore, the convolution center data X in the input data can be compared with the input data1,1Three adjacent data X1,2、X2,1、X2,2As insertion data, and 5 data X adjacent to the above three data1,3、X2,3、X3,1、X3,2、X3,3Also as X1,1The insertion data of (2).
Similarly, the input data can be compared with the current data X1,4Adjacent 8 data X1,5、X1,6、X2,4、X2,5、X2,6、X3, 4X3,5、X3,6As insertion data. And corresponding insertion data of other convolution center dataThe process of (a) is as described above, and is not described herein again.
If the convolution step size of the target convolution kernel is 4, the N data adjacent to the convolution center data, the N + 2 data adjacent to those N data, and the N + 4 data adjacent to those N + 2 data in the input data are taken as the data in the spanning data area and used as the insertion data.
Continuing with the example of fig. 2 in which the matrix of the input data is 8 × 8 and padding is 1: when the convolution kernel is 3 × 3 and the convolution step size is 4, a 2 × 2 convolution result matrix is generated when the input data is convolved. In this embodiment, when the 3 × 3 convolution kernel performs the first convolution with X(1,1) of the input data as the convolution center, Y(1,1) in the target data matrix is obtained as the convolution result; since the convolution step size corresponding to the target convolution kernel is 4, the second convolution of the 3 × 3 convolution kernel on the input data essentially takes X(1,5) in the input data matrix as the convolution center and yields the convolution result Y(1,5) in the target data matrix. Because the 15 pieces of input data X(1,2), X(1,3), X(1,4), X(2,1), X(2,2), X(2,3), X(2,4), X(3,1), X(3,2), X(3,3), X(3,4), X(4,1), X(4,2), X(4,3) and X(4,4) are not used as convolution centers during the convolution, the three data X(1,2), X(2,1) and X(2,2) adjacent to X(1,1) are taken as insertion data, the 5 data X(1,3), X(2,3), X(3,1), X(3,2) and X(3,3) adjacent to those three data are also taken as insertion data of X(1,1), and the 7 data X(1,4), X(2,4), X(3,4), X(4,1), X(4,2), X(4,3) and X(4,4) adjacent to those 5 data are likewise taken as insertion data of X(1,1).
Similarly, the 15 data X(1,6), X(1,7), X(1,8), X(2,5), X(2,6), X(2,7), X(2,8), X(3,5), X(3,6), X(3,7), X(3,8), X(4,5), X(4,6), X(4,7) and X(4,8) adjacent to the convolution center data X(1,5) in the input data can be taken as its insertion data. The other insertion data corresponding to each convolution center datum are determined as in the above process and are not repeated here; by inserting the insertion data at positions adjacent to the convolution results, the size of the matrix of the target data is made the same as the size of the matrix of the input data.
In summary, when the width of the input data matrix is divisible by the step size s, after the convolution results are obtained with the target convolution kernel, s-1 pieces of insertion data are inserted between two adjacent convolution results so that the size of the matrix of the target data is the same as the size of the matrix of the input data. For example, when the matrix size of the input data is 8 × 8, one piece of insertion data is inserted between convolution results at a step size of 2, and three pieces of insertion data are inserted between convolution results at a step size of 4, so that the matrix of the target data after capacity expansion processing is 8 × 8; when the matrix size of the input data is 9 × 9, two pieces of insertion data are inserted between convolution results at a step size of 3, so that the matrix of the target data after capacity expansion processing is 9 × 9; when the matrix size of the input data is 10 × 10, one piece of insertion data is inserted between convolution results at a step size of 2, and four pieces of insertion data are inserted between convolution results at a step size of 5, so that the matrix of the target data after capacity expansion processing is 10 × 10. Therefore, the number of inserted data is related to the step size.
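The relation between the step size and the amount of insertion data can also be sketched as a small helper function (hypothetical code, assuming 0-based indexing and an input width divisible by the step size):

    def spanning_data_area(center, stride, height, width):
        """Input positions skipped around one convolution center (a sketch of
        the spanning data area described above); its size is stride*stride - 1
        for a center whose s x s block lies inside the input."""
        i, j = center
        return [(i + di, j + dj)
                for di in range(stride) for dj in range(stride)
                if (di, dj) != (0, 0) and i + di < height and j + dj < width]

    print(len(spanning_data_area((0, 0), 2, 8, 8)))   # 3  insertion data (step 2)
    print(len(spanning_data_area((0, 0), 3, 9, 9)))   # 8  insertion data (step 3)
    print(len(spanning_data_area((0, 0), 4, 8, 8)))   # 15 insertion data (step 4)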
Step S310, inserting the insertion data into a position corresponding to the current convolution result according to the position relation of the insertion data and the convolution center data; the position of the current convolution result is a convolution result unit of the target data, the position of the inserted data is a capacity expansion unit of the target data, and the capacity expansion unit corresponding to each current convolution result is a capacity expansion unit group;
because the matrix size of the expanded target data is the same as the matrix size of the input data, the unit position coordinates in the two matrices are in one-to-one correspondence, and therefore, data at the same position as the position coordinate can be read in the matrix of the input data according to the position coordinate of the inserted data in the convolution result matrix after expansion, and the data is the inserted data; while the data at the position corresponding to the convolution result unit in the matrix of input data need not be read.
In this embodiment, the steps S306 to S310 are repeatedly executed until each data in the input data is used as the current data to perform convolution operation and expansion processing with the target convolution kernel, so as to obtain the target data after convolution operation and expansion processing of all the input data.
Step S312, inputting the target data to the next convolution layer of the target convolution layer, and continuing the subsequent operation until the convolution neural network outputs the final result.
The embodiment of the invention provides another method for processing data by using a convolutional neural network, which describes the specific processes of convolutional operation and capacity expansion processing in detail, and can input the acquired data into a pre-trained convolutional neural network for convolutional processing after acquiring the acquired data of a target object, take each piece of data in the input data as convolutional central data one by one, and perform convolutional operation on the convolutional central data and a target convolutional kernel to obtain a current convolutional result; reading insertion data corresponding to convolution center data from input data based on the convolution step of the target convolution kernel; inserting the insertion data into a position corresponding to the current convolution result according to the position relation of the insertion data and the convolution center data; and inputting the target data to the next convolution layer of the target convolution layer, and continuing subsequent operation until the convolution neural network outputs a final result. According to the method, the read insertion data can be inserted into the position corresponding to the current convolution result according to the position relation between the convolution step length and the convolution center data, so that the size of the inserted target data matrix is consistent with that of the input data matrix, the insertion data can be directly read in the input data matrix without convolution calculation, and the time complexity is reduced under the condition that the space complexity is not changed.
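A minimal NumPy sketch of this convolve-while-expanding procedure is given below, assuming a single-channel input whose size is divisible by the stride, an N × N kernel and zero padding; it reflects our reading of the steps above and is not code from the patent:

    import numpy as np

    def conv_while_expanding(x, kernel, stride):
        """Sketch of the method of fig. 3: for every convolution center, the
        convolution result is written to the convolution result unit and the
        spanning data are inserted into the adjacent capacity expansion units."""
        h, w = x.shape
        k = kernel.shape[0]
        pad = k // 2
        xp = np.pad(x, pad)                       # zero padding (padding = k//2)
        y = np.zeros_like(x, dtype=float)
        for i in range(0, h, stride):             # convolution centers only
            for j in range(0, w, stride):
                # convolution result unit
                y[i, j] = np.sum(xp[i:i + k, j:j + k] * kernel)
                # capacity expansion unit group: insert the spanning data
                for di in range(stride):
                    for dj in range(stride):
                        if (di, dj) != (0, 0) and i + di < h and j + dj < w:
                            y[i + di, j + dj] = x[i + di, j + dj]
        return y

    x = np.arange(64, dtype=float).reshape(8, 8)
    y = conv_while_expanding(x, np.ones((3, 3)), stride=2)
    print(y.shape)                                # (8, 8): same size as the input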
The embodiment provides another method for processing data by applying a convolutional neural network, which is implemented on the basis of the above embodiments; this embodiment mainly describes a specific implementation manner of performing convolution operation and capacity expansion processing on input data and a target convolution kernel to output target data, that is, a manner of performing capacity expansion after integral convolution. As shown in fig. 4, a flowchart of another method for processing data by using a convolutional neural network, the method for processing data in this embodiment includes the following steps:
step S402, acquiring the collected data of the target object;
s404, inputting the collected data into a pre-trained convolutional neural network for convolution processing; the convolutional neural network comprises at least one target convolutional layer, and the convolution step length corresponding to a target convolution kernel of the target convolutional layer is larger than 1;
step S406, performing convolution operation on the input data and the target convolution kernel to obtain a convolution result matrix;
the convolution operation process can be implemented by referring to the prior art, and is not described in detail here.
Step S408, based on the convolution step of the target convolution kernel, splitting the position of each convolution result in the convolution result matrix into a convolution result unit and an expansion unit group, so that the matrix size of the split convolution result matrix is the same as the matrix size of the input data; wherein the convolution result unit stores a convolution result;
since the step size corresponding to the target convolution kernel is not less than 1, even if the zero padding expansion operation is performed on the input data, the size of the obtained convolution result matrix is smaller than the size of the matrix of the input data, in this embodiment, in order to implement the capacity expansion of the convolution result matrix to the same size as the matrix of the input data, the convolution result matrix may be split according to the convolution step size corresponding to the target convolution kernel, so as to implement the size of the convolution result matrix to be the same as the size of the matrix of the input data, where the split capacity expansion unit group includes a plurality of capacity expansion units, and each capacity expansion unit stores one piece of capacity expansion data. For example, the convolution step size of the target convolution kernel is 2, and the position of each convolution result in the convolution result matrix is split into a first unit structure of (N-1) × (N-1), where the first unit structure includes: a convolution result unit and N expansion units adjacent to the convolution result unit;
for ease of understanding, the input data is illustrated as a matrix of 8 x 8, the target convolution kernel is illustrated as 3x3, when the convolution operation is performed on the input data using the convolution kernel of 3x3, since the convolution step size is 2, so that the obtained convolution result matrix is a 4 x 4 matrix, the position of each convolution result in the convolution result matrix is divided into a first unit structure of 2 x 2, for easy understanding, referring to the matrix on the right side of the arrow in fig. 2, which is the matrix of the convolution result after splitting, as shown in fig. 2, there are 16 first unit structures, and, in each first unit structure, the oblique line unit is at the position of the convolution result, the blank unit is at the position of the capacity expansion unit, in each first unit structure, the convolution result unit is located at the first row and the first column of the first unit structure, and the remaining three positions are the positions of the three capacity expansion units.
For example, the convolution step of the target convolution kernel is 3, and the position of each convolution result in the convolution result matrix is split into an N × N second unit structure, where the second unit structure includes: a convolution result unit, N expansion units adjacent to the convolution result unit, and N +2 expansion units adjacent to the N expansion units.
Similarly, if a 4 × 4 convolution result matrix is obtained through the target convolution kernel convolution, the position of each convolution result in the convolution result matrix is split into a second unit structure of 3 × 3. For ease of understanding, see the schematic structural diagram of another convolution result shown in fig. 5: each dotted frame is a second unit structure, so there are 16 second unit structures in total; in each second unit structure, the diagonal-line unit is the position of the convolution result and the blank units are the positions of the capacity expansion units, the convolution result unit is located at the first row and first column of the second unit structure, and the remaining 8 positions are the positions of 8 capacity expansion units. The matrix of the target data after capacity expansion is a 12 × 12 matrix.
For example, the convolution step size of the target convolution kernel is 4, the position of each convolution result in the convolution result matrix is split into a third unit structure of (N +1) × (N +1), and the third unit structure includes: the convolution result unit, the N expansion units adjacent to the convolution result unit, the N +2 expansion units adjacent to the N expansion units, and the N +4 expansion units adjacent to the N +2 expansion units.
If a 2 × 2 convolution result matrix is obtained through the target convolution kernel convolution, the position of each convolution result in the convolution result matrix is split into a third unit structure of 4 × 4. For ease of understanding, see the schematic structural diagram of another convolution result shown in fig. 6: each dotted frame is a third unit structure, so there are 4 third unit structures in total; in each third unit structure, the diagonal-line unit is the position of the convolution result and the blank units are the positions of the capacity expansion units, the convolution result unit is located at the first row and first column of the third unit structure, and the remaining 15 positions are the positions of 15 capacity expansion units. The matrix of the target data after capacity expansion is an 8 × 8 matrix.
And by analogy, when the convolution step length of the target convolution kernel is s (s >1), splitting the position of each convolution result in the convolution result matrix into an s × s unit structure, wherein in the unit structure, each convolution result unit is located at the first row and the first column of the unit structure, and the remaining s × (s-1) positions are the positions of the capacity expansion units.
Step S410, taking each convolution result unit as a target unit one by one, and for each target unit, reading insertion data corresponding to the target unit from input data based on the convolution step length of a target convolution kernel;
the target convolution kernel is an NxN convolution kernel; wherein N is an odd number greater than or equal to 3; based on the convolution step size of the target convolution kernel, the step of reading the insertion data corresponding to the target cell from the input data may be performed by steps B1-B2:
step B1, determining a crossing data area corresponding to the target unit in the input data based on the convolution step size of the target convolution kernel;
based on the above-mentioned split convolution result matrix, since the matrix size of the convolution result matrix is consistent with that of the input data, the process of determining the data crossing region corresponding to the target unit is the same as that of determining the data crossing region corresponding to the convolution center data in the input data based on the convolution step of the target convolution kernel, and therefore, the process of determining the data crossing region is not repeated.
Step B2, the data in the spanning data area are taken as the insertion data corresponding to the target unit. Taking the split convolution result matrix obtained with a convolution step size of 2 as an example, because the matrix of the input data and the split convolution result matrix are both 8 × 8 matrices, the unit position coordinates in the two matrices correspond one to one; therefore, according to the position coordinate of a capacity expansion unit in the split convolution result matrix, the data at the same position coordinate can be read from the matrix of the input data, and this data is the insertion data, while the data at the positions corresponding to the convolution result units in the matrix of the input data need not be read.
Step S412, adding the insertion data to the capacity expansion unit group corresponding to the target unit;
and adding the coordinate position of the insertion data in the matrix of the input data to the same capacity expansion unit as the coordinate position to complete capacity expansion processing.
And step S414, inputting the target data to the next convolution layer of the target convolution layer, and continuing the subsequent operation until the convolution neural network outputs the final result.
The embodiment of the invention provides another method for processing data by using a convolutional neural network, which describes the specific processes of convolutional operation and capacity expansion processing in detail, and can input the acquired data into a pre-trained convolutional neural network for convolutional processing after acquiring the acquired data of a target object, perform convolutional operation on the input data and a target convolutional kernel to obtain a convolutional result matrix, split the position of each convolutional result in the convolutional result matrix into a convolutional result unit and a capacity expansion unit group based on the convolutional step length of the target convolutional kernel, use each convolutional result unit as a target unit one by one, read insertion data corresponding to the target unit from the input data based on the convolutional step length of the target convolutional kernel, add the insertion data to the capacity expansion unit group corresponding to the target unit, input the target data to the next convolutional layer of the target convolutional layer, and continuing subsequent operation until the convolution neural network outputs a final result. The method can lead the matrix scale of the split convolution result matrix to be consistent with that of the input data through splitting the convolution result matrix, and the data in the split capacity expansion unit group can be directly read in the matrix of the input data without convolution calculation, so that the time complexity is reduced under the condition of ensuring the space complexity to be unchanged, which means that under the condition of ensuring the same time complexity, the convolutional neural network can increase the expression capacity of the neural network while ensuring the operation speed to be unchanged.
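A corresponding NumPy sketch of this expand-after-whole-convolution variant is given below, under the same assumptions as the earlier sketch; both variants produce the same target data and differ only in the order of operations:

    import numpy as np

    def expand_after_convolution(x, kernel, stride):
        """Sketch of the method of fig. 4: an ordinary strided convolution is
        performed first, and the convolution result matrix is then expanded so
        that its size matches the input; the capacity expansion units are
        filled with the input data at the same coordinates."""
        h, w = x.shape
        k = kernel.shape[0]
        pad = k // 2
        xp = np.pad(x, pad)

        # Step 1: ordinary convolution with step size `stride`.
        conv = np.array([[np.sum(xp[i:i + k, j:j + k] * kernel)
                          for j in range(0, w, stride)]
                         for i in range(0, h, stride)])

        # Step 2: split each result position into an s x s unit structure;
        # convolution result units go to the top-left cell of each structure,
        # the remaining capacity expansion units keep the input data.
        y = x.astype(float).copy()
        y[::stride, ::stride] = conv
        return y

    x = np.arange(64, dtype=float).reshape(8, 8)
    print(expand_after_convolution(x, np.ones((3, 3)), stride=2).shape)   # (8, 8)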
In summary, the data in the matrix after convolution and capacity expansion can be represented by the following formula:

Y(i, j) = Σ(m = -N/2 .. N/2) Σ(n = -N/2 .. N/2) h(m, n) × X(i + m, j + n),  when i % s = 0 and j % s = 0
Y(i, j) = X(i, j),  otherwise

where, consistent with the zero padding described above, values of X outside the matrix of the input data are taken as 0.
In the formula, h[N, N] represents an N × N convolution kernel, N is an odd number greater than or equal to 3, and N/2 denotes rounding (integer) division, for example: if N is 3, then N/2 is 1; if N is 5, then N/2 is 2; if N is 7, then N/2 is 3;
X(i, j) represents the data in the i-th row and j-th column of the matrix of the input data, with i = 0, 1, 2, ..., M-1 and j = 0, 1, 2, ..., M-1, where M represents the dimension of the matrix of the input data; Y(i, j) represents the data in the i-th row and j-th column of the convolution result matrix (i.e., the matrix of the target data); and s represents the convolution step size.
The above% refers to Mod, i.e. the remainder of the remainder operation, i% s ═ 0 means that the remainder of i divided by s is 0, in other words, i can be divided by s exactly. In this embodiment, the matrix size corresponding to the matrix of the input data and the matrix of the convolution result (i.e., the target data) is the same, and as can be seen from the above formula, the result of Y (i, j) is determined by the values of i and j, that is, if i% s is 0, andj% s is 0, then
Figure BDA0002512124500000192
In other cases, for example, if i% s ≠ 0 or j% s ≠ 0, then Y (i, j) ═ X (i, j).
Of course, the above formula is written with the matrix indices i and j starting from 0; in a specific application, i and j may also start from 1, in which case the formula can be transformed accordingly, which is not described here again.
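For reference, the piecewise formula can also be evaluated element by element, exactly as written (0-based indices). The Python sketch below is an illustration for this description; treating values outside the input matrix as zero is an assumption, since the formula itself does not fix the border behaviour.

```python
import numpy as np

def target_layer_output(x, h, s):
    """Evaluate Y(i, j) from the formula above for an M x M input x,
    an N x N kernel h (N odd, N >= 3) and convolution step s.
    Values outside the input are treated as zero (an assumption)."""
    M, N = x.shape[0], h.shape[0]
    r = N // 2                                   # N/2 with rounding down

    def X(i, j):
        return x[i, j] if 0 <= i < M and 0 <= j < M else 0.0

    Y = np.empty((M, M))
    for i in range(M):
        for j in range(M):
            if i % s == 0 and j % s == 0:
                # convolution result unit: weighted sum over the N x N neighbourhood
                Y[i, j] = sum(h[m + r, n + r] * X(i + m, j + n)
                              for m in range(-r, r + 1)
                              for n in range(-r, r + 1))
            else:
                # capacity expansion unit: copy the input data unchanged
                Y[i, j] = X(i, j)
    return Y
```

This element-wise form corresponds to the earlier embodiment that takes each piece of input data as convolution center data in turn; the matrix-level sketch given earlier produces the same result.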
In practical applications, a resnet network containing three convolutional layers is taken as an example for explanation. Fig. 7 shows a schematic structural diagram of this resnet network, which includes two convolutional layers with 1 × 1 convolution kernels and one convolutional layer with a 3 × 3 convolution kernel. In this embodiment, the 3 × 3 convolutional layer in the resnet network is replaced with the above target convolutional layer, whose target convolution kernel is also 3 × 3, and the target recognition task on the 1000 classes of ImageNet is performed with each of the two convolutional layers respectively. Table 1 shows the experimental results for the two different convolutional layers. As can be seen from Table 1, with a comparable amount of computation, the error rate of the target convolutional layer is significantly lower than that of the resnet convolutional layer, although the parameter amount is increased.
TABLE 1
Convolutional layer                          Computation    Parameter amount    Error rate
resnet network convolutional layer 3 × 3     556M           5.3M                30.2%
Target convolutional layer 3 × 3             561M           10M                 27.1%
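As a rough illustration of this comparison, the bottleneck structure of Fig. 7 (1 × 1 convolution, 3 × 3 convolution, 1 × 1 convolution) could be written as follows, with the ordinary 3 × 3 layer replaced by a layer that applies a stride-2 convolution and then the capacity expansion step. This PyTorch sketch is written for this description only and is not the patent's implementation; the padding of 1, the equal input and output channel counts in the 3 × 3 layer, the channel widths and the residual wiring are all assumptions, and the figures in Table 1 come from the patent's own experiment, not from this code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetConv3x3(nn.Module):
    """3 x 3 convolution with step s > 1 followed by capacity expansion:
    stride-aligned positions hold convolution results, all other positions
    keep the input values, so the spatial size of the input is preserved.
    Assumes equal input and output channel counts so input values can be reused."""

    def __init__(self, channels, s=2):
        super().__init__()
        self.s = s
        self.conv = nn.Conv2d(channels, channels, kernel_size=3,
                              stride=s, padding=1, bias=False)

    def forward(self, x):
        conv = self.conv(x)                      # results at stride-aligned centres only
        y = x.clone()                            # expansion units keep the input data
        y[:, :, ::self.s, ::self.s] = conv       # convolution result units
        return y

class Bottleneck(nn.Module):
    """1 x 1 -> target 3 x 3 -> 1 x 1 block, loosely following Fig. 7."""

    def __init__(self, in_ch, mid_ch):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        self.conv3x3 = TargetConv3x3(mid_ch, s=2)
        self.expand = nn.Conv2d(mid_ch, in_ch, kernel_size=1)

    def forward(self, x):
        out = F.relu(self.reduce(x))
        out = F.relu(self.conv3x3(out))
        out = self.expand(out)
        return F.relu(out + x)                   # residual connection

# Example usage (shapes only): a 56 x 56 feature map keeps its resolution.
# block = Bottleneck(in_ch=256, mid_ch=64)
# y = block(torch.randn(1, 256, 56, 56))        # y.shape == (1, 256, 56, 56)
```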
It should be noted that the above method embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
Corresponding to the above method embodiments, an embodiment of the present invention provides an apparatus for processing data by using a convolutional neural network. Fig. 8 shows a schematic structural diagram of such an apparatus; as shown in fig. 8, the apparatus includes:
an obtaining module 802, configured to obtain collected data of a target object;
the processing module 804 is used for inputting the acquired data into a pre-trained convolutional neural network for convolution processing; the convolutional neural network comprises at least one target convolutional layer, and the convolution step length corresponding to a target convolution kernel of the target convolutional layer is larger than 1;
the operation module 806 is configured to, when the target convolution layer receives input data, perform convolution operation and expansion processing on the input data and the target convolution kernel, and output target data; the target data comprises convolution result units and capacity expansion unit groups inserted into adjacent positions of the convolution result units, so that the matrix size of the target data is the same as that of the input data; the data in the capacity expansion unit group is the same as the input data corresponding to the convolution result unit adjacent to the capacity expansion unit group;
and the output module 808 is configured to input the target data to a next convolutional layer of the target convolutional layer, and continue subsequent operation until the convolutional neural network outputs a final result.
The device for processing data by using a convolutional neural network can, after acquiring the collected data of a target object, input the collected data into the pre-trained convolutional neural network for convolution processing, where the convolutional neural network includes at least one target convolutional layer and the convolution step corresponding to the target convolution kernel of the target convolutional layer is larger than 1. When the target convolutional layer receives input data, a convolution operation and capacity expansion processing are performed on the input data and the target convolution kernel, and the target data is output; the target data is then input to the next convolution layer of the target convolution layer, and the subsequent operation continues until the convolutional neural network outputs the final result. In this data processing manner, after the convolution operation and the capacity expansion processing are performed on the input data with a target convolution kernel whose convolution step is larger than 1, the matrix size of the target data is the same as the matrix size of the input data, and the data produced by the capacity expansion processing does not require convolution calculation, so that the time complexity is reduced while the space complexity remains unchanged, thereby improving the calculation rate of the network.
Further, the obtaining module 802 is further configured to obtain image data of the target object through a camera; wherein the target object is one of the following objects: a human body, a human body part, a vehicle or a license plate.
Generally, a convolutional neural network is used to perform a detection or recognition process on a target object.
Further, the processing module 804 is further configured to use each piece of data in the input data as convolution center data one by one, and for each piece of convolution center data, perform the following operations to obtain target data: taking the convolution center data as a convolution center, and performing convolution operation on the convolution center data and a target convolution kernel to obtain a current convolution result; reading insertion data corresponding to convolution center data from input data based on the convolution step of the target convolution kernel; inserting the insertion data into a position corresponding to the current convolution result according to the position relation of the insertion data and the convolution center data; the position of the current convolution result is a convolution result unit of the target data, the position of the insertion data is a capacity expansion unit of the target data, and the capacity expansion unit corresponding to each current convolution result is a capacity expansion unit group.
Wherein the target convolution kernel is an NxN convolution kernel; n is an odd number greater than or equal to 3; a step of reading insertion data corresponding to convolution center data from input data based on a convolution step of a target convolution kernel, including: determining a spanning data area corresponding to convolution center data in input data based on the convolution step of a target convolution kernel; the data in the span data area is taken as insertion data corresponding to the convolution center data.
Further, the processing module 804 is further configured to perform convolution operation on the input data and the target convolution kernel to obtain a convolution result matrix; based on the convolution step length of a target convolution kernel, splitting the position of each convolution result in a convolution result matrix into a convolution result unit and an expansion unit group, so that the matrix size of the split convolution result matrix is the same as the matrix size of input data; wherein the convolution result unit stores a convolution result; taking each convolution result unit as a target unit one by one, and executing the following operations on each target unit to obtain target data: reading insertion data corresponding to the target unit from the input data based on the convolution step of the target convolution kernel; and adding the insertion data to the capacity expansion unit group corresponding to the target unit so as to enable the matrix size of the split convolution result matrix to be the same as the matrix size of the input data.
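Finally, the grouping of the apparatus into obtaining, processing, operation and output modules can be pictured with a thin Python wrapper around the routines sketched earlier. Everything here (the class name, method names, and the camera and layer interfaces) is invented for illustration and is not part of the patent.

```python
class ConvNetDataProcessor:
    """Illustrative arrangement of the modules of Fig. 8 (all names invented)."""

    def __init__(self, layers, camera):
        self.layers = layers      # pre-trained layers; target layers use a step s > 1
        self.camera = camera      # image capture device

    def obtain(self):
        # obtaining module 802: acquire the collected data of the target object
        return self.camera.capture()

    def process(self, data):
        # processing / operation / output modules 804-808: each layer, including
        # every target convolution layer (convolution plus capacity expansion),
        # passes its output to the next layer until the final result is produced.
        for layer in self.layers:
            data = layer(data)
        return data

    def run(self):
        return self.process(self.obtain())
```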
The implementation principle and the generated technical effect of the apparatus for processing data by using a convolutional neural network provided by the embodiment of the present invention are the same as those of the foregoing method embodiment, and for brief description, reference may be made to the corresponding contents in the foregoing method embodiment for some points not mentioned in the embodiment of the apparatus for processing data by using a convolutional neural network.
An embodiment of the present invention further provides an electronic system; refer to the schematic structural diagram of the electronic system 900 shown in fig. 9. The electronic system can be used to implement the above method and apparatus for processing data by applying a convolutional neural network.
As shown in fig. 9, an electronic system 900 includes one or more processing devices 902, one or more memory devices 904, an input device 906, an output device 908, and one or more image capture devices 910 interconnected by a bus system 912 and/or other type of connection mechanism (not shown). It should be noted that the components and structure of the electronic system 900 shown in fig. 9 are exemplary only, and not limiting, and that the electronic system may have other components and structures as desired.
The processing device 902 may be a server, a smart terminal, or a device containing a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may process data for other components in the electronic system 900 and may control other components in the electronic system 900 to perform functions that process data.
The storage device 904 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processing device 902 may execute the program instructions to implement the client functions and/or other desired functions of the embodiments of the present invention described herein. Various applications and various data, such as the data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 906 may be a device used by a user to input instructions and may include one or more of a keyboard, mouse, microphone, touch screen, and the like.
The output device 908 may output various kinds of information (for example, images or sounds) to the outside (for example, to a user), and may include one or more of a display, a speaker, and the like.
Image capture device 910 may acquire image data of a target object and store the captured image data in storage 904 for use by other components.
For example, the devices in the electronic system that implement the method and apparatus for processing data by applying a convolutional neural network according to the embodiments of the present invention may be disposed integrally or separately; for instance, the processing device 902, the storage device 904, the input device 906 and the output device 908 may be disposed integrally, while the image capture device 910 is disposed at a specified position where image data can be captured. When the above devices in the electronic system are disposed integrally, the electronic system may be implemented as an intelligent terminal such as a camera, a smart phone, a tablet computer or a vehicle-mounted terminal.
The embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are called and executed by a processor, the computer-executable instructions cause the processor to implement the method for processing data by applying a convolutional neural network, and specific implementation may refer to method embodiments, and is not described herein again.
The computer program product of the method and apparatus for processing data by using a convolutional neural network provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the methods described in the foregoing method embodiments, and for specific implementation, reference may be made to the method embodiments, which are not repeated here.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the electronic system and/or the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly, for example, as a fixed connection, a removable connection, or an integral connection; as a mechanical connection or an electrical connection; as a direct connection, an indirect connection through an intermediate medium, or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of processing data using a convolutional neural network, the method comprising:
acquiring collected data of a target object;
inputting the acquired data into a pre-trained convolutional neural network for convolution processing; the convolutional neural network comprises at least one target convolutional layer, and the convolution step length corresponding to a target convolution kernel of the target convolutional layer is larger than 1;
when the target convolution layer receives input data, carrying out convolution operation and capacity expansion processing on the input data and the target convolution kernel, and outputting target data; the target data comprises convolution result units and capacity expansion unit groups inserted in adjacent positions of the convolution result units, so that the matrix size of the target data is the same as that of the input data; the data in the capacity expansion unit group is the same as the input data corresponding to the convolution result unit adjacent to the capacity expansion unit group;
and inputting the target data to the next convolution layer of the target convolution layer, and continuing subsequent operation until the convolution neural network outputs a final result.
2. The method of claim 1, wherein the step of performing convolution operation and expansion processing on the input data and the target convolution kernel to output target data comprises:
taking each data in the input data as convolution center data one by one, and performing the following operations on each convolution center data to obtain target data:
taking the convolution center data as a convolution center, and applying the target convolution kernel to carry out convolution operation to obtain a current convolution result;
reading insertion data corresponding to the convolution center data from the input data based on the convolution step of the target convolution kernel;
inserting the insertion data into a position corresponding to the current convolution result according to the position relation of the insertion data and the convolution center data; the position of the current convolution result is a convolution result unit of the target data, the position of the insertion data is an expansion unit of the target data, and each expansion unit corresponding to the current convolution result is an expansion unit group.
3. The method of claim 2, wherein the target convolution kernel is an N x N convolution kernel; wherein N is an odd number greater than or equal to 3;
a step of reading insertion data corresponding to the convolution center data from the input data based on a convolution step of the target convolution kernel, including:
determining a cross data area corresponding to the convolution center data in the input data based on the convolution step of the target convolution kernel;
and taking the data in the crossing data area as insertion data corresponding to the convolution center data.
4. The method of claim 1, wherein the step of performing convolution operation and expansion processing on the input data and the target convolution kernel to output target data comprises:
performing convolution operation on the input data and the target convolution kernel to obtain a convolution result matrix;
splitting the position of each convolution result in the convolution result matrix into a convolution result unit and an expansion unit group based on the convolution step length of the target convolution kernel, so that the matrix size of the split convolution result matrix is the same as the matrix size of the input data; wherein the convolution result unit stores the convolution result;
taking each convolution result unit as a target unit one by one, and executing the following operations on each target unit to obtain target data:
reading insertion data corresponding to the target unit from the input data based on the convolution step of the target convolution kernel; and adding the insertion data to a capacity expansion unit group corresponding to the target unit.
5. The method of claim 4, wherein the target convolution kernel is an N x N convolution kernel; wherein N is an odd number greater than or equal to 3;
reading insertion data corresponding to the target cell from the input data based on a convolution step of the target convolution kernel, including:
determining a spanning data area corresponding to the target unit in the input data based on the convolution step of the target convolution kernel;
and taking the data in the spanning data area as the insertion data corresponding to the target unit.
6. The method of claim 1, wherein the step of acquiring acquisition data of the target object comprises:
acquiring image data of a target object through a camera; wherein the target object is one of the following objects: a human body, a human body part, a vehicle or a license plate.
7. The method of claim 1, wherein the convolutional neural network is used to detect or identify the target object.
8. An apparatus for processing data using a convolutional neural network, the apparatus comprising:
the acquisition module is used for acquiring the acquisition data of the target object;
the processing module is used for inputting the acquired data into a pre-trained convolutional neural network for convolution processing; the convolutional neural network comprises at least one target convolutional layer, and the convolution step length corresponding to a target convolution kernel of the target convolutional layer is larger than 1;
the operation module is used for performing convolution operation and capacity expansion processing on the input data and the target convolution kernel when the target convolution layer receives the input data and outputting target data; the target data comprises convolution result units and capacity expansion unit groups inserted in adjacent positions of the convolution result units, so that the matrix size of the target data is the same as that of the input data; the data in the capacity expansion unit group is the same as the input data corresponding to the convolution result unit adjacent to the capacity expansion unit group;
and the output module is used for inputting the target data to the next convolution layer of the target convolution layer and continuing subsequent operation until the convolution neural network outputs a final result.
9. An electronic system, characterized in that the electronic system comprises: the device comprises an image acquisition device, a processing device and a storage device;
the image acquisition equipment is used for acquiring an image to be detected;
the storage means having stored thereon a computer program which, when executed by the processing apparatus, performs a method of processing data using a convolutional neural network as defined in any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processing device, carries out the steps of the method of processing data using a convolutional neural network as claimed in any one of claims 1 to 7.
CN202010465620.8A 2020-05-27 2020-05-27 Method, device and electronic system for processing data by using convolutional neural network Pending CN111797972A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010465620.8A CN111797972A (en) 2020-05-27 2020-05-27 Method, device and electronic system for processing data by using convolutional neural network


Publications (1)

Publication Number Publication Date
CN111797972A true CN111797972A (en) 2020-10-20

Family

ID=72806601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010465620.8A Pending CN111797972A (en) 2020-05-27 2020-05-27 Method, device and electronic system for processing data by using convolutional neural network

Country Status (1)

Country Link
CN (1) CN111797972A (en)



Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301387A (en) * 2017-06-16 2017-10-27 华南理工大学 A kind of image Dense crowd method of counting based on deep learning
KR20190057726A (en) * 2017-11-20 2019-05-29 경일대학교산학협력단 Apparatus for detecting and extracting image having hidden data using artificial neural network, method thereof and computer recordable medium storing program to perform the method
WO2019125439A1 (en) * 2017-12-20 2019-06-27 Moog Inc. Outer space digital logistics system
WO2019119301A1 (en) * 2017-12-20 2019-06-27 华为技术有限公司 Method and device for determining feature image in convolutional neural network model
CN108765247A (en) * 2018-05-15 2018-11-06 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and equipment
CN110008335A (en) * 2018-12-12 2019-07-12 阿里巴巴集团控股有限公司 The method and device of natural language processing
CN109697815A (en) * 2019-01-24 2019-04-30 广州市天河区保安服务公司 Anti-theft communication network alarming method, appliance arrangement and storage medium
CN110163799A (en) * 2019-05-05 2019-08-23 杭州电子科技大学上虞科学与工程研究院有限公司 A kind of super-resolution point cloud generation method based on deep learning
CN110580522A (en) * 2019-11-07 2019-12-17 深圳云天励飞技术有限公司 Convolution calculation method and related equipment
CN110852425A (en) * 2019-11-15 2020-02-28 北京迈格威科技有限公司 Optimization-based neural network processing method and device and electronic system
CN111047703A (en) * 2019-12-23 2020-04-21 杭州电力设备制造有限公司 User high-voltage distribution equipment identification and space reconstruction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Yufeng; ZHENG Zhonglong; LIU Huawen; XIANG Daohong; HE Xiaowei; LI Zhifei; HE Yiran; KHODJA ABD ERRAOUF: "Lightweight Convolutional Neural Network Based on Feature Map Splitting" (基于特征图切分的轻量级卷积神经网络), Pattern Recognition and Artificial Intelligence (模式识别与人工智能), no. 03, 15 March 2019 (2019-03-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966213A (en) * 2021-02-26 2021-06-15 北京三一智造科技有限公司 Data prediction method, device, medium and electronic equipment for mechanical equipment
CN112966213B (en) * 2021-02-26 2023-09-29 北京三一智造科技有限公司 Data prediction method and device of mechanical equipment, medium and electronic equipment

Similar Documents

Publication Publication Date Title
EP3349153B1 (en) Convolutional neural network (cnn) processing method and apparatus
CN110020639B (en) Video feature extraction method and related equipment
CN108334805B (en) Method and device for detecting document reading sequence
CN109117940B (en) Target detection method, device, terminal and storage medium based on convolutional neural network
CN109859314B (en) Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and storage medium
CN111274999B (en) Data processing method, image processing device and electronic equipment
CN111105017A (en) Neural network quantization method and device and electronic equipment
US20200394516A1 (en) Filter processing device and method of performing convolution operation at filter processing device
CN111709415B (en) Target detection method, device, computer equipment and storage medium
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN111401524A (en) Convolutional neural network processing method, device, equipment, storage medium and model
CN114266894A (en) Image segmentation method and device, electronic equipment and storage medium
CN113869282A (en) Face recognition method, hyper-resolution model training method and related equipment
CN111797972A (en) Method, device and electronic system for processing data by using convolutional neural network
CN111639523B (en) Target detection method, device, computer equipment and storage medium
CN111222558B (en) Image processing method and storage medium
CN109460777A (en) Picture classification method, device and computer readable storage medium
CN113420604B (en) Multi-person posture estimation method and device and electronic equipment
CN115346053A (en) Image feature extraction method and device, electronic equipment and storage medium
CN114820755A (en) Depth map estimation method and system
CN109583512B (en) Image processing method, device and system
CN112200774A (en) Image recognition apparatus
CN113095211B (en) Image processing method, system and electronic equipment
CN112417185B (en) Image key point position calculation method, device, equipment and storage medium
CN112419413B (en) Method, medium, device and computing equipment for monitoring movement direction of terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination