CN107894957B - Convolutional neural network-oriented memory data access and zero insertion method and device - Google Patents


Info

Publication number
CN107894957B
Authority
CN
China
Prior art keywords
data
zero insertion
column
row
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711118433.7A
Other languages
Chinese (zh)
Other versions
CN107894957A (en)
Inventor
周东浩
陈艇
Original Assignee
Henan Dingshi Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Henan Dingshi Intelligent Technology Co., Ltd.
Priority to CN201711118433.7A
Publication of CN107894957A
Application granted
Publication of CN107894957B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/0223: User address space allocation, e.g. contiguous or non-contiguous base addressing
    • G06F 12/023: Free address space management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a convolutional neural network-oriented memory data access and zero insertion method and device, belonging to the field of convolutional neural network computation and dedicated hardware accelerators. The method automatically reads data from a data memory in row or column order, as selected by the data base address and read-mode input control signals, and automatically pads zeros around the periphery of the original two-dimensional data on output, as directed by the input-data row/column sizes and the zero insertion mode indication. When the device serves a shift-chain-based convolutional neural network structure, two-dimensional data in memory can therefore be read automatically under a small set of input control signals, and zeros are inserted around the original two-dimensional data automatically before the data are sent to the convolution computation module. This greatly improves the data access efficiency of the convolutional neural network computation structure and shortens the computation time of the whole network.

Description

Convolutional neural network-oriented memory data access and zero insertion method and device
Technical Field
The invention relates to the field of convolutional neural network computation and dedicated hardware accelerators, and in particular to a convolutional neural network-oriented memory data access and zero insertion method and device.
Background
In recent years, convolutional neural networks (CNNs) have made significant breakthroughs in deep learning and artificial intelligence; typical CNN models include AlexNet, ZFNet, VGGNet, GoogLeNet, and SqueezeNet (a compressed convolutional neural network). CNNs have become a research hotspot in many scientific fields, especially image classification and pattern recognition, where they are widely applied because they avoid complex image preprocessing and can take raw image data directly as input. A CNN generally consists of multiple convolutional layers and downsampling (pooling) layers, and a convolutional layer's input is three-dimensional data generally composed of multiple two-dimensional feature maps. If each input feature map consists of H × W data points, each map is called a channel, and there are D channels in total, the input can be represented as H × W × D three-dimensional data. A convolutional layer contains multiple convolution kernels; each kernel is itself three-dimensional and can be expressed as h × w × d, where h ≤ H, w ≤ W, and d ≤ D. Each kernel is convolved with the input data to produce a new feature map. Because each layer has multiple kernels participating in the operation, the result of a CNN intermediate layer is also three-dimensional. The result of one layer serves as the input of the next; through multiple layers of convolution or downsampling, the output data gradually shrinks in the length and width dimensions while its depth generally grows. As shown in fig. 1, the AlexNet structure is divided into 8 layers of operations and takes one 224 × 224 × 3 picture as input, with the first five layers being convolutional.
The first convolutional layer applies 96 kernels of size 11 × 11 × 3 to the same input data and outputs 55 × 55 × 96 three-dimensional data; this output serves as the input of the second convolutional layer, which applies 256 kernels of size 5 × 5 × 96 and follows the convolution with a pooling operation, so the second layer's output data dimension is 27 × 27 × 256.
The main operation of a CNN is the convolution of three-dimensional data: in practice, an output feature map is obtained by convolving each two-dimensional input feature map and accumulating the results. Because a CNN has many convolutional layers and many kernels per layer, the amount of computation is substantial. In addition, zero padding is required around the "periphery" of the two-dimensional input feature map so that the convolution result has a specific length and width. As shown in fig. 3, in the fourth and fifth convolutional layers of AlexNet the input feature map size is 13 × 13 and the corresponding kernel size is 3 × 3; to make the output feature map 13 × 13 as well, the 13 × 13 input map is extended during the operation by padding zeros around its "periphery", enlarging it to 15 × 15. The data after zero padding are shown in fig. 4.
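To make the padding arithmetic concrete, here is a minimal sketch (my own illustration, not code from the patent) that pads a map by one zero row/column on each side, reproducing the 13 × 13 to 15 × 15 enlargement described above:

```python
def pad_feature_map(fmap, k=1, l=1, m=1, n=1):
    """Return fmap with k/l zero rows above/below and m/n zero columns
    left/right. fmap is a list of equal-length rows."""
    cols = len(fmap[0])
    zero_row = [0] * (m + cols + n)
    out = [list(zero_row) for _ in range(k)]     # k zero rows on top
    for row in fmap:
        out.append([0] * m + list(row) + [0] * n)  # frame each data row
    out += [list(zero_row) for _ in range(l)]    # l zero rows on the bottom
    return out

fmap = [[1] * 13 for _ in range(13)]             # a 13x13 feature map of ones
padded = pad_feature_map(fmap)
assert len(padded) == 15 and len(padded[0]) == 15
```

With a 3 × 3 kernel, convolving the 15 × 15 padded map yields a 13 × 13 output, matching the AlexNet example.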
Traditional CNN implementations are mainly based on general-purpose processors, GPUs, or convolution accelerators built around a shift register chain. The shift-register-chain accelerator uses a two-dimensional shift register chain whose length equals the row length of the input data and whose width equals the width of the convolution kernel. One data element is fed in per clock cycle, and all data in the chain shift back by one position, achieving the windowing of a two-dimensional convolution; once the chain is full, the convolution calculation unit outputs one row of convolution results for every row of data fed in. This architecture shares input data to the maximum extent, and its hardware computation efficiency is very high. However, if zeros must be padded around the original input data, a storage region large enough for the data plus its peripheral zero elements is normally allocated in internal memory in advance; zeros are written to the peripheral storage cells after the data are imported, and the memory contents are then read out sequentially into the shift register chain of the convolution calculation unit.
The main disadvantages of this approach are that internal memory larger than the original data must be reserved and that the zero-insertion locations must be initialized to zero before the data are stored. Moreover, because the input data's addresses in memory are no longer contiguous, the transfer cannot use DMA burst mode for continuous bulk movement, which lowers transfer efficiency and degrades the performance of the whole system.
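The DMA problem can be seen from the address arithmetic alone. The sketch below (assuming a row-major layout and an illustrative base address, not values from the patent) shows that once 13 × 13 data sit inside a pre-padded 15 × 15 buffer, consecutive original rows are no longer back to back in memory:

```python
W_PADDED = 15          # padded row length
W_ORIG = 13            # original row length
base = 0x1000          # assumed base address of the padded buffer

def addr(r, c):
    # Address of original element (r, c): skip one zero row and one zero column.
    return base + (r + 1) * W_PADDED + (c + 1)

# The last element of row 0 and the first element of row 1 are separated
# by 3 addresses (two padding zeros plus the wrap), so the original data
# no longer occupies one contiguous range and a single DMA burst cannot
# cover it.
gap = addr(1, 0) - addr(0, W_ORIG - 1)
assert gap == 3
```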
Disclosure of Invention
The convolutional neural network-oriented memory data access and zero insertion method and device provided herein do not require zero elements to be inserted around the original data in internal memory in advance; instead, the zeros are inserted at the corresponding positions after the data are read out of memory. This reduces the storage resources required by a convolutional neural network accelerator and optimizes the convolutional neural network computation process.
The invention provides a convolutional neural network-oriented memory data access and zero insertion method, applied to a convolutional neural network-oriented memory data access and zero insertion system, comprising the following steps: configure, according to a preset rule, the input data row size, the input data column size, the base address of the input data in memory, the data read mode, and the zero insertion mode, where the data read mode comprises a row-first read mode and a column-first read mode, and the zero insertion mode parameters comprise the row-head zero-insertion row number K, the row-tail zero-insertion row number L, the column-head zero-insertion column number M, and the column-tail zero-insertion column number N; set the start signal high to start the data read and zero insertion operation; when the configured read mode is the row-first mode, read the data in the external data memory row by row in sequence and perform zero insertion according to the zero insertion mode; if the row-head zero-insertion row number K is not zero, output K rows of S1 zero elements before reading data from the data memory, then insert M zeros and N zeros before and after each output row of original data, respectively; when the configured read mode is the column-first mode, first output M columns of S2 zero elements, then insert K zeros and L zeros before and after each output column of original data, respectively.
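The row-first procedure above can be sketched behaviorally (this is my own Python model, not the patent's hardware): K all-zero rows, then each data row framed by M leading and N trailing zeros, then L all-zero rows, with S1 = M + column size + N as defined below.

```python
def row_first_stream(memory, base, rows, cols, K, L, M, N):
    """Yield output rows; `memory` models the external data memory as a
    flat list holding the feature map row-major at `base`."""
    s1 = M + cols + N
    for _ in range(K):                 # row-head zero rows
        yield [0] * s1
    for r in range(rows):              # framed data rows
        row = memory[base + r * cols : base + (r + 1) * cols]
        yield [0] * M + row + [0] * N
    for _ in range(L):                 # row-tail zero rows
        yield [0] * s1

mem = list(range(1, 10))               # a 3x3 feature map stored at base 0
out = list(row_first_stream(mem, 0, 3, 3, K=1, L=1, M=1, N=1))
assert len(out) == 5 and out[1] == [0, 1, 2, 3, 0]
```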
Optionally, the step of outputting K rows of S1 zero elements before reading data from the data memory when the row-head zero-insertion row number K is not zero, and then inserting M zeros and N zeros before and after each output row of original data, further includes: after the row containing the last row of original data is output, outputting L rows of S3 zero elements, where S3 is the sum of M, N, and the input data column size.
Optionally, the step of first outputting M columns of S2 zero elements when the configured read mode is the column-first mode, and then inserting K zeros and L zeros before and after each output column of original data, further includes: after the column containing the last column of original data is output, outputting N columns of S4 zero elements, where S4 is the sum of K, L, and the input data row size.
Optionally, S1 is the sum of M, N, and the input data column size.
Optionally, S2 is the sum of K, L, and the input data row size.
The invention provides a convolutional neural network-oriented memory data access and zero insertion device, applied to a convolutional neural network-oriented memory data access and zero insertion system, comprising: a data preprocessing unit, configured to configure, according to a preset rule, the input data row size, the input data column size, the base address of the input data in memory, the data read mode, and the zero insertion mode, where the data read mode comprises a row-first read mode and a column-first read mode, and the zero insertion mode parameters comprise the row-head zero-insertion row number K, the row-tail zero-insertion row number L, the column-head zero-insertion column number M, and the column-tail zero-insertion column number N; a processing unit, configured to set the start signal high to start the data read and zero insertion operation; a pattern matching unit, configured to read the data in the external data memory row by row in sequence when the configured read mode is the row-first mode and perform zero insertion according to the zero insertion mode; a first execution unit, configured to output K rows of S1 zero elements before reading data from the data memory if the row-head zero-insertion row number is not zero, and then insert M zeros and N zeros before and after each output row of original data, respectively; and a second execution unit, configured to output M columns of S2 zero elements first when the configured read mode is the column-first mode, and then insert K zeros and L zeros before and after each output column of original data, respectively.
Optionally, the device further includes: a first output unit, configured to output L rows of S3 zero elements after the row containing the last row of original data is output, where S3 is the sum of M, N, and the input data column size.
Optionally, the device further includes: a second output unit, configured to output N columns of S4 zero elements after the column containing the last column of original data is output, where S4 is the sum of K, L, and the input data row size.
Optionally, S1 is the sum of M, N, and the input data column size.
The invention provides a convolutional neural network-oriented memory data access and zero insertion system, comprising a memory address generation and zero insertion control module, a zero insertion control delay module, and an output data selection module. The memory address generation and zero insertion control module is connected with an external data memory and with the zero insertion control delay module; under the control of external input control signals, it automatically generates the data memory read enable signal, the read address, and the zero insertion enable signal. The read enable and read address are connected directly to the read enable and read address channels of the external data memory to control data reading, and the zero insertion enable signal is output to the zero insertion control delay module. The zero insertion control delay module, connected between the memory address generation and zero insertion control module and the output data selection module, consists of a shift register chain that delays the incoming zero insertion enable signal by a number of clocks equal to the read latency of the external data memory, achieving synchronization with the data memory output; the delayed zero insertion control signal is output to the output data selection module. The output data selection module is connected with the external data memory and the zero insertion control delay module; under the control of the delay module's output signal, it outputs the data supplied by the external data memory when that signal is 0 and outputs 0 otherwise.
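The interplay of the delay chain and the output mux can be modeled in a few lines. This is a behavioral sketch under an assumed read latency of 2 clocks, not the patent's circuit: delaying the pad0 enable by exactly the memory's read latency makes the delayed enable (pad0_n) line up with the returning read data, so the mux interleaves zeros and data correctly.

```python
from collections import deque

READ_LATENCY = 2  # assumed read latency of the external data memory, in clocks

class DelayChain:
    """Shift-register chain that delays a 1-bit signal by `depth` clocks."""
    def __init__(self, depth):
        self.regs = deque([0] * depth)
    def clock(self, bit):
        self.regs.append(bit)
        return self.regs.popleft()

def output_mux(pad0_n, mem_data):
    # Emit 0 while the delayed zero-insertion enable is asserted,
    # otherwise pass the memory's read data through.
    return 0 if pad0_n else mem_data

chain = DelayChain(READ_LATENCY)
pad0 = [1, 0, 0, 1]                    # enable stream from the address generator
mem_data = [-1, -1, -1, 5, 6, -1]      # reads issued at t=1,2 return at t=3,4

out = []
for t in range(6):
    p = chain.clock(pad0[t] if t < len(pad0) else 0)
    out.append(output_mux(p, mem_data[t]))
# out == [-1, -1, 0, 5, 6, 0]: the zeros land exactly where pad0 was
# asserted, shifted by the read latency so they align with real data.
```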
Compared with the prior art, the convolutional neural network-oriented memory data access and zero insertion method and device provided by the invention have the following beneficial effects:
1. The invention provides a method and device for accessing convolution data and automatically inserting zeros, addressing the requirement of a shift-register-chain-based compressed convolutional neural network to insert zeros around the periphery of the input two-dimensional original data.
2. The method is simple and highly efficient in execution, and can effectively accelerate data storage and convolution operations in convolutional neural network algorithms; the device consists of a memory address generation and zero insertion control module, a zero insertion control delay module, and an output data selection module, and has a simple structure, good performance, low power consumption, and high functional-unit utilization.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the invention and should not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a block diagram of a convolutional neural network-oriented memory data access and zero insertion system according to an embodiment of the present invention;
FIG. 2 is a flowchart of a convolutional neural network-oriented memory data access and zero insertion method according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of data before zero insertion in the convolutional neural network-oriented memory data access and zero insertion method shown in FIG. 2;
FIG. 4 is a schematic diagram of data after zero insertion in the convolutional neural network-oriented memory data access and zero insertion method shown in FIG. 2;
fig. 5 is a functional block diagram of a convolutional neural network-oriented memory data access and zero insertion apparatus according to a second embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions are described below clearly and completely with reference to the drawings. The described embodiments are some, but not all, embodiments of the invention; all other embodiments derived from them by a person skilled in the art without creative effort fall within the protection scope of the invention. The following detailed description is therefore not intended to limit the claimed scope, but is merely representative of selected embodiments.
Fig. 1 is a block diagram of a convolutional neural network-oriented memory data access and zero insertion system according to an embodiment of the present invention. The convolutional neural network-oriented memory data reading and zero insertion system 100 comprises: a memory address generation and zero insertion control module 110, a zero insertion control delay module 120, and an output data selection module 130.
The memory address generation and zero insertion control module 110 outputs the memory read address and read enable to the external data memory, and outputs the zero insertion enable signal to the zero insertion control delay module 120, whose delay in clocks equals the memory read-data latency; the output of the delay module 120 is connected to the output data selection module 130 to select between the read data and zero.
In this embodiment, the memory address generation and zero insertion control module 110 is connected to the external data memory and the zero insertion control delay module 120, and is controlled by the external input signals: feature map row size (Row_size), feature map column size (Column_size), base address of the feature map data in the external memory (Base_addr), memory read mode (read_mode), zero insertion mode (padding_mode), and start signal (Start). The memory read mode comprises a row-first mode and a column-first mode, and the zero insertion mode parameters comprise the row-head zero-insertion row number K, the row-tail zero-insertion row number L, the column-head zero-insertion column number M, and the column-tail zero-insertion column number N. According to these input signals, the module 110 performs zero insertion around the periphery of the input two-dimensional data and supports row-first or column-first output. The zero insertion control delay module 120 delays the pad0 signal by a number of clocks equal to the read latency of the external data memory. The output data selection module 130 outputs a zero element when the delay module's output indication is 1, and otherwise outputs the data delivered by the external data memory.
For example, in fig. 1, an 8 × 8 input is zero-padded on all four sides, and the result may be output row-first or column-first. As shown in the figure, the original two-dimensional data are stored sequentially in the memory; the memory address generation and zero insertion control module 110 generates the read address and read enable for the data memory's read interface, pauses the memory read for one beat before the first row, before the first column, after the last row, and after the last column of the original data while asserting the pad0 signal, and otherwise reads the memory contents out by row or by column. The final output is selected by the output data selection module 130 according to pad0_n, the delayed version of pad0.
In this embodiment, the zero insertion control delay module 120 is composed of a shift register chain, beats the input zero insertion enable signal, the number of clocks delayed by the beating is equal to the number of delayed clocks of the external data memory read data, so as to achieve synchronization with the data memory output, and the delayed zero insertion control signal is output to the output data selection module 130.
The output data selection module 130 is connected to the external data storage and the zero insertion control delay module 120, and receives control of the output signal of the zero insertion control delay module 120, and outputs data input by the external data storage when the output signal is 0, or outputs data 0 otherwise.
Fig. 2 is a flowchart of a convolutional neural network-oriented memory data access and zero insertion method according to a first embodiment of the present invention. The convolutional neural network-oriented memory data access and zero insertion method is applied to a convolutional neural network-oriented memory data access and zero insertion system, and a specific flow shown in fig. 2 will be described in detail below.
Step S101, configuring input data row size, input data column size, a base address of input data in a memory, a data reading mode and a zero insertion mode according to a preset rule, wherein the data reading mode comprises a row priority reading mode and a column priority reading mode, and parameters of the zero insertion mode comprise row number K of zero insertion rows at the head of a row, row number L of zero insertion rows at the tail of a row, column number M of zero insertion columns at the head of a column and column number N of zero insertion columns at the tail of a column.
In step S102, the start signal is set high to start the data read and zero insertion operations.
Step S103, when the configured read mode is the row-first mode, read out the data in the external data memory row by row in sequence and perform zero insertion according to the zero insertion mode.
Figs. 3 and 4 show the original feature map data before zero insertion and the feature map data after zero insertion. In the conventional implementation, memory space for the inserted zeros must be reserved in advance and cleared, and the original data then stored at the corresponding positions. This leaves the original feature map data at non-contiguous memory locations, so it cannot be moved continuously in DMA mode, reducing data transfer efficiency.
Step S104, if the row-head zero-insertion row number K is not zero, output K rows of S1 zero elements before reading data from the data memory, then insert M zeros and N zeros before and after each output row of original data, respectively.
Here S1 is the sum of M, N, and the input data column size.
In this embodiment, after step S104, the method further includes outputting L rows of S3 zero elements after the row containing the last row of original data is output, where S3 is the sum of M, N, and the input data column size.
Step S105, when the configured read mode is the column-first mode, first output M columns of S2 zero elements, then insert K zeros and L zeros before and after each output column of original data, respectively.
Here S2 is the sum of K, L, and the input data row size.
Optionally, after step S105, the method further includes outputting N columns of S4 zero elements after the column containing the last column of original data is output, where S4 is the sum of K, L, and the input data row size.
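The column-first branch can be sketched the same way (again my own behavioral model, not the patent's hardware): M all-zero columns first, each data column framed by K leading and L trailing zeros, then N all-zero columns, each emitted column having S2 = K + row size + L elements. Note the strided reads a column-first pass implies over a row-major memory.

```python
def column_first_stream(memory, base, rows, cols, K, L, M, N):
    """Yield output columns; `memory` models the external data memory as
    a flat list holding the feature map row-major at `base`."""
    s2 = K + rows + L
    for _ in range(M):                 # column-head zero columns
        yield [0] * s2
    for c in range(cols):              # framed data columns (strided reads)
        col = [memory[base + r * cols + c] for r in range(rows)]
        yield [0] * K + col + [0] * L
    for _ in range(N):                 # column-tail zero columns
        yield [0] * s2

mem = [1, 2, 3, 4]                     # a 2x2 feature map stored at base 0
out = list(column_first_stream(mem, 0, 2, 2, K=1, L=1, M=1, N=1))
assert len(out) == 4 and out[1] == [0, 1, 3, 0]
```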
Fig. 5 is a schematic diagram of functional modules of a convolutional neural network-oriented memory data access and zero insertion device according to a second embodiment of the present invention. The convolutional neural network-oriented memory data access and zero insertion apparatus 400 is applied to a convolutional neural network-oriented memory data access and zero insertion system, and the convolutional neural network-oriented memory data access and zero insertion apparatus 400 includes a data preprocessing unit 410, a processing unit 420, a pattern matching unit 430, a first execution unit 440, a first output unit 450, a second execution unit 460, and a second output unit 470.
The data preprocessing unit 410 is configured to configure, according to a preset rule, the input data row size, the input data column size, the base address of the input data in memory, the data read mode, and the zero insertion mode, where the data read mode comprises a row-first read mode and a column-first read mode, and the zero insertion mode parameters comprise the row-head zero-insertion row number K, the row-tail zero-insertion row number L, the column-head zero-insertion column number M, and the column-tail zero-insertion column number N.
A processing unit 420 for setting the start signal high to initiate data read and zero insertion operations.
And a pattern matching unit 430, configured to, when the configuration data reading mode is the row-first reading mode, sequentially read out data in the external data memory row by row, and perform a zero insertion operation according to the zero insertion mode.
A first execution unit 440, configured to output K rows of S1 zero elements before reading data from the data memory if the row-head zero-insertion row number is not zero, and then insert M zeros and N zeros before and after each output row of original data, respectively, where S1 is the sum of M, N, and the input data column size.
A first output unit 450, configured to output L rows of S3 zero elements after the row containing the last row of original data is output, where S3 is the sum of M, N, and the input data column size.
A second execution unit 460, configured to output M columns of S2 zero elements first when the configured read mode is the column-first mode, and then insert K zeros and L zeros before and after each output column of original data, respectively, where S2 is the sum of K, L, and the input data row size.
A second output unit 470, configured to output N columns of S4 zero elements after the column containing the last column of original data is output, where S4 is the sum of K, L, and the input data row size.
In summary, the convolutional neural network-oriented memory data access and zero insertion method and apparatus provided by the present invention automatically read data from the data memory in row or column mode, as directed by the input base address and read-mode control signals, and, guided by the input data row and column sizes and the zero insertion mode indication, automatically pad the periphery of the original two-dimensional data with zeros as the data are output. When the apparatus is used in a shift-chain-based convolutional neural network structure, two-dimensional data can therefore be read from memory automatically under the control of only a few input control signals, and the data are zero-padded on the fly before being sent to the convolution calculation module. This greatly improves the data access efficiency of the convolutional neural network calculation structure and shortens the calculation time of the whole network.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code. It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

Claims (10)

1. A convolutional neural network-oriented memory data access and zero insertion method is applied to a convolutional neural network-oriented memory data access and zero insertion system, and is characterized by comprising the following steps:
configuring, according to a preset rule, an input data row size, an input data column size, a base address of the input data in a memory, a data reading mode, and a zero insertion mode, wherein the data reading mode comprises a row-first reading mode and a column-first reading mode, and the parameters of the zero insertion mode comprise a row-head zero insertion row number K, a row-tail zero insertion row number L, a column-head zero insertion column number M, and a column-tail zero insertion column number N;
setting a start signal to high to start data reading and zero insertion operation;
when the configured data reading mode is the row-first reading mode, sequentially reading data in the external data memory row by row, and performing zero insertion operation according to the zero insertion mode;
if the row-head zero insertion row number K is not zero, outputting K rows of S1 zeros before reading the data in the data memory, and then inserting M zeros and N zeros before and after outputting each row of original data, respectively;
when the configuration data reading mode is the column priority reading mode, firstly, M columns of S2 zeros are output, and then, before and after outputting each column of original data, K zeros and L zeros are inserted respectively.
2. The method of claim 1, wherein after outputting K rows of S1 zeros before reading the data in the data memory if the row-head zero insertion row number K is not zero, and then inserting M zeros and N zeros before and after outputting each row of original data, respectively, the method further comprises:
after the row containing the last row of original data is output, outputting L rows of S3 zeros, where S3 is the sum of M, N, and the input data column size.
3. The method of claim 1, wherein after first outputting M columns of S2 zeros when the configured data reading mode is the column-first reading mode, and then inserting K zeros and L zeros before and after outputting each column of original data, respectively, the method further comprises:
after the column containing the last column of original data is output, outputting N columns of S4 zeros, where S4 is the sum of M, N, and the input data column size.
4. The method of claim 1, wherein S1 is the sum of M, N, and the input data column size.
5. The method of claim 1, wherein S2 is the sum of K, L, and the input data row size.
6. A convolutional neural network-oriented memory data access and zero insertion device, which is applied to a convolutional neural network-oriented memory data access and zero insertion system, and comprises:
a data preprocessing unit, configured to set, according to a preset rule, an input data row size, an input data column size, a base address of the input data in the memory, a data reading mode, and a zero insertion mode, wherein the data reading mode comprises a row-first reading mode and a column-first reading mode, and the parameters of the zero insertion mode comprise a row-head zero insertion row number K, a row-tail zero insertion row number L, a column-head zero insertion column number M, and a column-tail zero insertion column number N;
a processing unit, configured to set the start signal high to initiate the data read and zero insertion operations;
a mode matching unit, configured to, when the configured data reading mode is the row-first reading mode, sequentially read data from the external data memory row by row and perform the zero insertion operation according to the zero insertion mode;
a first execution unit, configured to, if the row-head zero insertion row number K is not zero, output K rows of S1 zeros before reading the data in the data memory, and then insert M zeros and N zeros before and after outputting each row of original data, respectively;
and a second execution unit, configured to, when the configured data reading mode is the column-first reading mode, first output M columns of S2 zeros, and then insert K zeros and L zeros before and after outputting each column of original data, respectively.
7. The apparatus of claim 6, further comprising:
a first output unit, configured to output L rows of S3 zeros after the row containing the last row of original data is output, where S3 is the sum of M, N, and the input data column size.
8. The apparatus of claim 6, further comprising:
a second output unit, configured to output N columns of S4 zeros after the column containing the last column of original data is output, where S4 is the sum of M, N, and the input data column size.
9. The apparatus of claim 6, wherein S1 is the sum of M, N, and the input data column size.
10. A convolutional neural network-oriented memory data access and zero insertion system, comprising: the device comprises a memory address generation and zero insertion control module, a zero insertion control delay module and an output data selection module;
the memory address generation and zero insertion control module is connected with an external data memory and the zero insertion control delay module and is used for automatically generating a data memory read enable signal, a read address and a zero insertion enable signal according to the control of an external input control signal, wherein the memory read enable signal and the read address are directly connected with a read enable channel and a read address channel of the external data memory to control the reading of data, and the zero insertion enable signal is output to the zero insertion control delay module;
the zero insertion control delay module is connected with the memory address generation and zero insertion control module and the output data selection module;
the zero insertion control delay module is composed of a shift register chain, beats input zero insertion enabling signals, the number of beats delayed clocks is equal to the number of delayed clocks of data read by the external data memory, synchronization with data memory output is achieved, and delayed zero insertion control signals are output to the output data selection module;
the output data selection module is connected with an external data memory and the zero insertion control delay module, receives control of an output signal of the zero insertion control delay module, and outputs data input by the external data memory when the output signal is 0, otherwise, outputs data 0.
CN201711118433.7A 2017-11-14 2017-11-14 Convolutional neural network-oriented memory data access and zero insertion method and device Active CN107894957B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711118433.7A CN107894957B (en) 2017-11-14 2017-11-14 Convolutional neural network-oriented memory data access and zero insertion method and device


Publications (2)

Publication Number Publication Date
CN107894957A CN107894957A (en) 2018-04-10
CN107894957B true CN107894957B (en) 2020-09-01

Family

ID=61805221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711118433.7A Active CN107894957B (en) 2017-11-14 2017-11-14 Convolutional neural network-oriented memory data access and zero insertion method and device

Country Status (1)

Country Link
CN (1) CN107894957B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390383B (en) * 2019-06-25 2021-04-06 东南大学 Deep neural network hardware accelerator based on power exponent quantization
CN114090470B (en) * 2020-07-29 2023-02-17 深圳市中科元物芯科技有限公司 Data preloading device and preloading method thereof, storage medium and computer equipment


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101189573A (en) * 2005-06-01 2008-05-28 微软公司 Conditional execution via content addressable memory and parallel computing execution model
US8976618B1 (en) * 2013-10-28 2015-03-10 Qualcomm Incorporated Decoded 2N-bit bitcells in memory for storing decoded bits, and related systems and methods
CN104504205A (en) * 2014-12-29 2015-04-08 南京大学 Parallelizing two-dimensional division method of symmetrical FIR (Finite Impulse Response) algorithm and hardware structure of parallelizing two-dimensional division method
CN106447030A (en) * 2016-08-30 2017-02-22 深圳市诺比邻科技有限公司 Computing resource optimization method and system of convolutional neural network
CN106504190A (en) * 2016-12-29 2017-03-15 浙江工商大学 A kind of three-dimensional video-frequency generation method based on 3D convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Application of Traffic Sign Recognition Based on Convolutional Neural Networks; Yang Xin; China Master's Theses Full-text Database; 2015-07-15; full text *

Also Published As

Publication number Publication date
CN107894957A (en) 2018-04-10

Similar Documents

Publication Publication Date Title
US11449576B2 (en) Convolution operation processing method and related product
CN109146065B (en) Convolution operation method and device for two-dimensional data
WO2019136762A1 (en) Artificial intelligence processor and processing method applied thereto
CN112840356A (en) Operation accelerator, processing method and related equipment
WO2020118608A1 (en) Deconvolutional neural network hardware acceleration method, apparatus, and electronic device
CN111583095B (en) Image data storage method, image data processing system and related device
US20220083857A1 (en) Convolutional neural network operation method and device
WO2019127517A1 (en) Data processing method and device, dma controller, and computer readable storage medium
WO2018139177A1 (en) Processor, information processing device, and processor operation method
US11436017B2 (en) Data temporary storage apparatus, data temporary storage method and operation method
CN111831254A (en) Image processing acceleration method, image processing model storage method and corresponding device
CN110276444B (en) Image processing method and device based on convolutional neural network
CN111079923B (en) Spark convolutional neural network system suitable for edge computing platform and circuit thereof
TWI743627B (en) Method and device for accessing tensor data
US11531868B1 (en) Input value cache for temporarily storing input values
WO2011142723A9 (en) Techniques for accelerating computations using field programmable gate array processors
CN109993275B (en) Signal processing method and device
CN107894957B (en) Convolutional neural network-oriented memory data access and zero insertion method and device
WO2019136751A1 (en) Artificial intelligence parallel processing method and apparatus, computer readable storage medium, and terminal
CN114995782B (en) Data processing method, device, equipment and readable storage medium
CN111353591A (en) Computing device and related product
US11568227B1 (en) Neural network inference circuit read controller with multiple operational modes
WO2022179472A1 (en) Systolic array-based data processing method, apparatus, medium, and program product
CN111133457A (en) Electronic device and control method thereof
CN109740619B (en) Neural network terminal operation method and device for target recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211028

Address after: 100000 No. 96, North East Fourth Ring Road, Chaoyang District, Beijing

Patentee after: Zhou Donghao

Address before: 450000 No. 54, unit 6, building 4, courtyard 4, Fengqing Road, Jinshui District, Zhengzhou City, Henan Province

Patentee before: HENAN DINGSHI INTELLIGENT TECHNOLOGY CO.,LTD.
