CN113052292B - Convolutional neural network calculation method, device and computer readable storage medium - Google Patents

Convolutional neural network calculation method, device and computer readable storage medium

Info

Publication number
CN113052292B
CN113052292B (granted publication of application CN201911376391.6A)
Authority
CN
China
Prior art keywords
weight data
block
data
external memory
input image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911376391.6A
Other languages
Chinese (zh)
Other versions
CN113052292A (en)
Inventor
徐兵
张楠赓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sisheng Technology Co ltd
Original Assignee
Beijing Sisheng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sisheng Technology Co ltd
Priority to CN201911376391.6A
Publication of CN113052292A
Application granted
Publication of CN113052292B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a calculation method and device for a convolutional neural network, and a computer readable storage medium. The method comprises: determining, for a target convolutional layer of the convolutional neural network, the data-volume ratio of the input image data to the weight data; and determining a preset processing mode for the target convolutional layer according to that ratio, so that the computing platform performs the convolution calculation of the target convolutional layer in the preset processing mode. By selecting the preset processing mode suited to each target convolutional layer according to its data-volume ratio, the method reduces the access-bandwidth demand on the external memory.

Description

Convolutional neural network calculation method, device and computer readable storage medium
Technical Field
The invention belongs to the field of deep learning, and particularly relates to a convolutional neural network calculation method, a convolutional neural network calculation device and a computer readable storage medium.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
For a typical embedded platform, the chip itself lacks sufficient on-chip memory to hold the input and output feature maps (the intermediate results of the computation) and the very large weight parameters, so frequent data transfers between the chip and an external memory (typically DRAM) are unavoidable.
This large-scale data exchange between the chip's internal and external memories wastes significant power during the convolution calculation.
Disclosure of Invention
The prior art suffers from overly frequent data exchange between the chip's internal memory and the external memory. To address this problem, embodiments of the invention provide a convolutional neural network calculation method and device, and a computer readable storage medium.
The following schemes are provided in the embodiments of the present invention.
In a first aspect, a calculation method for a convolutional neural network is provided, including: determining, for a target convolutional layer of the convolutional neural network, the data-volume ratio of the input image data to the weight data; and determining a preset processing mode for the target convolutional layer according to that ratio, so that the computing platform performs the convolution calculation of the target convolutional layer in the preset processing mode.
In one possible embodiment, if the data-volume ratio is greater than a first threshold, the preset processing mode of the target convolutional layer includes: reading the weight data and a first block of the input image data from the external memory, and performing a convolution calculation based on the weight data and the first block; then reading a second block of the input image data from the external memory, and performing a convolution calculation based on the weight data and the second block.
In one possible embodiment, the method further comprises: after the convolution calculation based on the weight data and the first block, obtaining the intermediate result of the target convolutional layer corresponding to the first block and storing it in the internal memory of the computing platform; the intermediate result is then read from the internal memory and fed directly into the convolution calculation of the layer following the target convolutional layer.
In one possible embodiment, the method further comprises: if the weight data is smaller than a preset threshold, caching the weight data in the internal memory of the computing platform after reading it from the external memory, until the convolution calculation of the target convolutional layer finishes; and if the weight data is larger than the preset threshold, repeatedly reading the weight data from the external memory.
In one possible embodiment, if the data-volume ratio is smaller than a second threshold, the second threshold being smaller than the first threshold, the preset processing mode of the target convolutional layer includes: reading a first portion of the weight data from the external memory and sequentially reading each block of the input image data for that portion, performing convolution calculations in turn on the first portion of the weight data and each block; after the convolution calculations involving the first portion are complete, reading a second portion of the weight data from the external memory and again sequentially reading each block of the input image data, performing convolution calculations in turn on the second portion and each block.
In one possible embodiment, the weight data has four dimensions, and it is sliced along its output-channel dimension to obtain the first portion and at least one second portion of the weight data.
In one possible implementation, the preset processing mode of the target convolutional layer further includes: reading at least a portion of the weight data from the external memory and caching it in the internal memory of the computing platform; sequentially reading each block of the input image data of consecutive frames from the external memory; and multiplexing the cached portion of the weight data, performing convolution calculations in turn with each block of each of the consecutive frames.
In one possible embodiment, the number of consecutive frames is determined according to the storage space of the external memory.
In a second aspect, a computing device for a convolutional neural network is provided, including: a ratio determining module, configured to determine, for a target convolutional layer of the convolutional neural network, the data-volume ratio of the input image data to the weight data; and a mode determining module, configured to determine a preset processing mode for the target convolutional layer according to that ratio, so that the computing platform performs the convolution calculation of the target convolutional layer in the preset processing mode.
In one possible embodiment, if the data-volume ratio is greater than a first threshold, the preset processing mode of the target convolutional layer includes: reading the weight data and a first block of the input image data from the external memory, and performing a convolution calculation based on the weight data and the first block; then reading a second block of the input image data from the external memory, and performing a convolution calculation based on the weight data and the second block.
In one possible implementation, the mode determining module is further configured to: after the convolution calculation based on the weight data and the first block, obtain the intermediate result of the target convolutional layer corresponding to the first block and store it in the internal memory of the computing platform; the intermediate result is then read from the internal memory and fed directly into the convolution calculation of the layer following the target convolutional layer.
In one possible implementation, the mode determining module is further configured to: if the weight data is smaller than a preset threshold, cache the weight data in the internal memory of the computing platform after reading it from the external memory, until the convolution calculation of the target convolutional layer finishes; and if the weight data is larger than the preset threshold, repeatedly read the weight data from the external memory.
In one possible embodiment, if the data-volume ratio is smaller than a second threshold, the second threshold being smaller than the first threshold, the preset processing mode of the target convolutional layer includes: reading a first portion of the weight data from the external memory and sequentially reading each block of the input image data for that portion, performing convolution calculations in turn on the first portion of the weight data and each block; after the convolution calculations involving the first portion are complete, reading a second portion of the weight data from the external memory and again sequentially reading each block of the input image data, performing convolution calculations in turn on the second portion and each block.
In one possible embodiment, the weight data has four dimensions, and the apparatus slices it along its output-channel dimension to obtain the first portion and at least one second portion of the weight data.
In one possible implementation, the preset processing mode of the target convolutional layer further includes: reading at least a portion of the weight data from the external memory and caching it in the internal memory of the computing platform; sequentially reading each block of the input image data of consecutive frames from the external memory; and multiplexing the cached portion of the weight data, performing convolution calculations in turn with each block of each of the consecutive frames.
In one possible embodiment, the number of consecutive frames is determined according to the storage space of the external memory.
In a third aspect, a computing device for a neural network is provided, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable it to perform: determining, for a target convolutional layer of the convolutional neural network, the data-volume ratio of the input image data to the weight data; and determining a preset processing mode for the target convolutional layer according to that ratio, so that the computing platform performs the convolution calculation of the target convolutional layer in the preset processing mode.
In a fourth aspect, there is provided a computer readable storage medium storing a program which, when executed by a multi-core processor, causes the multi-core processor to perform a method as in the first aspect.
The technical schemes adopted by the embodiments of the application achieve at least the following beneficial effect: by calculating the data-volume ratio of the input image data to the weight data of the target convolutional layer and then selecting a preset processing mode suited to that layer according to the ratio, the access-bandwidth demand on the external memory can be reduced.
It should be understood that the foregoing description is only an overview of the technical solutions of the present invention, so that the technical means of the present invention may be more clearly understood and implemented in accordance with the content of the specification. The following specific embodiments of the present invention are described in order to make the above and other objects, features and advantages of the present invention more comprehensible.
Drawings
The advantages and benefits described herein, as well as other advantages and benefits, will become apparent to those of ordinary skill in the art upon reading the following detailed description of the exemplary embodiments. The drawings are only for purposes of illustrating exemplary embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic diagram of a computing device of a convolutional neural network;
FIG. 2 is a schematic diagram of a convolutional neural network;
FIG. 3 is a flow chart of a method of computing a convolutional neural network according to an embodiment of the invention;
FIG. 4 is a schematic diagram illustrating the splitting of input image data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a preset processing mode according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another preset processing mode according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a computing device of a convolutional neural network according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a computing device of a convolutional neural network according to another embodiment of the present invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the present invention, it should be understood that terms such as "comprises" or "comprising," etc., are intended to indicate the presence of features, numbers, steps, acts, components, portions, or combinations thereof disclosed in the specification, and are not intended to exclude the possibility of the presence of one or more other features, numbers, steps, acts, components, portions, or combinations thereof.
In addition, it should be noted that, without conflict, the embodiments of the present invention and the features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
Fig. 1 is a schematic diagram of a computing device for a convolutional neural network. As shown in fig. 1, the computing unit 11 on the computing platform 10 performs the neural-network computation, while the internal memory 12 buffers the data and intermediate results the computation requires. The internal memory 12 is typically SRAM, which is expensive, so a large capacity is rarely used; otherwise the chip would cost too much. The external memory 13 is typically DRAM, DDR, NAND flash, or the like, which is comparatively cheap but has limited access bandwidth and high access power consumption. Because of these characteristics, a chip design generally pairs a small internal memory with a large external memory and refreshes data through internal-external exchanges during computation, so the volume of that exchange strongly affects the power consumption of the system.
Fig. 2 is a schematic diagram of a convolutional neural network. As shown in fig. 2, convolutional neural network 200 includes a plurality of convolutional layers. When the convolutional neural network shown in fig. 2 is implemented on the computing device shown in fig. 1, a layer-by-layer scheme is generally adopted: the convolution calculation of the Nth layer is performed first and its result is written out to the external memory; that result is then read back in to perform the convolution calculation of the (N+1)th layer.
Fig. 3 shows a flow diagram of a method 300 of computing a convolutional neural network, in accordance with an embodiment of the present invention. As shown in fig. 3, the method 300 may include:
step 301: determining a data amount ratio of input image data and weight data for a target convolutional layer of the convolutional neural network;
step 302: and determining a preset processing mode of the target convolution layer according to the data volume ratio, so that the calculation platform carries out convolution calculation of the target convolution layer based on the preset processing mode of the target convolution layer.
Specifically, the target convolutional layer may be any one of the convolutional layers; this embodiment places no particular limit on the choice. The input image data is the input feature map of the target convolutional layer, and the weight data is the layer's convolution-kernel data. Different preset processing modes can be prepared in advance for the case where the input image data dominates and the case where the weight data dominates; for example, when the input image data is large, the layer's computation can be split into block-wise convolution operations. By first computing the data-volume ratio of the input image data to the weight data of the target convolutional layer, a preset processing mode suited to that layer can be chosen from the ratio: when the ratio is large, a mode suited to large-scale input image data is selected; when the ratio is small, a mode suited to large-scale weight data is selected. The access-bandwidth demand on the external memory is thereby reduced.
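This mode-selection step can be sketched as follows. The function name, default thresholds, and mode labels are illustrative assumptions, not taken from the patent text; the threshold values would in practice be derived from the platform's bandwidth and internal-memory budget, as described later.

```python
def select_processing_mode(input_kb, weight_kb,
                           first_threshold=1.0, second_threshold=0.8):
    """Choose a preset processing mode from the input/weight data-volume ratio.

    Names and thresholds are hypothetical, for exposition only.
    """
    ratio = input_kb / weight_kb
    if ratio > first_threshold:
        return "tile_input"      # large input: split the feature map into blocks
    if ratio < second_threshold:
        return "split_weights"   # large weights: split along output channels
    return "default"

# Layer 1 of the VGG table below: input 147 KB, weights 1.69 KB
print(select_processing_mode(147.0, 1.69))    # tile_input
# Layer 13 of the VGG table below: input 98 KB, weights 2304 KB
print(select_processing_mode(98.0, 2304.0))   # split_weights
```

The same decision would be made once per layer, offline, before the network is deployed to the computing platform.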
For example, taking the common VGG model, Table 1 lists the data sizes of the weight data, input image data, and output image data for each convolutional layer.
Table 1:
Convolution layer index Weight data (KB) Input image data (KB) Output image data (KB)
1 1.69 147.00 3136.00
2 36.00 3136.00 3136.00
3 72.00 784.00 1568.00
4 144.00 1568.00 1568.00
5 288.00 392.00 784.00
6 576.00 784.00 784.00
7 576.00 784.00 784.00
8 1152.00 196.00 392.00
9 2304.00 392.00 392.00
10 2304.00 392.00 392.00
11 2304.00 98.00 98.00
12 2304.00 98.00 98.00
13 2304.00 98.00 98.00
Total 14365.69 8869.00 13230.00
As can be seen from Table 1, the input and output image data shrink layer by layer as the convolution proceeds, while the weight data grows: the convolution process in effect gradually reduces the feature-map resolution while increasing the number of channels. Therefore, for the early layers (layers 1 to 7 in Table 1), where the input image data is larger than the weight data, the embodiment of the invention partitions the input image data into blocks, reads the blocks one at a time, and performs as much of the computation involving each block as possible before moving on, thereby avoiding repeated reads of the large-scale input image data. Conversely, for the later layers (layers 8 to 13), where the input image data is smaller than the weight data, the weight data is split into several portions that are read one at a time, and as much of the computation involving each portion as possible is performed before moving on, thereby avoiding repeated reads of the large-scale weight data.
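The figures in Table 1 follow directly from the layer shapes. The sketch below assumes 3×3 kernels, stride 1 with 'same' padding, and one byte per element, which reproduces the tabulated values:

```python
def conv_layer_sizes_kb(h, w, c_in, c_out, k=3, bytes_per_elem=1):
    """KB footprint of weights, input map, and output map for one conv layer.

    Assumes 'same' padding and stride 1, so the output keeps the spatial size.
    """
    weights = k * k * c_in * c_out * bytes_per_elem / 1024
    inp = h * w * c_in * bytes_per_elem / 1024
    out = h * w * c_out * bytes_per_elem / 1024
    return weights, inp, out

print(conv_layer_sizes_kb(224, 224, 3, 64))    # layer 1: (1.6875, 147.0, 3136.0)
print(conv_layer_sizes_kb(224, 224, 64, 64))   # layer 2: (36.0, 3136.0, 3136.0)
```

Rounding 1.6875 to 1.69 matches the first row of Table 1 exactly.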
Based on the calculation method of the convolutional neural network in fig. 3, some embodiments of the present application also provide some specific implementations of the method, and the extension schemes, which are described below.
In some possible embodiments, if the data-volume ratio is greater than the first threshold, the preset processing mode of the target convolutional layer includes: reading the weight data and a first block of the input image data from the external memory, and performing a convolution calculation based on the weight data and the first block; then reading a second block of the input image data from the external memory, and performing a convolution calculation based on the weight data and the second block.
For example, assume the data-volume ratio of the input feature map to the weight data is greater than the first threshold: if the first threshold is set to 1, the input feature map of the target convolutional layer is 224×224×64, and the weight data is 12×12×64×128, then the ratio of input feature map to weight data exceeds 1. As shown in fig. 4 and fig. 5, the input feature map may be partitioned, for example into a first block of size 72×72×64, a second block, …, an Nth block, and so on. Based on this, all the weight data may be read from the external memory together with the first block of the input feature map, and a convolution calculation performed using the first block and the weight data to obtain the output result for the first block; the second block of the input feature map is then read from the external memory and the above steps repeated, and the block outputs are finally combined into the output feature map of the target convolutional layer.
Alternatively, the size of each of the above blocks may be determined by the bandwidth information between the computing platform and the external memory, the memory space of the internal memory in the computing platform, and other resources.
Alternatively, the first threshold may be determined by bandwidth information between the computing platform and the external memory, storage space of the internal memory, and so on.
It should be understood that, by partitioning the input feature map when the input image data is large and reading and convolving only one block at a time, this scheduling scheme requires only a small internal memory and never re-reads the large-scale input data, so the access-bandwidth demand on the external memory is reduced.
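A minimal numpy sketch of this block schedule. For simplicity it uses a 1×1 kernel, so blocks need no overlapping halo rows; a k×k kernel would require each block to carry k−1 extra boundary rows, but the scheduling idea is the same:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))      # input feature map (H, W, C_in), in "external memory"
w = rng.standard_normal((4, 6))         # 1x1 kernel as a (C_in, C_out) matrix

full = x @ w                            # reference: convolve the whole map at once

# Block schedule: keep the weights on chip and stream one block of rows at a time.
blocks = []
for r in range(0, 8, 4):                # two blocks of four rows each
    block = x[r:r+4]                    # "read the next block from external memory"
    blocks.append(block @ w)            # convolve it with the cached weights
tiled = np.concatenate(blocks, axis=0)  # combine the block outputs

print(np.allclose(full, tiled))         # True: tiling does not change the result
```

Only one block plus the weights need fit in internal memory at any moment, which is the source of the bandwidth saving.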
In some possible embodiments, after the convolution calculation based on the weight data and the first block, the intermediate result of the target convolutional layer corresponding to the first block is obtained and stored in the internal memory of the computing platform; the intermediate result is then read from the internal memory and fed directly into the convolution calculation of the layer following the target convolutional layer.
For example, the weight data of the Nth convolutional layer may be read from the external memory together with the first block of the Nth layer's input feature map, and the convolution calculation shown in fig. 5 performed on them to obtain the first block output of the Nth layer. That block output is cached in the internal memory of the computing platform, the weight data of the (N+1)th layer is read from the external memory, and the convolution calculation shown in fig. 5 is performed on the Nth layer's first block output and the (N+1)th layer's weight data to obtain the first block output of the (N+1)th layer. The second block of the Nth layer's input feature map is then read from the external memory and the steps above repeated, finally yielding the (N+1)th layer's output feature map directly.
It should be understood that using the Nth layer's block output as the next layer's block input effectively fuses several consecutive layers together and performs the block-wise convolution over the fused layers, which reduces the traffic of intermediate data and further lowers the access-bandwidth demand on the external memory.
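The fused schedule can be sketched in the same style, again with 1×1 kernels so blocks stay independent; the point is that the layer-N intermediate never round-trips through external memory:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8, 4))   # layer-N input feature map
w1 = rng.standard_normal((4, 6))     # layer-N weights (1x1 kernel)
w2 = rng.standard_normal((6, 3))     # layer-(N+1) weights (1x1 kernel)

# Layer-by-layer schedule: the layer-N output round-trips through "external memory".
reference = (x @ w1) @ w2

# Fused schedule: each block's layer-N output stays in "internal memory" and is
# fed straight into layer N+1, so the intermediate map never leaves the chip.
fused = np.concatenate(
    [(x[r:r+4] @ w1) @ w2 for r in range(0, 8, 4)], axis=0)

print(np.allclose(reference, fused))  # True
```

With real k×k kernels the fused blocks would need halo rows sized for the combined receptive field of the fused layers, which is the usual cost of this kind of layer fusion.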
In some possible embodiments, the method 300 further comprises: if the weight data is smaller than a preset threshold, caching the weight data in the internal memory of the computing platform after reading it from the external memory, until the convolution calculation of the target convolutional layer finishes; and if the weight data is larger than the preset threshold, repeatedly reading the weight data from the external memory.
Specifically, the preset threshold is determined by the size of the internal memory. As shown in fig. 4, the same weight data is used when convolving the first block, the second block, …, and the Nth block of the target convolutional layer. If the internal memory of the computing platform is large enough to cache the weight data, the weights can therefore be read once and reused across the convolution of all blocks, reducing reads and further lowering the access-bandwidth demand on the external memory. Conversely, if the internal memory cannot hold the weight data, the required weights must be re-read from the external memory for each block's convolution.
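A hypothetical accounting of the resulting external-memory traffic; the function name, internal-memory capacity, and block count below are illustrative assumptions, not values from the patent:

```python
def weight_traffic_kb(weight_kb, internal_mem_kb, num_blocks):
    """External-memory traffic (KB) spent on weights for one layer.

    Illustrative model: if the weights fit in internal memory they are read
    once and reused; otherwise they are re-read for every input block.
    """
    if weight_kb <= internal_mem_kb:
        return weight_kb
    return weight_kb * num_blocks

print(weight_traffic_kb(36.0, 128.0, 8))     # cached once: 36.0 KB
print(weight_traffic_kb(2304.0, 128.0, 8))   # re-read per block: 18432.0 KB
```

The two cases correspond to the branches of the preset-threshold rule above: below the threshold, weight traffic is constant in the number of blocks; above it, traffic scales linearly with the block count.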
In some possible embodiments, if the data-volume ratio is smaller than the second threshold, the preset processing mode of the target convolutional layer includes: reading a first portion of the weight data from the external memory and sequentially reading each block of the input image data for that portion, performing convolution calculations in turn on the first portion of the weight data and each block; after the convolution calculations involving the first portion are complete, reading a second portion of the weight data from the external memory and again sequentially reading each block of the input image data, performing convolution calculations in turn on the second portion and each block.
Alternatively, the second threshold may be determined by bandwidth information between the computing platform and the external memory, storage space of the internal memory, and so on.
For example, it is assumed that the data amount ratio of the input feature map and the weight data is smaller than the second threshold, for example, if the second threshold is set to 0.8, the size of the input feature map of the target convolutional layer is 224×224×64, the size of the weight data is 12×12×64×1028, and the data amount ratio of the input feature map and the weight data is greater than 0.8. As shown in fig. 6, the weight data of the target convolutional layer may be split into several parts, such as a first part and a second part thereof. Based on this, a first portion of the weight data may be first read from the external memory, and then a first segment of the input feature map may be read from the external memory, and a convolution operation may be performed using the first segment and the first portion of the weight data to obtain a first partial output for the first segment, where the first partial output may be output data corresponding to a partial output channel, such as the first partial output in fig. 6 corresponds to the output channel 1 and the output channel 2. And then, reading a second block of the input feature map from an external memory, performing convolution operation by using the second block and the first part of the weight data to obtain first part output aiming at the second block, and so on, so as to calculate and obtain first part output aiming at each block in the input feature map, and further, combining and obtaining first part output aiming at the input feature map, namely output results corresponding to the output channel 1 and the output channel 2. After the calculation is completed for all of the first portion of the weight data, the second portion of the weight data is read from the external memory, and the above operation is repeatedly performed, resulting in output for the second portion of the input feature map, that is, output results corresponding to the output channels 3 and 4. 
After the above steps have been performed once for each portion of the weight data, the partial outputs for the input feature map are combined to obtain the output feature map of the target convolutional layer.
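The read-compute loop just described can be sketched as follows: one weight portion at a time is held resident while every input block is streamed past it, and the resulting partial outputs are merged along the output-channel axis. This is a minimal NumPy sketch under simplifying assumptions: a 1×1 convolution stands in for the real kernel, blocks are row-wise tiles of the feature map, and the names `conv_with_weight_split` and `pointwise_conv` are illustrative, not taken from the patent.

```python
import numpy as np

def pointwise_conv(block, w):
    # block: (h, w, c_in); w: (1, 1, c_in, c_out) -> (h, w, c_out)
    return np.tensordot(block, w[0, 0], axes=([2], [0]))

def conv_with_weight_split(blocks, weight, n_parts):
    """Convolve every input block against one weight portion at a time.

    Only one weight portion is "resident" per outer iteration, mirroring
    the read-one-portion-then-stream-all-blocks order described above.
    """
    # Split along the output-channel dimension, as in fig. 6.
    parts = np.array_split(weight, n_parts, axis=3)
    partial_outputs = []
    for w_part in parts:                                     # read one weight portion
        outs = [pointwise_conv(b, w_part) for b in blocks]   # stream every input block
        partial_outputs.append(np.concatenate(outs, axis=0)) # merge blocks spatially
    # Merge the partial outputs along the output-channel axis.
    return np.concatenate(partial_outputs, axis=-1)
```

Because splitting on output channels only reorders the computation, the two-portion result matches a direct convolution of the whole feature map.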
Optionally, the size of each portion of the weight data may be determined from the bandwidth between the computing platform and the external memory, the storage space of the internal memory in the computing platform, and similar resources.
In this embodiment, when the weight data is large, only a portion of it is read at a time, and that portion is replaced with the next only after all computation involving it has finished. This avoids repeatedly reading large-scale weight data and further reduces the access bandwidth required of the external memory.
In some possible embodiments, the weight data includes four dimensions, and the method further comprises: splitting the weight data along its output-channel dimension to determine the first portion and at least one second portion of the weight data.
Specifically, the four dimensions of the weight data are width, height, number of input channels, and number of output channels. As shown in fig. 6, this embodiment splits along the output-channel dimension to obtain the first portion and at least one second portion of the weight data.
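For instance, the split along the output-channel dimension can be expressed directly with NumPy. The tensor sizes are taken from the earlier 12×12×64×1028 example; the use of `np.array_split` and the two-way split are illustrative choices, not prescribed by the patent.

```python
import numpy as np

# Weight tensor in the four dimensions named above:
# (height, width, input channels, output channels).
w = np.zeros((12, 12, 64, 1028))

# Split along the output-channel dimension (axis 3) into two portions.
first, second = np.array_split(w, 2, axis=3)

# Each portion keeps all spatial and input-channel extents,
# and together the portions cover all 1028 output channels.
assert first.shape == (12, 12, 64, 514)
assert second.shape == (12, 12, 64, 514)
```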
In some possible embodiments, the preset processing modes of the target convolutional layer further include: reading at least a portion of the weight data from the external memory and caching it in an internal memory of the computing platform; sequentially reading each block of multiple consecutive frames of input image data from the external memory; and multiplexing the cached portion of the weight data by sequentially performing convolution calculations with each block of the consecutive frames of input image data.
For example, as shown in fig. 6, the first portion of the weight data may first be read from the external memory and cached in the internal memory of the computing platform; the first block, second block, …, Nth block of the first frame's input feature map are then read from the external memory and convolved one by one to obtain the first partial output for the first frame. Next, the first block through Nth block of the second frame's input feature map are read from the external memory and convolved one by one to obtain the first partial output for the second frame. The second portion of the weight data is then read from the external memory and the above steps are repeated, yielding the output feature maps for both the first frame and the second frame.
It should be understood that this embodiment necessarily introduces some latency, so the number of consecutive frames can be chosen according to the latency that can be tolerated. By multiplexing the weight data across multiple frames of input image data, this embodiment further reduces the access bandwidth required of the external memory.
In some possible embodiments, the number of consecutive frames is determined according to the storage space of the external memory.
Specifically, because the storage space of the external memory is limited, the buffered consecutive frames of input image data cannot exceed that storage space.
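The weight-multiplexing order described above — fetch one weight portion, then sweep it across every block of every buffered frame before fetching the next portion — can be sketched as follows. The function name and the abstract `conv` callback are illustrative placeholders; any convolution routine could be plugged in.

```python
def multiframe_convolution(frame_blocks, weight_parts, conv):
    """Reuse each cached weight portion across all blocks of several frames.

    frame_blocks: one list of input blocks per buffered frame
    weight_parts: weight portions, fetched one at a time from external memory
    conv: convolution routine applied to (block, weight_part)
    """
    # outputs[f] collects, per weight portion, the partial output for frame f.
    outputs = [[] for _ in frame_blocks]
    for w_part in weight_parts:            # fetch a portion once, keep it cached
        for f, blocks in enumerate(frame_blocks):
            # Sweep the cached portion across every block of this frame.
            outputs[f].append([conv(b, w_part) for b in blocks])
    return outputs
```

Each weight portion crosses the external-memory boundary once, regardless of how many frames are buffered, which is where the bandwidth saving comes from.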
Based on the aspects of the above embodiments, a preset processing mode suited to the target convolutional layer can be selected according to the data amount ratio of the input image data to the weight data, thereby reducing the access bandwidth required of the external memory.
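The mode selection itself reduces to comparing the input/weight data-volume ratio against the two thresholds. The sketch below is a hypothetical dispatcher: the threshold values and mode names are illustrative placeholders, since the text leaves the thresholds to be derived from bandwidth and internal-memory budgets.

```python
import math

def select_processing_mode(input_shape, weight_shape, t1=2.0, t2=0.8):
    """Pick a preset processing mode from the data-volume ratio (t2 < t1)."""
    input_volume = math.prod(input_shape)    # e.g. H x W x C_in
    weight_volume = math.prod(weight_shape)  # e.g. kh x kw x C_in x C_out
    ratio = input_volume / weight_volume
    if ratio > t1:
        # Input dominates: keep the full weights resident, stream input blocks.
        return "block-input"
    if ratio < t2:
        # Weights dominate: stream weight portions, re-reading input blocks.
        return "split-weights"
    return "default"
```

With the 224×224×64 feature map and 12×12×64×1028 weights from the earlier example, the ratio is about 0.34, so the weight-splitting mode is selected.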
Based on the same or a similar technical concept, as shown in fig. 7, an embodiment of the present invention further provides a computing device 700 for a convolutional neural network, configured to execute the computing method for a convolutional neural network shown in fig. 3. The device 700 includes:
A ratio determining module 701, configured to determine, for a target convolutional layer of the convolutional neural network, a data amount ratio of input image data to weight data;
A mode determining module 702, configured to determine a preset processing mode of the target convolutional layer according to the data amount ratio, so that the computing platform performs the convolution calculation of the target convolutional layer based on the preset processing mode of the target convolutional layer.
In one possible implementation, if the data amount ratio is greater than the first threshold, the preset processing mode of the target convolutional layer includes: reading the weight data and a first block of the input image data from the external memory and performing a convolution calculation based on the weight data and the first block; thereafter, reading a second block of the input image data from the external memory and performing a convolution calculation based on the weight data and the second block.
In one possible implementation, the mode determining module is further configured to: after the convolution calculation based on the weight data and the first block, obtain the intermediate result of the target convolutional layer corresponding to the first block and store it in an internal memory of the computing platform; and read the intermediate result from the internal memory so that it directly participates in the convolution calculation of the layer following the target convolutional layer.
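The intermediate-result reuse described here amounts to fusing the target layer with the next layer at block granularity, so the per-block intermediate never leaves internal memory. A minimal sketch, with `conv` an abstract convolution callback and the function name purely illustrative:

```python
def fused_two_layer_block(block, w1, w2, conv):
    """Run one input block through two convolution layers back to back.

    The intermediate result stays in (simulated) internal memory rather
    than being written out to and re-read from external memory.
    """
    intermediate = conv(block, w1)   # target-layer output for this block
    return conv(intermediate, w2)    # next layer consumes it directly
```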
In one possible implementation, the mode determining module is further configured to: if the weight data is smaller than a preset threshold, cache the weight data in the internal memory of the computing platform after reading it from the external memory, until the convolution calculation of the target convolutional layer finishes; and if the weight data is larger than the preset threshold, repeat the operation of reading the weight data from the external memory.
In one possible implementation, if the data amount ratio is smaller than a second threshold (the second threshold being smaller than the first threshold), the preset processing mode of the target convolutional layer includes: reading a first portion of the weight data from the external memory and, for that portion, sequentially reading each block of the input image data from the external memory, so that convolution calculations are performed in turn based on the first portion of the weight data and each block; and, after the convolution calculations involving the first portion of the weight data are complete, reading a second portion of the weight data from the external memory and, for that portion, again sequentially reading each block of the input image data from the external memory, so that convolution calculations are performed in turn based on the second portion of the weight data and each block.
In one possible implementation, the weight data includes four dimensions, and the apparatus is further configured to split the weight data along its output-channel dimension to determine the first portion and at least one second portion of the weight data.
In one possible implementation, the preset processing mode of the target convolutional layer further includes: reading at least a portion of the weight data from the external memory and caching it in an internal memory of the computing platform; sequentially reading each block of multiple consecutive frames of input image data from the external memory; and multiplexing the cached portion of the weight data by sequentially performing convolution calculations with each block of the consecutive frames of input image data.
In one possible implementation, the number of consecutive frames is determined according to the storage space of the external memory.
Based on the aspects of the above embodiments, a preset processing mode suited to the target convolutional layer can be selected according to the data amount ratio of the input image data to the weight data, thereby reducing the access bandwidth required of the external memory.
Fig. 8 is a schematic diagram of a computing device 800 for a convolutional neural network according to an embodiment of the present application, configured to perform the computing method for a convolutional neural network shown in fig. 3. The device comprises: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform:
step 301: for a target convolutional layer of the convolutional neural network, determining a data amount ratio of input image data to weight data;
step 302: determining a preset processing mode of the target convolutional layer according to the data amount ratio, so that the computing platform performs the convolution calculation of the target convolutional layer based on the preset processing mode of the target convolutional layer.
An embodiment of the present application also provides a computer-readable storage medium storing a program that, when executed by a multi-core processor, causes the multi-core processor to perform:
step 301: for a target convolutional layer of the convolutional neural network, determining a data amount ratio of input image data to weight data;
step 302: determining a preset processing mode of the target convolutional layer according to the data amount ratio, so that the computing platform performs the convolution calculation of the target convolutional layer based on the preset processing mode of the target convolutional layer.
The embodiments of the present application are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device, and computer-readable storage medium embodiments are described briefly because they are substantially similar to the method embodiments; for the relevant points, refer to the description of the method embodiments.
The apparatus, device, and computer-readable storage medium provided in the embodiments of the present application correspond one-to-one with the methods, and therefore offer beneficial technical effects similar to those of the corresponding methods. Since the beneficial technical effects of the methods have been described in detail above, they are not repeated here.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Furthermore, although the operations of the methods of the present invention are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
While the spirit and principles of the present invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the embodiments disclosed, nor does the division into aspects imply that features in those aspects cannot be usefully combined; such division is adopted merely for convenience of presentation. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (18)

1. A method of computing a convolutional neural network, the method comprising:
Determining a data amount ratio of input image data and weight data for a target convolutional layer of the convolutional neural network;
determining a preset processing mode of the target convolutional layer according to the data amount ratio, so that a computing platform carries out the convolution calculation of the target convolutional layer based on the preset processing mode of the target convolutional layer;
When the data amount ratio is greater than a first threshold, the preset processing mode comprises: reading the weight data from an external memory and successively reading each block of the input image data, and performing, for each block read, a convolution calculation involving that block based on the weight data; and
When the data amount ratio is smaller than a second threshold, the second threshold being smaller than the first threshold, the preset processing mode comprises: successively reading each portion of the weight data from an external memory and, for each portion read, sequentially reading each block of the input image data from the external memory and performing the convolution calculations involving that portion of the weight data with each block.
2. The method of claim 1, wherein, if the data amount ratio is greater than the first threshold, determining the preset processing mode of the target convolutional layer specifically comprises:
Reading the weight data and a first block of the input image data from the external memory, thereby performing a convolution calculation based on the weight data and the first block; and thereafter,
reading a second block of the input image data from the external memory, thereby performing a convolution calculation based on the weight data and the second block.
3. The method as recited in claim 2, further comprising:
after carrying out convolution calculation based on the weight data and the first block, obtaining an intermediate result of the target convolution layer corresponding to the first block, and storing the intermediate result in an internal memory of the computing platform;
The intermediate results are read from internal memory and directly participate in the convolution calculations of the next layer of the target convolution layer.
4. The method as recited in claim 2, further comprising:
If the weight data is smaller than a preset threshold value, after the weight data is read from the external memory, the weight data is cached in the internal memory of the computing platform until the convolution computation of the target convolution layer is finished;
And if the weight data is larger than the preset threshold value, repeating the operation of reading the weight data from the external memory.
5. The method of claim 2, wherein, if the data amount ratio is smaller than a second threshold, the second threshold being smaller than the first threshold, determining the preset processing mode of the target convolutional layer specifically comprises:
Reading a first portion of the weight data from the external memory, and sequentially reading each block of the input image data from the external memory for the first portion of the weight data, thereby sequentially performing a convolution calculation based on the first portion of the weight data and the each block;
After the convolution calculation related to the first part of the weight data is completed, reading the second part of the weight data from the external memory, and sequentially reading each block of the input image data from the external memory for the second part of the weight data, so as to sequentially perform the convolution calculation based on the second part of the weight data and each block.
6. The method as recited in claim 5, further comprising:
the weight data includes four dimensions, wherein a split is performed along the output-channel dimension of the weight data to determine the first portion and at least one second portion of the weight data.
7. The method of claim 1, wherein the preset processing mode of the target convolutional layer further comprises:
reading at least a portion of the weight data from an external memory and caching in an internal memory of the computing platform;
sequentially reading each block of multiple consecutive frames of input image data from the external memory;
multiplexing the at least a portion of the weight data stored in the internal memory, and sequentially performing convolution calculations with each block of the consecutive frames of input image data.
8. The method of claim 7, wherein the number of consecutive frames is determined based on the storage space of the external memory.
9. A computing device of a convolutional neural network, the device comprising:
a ratio determining module, configured to determine, for a target convolutional layer of the convolutional neural network, a data amount ratio of input image data to weight data;
a mode determining module, configured to determine a preset processing mode of the target convolutional layer according to the data amount ratio, so that a computing platform carries out the convolution calculation of the target convolutional layer based on the preset processing mode of the target convolutional layer;
When the data amount ratio is greater than a first threshold, the preset processing mode comprises: reading the weight data from an external memory and successively reading each block of the input image data, and performing, for each block read, a convolution calculation involving that block based on the weight data; and
When the data amount ratio is smaller than a second threshold, the second threshold being smaller than the first threshold, the preset processing mode comprises: successively reading each portion of the weight data from an external memory and, for each portion read, sequentially reading each block of the input image data from the external memory and performing the convolution calculations involving that portion of the weight data with each block.
10. The apparatus of claim 9, wherein, if the data amount ratio is greater than the first threshold, determining the preset processing mode of the target convolutional layer specifically comprises:
Reading the weight data and a first block of the input image data from the external memory, thereby performing a convolution calculation based on the weight data and the first block; and thereafter,
reading a second block of the input image data from the external memory, thereby performing a convolution calculation based on the weight data and the second block.
11. The apparatus of claim 10, wherein the mode determination module is further configured to:
after carrying out convolution calculation based on the weight data and the first block, obtaining an intermediate result of the target convolution layer corresponding to the first block, and storing the intermediate result in an internal memory of the computing platform;
The intermediate results are read from internal memory and directly participate in the convolution calculations of the next layer of the target convolution layer.
12. The apparatus of claim 10, wherein the mode determination module is further configured to:
If the weight data is smaller than a preset threshold value, after the weight data is read from the external memory, the weight data is cached in the internal memory of the computing platform until the convolution computation of the target convolution layer is finished;
And if the weight data is larger than the preset threshold value, repeating the operation of reading the weight data from the external memory.
13. The apparatus of claim 10, wherein, if the data amount ratio is smaller than a second threshold, the second threshold being smaller than the first threshold, determining the preset processing mode of the target convolutional layer comprises:
Reading a first portion of the weight data from the external memory, and sequentially reading each block of the input image data from the external memory for the first portion of the weight data, thereby sequentially performing a convolution calculation based on the first portion of the weight data and the each block;
After the convolution calculation related to the first part of the weight data is completed, reading the second part of the weight data from the external memory, and sequentially reading each block of the input image data from the external memory for the second part of the weight data, so as to sequentially perform the convolution calculation based on the second part of the weight data and each block.
14. The apparatus of claim 13, wherein the apparatus is further configured to:
the weight data includes four dimensions, wherein a split is performed along the output-channel dimension of the weight data to determine the first portion and at least one second portion of the weight data.
15. The apparatus of claim 9, wherein the preset processing mode of the target convolutional layer further comprises:
reading at least a portion of the weight data from an external memory and caching in an internal memory of the computing platform;
sequentially reading each block of multiple consecutive frames of input image data from the external memory;
multiplexing the at least a portion of the weight data stored in the internal memory, and sequentially performing convolution calculations with each block of the consecutive frames of input image data.
16. The apparatus of claim 15, wherein the number of consecutive frames is determined based on the storage space of the external memory.
17. A computing device of a neural network, comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform:
Determining a data amount ratio of input image data and weight data for a target convolutional layer of the convolutional neural network;
determining a preset processing mode of the target convolutional layer according to the data amount ratio, so that a computing platform carries out the convolution calculation of the target convolutional layer based on the preset processing mode of the target convolutional layer;
When the data amount ratio is greater than a first threshold, the preset processing mode comprises: reading the weight data from an external memory and successively reading each block of the input image data, and performing, for each block read, a convolution calculation involving that block based on the weight data; and
When the data amount ratio is smaller than a second threshold, the second threshold being smaller than the first threshold, the preset processing mode comprises: successively reading each portion of the weight data from an external memory and, for each portion read, sequentially reading each block of the input image data from the external memory and performing the convolution calculations involving that portion of the weight data with each block.
18. A computer readable storage medium storing a program which, when executed by a multi-core processor, causes the multi-core processor to perform the method of any of claims 1-8.
CN201911376391.6A 2019-12-27 2019-12-27 Convolutional neural network technique method, device and computer readable storage medium Active CN113052292B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911376391.6A CN113052292B (en) 2019-12-27 2019-12-27 Convolutional neural network technique method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113052292A CN113052292A (en) 2021-06-29
CN113052292B true CN113052292B (en) 2024-06-04


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024013892A1 (en) * 2022-07-13 2024-01-18 日本電信電話株式会社 Convolutional neural network inference processing device, convolutional neural network inference processing method, and convolutional neural network inference processing program

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239829A (en) * 2016-08-12 2017-10-10 北京深鉴科技有限公司 A kind of method of optimized artificial neural network
CN107437110A (en) * 2017-07-11 2017-12-05 中国科学院自动化研究所 The piecemeal convolution optimization method and device of convolutional neural networks
CN109791628A (en) * 2017-12-29 2019-05-21 清华大学 Neural network model splits' positions method, training method, computing device and system
CN109934339A (en) * 2019-03-06 2019-06-25 东南大学 A kind of general convolutional neural networks accelerator based on a dimension systolic array
CN110188660A (en) * 2019-05-27 2019-08-30 北京字节跳动网络技术有限公司 The method and apparatus at age for identification
CN110490310A (en) * 2018-05-14 2019-11-22 北京深鉴智能科技有限公司 Neural Network Data compression and its Related Computational Methods and device
CN110555847A (en) * 2019-07-31 2019-12-10 瀚博半导体(上海)有限公司 Image processing method and device based on convolutional neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018071546A1 (en) * 2016-10-11 2018-04-19 The Research Foundation For The State University Of New York System, method, and accelerator to process convolutional neural network layers


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design of a Convolutional Neural Network Coprocessor Based on a Programmable Logic Device; Yang Yichen, Liang Feng, Zhang Guohe, He Ping, Wu Bin, Gao Zhenting; Journal of Xi'an Jiaotong University (《西安交通大学学报》); Vol. 52, No. 7; pp. 153-159 *

Also Published As

Publication number Publication date
CN113052292A (en) 2021-06-29

Similar Documents

Publication Publication Date Title
US20190197083A1 (en) Method and electronic device for convolution calculation in neural network
CN109919311B (en) Method for generating instruction sequence, method and device for executing neural network operation
EP3985509A1 (en) Neural network segmentation method, prediction method, and related apparatus
CN111401518B (en) Neural network quantization method, device and computer readable storage medium
CN112668708B (en) Convolution operation device for improving data utilization rate
CN110413539B (en) Data processing method and device
CN109146065B (en) Convolution operation method and device for two-dimensional data
CN111008701A (en) Data quantization method and device based on neural network and computer readable storage medium
CN113032007A (en) Data processing method and device
CN113052292B (en) Convolutional neural network technique method, device and computer readable storage medium
CN112200310A (en) Intelligent processor, data processing method and storage medium
CN115034351A (en) Data processing method, convolutional neural network training method and device and FPGA
CN102567243A (en) Storage device and refreshing method for same
CN110390392B (en) Convolution parameter accelerating device based on FPGA and data reading and writing method
CN118012631B (en) Operator execution method, processing device, storage medium and program product
CN113888390A (en) Feature map processing method and device, electronic equipment and computer readable medium
CN111913744B (en) AI deep learning data processing method and system
CN112308762B (en) Data processing method and device
CN114090470B (en) Data preloading device and preloading method thereof, storage medium and computer equipment
CN114091085B (en) Data access control system for binary operation and method thereof
CN113962361B (en) Winograd-based CNN accelerator system data conflict-free scheduling method
CN116303108B (en) Weight address arrangement method suitable for parallel computing architecture
CN109344093B (en) Cache structure, and method and device for reading and writing data
US20240231661A9 (en) Data storage
CN102073604A (en) Method, device and system for controlling read and write of synchronous dynamic memory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240506

Address after: Room 101, 20th Floor, Building 1, Zone 1, No. 81 Beiqing Road, Haidian District, Beijing, 100094

Applicant after: Beijing Sisheng Technology Co.,Ltd.

Country or region after: China

Address before: Room 206, 2 / F, building C, phase I, Zhongguancun Software Park, No. 8, Dongbei Wangxi Road, Haidian District, Beijing 100094

Applicant before: Canaan Bright Sight Co.,Ltd.

Country or region before: China

GR01 Patent grant