WO2021031351A1 - A data processing system, method and medium - Google Patents

A data processing system, method and medium

Info

Publication number
WO2021031351A1
WO2021031351A1 (PCT/CN2019/114538, priority CN 201910760143.5)
Authority
WO
WIPO (PCT)
Prior art keywords
data
convolution
arithmetic unit
convolution operation
register set
Prior art date
Application number
PCT/CN2019/114538
Other languages
English (en)
French (fr)
Inventor
董刚
赵雅倩
李仁刚
杨宏斌
刘海威
Original Assignee
浪潮电子信息产业股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浪潮电子信息产业股份有限公司 (Inspur Electronic Information Industry Co., Ltd.)
Publication of WO2021031351A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/061 - Physical realisation using biological neurons, e.g. biological neurons connected to an integrated circuit

Definitions

  • This application relates to the field of data processing technology, and in particular to a data processing system, method, and medium.
  • At present, research on deep learning focuses mainly on the CNN (Convolutional Neural Network).
  • The input layer feeds the data to be tested into the CNN.
  • In a convolutional layer, each neuron is connected to a local region of the input layer and computes the inner product of that small region with its own weights; this is the layer in which the convolution kernel operates.
  • The convolutional layer dominates the CNN structure, and its computing speed and efficiency directly determine the network's performance. How to perform large batches of convolution calculations efficiently and quickly, especially two- and three-dimensional convolutions, is therefore a very important part of the CNN architecture.
  • In the prior art, the data processing of the convolutional layer basically follows a store-first (or cache-first), compute-later structure, so the time overhead of calculation is large and the convolution is correspondingly slow.
  • The purpose of the present application is to provide a data processing system, method, and medium that allow data storage and calculation to be performed simultaneously, thereby increasing the speed of the convolution operation.
  • The specific scheme is as follows:
  • In a first aspect, this application discloses a data processing system including a convolution operation module. The convolution operation module includes a first register set, a second register set, and an arithmetic unit connected to the first register set and the second register set; wherein,
  • the first register set is used to obtain the current round of data to be convolved from the data to be processed and transmit it to the arithmetic unit, and, while the arithmetic unit performs the current round of convolution, to obtain the next round of data to be convolved from the data to be processed and transmit it to the arithmetic unit if the data to be processed has not yet been fully processed;
  • the second register set is used to obtain convolution kernel data and transmit the currently obtained kernel data to the arithmetic unit;
  • the arithmetic unit is used to obtain each round of data to be convolved transmitted by the first register set and the kernel data transmitted by the second register set, and to convolve each round of data with the kernel data to obtain the corresponding convolution results.
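The interplay of the two register sets and the arithmetic unit described above can be sketched in software. The following is a minimal, hypothetical Python model (names such as `RegisterSet` and `run_pipeline` are illustrative, not from the patent): while the arithmetic unit works on round k, the first register set already loads round k+1, so loading is hidden behind computation.

```python
class RegisterSet:
    """Toy stand-in for a hardware register set: holds one round of values."""
    def __init__(self):
        self.data = None

    def load(self, values):
        self.data = values


def convolve(window, kernel):
    # Inner product of one data window with the kernel weights.
    return sum(d * k for d, k in zip(window, kernel))


def run_pipeline(rounds, kernel):
    """Process each round; the next round is loaded during the current compute."""
    if not rounds:
        return []
    data_regs, kernel_regs = RegisterSet(), RegisterSet()
    kernel_regs.load(kernel)
    data_regs.load(rounds[0])              # prime the pipeline with round 0
    results = []
    for i in range(len(rounds)):
        current = data_regs.data
        if i + 1 < len(rounds):
            data_regs.load(rounds[i + 1])  # load round i+1 *while* computing round i
        results.append(convolve(current, kernel_regs.data))
    return results
```

In real hardware the load and the multiply-accumulate occur in the same clock cycles; this sequential model mirrors only the ordering, not the concurrency.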
  • Optionally, the system further includes:
  • a control switch used to control the flow of the data to be convolved and of the convolution kernel data within the convolution operation module.
  • Optionally, the control switch controls whether the data to be convolved flows between registers of the first register set or from the first register set to the arithmetic unit.
  • Optionally, the control switch controls whether the convolution kernel data flows between registers of the second register set or from the second register set to the arithmetic unit.
  • Optionally, the system further includes:
  • a parameter configuration module for configuring parameters, where the parameters include the convolution kernel size, the convolution kernel weights, and the switch states of the control switches;
  • a structure adjustment module configured to adjust the structure of the convolution operation module according to the parameters, so as to perform convolutions of different sizes between different convolutional layers.
  • Optionally, the data processing system includes a plurality of the convolution operation modules for processing different data to be convolved from the data to be processed in parallel.
  • In a second aspect, this application discloses a data processing method applied to a data processing system, where the system includes a convolution operation module comprising a first register set, a second register set, and an arithmetic unit connected to the first register set and the second register set. The data processing method includes: obtaining, through the first register set, the current round of data to be convolved from the data to be processed and transmitting it to the arithmetic unit, and, while the arithmetic unit performs the current round of convolution, obtaining the next round of data to be convolved and transmitting it to the arithmetic unit if the data to be processed has not yet been fully processed; obtaining convolution kernel data through the second register set and transmitting it to the arithmetic unit; and, through the arithmetic unit, convolving each round of received data with the kernel data to obtain the corresponding convolution results.
  • In a third aspect, the present application discloses a computer-readable storage medium for storing a computer program which, when executed by a processor, implements the aforementioned data processing method.
  • It can be seen that the data processing system disclosed in this application includes a convolution operation module with a first register set, a second register set, and an arithmetic unit connected to both register sets, operating as described above. That is, while the arithmetic unit performs the current round of convolution, the first register set fetches the next round of unprocessed data to be convolved and transmits it to the arithmetic unit; once the arithmetic unit outputs the current round's convolution result, it continues directly with the next round. Data storage and calculation therefore proceed simultaneously: the convolution is carried out during data transmission, so the data transmission time and the convolution time coincide, which improves the speed of the convolution operation.
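The speed argument above can be made concrete with a back-of-the-envelope cycle count. The cycle costs below are invented for illustration only; the point is that overlapping hides the load cost behind the compute cost.

```python
def store_then_compute(n_rounds, t_load, t_mac):
    # Prior-art style: each round is fully stored before it is computed.
    return n_rounds * (t_load + t_mac)


def overlapped(n_rounds, t_load, t_mac):
    # Claimed scheme: loading round k+1 proceeds while round k is computed,
    # so after the first load only the slower of the two costs remains per round.
    return t_load + n_rounds * max(t_load, t_mac)


print(store_then_compute(100, 3, 3))  # 600 cycles
print(overlapped(100, 3, 3))          # 303 cycles, roughly a 2x speed-up
```

When load and compute take equal time, as in this example, overlapping approaches a factor-of-two improvement; the exact gain depends on the ratio of the two costs.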
  • FIG. 1 is a schematic structural diagram of a data processing system disclosed in this application;
  • FIG. 2 is a schematic diagram of part of the structure of a two-dimensional convolution operation module disclosed in this application;
  • FIG. 3 is a schematic diagram of part of the structure of a three-dimensional convolution operation module disclosed in this application;
  • FIG. 4 is a schematic structural diagram of a four-dimensional convolution operation module disclosed in this application;
  • FIG. 5 is a schematic structural diagram of a five-dimensional convolution operation module disclosed in this application;
  • FIG. 6 is a schematic flow diagram of a specific three-dimensional convolution operation disclosed in this application;
  • FIG. 7 is a schematic diagram of part of the structure of a specific data processing system disclosed in this application.
  • In the prior art, the data processing of the convolutional layer basically follows a store-first (or cache-first), compute-later structure, so the time overhead of calculation is large and convolution is slow.
  • For this reason, the present application provides a data processing system that enables data storage and calculation to be performed at the same time, thereby increasing the speed of the convolution operation.
  • Referring to FIG. 1, an embodiment of the present application discloses a data processing system including a convolution operation module.
  • The convolution operation module includes a first register set 11, a second register set 12, and an arithmetic unit 13 connected to the first register set 11 and the second register set 12; wherein,
  • the first register set 11 is used to obtain the current round of data to be convolved from the data to be processed and transmit it to the arithmetic unit 13, and, while the arithmetic unit 13 performs the current round of convolution, to obtain the next round of data to be convolved and transmit it to the arithmetic unit 13 if the data to be processed has not yet been fully processed;
  • the second register set 12 is used to obtain convolution kernel data and transmit the currently obtained kernel data to the arithmetic unit 13;
  • the arithmetic unit 13 is configured to obtain each round of data to be convolved transmitted by the first register set 11 and the kernel data transmitted by the second register set 12, and to convolve each round of data with the kernel data to obtain the corresponding convolution results.
  • It can be seen that, in the data processing system disclosed in this embodiment, while the arithmetic unit performs the current round of convolution, the first register set fetches the next round of unprocessed data to be convolved and transmits it to the arithmetic unit; when the arithmetic unit outputs the current round's result, it continues directly with the next round. Storage and calculation thus proceed simultaneously, the convolution being carried out during data transmission so that transmission time and computation time coincide, which improves the speed of the convolution operation.
  • For example, FIG. 2 is a partial structural diagram of a two-dimensional convolution operation module disclosed in an embodiment of this application. Specifically, the first register set includes all of the registers labeled 1 in the two-dimensional convolution operation module, and the second register set includes all of the registers labeled 2.
  • It should be understood that this embodiment can extend the structure of FIG. 2 to three-dimensional and higher-dimensional structures.
  • For example, FIG. 3 is a partial structural diagram of a three-dimensional convolution operation module disclosed in an embodiment of this application. Specifically, the first register set includes all of the registers labeled 1 in the three-dimensional convolution operation module, and the second register set includes all of the registers labeled 2.
  • FIG. 4 is a schematic structural diagram of a four-dimensional convolution operation module disclosed in an embodiment of this application. Specifically, the three-dimensional convolution operation module is taken as the base and extended by a column of three-dimensional modules, i.e. a 1×3 arrangement of three-dimensional convolution operation modules, forming a four-dimensional convolution operation module.
  • FIG. 5 is a schematic structural diagram of a five-dimensional convolution operation module disclosed in an embodiment of this application. Specifically, the three-dimensional convolution operation module is extended by a plane of three-dimensional modules, i.e. a 3×3 arrangement of three-dimensional convolution operation modules, forming a five-dimensional convolution operation module.
  • Further, this embodiment may also include control switches for controlling the flow of the data to be convolved and of the convolution kernel data within the convolution operation module.
  • Specifically, the control switches control whether the data to be convolved flows between registers of the first register set or from the first register set to the arithmetic unit, and whether the convolution kernel data flows between registers of the second register set or from the second register set to the arithmetic unit. That is, this embodiment places a large number of control switches between the registers of the first register set, between the first register set and the arithmetic unit, between the registers of the second register set, and between the second register set and the arithmetic unit, to steer the data flow and meet the needs of different convolution operations.
  • This embodiment may also include a parameter configuration module and a structure adjustment module. Specifically, the parameter configuration module is used to configure parameters, where the parameters include the convolution kernel size, the convolution kernel weights, and the switch states of the control switches. The structure adjustment module adjusts the structure of the convolution operation module according to these parameters, so as to perform convolutions of different sizes between different convolutional layers.
  • In a specific implementation, the data processing system disclosed in this embodiment can, via the control switches and under the control of the preset parameters, accumulate a number of values and thereby obtain the result of convolving one round of kernel data with the current round of data to be convolved.
  • For example, FIG. 6 is a schematic flow diagram of a specific three-dimensional convolution operation disclosed in an embodiment of this application. The data to be convolved is fed into the first register set of the convolution operation module, and the convolution kernel data into the second register set.
  • Taking a kernel size of 3×3×3 as an example, in each clock cycle 3×3 items of data to be convolved are input to the first register set and 3×3 items of kernel data to the second register set.
  • After three clock cycles, the arithmetic units in the convolution operation module perform multiply-accumulate operations on the data in their adjacent registers to obtain the operation results. During the following three clock cycles, all the data in the module's registers is replaced, the intermediate results in the arithmetic units are output from the bottom of the module, and a new round of operations begins on the adjacent registers' data, realizing a continuous three-dimensional convolution operation.
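The three-cycle timing described above can be mimicked with a small streaming model. This is an illustrative Python sketch under the stated 3×3×3 assumption (`stream_conv3d` and `mac` are invented names); it accepts one 3×3 plane per "clock cycle" and fires a multiply-accumulate once three planes are resident. Note one simplification: the patent replaces all register data over the next three cycles, whereas here the window simply slides by one plane, which is enough to show the streaming idea.

```python
def mac(window, kernel):
    # 3x3x3 multiply-accumulate over the three resident 3x3 planes.
    return sum(window[d][r][c] * kernel[d][r][c]
               for d in range(3) for r in range(3) for c in range(3))


def stream_conv3d(planes, kernel):
    """planes: iterable of 3x3 nested lists, one per clock cycle.
    kernel: a 3x3x3 nested list of weights."""
    window, results = [], []
    for plane in planes:         # one 3x3 plane enters per clock cycle
        window.append(plane)
        if len(window) == 3:     # after three cycles the 3x3x3 window is full
            results.append(mac(window, kernel))
            window.pop(0)        # slide along the depth axis
    return results
```

With all-ones data and an all-ones kernel, every full window sums to 27, which is a quick sanity check on the indexing.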
  • In some specific implementations, the data to be processed may be image data obtained by acquisition devices such as cameras or sensors; it is sent to the convolution operation module to perform the required convolutions, and the results are used for image classification or recognition, with applications mainly in fields such as face recognition and autonomous driving.
  • Moreover, the convolution operation module can be implemented in hardware such as a CPU or an FPGA. The module is composed of an interleaved combination of registers and arithmetic units; using registers makes it possible to read and write any number of data items simultaneously and flexibly, avoiding the fixed read/write port widths and read/write timing restrictions of RAM and IP blocks in an FPGA.
  • Referring to FIG. 7, an embodiment of the present application discloses a specific data processing system including a plurality of convolution operation modules, each including a first register set 21, a second register set 22, and an arithmetic unit 23 connected to the first register set 21 and the second register set 22; wherein,
  • the first register set 21 is used to obtain the current round of data to be convolved from the data to be processed and transmit it to the arithmetic unit 23, and, while the arithmetic unit 23 performs the current round of convolution, to obtain the next round of data to be convolved and transmit it to the arithmetic unit 23 if the data to be processed has not yet been fully processed;
  • the second register set 22 is used to obtain convolution kernel data and transmit the currently obtained kernel data to the arithmetic unit 23;
  • the arithmetic unit 23 is configured to obtain each round of data to be convolved transmitted by the first register set 21 and the kernel data transmitted by the second register set 22, and to convolve each round of data with the kernel data to obtain the corresponding convolution results.
  • In a specific implementation, the multiple convolution operation modules of this embodiment process different data to be convolved from the data to be processed in parallel. It should be understood that the specific scale of the convolution operation module given here is a typical value drawn from experience; with sufficient hardware resources, the module can be extended arbitrarily to increase the parallelism of data processing and speed up the calculation. For example, multiple three-dimensional convolution operation modules can compute several three-dimensional convolutions at the same time.
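Scaling out with several modules, as described above, amounts to mapping independent tiles of the data onto module instances. The following is a hypothetical Python sketch (a 1-D convolution stands in for the module's real 2-D/3-D operation; `parallel_modules` is an invented name), where each list element plays the role of one hardware module.

```python
def conv1d_module(data, kernel):
    # One module's work: valid (no-padding) 1-D convolution of a data tile.
    k = len(kernel)
    return [sum(data[i + j] * kernel[j] for j in range(k))
            for i in range(len(data) - k + 1)]


def parallel_modules(tiles, kernel):
    # In hardware the modules run concurrently on different tiles of the
    # data to be processed; in this sketch we simply map over them.
    return [conv1d_module(tile, kernel) for tile in tiles]
```

Because the tiles are independent, the hardware speed-up is ideally linear in the number of module instances, which is the parallelism argument made in the text.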
  • An embodiment of this application discloses a data processing method applied to a data processing system, where the system includes a convolution operation module comprising a first register set, a second register set, and an arithmetic unit connected to the first register set and the second register set. The data processing method includes:
  • obtaining, through the first register set, the current round of data to be convolved from the data to be processed and transmitting it to the arithmetic unit, and, while the arithmetic unit performs the current round of convolution, obtaining the next round of data to be convolved and transmitting it to the arithmetic unit if the data to be processed has not yet been fully processed;
  • obtaining convolution kernel data through the second register set and transmitting the currently obtained kernel data to the arithmetic unit;
  • obtaining, through the arithmetic unit, each round of data to be convolved transmitted by the first register set and the kernel data transmitted by the second register set, and convolving each round of data with the kernel data to obtain the corresponding convolution results.
  • It can be seen that, with this method, while the arithmetic unit performs the current round of convolution, the next round of unprocessed data is fetched through the first register set and transmitted to the arithmetic unit, and when the current round's result is output the next round proceeds directly. Storage and calculation thus proceed simultaneously, so transmission time and computation time coincide and the convolution speed is improved.
  • Further, control switches may also be used to control the flow of the data to be convolved and of the convolution kernel data within the convolution operation module. Specifically, the data to be convolved may be controlled to flow between registers of the first register set or from the first register set to the arithmetic unit, and the convolution kernel data to flow between registers of the second register set or from the second register set to the arithmetic unit.
  • This embodiment may also configure parameters, including the convolution kernel size, the convolution kernel weights, and the switch states of the control switches, and adjust the structure of the convolution operation module according to these parameters so as to perform convolutions of different sizes between different convolutional layers.
  • In addition, multiple convolution operation modules may be used to process different data to be convolved from the data to be processed in parallel.
  • An embodiment of this application also discloses a computer-readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the data processing method described above, applied to a data processing system that includes a convolution operation module comprising a first register set, a second register set, and an arithmetic unit connected to the first register set and the second register set.
  • When the computer program is executed, the first register set obtains the current round of data to be convolved from the data to be processed and transmits it to the arithmetic unit, and, while the arithmetic unit performs the current round of convolution, obtains the next round of data to be convolved and transmits it to the arithmetic unit if the data to be processed has not yet been fully processed; the second register set obtains convolution kernel data and transmits it to the arithmetic unit; and the arithmetic unit convolves each round of received data with the kernel data to obtain the corresponding convolution results. Storage and calculation thus proceed simultaneously, so transmission time and computation time coincide and the convolution speed is improved.
  • When executed by the processor, the computer program may further specifically implement the following steps: controlling the data to be convolved to flow between registers of the first register set or from the first register set to the arithmetic unit; controlling the convolution kernel data to flow between registers of the second register set or from the second register set to the arithmetic unit; configuring parameters, including the convolution kernel size, the convolution kernel weights, and the switch states of the control switches; and adjusting the structure of the convolution operation module according to the parameters so as to perform convolutions of different sizes between different convolutional layers.
  • The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two.
  • A software module may reside in random-access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.


Abstract

A data processing system, method, and medium, including a convolution operation module. The convolution operation module includes a first register set, a second register set, and an arithmetic unit connected to the first register set and the second register set. The first register set is used to obtain the current round of data to be convolved from the data to be processed and transmit it to the arithmetic unit, and, while the arithmetic unit performs the current round of convolution, to obtain the next round of data to be convolved from the data to be processed and transmit it to the arithmetic unit if the data to be processed has not yet been fully processed. The second register set is used to obtain convolution kernel data and transmit the currently obtained kernel data to the arithmetic unit. The arithmetic unit is used to obtain each round of data to be convolved transmitted by the first register set and the kernel data transmitted by the second register set, and to convolve each round of data with the kernel data to obtain the corresponding convolution results.

Description

A data processing system, method and medium
This application claims priority to the Chinese patent application filed with the China Patent Office on August 16, 2019 under application number 201910760143.5 and entitled "一种数据处理***、方法及介质" ("A data processing system, method and medium"), the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data processing technology, and in particular to a data processing system, method, and medium.
Background
At present, research on deep learning focuses mainly on the CNN (Convolutional Neural Network). Because processing scenarios differ, the performance requirements on a CNN differ as well, and a variety of network structures have been developed. The basic composition of a CNN is fixed, however: an input layer, convolutional layers, activation layers, pooling layers, and fully connected layers. The input layer feeds the data to be tested into the CNN. In a convolutional layer, each neuron is connected to a local region of the input layer and computes the inner product of that small region with its own weights; this is the layer in which the convolution kernel operates. The convolutional layer dominates the CNN structure, and its computing speed and efficiency directly determine the network's performance. How to perform large batches of convolution calculations efficiently and quickly, especially two- and three-dimensional convolutions, is therefore a very important part of the CNN architecture.
In the prior art, the data processing of the convolutional layer basically follows a store-first (or cache-first), compute-later structure, so the time overhead of calculation is large and convolution is slow.
Summary
In view of this, the purpose of this application is to provide a data processing system, method, and medium that allow data storage and calculation to be performed simultaneously, thereby increasing the speed of the convolution operation. The specific scheme is as follows:
In a first aspect, this application discloses a data processing system including a convolution operation module. The convolution operation module includes a first register set, a second register set, and an arithmetic unit connected to the first register set and the second register set; wherein,
the first register set is used to obtain the current round of data to be convolved from the data to be processed and transmit it to the arithmetic unit, and, while the arithmetic unit performs the current round of convolution, to obtain the next round of data to be convolved from the data to be processed and transmit it to the arithmetic unit if the data to be processed has not yet been fully processed;
the second register set is used to obtain convolution kernel data and transmit the currently obtained kernel data to the arithmetic unit;
the arithmetic unit is used to obtain each round of data to be convolved transmitted by the first register set and the kernel data transmitted by the second register set, and to convolve each round of data with the kernel data to obtain the corresponding convolution results.
Optionally, the system further includes:
a control switch for controlling the flow of the data to be convolved and of the convolution kernel data within the convolution operation module.
Optionally, the control switch is used to control the data to be convolved to flow between registers of the first register set or from the first register set to the arithmetic unit.
Optionally, the control switch is used to control the convolution kernel data to flow between registers of the second register set or from the second register set to the arithmetic unit.
Optionally, the system further includes:
a parameter configuration module for configuring parameters, where the parameters include the convolution kernel size, the convolution kernel weights, and the switch states of the control switches;
a structure adjustment module for adjusting the structure of the convolution operation module according to the parameters, so as to perform convolutions of different sizes between different convolutional layers.
Optionally, the data processing system includes a plurality of the convolution operation modules for processing different data to be convolved from the data to be processed in parallel.
In a second aspect, this application discloses a data processing method applied to a data processing system, where the data processing system includes a convolution operation module comprising a first register set, a second register set, and an arithmetic unit connected to the first register set and the second register set; the data processing method includes:
obtaining, through the first register set, the current round of data to be convolved from the data to be processed and transmitting it to the arithmetic unit, and, while the arithmetic unit performs the current round of convolution, obtaining the next round of data to be convolved from the data to be processed and transmitting it to the arithmetic unit if the data to be processed has not yet been fully processed;
obtaining convolution kernel data through the second register set and transmitting the currently obtained kernel data to the arithmetic unit;
obtaining, through the arithmetic unit, each round of data to be convolved transmitted by the first register set and the kernel data transmitted by the second register set, and convolving each round of data with the kernel data to obtain the corresponding convolution results.
In a third aspect, this application discloses a computer-readable storage medium for storing a computer program which, when executed by a processor, implements the aforementioned data processing method.
It can be seen that the data processing system disclosed in this application includes a convolution operation module with a first register set, a second register set, and an arithmetic unit connected to both register sets, operating as described above. That is, while the arithmetic unit performs the current round of convolution, the first register set fetches the next round of unprocessed data to be convolved and transmits it to the arithmetic unit; when the arithmetic unit outputs the current round's convolution result, it continues directly with the next round. Data storage and calculation therefore proceed simultaneously, with the convolution carried out during data transmission so that transmission time and computation time coincide, which improves the speed of the convolution operation.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application or of the prior art more clearly, the drawings required by the description of the embodiments or of the prior art are briefly introduced below. The drawings described below are merely embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a data processing system disclosed in this application;
FIG. 2 is a schematic diagram of part of the structure of a two-dimensional convolution operation module disclosed in this application;
FIG. 3 is a schematic diagram of part of the structure of a three-dimensional convolution operation module disclosed in this application;
FIG. 4 is a schematic structural diagram of a four-dimensional convolution operation module disclosed in this application;
FIG. 5 is a schematic structural diagram of a five-dimensional convolution operation module disclosed in this application;
FIG. 6 is a schematic flow diagram of a specific three-dimensional convolution operation disclosed in this application;
FIG. 7 is a schematic diagram of part of the structure of a specific data processing system disclosed in this application.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the drawings of the embodiments. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.
In the prior art, the data processing of the convolutional layer basically follows a store-first (or cache-first), compute-later structure, so the time overhead of calculation is large and convolution is slow. For this reason, this application provides a data processing system that enables data storage and calculation to proceed simultaneously, thereby increasing the speed of the convolution operation.
Referring to FIG. 1, an embodiment of this application discloses a data processing system including a convolution operation module. The convolution operation module includes a first register set 11, a second register set 12, and an arithmetic unit 13 connected to the first register set 11 and the second register set 12; wherein
the first register set 11 is used to obtain the current round of data to be convolved from the data to be processed and transmit it to the arithmetic unit 13, and, while the arithmetic unit 13 performs the current round of convolution, to obtain the next round of data to be convolved and transmit it to the arithmetic unit 13 if the data to be processed has not yet been fully processed;
the second register set 12 is used to obtain convolution kernel data and transmit the currently obtained kernel data to the arithmetic unit 13;
the arithmetic unit 13 is used to obtain each round of data to be convolved transmitted by the first register set 11 and the kernel data transmitted by the second register set 12, and to convolve each round of data with the kernel data to obtain the corresponding convolution results.
It can be seen that, in this embodiment, while the arithmetic unit performs the current round of convolution, the first register set fetches the next round of unprocessed data and transmits it to the arithmetic unit; when the current round's result is output, the next round proceeds directly. Storage and calculation therefore proceed simultaneously, the convolution being carried out during data transmission so that transmission time and computation time coincide, which improves the speed of the convolution operation.
例如,参见图2所示,图2为本申请实施例公开的一种二维卷积运算模块部分结构示意图。具体的,所述第一寄存器组包括二维卷积运算模块中的全部寄存器1,所述第二寄存器组包括二维卷积运算模块中的全部寄存器2。
可以理解的是,本实施例可以将图2的结构进行扩展,扩展为三维以及更高维度的结构。
例如,参见图3所示,图3为本申请实施例公开的一种三维卷积运算模块部分结构示意图。具体的,所述第一寄存器组包括三维卷积运算模块中的全部寄存器1,所述第二寄存器组包括三维卷积运算模块中的全部寄存器2。
参见图4所示,图4为本申请实施例公开的一种四维卷积运算模块结构示意图。具体的,以三维卷积运算模块为基础进行扩展,扩展出一列三维卷积运算模块,即1×3的三维卷积运算模块,构成四维卷积运算模块。
参见图5所示,图5为本申请实施例公开的一种五维卷积运算模块结构示意图。具体的,以三维卷积运算模块为基础进行扩展,扩展出一面三维卷积运算模块,即3×3的三维卷积运算模块,构成五维卷积运算模块。
进一步的,本实施例还可以包括控制开关,用于控制所述待卷积数据和所述卷积核数据在所述卷积运算模块中的流向。具体的,所述控制开关用于控制所述待卷积数据在所述第一寄存器组间流动或从所述第一寄存器组流向所述运算器,并且,控制所述卷积核数据在所述第二寄存器组间流动或从所述第二寄存器组流向所述运算器。也即,本实施例在第一寄存器组的寄存器之间、第一寄存器组的寄存器与运算器之间、第二寄存器组的寄存器之间以及第二寄存器组的寄存器与运算器之间布置了大量的控制开关,用于控制数据流向,以满足不同卷积运算的需求。
This embodiment may further include a parameter configuration module and a convolution operation module structure adjustment module. Specifically, the parameter configuration module is used to configure parameters, where the parameters include the convolution kernel size, the convolution kernel weights, and the on/off states of the control switches. The convolution operation module structure adjustment module is used to adjust the structure of the convolution operation module according to the parameters, so as to perform convolution operations of different sizes between different convolutional layers.
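The configurable items named above (kernel size, kernel weights, switch states) can be grouped into a single parameter record; the field names below are assumptions chosen for illustration, not identifiers from the patent.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ConvConfig:
    kernel_size: tuple        # e.g. (3, 3, 3) for a 3x3x3 kernel
    weights: np.ndarray       # kernel weights loaded into the second register set
    switch_states: list       # on/off state for each control switch

cfg = ConvConfig(kernel_size=(3, 3, 3),
                 weights=np.ones((3, 3, 3)),
                 switch_states=[True] * 8)
# the structure-adjustment step would consume a record like this to
# reconfigure the register/arithmetic-unit fabric between layers
assert cfg.weights.shape == cfg.kernel_size
```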
In a specific implementation, the data processing system disclosed in this embodiment can, through the control switches and under the control of preset parameters, form the accumulation of a number of values, thereby obtaining the convolution result of one round of kernel data with the current round of to-be-convolved data.
For example, referring to FIG. 6, which is a schematic flowchart of a specific three-dimensional convolution operation disclosed in an embodiment of this application. The to-be-convolved data is fed into the first register set of the convolution operation module, and the kernel data is fed into the second register set. Taking a 3×3×3 kernel as an example, in each clock cycle 3×3 to-be-convolved values are fed into the first register set and 3×3 kernel values are fed into the second register set. After three clock cycles, the arithmetic unit in the convolution operation module performs multiply-accumulate operations to obtain the result for the data held in its adjacent registers. During the next three clock cycles, all the data in the registers of the convolution operation module is replaced, the intermediate result in the arithmetic unit is output from the bottom of the module, and a new round of operations is performed on the data in the adjacent registers, thereby achieving a continuous three-dimensional convolution operation.
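The FIG. 6 timing can be summarized as: one 3×3 data slice and one 3×3 kernel slice enter per clock, and every three clocks the registers hold a complete 3×3×3 window whose 27 products are accumulated into one result. The sketch below models only that schedule in software; the function name and list-based "registers" are assumptions for illustration.

```python
import numpy as np

def conv3d_stream(data_slices, kernel_slices):
    """Feed one 3x3 data slice and one 3x3 kernel slice per clock; every
    three clocks the registers hold a full 3x3x3 window, the arithmetic
    unit emits one multiply-accumulate result, and the registers refill."""
    data_regs, kern_regs, outputs = [], [], []
    for t, (d, k) in enumerate(zip(data_slices, kernel_slices), start=1):
        data_regs.append(d)
        kern_regs.append(k)
        if t % 3 == 0:                       # 3x3x3 window complete
            outputs.append(sum(float(np.sum(a * b))
                               for a, b in zip(data_regs, kern_regs)))
            data_regs, kern_regs = [], []    # all register data replaced
    return outputs

ones = np.ones((3, 3))
print(conv3d_stream([ones] * 6, [ones] * 6))  # one result every 3 clocks: [27.0, 27.0]
```

Because the refill of the registers overlaps with the output of the previous intermediate result, the stream never stalls, matching the "continuous" operation described above.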
In some specific implementations, the data to be processed may be image data obtained by an acquisition device such as a camera or a sensor; it is fed into the convolution operation module for the required convolution operations, and the images are classified or recognized according to the obtained results, with primary applications in fields such as face recognition and autonomous driving. Moreover, the convolution operation module can be implemented in hardware such as a CPU or an FPGA. The convolution operation module is an interleaved combination of registers and arithmetic units; using registers makes it possible to flexibly perform simultaneous read and write operations on any number of data items, avoiding the fixed read/write port widths and read/write timing restrictions found in FPGA resources such as RAM and IP blocks.
Referring to FIG. 7, an embodiment of this application discloses a specific data processing system including a plurality of convolution operation modules. Each convolution operation module includes a first register set 21, a second register set 22, and an arithmetic unit 23 connected to the first register set 21 and the second register set 22; wherein,
the first register set 21 is configured to acquire the current round of to-be-convolved data from the data to be processed and transmit the acquired current round of to-be-convolved data to the arithmetic unit 23, and, while the arithmetic unit 23 performs the current round of convolution operation, to acquire the next round of to-be-convolved data from the data to be processed and transmit it to the arithmetic unit 23 if the data to be processed has not yet been fully processed;
the second register set 22 is configured to acquire convolution kernel data and transmit the currently acquired convolution kernel data to the arithmetic unit 23;
the arithmetic unit 23 is configured to receive each round of to-be-convolved data transmitted by the first register set 21 and the convolution kernel data transmitted by the second register set 22, and to perform a convolution operation on each acquired round of to-be-convolved data and the convolution kernel data to obtain the corresponding convolution operation result.
In a specific implementation, the plurality of convolution operation modules of this embodiment are used to process different to-be-convolved data from the data to be processed in parallel. It can be understood that in this embodiment the specific scale of the convolution operation module is a typical value derived from experience; given sufficient hardware resources, the convolution operation module can be extended arbitrarily to increase the parallelism of data processing and speed up computation. For example, a plurality of three-dimensional convolution operation modules may compute several three-dimensional convolutions simultaneously.
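The parallelism above can be sketched by letting a thread pool stand in for the replicated hardware modules, each handling a different window of the input. This is a software analogy under that assumption; the names are illustrative, not from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

KERNEL = np.ones((3, 3, 3))

def module_convolve(window):
    """One convolution operation module's multiply-accumulate over a 3x3x3 window."""
    return float(np.sum(window * KERNEL))

# different to-be-convolved data dispatched to each replicated module
windows = [np.full((3, 3, 3), i) for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:   # 4 modules working in parallel
    results = list(pool.map(module_convolve, windows))
print(results)  # [0.0, 27.0, 54.0, 81.0]
```

In hardware the modules run truly concurrently, so throughput scales roughly with the number of replicated modules until input bandwidth becomes the limit.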
An embodiment of this application discloses a data processing method applied to a data processing system, where the data processing system includes a convolution operation module, and the convolution operation module includes a first register set, a second register set, and an arithmetic unit connected to the first register set and the second register set; the data processing method includes:
acquiring, by the first register set, the current round of to-be-convolved data from the data to be processed and transmitting the acquired current round of to-be-convolved data to the arithmetic unit, and, while the arithmetic unit performs the current round of convolution operation, acquiring the next round of to-be-convolved data from the data to be processed and transmitting it to the arithmetic unit if the data to be processed has not yet been fully processed;
acquiring, by the second register set, convolution kernel data and transmitting the currently acquired convolution kernel data to the arithmetic unit;
receiving, by the arithmetic unit, each round of to-be-convolved data transmitted by the first register set and the convolution kernel data transmitted by the second register set, and performing a convolution operation on each acquired round of to-be-convolved data and the convolution kernel data to obtain the corresponding convolution operation result.
It can be seen that the embodiments of this application disclose a data processing method applied to a data processing system, where the data processing system includes a convolution operation module, and the convolution operation module includes a first register set, a second register set, and an arithmetic unit connected to the first register set and the second register set; the data processing method includes: acquiring, by the first register set, the current round of to-be-convolved data from the data to be processed and transmitting the acquired current round of to-be-convolved data to the arithmetic unit, and, while the arithmetic unit performs the current round of convolution operation, acquiring the next round of to-be-convolved data from the data to be processed and transmitting it to the arithmetic unit if the data to be processed has not yet been fully processed; acquiring, by the second register set, convolution kernel data and transmitting the currently acquired kernel data to the arithmetic unit; and receiving, by the arithmetic unit, each round of to-be-convolved data transmitted by the first register set and the kernel data transmitted by the second register set, and performing a convolution operation on each acquired round of to-be-convolved data and the kernel data to obtain the corresponding convolution operation result. That is, while the arithmetic unit performs the current round of convolution operation, the first register set acquires the next round of to-be-convolved data from the not-yet-fully-processed data and transmits it to the arithmetic unit; once the arithmetic unit outputs the current round's convolution result, it proceeds directly to the next round. In this way, data storage and computation proceed simultaneously, and the convolution operation is performed while the data is being transferred, so that the data transfer time coincides with the convolution operation time, thereby increasing the speed of the convolution operation.
In a specific implementation, this embodiment may further use control switches to control the flow of the to-be-convolved data and the kernel data within the convolution operation module. Specifically, the to-be-convolved data may be controlled to flow between the registers of the first register set or from the first register set to the arithmetic unit, and the kernel data may be controlled to flow between the registers of the second register set or from the second register set to the arithmetic unit.
Specifically, this embodiment may also configure parameters, where the parameters include the convolution kernel size, the convolution kernel weights, and the on/off states of the control switches, and adjust the structure of the convolution operation module according to the parameters, so as to perform convolution operations of different sizes between different convolutional layers.
Further, this embodiment may use a plurality of the convolution operation modules to process different to-be-convolved data from the data to be processed in parallel.
An embodiment of this application discloses a computer-readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the following data processing method applied to a data processing system; the data processing system includes a convolution operation module, and the convolution operation module includes a first register set, a second register set, and an arithmetic unit connected to the first register set and the second register set; the data processing method includes:
acquiring, by the first register set, the current round of to-be-convolved data from the data to be processed and transmitting the acquired current round of to-be-convolved data to the arithmetic unit, and, while the arithmetic unit performs the current round of convolution operation, acquiring the next round of to-be-convolved data from the data to be processed and transmitting it to the arithmetic unit if the data to be processed has not yet been fully processed;
acquiring, by the second register set, convolution kernel data and transmitting the currently acquired convolution kernel data to the arithmetic unit;
receiving, by the arithmetic unit, each round of to-be-convolved data transmitted by the first register set and the convolution kernel data transmitted by the second register set, and performing a convolution operation on each acquired round of to-be-convolved data and the convolution kernel data to obtain the corresponding convolution operation result.
It can be seen that the data processing system disclosed in the embodiments of this application includes a convolution operation module, and the convolution operation module includes a first register set, a second register set, and an arithmetic unit connected to the first register set and the second register set; wherein the first register set is configured to acquire the current round of to-be-convolved data from the data to be processed and transmit the acquired current round of to-be-convolved data to the arithmetic unit, and, while the arithmetic unit performs the current round of convolution operation, to acquire the next round of to-be-convolved data from the data to be processed and transmit it to the arithmetic unit if the data to be processed has not yet been fully processed; the second register set is configured to acquire convolution kernel data and transmit the currently acquired kernel data to the arithmetic unit; and the arithmetic unit is configured to receive each round of to-be-convolved data transmitted by the first register set and the kernel data transmitted by the second register set, and to perform a convolution operation on each acquired round of to-be-convolved data and the kernel data to obtain the corresponding convolution operation result. That is, while the arithmetic unit performs the current round of convolution operation, the first register set acquires the next round of to-be-convolved data from the not-yet-fully-processed data and transmits it to the arithmetic unit; once the arithmetic unit outputs the current round's convolution result, it proceeds directly to the next round. In this way, data storage and computation proceed simultaneously, and the convolution operation is performed while the data is being transferred, so that the data transfer time coincides with the convolution operation time, thereby increasing the speed of the convolution operation.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by a processor, the following step may be specifically implemented: controlling the to-be-convolved data to flow between the registers of the first register set or from the first register set to the arithmetic unit.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by a processor, the following step may be specifically implemented: controlling the convolution kernel data to flow between the registers of the second register set or from the second register set to the arithmetic unit.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by a processor, the following step may be specifically implemented: configuring parameters, where the parameters include the convolution kernel size, the convolution kernel weights, and the on/off states of the control switches.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by a processor, the following step may be specifically implemented: adjusting the structure of the convolution operation module according to the parameters, so as to perform convolution operations of different sizes between different convolutional layers.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts the embodiments may be referred to one another. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief; for related details, refer to the description of the method.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The data processing system, method, and medium provided by this application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of this application, and the description of the above embodiments is only intended to help understand the method of this application and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application based on the idea of this application. In summary, the content of this specification should not be construed as a limitation on this application.

Claims (8)

  1. A data processing system, comprising a convolution operation module, wherein the convolution operation module comprises a first register set, a second register set, and an arithmetic unit connected to the first register set and the second register set; wherein,
    the first register set is configured to acquire a current round of to-be-convolved data from data to be processed and transmit the acquired current round of to-be-convolved data to the arithmetic unit, and, while the arithmetic unit performs the current round of convolution operation, to acquire a next round of to-be-convolved data from the data to be processed and transmit it to the arithmetic unit if the data to be processed has not yet been fully processed;
    the second register set is configured to acquire convolution kernel data and transmit the currently acquired convolution kernel data to the arithmetic unit;
    the arithmetic unit is configured to receive each round of to-be-convolved data transmitted by the first register set and the convolution kernel data transmitted by the second register set, and to perform a convolution operation on each acquired round of to-be-convolved data and the convolution kernel data to obtain a corresponding convolution operation result.
  2. The data processing system according to claim 1, further comprising:
    a control switch, configured to control the flow of the to-be-convolved data and the convolution kernel data within the convolution operation module.
  3. The data processing system according to claim 2, wherein the control switch is configured to control the to-be-convolved data to flow between the registers of the first register set or from the first register set to the arithmetic unit.
  4. The data processing system according to claim 2, wherein the control switch is configured to control the convolution kernel data to flow between the registers of the second register set or from the second register set to the arithmetic unit.
  5. The data processing system according to claim 1, further comprising:
    a parameter configuration module, configured to configure parameters, wherein the parameters include a convolution kernel size, convolution kernel weights, and on/off states of control switches;
    a convolution operation module structure adjustment module, configured to adjust the structure of the convolution operation module according to the parameters, so as to perform convolution operations of different sizes between different convolutional layers.
  6. The data processing system according to any one of claims 1 to 5, comprising a plurality of the convolution operation modules configured to process different to-be-convolved data from the data to be processed in parallel.
  7. A data processing method, applied to a data processing system, wherein the data processing system includes a convolution operation module, and the convolution operation module includes a first register set, a second register set, and an arithmetic unit connected to the first register set and the second register set; the data processing method comprising:
    acquiring, by the first register set, a current round of to-be-convolved data from data to be processed and transmitting the acquired current round of to-be-convolved data to the arithmetic unit, and, while the arithmetic unit performs the current round of convolution operation, acquiring a next round of to-be-convolved data from the data to be processed and transmitting it to the arithmetic unit if the data to be processed has not yet been fully processed;
    acquiring, by the second register set, convolution kernel data and transmitting the currently acquired convolution kernel data to the arithmetic unit;
    receiving, by the arithmetic unit, each round of to-be-convolved data transmitted by the first register set and the convolution kernel data transmitted by the second register set, and performing a convolution operation on each acquired round of to-be-convolved data and the convolution kernel data to obtain a corresponding convolution operation result.
  8. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the data processing method according to claim 7.
PCT/CN2019/114538 2019-08-16 2019-10-31 Data processing system, method and medium WO2021031351A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910760143.5 2019-08-16
CN201910760143.5A CN110516799A (zh) 2019-08-16 2019-08-16 Data processing system, method and medium

Publications (1)

Publication Number Publication Date
WO2021031351A1 true WO2021031351A1 (zh) 2021-02-25

Family

ID=68625592

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114538 WO2021031351A1 (zh) 2019-08-16 2019-10-31 一种数据处理***、方法及介质

Country Status (2)

Country Link
CN (1) CN110516799A (zh)
WO (1) WO2021031351A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113570612B (zh) * 2021-09-23 2021-12-17 苏州浪潮智能科技有限公司 Image processing method, apparatus and device

Citations (5)

Publication number Priority date Publication date Assignee Title
CN107066239A (zh) * 2017-03-01 2017-08-18 智擎信息系统(上海)有限公司 Hardware architecture for implementing forward computation of a convolutional neural network
CN107918794A (zh) * 2017-11-15 2018-04-17 中国科学院计算技术研究所 Neural network processor based on a computing array
CN108564524A (zh) * 2018-04-24 2018-09-21 开放智能机器(上海)有限公司 Convolution computation optimization method for visual images
CN108665063A (zh) * 2018-05-18 2018-10-16 南京大学 Bidirectional parallel processing convolution acceleration system for a BNN hardware accelerator
CN108805267A (zh) * 2018-05-28 2018-11-13 重庆大学 Data processing method for convolutional neural network hardware acceleration

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN109034373B (zh) * 2018-07-02 2021-12-21 鼎视智慧(北京)科技有限公司 Parallel processor and processing method for convolutional neural networks
CN109871510B (zh) * 2019-01-08 2024-01-23 广东浪潮大数据研究有限公司 Two-dimensional convolution operation processing method, system, device, and computer storage medium

Also Published As

Publication number Publication date
CN110516799A (zh) 2019-11-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19942003

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19942003

Country of ref document: EP

Kind code of ref document: A1
