CN109523019B - Accelerator, accelerating system based on FPGA, control method and CNN network system - Google Patents


Info

Publication number
CN109523019B
CN109523019B (application number CN201811639964.5A)
Authority
CN
China
Prior art keywords
accelerator
configuration information
mux
fpga
pes
Prior art date
Legal status
Active
Application number
CN201811639964.5A
Other languages
Chinese (zh)
Other versions
CN109523019A (en)
Inventor
邬志影
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811639964.5A priority Critical patent/CN109523019B/en
Publication of CN109523019A publication Critical patent/CN109523019A/en
Application granted granted Critical
Publication of CN109523019B publication Critical patent/CN109523019B/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation using electronic means
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Advance Control (AREA)
  • Power Sources (AREA)

Abstract

The present disclosure provides an accelerator comprising a plurality of computing units (PEs) and at least one multiplexer (MUX). Each MUX is connected to two interconnected PEs, and the number of PEs performing operations is determined by changing the connection state of the MUX. The accelerator thus reduces power consumption and balances computing-power demands. The disclosure also provides an FPGA-based acceleration system and its control method, and a CNN network system and its control method.

Description

Accelerator, accelerating system based on FPGA, control method and CNN network system
Technical Field
The embodiments of the present disclosure relate to the field of Internet technology, and in particular to an accelerator, an FPGA-based acceleration system and its control method, and a CNN network system and its control method.
Background
An FPGA (Field-Programmable Gate Array) is a further development of programmable devices such as PALs, GALs, and CPLDs. As a semi-custom circuit in the field of Application-Specific Integrated Circuits (ASICs), it both avoids the inflexibility of fully custom circuits and overcomes the limited gate count of earlier programmable devices. The core of the FPGA design is the accelerator, which carries out the corresponding operations.
In the prior art, an accelerator consists of a plurality of computing units (PEs). To meet computing-power demands, as many PEs as possible are instantiated when the underlying FPGA is designed, and the computation is then carried out in a systolic fashion.
Disclosure of Invention
The embodiment of the disclosure provides an accelerator, an acceleration system and a control method based on FPGA, a CNN network system and a control method.
According to one aspect of the embodiments of the present disclosure, an accelerator is provided, comprising: a plurality of computing units (PEs) and at least one multiplexer (MUX), wherein each MUX is connected to two interconnected PEs, and the number of PEs performing operations is determined by changing the connection state of the MUX.
In some embodiments, there are a plurality of MUXes, and any two adjacent PEs are connected by one MUX.
In some embodiments, the PEs are arranged in a matrix array.
In some embodiments, the connection states include an output state, a cascade state, and a disconnected state.
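The three connection states can be sketched as a small Python enumeration (the class and member names are illustrative, not taken from the patent):

```python
from enum import Enum

class MuxState(Enum):
    """Hypothetical names for the three MUX connection states
    described in the disclosure."""
    OUTPUT = "output"              # the PE's result is driven out of the array
    CASCADE = "cascade"            # two adjacent PEs are chained together
    DISCONNECTED = "disconnected"  # the downstream PE is cut off from the chain
```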
According to another aspect of the embodiments of the present disclosure, there is also provided an FPGA-based acceleration system, comprising: a register, a computing controller, a memory, a transmitter, and the accelerator of any of the above embodiments, wherein the computing controller is connected to the accelerator and the register, respectively, and the transmitter is connected to the accelerator and the memory, respectively.
According to another aspect of the embodiments of the present disclosure, there is also provided a method for controlling an FPGA-based acceleration system, the method being based on the FPGA-based acceleration system as described above, including:
determining configuration information according to the acquired computing-power demand information, wherein the configuration information includes the number of PEs and the connection state of each MUX;
and sending the configuration information to the accelerator so that the accelerator can determine the connection states of the MUXes according to the configuration information.
According to another aspect of the embodiments of the present disclosure, there is also provided an FPGA-based acceleration system, comprising the accelerator of any of the above embodiments, and further comprising a computing controller connected to the accelerator.
The computing controller is configured to: determine configuration information according to the acquired computing-power demand information, wherein the configuration information includes the number of PEs and the connection state of each MUX, and transmit the configuration information to the accelerator.
According to another aspect of the embodiments of the present disclosure, there is also provided a CNN network system including: an FPGA-based acceleration system as described above, and a processor coupled to the FPGA-based acceleration system.
In some embodiments, the processor is configured to: determine configuration information according to the acquired computing-power demand information, wherein the configuration information includes the number of PEs and the connection state of each MUX, and transmit the configuration information to the FPGA-based acceleration system.
According to another aspect of the embodiments of the present disclosure, there is also provided a method for controlling a CNN network system, the method being based on the system as described above, including:
acquiring computing-power demand information;
determining configuration information from the computing-power demand information and a preset configuration information table, wherein the configuration information includes the number of PEs and the connection state of each MUX;
and sending the configuration information to the FPGA-based acceleration system so that the FPGA-based acceleration system can determine the connection states of the MUXes according to the configuration information.
The accelerator provided by the embodiments of the present disclosure reduces power consumption and balances computing-power demands.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure, without limitation to the disclosure.
The above and other features and advantages will become more readily apparent to those skilled in the art by describing in detail exemplary embodiments with reference to the attached drawings, in which:
fig. 1 is a schematic structural view of an accelerator according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a computing module provided by an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a computing module according to another embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of an acceleration system based on FPGA according to an embodiment of the present disclosure;
Fig. 5 is a flow chart of a control method of an acceleration system based on an FPGA according to an embodiment of the disclosure;
fig. 6 is a schematic structural diagram of a CNN network system according to an embodiment of the present disclosure;
Fig. 7 is a flow chart of a control method of a CNN network system according to an embodiment of the present disclosure;
Reference numerals:
1. a memory; 2. a transmitter; 3. a register; 4. a computing controller; 5. an accelerator;
6. a processor.
Detailed Description
In order to enable those skilled in the art to better understand the technical scheme of the present disclosure, the accelerator, the FPGA-based acceleration system and its control method, and the CNN network system and its control method provided by the present disclosure are described in detail below with reference to the accompanying drawings.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Embodiments described herein may be described with reference to plan and/or cross-sectional views with the aid of idealized schematic diagrams of the present disclosure. Accordingly, the example illustrations may be modified in accordance with manufacturing techniques and/or tolerances. Thus, the embodiments are not limited to the embodiments shown in the drawings, but include modifications of the configuration formed based on the manufacturing process. Thus, the regions illustrated in the figures have schematic properties and the shapes of the regions illustrated in the figures illustrate the particular shapes of the regions of the elements, but are not intended to be limiting.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
According to one aspect of the embodiments of the present disclosure, an accelerator is provided.
The accelerator includes: a plurality of computing units (PEs) and at least one multiplexer (MUX), wherein each MUX is connected to two interconnected PEs, and the number of PEs performing operations is determined by changing the connection states of the MUXes.
In the prior art, accelerators are designed using either a systolic-array design or a pseudo-instruction-set approach. Once the accelerator is built, its computing power is fixed, so supply cannot be balanced against the computing-power demand.
For example, if an accelerator includes m PEs, every PE participates in an operation even when only a fraction of the maximum computing power is required. This generates enormous power consumption and wastes computing power.
In the present embodiment, by contrast, at least one MUX is provided. For example, when there is a single MUX, it may be placed between any two adjacently connected PEs, say between PE10 and PE11. By changing the connection state of this MUX, the number of PEs performing the operation can be changed: when the MUX is in the cascade state, PE10 and PE11 are interconnected, whereas when the MUX is disconnected, only the PEs up to and including PE10 perform the operation.
That is, assuming that ResNet requires 0.5 T of computing power while the accelerator's maximum computing power reaches 1 T, the scheme provided by this embodiment allows the second half of the PEs to be disconnected through the MUXes and thereby shut out of the operation, reducing power consumption and achieving a balanced allocation of computing power.
When there are several MUXes, for example n PEs and m MUXes with n > m + 1, a MUX can be placed between any two adjacent PEs.
In one possible implementation, there are a plurality of MUXes, and any two adjacent PEs are connected through one MUX.
In this embodiment, when there are multiple MUXes, for example n PEs and m MUXes with n = m + 1, a MUX is placed between every two adjacent PEs, so that each pair of adjacent PEs is connected by a MUX.
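A minimal Python model of a one-dimensional chain of n PEs with n - 1 MUXes (all names are illustrative, not from the patent) shows how a single disconnected MUX limits the number of PEs taking part in an operation:

```python
CASCADE, DISCONNECTED, OUTPUT = "cascade", "disconnected", "output"

def active_pe_count(mux_states):
    """Count the PEs reachable from the head of the chain, where
    mux_states[i] is the MUX between PE i and PE i+1."""
    count = 1  # the first PE always participates
    for state in mux_states:
        if state != CASCADE:
            break  # a non-cascaded MUX ends the chain here
        count += 1
    return count

# With the second MUX disconnected, only the first two PEs compute,
# regardless of how long the chain is.
chain = [CASCADE, DISCONNECTED, CASCADE]
```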
In one possible implementation, the PEs are arranged in a matrix array.
Referring specifically to fig. 1, fig. 1 is a schematic structural diagram of an accelerator according to an embodiment of the disclosure.
The operating principle of the accelerator provided by the embodiments of the present disclosure is now described in detail with reference to fig. 1. The PEs are arranged in a matrix of M rows and N columns, so the number of PEs is M×N. If the minimum computation parallelism of one PE is 32 (i.e. 32 MACs are computed in parallel in one clock cycle) and all PEs work, the total computing power is M×N×f_clock×32, in units of MACs/s. Each PE is connected to its adjacent PEs through MUXes, and each MUX has three states: cascade, disconnected, and output. The computing power reaches its maximum when the rows are cascaded and the columns disconnected, or the columns cascaded and the rows disconnected. When the algorithm requires only half of the computing power, the array can be cut longitudinally (all MUXes in the right half disconnected) or laterally (all MUXes in the lower half disconnected).
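The throughput formula above can be checked with a short Python sketch (the clock frequency used is an arbitrary example, not a figure from the patent):

```python
def peak_macs_per_second(rows, cols, f_clock_hz, macs_per_pe=32):
    """Peak throughput of an M x N PE array in which each PE computes
    `macs_per_pe` MACs per clock cycle: M * N * f_clock * 32."""
    return rows * cols * f_clock_hz * macs_per_pe

# e.g. a 3 x 12 array at an assumed 300 MHz clock:
peak = peak_macs_per_second(3, 12, 300_000_000)  # MACs/s
```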
Specifically, assume the algorithm requires 0.5 TOPS and the accelerator's maximum computing power is 1.5 TOPS, the value obtained when all PEs compute simultaneously. The PEs are distributed in 3 rows and 12 columns, i.e. 3×12 = 36 PEs working simultaneously yield 1.5 TOPS. The algorithm needs only 0.5 TOPS, one third of the maximum. Letting all PEs compute simultaneously would still satisfy the requirement, but it would increase the accelerator's power consumption while the data computed by 2/3 of the PEs would be discarded. In this embodiment, by changing the connection states of the MUXes, two PE connection patterns provide the required computing power: a 1-row, 12-column computing module (shown in FIG. 2) and a 3-row, 4-column computing module (shown in FIG. 3).
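Both reduced configurations activate 12 of the 36 PEs, one third of the array. A quick Python check (illustrative, assuming computing power scales linearly with the number of active PEs) confirms that each yields the required 0.5 TOPS:

```python
def achieved_tops(active_rows, active_cols, total_pes=36, max_tops=1.5):
    """Computing power delivered by an active sub-array, assuming it
    scales linearly with the number of active PEs."""
    return max_tops * (active_rows * active_cols) / total_pes

config_a = achieved_tops(1, 12)  # 1 row x 12 columns (FIG. 2)
config_b = achieved_tops(3, 4)   # 3 rows x 4 columns (FIG. 3)
```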
According to another aspect of the disclosed embodiments, the disclosed embodiments provide an FPGA-based acceleration system.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an acceleration system based on FPGA according to an embodiment of the disclosure.
As shown in fig. 4, the FPGA-based acceleration system includes: the system comprises a register, a calculation controller, a memory, a transmitter and an accelerator as described above, wherein the calculation controller is respectively connected with the accelerator and the register, and the transmitter is respectively connected with the accelerator and the memory.
Wherein the dashed line of the memory connection indicates that the memory can be connected to an external device. Similarly, the dashed lines of the register connection indicate that the register may be connected to an external device.
Referring to fig. 5, fig. 5 is a flowchart of a control method of an acceleration system based on an FPGA according to an embodiment of the disclosure.
As shown in fig. 5, the method includes:
S1: the computing controller determines configuration information according to the acquired computing power demand information, the configuration information including the number of PEs and the connection state of each MUX.
In this step, when the computing controller learns the calculation force demand information, the number of PEs required to complete the calculation force corresponding to the calculation force demand information may be determined from the calculation force demand information.
When the number of PEs is known, then the connection state of each MUX may be determined to achieve that the number of PEs is the same as the number of PEs required for computing power when performing the arithmetic operation.
S2: the computing controller sends the configuration information to the accelerator so that the accelerator determines the connection state of the MUX based on the configuration information.
It can be appreciated that when the accelerator receives the configuration information, it places each MUX in the corresponding cascade or disconnected state to connect the corresponding number of PEs, thereby obtaining a computing module that delivers the corresponding computing power.
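Steps S1 and S2 can be sketched for a one-dimensional PE chain as follows (the helper names are hypothetical; the patent does not specify the controller's internal logic):

```python
def mux_states_for(required_pes, total_pes):
    """Derive chain MUX states so that exactly `required_pes` PEs are
    cascaded; every MUX after the last active PE is disconnected."""
    if not 1 <= required_pes <= total_pes:
        raise ValueError("required PE count out of range")
    return ["cascade" if i < required_pes - 1 else "disconnected"
            for i in range(total_pes - 1)]  # one MUX per adjacent PE pair

def configure_accelerator(demand_pes, total_pes):
    """S1: build the configuration information; S2 sends it to the accelerator."""
    return {"num_pes": demand_pes,
            "mux_states": mux_states_for(demand_pes, total_pes)}
```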
According to another aspect of the embodiments of the present disclosure, there is further provided an FPGA-based acceleration system, comprising the accelerator as described above and a computing controller connected to the accelerator, wherein the computing controller is configured to: determine configuration information according to the acquired computing-power demand information, wherein the configuration information includes the number of PEs and the connection state of each MUX, and transmit the configuration information to the accelerator.
According to another aspect of the embodiments of the present disclosure, the embodiments of the present disclosure further provide a CNN network system.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a CNN network system according to an embodiment of the disclosure.
As shown in fig. 6, the system includes: the FPGA-based acceleration system as described above, and a processor coupled to the FPGA-based acceleration system.
The processor is configured to: determine configuration information according to the acquired computing-power demand information, wherein the configuration information includes the number of PEs and the connection state of each MUX, and transmit the configuration information to the FPGA-based acceleration system.
The scheme of this embodiment will now be described in detail with reference to fig. 6:
in the processor, a configuration information table mapped to computing-power demand information is stored in advance. That is, from the configuration information table one can read, for a given amount of demanded computing power, how many PEs to select and which MUXes should be in the cascade state and which in the disconnected state.
Therefore, when the processor receives (or acquires) the computing-power demand information, it traverses the configuration information table, or matches the demand information against the table, to obtain the corresponding configuration information, namely the number of PEs and the connection state of each MUX. The processor then transmits the configuration information to the FPGA-based acceleration system via the AXI bus.
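The table lookup described here might look like the following sketch (the table contents and helper names are hypothetical, chosen to mirror the 0.5/1.5 TOPS example earlier in the description):

```python
# Hypothetical preset table: computing-power demand (TOPS) -> configuration.
CONFIG_TABLE = {
    0.5: {"pe_rows": 1, "pe_cols": 12},  # 12 active PEs
    1.5: {"pe_rows": 3, "pe_cols": 12},  # all 36 PEs cascaded
}

def lookup_config(demand_tops):
    """Return the smallest preset configuration whose computing power
    still satisfies the demand (a simple matching strategy)."""
    for tops in sorted(CONFIG_TABLE):
        if tops >= demand_tops:
            return CONFIG_TABLE[tops]
    raise ValueError("demand exceeds the accelerator's maximum computing power")
```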
Specifically, the AXI bus sends configuration information to registers in the FPGA-based acceleration system. The registers send configuration information to the compute controller.
After receiving the configuration information, the computing controller configures the accelerator accordingly, i.e. it places the MUXes in the accelerator into the states that satisfy the corresponding computing-power demand. After completing the configuration, the computing controller sends a data-fetch instruction to the memory.
After receiving the data-fetch instruction, the memory acquires the data to be computed, corresponding to the computing-power demand information, from the processor through the AXIS bus, and transmits it to the transmitter (specifically, to the input module of the transmitter).
The accelerator extracts the data to be computed from the transmitter's input module, performs the computation, and stores the result in the transmitter's output module.
The memory then obtains the computation result from the transmitter and transmits it to the processor through the AXIS bus.
According to another aspect of the embodiments of the present disclosure, the embodiments of the present disclosure further provide a control method of a CNN network system, which is based on the system as described above.
Referring to fig. 7, fig. 7 is a flowchart illustrating a control method of a CNN network system according to an embodiment of the disclosure. The method comprises the following steps:
S10: the processor obtains the power demand information.
S20: the processor determines configuration information including the number of PEs and the connection status of each MUX from the power demand information and a preset configuration information table.
S30: the processor sends the configuration information to the FPGA-based acceleration system so that the FPGA-based acceleration system determines the connection state of the MUX according to the configuration information.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. 
Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purpose of limitation. In some instances, it will be apparent to one skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with other embodiments unless explicitly stated otherwise. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (10)

1. An accelerator, comprising: a plurality of computing units (PEs) and a plurality of multiplexers (MUXes), each MUX being connected to two interconnected PEs, wherein the number of PEs performing an operation is determined by changing the connection states of the MUXes; and wherein, in the case that there are a plurality of MUXes and the number of PEs exceeds the number of MUXes by at least two, a MUX may be disposed between any two adjacent PEs.
2. The accelerator of claim 1, wherein there are a plurality of MUXes, and any two adjacent PEs are connected by one MUX.
3. The accelerator of claim 1, wherein the PEs are arranged in a matrix array.
4. The accelerator according to any one of claims 1 to 3, wherein the connection states include an output state, a cascade state, and a disconnected state.
5. An FPGA-based acceleration system, comprising: a register, a computation controller, a memory, a transmitter, and the accelerator of any one of claims 1 to 4, wherein the computation controller is connected to the accelerator and the register, respectively, and the transmitter is connected to the accelerator and the memory, respectively.
6. A method of controlling an FPGA-based acceleration system, the method being based on the FPGA-based acceleration system of claim 5, comprising:
determining configuration information according to the acquired computing-power demand information, wherein the configuration information includes the number of PEs and the connection state of each MUX;
and sending the configuration information to the accelerator so that the accelerator can determine the connection states of the MUXes according to the configuration information.
7. An FPGA-based acceleration system, comprising: the accelerator of any one of claims 1 to 4, and further comprising a computing controller configured to: determine configuration information according to the acquired computing-power demand information, wherein the configuration information includes the number of PEs and the connection state of each MUX, and transmit the configuration information to the accelerator.
8. A CNN network system, comprising: the FPGA-based acceleration system of claim 5, and a processor coupled to the FPGA-based acceleration system.
9. The system of claim 8, wherein,
The processor is configured to: and determining configuration information according to the acquired calculation force demand information, wherein the configuration information comprises the number of PE and the connection state of each MUX, and transmitting the configuration information to the acceleration system based on the FPGA.
10. A control method of a CNN network system, the method being based on the system of claim 8, comprising:
acquiring computing-power demand information;
determining configuration information from the computing-power demand information and a preset configuration information table, wherein the configuration information includes the number of PEs and the connection state of each MUX;
and sending the configuration information to the FPGA-based acceleration system so that the FPGA-based acceleration system can determine the connection states of the MUXes according to the configuration information.
CN201811639964.5A 2018-12-29 2018-12-29 Accelerator, accelerating system based on FPGA, control method and CNN network system Active CN109523019B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811639964.5A CN109523019B (en) 2018-12-29 2018-12-29 Accelerator, accelerating system based on FPGA, control method and CNN network system


Publications (2)

Publication Number Publication Date
CN109523019A CN109523019A (en) 2019-03-26
CN109523019B true CN109523019B (en) 2024-05-21

Family

ID=65798565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811639964.5A Active CN109523019B (en) 2018-12-29 2018-12-29 Accelerator, accelerating system based on FPGA, control method and CNN network system

Country Status (1)

Country Link
CN (1) CN109523019B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931911B (en) * 2020-07-30 2022-07-08 山东云海国创云计算装备产业创新中心有限公司 CNN accelerator configuration method, system and device
CN114691346A (en) * 2020-12-25 2022-07-01 华为技术有限公司 Configuration method and equipment of computing power resources

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1271437A (en) * 1997-10-10 2000-10-25 博普斯公司 Method and apparatus for manifold array processing
JP2001312481A (en) * 2000-02-25 2001-11-09 Nec Corp Array type processor
CN1659540A (en) * 2002-06-03 2005-08-24 皇家飞利浦电子股份有限公司 Reconfigurable integrated circuit
CN1722130A (en) * 2004-07-12 2006-01-18 富士通株式会社 Reconfigurable operation apparatus
CN101576894A (en) * 2008-05-09 2009-11-11 中国科学院半导体研究所 Real-time image content retrieval system and image feature extraction method
CN105378651A (en) * 2013-05-24 2016-03-02 相干逻辑公司 Memory-network processor with programmable optimizations
CN106228238A (en) * 2016-07-27 2016-12-14 中国科学技术大学苏州研究院 The method and system of degree of depth learning algorithm is accelerated on field programmable gate array platform
CN107229463A (en) * 2016-03-24 2017-10-03 联发科技股份有限公司 Computing device and corresponding computational methods
CN107609641A (en) * 2017-08-30 2018-01-19 清华大学 Sparse neural network framework and its implementation
US9886072B1 (en) * 2013-06-19 2018-02-06 Altera Corporation Network processor FPGA (npFPGA): multi-die FPGA chip for scalable multi-gigabit network processing
CN108052449A (en) * 2017-12-14 2018-05-18 北京百度网讯科技有限公司 Operating system condition detection method and device
CN108228094A (en) * 2016-12-09 2018-06-29 英特尔公司 Access waits for an opportunity to increase in memory side cache
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN208283943U (en) * 2018-06-08 2018-12-25 南京信息工程大学 A kind of CNN acceleration optimization device based on FPGA
CN209118339U (en) * 2018-12-29 2019-07-16 百度在线网络技术(北京)有限公司 Accelerator, the acceleration system based on FPGA and CNN network system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE212007000102U1 (en) * 2007-09-11 2010-03-18 Core Logic, Inc. Reconfigurable array processor for floating-point operations
US9449257B2 (en) * 2012-12-04 2016-09-20 Institute Of Semiconductors, Chinese Academy Of Sciences Dynamically reconstructable multistage parallel single instruction multiple data array processing system
CN104734998B (en) * 2013-12-20 2018-11-06 华为技术有限公司 A kind of network equipment and information transferring method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Low-power high-level synthesis for FPGA architectures; Chen, D.M. et al.; ISLPED'03: Proceedings of the 2003 International Symposium on Low Power Electronics and Design; August 2003; vol. 2003; pp. 134-139 *
Design and implementation of a fine-grained parallel CYK algorithm accelerator based on FPGA; Xia Fei; Dou Yong; Song Jian; Lei Guoqing; Chinese Journal of Computers; 2010-05-15 (No. 05); full text *


Similar Documents

Publication Publication Date Title
US7478222B2 (en) Programmable pipeline array
US20170214405A1 (en) Clock Circuit and Clock Signal Transmission Method Thereof
CN109523019B (en) Accelerator, accelerating system based on FPGA, control method and CNN network system
US20150022236A1 (en) Apparatus and Methods for Time-Multiplex Field-Programmable Gate Arrays
WO2006039710A9 (en) Computer-based tool and method for designing an electronic circuit and related system and library for same
US9584128B2 (en) Structure of multi-mode supported and configurable six-input LUT, and FPGA device
CN108140067B (en) Method and system for circuit design optimization
JP5071707B2 (en) Data processing apparatus and control method thereof
US9292640B1 (en) Method and system for dynamic selection of a memory read port
Cardona et al. AC_ICAP: A flexible high speed ICAP controller
US20030040898A1 (en) Method and apparatus for simulation processor
US10489116B1 (en) Programmable integrated circuits with multiplexer and register pipelining circuitry
US11966736B2 (en) Interconnect device for selectively accumulating read data and aggregating processing results transferred between a processor core and memory
CN112970036B (en) Convolutional block array for implementing neural network applications and methods of use thereof
CN109857024B (en) Unit performance test method and system chip of artificial intelligence module
CN109196465B (en) Double precision floating point operation
CN209118339U (en) Accelerator, FPGA-based acceleration system, and CNN network system
EP3293883B1 (en) Programmable logic device, method for verifying error of programmable logic device, and method for forming circuit of programmable logic device
JP2005184262A (en) Semiconductor integrated circuit and its fabricating process
US20180076803A1 (en) Clock-distribution device of ic and method for arranging clock-distribution device
Malhotra et al. Novel field programmable embryonic cell for adder and multiplier
CN109271202B (en) Asynchronous Softmax hardware acceleration method and accelerator
US7861197B2 (en) Method of verifying design of logic circuit
Martínez-Alvarez et al. High performance implementation of an FPGA-based sequential DT-CNN
CN115564032A (en) Logic gate device, processing core and many-core system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant