CN112819022A - Image recognition device and image recognition method based on neural network


Info

Publication number
CN112819022A
CN112819022A (application CN201911132212.4A)
Authority
CN
China
Prior art keywords
image
control signal
processing
cache
feature map
Prior art date
Legal status
Granted
Application number
CN201911132212.4A
Other languages
Chinese (zh)
Other versions
CN112819022B (en)
Inventor
周培涛
陈志强
张丽
李元景
邓智
李波
唐虎
焦凌云
孙运达
邢宇翔
高河伟
焦亚涛
廖磊
付世航
岳小兵
Current Assignee
Tsinghua University
Nuctech Co Ltd
Original Assignee
Tsinghua University
Nuctech Co Ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University and Nuctech Co Ltd
Priority to CN201911132212.4A
Publication of CN112819022A
Application granted
Publication of CN112819022B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present disclosure discloses an image recognition apparatus based on a neural network, including: an image cache configured to receive and store an image; a feature map cache configured to receive and store a feature map; a weight cache configured to receive and store weights; a controller including a configuration information register and configured to receive configuration information, generate first and second control signals according to the configuration information, and transmit the first and second control signals; a preprocessing unit configured to obtain the image and the weights from the image cache and the weight cache, respectively, receive the first control signal from the controller, perform preprocessing on the image and the weights according to the first control signal to obtain a preprocessing result, and send the preprocessing result; and a processing unit array configured to receive the preprocessing result from the preprocessing unit, receive the second control signal from the controller, process the preprocessing result according to the second control signal to obtain a feature map, and send the feature map to the feature map cache.

Description

Image recognition device and image recognition method based on neural network
Technical Field
The present disclosure relates to the field of image recognition, and in particular, to an image recognition apparatus and an image recognition method based on a neural network.
Background
Image recognition is a typical application of deep neural networks (DNNs) in image processing, pattern recognition, and machine vision, and in recent years convolutional neural networks (CNNs) have brought great progress in the precision and accuracy of image recognition and classification. Artificial-intelligence recognition is deployed in two kinds of settings: cloud computing and edge computing. Cloud computing mainly relies on server-side general-purpose processors (CPUs) and graphics processing units (GPUs) for recognition. CPUs suffer from low speed, relatively high power consumption, and difficulty in coordinating multiple processors; GPUs have high power consumption and high cost and are not well suited to edge-computing applications.

With the development of the Internet of Things, cloud computing is not always efficient, especially for intelligent image recognition: the image data generated at the edge keeps growing and its transmission introduces large delays, so the computing model becomes more efficient if the data can be processed and analyzed at edge nodes. Among existing hardware-acceleration techniques for image recognition at the edge, some add floating-point units (DSPs) and ALUs for acceleration, but suffer from high power consumption and high cost; some use systolic arrays for convolution acceleration, but likewise suffer from high power consumption and high cost, and their overall speedup is limited because only the convolution is accelerated; others map the entire parallel processing sub-flow onto hardware, but their designs are complicated, some units are not simplified, and their generality is poor.

In the field of security inspection, where the required detection speed is high, the number of images to be recognized is large, and the available site space is limited, an image recognition apparatus and an image recognition method that increase speed, reduce power consumption, and offer good generality are needed.
BRIEF SUMMARY OF THE PRESENT DISCLOSURE
According to an aspect of an embodiment of the present disclosure, there is provided a neural network-based image recognition apparatus including:
an image cache configured to receive and store an image to be recognized;
a feature map cache configured to receive and store a feature map;
a weight cache configured to receive and store weights;
a controller including a configuration information register and configured to receive configuration information and store the configuration information in the configuration information register, generate a first control signal and a second control signal according to the configuration information, and transmit the first control signal and the second control signal;
a preprocessing unit configured to obtain the image to be recognized and the weights from the image cache and the weight cache, respectively, receive the first control signal from the controller, perform preprocessing on the image to be recognized and the weights according to the first control signal to obtain a preprocessing result, and transmit the preprocessing result; and
a processing unit array configured to receive the preprocessing result from the preprocessing unit, receive the second control signal from the controller, perform processing on the preprocessing result according to the second control signal to obtain a feature map, and send the feature map to the feature map cache.
In one embodiment, the pre-processing unit is further configured to: receiving the feature map from the feature map cache, determining whether the feature map is a final feature map according to the first control signal, if not, performing preprocessing on the feature map and the weights to obtain a preprocessing result, and sending the preprocessing result to the processing unit array; and if so, instructing the feature map cache to send the feature map.
In one embodiment, the controller further comprises a status register and a control state machine, wherein
The preprocessing unit is further configured to send a first state signal to the state register after completion of one of a plurality of operations in the preprocessing, receive a first state control signal from the control state machine, and perform a next operation of the plurality of operations after the one operation according to the first state control signal; and
the status register is configured to store the first status signal after receiving the first status signal, and the control state machine is configured to generate the first status control signal according to the first status signal stored in the status register and to transmit the first status control signal to the preprocessing unit.
In one embodiment, the processing unit array is further configured to send a second status signal to the status register after completion of one of a plurality of operations in the process, receive a second status control signal from the control state machine, and perform a next one of the plurality of operations after the one operation in accordance with the second status control signal; and
the status register is further configured to store the second status signal after receiving the second status signal, and the control state machine is further configured to generate the second status control signal from the second status signal stored in the status register and to send the second status control signal to the array of processing units.
In one embodiment, each of the plurality of processing units includes a multiplier array and an addition tree.
In one embodiment, the configuration information includes structural information of the neural network and a size of the image to be recognized, wherein the structural information of the neural network indicates one or more of convolution processing, activation processing, pooling processing, and full-connection processing to be performed, and
the processing unit array is further configured to perform processing on the pre-processing result according to the second control signal to obtain a feature map by:
sequentially performing the one or more of the convolution processing, the activation processing, the pooling processing, and the full-connection processing indicated by the structural information of the neural network on the preprocessing result according to the second control signal to obtain the feature map.
In one embodiment, the processing unit array comprises:
a plurality of processing units configured to perform the convolution processing and the full-connection processing;
a plurality of activation units configured to perform the activation processing; and
a plurality of pooling units configured to perform the pooling process.
In one embodiment, the preprocessing unit is further configured to perform preprocessing on the image to be recognized and the weights according to the first control signal by:
performing resizing and padding processing on the image to be recognized according to the structural information of the neural network and the size of the image to be recognized to obtain a processed image, and performing a blocking operation on the processed image and the weights.
In one embodiment, the feature map cache includes a plurality of buffers including a plurality of write buffers and a plurality of read buffers, and
wherein the controller is further configured to monitor a first number of the write buffer, among the plurality of write buffers, into which data is being written and a second number of the read buffer, among the plurality of read buffers, from which data is being read, and to issue an interrupt signal to interrupt an operation of the image recognition apparatus when a difference between the first number and the second number is less than a first threshold.
In one embodiment, the weight cache includes a plurality of buffers including a plurality of write buffers and a plurality of read buffers, and
the controller is further configured to monitor a first number of the write buffer, among the plurality of write buffers, into which data is being written and a second number of the read buffer, among the plurality of read buffers, from which data is being read, and to send an interrupt signal to interrupt an operation of the image recognition device when a difference between the first number and the second number is less than a second threshold.
In one embodiment, the pooling process includes maximum pooling or average pooling.
In one embodiment, the plurality of activation units perform the activation processing using a ReLU activation function, a Leaky ReLU activation function, a sigmoid activation function, or a tanh activation function.
In one embodiment, the image recognition apparatus further comprises: a general purpose processor configured to store the image to be identified and the weights, and to send the image to be identified to the image cache and the weights to the weight cache.
In one embodiment, the image cache receives the image to be identified from the general purpose processor via a first multi-channel direct memory access, MCDMA, and the weight cache receives the weights from the general purpose processor via a second MCDMA.
In one embodiment, the image recognition apparatus further comprises: a memory configured to store a first password,
wherein the general purpose processor is further configured to send a second password to the controller, and
the controller is further configured to, in response to receiving the second password from the general purpose processor, retrieve the first password from the memory and compare the first password to the second password, allow the image cache and the weight cache to receive data when the first password is the same as the second password, and prohibit the image cache and the weight cache from receiving data when the first password is different from the second password.
In one embodiment, the general purpose processor is further configured to receive the feature map from the feature map cache and perform classification and localization processing on the feature map to determine the category and location of the image to be identified.
According to another aspect of an embodiment of the present disclosure, there is provided a neural network-based image recognition method including:
step S1: receiving and storing an image to be identified by an image cache;
step S2: receiving and storing weights by a weight cache;
step S3: receiving, by a controller, configuration information, generating a first control signal and a second control signal according to the configuration information, and transmitting the first control signal and the second control signal;
step S4: obtaining, by a preprocessing unit, the image to be recognized and the weights from the image cache and the weight cache, respectively, receiving the first control signal from the controller, performing preprocessing on the image to be recognized and the weights according to the first control signal to obtain a preprocessing result, and transmitting the preprocessing result; and
step S5: receiving, by a processing unit array, the preprocessing result from the preprocessing unit, receiving the second control signal from the controller, processing the preprocessing result according to the second control signal to obtain a feature map, and sending the feature map to a feature map cache.
In one embodiment, the image recognition method further comprises:
step S6: receiving and storing, by a feature map cache, the feature map from the processing unit array;
step S7: obtaining, by the preprocessing unit, the feature map from the feature map cache; and
step S8: determining by the pre-processing unit whether the feature map is a final feature map according to the first control signal,
if not, the preprocessing unit performs preprocessing on the feature map and the weights to obtain a preprocessing result, and returns to the step S5; and
if so, the preprocessing unit instructs the feature map cache to send the final feature map.
In one embodiment, the image recognition method further comprises:
sending, by the pre-processing unit, a first status signal to the controller after completion of one of a plurality of operations in the pre-processing;
receiving, by the controller, the first state signal, generating a first state control signal according to the first state signal, and sending the first state control signal to the preprocessing unit; and
receiving, by the pre-processing unit, the first state control signal from the controller, and performing a next operation of the plurality of operations after the one operation according to the first state control signal.
In one embodiment, the image recognition method further comprises:
sending, by the processing unit array, a second status signal to the controller after completion of one of a plurality of operations in the process;
receiving, by the controller, the second state signal, generating a second state control signal according to the second state signal, and transmitting the second state control signal to the processing unit array; and
receiving, by the processing unit array, the second state control signal from the controller, and performing a next operation of the plurality of operations after the one operation according to the second state control signal.
In one embodiment, the configuration information includes structural information of the neural network and a size of the image to be recognized, wherein the structural information of the neural network indicates one or more of convolution processing, activation processing, pooling processing, and full-connection processing to be performed, and performing processing on the preprocessing result according to the second control signal to obtain the feature map includes:
sequentially performing the one or more of convolution processing, activation processing, pooling processing, and full-connection processing indicated by the structural information of the neural network on the preprocessing result according to the second control signal to obtain the feature map.
In one embodiment, performing pre-processing on the image to be recognized and the weights according to the first control signal comprises:
performing resizing and padding processing on the image to be recognized according to the structural information of the neural network and the size of the image to be recognized to obtain a processed image, and performing a blocking operation on the processed image and the weights.
In one embodiment, the feature map cache includes a plurality of buffers including a plurality of write buffers and a plurality of read buffers, and
wherein the image recognition method further comprises: monitoring a first number of a buffer area in which data is being written in the plurality of writing buffer areas and a second number of a reading buffer area in which data is being read in the plurality of reading buffer areas, and sending an interrupt signal to interrupt the image identification method when the difference between the first number and the second number is smaller than a first threshold value.
In one embodiment, the weight cache includes a plurality of buffers including a plurality of write buffers and a plurality of read buffers, and
wherein the image recognition method further comprises: monitoring a first number of a buffer area in which data is being written in the plurality of writing buffer areas and a second number of a reading buffer area in which data is being read in the plurality of reading buffer areas, and sending an interrupt signal to interrupt the image identification method when the difference between the first number and the second number is smaller than a second threshold value.
In one embodiment, the pooling process includes maximum pooling or average pooling.
In one embodiment, the activation processing is performed using a ReLU activation function, a Leaky ReLU activation function, a sigmoid activation function, or a tanh activation function.
In one embodiment, the image recognition method further comprises: storing, by a general purpose processor, the image to be identified and the weights, and sending the image to be identified to the image cache and the weights to the weight cache.
In one embodiment, receiving, by the image cache, the image to be identified comprises: the image cache receives the image to be recognized from the general purpose processor through a first multi-channel direct memory access (MCDMA), and
receiving, by the weight cache, the weights comprises: receiving, by a weight cache, the weight from the general purpose processor through a second MCDMA.
In one embodiment, the image recognition method further comprises:
sending, by the general purpose processor, a second password to the controller, and
In response to receiving the second password from the general purpose processor, obtaining, by the controller, a first password from a memory and comparing the second password to the first password, allowing the image cache and the weight cache to receive data when the first password is the same as the second password, and prohibiting the image cache and the weight cache from receiving data when the first password is different from the second password.
In one embodiment, the image recognition method further comprises:
receiving, by the general purpose processor, the feature map from the feature map cache, and performing classification and localization processing on the feature map to determine a category and a location of the image to be identified.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
fig. 1 shows a schematic diagram of a neural network-based image recognition apparatus according to an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of a multiplier according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of an adder, according to an embodiment of the disclosure;
FIG. 4 shows a schematic diagram of an activation function according to an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of a neural network-based image recognition method, according to an embodiment of the present disclosure;
FIG. 6 shows a schematic diagram comparing the effect of image recognition using a neural network based image recognition method according to an embodiment of the present disclosure with the prior art; and
fig. 7 shows a schematic diagram of a neural network-based image recognition system, in accordance with an embodiment of the present disclosure.
The figures do not show all of the circuitry or structures of the embodiments. The same reference numbers will be used throughout the drawings to refer to the same or like parts or features.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, the terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Fig. 1 shows a schematic diagram of a neural network-based image recognition apparatus 100 according to an embodiment of the present disclosure. The neural network-based image recognition apparatus 100 may include an image cache 101, a feature map cache 102, a weight cache 103, a controller 104, a preprocessing unit 105, and a processing unit array 106.
Image cache 101 may be configured to receive and store images to be identified. The feature map cache 102 may be configured to receive and store a feature map. The feature map cache 102 may include a plurality of (typically 4 to 16) buffers, which may include a plurality of write buffers and a plurality of read buffers (e.g., FIFO1_a, FIFO2_a, …, FIFOn-1_a, FIFOn_a). In one embodiment, read-write switching among the buffers may be performed in a ping-pong manner, so that a predetermined number of buffers always lie between the buffer into which data is being written and the buffer from which data is being read. This guarantees that no read-write conflicts occur (a buffer is not written while it is being read, and not read while it is being written), keeps the entire data stream uninterrupted, and reduces the total data-transfer time. When a buffer is about to become empty or is already empty, its read priority is higher than its write priority; in other situations the opposite holds.
The weight cache 103 may be configured to receive and store the weights (or FFT tap coefficients). The weight cache 103 may include a plurality of (typically 4 to 16) buffers, including a plurality of write buffers and a plurality of read buffers (e.g., FIFO1_b, FIFO2_b, …, FIFOn-1_b, FIFOn_b). In one embodiment, read-write switching among the buffers may be performed in a ping-pong manner, so that a predetermined number of buffers always lie between the buffer into which data is being written and the buffer from which data is being read. This guarantees that no read-write conflicts occur (a buffer is not written while it is being read, and not read while it is being written), keeps the entire data stream uninterrupted, and reduces the total data-transfer time. When a buffer is about to become empty or is already empty, its read priority is higher than its write priority; in other situations the opposite holds.
To perform the ping-pong read-write switching described above among the plurality of buffers in the feature map cache 102, the controller 104 may be configured to monitor a first number of the buffer, among the plurality of write buffers in the feature map cache 102, into which data is being written and a second number of the buffer, among the plurality of read buffers, from which data is being read, and to issue an interrupt signal to interrupt the operation of the image recognition apparatus 100 when the difference between the first number and the second number is less than a first threshold (e.g., 1). Likewise, to perform the ping-pong read-write switching among the plurality of buffers in the weight cache 103, the controller 104 may be further configured to monitor a first number of the buffer, among the plurality of write buffers in the weight cache 103, into which data is being written and a second number of the buffer, among the plurality of read buffers, from which data is being read, and to issue an interrupt signal to interrupt the operation of the image recognition apparatus 100 when the difference between the first number and the second number is less than a second threshold (e.g., 1).
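By way of illustration only, the ping-pong switching and the interrupt condition monitored by the controller 104 can be modelled by the following Python sketch. All class and function names are assumptions introduced here, not part of the disclosure; the hardware uses FIFOs, whereas the sketch only mirrors the indexing and threshold logic.

```python
from collections import deque

class PingPongCache:
    """Sketch of a multi-buffer cache switched in a ping-pong fashion."""

    def __init__(self, num_buffers=8, threshold=1):
        self.fifos = [deque() for _ in range(num_buffers)]
        self.write_idx = 0          # "first number": buffer being written
        self.read_idx = 0           # "second number": buffer being read
        self.threshold = threshold

    def spacing(self):
        # Distance (around the ring of buffers) between writer and reader.
        return (self.write_idx - self.read_idx) % len(self.fifos)

    def write_block(self, data):
        self.fifos[self.write_idx].extend(data)
        self.write_idx = (self.write_idx + 1) % len(self.fifos)

    def read_block(self):
        block = list(self.fifos[self.read_idx])
        self.fifos[self.read_idx].clear()
        self.read_idx = (self.read_idx + 1) % len(self.fifos)
        return block

def controller_monitor(cache: PingPongCache):
    """Raise the interrupt described above when the writer and reader get too close."""
    if cache.spacing() < cache.threshold:
        raise RuntimeError("interrupt: read/write buffers too close, pausing pipeline")

# The writer runs several buffers ahead of the reader before reading starts.
cache = PingPongCache(num_buffers=8, threshold=1)
for blk in ([1, 2, 3], [4, 5, 6], [7, 8, 9]):
    cache.write_block(blk)
print(cache.read_block())   # [1, 2, 3]
controller_monitor(cache)   # spacing is 2, no interrupt raised
```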
The controller 104 may include a configuration information register and may be configured to receive configuration information (e.g., from the general-purpose processor 107, as described below), store it in the configuration information register, generate first and second control signals according to the configuration information, and transmit the first and second control signals. The configuration information may include the structural information of the neural network and the size of the image to be recognized; the structural information may indicate one or more processes to be performed among convolution processing, activation processing, pooling processing, full-connection processing, and other processing (e.g., Element-wise, Depthwise, scaled constants, Deconvolution, Dropout, Permute, etc.), as well as whether the data is ready, whether batch normalization (BN) and regularization are performed, the size of the convolution kernel, the number of channels per layer, and the like.
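For clarity, the content of the configuration information and the derivation of the first and second control signals may be pictured with the following illustrative Python sketch; the field and function names are assumptions, and the actual register layout is not specified in this disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class LayerConfig:
    """One entry of the network-structure information (illustrative fields only)."""
    op: str                      # 'conv', 'fc', 'depthwise', 'deconv', ...
    kernel_size: int = 3
    in_channels: int = 1
    out_channels: int = 1
    activation: str = "relu"     # 'relu', 'leaky_relu', 'sigmoid', 'tanh' or ''
    pooling: str = ""            # 'max', 'avg' or ''
    batch_norm: bool = False

@dataclass
class ConfigInfo:
    image_size: Tuple[int, int, int]              # H, W, C of the image to be recognized
    layers: List[LayerConfig] = field(default_factory=list)
    data_ready: bool = False

def generate_control_signals(cfg: ConfigInfo):
    """Toy stand-in for the controller: derive a 'first' signal for the
    pre-processing unit and a 'second' signal for the processing-unit array."""
    first = {"image_size": cfg.image_size,
             "final_layer_index": len(cfg.layers) - 1}
    second = [(l.op, l.kernel_size, l.activation, l.pooling) for l in cfg.layers]
    return first, second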
The preprocessing unit 105 may be configured to obtain the image to be recognized and the weight from the image buffer 101 and the weight buffer 103, respectively, receive the first control signal from the controller 104, and perform preprocessing on the image to be recognized and the weight according to the first control signal to obtain a preprocessing result, and transmit the preprocessing result.
The processing unit array 106 may be configured to receive the pre-processing result from the pre-processing unit 105, receive a second control signal from the controller 104, perform processing on the pre-processing result according to the second control signal to obtain a feature map, and send the feature map (e.g., directly or through the pre-processing unit 105) to the feature map buffer 102.
The pre-processing unit 105 may be further configured to: receiving the feature map from the feature map buffer 102, determining whether the feature map is a final feature map according to the first control signal, if not, performing preprocessing on the feature map and the weights to obtain a preprocessing result, and transmitting the preprocessing result to the processing unit array 106; if so, the feature map cache 102 is instructed to send the feature map.
The controller 104 may also include a status register and a control state machine. The preprocessing unit 105 may be further configured to send a first state signal to the status register after completing one of the plurality of operations in the preprocessing, receive a first state control signal from the control state machine, and perform, according to the first state control signal, the next operation among the plurality of operations. The status register may be configured to store the first state signal after receiving it, and the control state machine may be configured to generate the first state control signal according to the first state signal and send it to the preprocessing unit 105.
The processing unit array 106 may be further configured to send a second state signal to the status register after completing one of the plurality of operations in the processing (e.g., directly or through the preprocessing unit 105), receive a second state control signal from the control state machine, and perform, according to the second state control signal, the next operation among the plurality of operations. The status register may be further configured to store the second state signal after receiving it, and the control state machine may be further configured to generate the second state control signal from the second state signal and send it to the processing unit array 106 (e.g., directly or through the preprocessing unit 105).
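The handshake between a datapath unit, the status register, and the control state machine can be sketched as follows. This is an illustrative Python model with assumed names; the hardware uses register writes rather than queues.

```python
import queue
import threading

status_register = queue.Queue()    # written by the datapath units
control_signals = queue.Queue()    # written by the control state machine

def control_state_machine(num_steps):
    """Consume a status signal, emit the matching state-control signal that
    lets the unit start its next operation."""
    for _ in range(num_steps):
        done_step = status_register.get()           # e.g. "preproc:resize done"
        control_signals.put(f"start next after {done_step}")

def preprocessing_unit(operations):
    for op in operations:
        # ... perform the operation (resize, pad, block, align) ...
        status_register.put(f"preproc:{op} done")   # first state signal
        go = control_signals.get()                  # first state control signal
        assert go.startswith("start next")

ops = ["resize", "pad", "block", "align"]
threading.Thread(target=control_state_machine, args=(len(ops),)).start()
preprocessing_unit(ops)
```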
The processing unit array 106 may be further configured to perform processing on the preprocessing result according to the second control signal to obtain the feature map by sequentially performing, according to the second control signal, one or more of the convolution processing, activation processing, pooling processing, full-connection processing, and other processing indicated by the structural information of the neural network on the preprocessing result.
The processing unit array 106 may include a plurality of processing units 1061, a plurality of activation units 1062, and a plurality of pooling units 1063. Each processing unit 1061 may include a multiplier array (as shown in Fig. 2) and an adder tree (as shown in Fig. 3), and may perform the convolution processing and the full-connection processing. When a processing unit 1061 performs the convolution processing, the input vector is scanned with the first point of the convolution kernel to obtain a first result, then scanned with the second point of the convolution kernel to obtain a second result, and so on until the last point of the convolution kernel has completed its scan; all the partial results are then accumulated to obtain the convolution result. In this way the constraint on the convolution kernel size is relaxed, and the input data does not need to be repeatedly transferred to the on-chip buffer, which reduces the energy spent on data movement. Each activation unit 1062 may perform the activation processing using a ReLU activation function (as shown in Fig. 4), a Leaky ReLU activation function (as shown in Fig. 4), a sigmoid activation function, a tanh activation function, or the like. Each pooling unit 1063 may perform the pooling processing, such as maximum pooling or average pooling.
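The kernel-point scan-and-accumulate scheme described above is equivalent to decomposing the convolution into one full-image multiply per kernel point followed by accumulation, as the following behavioural Python sketch shows. It is a software model of the multiplier-array/adder-tree behaviour, not the hardware dataflow, and the function names are assumptions.

```python
import numpy as np

def conv2d_by_kernel_points(image, kernel):
    """Each kernel point scans the whole input once (the multiplier array);
    the per-point partial results are accumulated (the adder tree).
    Valid padding, stride 1, CNN-style cross-correlation convention."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    acc = np.zeros((oh, ow), dtype=float)
    for i in range(kh):
        for j in range(kw):
            # one "scan" of the input with a single kernel point
            acc += kernel[i, j] * image[i:i + oh, j:j + ow]
    return acc

# Activation functions named in the text (behavioural forms)
relu       = lambda x: np.maximum(x, 0)
leaky_relu = lambda x, a=0.01: np.where(x > 0, x, a * x)
sigmoid    = lambda x: 1.0 / (1.0 + np.exp(-x))

# Cross-check against a direct sliding-window convolution
img = np.arange(36, dtype=np.float64).reshape(6, 6)
ker = np.array([[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]])
ref = np.array([[np.sum(img[r:r+3, c:c+3] * ker) for c in range(4)] for r in range(4)])
assert np.allclose(conv2d_by_kernel_points(img, ker), ref)
```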
The preprocessing unit 105 may be further configured to perform the preprocessing on the image to be recognized and the weights according to the first control signal by performing resizing and padding processing on the image to be recognized, according to the structural information of the neural network and the size of the image to be recognized, to obtain a processed image, and performing a blocking operation on the processed image and the weights. In one example, when the blocking operation is performed and the processing units in the processing unit array 106 have a dimension of 16 × 16, a typical block size for the preprocessed image is N × 16, where N is determined by the sizes of the image cache and the feature map cache and by the size of the feature map to be output, dynamic blocking is performed according to the configuration information received by the controller 104, and a typical block size for the weights is 3 × 16. Furthermore, the preprocessing unit 105 may be further configured to perform an alignment operation on the processed image and the weights to obtain the preprocessing result for parallel convolution calculation, and to send the preprocessing result to the processing unit array 106.
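A behavioural sketch of this pre-processing chain is given below (Python). The nearest-neighbour resize, the block-row count, and all names are illustrative assumptions beyond the N × 16 and 3 × 16 block sizes mentioned above.

```python
import numpy as np

def preprocess(image, weights, target_hw, pe_width=16, block_rows=4):
    """Resize, pad, then block the image into (block_rows x 16) tiles and the
    weights (assumed already laid out as K x 16) into 3 x 16 tiles."""
    # 1. Resize (nearest neighbour) to the input size the network expects.
    th, tw = target_hw
    ridx = np.arange(th) * image.shape[0] // th
    cidx = np.arange(tw) * image.shape[1] // tw
    resized = image[np.ix_(ridx, cidx)]

    # 2. Pad with zeros so both dimensions divide evenly into the block sizes.
    ph = (-resized.shape[0]) % block_rows
    pw = (-resized.shape[1]) % pe_width
    padded = np.pad(resized, ((0, ph), (0, pw)))

    # 3. Block into tiles matching the 16-wide processing-unit array.
    H, W = padded.shape
    image_blocks = [padded[r:r + block_rows, c:c + pe_width]
                    for r in range(0, H, block_rows)
                    for c in range(0, W, pe_width)]

    # 4. Block the weights into 3 x 16 tiles.
    weight_blocks = [weights[r:r + 3] for r in range(0, weights.shape[0], 3)]
    return image_blocks, weight_blocks
```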
The image recognition apparatus 100 may further include a general-purpose processor 107 configured to store the image to be recognized and the weights, and to send the image to be recognized to the image cache 101 and the weights to the weight cache 103. The general-purpose processor 107 may be implemented with an ARM, RISC-V, MIPS, or similar core. Before the image recognition apparatus 100 receives the image to be recognized, the general-purpose processor calls the underlying functions to complete the hardware configuration. When the image recognition apparatus 100 receives an image to be recognized, the general-purpose processor 107 receives the image from an image capturing terminal (e.g., a camera or imaging apparatus) through a network (wired or wireless), USB, or the like, and receives the weights from an external memory. Alternatively, the general-purpose processor 107 may include a memory configured to store the weights; in that case it receives the image to be recognized from the image capturing terminal through a network, USB, or the like, receives the weights through the network or USB at system initialization, and may fine-tune and update the weights during use through a software or APP upgrade.
The image cache 101 may receive the image to be recognized from the general-purpose processor 107 through a first multi-channel direct memory access (MCDMA) via an interface (e.g., an AXI interface), and the weight cache 103 may receive the weights from the general-purpose processor 107 through a second MCDMA via an interface (e.g., an AXI interface). When implemented on an FPGA, the clock of the first and second MCDMAs is typically 300 MHz and the transfer throughput is typically 2.2 GB/s per DMA; when implemented on an ASIC, the clock is typically 1 GHz and the transfer throughput is typically 3 GB/s per DMA. Of course, embodiments of the present disclosure are not limited to using MCDMA to transfer data; alternatively, a bus burst transfer mode may be used directly.
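As a rough sanity check of these figures (the bus width is an assumption, not stated in this disclosure), the per-cycle transfer width can be estimated as follows:

```python
# Back-of-envelope check of the stated MCDMA figures.
fpga_clock_hz, fpga_throughput = 300e6, 2.2e9   # 300 MHz, 2.2 GB/s per DMA
asic_clock_hz, asic_throughput = 1e9, 3e9       # 1 GHz,   3 GB/s per DMA

print(fpga_throughput / fpga_clock_hz)  # ~7.3 bytes/cycle, consistent with a
                                        # 64-bit data path minus protocol overhead
print(asic_throughput / asic_clock_hz)  # ~3 bytes/cycle
```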
The image recognition apparatus 100 may further include: a memory 108 configured to store a first password.
The general-purpose processor 107 may be further configured to send a second password to the controller 104. The controller 104 may be further configured to, in response to receiving the second password from the general-purpose processor 107, retrieve the first password from the memory 108 and compare it with the second password, allow the image cache 101 and the weight cache 103 to receive data when the two passwords are the same, and prohibit the image cache 101 and the weight cache 103 from receiving data when they differ.
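A minimal sketch of this password gating, assuming nothing about the actual comparison hardware beyond an equality check (class and method names are illustrative):

```python
import hmac

class SecureLoadGate:
    """The controller compares the password sent by the general-purpose
    processor with the one stored in memory, and only then lets the image
    cache and weight cache accept data."""

    def __init__(self, stored_password: bytes):
        self._first_password = stored_password    # held in the memory (108)
        self.caches_enabled = False

    def check(self, second_password: bytes) -> bool:
        # constant-time comparison; the text only requires equality
        self.caches_enabled = hmac.compare_digest(self._first_password,
                                                  second_password)
        return self.caches_enabled

gate = SecureLoadGate(b"device-key")
assert gate.check(b"device-key") is True    # caches may receive data
assert gate.check(b"wrong-key") is False    # caches are prohibited
```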
The general-purpose processor 107 may be further configured to receive the feature map from the feature map cache 102 and perform classification and localization processing on it to determine the category and location of the image to be recognized.
In another example, the weight cache 103 may be further configured to receive and store FFT tap coefficients. The preprocessing unit 105 may be further configured to rearrange the processed image and the FFT tap coefficients according to the FFT structure, perform an alignment operation on the rearranged image and FFT tap coefficients to obtain a preprocessing result suitable for FFT calculation, and send the preprocessing result to the processing unit array 106; the processing unit array 106 may be further configured to perform FFT processing on the preprocessing result to obtain an FFT result.
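The "rearranging according to the FFT structure" step can be pictured as the bit-reversal reordering of a radix-2 FFT, whose butterfly stages reduce to multiplications by the tap (twiddle) coefficients followed by additions, i.e. the same multiplier/adder resources used for convolution. The following Python sketch is an illustrative software model only; the mapping onto the processing unit array is an assumption.

```python
import numpy as np

def bit_reverse_permute(x):
    """Reorder the samples into bit-reversed index order so the butterflies
    can run stage by stage."""
    n = len(x)
    bits = n.bit_length() - 1
    idx = [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]
    return x[idx]

def fft_radix2(x):
    """Iterative radix-2 FFT: each stage multiplies by tap (twiddle)
    coefficients and then adds/subtracts.  n must be a power of two."""
    a = bit_reverse_permute(np.asarray(x, dtype=complex))
    n = len(a)
    m = 2
    while m <= n:
        tw = np.exp(-2j * np.pi * np.arange(m // 2) / m)   # tap coefficients
        for k in range(0, n, m):
            u = a[k:k + m // 2].copy()
            t = tw * a[k + m // 2:k + m]
            a[k:k + m // 2] = u + t
            a[k + m // 2:k + m] = u - t
        m *= 2
    return a

x = np.random.rand(16)
assert np.allclose(fft_radix2(x), np.fft.fft(x))
```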
Fig. 5 shows a schematic diagram of a method 500 for performing neural network based image recognition using the neural network based image recognition apparatus 100 according to an embodiment of the present disclosure. The neural network based image recognition method 500 may include the following steps.
In step S1, the image cache 101 may receive and store the image to be recognized. Receiving, by the image cache 101, the image to be recognized may include: the image buffer 101 receives the image to be recognized from the general processor 107 through the first MCDMA.
In step S2, the weight cache 103 may receive and store the weights. The weight cache 103 may include a plurality of cache regions. These buffers may include multiple write buffers and multiple read buffers. Receiving the weights by weight cache 103 may include: the weights are received by the weight cache 103 from the general processor 107 via the second MCDMA.
Prior to steps S1 and S2, the image recognition method 500 may further include: storing, by the general-purpose processor 107, the image to be recognized and the weight, and transmitting the image to be recognized to the image buffer 101 and the weight to the weight buffer 103; and sending, by the general purpose processor 107, the second password to the controller 104, and in response to receiving the second password from the general purpose processor 107, obtaining, by the controller 104, the first password from the memory 108 and comparing the second password with the first password, allowing the image cache 101 and the weight cache 103 to receive data when the first password is the same as the second password, and prohibiting the image cache 101 and the weight cache 103 from receiving data when the first password is different from the second password.
In step S3, the controller may receive the configuration information (e.g., from the general purpose processor 107), generate the first control signal and the second control signal according to the configuration information, and transmit the first control signal and the second control signal. The configuration information may include structural information of the neural network, which may indicate one or more processes to be performed among convolution processing, activation processing, pooling processing, full connection processing, and other processes, whether data is ready, whether BN and regularization are performed, the size of a convolution kernel, the number of channels of each layer, and the like, and the size of an image to be recognized.
In step S4, the preprocessing unit 105 may obtain the image to be recognized and the weights from the image cache 101 and the weight cache 103, respectively, receive the first control signal from the controller 104, perform preprocessing on the image to be recognized and the weights according to the first control signal to obtain a preprocessing result, and send the preprocessing result to the processing unit array 106. The preprocessing performed by the preprocessing unit 105 on the image to be recognized and the weights according to the first control signal may include: performing resizing and padding processing on the image to be recognized, according to the structural information of the neural network and the size of the image to be recognized, to obtain a processed image, and performing a blocking operation on the processed image and the weights.
In step S5, the processing unit array 106 may receive the preprocessing result from the preprocessing unit 105, receive the second control signal from the controller 104, perform processing on the preprocessing result according to the second control signal to obtain a feature map, and send the feature map to the feature map cache 102. Performing the processing on the preprocessing result according to the second control signal to obtain the feature map may include: sequentially performing, according to the second control signal, one or more of the convolution processing, activation processing, pooling processing, full-connection processing, and other processing indicated by the structural information of the neural network on the preprocessing result.
In step S6, the feature map cache 102 may receive and store the feature map from the processing element array 106. The profile cache 102 may include a plurality of buffers, which may include a plurality of write buffers and a plurality of read buffers.
In step S7, the preprocessing unit 105 may obtain the feature map from the feature map cache 102.
In step S8, the preprocessing unit 105 determines whether the feature map is the final feature map according to the first control signal, and if not, in step S9, the preprocessing unit 105 performs preprocessing on the feature map and the weights to obtain a preprocessing result, and returns to step S5; and if so, the preprocessing unit 105 instructs the feature map cache 102 to send the final feature map to the general-purpose processor 107 at step S10.
After the final feature map is obtained in step S10, in step S11 the general-purpose processor 107 receives the feature map from the feature map cache 102 and performs classification and localization processing on it (e.g., using softmax, concat, etc.) to determine the category and location of the image to be recognized.
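Putting steps S1 through S11 together, an end-to-end behavioural sketch (Python) looks like the following. The toy softmax classifier, the 2 × 2 pooling, and all names are illustrative assumptions, not the claimed hardware flow.

```python
import numpy as np

def _conv_valid(x, k):
    # valid convolution by the kernel-point decomposition described earlier
    oh, ow = x.shape[0] - k.shape[0] + 1, x.shape[1] - k.shape[1] + 1
    return sum(k[i, j] * x[i:i + oh, j:j + ow]
               for i in range(k.shape[0]) for j in range(k.shape[1]))

def run_image_recognition(image, weights_per_layer, cfg_layers):
    """The feature map produced in S5 is fed back through pre-processing
    (S7-S9) until the last layer; the final map goes to the general-purpose
    processor, modelled here by a softmax stand-in for S11."""
    feature_map = image                                      # S1, S4
    for layer, w in zip(cfg_layers, weights_per_layer):      # S5-S9 loop
        if layer["op"] == "conv":
            feature_map = _conv_valid(feature_map, w)
        if layer.get("activation") == "relu":
            feature_map = np.maximum(feature_map, 0)
        if layer.get("pooling") == "max":                    # 2x2 max pooling
            h = feature_map.shape[0] // 2 * 2
            v = feature_map.shape[1] // 2 * 2
            fm = feature_map[:h, :v]
            feature_map = np.maximum.reduce([fm[0::2, 0::2], fm[0::2, 1::2],
                                             fm[1::2, 0::2], fm[1::2, 1::2]])
    logits = feature_map.flatten()[:10]                      # S10: final map out
    scores = np.exp(logits - logits.max())
    return scores / scores.sum()                             # S11: "classification"

probs = run_image_recognition(
    np.random.rand(28, 28),
    [np.random.rand(3, 3), np.random.rand(3, 3)],
    [{"op": "conv", "activation": "relu", "pooling": "max"},
     {"op": "conv", "activation": "relu"}])
assert np.isclose(probs.sum(), 1.0)
```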
In one embodiment according to the present disclosure, the image recognition method 500 may further include: monitoring a first number of the buffer, among the plurality of write buffers in the feature map cache 102, into which data is being written and a second number of the buffer, among the plurality of read buffers in the feature map cache 102, from which data is being read, and issuing an interrupt signal to interrupt the image recognition method when the difference between the first number and the second number is less than a first threshold. The image recognition method 500 may further include: monitoring a first number of the buffer, among the plurality of write buffers in the weight cache 103, into which data is being written and a second number of the buffer, among the plurality of read buffers in the weight cache 103, from which data is being read, and issuing an interrupt signal to interrupt the image recognition method when the difference between the first number and the second number is less than a second threshold.
In one embodiment according to the present disclosure, the image recognition method 500 may further include: transmitting, by the preprocessing unit 105, a first status signal to the controller 104 after completion of one of the plurality of operations in the preprocessing; receiving, by the controller 104, the first state signal, generating a first state control signal according to the first state signal, and sending the first state control signal to the preprocessing unit 105; and receiving, by the preprocessing unit 105, the first state control signal from the controller, and performing a next operation after the one operation among the plurality of operations according to the first state control signal.
In one embodiment according to the present disclosure, the image recognition method 500 may further include: sending, by the processing unit array 106, a second state signal to the controller 104 after completion of one of the plurality of operations in the processing; receiving, by the controller 104, the second state signal, generating a second state control signal according to the second state signal, and sending the second state control signal to the processing unit array 106; and receiving, by the processing unit array 106, the second state control signal from the controller 104, and performing, according to the second state control signal, the next operation among the plurality of operations.
Fig. 6 illustrates a schematic diagram comparing the effect of image recognition using a neural network-based image recognition method according to an embodiment of the present disclosure with the prior art. As can be seen from Fig. 6, image recognition using the neural network-based image recognition method achieves classification and localization of suspicious objects in the image to be recognized with low power consumption and in a shorter operation time than the prior art.
According to the embodiments of the present disclosure, the characteristics of the neural network are fully exploited for hardware acceleration, so that image recognition at the edge can be accelerated, the operation speed increased, and high energy efficiency and good generality achieved. The hardware-accelerated image recognition can be applied to various scenarios, such as intelligent security-inspection recognition terminals, intelligent boxes, intelligent cameras, face recognition, and object detection, to increase the operation speed and improve the recognition efficiency. The entire network configuration information is transferred to the configuration information register in the controller via the bus, rather than requiring configuration at each step, which improves efficiency. The intermediate results of the whole network, including the per-layer results, are written back into the feature map cache in preparation for the next calculation, ensuring that the entire process runs as a pipeline.
Fig. 7 shows a schematic diagram of a neural network-based image recognition system, in accordance with an embodiment of the present disclosure. The system 700 may include a processor 710, such as a Digital Signal Processor (DSP). Processor 710 may be a single device or multiple devices for performing different acts of the processes described herein. System 700 may also include input/output (I/O) device 730 for receiving signals from or transmitting signals to other entities.
Further, system 700 may include a memory 720, the memory 720 may be of the form: non-volatile or volatile memory, such as electrically erasable programmable read-only memory (EEPROM), flash memory, and the like. Memory 720 may store computer readable instructions that, when executed by processor 710, may cause the processor to perform the actions described herein.
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable medium having instructions stored thereon for use by or in connection with an instruction execution system (e.g., one or more processors). In the context of this disclosure, a computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the instructions. For example, the computer readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the computer readable medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
The foregoing detailed description has set forth numerous embodiments of neural network-based image recognition methods, apparatus, and systems using schematics, flowcharts, and/or examples. Where such diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of structures, hardware, software, firmware, or virtually any combination thereof. In one embodiment, portions of the subject matter described in embodiments of the present disclosure may be implemented by Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), or other integrated formats. However, those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and or firmware would be well within the skill of one of skill in the art in light of this disclosure. In addition, those skilled in the art will appreciate that the mechanisms of the subject matter described herein are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the subject matter described herein applies regardless of the particular type of signal bearing media used to actually carry out the distribution. Examples of signal bearing media include, but are not limited to: recordable type media such as floppy disks, hard disk drives, Compact Disks (CDs), Digital Versatile Disks (DVDs), digital tape, computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).

Claims (30)

1. An image recognition apparatus based on a neural network, comprising:
an image cache configured to receive and store an image to be recognized;
a feature map cache configured to receive and store a feature map;
a weight cache configured to receive and store weights;
a controller including a configuration information register and configured to receive configuration information and store the configuration information in the configuration information register, generate a first control signal and a second control signal according to the configuration information, and transmit the first control signal and the second control signal;
a preprocessing unit configured to obtain the image to be recognized and the weights from the image cache and the weight cache, respectively, receive the first control signal from the controller, perform preprocessing on the image to be recognized and the weights according to the first control signal to obtain a preprocessing result, and transmit the preprocessing result; and
a processing unit array configured to receive the preprocessing result from the preprocessing unit, receive the second control signal from the controller, perform processing on the preprocessing result according to the second control signal to obtain a feature map, and send the feature map to the feature map cache.
2. The image recognition device of claim 1, wherein the pre-processing unit is further configured to: receiving the feature map from the feature map cache, determining whether the feature map is a final feature map according to the first control signal, if not, performing preprocessing on the feature map and the weights to obtain a preprocessing result, and sending the preprocessing result to the processing unit array; and if so, instructing the feature map cache to send the feature map.
3. The image recognition device of claim 2, wherein the controller further comprises a status register and a control state machine, wherein
the preprocessing unit is further configured to send a first status signal to the status register after completion of one of a plurality of operations in the preprocessing, receive a first state control signal from the control state machine, and perform a next operation of the plurality of operations after the one operation according to the first state control signal; and
the status register is configured to store the first status signal after receiving the first status signal, and the control state machine is configured to generate the first state control signal according to the first status signal stored in the status register and to transmit the first state control signal to the preprocessing unit.
4. The image recognition apparatus according to claim 3, wherein
the processing unit array is further configured to send a second status signal to the status register after completion of one of a plurality of operations in the processing, receive a second state control signal from the control state machine, and perform a next operation of the plurality of operations after the one operation according to the second state control signal; and
the status register is further configured to store the second status signal after receiving the second status signal, and the control state machine is further configured to generate the second state control signal according to the second status signal stored in the status register and to send the second state control signal to the processing unit array.
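A rough behavioural model of the status-register/control-state-machine handshake in claims 3 and 4 is given below; the Controller class, the dictionary encoding of the state control signal, and run_operations are all assumed for illustration and do not describe the actual circuit:

```python
# Behavioural sketch of the claimed handshake (hypothetical encodings).
class Controller:
    def __init__(self):
        self.status_register = None

    def on_status(self, status_signal):
        self.status_register = status_signal       # status register stores the signal
        # Control state machine: derive the state control signal for the next step.
        return {"action": "proceed", "next_index": status_signal + 1}

def run_operations(operations, controller):
    """Each unit reports a status signal after every operation and waits for the
    corresponding state control signal before starting the next operation."""
    for index, operation in enumerate(operations):
        operation()                                # one of the plurality of operations
        state_control = controller.on_status(index)
        if state_control["action"] != "proceed":
            break

run_operations([lambda: None, lambda: None], Controller())
```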
5. The image recognition device of claim 2, wherein the processing unit array comprises a plurality of processing units, and each of the plurality of processing units comprises a multiplier array and an adder tree.
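Functionally, the multiplier array and adder tree of claim 5 amount to a parallel multiply followed by a tree reduction; the small numerical sketch below is a software analogue, not the RTL:

```python
# Software analogue of a multiplier array feeding an adder tree.
def multiplier_array(inputs, weights):
    return [x * w for x, w in zip(inputs, weights)]   # all products computed in parallel in hardware

def adder_tree(products):
    while len(products) > 1:
        if len(products) % 2:                         # pad odd lengths with a zero
            products = products + [0]
        products = [products[i] + products[i + 1] for i in range(0, len(products), 2)]
    return products[0]

# Example: a length-4 dot product, the core of convolution and full-connection layers.
assert adder_tree(multiplier_array([1, 2, 3, 4], [5, 6, 7, 8])) == 70
```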
6. The image recognition apparatus according to claim 2, wherein the configuration information includes structural information of the neural network and a size of the image to be recognized, wherein the structural information of the neural network indicates one or more of convolution processing, activation processing, pooling processing, and full-connection processing to be performed, and
the processing unit array is further configured to perform processing on the preprocessing result according to the second control signal to obtain the feature map by:
sequentially performing the one or more of the convolution processing, the activation processing, the pooling processing, and the full-connection processing indicated by the structural information of the neural network on the preprocessing result according to the second control signal to obtain the feature map.
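One way to read claim 6 is that each layer descriptor in the structural information selects a subset of the four operation types, applied in a fixed order. A hedged sketch follows; the layer_info/ops dictionaries are placeholders, not the configuration format of the disclosure:

```python
# Hypothetical per-layer driver selecting operations from the structural information.
def apply_layer(pre_result, layer_info, ops):
    """ops maps operation names to callables, e.g.
    {"conv": ..., "act": ..., "pool": ..., "fc": ...};
    layer_info["sequence"] lists the subset this layer uses, in order."""
    data = pre_result
    for name in layer_info["sequence"]:          # e.g. ["conv", "act", "pool"]
        data = ops[name](data)
    return data
```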
7. The image recognition device of claim 6, wherein the processing unit array comprises:
a plurality of processing units configured to perform the convolution processing and the full-connection processing;
a plurality of activation units configured to perform the activation processing; and
a plurality of pooling units configured to perform the pooling processing.
8. The image recognition apparatus according to claim 6, wherein the preprocessing unit is further configured to perform preprocessing on the image to be recognized and the weights according to the first control signal by:
performing resizing and padding processing on the image to be recognized according to the structural information of the neural network and the size of the image to be recognized to obtain a processed image, and performing a blocking operation on the processed image and the weights.
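A rough NumPy sketch of the resize-pad-block preprocessing of claim 8 follows; the nearest-neighbour resize, the single-pixel padding, and the tile size are assumptions chosen only to make the example concrete:

```python
import numpy as np

def preprocess_image(img, target_hw, pad, tile):
    """Resize (nearest neighbour), zero-pad, then split into tiles small enough
    for the on-chip caches; `tile` is a hypothetical block edge length."""
    th, tw = target_hw
    rows = np.arange(th) * img.shape[0] // th
    cols = np.arange(tw) * img.shape[1] // tw
    resized = img[rows][:, cols]                       # nearest-neighbour resize
    padded = np.pad(resized, pad, mode="constant")     # zero padding
    blocks = [padded[r:r + tile, c:c + tile]           # blocking operation
              for r in range(0, padded.shape[0], tile)
              for c in range(0, padded.shape[1], tile)]
    return blocks

blocks = preprocess_image(np.ones((100, 120)), target_hw=(64, 64), pad=1, tile=33)
```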
9. The image recognition device of claim 2, wherein the feature map cache comprises a plurality of cache regions including a plurality of write cache regions and a plurality of read cache regions, and
wherein the controller is further configured to monitor a first number of a write cache region in the plurality of write cache regions to which data is being written and a second number of a read cache region in the plurality of read cache regions from which data is being read, and issue an interrupt signal to interrupt an operation of the image recognition device when a difference between the first number and the second number is less than a first threshold.
10. The image recognition device of claim 2, wherein the weight cache comprises a plurality of cache regions including a plurality of write cache regions and a plurality of read cache regions, and
wherein the controller is further configured to monitor a first number of a write cache region in the plurality of write cache regions to which data is being written and a second number of a read cache region in the plurality of read cache regions from which data is being read, and issue an interrupt signal to interrupt an operation of the image recognition device when a difference between the first number and the second number is less than a second threshold.
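Claims 9 and 10 describe flow control over ping-pong cache regions: the controller tracks how far the region being written is ahead of the region being read and raises an interrupt when the margin falls below a threshold. A simplified model, with the region-numbering scheme assumed:

```python
# Hypothetical monitor over the write/read cache-region numbers.
def check_cache_margin(write_region_no, read_region_no, threshold):
    """Return True when an interrupt should be raised, i.e. when the region
    being written is no longer sufficiently ahead of the region being read."""
    return (write_region_no - read_region_no) < threshold

# Example: writer on region 5, reader on region 4, threshold 2 -> interrupt.
assert check_cache_margin(5, 4, threshold=2) is True
assert check_cache_margin(7, 4, threshold=2) is False
```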
11. The image recognition device of claim 7, wherein the pooling processing includes maximum pooling or average pooling.
12. The image recognition apparatus according to claim 7, wherein the plurality of activation units perform the activation processing using a ReLU activation function, a leaky ReLU activation function, a sigmoid activation function, or a tanh activation function.
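The activation and pooling options of claims 11 and 12 have well-known reference definitions; the NumPy forms below are for orientation only (a hardware implementation would typically use fixed-point approximations):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pool2d(x, k=2, mode="max"):
    """Max or average pooling with a k x k window and stride k."""
    h, w = x.shape[0] // k * k, x.shape[1] // k * k
    patches = x[:h, :w].reshape(h // k, k, w // k, k)
    return patches.max(axis=(1, 3)) if mode == "max" else patches.mean(axis=(1, 3))
```

(The tanh activation is available directly as np.tanh.)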
13. The image recognition device according to claim 2, further comprising: a general purpose processor configured to store the image to be recognized and the weights, and to send the image to be recognized to the image cache and the weights to the weight cache.
14. The image recognition device of claim 13, wherein the image cache receives the image to be recognized from the general purpose processor through a first multi-channel direct memory access (MCDMA) and the weight cache receives the weights from the general purpose processor through a second MCDMA.
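Multi-channel DMA in claim 14 moves independent slices of the image and weight data concurrently. The thread-based toy below only illustrates the channel-splitting idea; it is not an MCDMA driver, and the contiguous chunking policy is an assumption:

```python
import threading

def mcdma_copy(src, dst, channels):
    """Toy model: split the buffer into `channels` contiguous slices and copy
    them concurrently, as separate DMA channels would."""
    step = (len(src) + channels - 1) // channels
    def copy_slice(lo, hi):
        dst[lo:hi] = src[lo:hi]
    threads = [threading.Thread(target=copy_slice, args=(i, min(i + step, len(src))))
               for i in range(0, len(src), step)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

image_cache = bytearray(1024)
mcdma_copy(bytes(range(256)) * 4, image_cache, channels=4)
```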
15. The image recognition device of claim 13, further comprising: a memory configured to store a first password,
wherein the general purpose processor is further configured to send a second password to the controller, and
the controller is further configured to, in response to receiving the second password from the general purpose processor, retrieve the first password from the memory and compare the first password to the second password, allow the image cache and the weight cache to receive data when the first password is the same as the second password, and prohibit the image cache and the weight cache from receiving data when the first password is different from the second password.
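The password check of claim 15 is a simple gate on the cache inputs; a sketch follows, with the allow_cache_writes flag standing in for the enable signals (an assumption, not a named signal of the disclosure):

```python
# Hypothetical model of the controller-side password gate.
class AccessGate:
    def __init__(self, stored_first_password):
        self._first = stored_first_password       # first password held in the memory
        self.allow_cache_writes = False

    def on_second_password(self, second_password):
        """Compare the received second password with the stored first password and
        enable or disable the image cache and weight cache inputs accordingly."""
        self.allow_cache_writes = (second_password == self._first)
        return self.allow_cache_writes

gate = AccessGate(stored_first_password=b"\x12\x34")
assert gate.on_second_password(b"\x12\x34") is True
assert gate.on_second_password(b"\xff\xff") is False
```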
16. The image recognition device of claim 13, wherein the general purpose processor is further configured to receive the feature map from the feature map cache and perform classification and localization processing on the feature map to determine a category and a location of the image to be recognized.
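Claim 16 leaves classification and localization to the general purpose processor; one common (here only assumed) realisation is a pooled classification head plus a box-regression head over the final feature map:

```python
import numpy as np

def classify_and_localize(feature_map, w_cls, w_box):
    """Toy post-processing head: global-average-pool the final feature map, then
    apply hypothetical classification and box-regression weight matrices."""
    pooled = feature_map.mean(axis=(1, 2))               # (channels,)
    scores = w_cls @ pooled                              # class logits
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                                 # softmax over categories
    box = w_box @ pooled                                 # (x, y, w, h) regression
    return int(np.argmax(probs)), box

fm = np.random.rand(8, 16, 16)                           # channels x height x width
category, location = classify_and_localize(fm, np.random.rand(5, 8), np.random.rand(4, 8))
```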
17. An image recognition method based on a neural network, comprising the following steps:
step S1: receiving and storing, by an image cache, an image to be recognized;
step S2: receiving and storing, by a weight cache, weights;
step S3: receiving, by a controller, configuration information, generating a first control signal and a second control signal according to the configuration information, and transmitting the first control signal and the second control signal;
step S4: obtaining, by a preprocessing unit, the image to be recognized and the weights from the image cache and the weight cache, respectively, receiving the first control signal from the controller, performing preprocessing on the image to be recognized and the weights according to the first control signal to obtain a preprocessing result, and transmitting the preprocessing result; and
step S5: receiving, by a processing unit array, the preprocessing result from the preprocessing unit, receiving the second control signal from the controller, performing processing on the preprocessing result according to the second control signal to obtain a feature map, and sending the feature map to a feature map cache.
18. The image recognition method of claim 17, further comprising:
step S6: receiving and storing, by the feature map cache, the feature map from the processing unit array;
step S7: obtaining, by the preprocessing unit, the feature map from the feature map cache; and
step S8: determining, by the preprocessing unit, whether the feature map is a final feature map according to the first control signal,
if not, the preprocessing unit performs preprocessing on the feature map and the weights to obtain a preprocessing result, and returns to step S5; and
if so, the preprocessing unit instructs the feature map cache to send the final feature map.
19. The image recognition method of claim 18, further comprising:
sending, by the preprocessing unit, a first status signal to the controller after completion of one of a plurality of operations in the preprocessing;
receiving, by the controller, the first status signal, generating a first state control signal according to the first status signal, and sending the first state control signal to the preprocessing unit; and
receiving, by the preprocessing unit, the first state control signal from the controller, and performing a next operation of the plurality of operations after the one operation according to the first state control signal.
20. The image recognition method of claim 18, further comprising:
sending, by the processing unit array, a second status signal to the controller after completion of one of a plurality of operations in the processing;
receiving, by the controller, the second status signal, generating a second state control signal according to the second status signal, and transmitting the second state control signal to the processing unit array; and
receiving, by the processing unit array, the second state control signal from the controller, and performing a next operation of the plurality of operations after the one operation according to the second state control signal.
21. The image recognition method of claim 18, wherein the configuration information includes structural information of the neural network and a size of the image to be recognized, wherein the structural information of the neural network indicates one or more of convolution processing, activation processing, pooling processing, and full-connection processing to be performed, and the performing of the processing on the preprocessing result according to the second control signal to obtain the feature map includes:
sequentially performing the one or more of the convolution processing, the activation processing, the pooling processing, and the full-connection processing indicated by the structural information of the neural network on the preprocessing result according to the second control signal to obtain the feature map.
22. The image recognition method of claim 18, wherein performing preprocessing on the image to be recognized and the weights according to the first control signal comprises:
performing resizing and padding processing on the image to be recognized according to the structural information of the neural network and the size of the image to be recognized to obtain a processed image, and performing a blocking operation on the processed image and the weights.
23. The image recognition method of claim 18, wherein the feature map cache comprises a plurality of cache regions, the plurality of cache regions including a plurality of write cache regions and a plurality of read cache regions, and
wherein the image recognition method further comprises: monitoring a first number of a write cache region in the plurality of write cache regions to which data is being written and a second number of a read cache region in the plurality of read cache regions from which data is being read, and issuing an interrupt signal to interrupt the image recognition method when a difference between the first number and the second number is less than a first threshold.
24. The image recognition method of claim 18, wherein the weight cache comprises a plurality of cache regions, the plurality of cache regions including a plurality of write cache regions and a plurality of read cache regions, and
wherein the image recognition method further comprises: monitoring a first number of a write cache region in the plurality of write cache regions to which data is being written and a second number of a read cache region in the plurality of read cache regions from which data is being read, and issuing an interrupt signal to interrupt the image recognition method when a difference between the first number and the second number is less than a second threshold.
25. The image recognition method of claim 21, wherein the pooling processing includes maximum pooling or average pooling.
26. The image recognition method of claim 21, wherein the activation processing is performed using a ReLU activation function, a leaky ReLU activation function, a sigmoid activation function, or a tanh activation function.
27. The image recognition method of claim 17, further comprising: storing, by a general purpose processor, the image to be recognized and the weights, and sending the image to be recognized to the image cache and the weights to the weight cache.
28. The image recognition method of claim 27, wherein receiving, by the image cache, the image to be recognized comprises: receiving, by the image cache, the image to be recognized from the general purpose processor through a first multi-channel direct memory access (MCDMA), and
receiving, by the weight cache, the weights comprises: receiving, by the weight cache, the weights from the general purpose processor through a second MCDMA.
29. The image recognition method of claim 27, further comprising:
sending, by the general purpose processor, a second password to the controller; and
in response to receiving the second password from the general purpose processor, obtaining, by the controller, a first password from a memory and comparing the second password with the first password, allowing the image cache and the weight cache to receive data when the first password is the same as the second password, and prohibiting the image cache and the weight cache from receiving data when the first password is different from the second password.
30. The image recognition method of claim 27, further comprising:
receiving, by the general purpose processor, the feature map from the feature map cache, and performing classification and localization processing on the feature map to determine a category and a location of the image to be recognized.
CN201911132212.4A 2019-11-18 2019-11-18 Image recognition device and image recognition method based on neural network Active CN112819022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911132212.4A CN112819022B (en) 2019-11-18 2019-11-18 Image recognition device and image recognition method based on neural network

Publications (2)

Publication Number Publication Date
CN112819022A true CN112819022A (en) 2021-05-18
CN112819022B CN112819022B (en) 2023-11-07

Family

ID=75852790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911132212.4A Active CN112819022B (en) 2019-11-18 2019-11-18 Image recognition device and image recognition method based on neural network

Country Status (1)

Country Link
CN (1) CN112819022B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929329A (en) * 2012-09-28 2013-02-13 无锡江南计算技术研究所 Method for dynamically reconfiguring interconnection network between systems-on-chip
CN107657581A (en) * 2017-09-28 2018-02-02 中国人民解放军国防科技大学 Convolutional neural network CNN hardware accelerator and acceleration method
US20180307974A1 (en) * 2017-04-19 2018-10-25 Beijing Deephi Intelligence Technology Co., Ltd. Device for implementing artificial neural network with mutiple instruction units
CN109379532A (en) * 2018-10-08 2019-02-22 长春理工大学 A kind of calculating imaging system and method
US20190114499A1 (en) * 2017-10-17 2019-04-18 Xilinx, Inc. Image preprocessing for generalized image processing
CN109740732A (en) * 2018-12-27 2019-05-10 深圳云天励飞技术有限公司 Neural network processor, convolutional neural networks data multiplexing method and relevant device
US20190164037A1 (en) * 2017-11-29 2019-05-30 Electronics And Telecommunications Research Institute Apparatus for processing convolutional neural network using systolic array and method thereof
CN109934339A (en) * 2019-03-06 2019-06-25 东南大学 A kind of general convolutional neural networks accelerator based on a dimension systolic array
CN109948774A (en) * 2019-01-25 2019-06-28 中山大学 Neural network accelerator and its implementation based on network layer binding operation
WO2019127838A1 (en) * 2017-12-29 2019-07-04 国民技术股份有限公司 Method and apparatus for realizing convolutional neural network, terminal, and storage medium
CN110135554A (en) * 2019-03-25 2019-08-16 电子科技大学 A kind of hardware-accelerated framework of convolutional neural networks based on FPGA
CN110390384A (en) * 2019-06-25 2019-10-29 东南大学 A kind of configurable general convolutional neural networks accelerator

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XUHAO CHEN: "Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs", arXiv, vol. 2019, pages 1-9 *
PANG WEI: "Design and Verification of an FPGA-based AdaBoost Face Detection Algorithm", China Master's Theses Full-text Database, Information Science and Technology, vol. 2018, no. 1, pages 138-1285 *

Also Published As

Publication number Publication date
CN112819022B (en) 2023-11-07

Similar Documents

Publication Publication Date Title
US11741345B2 (en) Multi-memory on-chip computational network
CN110537194B (en) Power efficient deep neural network processor and method configured for layer and operation protection and dependency management
US20180260710A1 (en) Calculating device and method for a sparsely connected artificial neural network
US11550543B2 (en) Semiconductor memory device employing processing in memory (PIM) and method of operating the semiconductor memory device
US20210065379A1 (en) Hardware-based optical flow acceleration
Pestana et al. A full featured configurable accelerator for object detection with YOLO
CN113051216B (en) MobileNet-SSD target detection device and method based on FPGA acceleration
CN111465943B (en) Integrated circuit and method for neural network processing
CN110574045B (en) Pattern matching for optimized deep network processing
US11775832B2 (en) Device and method for artificial neural network operation
US11948352B2 (en) Speculative training using partial gradients update
US20210350230A1 (en) Data dividing method and processor for convolution operation
CN112005251A (en) Arithmetic processing device
KR20210090260A (en) Lossy Sparse Load SIMD Instruction Family
CN109726809B (en) Hardware implementation circuit of deep learning softmax classifier and control method thereof
CN112200310B (en) Intelligent processor, data processing method and storage medium
CN111552652B (en) Data processing method and device based on artificial intelligence chip and storage medium
CN111813721B (en) Neural network data processing method, device, equipment and storage medium
CN115668222A (en) Data processing method and device of neural network
CN112819022B (en) Image recognition device and image recognition method based on neural network
US20200192797A1 (en) Caching data in artificial neural network computations
US20220318604A1 (en) Sparse machine learning acceleration
US20220391676A1 (en) Quantization evaluator
Bai et al. An OpenCL-based FPGA accelerator with the Winograd’s minimal filtering algorithm for convolution neuron networks
Zhang et al. DSP-based traffic target detection for intelligent transportation

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant