CN112819022A - Image recognition device and image recognition method based on neural network


Info

Publication number
CN112819022A
CN112819022A (application CN201911132212.4A)
Authority
CN
China
Prior art keywords
image
control signal
processing
cache
feature map
Prior art date
Legal status
Granted
Application number
CN201911132212.4A
Other languages
Chinese (zh)
Other versions
CN112819022B (en)
Inventor
周培涛
陈志强
张丽
李元景
邓智
李波
唐虎
焦凌云
孙运达
邢宇翔
高河伟
焦亚涛
廖磊
付世航
岳小兵
Current Assignee
Tsinghua University
Nuctech Co Ltd
Original Assignee
Tsinghua University
Nuctech Co Ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University and Nuctech Co Ltd
Priority to CN201911132212.4A
Publication of CN112819022A
Application granted
Publication of CN112819022B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present disclosure discloses an image recognition apparatus based on a neural network, including: an image cache configured to receive and store an image; a feature map cache configured to receive and store a feature map; a weight cache configured to receive and store weights; a controller including a configuration information register and configured to receive configuration information, generate first and second control signals according to the configuration information, and transmit the first and second control signals; a preprocessing unit configured to obtain the image and the weights from the image cache and the weight cache, respectively, receive the first control signal from the controller, perform preprocessing on the image and the weights according to the first control signal to obtain a preprocessing result, and send the preprocessing result; and a processing unit array configured to receive the preprocessing result from the preprocessing unit, receive the second control signal from the controller, process the preprocessing result according to the second control signal to obtain a feature map, and send the feature map to the feature map cache.

Description

Image recognition device and image recognition method based on neural network
Technical Field
The present disclosure relates to the field of image recognition, and in particular, to an image recognition apparatus and an image recognition method based on a neural network.
Background
Image recognition is a typical application of deep neural networks (DNNs) in image processing, pattern recognition, and machine vision, and in recent years convolutional neural networks (CNNs) have brought great progress in the precision and accuracy of image recognition and classification. Artificial-intelligence recognition is deployed in two kinds of settings: cloud computing and edge computing. Cloud computing mainly relies on server-side general-purpose processors (CPUs) and graphics processing units (GPUs) for recognition. CPUs suffer from low speed, relatively high power consumption, and difficulty in coordinating multiple processors; GPUs have high power consumption and high cost and are not well suited to edge-computing applications.

With the development of the Internet of Things, cloud computing is not always efficient, especially for intelligent image recognition: the image data generated at the edge keeps growing and its transmission introduces large delays, so the computing model becomes more efficient if the data can be processed and analyzed at edge nodes. Among existing hardware-acceleration techniques for image recognition at the edge, some add floating-point units (DSPs) and ALUs for acceleration, but suffer from high power consumption and high cost; some use systolic arrays for convolution acceleration, but likewise suffer from high power consumption and high cost, and their overall speedup is limited because only the convolution is accelerated; others map the entire parallel processing sub-flow onto hardware, but their designs are complicated, some units are not simplified, and their generality is poor.

In the field of security inspection, where the required detection speed is high, the number of images to be recognized is large, and the available site space is limited, an image recognition apparatus and an image recognition method that increase speed, reduce power consumption, and offer good generality are needed.
BRIEF SUMMARY OF THE PRESENT DISCLOSURE
According to an aspect of an embodiment of the present disclosure, there is provided a neural network-based image recognition apparatus including:
an image cache configured to receive and store an image to be recognized;
a feature map cache configured to receive and store a feature map;
a weight cache configured to receive and store weights;
a controller including a configuration information register and configured to receive configuration information and store the configuration information in the configuration information register, generate a first control signal and a second control signal according to the configuration information, and transmit the first control signal and the second control signal;
a preprocessing unit configured to obtain the image to be recognized and the weights from the image cache and the weight cache, respectively, receive the first control signal from the controller, perform preprocessing on the image to be recognized and the weights according to the first control signal to obtain a preprocessing result, and transmit the preprocessing result; and
a processing unit array configured to receive the preprocessing result from the preprocessing unit, receive the second control signal from the controller, perform processing on the preprocessing result according to the second control signal to obtain a feature map, and send the feature map to the feature map cache.
In one embodiment, the pre-processing unit is further configured to: receiving the feature map from the feature map cache, determining whether the feature map is a final feature map according to the first control signal, if not, performing preprocessing on the feature map and the weights to obtain a preprocessing result, and sending the preprocessing result to the processing unit array; and if so, instructing the feature map cache to send the feature map.
In one embodiment, the controller further comprises a status register and a control state machine, wherein
The preprocessing unit is further configured to send a first state signal to the state register after completion of one of a plurality of operations in the preprocessing, receive a first state control signal from the control state machine, and perform a next operation of the plurality of operations after the one operation according to the first state control signal; and
the status register is configured to store the first status signal after receiving the first status signal, and the control state machine is configured to generate the first status control signal according to the first status signal stored in the status register and to transmit the first status control signal to the preprocessing unit.
In one embodiment, the processing unit array is further configured to send a second status signal to the status register after completion of one of a plurality of operations in the process, receive a second status control signal from the control state machine, and perform a next one of the plurality of operations after the one operation in accordance with the second status control signal; and
the status register is further configured to store the second status signal after receiving the second status signal, and the control state machine is further configured to generate the second status control signal from the second status signal stored in the status register and to send the second status control signal to the array of processing units.
In one embodiment, each of the plurality of processing units includes a multiplier array and an addition tree.
In one embodiment, the configuration information includes structural information of the neural network and a size of the image to be recognized, wherein the structural information of the neural network indicates one or more of convolution processing, activation processing, pooling processing, and full-connection processing to be performed, and
the processing unit array is further configured to perform processing on the pre-processing result according to the second control signal to obtain a feature map by:
sequentially performing the one or more of the convolution processing, the activation processing, the pooling processing, and the full-connection processing indicated by the structural information of the neural network on the preprocessing result according to the second control signal to obtain the feature map.
In one embodiment, the processing unit array comprises:
a plurality of processing units configured to perform the convolution processing and the full-connection processing;
a plurality of activation units configured to perform the activation processing; and
a plurality of pooling units configured to perform the pooling process.
In one embodiment, the preprocessing unit is further configured to perform preprocessing on the image to be recognized and the weights according to the first control signal by:
performing resizing and padding processing on the image to be recognized according to the structural information of the neural network and the size of the image to be recognized to obtain a processed image, and performing a blocking operation on the processed image and the weights.
In one embodiment, the feature map cache includes a plurality of buffers including a plurality of write buffers and a plurality of read buffers, and
wherein the controller is further configured to monitor a first number of the write buffer, among the plurality of write buffers, into which data is being written and a second number of the read buffer, among the plurality of read buffers, from which data is being read, and to issue an interrupt signal to interrupt an operation of the image recognition apparatus when a difference between the first number and the second number is less than a first threshold.
In one embodiment, the weight cache includes a plurality of buffers including a plurality of write buffers and a plurality of read buffers, and
the controller is further configured to monitor a first number of the write buffer, among the plurality of write buffers, into which data is being written and a second number of the read buffer, among the plurality of read buffers, from which data is being read, and to send an interrupt signal to interrupt an operation of the image recognition device when a difference between the first number and the second number is less than a second threshold.
In one embodiment, the pooling process includes maximum pooling or average pooling.
In one embodiment, the plurality of activation units perform the activation processing using a ReLU activation function, a Leaky ReLU activation function, a sigmoid activation function, or a tanh activation function.
In one embodiment, the image recognition apparatus further comprises: a general purpose processor configured to store the image to be identified and the weights, and to send the image to be identified to the image cache and the weights to the weight cache.
In one embodiment, the image cache receives the image to be identified from the general purpose processor via a first multi-channel direct memory access, MCDMA, and the weight cache receives the weights from the general purpose processor via a second MCDMA.
In one embodiment, the image recognition apparatus further comprises: a memory configured to store a first password,
wherein the general purpose processor is further configured to send a second password to the controller, and
the controller is further configured to, in response to receiving the second password from the general purpose processor, retrieve the first password from the memory and compare the first password to the second password, allow the image cache and the weight cache to receive data when the first password is the same as the second password, and prohibit the image cache and the weight cache from receiving data when the first password is different from the second password.
In one embodiment, the general purpose processor is further configured to receive the feature map from the feature map cache and perform classification and localization processing on the feature map to determine the category and location of the image to be identified.
According to another aspect of an embodiment of the present disclosure, there is provided a neural network-based image recognition method including:
step S1: receiving and storing an image to be identified by an image cache;
step S2: receiving and storing weights by a weight cache;
step S3: receiving, by a controller, configuration information, generating a first control signal and a second control signal according to the configuration information, and transmitting the first control signal and the second control signal;
step S4: obtaining, by a preprocessing unit, the image to be recognized and the weights from the image cache and the weight cache, respectively, receiving the first control signal from the controller, performing preprocessing on the image to be recognized and the weights according to the first control signal to obtain a preprocessing result, and transmitting the preprocessing result; and
step S5: receiving, by a processing unit array, the preprocessing result from the preprocessing unit, receiving the second control signal from the controller, processing the preprocessing result according to the second control signal to obtain a feature map, and sending the feature map to a feature map cache.
In one embodiment, the image recognition method further comprises:
step S6: receiving and storing, by a feature map cache, the feature map from the processing unit array;
step S7: obtaining, by the preprocessing unit, the feature map from the feature map cache; and
step S8: determining by the pre-processing unit whether the feature map is a final feature map according to the first control signal,
if not, the preprocessing unit performs preprocessing on the feature map and the weights to obtain a preprocessing result, and returns to the step S5; and
if so, the preprocessing unit instructs the feature map cache to send the final feature map.
In one embodiment, the image recognition method further comprises:
sending, by the pre-processing unit, a first status signal to the controller after completion of one of a plurality of operations in the pre-processing;
receiving, by the controller, the first state signal, generating a first state control signal according to the first state signal, and sending the first state control signal to the preprocessing unit; and
receiving, by the pre-processing unit, the first state control signal from the controller, and performing a next operation of the plurality of operations after the one operation according to the first state control signal.
In one embodiment, the image recognition method further comprises:
sending, by the processing unit array, a second status signal to the controller after completion of one of a plurality of operations in the process;
receiving, by the controller, the second state signal, generating a second state control signal according to the second state signal, and transmitting the second state control signal to the processing unit array; and
receiving, by the processing unit array, the second state control signal from the controller, and performing a next operation of the plurality of operations after the one operation according to the second state control signal.
In one embodiment, the configuration information includes structural information of the neural network and a size of the image to be recognized, wherein the structural information of the neural network indicates one or more of convolution processing, activation processing, pooling processing, and full-connection processing to be performed, and performing processing on the preprocessing result according to the second control signal to obtain the feature map includes:
sequentially performing the one or more of convolution processing, activation processing, pooling processing, and full-connection processing indicated by the structural information of the neural network on the preprocessing result according to the second control signal to obtain the feature map.
In one embodiment, performing pre-processing on the image to be recognized and the weights according to the first control signal comprises:
performing resizing and padding processing on the image to be recognized according to the structural information of the neural network and the size of the image to be recognized to obtain a processed image, and performing a blocking operation on the processed image and the weights.
In one embodiment, the feature map cache includes a plurality of buffers including a plurality of write buffers and a plurality of read buffers, and
wherein the image recognition method further comprises: monitoring a first number of a buffer area in which data is being written in the plurality of writing buffer areas and a second number of a reading buffer area in which data is being read in the plurality of reading buffer areas, and sending an interrupt signal to interrupt the image identification method when the difference between the first number and the second number is smaller than a first threshold value.
In one embodiment, the weight cache includes a plurality of buffers including a plurality of write buffers and a plurality of read buffers, and
wherein the image recognition method further comprises: monitoring a first number of a buffer area in which data is being written in the plurality of writing buffer areas and a second number of a reading buffer area in which data is being read in the plurality of reading buffer areas, and sending an interrupt signal to interrupt the image identification method when the difference between the first number and the second number is smaller than a second threshold value.
In one embodiment, the pooling process includes maximum pooling or average pooling.
In one embodiment, the activation processing is performed using a ReLU activation function, a Leaky ReLU activation function, a sigmoid activation function, or a tanh activation function.
In one embodiment, the image recognition method further comprises: storing, by a general purpose processor, the image to be identified and the weights, and sending the image to be identified to the image cache and the weights to the weight cache.
In one embodiment, receiving, by the image cache, the image to be identified comprises: the image cache receives the image to be recognized from the general purpose processor through a first multi-channel direct memory access (MCDMA), and
receiving, by the weight cache, the weights comprises: receiving, by a weight cache, the weight from the general purpose processor through a second MCDMA.
In one embodiment, the image recognition method further comprises:
sending, by the general purpose processor, a second password to the controller, and
In response to receiving the second password from the general purpose processor, obtaining, by the controller, a first password from a memory and comparing the second password to the first password, allowing the image cache and the weight cache to receive data when the first password is the same as the second password, and prohibiting the image cache and the weight cache from receiving data when the first password is different from the second password.
In one embodiment, the image recognition method further comprises:
receiving, by the general purpose processor, the feature map from the feature map cache, and performing classification and localization processing on the feature map to determine a category and a location of the image to be identified.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
fig. 1 shows a schematic diagram of a neural network-based image recognition apparatus according to an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of a multiplier according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of an adder, according to an embodiment of the disclosure;
FIG. 4 shows a schematic diagram of an activation function according to an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of a neural network-based image recognition method, according to an embodiment of the present disclosure;
FIG. 6 shows a schematic diagram comparing the effect of image recognition using a neural network based image recognition method according to an embodiment of the present disclosure with the prior art; and
fig. 7 shows a schematic diagram of a neural network-based image recognition system, in accordance with an embodiment of the present disclosure.
The figures do not show all of the circuitry or structures of the embodiments. The same reference numbers will be used throughout the drawings to refer to the same or like parts or features.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, the terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Fig. 1 shows a schematic diagram of a neural network-based image recognition apparatus 100 according to an embodiment of the present disclosure. The neural network-based image recognition apparatus 100 may include an image cache 101, a feature map cache 102, a weight cache 103, a controller 104, a preprocessing unit 105, and a processing unit array 106.
Image cache 101 may be configured to receive and store images to be identified. The feature map cache 102 may be configured to receive and store a feature map. The feature map cache 102 may include a plurality of (typically 4 to 16) buffers, which may include a plurality of write buffers and a plurality of read buffers (e.g., FIFO1_a, FIFO2_a, …, FIFOn-1_a, FIFOn_a). In one embodiment, read-write switching among the buffers may be performed in a ping-pong manner, so that a predetermined number of buffers always lie between the buffer into which data is being written and the buffer from which data is being read. This guarantees that no read-write conflicts occur (a buffer is not written while it is being read, and not read while it is being written), keeps the entire data stream uninterrupted, and reduces the total data-transfer time. When a buffer is about to become empty or is already empty, its read priority is higher than its write priority; in other situations the opposite holds.
The weight cache 103 may be configured to receive and store the weights (or FFT tap coefficients). The weight cache 103 may include a plurality of (typically 4 to 16) buffers, including a plurality of write buffers and a plurality of read buffers (e.g., FIFO1_b, FIFO2_b, …, FIFOn-1_b, FIFOn_b). In one embodiment, read-write switching among the buffers may be performed in a ping-pong manner, so that a predetermined number of buffers always lie between the buffer into which data is being written and the buffer from which data is being read. This guarantees that no read-write conflicts occur (a buffer is not written while it is being read, and not read while it is being written), keeps the entire data stream uninterrupted, and reduces the total data-transfer time. When a buffer is about to become empty or is already empty, its read priority is higher than its write priority; in other situations the opposite holds.
To perform the ping-pong read-write switching described above among the plurality of buffers in the feature map cache 102, the controller 104 may be configured to monitor a first number of the buffer, among the plurality of write buffers in the feature map cache 102, into which data is being written and a second number of the buffer, among the plurality of read buffers, from which data is being read, and to issue an interrupt signal to interrupt the operation of the image recognition apparatus 100 when the difference between the first number and the second number is less than a first threshold (e.g., 1). Likewise, to perform the ping-pong read-write switching among the plurality of buffers in the weight cache 103, the controller 104 may be further configured to monitor a first number of the buffer, among the plurality of write buffers in the weight cache 103, into which data is being written and a second number of the buffer, among the plurality of read buffers, from which data is being read, and to issue an interrupt signal to interrupt the operation of the image recognition apparatus 100 when the difference between the first number and the second number is less than a second threshold (e.g., 1).
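By way of illustration only, the ping-pong switching and the interrupt condition monitored by the controller 104 can be modelled by the following Python sketch. All class and function names are assumptions introduced here, not part of the disclosure; the hardware uses FIFOs, whereas the sketch only mirrors the indexing and threshold logic.

```python
from collections import deque

class PingPongCache:
    """Sketch of a multi-buffer cache switched in a ping-pong fashion."""

    def __init__(self, num_buffers=8, threshold=1):
        self.fifos = [deque() for _ in range(num_buffers)]
        self.write_idx = 0          # "first number": buffer being written
        self.read_idx = 0           # "second number": buffer being read
        self.threshold = threshold

    def spacing(self):
        # Distance (around the ring of buffers) between writer and reader.
        return (self.write_idx - self.read_idx) % len(self.fifos)

    def write_block(self, data):
        self.fifos[self.write_idx].extend(data)
        self.write_idx = (self.write_idx + 1) % len(self.fifos)

    def read_block(self):
        block = list(self.fifos[self.read_idx])
        self.fifos[self.read_idx].clear()
        self.read_idx = (self.read_idx + 1) % len(self.fifos)
        return block

def controller_monitor(cache: PingPongCache):
    """Raise the interrupt described above when the writer and reader get too close."""
    if cache.spacing() < cache.threshold:
        raise RuntimeError("interrupt: read/write buffers too close, pausing pipeline")

# The writer runs several buffers ahead of the reader before reading starts.
cache = PingPongCache(num_buffers=8, threshold=1)
for blk in ([1, 2, 3], [4, 5, 6], [7, 8, 9]):
    cache.write_block(blk)
print(cache.read_block())   # [1, 2, 3]
controller_monitor(cache)   # spacing is 2, no interrupt raised
```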
The controller 104 may include a configuration information register and may be configured to receive configuration information (e.g., from the general-purpose processor 107, as described below), store it in the configuration information register, generate first and second control signals according to the configuration information, and transmit the first and second control signals. The configuration information may include the structural information of the neural network and the size of the image to be recognized; the structural information may indicate one or more processes to be performed among convolution processing, activation processing, pooling processing, full-connection processing, and other processing (e.g., Element-wise, Depthwise, scaled constants, Deconvolution, Dropout, Permute, etc.), as well as whether the data is ready, whether batch normalization (BN) and regularization are performed, the size of the convolution kernel, the number of channels per layer, and the like.
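For clarity, the content of the configuration information and the derivation of the first and second control signals may be pictured with the following illustrative Python sketch; the field and function names are assumptions, and the actual register layout is not specified in this disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class LayerConfig:
    """One entry of the network-structure information (illustrative fields only)."""
    op: str                      # 'conv', 'fc', 'depthwise', 'deconv', ...
    kernel_size: int = 3
    in_channels: int = 1
    out_channels: int = 1
    activation: str = "relu"     # 'relu', 'leaky_relu', 'sigmoid', 'tanh' or ''
    pooling: str = ""            # 'max', 'avg' or ''
    batch_norm: bool = False

@dataclass
class ConfigInfo:
    image_size: Tuple[int, int, int]              # H, W, C of the image to be recognized
    layers: List[LayerConfig] = field(default_factory=list)
    data_ready: bool = False

def generate_control_signals(cfg: ConfigInfo):
    """Toy stand-in for the controller: derive a 'first' signal for the
    pre-processing unit and a 'second' signal for the processing-unit array."""
    first = {"image_size": cfg.image_size,
             "final_layer_index": len(cfg.layers) - 1}
    second = [(l.op, l.kernel_size, l.activation, l.pooling) for l in cfg.layers]
    return first, second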
The preprocessing unit 105 may be configured to obtain the image to be recognized and the weight from the image buffer 101 and the weight buffer 103, respectively, receive the first control signal from the controller 104, and perform preprocessing on the image to be recognized and the weight according to the first control signal to obtain a preprocessing result, and transmit the preprocessing result.
The processing unit array 106 may be configured to receive the pre-processing result from the pre-processing unit 105, receive a second control signal from the controller 104, perform processing on the pre-processing result according to the second control signal to obtain a feature map, and send the feature map (e.g., directly or through the pre-processing unit 105) to the feature map buffer 102.
The pre-processing unit 105 may be further configured to: receiving the feature map from the feature map buffer 102, determining whether the feature map is a final feature map according to the first control signal, if not, performing preprocessing on the feature map and the weights to obtain a preprocessing result, and transmitting the preprocessing result to the processing unit array 106; if so, the feature map cache 102 is instructed to send the feature map.
The controller 104 may also include a status register and a control state machine. The preprocessing unit 105 may be further configured to send a first state signal to the status register after completing one of the plurality of operations in the preprocessing, receive a first state control signal from the control state machine, and perform, according to the first state control signal, the next operation among the plurality of operations. The status register may be configured to store the first state signal after receiving it, and the control state machine may be configured to generate the first state control signal according to the first state signal and send it to the preprocessing unit 105.
The processing unit array 106 may be further configured to send a second state signal to the status register after completing one of the plurality of operations in the processing (e.g., directly or through the preprocessing unit 105), receive a second state control signal from the control state machine, and perform, according to the second state control signal, the next operation among the plurality of operations. The status register may be further configured to store the second state signal after receiving it, and the control state machine may be further configured to generate the second state control signal from the second state signal and send it to the processing unit array 106 (e.g., directly or through the preprocessing unit 105).
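The handshake between a datapath unit, the status register, and the control state machine can be sketched as follows. This is an illustrative Python model with assumed names; the hardware uses register writes rather than queues.

```python
import queue
import threading

status_register = queue.Queue()    # written by the datapath units
control_signals = queue.Queue()    # written by the control state machine

def control_state_machine(num_steps):
    """Consume a status signal, emit the matching state-control signal that
    lets the unit start its next operation."""
    for _ in range(num_steps):
        done_step = status_register.get()           # e.g. "preproc:resize done"
        control_signals.put(f"start next after {done_step}")

def preprocessing_unit(operations):
    for op in operations:
        # ... perform the operation (resize, pad, block, align) ...
        status_register.put(f"preproc:{op} done")   # first state signal
        go = control_signals.get()                  # first state control signal
        assert go.startswith("start next")

ops = ["resize", "pad", "block", "align"]
threading.Thread(target=control_state_machine, args=(len(ops),)).start()
preprocessing_unit(ops)
```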
The processing unit array 106 may be further configured to perform processing on the preprocessing result according to the second control signal to obtain the feature map by sequentially performing, according to the second control signal, one or more of the convolution processing, activation processing, pooling processing, full-connection processing, and other processing indicated by the structural information of the neural network on the preprocessing result.
The processing unit array 106 may include a plurality of processing units 1061, a plurality of activation units 1062, and a plurality of pooling units 1063. Each processing unit 1061 may include a multiplier array (as shown in Fig. 2) and an adder tree (as shown in Fig. 3), and may perform the convolution processing and the full-connection processing. When a processing unit 1061 performs the convolution processing, the input vector is scanned with the first point of the convolution kernel to obtain a first result, then scanned with the second point of the convolution kernel to obtain a second result, and so on until the last point of the convolution kernel has completed its scan; all the partial results are then accumulated to obtain the convolution result. In this way the constraint on the convolution kernel size is relaxed, and the input data does not need to be repeatedly transferred to the on-chip buffer, which reduces the energy spent on data movement. Each activation unit 1062 may perform the activation processing using a ReLU activation function (as shown in Fig. 4), a Leaky ReLU activation function (as shown in Fig. 4), a sigmoid activation function, a tanh activation function, or the like. Each pooling unit 1063 may perform the pooling processing, such as maximum pooling or average pooling.
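The kernel-point scan-and-accumulate scheme described above is equivalent to decomposing the convolution into one full-image multiply per kernel point followed by accumulation, as the following behavioural Python sketch shows. It is a software model of the multiplier-array/adder-tree behaviour, not the hardware dataflow, and the function names are assumptions.

```python
import numpy as np

def conv2d_by_kernel_points(image, kernel):
    """Each kernel point scans the whole input once (the multiplier array);
    the per-point partial results are accumulated (the adder tree).
    Valid padding, stride 1, CNN-style cross-correlation convention."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    acc = np.zeros((oh, ow), dtype=float)
    for i in range(kh):
        for j in range(kw):
            # one "scan" of the input with a single kernel point
            acc += kernel[i, j] * image[i:i + oh, j:j + ow]
    return acc

# Activation functions named in the text (behavioural forms)
relu       = lambda x: np.maximum(x, 0)
leaky_relu = lambda x, a=0.01: np.where(x > 0, x, a * x)
sigmoid    = lambda x: 1.0 / (1.0 + np.exp(-x))

# Cross-check against a direct sliding-window convolution
img = np.arange(36, dtype=np.float64).reshape(6, 6)
ker = np.array([[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]])
ref = np.array([[np.sum(img[r:r+3, c:c+3] * ker) for c in range(4)] for r in range(4)])
assert np.allclose(conv2d_by_kernel_points(img, ker), ref)
```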
The preprocessing unit 105 may be further configured to perform the preprocessing on the image to be recognized and the weights according to the first control signal by performing resizing and padding processing on the image to be recognized, according to the structural information of the neural network and the size of the image to be recognized, to obtain a processed image, and performing a blocking operation on the processed image and the weights. In one example, when the blocking operation is performed and the processing units in the processing unit array 106 have a dimension of 16 × 16, a typical block size for the preprocessed image is N × 16, where N is determined by the sizes of the image cache and the feature map cache and by the size of the feature map to be output, dynamic blocking is performed according to the configuration information received by the controller 104, and a typical block size for the weights is 3 × 16. Furthermore, the preprocessing unit 105 may be further configured to perform an alignment operation on the processed image and the weights to obtain the preprocessing result for parallel convolution calculation, and to send the preprocessing result to the processing unit array 106.
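A behavioural sketch of this pre-processing chain is given below (Python). The nearest-neighbour resize, the block-row count, and all names are illustrative assumptions beyond the N × 16 and 3 × 16 block sizes mentioned above.

```python
import numpy as np

def preprocess(image, weights, target_hw, pe_width=16, block_rows=4):
    """Resize, pad, then block the image into (block_rows x 16) tiles and the
    weights (assumed already laid out as K x 16) into 3 x 16 tiles."""
    # 1. Resize (nearest neighbour) to the input size the network expects.
    th, tw = target_hw
    ridx = np.arange(th) * image.shape[0] // th
    cidx = np.arange(tw) * image.shape[1] // tw
    resized = image[np.ix_(ridx, cidx)]

    # 2. Pad with zeros so both dimensions divide evenly into the block sizes.
    ph = (-resized.shape[0]) % block_rows
    pw = (-resized.shape[1]) % pe_width
    padded = np.pad(resized, ((0, ph), (0, pw)))

    # 3. Block into tiles matching the 16-wide processing-unit array.
    H, W = padded.shape
    image_blocks = [padded[r:r + block_rows, c:c + pe_width]
                    for r in range(0, H, block_rows)
                    for c in range(0, W, pe_width)]

    # 4. Block the weights into 3 x 16 tiles.
    weight_blocks = [weights[r:r + 3] for r in range(0, weights.shape[0], 3)]
    return image_blocks, weight_blocks
```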
The image recognition apparatus 100 may further include a general-purpose processor 107 configured to store the image to be recognized and the weights, and to send the image to be recognized to the image cache 101 and the weights to the weight cache 103. The general-purpose processor 107 may be implemented with an ARM, RISC-V, MIPS, or similar core. Before the image recognition apparatus 100 receives the image to be recognized, the general-purpose processor calls the underlying functions to complete the hardware configuration. When the image recognition apparatus 100 receives an image to be recognized, the general-purpose processor 107 receives the image from an image capturing terminal (e.g., a camera or imaging apparatus) through a network (wired or wireless), USB, or the like, and receives the weights from an external memory. Alternatively, the general-purpose processor 107 may include a memory configured to store the weights; in that case it receives the image to be recognized from the image capturing terminal through a network, USB, or the like, receives the weights through the network or USB at system initialization, and may fine-tune and update the weights during use through a software or APP upgrade.
The image cache 101 may receive the image to be recognized from the general-purpose processor 107 through a first multi-channel direct memory access (MCDMA) via an interface (e.g., an AXI interface), and the weight cache 103 may receive the weights from the general-purpose processor 107 through a second MCDMA via an interface (e.g., an AXI interface). When implemented on an FPGA, the clock of the first and second MCDMAs is typically 300 MHz and the transfer throughput is typically 2.2 GB/s per DMA; when implemented on an ASIC, the clock is typically 1 GHz and the transfer throughput is typically 3 GB/s per DMA. Of course, embodiments of the present disclosure are not limited to using MCDMA to transfer data; alternatively, a bus burst transfer mode may be used directly.
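As a rough sanity check of these figures (the bus width is an assumption, not stated in this disclosure), the per-cycle transfer width can be estimated as follows:

```python
# Back-of-envelope check of the stated MCDMA figures.
fpga_clock_hz, fpga_throughput = 300e6, 2.2e9   # 300 MHz, 2.2 GB/s per DMA
asic_clock_hz, asic_throughput = 1e9, 3e9       # 1 GHz,   3 GB/s per DMA

print(fpga_throughput / fpga_clock_hz)  # ~7.3 bytes/cycle, consistent with a
                                        # 64-bit data path minus protocol overhead
print(asic_throughput / asic_clock_hz)  # ~3 bytes/cycle
```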
The image recognition apparatus 100 may further include: a memory 108 configured to store a first password.
The general-purpose processor 107 may be further configured to send a second password to the controller 104. The controller 104 may be further configured to, in response to receiving the second password from the general-purpose processor 107, retrieve the first password from the memory 108 and compare it with the second password, allow the image cache 101 and the weight cache 103 to receive data when the two passwords are the same, and prohibit the image cache 101 and the weight cache 103 from receiving data when they differ.
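A minimal sketch of this password gating, assuming nothing about the actual comparison hardware beyond an equality check (class and method names are illustrative):

```python
import hmac

class SecureLoadGate:
    """The controller compares the password sent by the general-purpose
    processor with the one stored in memory, and only then lets the image
    cache and weight cache accept data."""

    def __init__(self, stored_password: bytes):
        self._first_password = stored_password    # held in the memory (108)
        self.caches_enabled = False

    def check(self, second_password: bytes) -> bool:
        # constant-time comparison; the text only requires equality
        self.caches_enabled = hmac.compare_digest(self._first_password,
                                                  second_password)
        return self.caches_enabled

gate = SecureLoadGate(b"device-key")
assert gate.check(b"device-key") is True    # caches may receive data
assert gate.check(b"wrong-key") is False    # caches are prohibited
```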
The general-purpose processor 107 may be further configured to receive the feature map from the feature map cache 102 and perform classification and localization processing on it to determine the category and location of the image to be recognized.
In another example, the weight cache 103 may be further configured to receive and store FFT tap coefficients. The preprocessing unit 105 may be further configured to rearrange the processed image and the FFT tap coefficients according to the FFT structure, perform an alignment operation on the rearranged image and FFT tap coefficients to obtain a preprocessing result suitable for FFT calculation, and send the preprocessing result to the processing unit array 106; the processing unit array 106 may be further configured to perform FFT processing on the preprocessing result to obtain an FFT result.
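The "rearranging according to the FFT structure" step can be pictured as the bit-reversal reordering of a radix-2 FFT, whose butterfly stages reduce to multiplications by the tap (twiddle) coefficients followed by additions, i.e. the same multiplier/adder resources used for convolution. The following Python sketch is an illustrative software model only; the mapping onto the processing unit array is an assumption.

```python
import numpy as np

def bit_reverse_permute(x):
    """Reorder the samples into bit-reversed index order so the butterflies
    can run stage by stage."""
    n = len(x)
    bits = n.bit_length() - 1
    idx = [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]
    return x[idx]

def fft_radix2(x):
    """Iterative radix-2 FFT: each stage multiplies by tap (twiddle)
    coefficients and then adds/subtracts.  n must be a power of two."""
    a = bit_reverse_permute(np.asarray(x, dtype=complex))
    n = len(a)
    m = 2
    while m <= n:
        tw = np.exp(-2j * np.pi * np.arange(m // 2) / m)   # tap coefficients
        for k in range(0, n, m):
            u = a[k:k + m // 2].copy()
            t = tw * a[k + m // 2:k + m]
            a[k:k + m // 2] = u + t
            a[k + m // 2:k + m] = u - t
        m *= 2
    return a

x = np.random.rand(16)
assert np.allclose(fft_radix2(x), np.fft.fft(x))
```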
Fig. 5 shows a schematic diagram of a method 500 for performing neural network based image recognition using the neural network based image recognition apparatus 100 according to an embodiment of the present disclosure. The neural network based image recognition method 500 may include the following steps.
In step S1, the image cache 101 may receive and store the image to be recognized. Receiving, by the image cache 101, the image to be recognized may include: the image buffer 101 receives the image to be recognized from the general processor 107 through the first MCDMA.
In step S2, the weight cache 103 may receive and store the weights. The weight cache 103 may include a plurality of cache regions. These buffers may include multiple write buffers and multiple read buffers. Receiving the weights by weight cache 103 may include: the weights are received by the weight cache 103 from the general processor 107 via the second MCDMA.
Prior to steps S1 and S2, the image recognition method 500 may further include: storing, by the general-purpose processor 107, the image to be recognized and the weight, and transmitting the image to be recognized to the image buffer 101 and the weight to the weight buffer 103; and sending, by the general purpose processor 107, the second password to the controller 104, and in response to receiving the second password from the general purpose processor 107, obtaining, by the controller 104, the first password from the memory 108 and comparing the second password with the first password, allowing the image cache 101 and the weight cache 103 to receive data when the first password is the same as the second password, and prohibiting the image cache 101 and the weight cache 103 from receiving data when the first password is different from the second password.
In step S3, the controller may receive the configuration information (e.g., from the general purpose processor 107), generate the first control signal and the second control signal according to the configuration information, and transmit the first control signal and the second control signal. The configuration information may include structural information of the neural network, which may indicate one or more processes to be performed among convolution processing, activation processing, pooling processing, full connection processing, and other processes, whether data is ready, whether BN and regularization are performed, the size of a convolution kernel, the number of channels of each layer, and the like, and the size of an image to be recognized.
In step S4, the preprocessing unit 105 may obtain the image to be recognized and the weights from the image cache 101 and the weight cache 103, respectively, receive the first control signal from the controller 104, perform preprocessing on the image to be recognized and the weights according to the first control signal to obtain a preprocessing result, and send the preprocessing result to the processing unit array 106. The preprocessing performed by the preprocessing unit 105 on the image to be recognized and the weights according to the first control signal may include: performing resizing and padding processing on the image to be recognized, according to the structural information of the neural network and the size of the image to be recognized, to obtain a processed image, and performing a blocking operation on the processed image and the weights.
In step S5, the processing unit array 106 may receive the preprocessing result from the preprocessing unit 105, receive the second control signal from the controller 104, perform processing on the preprocessing result according to the second control signal to obtain a feature map, and send the feature map to the feature map cache 102. Performing the processing on the preprocessing result according to the second control signal to obtain the feature map may include: sequentially performing, according to the second control signal, one or more of the convolution processing, activation processing, pooling processing, full-connection processing, and other processing indicated by the structural information of the neural network on the preprocessing result.
In step S6, the feature map cache 102 may receive and store the feature map from the processing element array 106. The profile cache 102 may include a plurality of buffers, which may include a plurality of write buffers and a plurality of read buffers.
In step S7, the preprocessing unit 105 may obtain the feature map from the feature map cache 102.
In step S8, the preprocessing unit 105 determines whether the feature map is the final feature map according to the first control signal, and if not, in step S9, the preprocessing unit 105 performs preprocessing on the feature map and the weights to obtain a preprocessing result, and returns to step S5; and if so, the preprocessing unit 105 instructs the feature map cache 102 to send the final feature map to the general-purpose processor 107 at step S10.
After the final feature map is obtained in step S10, in step S11 the general-purpose processor 107 receives the feature map from the feature map cache 102 and performs classification and localization processing on it (e.g., using softmax, concat, etc.) to determine the category and location of the image to be recognized.
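Putting steps S1 through S11 together, an end-to-end behavioural sketch (Python) looks like the following. The toy softmax classifier, the 2 × 2 pooling, and all names are illustrative assumptions, not the claimed hardware flow.

```python
import numpy as np

def _conv_valid(x, k):
    # valid convolution by the kernel-point decomposition described earlier
    oh, ow = x.shape[0] - k.shape[0] + 1, x.shape[1] - k.shape[1] + 1
    return sum(k[i, j] * x[i:i + oh, j:j + ow]
               for i in range(k.shape[0]) for j in range(k.shape[1]))

def run_image_recognition(image, weights_per_layer, cfg_layers):
    """The feature map produced in S5 is fed back through pre-processing
    (S7-S9) until the last layer; the final map goes to the general-purpose
    processor, modelled here by a softmax stand-in for S11."""
    feature_map = image                                      # S1, S4
    for layer, w in zip(cfg_layers, weights_per_layer):      # S5-S9 loop
        if layer["op"] == "conv":
            feature_map = _conv_valid(feature_map, w)
        if layer.get("activation") == "relu":
            feature_map = np.maximum(feature_map, 0)
        if layer.get("pooling") == "max":                    # 2x2 max pooling
            h = feature_map.shape[0] // 2 * 2
            v = feature_map.shape[1] // 2 * 2
            fm = feature_map[:h, :v]
            feature_map = np.maximum.reduce([fm[0::2, 0::2], fm[0::2, 1::2],
                                             fm[1::2, 0::2], fm[1::2, 1::2]])
    logits = feature_map.flatten()[:10]                      # S10: final map out
    scores = np.exp(logits - logits.max())
    return scores / scores.sum()                             # S11: "classification"

probs = run_image_recognition(
    np.random.rand(28, 28),
    [np.random.rand(3, 3), np.random.rand(3, 3)],
    [{"op": "conv", "activation": "relu", "pooling": "max"},
     {"op": "conv", "activation": "relu"}])
assert np.isclose(probs.sum(), 1.0)
```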
In one embodiment according to the present disclosure, the image recognition method 500 may further include: monitoring a first number of the buffer, among the plurality of write buffers in the feature map cache 102, into which data is being written and a second number of the buffer, among the plurality of read buffers in the feature map cache 102, from which data is being read, and issuing an interrupt signal to interrupt the image recognition method when the difference between the first number and the second number is less than a first threshold. The image recognition method 500 may further include: monitoring a first number of the buffer, among the plurality of write buffers in the weight cache 103, into which data is being written and a second number of the buffer, among the plurality of read buffers in the weight cache 103, from which data is being read, and issuing an interrupt signal to interrupt the image recognition method when the difference between the first number and the second number is less than a second threshold.
In one embodiment according to the present disclosure, the image recognition method 500 may further include: transmitting, by the preprocessing unit 105, a first status signal to the controller 104 after completion of one of the plurality of operations in the preprocessing; receiving, by the controller 104, the first state signal, generating a first state control signal according to the first state signal, and sending the first state control signal to the preprocessing unit 105; and receiving, by the preprocessing unit 105, the first state control signal from the controller, and performing a next operation after the one operation among the plurality of operations according to the first state control signal.
In one embodiment according to the present disclosure, the image recognition method 500 may further include: sending, by the processing unit array 106, a second state signal to the controller 104 after completion of one of the plurality of operations in the processing; receiving, by the controller 104, the second state signal, generating a second state control signal according to the second state signal, and sending the second state control signal to the processing unit array 106; and receiving, by the processing unit array 106, the second state control signal from the controller 104, and performing, according to the second state control signal, the next operation among the plurality of operations.
Fig. 6 illustrates a schematic diagram comparing the effect of image recognition using a neural network-based image recognition method according to an embodiment of the present disclosure with the prior art. As can be seen from Fig. 6, image recognition using the neural network-based image recognition method achieves classification and localization of suspicious objects in the image to be recognized with low power consumption and in a shorter operation time than the prior art.
According to the embodiments of the present disclosure, the characteristics of the neural network are fully exploited for hardware acceleration, so that image recognition at the edge can be accelerated, the operation speed increased, and high energy efficiency and good generality achieved. The hardware-accelerated image recognition can be applied to various scenarios, such as intelligent security-inspection recognition terminals, intelligent boxes, intelligent cameras, face recognition, and object detection, to increase the operation speed and improve the recognition efficiency. The entire network configuration information is transferred to the configuration information register in the controller via the bus, rather than requiring configuration at each step, which improves efficiency. The intermediate results of the whole network, including the per-layer results, are written back into the feature map cache in preparation for the next calculation, ensuring that the entire process runs as a pipeline.
Fig. 7 shows a schematic diagram of a neural network-based image recognition system, in accordance with an embodiment of the present disclosure. The system 700 may include a processor 710, such as a Digital Signal Processor (DSP). Processor 710 may be a single device or multiple devices for performing different acts of the processes described herein. System 700 may also include input/output (I/O) device 730 for receiving signals from or transmitting signals to other entities.
Further, system 700 may include a memory 720, the memory 720 may be of the form: non-volatile or volatile memory, such as electrically erasable programmable read-only memory (EEPROM), flash memory, and the like. Memory 720 may store computer readable instructions that, when executed by processor 710, may cause the processor to perform the actions described herein.
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable medium having instructions stored thereon for use by or in connection with an instruction execution system (e.g., one or more processors). In the context of this disclosure, a computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the instructions. For example, the computer readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the computer readable medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
The foregoing detailed description has set forth numerous embodiments of neural network-based image recognition methods, apparatus, and systems using schematics, flowcharts, and/or examples. Where such diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of structures, hardware, software, firmware, or virtually any combination thereof. In one embodiment, portions of the subject matter described in embodiments of the present disclosure may be implemented by Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), or other integrated formats. However, those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and or firmware would be well within the skill of one of skill in the art in light of this disclosure. In addition, those skilled in the art will appreciate that the mechanisms of the subject matter described herein are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the subject matter described herein applies regardless of the particular type of signal bearing media used to actually carry out the distribution. Examples of signal bearing media include, but are not limited to: recordable type media such as floppy disks, hard disk drives, Compact Disks (CDs), Digital Versatile Disks (DVDs), digital tape, computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).

Claims (30)

1. An image recognition apparatus based on a neural network, comprising:
an image cache configured to receive and store an image to be recognized;
a feature map cache configured to receive and store a feature map;
a weight cache configured to receive and store weights;
a controller including a configuration information register and configured to receive configuration information and store the configuration information in the configuration information register, generate a first control signal and a second control signal according to the configuration information, and transmit the first control signal and the second control signal;
a preprocessing unit configured to obtain the image to be recognized and the weights from the image cache and the weight cache, respectively, receive the first control signal from the controller, perform preprocessing on the image to be recognized and the weights according to the first control signal to obtain a preprocessing result, and transmit the preprocessing result; and
a processing unit array configured to receive the preprocessing result from the preprocessing unit, receive the second control signal from the controller, perform processing on the preprocessing result according to the second control signal to obtain a feature map, and send the feature map to the feature map cache.
2. The image recognition device of claim 1, wherein the pre-processing unit is further configured to: receiving the feature map from the feature map cache, determining whether the feature map is a final feature map according to the first control signal, if not, performing preprocessing on the feature map and the weights to obtain a preprocessing result, and sending the preprocessing result to the processing unit array; and if so, instructing the feature map cache to send the feature map.
3. The image recognition device of claim 2, wherein the controller further comprises a status register and a control state machine, wherein
the preprocessing unit is further configured to send a first status signal to the status register after completion of one of a plurality of operations in the preprocessing, receive a first state control signal from the control state machine, and perform a next operation of the plurality of operations after the one operation according to the first state control signal; and
the status register is configured to store the first status signal after receiving the first status signal, and the control state machine is configured to generate the first state control signal according to the first status signal stored in the status register and to transmit the first state control signal to the preprocessing unit.
4. The image recognition apparatus according to claim 3, wherein
the processing unit array is further configured to send a second status signal to the status register after completion of one of a plurality of operations in the processing, receive a second state control signal from the control state machine, and perform a next operation of the plurality of operations after the one operation according to the second state control signal; and
the status register is further configured to store the second status signal after receiving the second status signal, and the control state machine is further configured to generate the second state control signal according to the second status signal stored in the status register and to send the second state control signal to the processing unit array.
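A rough behavioural model of the status-register/control-state-machine handshake in claims 3 and 4 is given below; the Controller class, the dictionary encoding of the state control signal, and run_operations are all assumed for illustration and do not describe the actual circuit:

```python
# Behavioural sketch of the claimed handshake (hypothetical encodings).
class Controller:
    def __init__(self):
        self.status_register = None

    def on_status(self, status_signal):
        self.status_register = status_signal       # status register stores the signal
        # Control state machine: derive the state control signal for the next step.
        return {"action": "proceed", "next_index": status_signal + 1}

def run_operations(operations, controller):
    """Each unit reports a status signal after every operation and waits for the
    corresponding state control signal before starting the next operation."""
    for index, operation in enumerate(operations):
        operation()                                # one of the plurality of operations
        state_control = controller.on_status(index)
        if state_control["action"] != "proceed":
            break

run_operations([lambda: None, lambda: None], Controller())
```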
5. The image recognition device of claim 2, wherein the processing unit array comprises a plurality of processing units, and each of the plurality of processing units comprises a multiplier array and an adder tree.
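Functionally, the multiplier array and adder tree of claim 5 amount to a parallel multiply followed by a tree reduction; the small numerical sketch below is a software analogue, not the RTL:

```python
# Software analogue of a multiplier array feeding an adder tree.
def multiplier_array(inputs, weights):
    return [x * w for x, w in zip(inputs, weights)]   # all products computed in parallel in hardware

def adder_tree(products):
    while len(products) > 1:
        if len(products) % 2:                         # pad odd lengths with a zero
            products = products + [0]
        products = [products[i] + products[i + 1] for i in range(0, len(products), 2)]
    return products[0]

# Example: a length-4 dot product, the core of convolution and full-connection layers.
assert adder_tree(multiplier_array([1, 2, 3, 4], [5, 6, 7, 8])) == 70
```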
6. The image recognition apparatus according to claim 2, wherein the configuration information includes structural information of the neural network and a size of the image to be recognized, wherein the structural information of the neural network indicates one or more of convolution processing, activation processing, pooling processing, and full-connection processing to be performed, and
the processing unit array is further configured to perform processing on the preprocessing result according to the second control signal to obtain the feature map by:
sequentially performing the one or more of the convolution processing, the activation processing, the pooling processing, and the full-connection processing indicated by the structural information of the neural network on the preprocessing result according to the second control signal to obtain the feature map.
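One way to read claim 6 is that each layer descriptor in the structural information selects a subset of the four operation types, applied in a fixed order. A hedged sketch follows; the layer_info/ops dictionaries are placeholders, not the configuration format of the disclosure:

```python
# Hypothetical per-layer driver selecting operations from the structural information.
def apply_layer(pre_result, layer_info, ops):
    """ops maps operation names to callables, e.g.
    {"conv": ..., "act": ..., "pool": ..., "fc": ...};
    layer_info["sequence"] lists the subset this layer uses, in order."""
    data = pre_result
    for name in layer_info["sequence"]:          # e.g. ["conv", "act", "pool"]
        data = ops[name](data)
    return data
```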
7. The image recognition device of claim 6, wherein the processing unit array comprises:
a plurality of processing units configured to perform the convolution processing and the full-connection processing;
a plurality of activation units configured to perform the activation processing; and
a plurality of pooling units configured to perform the pooling processing.
8. The image recognition apparatus according to claim 6, wherein the preprocessing unit is further configured to perform preprocessing on the image to be recognized and the weights according to the first control signal by:
performing resizing and padding processing on the image to be recognized according to the structural information of the neural network and the size of the image to be recognized to obtain a processed image, and performing a blocking operation on the processed image and the weights.
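A rough NumPy sketch of the resize-pad-block preprocessing of claim 8 follows; the nearest-neighbour resize, the single-pixel padding, and the tile size are assumptions chosen only to make the example concrete:

```python
import numpy as np

def preprocess_image(img, target_hw, pad, tile):
    """Resize (nearest neighbour), zero-pad, then split into tiles small enough
    for the on-chip caches; `tile` is a hypothetical block edge length."""
    th, tw = target_hw
    rows = np.arange(th) * img.shape[0] // th
    cols = np.arange(tw) * img.shape[1] // tw
    resized = img[rows][:, cols]                       # nearest-neighbour resize
    padded = np.pad(resized, pad, mode="constant")     # zero padding
    blocks = [padded[r:r + tile, c:c + tile]           # blocking operation
              for r in range(0, padded.shape[0], tile)
              for c in range(0, padded.shape[1], tile)]
    return blocks

blocks = preprocess_image(np.ones((100, 120)), target_hw=(64, 64), pad=1, tile=33)
```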
9. The image recognition device of claim 2, wherein the feature map cache comprises a plurality of cache regions including a plurality of write cache regions and a plurality of read cache regions, and
wherein the controller is further configured to monitor a first number of a write cache region in the plurality of write cache regions to which data is being written and a second number of a read cache region in the plurality of read cache regions from which data is being read, and issue an interrupt signal to interrupt an operation of the image recognition device when a difference between the first number and the second number is less than a first threshold.
10. The image recognition device of claim 2, wherein the weight cache comprises a plurality of cache regions including a plurality of write cache regions and a plurality of read cache regions, and
wherein the controller is further configured to monitor a first number of a write cache region in the plurality of write cache regions to which data is being written and a second number of a read cache region in the plurality of read cache regions from which data is being read, and issue an interrupt signal to interrupt an operation of the image recognition device when a difference between the first number and the second number is less than a second threshold.
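Claims 9 and 10 describe flow control over ping-pong cache regions: the controller tracks how far the region being written is ahead of the region being read and raises an interrupt when the margin falls below a threshold. A simplified model, with the region-numbering scheme assumed:

```python
# Hypothetical monitor over the write/read cache-region numbers.
def check_cache_margin(write_region_no, read_region_no, threshold):
    """Return True when an interrupt should be raised, i.e. when the region
    being written is no longer sufficiently ahead of the region being read."""
    return (write_region_no - read_region_no) < threshold

# Example: writer on region 5, reader on region 4, threshold 2 -> interrupt.
assert check_cache_margin(5, 4, threshold=2) is True
assert check_cache_margin(7, 4, threshold=2) is False
```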
11. The image recognition device of claim 7, wherein the pooling processing includes maximum pooling or average pooling.
12. The image recognition apparatus according to claim 7, wherein the plurality of activation units perform the activation processing using a ReLU activation function, a leaky ReLU activation function, a sigmoid activation function, or a tanh activation function.
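The activation and pooling options of claims 11 and 12 have well-known reference definitions; the NumPy forms below are for orientation only (a hardware implementation would typically use fixed-point approximations):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pool2d(x, k=2, mode="max"):
    """Max or average pooling with a k x k window and stride k."""
    h, w = x.shape[0] // k * k, x.shape[1] // k * k
    patches = x[:h, :w].reshape(h // k, k, w // k, k)
    return patches.max(axis=(1, 3)) if mode == "max" else patches.mean(axis=(1, 3))
```

(The tanh activation is available directly as np.tanh.)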
13. The image recognition device according to claim 2, further comprising: a general purpose processor configured to store the image to be recognized and the weights, and to send the image to be recognized to the image cache and the weights to the weight cache.
14. The image recognition device of claim 13, wherein the image cache receives the image to be recognized from the general purpose processor through a first multi-channel direct memory access (MCDMA) and the weight cache receives the weights from the general purpose processor through a second MCDMA.
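Multi-channel DMA in claim 14 moves independent slices of the image and weight data concurrently. The thread-based toy below only illustrates the channel-splitting idea; it is not an MCDMA driver, and the contiguous chunking policy is an assumption:

```python
import threading

def mcdma_copy(src, dst, channels):
    """Toy model: split the buffer into `channels` contiguous slices and copy
    them concurrently, as separate DMA channels would."""
    step = (len(src) + channels - 1) // channels
    def copy_slice(lo, hi):
        dst[lo:hi] = src[lo:hi]
    threads = [threading.Thread(target=copy_slice, args=(i, min(i + step, len(src))))
               for i in range(0, len(src), step)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

image_cache = bytearray(1024)
mcdma_copy(bytes(range(256)) * 4, image_cache, channels=4)
```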
15. The image recognition device of claim 13, further comprising: a memory configured to store a first password,
wherein the general purpose processor is further configured to send a second password to the controller, and
the controller is further configured to, in response to receiving the second password from the general purpose processor, retrieve the first password from the memory and compare the first password to the second password, allow the image cache and the weight cache to receive data when the first password is the same as the second password, and prohibit the image cache and the weight cache from receiving data when the first password is different from the second password.
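The password check of claim 15 is a simple gate on the cache inputs; a sketch follows, with the allow_cache_writes flag standing in for the enable signals (an assumption, not a named signal of the disclosure):

```python
# Hypothetical model of the controller-side password gate.
class AccessGate:
    def __init__(self, stored_first_password):
        self._first = stored_first_password       # first password held in the memory
        self.allow_cache_writes = False

    def on_second_password(self, second_password):
        """Compare the received second password with the stored first password and
        enable or disable the image cache and weight cache inputs accordingly."""
        self.allow_cache_writes = (second_password == self._first)
        return self.allow_cache_writes

gate = AccessGate(stored_first_password=b"\x12\x34")
assert gate.on_second_password(b"\x12\x34") is True
assert gate.on_second_password(b"\xff\xff") is False
```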
16. The image recognition device of claim 13, wherein the general purpose processor is further configured to receive the feature map from the feature map cache and perform classification and localization processing on the feature map to determine a category and a location of the image to be recognized.
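Claim 16 leaves classification and localization to the general purpose processor; one common (here only assumed) realisation is a pooled classification head plus a box-regression head over the final feature map:

```python
import numpy as np

def classify_and_localize(feature_map, w_cls, w_box):
    """Toy post-processing head: global-average-pool the final feature map, then
    apply hypothetical classification and box-regression weight matrices."""
    pooled = feature_map.mean(axis=(1, 2))               # (channels,)
    scores = w_cls @ pooled                              # class logits
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                                 # softmax over categories
    box = w_box @ pooled                                 # (x, y, w, h) regression
    return int(np.argmax(probs)), box

fm = np.random.rand(8, 16, 16)                           # channels x height x width
category, location = classify_and_localize(fm, np.random.rand(5, 8), np.random.rand(4, 8))
```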
17. An image recognition method based on a neural network, comprising the following steps:
step S1: receiving and storing, by an image cache, an image to be recognized;
step S2: receiving and storing, by a weight cache, weights;
step S3: receiving, by a controller, configuration information, generating a first control signal and a second control signal according to the configuration information, and transmitting the first control signal and the second control signal;
step S4: obtaining, by a preprocessing unit, the image to be recognized and the weights from the image cache and the weight cache, respectively, receiving the first control signal from the controller, performing preprocessing on the image to be recognized and the weights according to the first control signal to obtain a preprocessing result, and transmitting the preprocessing result; and
step S5: receiving, by a processing unit array, the preprocessing result from the preprocessing unit, receiving the second control signal from the controller, performing processing on the preprocessing result according to the second control signal to obtain a feature map, and sending the feature map to a feature map cache.
18. The image recognition method of claim 17, further comprising:
step S6: receiving and storing, by the feature map cache, the feature map from the processing unit array;
step S7: obtaining, by the preprocessing unit, the feature map from the feature map cache; and
step S8: determining, by the preprocessing unit, whether the feature map is a final feature map according to the first control signal,
if not, the preprocessing unit performs preprocessing on the feature map and the weights to obtain a preprocessing result, and returns to step S5; and
if so, the preprocessing unit instructs the feature map cache to send the final feature map.
19. The image recognition method of claim 18, further comprising:
sending, by the preprocessing unit, a first status signal to the controller after completion of one of a plurality of operations in the preprocessing;
receiving, by the controller, the first status signal, generating a first state control signal according to the first status signal, and sending the first state control signal to the preprocessing unit; and
receiving, by the preprocessing unit, the first state control signal from the controller, and performing a next operation of the plurality of operations after the one operation according to the first state control signal.
20. The image recognition method of claim 18, further comprising:
sending, by the processing unit array, a second status signal to the controller after completion of one of a plurality of operations in the processing;
receiving, by the controller, the second status signal, generating a second state control signal according to the second status signal, and transmitting the second state control signal to the processing unit array; and
receiving, by the processing unit array, the second state control signal from the controller, and performing a next operation of the plurality of operations after the one operation according to the second state control signal.
21. The image recognition method of claim 18, wherein the configuration information includes structural information of the neural network and a size of the image to be recognized, wherein the structural information of the neural network indicates one or more of convolution processing, activation processing, pooling processing, and full-connection processing to be performed, and the performing of the processing on the preprocessing result according to the second control signal to obtain the feature map includes:
sequentially performing the one or more of the convolution processing, the activation processing, the pooling processing, and the full-connection processing indicated by the structural information of the neural network on the preprocessing result according to the second control signal to obtain the feature map.
22. The image recognition method of claim 18, wherein performing preprocessing on the image to be recognized and the weights according to the first control signal comprises:
performing resizing and padding processing on the image to be recognized according to the structural information of the neural network and the size of the image to be recognized to obtain a processed image, and performing a blocking operation on the processed image and the weights.
23. The image recognition method of claim 18, wherein the feature map cache comprises a plurality of cache regions, the plurality of cache regions including a plurality of write cache regions and a plurality of read cache regions, and
wherein the image recognition method further comprises: monitoring a first number of a write cache region in the plurality of write cache regions to which data is being written and a second number of a read cache region in the plurality of read cache regions from which data is being read, and issuing an interrupt signal to interrupt the image recognition method when a difference between the first number and the second number is less than a first threshold.
24. The image recognition method of claim 18, wherein the weight cache comprises a plurality of cache regions, the plurality of cache regions including a plurality of write cache regions and a plurality of read cache regions, and
wherein the image recognition method further comprises: monitoring a first number of a write cache region in the plurality of write cache regions to which data is being written and a second number of a read cache region in the plurality of read cache regions from which data is being read, and issuing an interrupt signal to interrupt the image recognition method when a difference between the first number and the second number is less than a second threshold.
25. The image recognition method of claim 21, wherein the pooling processing includes maximum pooling or average pooling.
26. The image recognition method of claim 21, wherein the activation processing is performed using a ReLU activation function, a leaky ReLU activation function, a sigmoid activation function, or a tanh activation function.
27. The image recognition method of claim 17, further comprising: storing, by a general purpose processor, the image to be recognized and the weights, and sending the image to be recognized to the image cache and the weights to the weight cache.
28. The image recognition method of claim 27, wherein receiving, by the image cache, the image to be recognized comprises: receiving, by the image cache, the image to be recognized from the general purpose processor through a first multi-channel direct memory access (MCDMA), and
receiving, by the weight cache, the weights comprises: receiving, by the weight cache, the weights from the general purpose processor through a second MCDMA.
29. The image recognition method of claim 27, further comprising:
sending, by the general purpose processor, a second password to the controller; and
in response to receiving the second password from the general purpose processor, obtaining, by the controller, a first password from a memory and comparing the second password with the first password, allowing the image cache and the weight cache to receive data when the first password is the same as the second password, and prohibiting the image cache and the weight cache from receiving data when the first password is different from the second password.
30. The image recognition method of claim 27, further comprising:
receiving, by the general purpose processor, the feature map from the feature map cache, and performing classification and localization processing on the feature map to determine a category and a location of the image to be recognized.
CN201911132212.4A 2019-11-18 2019-11-18 Image recognition device and image recognition method based on neural network Active CN112819022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911132212.4A CN112819022B (en) 2019-11-18 2019-11-18 Image recognition device and image recognition method based on neural network

Publications (2)

Publication Number Publication Date
CN112819022A true CN112819022A (en) 2021-05-18
CN112819022B CN112819022B (en) 2023-11-07

Family

ID=75852790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911132212.4A Active CN112819022B (en) 2019-11-18 2019-11-18 Image recognition device and image recognition method based on neural network

Country Status (1)

Country Link
CN (1) CN112819022B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929329A (en) * 2012-09-28 2013-02-13 无锡江南计算技术研究所 Method for dynamically reconfiguring interconnection network between systems-on-chip
CN107657581A (en) * 2017-09-28 2018-02-02 中国人民解放军国防科技大学 Convolutional neural network CNN hardware accelerator and acceleration method
US20180307974A1 (en) * 2017-04-19 2018-10-25 Beijing Deephi Intelligence Technology Co., Ltd. Device for implementing artificial neural network with mutiple instruction units
CN109379532A (en) * 2018-10-08 2019-02-22 长春理工大学 A kind of calculating imaging system and method
US20190114499A1 (en) * 2017-10-17 2019-04-18 Xilinx, Inc. Image preprocessing for generalized image processing
CN109740732A (en) * 2018-12-27 2019-05-10 深圳云天励飞技术有限公司 Neural network processor, convolutional neural networks data multiplexing method and relevant device
US20190164037A1 (en) * 2017-11-29 2019-05-30 Electronics And Telecommunications Research Institute Apparatus for processing convolutional neural network using systolic array and method thereof
CN109934339A (en) * 2019-03-06 2019-06-25 东南大学 A kind of general convolutional neural networks accelerator based on a dimension systolic array
CN109948774A (en) * 2019-01-25 2019-06-28 中山大学 Neural network accelerator and its implementation based on network layer binding operation
WO2019127838A1 (en) * 2017-12-29 2019-07-04 国民技术股份有限公司 Method and apparatus for realizing convolutional neural network, terminal, and storage medium
CN110135554A (en) * 2019-03-25 2019-08-16 电子科技大学 A kind of hardware-accelerated framework of convolutional neural networks based on FPGA
CN110390384A (en) * 2019-06-25 2019-10-29 东南大学 A kind of configurable general convolutional neural networks accelerator

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XUHAO CHEN: "Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs", arXiv, vol. 2019, pages 1-9 *
PANG WEI: "Design and Verification of an FPGA-based AdaBoost Face Detection Algorithm", China Master's Theses Full-text Database, Information Science and Technology, vol. 2018, no. 1, pages 138-1285 *

Also Published As

Publication number Publication date
CN112819022B (en) 2023-11-07

Similar Documents

Publication Publication Date Title
US11741345B2 (en) Multi-memory on-chip computational network
CN110537194B (en) Power efficient deep neural network processor and method configured for layer and operation protection and dependency management
US20180260710A1 (en) Calculating device and method for a sparsely connected artificial neural network
US11550543B2 (en) Semiconductor memory device employing processing in memory (PIM) and method of operating the semiconductor memory device
US20210065379A1 (en) Hardware-based optical flow acceleration
Pestana et al. A full featured configurable accelerator for object detection with YOLO
CN113051216B (en) MobileNet-SSD target detection device and method based on FPGA acceleration
CN111465943B (en) Integrated circuit and method for neural network processing
CN110574045B (en) Pattern matching for optimized deep network processing
US11775832B2 (en) Device and method for artificial neural network operation
US11948352B2 (en) Speculative training using partial gradients update
US20210350230A1 (en) Data dividing method and processor for convolution operation
CN112005251A (en) Arithmetic processing device
KR20210090260A (en) Lossy Sparse Load SIMD Instruction Family
CN109726809B (en) Hardware implementation circuit of deep learning softmax classifier and control method thereof
CN112200310B (en) Intelligent processor, data processing method and storage medium
CN111552652B (en) Data processing method and device based on artificial intelligence chip and storage medium
CN111813721B (en) Neural network data processing method, device, equipment and storage medium
CN115668222A (en) Data processing method and device of neural network
CN112819022B (en) Image recognition device and image recognition method based on neural network
US20200192797A1 (en) Caching data in artificial neural network computations
US20220318604A1 (en) Sparse machine learning acceleration
US20220391676A1 (en) Quantization evaluator
Bai et al. An OpenCL-based FPGA accelerator with the Winograd’s minimal filtering algorithm for convolution neuron networks
Zhang et al. DSP-based traffic target detection for intelligent transportation

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant