CN110163042B - Image recognition method and device - Google Patents

Image recognition method and device

Info

Publication number
CN110163042B
CN110163042B (application CN201810331869.2A)
Authority
CN
China
Prior art keywords
feature
vector
weight
convolution
processing
Prior art date
Legal status (assumed; not a legal conclusion)
Active
Application number
CN201810331869.2A
Other languages
Chinese (zh)
Other versions
CN110163042A (en)
Inventor
欧阳鹏
赵巍胜
张有光
Current Assignee
Tencent Technology Shenzhen Co Ltd
Beihang University
Original Assignee
Tencent Technology Shenzhen Co Ltd
Beihang University
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd, Beihang University filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810331869.2A priority Critical patent/CN110163042B/en
Publication of CN110163042A publication Critical patent/CN110163042A/en
Application granted granted Critical
Publication of CN110163042B publication Critical patent/CN110163042B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image recognition method and device, belonging to the technical field of machine learning. The method comprises the following steps: in the image recognition process, when a layer of the deep neural network receives a feature map, acquiring at least one feature vector of the feature map located in a scanning window, based on the scanning window corresponding to a convolution kernel in the layer; filtering out a first target element in the at least one feature vector and a second target element in at least one weight vector, wherein the product of the first target element and the second target element is 0 (that is, at least one of the two is 0), and the position of the first target element in the feature vector is the same as the position of the second target element in the weight vector; continuing the convolution based on the filtered at least one feature vector and the filtered at least one weight vector; and obtaining the feature map output by the layer based on the feature points obtained by the convolution. The invention avoids redundant operations in the convolution process, thereby greatly reducing the amount of computation in image recognition and improving the speed and efficiency of image recognition.

Description

Image recognition method and device
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to an image recognition method and apparatus.
Background
With the development of machine learning technology, in various application scenarios such as picture searching and commodity recommendation, a computer device can identify images based on a deep neural network. Specifically, the image to be identified can be input into a deep neural network, and the image identification result can be output through layer-by-layer operation in the deep neural network.
The deep neural network comprises a plurality of layers, each layer comprises at least one convolution kernel (kernel), and each convolution kernel is used for convolving a feature map received by the layer to obtain a feature map output to the next layer. Each convolution kernel can be regarded as a weight matrix with a certain size, each convolution kernel is composed of a plurality of weight elements, and weight elements corresponding to different convolution kernels are different, so that different features in the feature map can be detected.
Specifically, the scanning window of any convolution kernel of any layer slides over the feature map, scanning each region of the feature map in turn. When the scanning window slides to any region, each row of elements of the feature map within the scanning window is taken as a feature vector, each row of elements of the convolution kernel is read as a weight vector, the dot product of each feature vector and its corresponding weight vector is calculated, the dot products of all the feature vectors within the scanning window are accumulated, and the accumulated result is taken as one feature point. Each time the scanning window slides, one feature point is output; after the sliding of the scanning window is finished, all the output feature points form a feature map, which is output to the next layer.
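This background procedure can be sketched in a few lines of Python (an illustrative sketch only, not part of the patented method; it assumes a square kernel, a single channel, "valid" boundaries, and a hypothetical function name):

```python
import numpy as np

def convolve2d_valid(feature_map, kernel, stride=1):
    """Slide the kernel's scan window over the feature map; at each
    position, accumulate the row-wise dot products into one feature point."""
    m = kernel.shape[0]
    out_h = (feature_map.shape[0] - m) // stride + 1
    out_w = (feature_map.shape[1] - m) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i*stride:i*stride+m, j*stride:j*stride+m]
            # one feature point = sum of the dot products of corresponding rows
            out[i, j] = sum(window[r] @ kernel[r] for r in range(m))
    return out
```

Every multiplication is performed here, including those where one operand is 0 — the redundancy the patent targets.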
In carrying out the present invention, the inventors have found that the related art has at least the following problems:
when each layer in the deep neural network performs convolution processing, the feature elements and the weight elements both exhibit sparsity, that is, varying numbers of 0 values exist in the feature vectors and the weight vectors. A 0 value that participates in the multiplication and addition operations does not affect the calculation result, so these operations are redundant; such redundant operations bring additional energy consumption overhead and increase the calculation time.
Disclosure of Invention
The embodiment of the invention provides an image recognition method and device, which can solve the problem of excessive computation time caused by redundant operations of a deep neural network in the related art. The technical scheme is as follows:
in one aspect, there is provided an image recognition method, the method comprising:
in the image recognition process, when any layer of the deep neural network receives a feature map, at least one feature vector of the feature map located in a scanning window is acquired based on the scanning window corresponding to a convolution kernel in the layer, and the deep neural network is used for carrying out image recognition according to the feature map of an input image;
acquiring at least one weight vector corresponding to the convolution kernel;
filtering a first target element in the at least one feature vector and a second target element in the at least one weight vector, wherein the product of the first target element and the second target element is 0, and the position of the first target element in the feature vector is the same as the position of the second target element in the weight vector;
continuing to convolve based on the filtered at least one feature vector and the filtered at least one weight vector;
and obtaining a feature map output by any layer based on the feature points obtained by convolution.
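The filtering step above can be sketched as follows (a hypothetical Python illustration, not the patent's hardware implementation): index positions at which the product of the feature element and the weight element would be 0 — i.e., at which either operand is 0 — are dropped before the multiply-accumulate, without changing the result:

```python
def filtered_dot(feature_vec, weight_vec):
    """Multiply-accumulate only the element pairs whose product is nonzero.
    Pairs where either element is 0 (the first/second target elements in the
    patent's terms) are filtered out; the result equals the plain dot product."""
    pairs = [(f, w) for f, w in zip(feature_vec, weight_vec)
             if f != 0 and w != 0]  # skip all redundant 0-valued operations
    return sum(f * w for f, w in pairs)
```

For example, `filtered_dot((6, 2, 0), (2, 5, 0))` performs only two multiplications instead of three while still returning 22.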
In another aspect, there is provided an image recognition apparatus, the apparatus including:
the acquisition module is used for acquiring, when any layer of the deep neural network receives the feature map in the image recognition process, at least one feature vector of the feature map located in a scanning window based on the scanning window corresponding to a convolution kernel in the layer, and the deep neural network is used for carrying out image recognition according to the feature map of the input image;
the acquisition module is further used for acquiring at least one weight vector corresponding to the convolution kernel;
a filtering module, configured to filter a first target element in the at least one feature vector and a second target element in the at least one weight vector, where the product of the first target element and the second target element is 0, and a position of the first target element in the feature vector is the same as a position of the second target element in the weight vector;
A convolution module for continuing to convolve based on the filtered at least one feature vector and the filtered at least one weight vector;
and the output module is used for obtaining the feature map output by the layer based on the feature points obtained by convolution.
In another aspect, a computer device is provided that includes a processor and a memory having at least one instruction stored therein that is loaded and executed by the processor to perform the operations performed by the image recognition method described above.
In another aspect, a computer readable storage medium is provided, having stored therein at least one instruction that is loaded and executed by a processor to implement the operations performed by the image recognition method described above.
The technical scheme provided by the embodiment of the invention has the beneficial effects that:
in the method and the device provided by the embodiment of the invention, in the process of image recognition based on the deep neural network, the redundant first target element in the feature vector and the redundant second target element in the weight vector are filtered, and the convolution processing is carried out based on the filtered feature vector and the filtered weight vector, so that the invalid operation caused by the first target element and the second target element is avoided, the calculated amount of the convolution processing is greatly reduced, the speed of image recognition is accelerated, and the efficiency of image recognition is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a convolution process for feature maps according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a convolution process for feature maps according to an embodiment of the present invention;
FIG. 3 is a schematic illustration of an implementation environment provided by an embodiment of the present invention;
FIG. 4 is a flowchart of an image recognition method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a scanning window sliding on a feature map according to an embodiment of the present invention;
FIG. 6 is a circuit diagram of a computing unit according to an embodiment of the present invention;
FIG. 7 is a flowchart of an image recognition method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a partitioning feature block provided by an embodiment of the present invention;
FIG. 9 is a circuit block diagram for implementing a deep neural network provided by an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of an image recognition device according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order to facilitate understanding of the technical process of the present invention, first, the basic principle of the deep neural network will be described:
overall structure of deep neural network: the deep neural network is constructed by N layers (N is a positive integer greater than 1), the output of any layer can be used as the input of the next layer, for example, the feature map output by the first layer can be used as the feature map input by the second layer, the feature map output by the second layer can be used as the feature map input by the third layer, and the like, and the feature map is directly output until the Nth layer.
Layer of deep neural network: from the perspective of the locations of the layers, the layers of the deep neural network may be divided into an input layer (input layer) referring to the first layer, an output layer (output layer) referring to the last layer, and a hidden layer (hidden layer) referring to each layer between the input layer and the output layer. From the aspect of the operation mode of the layers, the layers of the deep neural network can be divided into a convolution layer, a full-connection layer, a recursion layer, an activation function layer and a pooling layer, wherein the convolution layer is a core layer in the deep neural network, and other layers can be regarded as variations of the convolution layer or further process the feature map output by the convolution layer.
Feature map (feature map): may also be referred to as an activation map (Activation Map) or convolved feature (Convolved Feature). The feature map is the data input and output by each layer in the deep neural network during image recognition. The feature maps output by the 1st layer to the (N-1)th layer comprise a plurality of feature points, each representing a pixel value. The feature map output by the Nth layer may comprise only one feature point, representing the category to which the image belongs, or may comprise a plurality of feature points, each representing a category to which the image may belong and the probability of belonging to that category.
Convolution kernel (kernel): may also be referred to as a filter, feature detector, kernel function, or weight kernel. The convolution kernel may be regarded as a weight matrix with a scanning window, comprising a plurality of weight elements (weights), each of which may also be referred to as a parameter; the weight elements of different convolution kernels are different, so that different features in the feature map can be detected. The size of a convolution kernel (i.e., the size of the weight matrix) may be determined according to actual service requirements: the size of each convolution kernel in a convolution layer is M×M (M is a positive integer greater than 1), typically 3×3, and the size of each convolution kernel in a full-connection layer is 1×1. Each convolution kernel is configured to convolve the feature map to obtain an output feature map, and each layer may include at least one convolution kernel, through which at least one feature map may be output.
Scanning window: the size of the scanning window is equal to the size of the convolution kernel (i.e., the size of the weight matrix). The step size (stride) of the scanning window determines how far the scanning window slides each time, and can be set according to actual requirements, for example, to 1.
Calculation principle of the convolution kernel: convolution processing can be understood as a process of corresponding multiplication and accumulation. During convolution processing, the scanning window of the convolution kernel slides over the feature map, scanning and calculating each region of the feature map in sequence. When the scanning window slides to any region of the feature map, each row of feature elements in the region is read as a feature vector, each row of weight elements of the convolution kernel is simultaneously acquired as a weight vector, the dot product of each feature vector and its corresponding weight vector is calculated, the dot products of all the feature vectors of the feature map within the scanning window are accumulated, and the accumulated result is taken as one feature point. Then the scanning window slides by one step size, so that the scanning window is located in the next region of the feature map, and one more feature point is output; when the sliding process is finished, all the output feature points form one feature map, which is output to the next layer.
Illustratively, referring to fig. 1, a schematic diagram of a convolution process of a feature map based on a convolution kernel is shown; the convolution kernel has a size of 3×3, its weight vectors are (2,5,0), (0,1,1) and (7,0,4), and the step size is 1. Assuming the scan window of the convolution kernel is in the region shown in the figure, the feature vectors of the feature map within the scan window are (6,2,0), (0,3,7) and (7,5,0). The convolution kernel reads (6,2,0) from the scan window and correspondingly reads the weight vector (2,5,0); the dot product of the two is 6×2+2×5+0×0=22. It then reads (0,3,7) from the scan window, correspondingly reads the weight vector (0,1,1), and calculates their dot product 0×0+3×1+7×1=10; it then reads (7,5,0) from the scan window, correspondingly reads the weight vector (7,0,4), and calculates their dot product 7×7+5×0+0×4=49. The three dot products are then accumulated (22+10+49=81) as the feature point for this scan-window position. Thereafter, referring to fig. 2, the scan window slides by one step, the feature vectors of the feature map located in the scan window become (0,3,7), (7,5,0) and (2,0,0), and the dot products of the three feature vectors and the three weight vectors are calculated and accumulated again to obtain another feature point. Similarly, when the sliding of the scan window is finished, all the obtained feature points form the output feature map.
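The arithmetic of this worked example can be reproduced in a few lines of Python (vector values taken from FIG. 1 as quoted above):

```python
# Feature rows of the scan window and the corresponding weight rows of the
# 3x3 convolution kernel, as read from FIG. 1.
features = [(6, 2, 0), (0, 3, 7), (7, 5, 0)]
weights = [(2, 5, 0), (0, 1, 1), (7, 0, 4)]

# Dot product of each feature vector with its corresponding weight vector.
products = [sum(f * w for f, w in zip(fv, wv))
            for fv, wv in zip(features, weights)]  # [22, 10, 49]

# The feature point for this scan-window position is their accumulated sum.
feature_point = sum(products)  # 22 + 10 + 49 = 81
```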
FIG. 3 is a schematic diagram of an implementation environment provided by an embodiment of the present invention. The implementation environment includes a plurality of terminals 301 and a plurality of computer devices 302. The terminals 301 are connected to a plurality of computer devices 302 through a wireless or wired network, where the terminals 301 may be computers, smart phones, tablet computers or other electronic devices, and each computer device 302 may be a server, a server cluster formed by a plurality of servers, or a cloud computing service center, and of course, may also be a personal computer, a notebook computer, or the like.
In the image recognition process, the terminal 301 may send the image to be recognized to the computer device 302, and the computer device 302 may receive the image to be recognized, input the image into the deep neural network, and output the recognition result of the image. Optionally, the computer device 302 may also have at least one image database, such as a face database, a merchandise database, etc., for storing images to be identified provided by the terminal 301.
The image recognition method provided by the embodiment of the invention can be widely applied to various actual application scenes, and the actual technical effects of the embodiment of the invention are explained below by combining two exemplary application scenes:
(1) The method can be applied to the scene of face recognition: in the face recognition scenario, the terminal 301 may send the target face image to the computer device 302, and the computer device 302 may receive the target face image, input the target face image into the deep neural network, and output a recognition result of the target face image, for example, output a name of a person corresponding to the face image.
The following is an exemplary description in connection with three specific scenarios for face recognition:
scene (1.1) attendance checking and card punching
When a user arrives at a company and needs to punch a card, a face image can be shot through the terminal, the terminal can send the face image to the computer equipment, the computer equipment identifies the name of the user based on the deep neural network, the user is verified to belong to staff of the company, and the card punching record is carried out when the name of the user is determined to belong to the staff.
Scene (1.2) face Payment
When a user is about to check out in a certain market, a terminal arranged at a check-out place can shoot a face image of the user, the face image is sent to computer equipment, the computer equipment identifies the name of the user based on a deep neural network, and fee deduction operation is carried out on a user account corresponding to the name of the user, so that payment is completed.
Scene (1.3) Security monitoring
When a theft occurs at a bank or a traffic accident occurs on a road section, the terminal can intercept the face image of the suspected person recorded in the monitoring video and send the face image to the computer device; the computer device identifies the face image based on the deep neural network, determines the name of the person, and provides it to the security department.
In the above-mentioned various face recognition scenes, the computer device filters the feature vector and the weight vector in the deep neural network and then convolves the feature vector and the weight vector, so that the calculated amount of face recognition is greatly reduced, and the target face image can be rapidly recognized.
(2) The method can be applied to the scene of commodity recommendation: in a commodity recommendation scenario, the terminal 301 may send a commodity image to be identified to the computer device 302, and the computer device 302 may input the commodity image to be identified into the deep neural network, output a recognition result of the commodity image, for example, output a brand, a model, a name, a material, a store, and the like of the commodity image. Then, the target commodity data matched with the identification result of the commodity image can be searched in the commodity database based on the identification result of the commodity image and recommended to the user.
For example, a user may take a photograph of a certain overcoat through a terminal, the terminal may send the photograph of the overcoat to a computer device, the computer device inputs the photograph of the overcoat to a deep neural network, outputs a brand and model of the overcoat, searches a purchase page of the overcoat in a commodity database according to the brand and model of the overcoat, and provides the purchase page to the user so that the user can purchase the overcoat online.
In a commodity recommendation scene, the computer equipment filters the feature vector and the weight vector in the deep neural network and then convolves the feature vector and the weight vector, so that the calculated amount for identifying the commodity image is greatly reduced, the commodity image can be quickly identified, and further target commodity data can be quickly recommended.
Fig. 4 is a flowchart of an image recognition method provided by an embodiment of the present invention, where an execution body of the embodiment of the present invention is a computer device, and referring to fig. 4, the method includes:
401. in the image recognition process, when any layer of the deep neural network receives the feature map, the computer equipment acquires at least one feature vector of the feature map in a scanning window based on the scanning window corresponding to the intra-layer convolution kernel.
Each layer in the deep neural network may include at least one convolution kernel, and when any layer in the deep neural network receives the feature map, the computer device may process the feature map based on the at least one convolution kernel, to obtain a plurality of feature maps output by the plurality of convolution kernels. In this embodiment, before each convolution kernel convolution process, a step of filtering the feature vector and the weight vector is added, so that the feature element and the weight element with 0 are ensured not to participate in the convolution operation process, thereby avoiding redundant operation caused by the 0 value. For ease of description, the process of performing convolution processing based on any one of the convolution kernels within any one layer is described herein, and in implementations, a layer may include multiple convolution kernels, and a computer device may perform convolution processing based on each convolution kernel in turn in the same manner.
For any convolution kernel within a layer, the computer device will slide over the feature map based on the scan window corresponding to the convolution kernel. When the scanning window is slid to any region of the feature map, the computer device may acquire at least one feature vector of the feature map within the scanning window so as to calculate feature points from the at least one feature vector.
Specifically, the computer device may acquire the feature vectors column by column, that is, acquire at least one column of feature elements of the feature map located in the scanning window, taking each column of feature elements as one feature vector to obtain at least one feature vector. Alternatively, the feature vectors may be acquired row by row, that is, at least one row of feature elements of the feature map located in the scanning window may be acquired, with each row of feature elements taken as one feature vector, to obtain at least one feature vector.
The number and the length of the feature vectors of the feature map in the scanning window are determined according to the size of the scanning window, for example, when the size of the scanning window is 3×3, the number of the feature vectors of the feature map in the scanning window is 3, and the length of each feature vector is 3.
Optionally, the process of acquiring the feature vector of the feature map located in the scanning window by the computer device may include the following step one and step two:
Step one, for a designated area in a scanning window, the computer equipment reads a feature vector of a feature map in the designated area from a shift register, wherein the designated area refers to an overlapped area between the current area of the scanning window and the area before sliding of the window, and the feature vector in the designated area is cached in the shift register when the area before sliding of the window is subjected to convolution processing.
In combination with the above calculation principle of the convolution kernel, it can be seen that the scan window of the convolution kernel has an overlapping region in the sliding process, that is, the current region of the scan window has a repeated feature vector with the region before sliding. For example, please refer to fig. 5, which illustrates a process of sliding the scanning window twice, when the scanning window is located in the area 1 and the area 2, the areas where the scanning window is located twice have overlapping areas (6,2,0) and (0, 3, 7), and when the scanning window is located in the area 2 and the area 3, the areas where the scanning window is located twice have overlapping areas (0, 3, 7) and (7,5,0).
For convenience of description, an overlapping area between an area where the scanning window is currently located and an area where the window is located before sliding is referred to herein as a designated area, an area after the designated area within the scanning window is referred to as a remaining area, for example, please refer to fig. 5, when the scanning window is located in the area 2, the designated area is an area composed of (6,2,0) and (0, 3, 7), the remaining area is an area composed of (7,5,0), and when the scanning window is located in the area 3, the designated area is an area composed of (0, 3, 7) and (7,5,0), the remaining area is an area composed of (2,0,0).
In this embodiment, the manner in which the computer device acquires the feature vectors in the designated area differs from that for the other areas. When the scanning window slides to any region of the feature map, the computer device may determine the designated area within the scanning window. Since the feature vectors in the designated area belong not only to the region where the scanning window is currently located but also to the region where the window was located immediately before sliding, the computer device already loaded the feature vectors of the designated area into the shift register when performing convolution processing on that previous region. Therefore, the computer device does not need to load the feature vectors of the designated area again, and directly reads them from the shift register.
For the process of determining the designated area, the designated area in the scanning window may be determined according to the size of the scanning window of the convolution kernel and the step size of the scanning window. For example, assuming that the size of the scanning window is m×m and the step size is s, the difference j between M and s may be calculated, and an area formed by the 1 st column of feature elements to the j th column of feature elements of the area where the scanning window is currently located is obtained, and the area is taken as the specified area. Accordingly, the region consisting of the j+1th column feature element to the Mth column feature element of the region where the scanning window is currently located can be used as the rest region, and the number of feature vectors of the rest region is equal to s. For example, assuming that the size of the scanning window is 3×3 and the step size is 1, an area composed of the 1 st column feature element and the 2 nd column feature element of the area where the scanning window is currently located may be taken as a designated area, and an area composed of the 3 rd column feature element may be taken as the remaining area.
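The split described in this paragraph can be sketched as a small helper (hypothetical Python, with 0-based column indices, whereas the text counts columns from 1):

```python
def split_window_columns(m, s):
    """For an MxM scan window with stride s, the first j = M - s columns of
    the current window overlap the previous window position (the 'designated
    area' reusable from the shift register); the last s columns are the
    'remaining area' that must be loaded fresh."""
    j = m - s
    designated = list(range(0, j))   # column indices reusable from the cache
    remaining = list(range(j, m))    # column indices to load from the feature map
    return designated, remaining
```

For a 3×3 window with stride 1, this yields two reusable columns and one column to load, matching the example in the text.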
Step two: for the remaining area after the designated area in the scanning window, the computer device loads the feature vectors of the feature map in the remaining area and caches them into the shift register.
The feature vectors in the remaining area of the scan window have not yet been cached, so the computer device loads them for the feature map. Meanwhile, these feature vectors can be cached in the shift register, so that when the scanning window slides to the next area, the remaining area becomes the designated area of that next area, and the computer device can then read its feature vectors directly from the shift register when acquiring the feature vectors of the feature map in the next area.
It should be noted that, because the designated area is the overlap between the scanning window and the area the scanning window occupied before sliding, when the scanning window is located in the first area of the feature map it has not yet slid, so the first area has no designated area and the computer device loads every feature vector in the first area. Step one and step two can then be adopted to obtain the feature vectors in the scanning window as it slides to the second area of the feature map and onward to the last area.
Illustratively, referring to FIG. 5, when the scan window is in region 1, the computer device loads the feature vectors (4,0,1), (6,2,0) and (0,3,7), and buffers these 3 feature vectors into the shift register. When the scan window slides from region 1 to region 2, the computer device only needs to load (7,5,0) and does not reload (6,2,0) and (0,3,7); instead it reads (6,2,0) and (0,3,7) directly from the shift register. Similarly, when the scan window slides from region 2 to region 3, the computer device only needs to load (2,0,0) and does not reload (0,3,7) and (7,5,0); instead it reads (0,3,7) and (7,5,0) directly from the shift register.
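The reuse pattern of FIG. 5 can be modeled with a small simulation (a hypothetical sketch; the column vectors are those quoted above, and the load counter stands in for memory accesses):

```python
from collections import deque

def scan_with_register(columns, window_size=3, stride=1):
    """Slide a scan window over the column vectors of a feature map,
    keeping the overlapping columns in a shift register (modeled as a
    deque) so that only the new columns are loaded from memory."""
    register = deque(maxlen=window_size)   # models the shift register
    loads = 0
    windows = []
    for start in range(0, len(columns) - window_size + 1, stride):
        if not register:                   # first region: load every column
            new_cols = columns[start:start + window_size]
        else:                              # later regions: load only s columns
            new_cols = columns[start + window_size - stride:start + window_size]
        for col in new_cols:
            register.append(col)           # old columns shift out automatically
            loads += 1
        windows.append(list(register))
    return windows, loads

cols = [(4, 0, 1), (6, 2, 0), (0, 3, 7), (7, 5, 0), (2, 0, 0)]
windows, loads = scan_with_register(cols)
print(loads)  # 5 column loads instead of 9 without reuse
```

With three window positions, only 5 column vectors are loaded rather than 9, which illustrates the reduction in memory accesses.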
The following technical effects can be achieved by combining the first step and the second step:
in the related art, each time the scanning window slides to any area of the feature map, the computer device needs to load each feature vector of the feature map in the scanning window so as to perform accumulation operation on each feature vector, in this process, the number of loaded feature vectors is large, and the number of times of accessing the memory is extremely large, so that the operation cost and the energy consumption of the computer device are excessive, and the performance of the computer device is affected.
In this embodiment, the first and second steps implement multiplexing of feature vectors in the overlapping area, and when the scanning window slides to the next area, only the feature vectors in the other areas need to be loaded, no feature vector in the designated area needs to be loaded, and the number of loaded feature vectors is small, so that the number of times of accessing the memory is greatly reduced, and the running cost and energy consumption of the computer device are reduced.
402. The computer device obtains at least one weight vector corresponding to the convolution kernel.
The computer device may obtain at least one weight vector corresponding to the convolution kernel, so as to calculate feature points from the at least one feature vector and the at least one weight vector. In particular, the computer device may obtain the weight vectors column by column, that is, obtain at least one column of weight elements in the weight matrix and use each column of weight elements as one weight vector, thereby obtaining at least one weight vector. Of course, the weight vectors may also be obtained row by row, that is, at least one row of weight elements in the weight matrix may be obtained and each row of weight elements used as one weight vector, provided that the manner of obtaining the weight vectors is consistent with the manner of obtaining the feature vectors.
In implementation, the weight vectors of any convolution kernel of any layer may be stored in the weight buffer in advance, and when a weight vector is to be acquired, it may be read from the weight buffer.
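Reading the weight matrix column by column could look like the following sketch (a hypothetical 3×3 matrix whose columns happen to match the weight vectors used in the later example):

```python
# Hypothetical 3x3 weight matrix; zip(*matrix) yields its columns.
weight_matrix = [[2, 0, 7],
                 [5, 1, 0],
                 [0, 1, 4]]

# Each column of weight elements becomes one weight vector.
weight_vectors = [list(col) for col in zip(*weight_matrix)]
print(weight_vectors)  # [[2, 5, 0], [0, 1, 1], [7, 0, 4]]
```

A row-by-row reading would simply iterate `weight_matrix` directly; the essential requirement is that features and weights are read in the same orientation.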
403. The computer device filters a first target element in the at least one feature vector and a second target element in the at least one weight vector.
For convenience of description, in this embodiment, a feature element filtered out from a feature vector is referred to as a first target element, a weight element filtered out from a weight vector is referred to as a second target element, and the first target element and the second target element simultaneously satisfy the following two conditions:
Condition one: the AND result of the first target element and the second target element is 0. That is, at least one of the first target element and the second target element is 0 (either one of them, or both), so the product of the first target element and the second target element will be 0.
Condition II: the position of the first target element in the feature vector is the same as the position of the second target element in the weight vector. That is, if the first target element is the nth element in the feature vector, the second target element is the nth element in the weight vector.
In combination with the above description of the working principle of the convolution kernel, the convolution process multiplies the feature elements by the corresponding weight elements and then accumulates the products. In the multiplication, the result is 0 whenever either of the two multipliers is 0. Therefore, for any feature element and the corresponding weight element, when either of the two elements is 0, the result of their multiplication is 0, so that multiplication is a redundant operation that needlessly consumes calculation time.
In this embodiment, before the dot product of any feature vector and weight vector is calculated, the elements at corresponding positions in the feature vector and weight vector whose AND result is 0 are filtered out in advance. In the process of calculating the dot product, for any feature element and the corresponding weight element, the product of the two elements is calculated only when both are nonzero; when either element is 0, both elements have already been filtered out and do not participate in the multiplication. This avoids computing products whose result is known to be 0, greatly accelerating the operation and improving the operation efficiency.
The following is an exemplary description of the specific process of filtering feature vectors and weight vectors through steps one and two:
and step one, performing AND operation on the feature elements and the weight elements with the same positions in any feature vector and the corresponding weight vector to obtain the AND result of the feature elements and the weight elements.
In the convolution processing process, each feature vector of the feature map in the scanning window corresponds to each weight vector in the weight matrix corresponding to the convolution kernel, for example, assuming that the convolution kernel is 3×3 in size, 3 feature vectors exist in the scanning window, the 1 st feature vector in the scanning window corresponds to the 1 st weight vector in the weight matrix, the 2 nd feature vector in the scanning window corresponds to the 2 nd weight vector in the weight matrix, and the 3 rd feature vector in the scanning window corresponds to the 3 rd weight vector in the weight matrix.
Thus, for the i-th feature vector in the scan window of M×M size (i is a positive integer not less than 1 and not more than M), the corresponding weight vector is the i-th weight vector in the weight matrix. The computer device performs an AND operation on the feature element and the weight element at the same position in the i-th feature vector and the i-th weight vector to obtain the AND result of the feature element and the weight element, so as to filter the i-th feature vector and the i-th weight vector according to the AND result of each feature element and weight element.
The AND operation performs a logical AND on the two elements. In the process of performing the AND operation on a feature element and a weight element, it can be judged whether the feature element is nonzero and whether the weight element is nonzero; when both the feature element and the weight element are nonzero, the AND result is determined to be 1, and when either of them is 0, the AND result is determined to be 0. In implementations, the AND operation may be implemented by logic circuits such as AND gates, NAND gates, and NOT gates.
As for the specific process of performing the AND operation on elements at the same position in the i-th feature vector and the i-th weight vector, each element in the two vectors may be traversed: the 1st feature element in the feature vector is operated on with the 1st weight element in the weight vector, the 2nd feature element with the 2nd weight element, and so on, until the operation on the last feature element in the feature vector and the last weight element in the weight vector is completed.
Step two: according to the AND result of each feature element and weight element, the computer device filters the first target element from the feature vector and the second target element from the weight vector to obtain a filtered feature vector and a filtered weight vector, where the AND result of the first target element and the second target element is 0.
After the AND result of a feature element and a weight element is obtained, if the AND result is 0, indicating that at least one of the two elements is 0, the computer device can filter the two elements out of the two vectors respectively; if the AND result is 1, indicating that neither of the two elements is 0, the computer device can retain the two elements. Then, according to the AND result of each feature element and weight element, the computer device can filter the feature vector and the weight vector to obtain two filtered vectors: the filtered feature vector does not contain the first target element, and the filtered weight vector does not contain the second target element. In this way, when the convolution processing continues with the two filtered vectors, the calculation redundancy caused by the first target element and the second target element is avoided, which greatly accelerates the convolution processing.
Regarding specific implementations of filtering the feature vector and the weight vector according to the AND result of the feature element and the weight element, this embodiment illustratively provides the following implementation one and implementation two:
Implementation one: a preset mark is added to the feature elements and weight elements whose AND result is 1, and no preset mark is added to those whose AND result is another value, so that elements with different AND results can be distinguished by the presence or absence of the preset mark, allowing the feature vector and the weight vector to be filtered.
Specifically, implementation one may include the following (1) to (3).
(1) For any feature element in the feature vector and the weight element at the corresponding position in the weight vector, when the AND result of the feature element and the weight element is 1, the computer device adds a preset mark to the feature element and the weight element, where the preset mark indicates that the corresponding element needs to be retained.
For any feature element and weight element, the computer device may judge whether the AND result of the feature element and the weight element is 1; when the AND result is 1, a preset mark is added to the feature element and the weight element, and when the AND result is 0, there is no need to add the preset mark. Adding the preset mark achieves the following technical effect: the computer device adds the preset mark to the elements other than the first target element in the feature vector but not to the first target element, and adds the preset mark to the elements other than the second target element in the weight vector but not to the second target element, so that the first target element and the second target element can later be distinguished from the other elements by the presence of the preset mark, so as to filter the vectors. The preset mark may be 1 or another number, letter or identifier.
For example, assuming that the feature vector is (7,5,0) and the corresponding weight vector is (7,0,4), the AND result of feature element 7 and weight element 7 is 1, so a preset mark is added to feature element 7 and weight element 7; the AND result of feature element 5 and weight element 0 is 0, and the AND result of feature element 0 and weight element 4 is 0, so no preset mark is added to feature element 5, weight element 0, feature element 0 or weight element 4.
It should be noted that, regarding the specific manner of obtaining the preset mark, the computer device may generate the preset mark after obtaining the AND result of the feature element and the weight element, so as to add the generated preset mark to the feature element and the weight element. Alternatively, the computer device may store the preset mark in advance and call the stored preset mark after the AND result of the feature element and the weight element is obtained, so as to add it to the feature element and the weight element.
(2) At least one feature element corresponding to the preset mark is obtained from the feature vector to form a filtered feature vector.
In order to filter out from the feature vector the feature elements whose AND result is 0 and retain those whose AND result is 1, the computer device may obtain from the feature vector at least one feature element corresponding to the preset mark, the AND results of which are all 1, and compose the at least one feature element into the filtered feature vector.
For any feature element in the feature vector, it may be judged whether the feature element corresponds to the preset mark. When it does, the feature element is obtained from the feature vector; when it does not, the judgment continues with the next feature element, until the last feature element has been judged, at which point at least one feature element corresponding to the preset mark has been obtained.
In this step, since the feature elements corresponding to the preset mark are the feature elements other than the first target element, and the feature elements not corresponding to the preset mark are the first target element, at least one feature element corresponding to the preset mark is obtained and a filtered feature vector is formed, and the filtered feature vector does not contain the first target element, so that the effect of filtering the first target element from the feature vector is achieved.
(3) At least one weight element corresponding to the preset mark is obtained from the weight vector to form a filtered weight vector.
The process of forming the filtered weight vector is the same as the process of forming the filtered feature vector, and will not be described in detail herein.
In this step, since the weight element corresponding to the preset mark is a weight element other than the second target element, and the weight element not corresponding to the preset mark is the second target element, at least one weight element corresponding to the preset mark is obtained and forms a filtered weight vector, and the filtered weight vector does not contain the second target element, so that the effect of filtering the second target element from the weight vector is achieved.
For example, assuming that the feature vector is (7,5,0) and the corresponding weight vector is (7,0,4), the feature element 7 is obtained from the feature vector, the feature element 7 is taken as a filtered feature vector, the weight element 7 is obtained from the weight vector, and the weight element 7 is taken as a filtered weight vector.
It should be noted that the above step one and step two only take the process of filtering one feature vector and its corresponding weight vector as an example. The at least one feature vector of the scanning window and the at least one weight vector in the weight matrix may each be filtered by the above method in a similar way, so as to filter out the elements at corresponding positions in the at least one feature vector and the at least one weight vector whose AND result is 0.
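The two filtering steps can be condensed into a short sketch (the function names are ours): the AND result is 1 only when both elements are nonzero, and only those positions are kept:

```python
def and_result(f, w):
    """Step one: logical AND of 'element is nonzero' for both elements."""
    return 1 if f != 0 and w != 0 else 0

def filter_pair(features, weights):
    """Step two: drop every position whose AND result is 0, returning the
    filtered feature vector and the filtered weight vector."""
    marks = [and_result(f, w) for f, w in zip(features, weights)]
    filtered_features = [f for f, m in zip(features, marks) if m]
    filtered_weights = [w for w, m in zip(weights, marks) if m]
    return filtered_features, filtered_weights

# The running example: (7,5,0) against (7,0,4) keeps only the first pair.
print(filter_pair([7, 5, 0], [7, 0, 4]))  # ([7], [7])
```

The `marks` list here plays the role of the preset mark of implementation one; implementation two would additionally record a 0 mark rather than omitting it.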
Implementation two: a first mark may be added to the feature elements and weight elements whose AND result is 1, and a second mark may be added to those whose AND result is 0, so that elements with different AND results can be distinguished by their different marks, allowing the feature vector and the weight vector to be filtered.
Specifically, implementation two may include the following (1) to (3).
(1) For any feature element in the feature vector and the weight element at the corresponding position in the weight vector, when the AND result of the feature element and the weight element is 1, the computer device adds a first mark to the feature element and the weight element, and when the AND result of the feature element and the weight element is 0, the computer device adds a second mark to the feature element and the weight element.
The first mark and the second mark are used to distinguish elements with different AND results: the first mark marks feature elements and weight elements whose AND result is 1, and the second mark marks those whose AND result is 0. The first mark and the second mark may be preset numbers, letters or identifiers.
Regarding the specific procedure of adding the marks, for any feature element and weight element, the computer device may add the first mark to the feature element and the weight element when their AND result is 1, and may add the second mark when their AND result is 0.
For example, assuming that the feature vector is (7,5,0) and the corresponding weight vector is (7,0,4), the AND result of feature element 7 and weight element 7 is 1, so the first mark is added to feature element 7 and weight element 7; the AND result of feature element 5 and weight element 0 is 0, so the second mark is added to feature element 5 and weight element 0; and the AND result of feature element 0 and weight element 4 is 0, so the second mark is added to feature element 0 and weight element 4.
(2) And obtaining at least one characteristic element corresponding to the first mark from the characteristic vector to form a filtered characteristic vector.
In order to filter out from the feature vector the feature elements whose AND result is 0 and retain those whose AND result is 1, the computer device may obtain from the feature vector at least one feature element corresponding to the first mark, the AND results of which are all 1, and compose the at least one feature element into the filtered feature vector.
For any feature element in the feature vector, it may be judged whether the feature element corresponds to the first mark. When it does, the feature element is obtained from the feature vector; when it does not, the judgment continues with the next feature element, until the last feature element has been judged, at which point at least one feature element corresponding to the first mark has been obtained.
In this step, since the feature elements corresponding to the first mark are feature elements other than the first target element, and the feature elements corresponding to the second mark are first target elements, after at least one feature element corresponding to the first mark is obtained and a filtered feature vector is formed, the filtered feature vector does not include the first target element, so as to achieve the effect of filtering the first target element from the feature vector.
(3) At least one weight element corresponding to the first mark is obtained from the weight vector to form a filtered weight vector.
The process of forming the filtered weight vector is the same as the process of forming the filtered feature vector, and will not be described in detail herein.
For example, assuming that the feature vector is (7,5,0) and the corresponding weight vector is (7,0,4), the feature element 7 is obtained from the feature vector, the feature element 7 is taken as a filtered feature vector, the weight element 7 is obtained from the weight vector, and the weight element 7 is taken as a filtered weight vector.
In this step, since the weight element corresponding to the first mark is a weight element other than the second target element, and the weight element corresponding to the second mark is the second target element, after at least one weight element corresponding to the first mark is obtained and a filtered weight vector is formed, the filtered weight vector does not contain the second target element, so as to achieve the effect of filtering the second target element from the weight vector.
It should be noted that, regarding the specific manner of obtaining the first mark and the second mark, the computer device may generate the first mark and/or the second mark after obtaining the AND result of the feature element and the weight element, so as to add the generated mark(s) to the feature element and the weight element. Alternatively, the computer device may store the first mark and/or the second mark in advance and call the stored mark(s) after the AND result of the feature element and the weight element is obtained, so as to add them to the feature element and the weight element.
404. The computer device continues to convolve based on the filtered at least one feature vector and the filtered at least one weight vector.
After obtaining the filtered at least one feature vector and the filtered at least one weight vector of the convolution kernel in the scanning window, the computer device continues the convolution on the filtered at least one feature vector and the filtered at least one weight vector, finally obtaining their convolution result, which is taken as the feature point corresponding to the scanning window.
As for the specific process of the convolution, the filtered feature vectors and the filtered weight vectors may be correspondingly multiplied and accumulated: the dot product of the filtered 1st feature vector and the filtered 1st weight vector is calculated, the dot product of the filtered 2nd feature vector and the filtered 2nd weight vector is calculated, and so on; after the dot product of each filtered feature vector and its filtered weight vector has been obtained, the sum of the at least one calculated dot product is taken as the convolution result.
In the present invention, only the elements at corresponding positions in the feature vector and the weight vector whose AND result is 0 are filtered out, and the other elements in the feature vector and the weight vector are retained. Therefore, compared with performing the convolution on the feature vectors and weight vectors before filtering, performing it on the filtered vectors greatly reduces the amount of calculation while guaranteeing that the calculation result of the convolution does not change, so the accuracy of the convolution is not affected.
Illustratively, referring to FIG. 1, the at least one image vector in the scanning window before filtering includes (6,2,0), (0,3,7) and (7,5,0), and the weight vectors of the convolution kernel before filtering include (2,5,0), (0,1,1) and (7,0,4). Through the filtering process described above, the at least one filtered image vector includes (6,2), (3,7) and (7), and the at least one filtered weight vector includes (2,5), (1,1) and (7). The dot product of (6,2) and (2,5) may be calculated as 6×2+2×5=22, the dot product of (3,7) and (1,1) as 3×1+7×1=10, and the dot product of (7) and (7) as 7×7=49; the three dot products may then be accumulated as 22+10+49=81, which is taken as the feature point of the scanning window. Verifying against the unfiltered calculation shows that after the weight vectors and feature vectors are filtered, only the amount of calculation in the convolution is reduced, and the accuracy of the convolution is not affected.
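A quick sanity check of this example (the helper name is ours) confirms that skipping the zero pairs leaves the convolution result unchanged:

```python
def sparse_dot(features, weights):
    """Dot product that skips any pair in which either element is 0;
    those products would be 0 and contribute nothing to the sum."""
    return sum(f * w for f, w in zip(features, weights) if f != 0 and w != 0)

feature_vectors = [(6, 2, 0), (0, 3, 7), (7, 5, 0)]
weight_vectors = [(2, 5, 0), (0, 1, 1), (7, 0, 4)]

# Filtered computation: 22 + 10 + 49
filtered_result = sum(sparse_dot(f, w)
                      for f, w in zip(feature_vectors, weight_vectors))
# Full computation over every pair, zeros included
full_result = sum(sum(a * b for a, b in zip(f, w))
                  for f, w in zip(feature_vectors, weight_vectors))
print(filtered_result, full_result)  # 81 81
```

Both paths give the same feature point; the filtered path simply performs fewer multiplications.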
In summary, through the steps 401 to 404, the computer device may obtain the feature point of the convolution processing when the scanning window is located in the current area. Then, the scanning window of the convolution kernel can slide one step length on the feature map so as to slide to the next region, and the computer equipment can perform convolution processing on the next region in a similar way to obtain feature points corresponding to the next region. After the scanning window finishes sliding on the feature map, the computer equipment can obtain a plurality of feature points, and the feature map can be obtained according to the plurality of feature points.
405. The computer equipment obtains a feature map of any layer output based on the feature points obtained by convolution.
After the feature points produced by the convolution processing of each region of the feature map are obtained, the obtained feature points can be formed into a feature map, and the feature map of the layer is output, so that the next layer receives the feature map and performs convolution processing on it in a similar way. This continues until the last layer has performed convolution processing on the received feature map and output its feature map.
Alternatively, after the feature map output by any layer is obtained, an activation operation may first be performed on the feature map, and the activated feature map is then input to the next layer. The activation operation refers to computing the feature map with an activation function so as to apply a nonlinear transformation to it. The activation function is typically the ReLU function, whose mathematical expression f(x) = max(0, x) converts all feature points smaller than 0 in the input feature map to 0, resulting in high sparsity of the feature map. In addition, the feature map may first be pooled and the pooled feature map input to the next layer; pooling refers to downsampling the feature map, for example in a max-pooling manner.
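Both operations can be sketched on nested lists (illustrative values; a 2×2 max-pooling window with stride 2 is one common choice, assumed here):

```python
def relu(fmap):
    """ReLU activation: f(x) = max(0, x), applied element-wise."""
    return [[max(0, x) for x in row] for row in fmap]

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2: keep the maximum of each block."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

print(relu([[-3, 2], [5, -1]]))       # [[0, 2], [5, 0]]
print(max_pool2x2([[1, 2, 3, 0],
                   [4, 0, 1, 2],
                   [0, 0, 5, 1],
                   [1, 2, 0, 3]]))    # [[4, 3], [2, 5]]
```

Note how ReLU increases sparsity by zeroing negative feature points, which in turn feeds the filtering mechanism of the following layer.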
The first point to be noted is that the present invention can be applied to various types of layers of the deep neural network, such as convolutional layers, fully connected layers and recurrent layers, and can operate online, without needing to balance the load in advance during training of the deep neural network. Meanwhile, the embodiment of the invention requires neither a sparse distribution of the feature elements in the feature map nor a sparse distribution of the weight matrix, and thus has high flexibility.
The second point to be noted is that, when a layer has a plurality of convolution kernels, the calculation order of the plurality of convolution kernels may be determined according to the ratio between the number of 0s in each convolution kernel and the number of weight elements. For example, the plurality of convolution kernels may be calculated in descending order of this ratio, or in ascending order. The ratio between the number of 0s and the number of weight elements in a convolution kernel may be defined as the weight sparsity of the convolution kernel; for example, the weight sparsity of the convolution kernel shown in FIG. 1 is 4/9 ≈ 44.4%.
Taking the calculation of a plurality of convolution kernels in descending order of weight sparsity as an example: in implementation, for any layer, the plurality of convolution kernels of the layer may be determined; for each convolution kernel, the number of 0s in the convolution kernel and the number of weight elements in the convolution kernel are obtained, and their ratio is calculated to obtain the weight sparsity of the convolution kernel. The plurality of convolution kernels are then arranged in descending order of weight sparsity, and each convolution kernel is used for calculation in turn according to this order.
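A sketch of this scheduling (the kernel values are hypothetical, chosen only to reproduce the 4/9 sparsity quoted above):

```python
def weight_sparsity(kernel):
    """Ratio of the number of 0s in the kernel to the number of weights."""
    flat = [w for row in kernel for w in row]
    return flat.count(0) / len(flat)

def order_by_sparsity(kernels):
    """Arrange kernels in descending order of weight sparsity so that the
    per-kernel amount of calculation changes gradually."""
    return sorted(kernels, key=weight_sparsity, reverse=True)

kernel = [[0, 5, 0],
          [0, 1, 1],
          [7, 0, 4]]                      # 4 zeros out of 9 weights
print(round(weight_sparsity(kernel), 3))  # 0.444
```

Sorting by sparsity makes the load curve across kernels monotone, which is what prevents the abrupt jumps in calculation discussed next.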
By determining the calculation order of the plurality of convolution kernels according to the order of weight sparsity, excessive jumps in the amount of calculation can be avoided, as described below:
The weight sparsity of a convolution kernel directly influences the amount of calculation in the convolution: the larger the weight sparsity of the kernel, the smaller the amount of calculation when convolving the feature map with that kernel. When the feature map is processed with a plurality of convolution kernels, if the weight sparsities of the kernels used in two adjacent convolutions differ too much, the amount of calculation changes abruptly. For example, if a convolution with a kernel of high weight sparsity is followed by a convolution with a kernel of low weight sparsity, the amount of calculation increases sharply. Such abrupt changes in the amount of calculation easily affect the running performance of the device and risk downtime.
In this embodiment, the calculation order of the plurality of convolution kernels is determined according to the order of weight sparsity. In the course of the multiple convolutions performed on the feature map with these kernels, the weight sparsity changes uniformly, so the amount of calculation in the multiple convolutions also changes uniformly, avoiding repeated jumps between high and low calculation loads and ensuring the running stability of the device.
In the method provided by the embodiment of the present invention, in the process of image recognition based on the deep neural network, the redundant first target elements in the feature vector and the redundant second target elements in the weight vector are filtered out, and convolution processing is performed on the filtered feature vector and the filtered weight vector. Invalid operations caused by the first target elements and the second target elements are thus avoided, the amount of calculation of the convolution processing is greatly reduced, image recognition is accelerated, and the efficiency of image recognition is improved.
In the embodiment of fig. 4 described above, the process of convolving the feature map may be performed by a PE (Processing Element, i.e., a processing unit). The processing unit is hardware with a calculation function and may comprise a plurality of circuit units such as an adder, a multiplier and buffers; the processing unit multiplies and accumulates the input feature vector and weight vector and outputs a feature point.
Specifically, referring to fig. 6, fig. 6 is a circuit diagram of a processing unit according to an embodiment of the present invention. The processing unit may include four buffers, one multiplier and one adder, where the four buffers are Reg1, Reg2, Reg3 and Reg4, respectively. Reg1 and Reg2 are electrically connected to the multiplier, the multiplier is electrically connected to Reg3, Reg3 is electrically connected to the adder, and the adder is electrically connected to Reg4.
Reg1 is used for caching the filtered feature vector, and Reg2 is used for caching the filtered weight vector. The multiplier obtains the product of the filtered feature vector and the filtered weight vector and outputs the product to Reg3. Reg3 buffers this product and outputs it to the adder. The adder accumulates the results output by Reg3 and outputs the sum to Reg4, and Reg4 caches the accumulated result and outputs the feature point.
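The multiply-accumulate data path of the processing unit can be sketched with the following toy behavioural model. This is only an illustration of the Reg1/Reg2/multiplier/Reg3/adder/Reg4 flow described above, not the actual circuit; the class and method names are hypothetical.

```python
class ProcessingElement:
    """Behavioural model of the PE in fig. 6: Reg1/Reg2 latch the filtered
    feature and weight vectors, the multiplier writes each product into Reg3,
    and the adder accumulates the products into Reg4, which outputs the
    feature point."""

    def __init__(self):
        self.reg4 = 0.0  # accumulator register

    def multiply_accumulate(self, features, weights):
        assert len(features) == len(weights)
        for f, w in zip(features, weights):  # operands supplied via Reg1/Reg2
            product = f * w                  # multiplier output, latched in Reg3
            self.reg4 += product             # adder output, latched in Reg4
        return self.reg4                     # the output feature point

pe = ProcessingElement()
point = pe.multiply_accumulate([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
# 1*4 + 2*5 + 3*6 = 32.0
```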
On the basis of the embodiment of fig. 4, the embodiment of the present invention further designs a method for performing convolution processing on the feature map by multiple processing units in parallel, and referring to fig. 7, the method may include:
701. when any layer of the deep neural network receives the feature map, the computer equipment divides the feature map to obtain a plurality of feature blocks.
In order to distribute the calculation load of the convolution processing on the feature map, the computer device may divide the feature map into a plurality of feature tiles, so that the processing tasks of different feature tiles can be allocated to different processing units. The plurality of processing units then operate in parallel, fully exploiting their performance advantages.
Specifically, the feature map may be divided according to the number of processing units, with the number of processing units taken as the number of feature tiles: the feature map is divided into as many feature tiles as there are processing units, so that each processing unit is responsible for performing convolution processing on one feature tile. Alternatively, the feature map may be divided uniformly into a plurality of equal-sized feature tiles, that is, the areas of the divided feature tiles may be the same. Illustratively, referring to fig. 8, assuming there are 4 processing units, the feature map may be equally divided into 4 feature tiles.
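The uniform division into equal-sized tiles can be sketched as follows, assuming a square feature map whose side divides evenly by the grid size. This is an illustrative sketch, not the patented implementation.

```python
import numpy as np

def split_feature_map(feature_map, grid):
    """Evenly divide a feature map into grid x grid equal-sized tiles,
    one tile per processing unit (assumes the map divides evenly)."""
    rows = np.split(feature_map, grid, axis=0)
    return [tile for row in rows for tile in np.split(row, grid, axis=1)]

fmap = np.arange(16).reshape(4, 4)
tiles = split_feature_map(fmap, 2)  # 4 tiles, each of shape (2, 2)
```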
Regarding the storage of the feature tiles, in one possible implementation a plurality of memories may be provided; after the feature tiles of the feature map are obtained by division, each feature tile is stored in a corresponding memory. Since different feature tiles can be stored in different memories, and a processing unit can read different feature tiles from different memories, a processing unit only needs to switch the memory from which it reads data in order to switch the feature tile it convolves, so the feature tile of the convolution processing can be switched flexibly, quickly and efficiently.
For example, 4 memories may be provided, and the feature map is divided into 4 feature tiles, each feature tile being stored in 1 memory. Referring to fig. 9, tile (1) is stored in memory No. 1, tile (2) in memory No. 2, tile (3) in memory No. 3, and tile (4) in memory No. 4. With this storage mode, suppose a processing unit is originally responsible for performing convolution processing on tile (1) and reads data from memory No. 1; when the processing unit is switched to be responsible for performing convolution processing on tile (2), it switches to reading data from memory No. 2.
Regarding the specific type of memory, dual-port memories may optionally be employed to store the feature tiles, each feature tile being stored in a corresponding dual-port memory. A dual-port memory is a memory having two independent read-write control circuits, which can provide the stored data to different objects at the same time. By storing the feature tiles in dual-port memories, a feature tile can be output in parallel, that is, provided to two processing units at the same time. This avoids the operation conflict that would otherwise arise when two processing units need to access the same memory during sparse scheduling, and improves the robustness of the balanced scheduling of the processing units.
Specifically, because the sparse distributions of different tiles differ, the amounts of calculation of the convolution processing on different tiles will differ after the redundant first target elements in the tile and the redundant second target elements in the convolution kernel are filtered out. By way of example, assume that in the feature map the feature elements of tile (1) are sparse and the feature elements of tile (2) are dense. More first target elements are filtered out of tile (1), fewer feature elements remain to be convolved, and the amount of calculation for convolving tile (1) is clearly smaller; fewer first target elements are filtered out of tile (2), more feature elements remain to be convolved, and the amount of calculation for convolving tile (2) is clearly larger.
Therefore, when the processing units use the same convolution kernel to perform convolution processing on different feature tiles, the different processing units will differ in processing speed because of their different amounts of calculation, so the times at which they complete the convolution processing form a certain order. For example, if the feature elements of tile (1) are sparse and those of tile (2) are dense, the processing unit responsible for tile (1) will complete the convolution processing first and the processing unit responsible for tile (2) will complete it later.
When different processing units complete the convolution processing at different times, if the processing unit that finishes first must wait for the slower processing units to finish before it can continue the convolution processing, more time is consumed and efficiency is lower.
In this embodiment, dual-port memories are used to store the feature tiles. A dual-port memory can provide a feature tile to the processing unit performing convolution processing with the next convolution kernel while still providing it to the processing unit performing convolution processing with the previous convolution kernel, so two processing units can read data from the dual-port memory at the same time and perform convolution processing on the same feature tile simultaneously with different convolution kernels. For example, assume processing unit 1 reads tile (1) from memory No. 1 and convolves it with convolution kernel 1, while processing unit 2 reads tile (2) from memory No. 2 and convolves it with convolution kernel 2. If processing unit 1 finishes while processing unit 2 has not yet finished, processing unit 1 does not need to wait for processing unit 2: it directly reads tile (2) from memory No. 2, and memory No. 2 can provide tile (2) to processing unit 1 and processing unit 2 simultaneously, so the two processing units do not conflict over reading the data.
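The conflict-free scenario above can be illustrated with a toy model of a dual-port memory serving two processing units in the same cycle. The class below is purely illustrative; the port names and interface are assumptions, not the patented circuit.

```python
class DualPortMemory:
    """Minimal model of a dual-port memory: two independent read ports can
    serve the stored feature tile to two processing units simultaneously."""

    def __init__(self, tile):
        self._tile = tile

    def read(self, port):
        assert port in ("A", "B"), "only two independent read ports exist"
        return self._tile

# Memory No. 2 holds tile (2). Processing unit 1 (done with tile (1), now
# applying its kernel to tile (2)) and processing unit 2 (still convolving
# tile (2) with its own kernel) read the same memory without conflict.
mem2 = DualPortMemory(tile=[[1, 0], [0, 2]])
tile_for_pe1 = mem2.read("A")
tile_for_pe2 = mem2.read("B")
```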
702. The computer device performs convolution processing on the plurality of feature blocks through the plurality of processing units.
The computer device may assign each feature tile to a processing unit, and each processing unit convolves its assigned feature tile by executing step 203 in the embodiment of fig. 4 described above. Specifically, for any processing unit and its corresponding feature tile, at least one feature vector of the feature tile located in the scanning window and at least one weight vector corresponding to the convolution kernel may be obtained; the elements that are 0 and at the same positions in the at least one feature vector and the at least one weight vector are filtered out; and the filtered at least one feature vector and the filtered at least one weight vector are input to the processing unit, which convolves them and outputs feature points. The feature points output by the plurality of processing units may then be combined to obtain a feature map.
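The filtering-then-convolving step can be sketched as follows: a position survives only if both the feature element and the weight element at that position are nonzero (their AND of nonzero flags is 1), and only surviving pairs are multiplied and accumulated. This is a minimal software sketch of the idea; the function name is hypothetical.

```python
def filter_and_convolve(feature_vec, weight_vec):
    """Drop every position where either element is 0 (the AND of the two
    nonzero flags is 0), then multiply-accumulate only the surviving pairs,
    so no zero-producing product is ever computed."""
    kept = [(f, w) for f, w in zip(feature_vec, weight_vec) if f != 0 and w != 0]
    filtered_features = [f for f, _ in kept]
    filtered_weights = [w for _, w in kept]
    feature_point = sum(f * w for f, w in kept)
    return feature_point, filtered_features, filtered_weights

point, ff, fw = filter_and_convolve([0, 3, 5, 0, 2], [4, 0, 2, 1, 3])
# Only positions 3 and 5 of the feature vector pair with nonzero weights:
# 5*2 + 2*3 = 16, with three invalid multiplications skipped
```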
For the specific process of the internal calculation of the processing unit, referring to fig. 6, the processing unit may use Reg1 to cache the filtered feature vector and Reg2 to cache the filtered weight vector, multiply the filtered feature vector and the filtered weight vector through the multiplier, accumulate the at least one product through the adder, and finally output the feature point. Optionally, the processing unit may further perform an activation operation, that is, compute the feature points with an activation function and form the activated feature points into an activated feature map.
In this embodiment, on the basis of performing convolution processing on the feature map by using a plurality of processing units together, a design for distributing feature blocks is also provided, so that load balancing of the plurality of processing units can be ensured. Specifically, the design includes the following (1) and (2):
(1) In the process of convolution processing by a plurality of processing units, when any processing unit finishes its processing first, the feature tile whose processing finished last in the previous round of convolution processing is allocated to that processing unit, and the processing unit performs convolution processing on that feature tile based on the next convolution kernel.
In each allocation of feature tiles, the principle may be followed that the processing unit that computed fastest is responsible next time for the feature tile with the densest data. Specifically, while the processing units perform convolution processing on their respectively allocated feature tiles based on a certain convolution kernel, whether each processing unit has completed processing its feature tile can be monitored. When any processing unit completes processing its feature tile first, the feature tile whose processing completed last in the previous round of convolution processing is allocated to that processing unit, and the processing unit performs convolution processing on that feature tile based on the next convolution kernel.
By this way of distributing feature tiles, the barrel effect in performance can be avoided:
the sparse distributions of the different feature tiles of the feature map may differ: some feature tiles are denser and some are sparser. After the filtering process, the amounts of calculation for feature tiles of different sparsity differ. For a feature tile of higher sparsity, a large number of feature elements are filtered out and the data amount is greatly reduced, so the amount of calculation for convolving that feature tile is clearly less than that for the other feature tiles.
If the same processing unit were fixed to perform convolution processing on such a feature tile for every convolution kernel, the amount of calculation borne by that processing unit would clearly be smaller than that borne by the other processing units. That processing unit would complete its convolution processing task ahead of the others and then sit idle waiting for the other processing units to finish before the next layer's convolution processing could proceed, producing a barrel effect and low calculation efficiency.
In this embodiment, when any processing unit finishes processing its feature tile first, this indicates that the feature elements of the feature tile allocated to it are the sparsest, that is, that feature tile had the most feature elements filtered out. The feature tile whose processing finished last in the previous round of convolution processing is then determined; that is the densest feature tile of the previous round. Allocating it to this processing unit ensures that the processing unit's amount of calculation in the next round of convolution processing is larger, so that the amount of calculation is balanced across the processing units as a whole and remains relatively uniform compared with the other processing units.
(2) In the process of convolution processing by a plurality of processing units, when any processing unit finishes its processing last, the feature tile whose processing finished first in the previous round of convolution processing is allocated to that processing unit, and the processing unit performs convolution processing on that feature tile based on the next convolution kernel.
In each allocation of feature tiles, the principle may be followed that the processing unit that computed slowest is responsible next time for the feature tile with the sparsest data. Specifically, while the processing units perform convolution processing on their respectively allocated feature tiles based on a certain convolution kernel, whether each processing unit has completed processing its feature tile can be monitored. When any processing unit completes processing its feature tile last, the feature tile whose processing completed first in the previous round of convolution processing is allocated to that processing unit, and the processing unit performs convolution processing on that feature tile based on the next convolution kernel.
By this way of allocating feature tiles, the overall calculation time of the convolution processing on the feature map can be greatly reduced. When any processing unit finishes processing its feature tile last, this indicates that the feature elements of the feature tile allocated to it are the densest, that is, that feature tile had the fewest feature elements filtered out. The feature tile whose processing finished first in the previous round of convolution processing is then determined; that is the sparsest feature tile of the previous round. Allocating it to this processing unit ensures that the processing unit's amount of calculation in the next round of convolution processing is smaller, so that the amount of calculation is balanced across the processing units as a whole and remains relatively uniform compared with the other processing units.
It should be noted that, in the process of convolution processing by a plurality of processing units, for the processing units that finish neither first nor last, their feature tiles may be exchanged, so as to ensure that the feature tile each processing unit convolves in each round differs from the feature tile it convolved in the previous round. Then, in the course of convolution processing based on a plurality of convolution kernels, each processing unit convolves the different feature tiles in turn, the total amount of calculation of each processing unit is kept consistent, and balanced scheduling is achieved.
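The reassignment policy described in (1) and (2) above can be sketched as a single rule: after one round, pair the fastest unit with the tile whose unit finished last (the densest tile), the slowest unit with the tile that finished first (the sparsest tile), and so on in between. The following sketch uses an assumed representation of the finish order and is not the patented scheduler.

```python
def reassign_tiles(finish_order):
    """Given the order in which processing units finished the last round,
    assign the fastest unit the last-finished (densest) tile and the
    slowest unit the first-finished (sparsest) tile for the next kernel.

    finish_order: list of (unit_id, tile_id), earliest finisher first.
    Returns a {unit_id: tile_id} mapping for the next round."""
    units = [u for u, _ in finish_order]
    tiles = [t for _, t in finish_order]
    # Reversing the tile list pairs the fastest unit with the densest tile.
    return dict(zip(units, reversed(tiles)))

# Last round: unit 0 finished first on tile 'A' (sparsest), unit 2 finished
# last on tile 'C' (densest); next round they swap workloads.
nxt = reassign_tiles([(0, "A"), (1, "B"), (2, "C")])
```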
Through this balanced scheduling mechanism, on the basis of greatly reducing the amount of calculation by eliminating zero-valued operations, the convolution processing durations of all the processing units are kept essentially equal and the situation where each processing unit must wait for the others is avoided, which improves operation efficiency and reduces operation latency. Meanwhile, the principle for scheduling the feature tiles is very easy to implement and needs no extra circuit logic; for each processing unit, only the feature tile allocated to it is switched, so the overhead and energy consumption are low, a nearly zero-cost scheduling switching strategy is achieved, and load imbalance after sparse operation of the deep neural network is avoided.
It should be noted that the image recognition method provided by the embodiment of the present invention may serve as a general architecture for deep neural networks, and a dedicated processor may be manufactured to execute the image recognition method provided by the embodiment of the present invention. Specifically, a plurality of processing units, shift registers, sub-memories and other hardware units may be integrated inside the processor, and the processor performs the above process of recognizing an image by driving the respective hardware units, for example, driving the processing units to perform the convolution processing. In conjunction with the fig. 4 embodiment and the fig. 7 embodiment described above, the logic structure within the processor may be as shown in fig. 9.
Fig. 10 is a schematic structural diagram of an image recognition device according to an embodiment of the present invention. Referring to fig. 10, the apparatus includes: an acquisition module 1001, a filtering module 1002, a convolution module 1003 and an output module 1004.
The obtaining module 1001 is configured to obtain, in an image recognition process, at least one feature vector of a feature map located in a scanning window based on a scanning window corresponding to a convolution kernel in a layer when any layer of a deep neural network receives the feature map, where the deep neural network is configured to perform image recognition according to the feature map of an input image;
The obtaining module 1001 is further configured to obtain at least one weight vector corresponding to the convolution kernel;
a filtering module 1002, configured to filter out a first target element in the at least one feature vector and a second target element in the at least one weight vector, where the result of an AND operation between the first target element and the second target element is 0, and the position of the first target element in the feature vector is the same as the position of the second target element in the weight vector;
a convolution module 1003, configured to continue convolution based on the filtered at least one feature vector and the filtered at least one weight vector;
and an output module 1004, configured to obtain a feature map output by any layer based on the feature points obtained by the convolution.
In one possible design, the filter module 1002 includes:
the AND-operation submodule is configured to perform, for any feature vector and the corresponding weight vector, an AND operation on the feature elements and weight elements at the same positions, to obtain the AND results of the feature elements and the weight elements;
and the filtering submodule is configured to filter the first target element out of the feature vector and the second target element out of the weight vector according to the AND results of the feature elements and the weight elements, to obtain a filtered feature vector and a filtered weight vector.
Wherein the AND result of the first target element and the second target element is 0.
In one possible design, the filtering submodule is further configured to, for any feature element in the feature vector and the weight element at the corresponding position in the weight vector, add a preset mark to the feature element and the weight element when the AND result of the feature element and the weight element is 1;
obtain, from the feature vector, the at least one feature element corresponding to the preset mark to form the filtered feature vector;
and obtain, from the weight vector, the at least one weight element corresponding to the preset mark to form the filtered weight vector.
In one possible design, the obtaining module 1001 includes:
the reading submodule is configured to read, for a designated area in the scanning window, the feature vector of the feature map in the designated area from the shift register, where the designated area is an overlapping area between the current area of the scanning window and the area before the window slid, and the feature vector in the designated area was cached in the shift register when the area before the window slid was convolved;
And the loading submodule is configured to load the feature vectors of the feature map in the remaining areas after the designated area in the scanning window, and cache the feature vectors in the remaining areas into the shift register.
In one possible design, the obtaining module 1001 includes:
the dividing sub-module is used for dividing the feature map to obtain a plurality of feature blocks when any layer of the deep neural network receives the feature map;
and the distribution submodule is used for respectively carrying out convolution processing on the plurality of characteristic blocks through the plurality of processing units.
In one possible design, the allocation submodule is configured to: in the process of convolution processing by a plurality of processing units, when any processing unit finishes its processing first, allocate the feature tile whose processing finished last in the previous round of convolution processing to that processing unit, the processing unit performing convolution processing on the feature tile based on the next convolution kernel; or, in the process of convolution processing by a plurality of processing units, when any processing unit finishes its processing last, allocate the feature tile whose processing finished first in the previous round of convolution processing to that processing unit, the processing unit performing convolution processing on the feature tile based on the next convolution kernel.
In one possible design, the apparatus further comprises: the memory module is used for storing each characteristic block into a corresponding dual-port memory, and the dual-port memory is used for simultaneously providing the characteristic blocks for at least two processing units.
In one possible design, any one layer of the deep neural network includes a plurality of convolution kernels, and the order of computation of the plurality of convolution kernels is determined according to the order of magnitude of the ratio between the number of 0's in the convolution kernels and the number of weight elements.
It should be noted that: in the image recognition apparatus provided in the above embodiment, only the division of the above functional modules is used for illustration when recognizing an image, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules, so as to perform all or part of the functions described above. In addition, the image recognition device and the image recognition method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present invention, where the computer device 1100 may have a relatively large difference due to different configurations or performances, and may include one or more processors (central processing units, CPU) 1101 and one or more memories 1102, where the memories 1102 store at least one instruction, and the at least one instruction is loaded and executed by the processors 1101 to implement the methods provided in the foregoing method embodiments. Of course, the computer device may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.
In an exemplary embodiment, a computer readable storage medium, such as a memory comprising instructions executable by a processor in a computer device to perform the image recognition method of the above embodiment is also provided. For example, the computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (15)

1. An image recognition method, the method comprising:
in the image recognition process, when any layer of the deep neural network receives a feature image, at least one feature vector of the feature image in a scanning window is obtained based on the scanning window corresponding to the intra-layer convolution kernel, and the deep neural network is used for carrying out image recognition according to the feature image of an input image;
Acquiring at least one weight vector corresponding to the convolution kernel;
filtering out a first target element in the at least one feature vector and a second target element in the at least one weight vector, wherein the result of an AND operation between the first target element and the second target element is 0, and the position of the first target element in the feature vector is the same as the position of the second target element in the weight vector;
continuing to convolve based on the filtered at least one feature vector and the filtered at least one weight vector;
and obtaining a feature map output by any layer based on the feature points obtained by convolution.
2. The method of claim 1, wherein said filtering a first target element in said at least one feature vector and a second target element in said at least one weight vector comprises:
for any feature vector and the corresponding weight vector, performing an AND operation on the feature elements and weight elements at the same positions in the two vectors, to obtain the AND results of the feature elements and the weight elements;
according to the AND results of the feature elements and the weight elements, filtering the first target element out of the feature vector and the second target element out of the weight vector, to obtain a filtered feature vector and a filtered weight vector;
Wherein the AND result of the first target element and the second target element is 0.
3. The method according to claim 2, wherein the filtering the first target element out of the feature vector and the second target element out of the weight vector according to the AND results of the feature elements and the weight elements to obtain a filtered feature vector and a filtered weight vector comprises:
for any feature element in the feature vector and the weight element at the corresponding position in the weight vector, when the AND result of the feature element and the weight element is 1, adding a preset mark to the feature element and the weight element;
obtaining, from the feature vector, at least one feature element corresponding to the preset mark to form the filtered feature vector;
and obtaining, from the weight vector, at least one weight element corresponding to the preset mark to form the filtered weight vector.
4. The method according to claim 1, claim 2 or claim 3, wherein the obtaining a plurality of feature vectors in the feature map located in a scan window based on the scan window corresponding to the intra-layer convolution kernel includes:
For a designated area in the scanning window, reading a feature vector of the feature map in the designated area from a shift register, wherein the designated area is an overlapped area between the current area of the scanning window and the area before window sliding, and the feature vector in the designated area is cached in the shift register when the area before window sliding is subjected to convolution processing;
and for the rest areas after the designated areas in the scanning window, loading the feature vectors of the feature map in the rest areas, and caching the feature vectors in the rest areas into the shift register.
5. A method according to claim 1, claim 2 or claim 3, wherein the method further comprises:
when any layer of the deep neural network receives the feature map, dividing the feature map to obtain a plurality of feature blocks;
and respectively carrying out convolution processing on the plurality of characteristic blocks through a plurality of processing units.
6. The method of claim 5, wherein the performing convolution processing on the plurality of feature blocks through the plurality of processing units, respectively, comprises:
in the process of performing convolution processing by the plurality of processing units, when any processing unit is the first to finish processing, allocating to the processing unit the feature block that was processed last during the previous round of convolution processing, and performing, by the processing unit, convolution processing on the feature block based on the next convolution kernel; or,
in the process of performing convolution processing by the plurality of processing units, when any processing unit is the last to finish processing, allocating to the processing unit the feature block that was processed first during the previous round of convolution processing, and performing, by the processing unit, convolution processing on the feature block based on the next convolution kernel.
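The allocation rule above can be illustrated with a small sketch (the function name and the unit/block labels are hypothetical). If per-block processing times are roughly stable across kernels, pairing the first-finishing unit with the last-finished block of the previous round tends to balance the next round, which is the load-balancing intent one can read into the claim:

```python
def next_pass_assignment(finish_order):
    """Assign feature blocks for the next kernel pass.

    `finish_order` lists (unit, block) pairs in completion order from
    the previous pass. The unit that finished first receives the block
    that finished last, and so on, so slow blocks land on fast units.
    """
    units = [u for u, _ in finish_order]             # first finisher first
    blocks = [b for _, b in reversed(finish_order)]  # last-finished block first
    return list(zip(units, blocks))
```

For example, if unit `u0` finished first on block `b2` and unit `u2` finished last on block `b1`, then in the next pass `u0` is given `b1` and `u2` is given `b2`.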
7. The method of claim 5 or 6, wherein after dividing the feature map to obtain the plurality of feature blocks, the method further comprises:
storing each feature block into a corresponding dual-port memory, the dual-port memory being used for providing the feature block to at least two processing units simultaneously.
8. The method according to claim 1, 2 or 3, wherein any layer of the deep neural network comprises a plurality of convolution kernels, and the computation order of the plurality of convolution kernels is determined according to the magnitude of the ratio, for each convolution kernel, between the number of zeros in the convolution kernel and the number of weight elements in the convolution kernel.
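The kernel-ordering rule above can be sketched as follows (illustrative only; the claim fixes the order by the magnitude of the zero ratio but not its direction, so the descending sort here is one plausible reading, with sparser kernels scheduled first):

```python
def order_kernels_by_sparsity(kernels):
    """Sort convolution kernels by their fraction of zero weights.

    The zero ratio of a kernel is the number of zeros divided by the
    total number of weight elements; kernels with a higher ratio come
    first under this (assumed) descending ordering.
    """
    def zero_ratio(kernel):
        flat = [w for row in kernel for w in row]
        return flat.count(0) / len(flat)
    return sorted(kernels, key=zero_ratio, reverse=True)
```

A mostly-zero kernel (ratio 0.75) is thus scheduled before a fully dense one (ratio 0).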
9. An image recognition apparatus, the apparatus comprising:
an acquisition module, configured to, when any layer of a deep neural network receives a feature map during image recognition, obtain at least one feature vector of the feature map located in a scan window based on the scan window corresponding to the intra-layer convolution kernel, the deep neural network being used to perform image recognition according to the feature map of an input image;
the acquisition module being further configured to obtain at least one weight vector corresponding to the convolution kernel;
a filtering module, configured to filter out a first target element in the at least one feature vector and a second target element in the at least one weight vector, wherein the result of an AND operation between the first target element and the second target element is 0, and the position of the first target element in the feature vector is the same as the position of the second target element in the weight vector;
a convolution module, configured to continue the convolution processing based on the filtered at least one feature vector and the filtered at least one weight vector;
and an output module, configured to obtain the feature map output by the layer based on the feature points obtained by the convolution.
10. The apparatus of claim 9, wherein the filtering module comprises:
an AND-operation sub-module, configured to perform an AND operation on the feature element and the weight element at the same position in any feature vector and the corresponding weight vector, to obtain the AND result of the feature element and the weight element;
a filtering sub-module, configured to filter the first target element out of the feature vector and the second target element out of the weight vector according to the AND result of the feature element and the weight element, to obtain a filtered feature vector and a filtered weight vector;
wherein the AND result of the first target element and the second target element is 0.
11. The apparatus of claim 9 or 10, wherein the acquisition module comprises:
a reading sub-module, configured to, for a designated area in the scan window, read the feature vector of the feature map in the designated area from a shift register, wherein the designated area is the overlapping area between the current area of the scan window and the area of the scan window before sliding, and the feature vector in the designated area was cached in the shift register when convolution processing was performed on the area before sliding;
and a loading sub-module, configured to load the feature vectors of the feature map in the remaining area of the scan window outside the designated area, and to cache the feature vectors of the remaining area into the shift register.
12. The apparatus according to claim 9, 10 or 11, wherein the acquisition module comprises:
a dividing sub-module, configured to divide the feature map to obtain a plurality of feature blocks when any layer of the deep neural network receives the feature map;
and an allocation sub-module, configured to perform convolution processing on the plurality of feature blocks through a plurality of processing units, respectively.
13. The apparatus according to claim 12, wherein the allocation sub-module is configured to: in the process of performing convolution processing by the plurality of processing units, when any processing unit is the first to finish processing, allocate to the processing unit the feature block that was processed last during the previous round of convolution processing, the processing unit performing convolution processing on the feature block based on the next convolution kernel; or, when any processing unit is the last to finish processing in the process of performing convolution processing by the plurality of processing units, allocate to the processing unit the feature block that was processed first during the previous round of convolution processing, the processing unit performing convolution processing on the feature block based on the next convolution kernel.
14. A computer device comprising a processor and a memory having stored therein at least one instruction that is loaded and executed by the processor to implement the operations performed by the image recognition method of any one of claims 1 to 8.
15. A computer-readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the operations performed by the image recognition method of any one of claims 1 to 8.
CN201810331869.2A 2018-04-13 2018-04-13 Image recognition method and device Active CN110163042B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810331869.2A CN110163042B (en) 2018-04-13 2018-04-13 Image recognition method and device


Publications (2)

Publication Number Publication Date
CN110163042A (en) 2019-08-23
CN110163042B (en) 2023-05-30

Family

ID=67644747

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810331869.2A Active CN110163042B (en) 2018-04-13 2018-04-13 Image recognition method and device

Country Status (1)

Country Link
CN (1) CN110163042B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826694B (en) * 2019-10-30 2021-06-11 瀚博半导体(上海)有限公司 Image processing method and device based on convolutional neural network
US11755683B2 (en) 2019-12-23 2023-09-12 Western Digital Technologies, Inc. Flexible accelerator for sparse tensors (FAST) in machine learning
CN111178508B (en) * 2019-12-27 2024-04-05 珠海亿智电子科技有限公司 Computing device and method for executing full connection layer in convolutional neural network
US11462003B2 (en) 2020-03-25 2022-10-04 Western Digital Technologies, Inc. Flexible accelerator for sparse tensors in convolutional neural networks
US11797830B2 (en) 2020-03-25 2023-10-24 Western Digital Technologies, Inc. Flexible accelerator for sparse tensors in convolutional neural networks
CN113554145B (en) * 2020-04-26 2024-03-29 伊姆西Ip控股有限责任公司 Method, electronic device and computer program product for determining output of neural network
CN111915001B (en) * 2020-08-18 2024-04-12 腾讯科技(深圳)有限公司 Convolution calculation engine, artificial intelligent chip and data processing method

Citations (5)

Publication number Priority date Publication date Assignee Title
CN105022982A (en) * 2014-04-22 2015-11-04 北京邮电大学 Hand motion identifying method and apparatus
CN106599883A (en) * 2017-03-08 2017-04-26 王华锋 Face recognition method capable of extracting multi-level image semantics based on CNN (convolutional neural network)
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing sparse convolution neutral net accelerator
WO2018003212A1 (en) * 2016-06-30 2018-01-04 クラリオン株式会社 Object detection device and object detection method
CN107729819A (en) * 2017-09-22 2018-02-23 华中科技大学 A kind of face mask method based on sparse full convolutional neural networks

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8472718B2 (en) * 2011-04-27 2013-06-25 Sony Corporation Superpixel segmentation methods and systems
US10891538B2 (en) * 2016-08-11 2021-01-12 Nvidia Corporation Sparse convolutional neural network accelerator


Also Published As

Publication number Publication date
CN110163042A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
CN110163042B (en) Image recognition method and device
US11797853B2 (en) Processing for multiple input data sets
US11720523B2 (en) Performing concurrent operations in a processing element
CN110050267B (en) System and method for data management
US10445638B1 (en) Restructuring a multi-dimensional array
CN109871936B (en) Method and apparatus for processing convolution operations in a neural network
CN108197532A (en) The method, apparatus and computer installation of recognition of face
CN108491827B (en) Vehicle detection method and device and storage medium
US20210182025A1 (en) Accelerating 2d convolutional layer mapping on a dot product architecture
CN110473137A (en) Image processing method and device
CN109740508B (en) Image processing method based on neural network system and neural network system
CN111984400A (en) Memory allocation method and device of neural network
CN114612791B (en) Target detection method and device based on improved attention mechanism
CN111709415B (en) Target detection method, device, computer equipment and storage medium
CN114626503A (en) Model training method, target detection method, device, electronic device and medium
CN112734827A (en) Target detection method and device, electronic equipment and storage medium
JP2022137247A (en) Processing for a plurality of input data sets
US11537090B2 (en) Model calculating unit and control unit for selectively calculating an RBF model, a gaussian process model and an MLP model
Lindoso et al. High performance FPGA-based image correlation
CN112749576B (en) Image recognition method and device, computing equipment and computer storage medium
US11868875B1 (en) Data selection circuit
CN111767243A (en) Data processing method, related device and computer readable medium
US11636569B1 (en) Matrix transpose hardware acceleration
CN115620017A (en) Image feature extraction method, device, equipment and storage medium
CN114724103A (en) Neural network processing system, instruction generation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant