CN111949405A - Resource scheduling method, hardware accelerator and electronic equipment

Info

Publication number
CN111949405A
Authority
CN
China
Prior art keywords
layer
memristor
hardware
hardware accelerator
model
Prior art date
Legal status
Pending
Application number
CN202010812916.2A
Other languages
Chinese (zh)
Inventor
刘君 (Liu Jun)
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010812916.2A
Publication of CN111949405A

Classifications

    • G06F 9/5044 - Allocation of resources (e.g. of the CPU) to service a request, the resource being a machine such as CPUs, servers or terminals, considering hardware capabilities
    • G06F 9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means


Abstract

The application discloses a resource scheduling method, a hardware accelerator, and an electronic device. The resource scheduling method is applied to a hardware accelerator whose hardware architecture includes at least one memristor slice, each of the at least one memristor slice including at least one memristor array. The resource scheduling method includes the following steps: determining model information of a set calculation model, the set calculation model including at least one computation layer; determining, based on the model information and the hardware architecture, configuration information corresponding to the hardware accelerator when executing the operation of each computation layer in the at least one computation layer, the configuration information including at least a working mode of each memristor slice in the at least one memristor slice, where the working mode characterizes the number of memristor arrays in the memristor slice configured for each hardware resource; and, when executing the operation of each computation layer in the at least one computation layer, performing hardware resource scheduling on the hardware accelerator according to the corresponding configuration information.

Description

Resource scheduling method, hardware accelerator and electronic equipment
Technical Field
The present application relates to the field of communications, and in particular, to a resource scheduling method, a hardware accelerator, and an electronic device.
Background
In the related art, an accelerator based on memristors (ReRAM) is generally used to perform the computation tasks of a neural network. Because different neural networks differ in computation amount and storage amount, when the hardware resources in the accelerator are scheduled to execute the related computation tasks, the hardware resources of the accelerator may not match the actual resource requirements, so that hardware resources are wasted or cannot meet the requirements of the computation tasks.
Disclosure of Invention
In view of this, embodiments of the present application provide a resource scheduling method, a hardware accelerator, and an electronic device, so as to solve the technical problem in the related art that, when a memristor-based accelerator executes a computation task of a neural network, the hardware resources of the accelerator do not match the actual resource requirements of the computation task.
To this end, the technical solutions of the present application are implemented as follows:
an embodiment of the present application provides a resource scheduling method, which is applied to a hardware accelerator; a hardware architecture of the hardware accelerator includes at least one memristor slice; each memristor slice of the at least one memristor slice includes at least one memristor array;
the resource scheduling method comprises the following steps:
determining model information of a set calculation model; the set calculation model comprises at least one computation layer;
determining, based on the model information and the hardware architecture, configuration information corresponding to the hardware accelerator when performing the operation of each of the at least one computation layer; the configuration information includes at least a working mode of each memristor slice in the at least one memristor slice; the working mode characterizes the number of memristor arrays in the memristor slice configured for each hardware resource; the hardware resources include at least one of: computing resources, storage resources, and cache resources;
and when the operation of each computation layer in the at least one computation layer is executed, performing hardware resource scheduling on the hardware accelerator according to the corresponding configuration information.
In the above solution, the model information includes a topology and a weight parameter of each of the at least one computation layer;
when determining the configuration information corresponding to the hardware accelerator when executing the operation of each computation layer of the at least one computation layer, the resource scheduling method includes:
determining at least one type of characteristic parameters corresponding to the computing layer based on the topological structure and the weight parameters related to the computing layer; each type of characteristic parameter in the at least one type of characteristic parameter correspondingly represents the requirement of the computing layer on the hardware resource of the corresponding type;
determining configuration information corresponding to a computing layer according to each type of characteristic parameters in the at least one type of characteristic parameters and the hardware architecture; wherein the configuration information at least comprises:
a first number; the first number characterizes the number of memristor arrays in the memristor slice configured as computing resources;
a second number; the second number characterizes the number of memristor arrays in the memristor slice configured as storage resources;
a third number; the third number characterizes the number of memristor arrays in the memristor slice configured as cache resources.
In the above scheme, the set calculation model is a neural network model; when determining the configuration information corresponding to the computing layer according to each type of feature parameter in the at least one type of feature parameter and the hardware architecture, the resource scheduling method includes:
corresponding to a convolutional layer or a fully-connected layer of the neural network model, determining the first number based on filter-related information and size information of the memristor arrays in the memristor slice; the filter-related information includes the number of filters and the size information of the filters.
In the above scheme, the model information includes a topology structure, a weight parameter, and input data of each of the at least one computation layer; when determining the configuration information corresponding to the hardware accelerator when executing the operation of each computation layer of the at least one computation layer, the resource scheduling method further includes:
determining the operation logic of the computing layer based on the model information related to the computing layer;
determining a corresponding data mapping mode of the hardware accelerator when executing the operation of the computation layer of the set calculation model, based on the operation logic of the computation layer, the hardware architecture, and the working mode of each memristor slice in the at least one memristor slice corresponding to the computation layer; wherein,
the data mapping mode characterizes the conversion logic between the input data and the output data of the computation layer.
In the above scheme, each memristor slice of the at least one memristor slice further comprises a shared unit library electrically connected with all memristor arrays in the memristor slice; the shared unit library comprises at least one functional unit; the at least one functional unit is commonly used by all memristor arrays in the memristor slice;
when determining the configuration information corresponding to the hardware accelerator when executing the operation of each computation layer of the at least one computation layer, the resource scheduling method further includes:
and determining functional units which need to be called when the calculation of the calculation layer is executed from the shared unit library based on the model information related to the calculation layer.
In the above scheme, the set calculation model is a neural network model, and the neural network model includes a convolutional layer, a pooling layer, and a fully-connected layer;
the method for determining the functional units needing to be called when the calculation of the calculation layer is executed from the shared unit library based on the model information related to the calculation layer comprises the following steps:
corresponding to the convolutional layer or the fully-connected layer of the set calculation model, the functional units which are determined from the shared unit library and need to be called when the computation of the convolutional layer or the fully-connected layer is executed at least comprise: a digital-to-analog converter, an analog-to-digital converter, a shift-and-add unit, and an activation unit;
corresponding to the pooling layer of the set calculation model, the functional units which are determined from the shared unit library and need to be called when the operation of the pooling layer is executed comprise: a maximum pooling unit.
The embodiment of the application also provides a hardware accelerator, wherein the hardware architecture of the hardware accelerator comprises at least one memristor slice; each memristor slice of the at least one memristor slice includes at least one memristor array;
when the hardware accelerator operates the set calculation model, the hardware accelerator is used for:
determining model information for setting a calculation model; the set computational model comprises at least one computational layer;
determining, based on the model information and the hardware architecture, configuration information corresponding to the hardware accelerator when performing the operation of each of the at least one computation layer; the configuration information includes at least a working mode of each memristor slice in the at least one memristor slice; the working mode characterizes the number of memristor arrays in the memristor slice configured for each hardware resource; the hardware resources include at least one of: computing resources, storage resources, and cache resources;
and when the operation of each computation layer in the at least one computation layer is executed, performing hardware resource scheduling on the hardware accelerator according to the corresponding configuration information.
In the above scheme, each memristor slice of the at least one memristor slice further comprises a shared unit library electrically connected with all memristor arrays in the memristor slice; the functional units in the shared unit library are commonly used by all memristor arrays in the memristor slice;
when the hardware accelerator operates the set calculation model, the hardware accelerator is further configured to:
determining a functional unit corresponding to the computing layer from the shared unit library based on the model information related to the computing layer;
determining configuration information corresponding to the computation layer based on the functional unit corresponding to the computation layer and the hardware architecture; the hardware resources further comprise functional units, and the working mode in the configuration information further characterizes the number of each functional unit correspondingly configured in the memristor slice.
An embodiment of the present application further provides an electronic device, including: a hardware accelerator and a memory for storing a computer program capable of running on the hardware accelerator,
wherein the hardware accelerator is configured to implement any one of the above resource scheduling methods when running the computer program.
The embodiment of the present application further provides a storage medium, on which a computer program is stored, where the computer program, when executed by a hardware accelerator, implements any of the above resource scheduling methods.
According to the embodiment of the application, the hardware accelerator can determine the resource requirement of each computing layer of the set computing model based on the model information of the set computing model, and determine the corresponding configuration information of the hardware accelerator when executing the operation of each computing layer of at least one computing layer based on the hardware architecture of the hardware accelerator and the resource requirement of each computing layer. The hardware accelerator can dynamically schedule hardware resources according to the actual resource requirements of the set calculation model, different hardware resources can be scheduled aiming at different calculation layers, the hardware resource scheduling mode is more flexible, and the matching degree between the hardware resources of the hardware accelerator and the resource requirements of the set calculation model can be improved.
Drawings
FIG. 1 is a hardware architecture diagram of a memristor-based hardware accelerator provided in the related art;
FIG. 2 is a hardware architecture diagram of a memristor-based hardware accelerator provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart illustrating an implementation process of a resource scheduling method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an implementation flow of determining configuration information in a resource scheduling method according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of another implementation process for determining configuration information in a resource scheduling method according to an embodiment of the present application;
FIG. 6 is a schematic flow chart illustrating an implementation of a hardware accelerator processing a neural network model according to an embodiment of the present application;
FIG. 7 is a schematic processing flow diagram of a neural network model provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of a hardware component structure of an electronic device according to an embodiment of the present application.
Detailed Description
Before the embodiments of the present application are introduced, the architecture of a hardware accelerator based on memristors in the related art is introduced.
Fig. 1 shows a hardware architecture diagram of a memristor-based hardware accelerator provided in the related art. As shown in fig. 1, a memristor-based hardware accelerator includes at least one memristor slice (Tile). Each memristor slice 1 of the at least one memristor slice includes one memristor array 11 and a corresponding at least one functional unit.
Here, each memristor array in the memristor slice is electrically connected with corresponding functional units. The functional units in fig. 1 electrically connected with the memristor array 11 include: at least one Digital-to-Analog Converter (DAC) 12, at least one Analog-to-Digital Converter (ADC) 13, at least one Sample-and-Hold (S+H, Sample & Hold) unit 14, and at least one Shift-and-Add (S+A, Shift & Add) unit 15. The memristor array 11 adopts a transistor-free crossbar structure in which Word Lines (WL) and Bit Lines (BL) cross each other, with a memristor cell arranged between each connection point of the word lines and the bit lines. A memristor cell can be regarded as a variable resistor and is the basic cell of the memristor array.
Memristor cells between intersections of word lines and bit lines serve as basic cells of memristor arrays, one memristor array including a plurality of memristor cells. The memristor unit has the advantages of nonvolatility, high integration density, high speed, low power consumption and the like, and when the memristor unit is used as a memory, the data read-write speed can be improved, and the data read-write delay can be reduced.
In addition, based on Ohm's law and Kirchhoff's law, the memristor array shown in fig. 1 may also implement vector-matrix multiplication:
the weight matrix Wij is mapped onto a 4 × 4 memristor array, the input Vi is applied to each row (WL) as a voltage, and, according to Kirchhoff's law, the analog currents on each column (BL) are summed to obtain the output Ij, i.e. the result of multiplying the input vector by the weight matrix, with the following expression:
Ij = Σi Vi × Wij
the calculation of the vector multiplication matrix is the most important operation In the neural network, and since the memristor array can be used for both storage and calculation, a hardware accelerator based on the memristor can implement a PIM (Processing In Memory) integration, and the hardware accelerator based on the memristor can be used for performing the related operation of the neural network.
However, in the related art, the memristor array and the corresponding functional units in each memristor slice of the memristor-based hardware accelerator are bound together, so the internal structure of the memristor slice is fixed for every layer of the neural network. When such a hardware accelerator executes the computation tasks of a neural network, it uses a fixed hardware scheduling mode; because the computation amount and the storage amount of different neural networks are different, the related art has the technical problem that the hardware resources provided by the hardware accelerator do not match the actual resource requirements of the computation task.
In order to solve the above problem, an embodiment of the present application provides a hardware architecture of a memristor-based hardware accelerator. The memristor arrays and the functional units in a memristor slice can be scheduled independently, and all the functional units in the shared unit library can be commonly used by all the memristor arrays in the memristor slice. When the memristor-based accelerator executes the operations of a set calculation model, the hardware accelerator can dynamically schedule hardware resources according to the actual resource requirements of the set calculation model, so the hardware resource scheduling mode is more flexible and the matching degree between the hardware resources of the accelerator and the resource requirements of the set calculation model can be improved.
Part of memristor arrays in the same memristor slice can be configured as computing resources, and the rest part of memristor arrays can be configured as storage resources or cache resources, so that when relevant operations of a computing layer are executed, data operations and data storage can be performed in the same memristor slice, data transmission can be reduced, and power consumption can be reduced.
A hardware architecture schematic diagram of a hardware accelerator provided by the present application is shown in fig. 2, where the hardware accelerator includes at least one memristor slice 2, and each memristor slice includes at least one memristor array 21; each memristor slice may also include a shared unit library electrically connected with all memristor arrays in the memristor slice, and the functional units in the shared unit library are commonly used by all memristor arrays in the memristor slice. The shared unit library includes, but is not limited to, at least one of the following functional units:
DAC, ADC, S + A, S + H, Max Pool Unit (MPU), and Sigmoid Unit (SU).
The technical solution of the present application is further described in detail with reference to the drawings and specific embodiments of the specification.
Fig. 3 shows an implementation flow diagram of a resource scheduling method provided in an embodiment of the present application. In the embodiment of the present application, the execution subject of the resource scheduling method is a hardware accelerator. A hardware architecture of the hardware accelerator includes at least one memristor slice; each memristor slice of the at least one memristor slice includes at least one memristor array.
Referring to fig. 3, a resource scheduling method provided in an embodiment of the present application includes:
s101: determining model information for setting a calculation model; the set computational model includes at least one computational layer.
The hardware accelerator may analyze the loaded configuration calculation model file to obtain an analysis result under the condition that the configuration calculation model file is loaded, and determine model information of the configuration calculation model based on the analysis result.
Here, the setting calculation model file includes model information of the setting calculation model. The set calculation model includes, but is not limited to, a neural network model. Each layer included in the neural network model is a calculation layer.
The model information may include, among other things, topology and weight parameters that set each computational layer of the computational model. The topology of each computing layer includes an intra-layer structure and an inter-layer structure of the computing layer. The intra-layer structure represents a topological structure inside the computation layer, and the inter-layer structure represents a topological structure between the computation layer and an adjacent computation layer.
S102: determining, based on the model information and the hardware architecture, configuration information corresponding to the hardware accelerator when performing the operation of each of the at least one computation layer; the configuration information includes at least a working mode of each memristor slice in the at least one memristor slice; the working mode characterizes the number of memristor arrays in the memristor slice configured for each hardware resource; the hardware resources include at least one of: computing resources, storage resources, and cache resources.
The hardware accelerator can determine resource requirements of the hardware accelerator when executing the operation of each of the at least one computation layer based on the topology structure and the weight parameter of each of the at least one computation layer; and determining corresponding configuration information of the hardware accelerator when executing the operation of the corresponding computing layer based on the hardware resources represented by the hardware architecture of the hardware accelerator and the determined resource requirements corresponding to each computing layer.
The configuration information includes information about hardware resources configured for the compute layer. The related information at least includes the number of each hardware resource configured for the computing layer, and may further include address information, data bit width, and the like. The address information represents a write address or a read address, and the data bit width represents the data width which can be transmitted at one time. The memristor array configured as a compute resource is in a compute mode, the memristor array configured as a storage resource is in a storage mode, and the memristor array configured as a cache resource is in a cache mode.
Here, memristor arrays configured as computational resources are used to perform computational tasks of the computational layer, e.g., to perform vector correlation operations; a memristor array configured as a storage resource for storing input data and output data of a compute layer; the memristor array configured as a cache resource is used for caching an intermediate calculation result obtained by the calculation layer when the related calculation task is executed. The intermediate calculation result refers to a calculation result obtained by the calculation layer before outputting corresponding output data by performing corresponding operation based on the input data.
It should be noted that the configuration information corresponding to different computing layers may be the same or different.
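For concreteness, the per-slice working mode described in S102 could be represented by a small record such as the hypothetical Python sketch below; the class and field names are illustrative assumptions, not terms defined by the patent:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class SliceWorkingMode:
    """Hypothetical per-memristor-slice working mode for one computation layer."""
    compute_arrays: int = 0   # memristor arrays configured as computing resources
    storage_arrays: int = 0   # memristor arrays configured as storage resources
    cache_arrays: int = 0     # memristor arrays configured as cache resources
    # Optional per-resource details such as address information or data bit width.
    extra: Dict[str, int] = field(default_factory=dict)

@dataclass
class LayerConfiguration:
    """Configuration information for one computation layer across all slices."""
    slice_modes: Dict[int, SliceWorkingMode] = field(default_factory=dict)

# Example: layer 0 uses slice 0 with 4 compute arrays, 2 storage arrays, 1 cache array.
cfg = LayerConfiguration(slice_modes={0: SliceWorkingMode(4, 2, 1, {"data_bit_width": 8})})
print(cfg)
```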
S103: and when the operation of each computation layer in the at least one computation layer is executed, performing hardware resource scheduling on the hardware accelerator according to the corresponding configuration information.
Here, in the case where the set calculation model includes three computation layers, when the hardware accelerator executes the operation of the first computation layer, it schedules the hardware resources in the hardware accelerator based on the configuration information of the first computation layer and executes the operation of the first computation layer using the scheduled hardware resources. After the operation of the first computation layer is executed, the hardware resources in the hardware accelerator are scheduled according to the configuration information of the second computation layer adjacent to the first computation layer, so that the operation of the second computation layer is executed using the scheduled hardware resources. After the operation of the second computation layer is executed, the hardware resources in the hardware accelerator are scheduled according to the configuration information of the third computation layer adjacent to the second computation layer, so that the operation of the third computation layer is executed using the scheduled hardware resources.
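The layer-by-layer re-scheduling described above can be pictured with the following hypothetical loop; schedule_and_run and configure_slices are stand-ins for the hardware-specific steps, not functions defined by the patent:

```python
def schedule_and_run(layer_configs, layer_ops, input_data, configure_slices):
    """Run each computation layer after re-scheduling hardware per its configuration.

    layer_configs:    per-layer configuration information (e.g. working modes).
    layer_ops:        one callable per computation layer.
    configure_slices: callable that applies one layer's configuration to the slices
                      (a stand-in for the hardware-specific scheduling step).
    """
    data = input_data
    for cfg, op in zip(layer_configs, layer_ops):
        configure_slices(cfg)   # re-schedule hardware resources for this layer
        data = op(data)         # execute the layer's operation on those resources
    return data

# Toy usage with three layers, mirroring the three-layer example above.
if __name__ == "__main__":
    configs = ["cfg_layer1", "cfg_layer2", "cfg_layer3"]
    ops = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    out = schedule_and_run(configs, ops, 10, configure_slices=lambda c: print("apply", c))
    print(out)
```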
According to the technical scheme provided by this embodiment, the hardware accelerator may determine, based on the model information of the set calculation model, the resource requirement of each calculation layer of the set calculation model, and determine, based on the hardware architecture of the hardware accelerator and the resource requirement of each calculation layer, the configuration information corresponding to the hardware accelerator when executing the operation of each calculation layer of the at least one calculation layer. The hardware accelerator can dynamically schedule hardware resources according to the actual resource requirements of the set calculation model, different hardware resources can be scheduled aiming at different calculation layers, the hardware resource scheduling mode is more flexible, and the matching degree between the hardware resources of the hardware accelerator and the resource requirements of the set calculation model can be improved.
In an embodiment, the hardware accelerator may compile model information for setting the computation model, and determine intermediate information of a corresponding computation layer based on a topology, weight parameters, and input data of each of at least one computation layer included in the model information. Wherein the intermediate information may be an Intermediate Representation (IR) of the computation graph. The computation graph characterizes the operational logic of the computation layer. The intermediate representation is an abstract machine language that can represent the operation of the target machine without much involvement of machine-dependent details and independent of details of the source language.
Under the condition that the intermediate information of the computing layers is determined, the hardware accelerator can determine the resource requirement of the hardware accelerator when executing the operation of each computing layer in at least one computing layer based on the intermediate information of the computing layers; and determining corresponding configuration information of the hardware accelerator when executing the operation of the corresponding computing layer based on the hardware resources represented by the hardware architecture of the hardware accelerator and the determined resource requirements corresponding to each computing layer.
Here, the hardware accelerator may optimize the operation logic of the computation layer based on the intermediate information of the computation layer, generate a scheduling instruction of the computation layer based on the operation logic optimized by the computation layer and the hardware resources configured to the computation layer, and schedule the corresponding hardware resources to perform the operation of the computation layer based on the scheduling instruction.
According to the technical scheme provided by the embodiment, the hardware accelerator determines the intermediate information of the calculation layer based on the model information of the set calculation model, so that the calculation logic is optimized based on the intermediate information, and the calculation efficiency can be improved.
As another embodiment of the present application, the model information includes a topology and weight parameters of each of the at least one computation layer; fig. 4 is a schematic diagram illustrating an implementation flow of determining configuration information in a resource scheduling method according to an embodiment of the present application.
Referring to fig. 4, when determining configuration information corresponding to the hardware accelerator when performing an operation of each of the at least one computation layer, the resource scheduling method includes:
s201: determining at least one type of characteristic parameters corresponding to the computing layer based on the topological structure and the weight parameters related to the computing layer; each type of characteristic parameter in the at least one type of characteristic parameter correspondingly represents the requirement of the computing layer on the hardware resource of the corresponding type;
s202: determining configuration information corresponding to a computing layer according to each type of characteristic parameters in the at least one type of characteristic parameters and the hardware architecture; wherein the content of the first and second substances,
the configuration information at least includes:
a first number; the first number characterizes the number of memristor arrays in the memristor slice configured as computing resources;
a second number; the second number characterizes the number of memristor arrays in the memristor slice configured as storage resources;
a third number; the third number characterizes the number of memristor arrays in the memristor slice configured as cache resources.
Here, the hardware accelerator may determine the at least one type of characteristic parameter corresponding to the computation layer based on the computational complexity characterized by the topology and by the weight parameters.
The corresponding categories of feature parameters may include a compute category, a store category, and a cache category. The characteristic parameters of the computing category characterize the demand for computing resources, the characteristic parameters of the storage category characterize the demand for storage resources, and the characteristic parameters of the cache category characterize the demand for cache resources.
In a case where the corresponding category of the feature parameter includes a calculation category, the configuration information includes a first number; the configuration information includes a second number in a case where the corresponding category of the characteristic parameter includes a storage category; the configuration information includes a third number in a case where the corresponding category of the characteristic parameter includes a cache category. In a case where the corresponding category of the feature parameter includes a calculation category, a storage category, and a cache category, the configuration information includes a first number, a second number, and a third number.
According to the technical scheme provided by the embodiment, the hardware accelerator can determine the characteristic parameters corresponding to each calculation layer of the set calculation model based on the model information of the set calculation model; and determining the configuration information corresponding to each computing layer based on the determined characteristic parameters and the hardware architecture of the hardware accelerator. The characteristic parameters can represent the requirements of the computing layer on the hardware resources of the corresponding category, so that the configuration information determined based on the characteristic parameters is more accurate, and the matching degree between the hardware resources of the hardware accelerator and the resource requirements of the set computing model can be further improved.
As another embodiment of the present application, the setting calculation model is a neural network model; when determining the configuration information corresponding to the computing layer according to each type of feature parameter in the at least one type of feature parameter and the hardware architecture, the resource scheduling method includes:
corresponding to a convolutional layer or a fully-connected layer of the neural network model, determining the first number based on filter-related information and size information of the memristor arrays in the memristor slice.
Here, the filter-related information refers to information related to a filter in a convolutional layer or a full link layer. The filter related information includes the number of filters and the size of the filters. The size of the filter characterizes the length and width of the filter, and may also characterize the depth of the filter and the sliding step size of the filter. The dimensional information of the memristor array characterizes the number of rows and columns that the memristor array contains.
In practical application, in the case that the convolutional layer includes 8 filters, each filter has a size of 3 × 3 × 3, one memristor slice includes 16 memristor arrays, and each memristor array in the memristor slice has a size of 8 rows and 8 columns, the hardware accelerator determines that the convolutional layer needs a memristor array of 8 columns when performing the convolution operation, and the number of rows needed is 3 × 3 × 3 = 27. Since the size of one memristor array in the memristor slice is 8 × 8, the first number determined by the hardware accelerator is 4, that is, 4 memristor arrays in one memristor slice are configured as computing resources for the convolutional layer to perform the convolution operation.
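The arithmetic of this example can be checked with a short sketch; the helper below is an assumed illustration of how such a count might be computed, not code from the patent:

```python
import math

def arrays_needed(num_filters, filter_size, array_rows, array_cols):
    """Number of memristor arrays needed to hold the layer's weights as a crossbar.

    Rows needed    = number of weights per filter (they share one input vector).
    Columns needed = number of filters (one output per filter).
    """
    rows_needed = filter_size            # e.g. 3 * 3 * 3 = 27
    cols_needed = num_filters            # e.g. 8
    return math.ceil(rows_needed / array_rows) * math.ceil(cols_needed / array_cols)

# Example from the text: 8 filters of size 3x3x3, arrays of 8 rows x 8 columns.
print(arrays_needed(num_filters=8, filter_size=3 * 3 * 3, array_rows=8, array_cols=8))  # -> 4
```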
In the technical solution provided in this embodiment, the hardware accelerator may respectively configure the computation resource and the storage resource for the convolutional layer and the fully-connected layer according to the computation requirement and the storage requirement corresponding to the convolutional layer and the fully-connected layer of the neural network model.
As another embodiment of the present application, the model information includes a topology, weight parameters, and input data of each of the at least one computation layer; fig. 5 is a schematic flow chart illustrating another implementation of determining configuration information in a resource scheduling method according to an embodiment of the present application.
Referring to fig. 5, on the basis of any of the foregoing embodiments, when determining configuration information corresponding to the hardware accelerator when performing an operation of each computation layer of the at least one computation layer, the resource scheduling method further includes:
s301: and determining the operation logic of the computing layer based on the model information related to the computing layer.
Here, the model information may include a topology, weight parameters, and input data of each of at least one computation layer in which the computation model is set.
The hardware accelerator may determine the operational logic of the computation layer based on the topology, the weight parameters, and the input data of each of the at least one computation layer.
In practical applications, when the calculation model is set as the neural network model, the hardware accelerator may determine the convolution operation logic of the convolution layer based on the topology of the convolution layer, the weight parameter of the convolution layer, and the input data of the convolution layer, corresponding to the convolution layer of the neural network model. Here, the weight parameter of the convolutional layer includes at least information of the filter including the number of filters, the size of the filter, the weight value of the filter, and the like.
When the weight parameters of the convolutional layer further include the relevant information of the activation function, the hardware accelerator may further determine, based on the relevant information of the activation function, an operation logic for processing the convolution operation result by using the activation function.
S302: determining a corresponding data mapping mode of the hardware accelerator when executing the operation of the computation layer of the set calculation model, based on the operation logic of the computation layer, the hardware architecture, and the working mode of each memristor slice in the at least one memristor slice corresponding to the computation layer; wherein,
the data mapping mode characterizes the conversion logic between the input data and the output data of the computation layer.
Here, the configuration information corresponding to the hardware accelerator when executing the operation of the computation layer further includes a data mapping method. The implementation process of determining the data mapping mode may be:
after the hardware accelerator determines the operation logic of the calculation layer based on the topological structure, the weight parameters and the input data of each calculation layer in at least one calculation layer of the set calculation model, the corresponding data mapping mode of the hardware accelerator when executing the operation of the calculation layer of the set calculation model is determined based on the operation logic of the calculation layer, the connection relation between the hardware resources represented by the hardware architecture, the working mode of each memristor slice in at least one memristor slice corresponding to the calculation layer, the address information of the hardware resources configured to the calculation layer and the data bit width.
The connection relation between the hardware resources represented by the hardware architecture is used for determining the data transfer direction between the hardware resources. The address information is used for the hardware resource to read data or write data. The operating mode of each memristor slice characterizes the type and amount of hardware resources required by the compute layer in performing the correlation operations. The data bit width is used to determine the size of the data volume transferred between the hardware resources at each time.
The data mapping method characterizes the conversion logic between input data and output data of the computation layer, and also characterizes the data flow direction of the computation layer. The data flow direction characterizes the data source and destination.
In practical application, when the working mode of each memristor slice in the at least one memristor slice corresponding to a computation layer characterizes the amounts of computing resources, storage resources, and cache resources configured for the computation layer, the data mapping mode is used by the hardware accelerator to control, based on the data mapping mode, the memristor arrays configured as computing resources in the memristor slice. These arrays read the input data of the computation layer from the read addresses of the memristor arrays configured as storage resources, compute an intermediate calculation result from the read input data according to the operation logic corresponding to the input data, and write the intermediate calculation result into the corresponding memristor arrays configured as cache resources. The intermediate calculation result is then read from the read addresses of the cache resources and operated on according to the operation logic corresponding to the intermediate calculation result to obtain the output data of the computation layer, and the output data is written into the corresponding memristor arrays configured as storage resources.
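One compact way to picture this read-compute-cache-store sequence is the following sketch; the dictionaries standing in for storage and cache arrays, and the two-stage split of the layer's operation, are illustrative assumptions:

```python
import numpy as np

def run_layer_dataflow(storage, cache, weights, in_key, mid_key, out_key):
    """Illustrative dataflow for one computation layer inside a memristor slice.

    storage / cache: dicts standing in for arrays configured as storage / cache resources.
    weights:         matrix standing in for arrays configured as computing resources.
    """
    x = storage[in_key]            # read input data from the storage-mode arrays
    partial = x @ weights          # compute an intermediate result on the compute-mode arrays
    cache[mid_key] = partial       # write the intermediate result to the cache-mode arrays

    partial = cache[mid_key]       # read it back from the cache
    y = np.maximum(partial, 0.0)   # finish the layer's operation (e.g. an activation)
    storage[out_key] = y           # write the layer's output back to the storage-mode arrays
    return y

storage = {"layer_in": np.array([1.0, -2.0, 3.0])}
cache = {}
print(run_layer_dataflow(storage, cache, np.eye(3), "layer_in", "layer_mid", "layer_out"))
```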
It should be noted that the operation mode of each memristor slice in the at least one memristor slice corresponding to the computation layer further characterizes the type and the number of the functional units configured to the computation layer.
According to the technical scheme provided by this embodiment, the hardware accelerator determines the corresponding data mapping mode when executing the operation of the computation layer of the set calculation model based on the operation logic of the computation layer, the hardware architecture, and the working mode of each memristor slice in the at least one memristor slice corresponding to the computation layer. The corresponding hardware resources are then scheduled based on the data mapping mode, and the related operations of the computation layer are executed according to the conversion logic, characterized by the data mapping mode, from the input data to the output data of the computation layer, which improves the operation performance of the hardware accelerator.
As another embodiment of the present application, each memristor slice of the at least one memristor slice further includes a shared unit library electrically connected with all memristor arrays in the memristor slice; the shared unit library includes at least one functional unit; the at least one functional unit is commonly used by all memristor arrays in the memristor slice. The hardware architecture of the hardware accelerator is described with reference to fig. 2.
When determining the configuration information corresponding to the hardware accelerator when executing the operation of each computation layer of the at least one computation layer, the resource scheduling method further includes:
and determining functional units which need to be called when the calculation of the calculation layer is executed from the shared unit library based on the model information related to the calculation layer.
The hardware accelerator can determine the functional units which need to be called when the operation of the computing layer is executed from the shared unit library based on the weight parameters related to the computing layer. In practical application, when the weight parameters of the calculation layer comprise the number and size information of the filters of the pooling layer, the determined functional units comprise the maximum pooling unit; and when the weight parameter of the calculation layer comprises the activation function, the determined functional unit comprises an activation function unit.
The hardware accelerator may also determine, based on the topology and the weight parameters related to the computation layer, the feature parameters corresponding to the computation layer, and determine, based on the determined feature parameters, the functional units that need to be called when performing the computation of the computation layer from the shared unit library. The characteristic parameter characterizes the number of demands made on each functional unit.
In practical application, when a computation layer transmits input data to a memristor array configured as a computing resource, the input data needs to be converted from digital to analog form and the related operation is performed on the converted analog data; the functional units configured for the computation layer therefore include: DAC, ADC, S+H, and S+A.
According to the technical scheme, the memristor arrays and the functional units of the memristor slices in the hardware accelerator can be scheduled independently. When the operations of different computation layers are executed, the functional units in the memristor slices of the hardware accelerator can be multiplexed among the memristor arrays configured for the different computation layers, which can improve resource utilization.
In some embodiments, the set computational model is a neural network model comprising a convolutional layer, a pooling layer, and a fully-connected layer;
the method for determining the functional units needing to be called when the calculation of the calculation layer is executed from the shared unit library based on the model information related to the calculation layer comprises the following steps:
corresponding to the convolutional layer or the fully-connected layer of the set calculation model, the functional units which are determined from the shared unit library and need to be called when the computation of the convolutional layer or the fully-connected layer is executed at least comprise: a digital-to-analog converter, an analog-to-digital converter, a shift-and-add unit, and an activation unit;
corresponding to the pooling layer of the set calculation model, the functional units which are determined from the shared unit library and need to be called when the operation of the pooling layer is executed comprise: a maximum pooling unit.
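The layer-to-unit selection described above amounts to a simple lookup; the table below is a hypothetical sketch of that mapping using the unit abbreviations introduced earlier (DAC, ADC, S+A, SU, MPU):

```python
# Hypothetical mapping from layer type to the shared-unit-library units it calls.
UNITS_BY_LAYER = {
    "convolutional":   ["DAC", "ADC", "S+A", "SU"],   # digital-to-analog, analog-to-digital,
    "fully_connected": ["DAC", "ADC", "S+A", "SU"],   # shift-and-add, activation unit
    "pooling":         ["MPU"],                       # maximum pooling unit
}

def units_to_call(layer_type):
    """Return the functional units to schedule from the shared unit library."""
    return UNITS_BY_LAYER.get(layer_type, [])

print(units_to_call("convolutional"))  # ['DAC', 'ADC', 'S+A', 'SU']
print(units_to_call("pooling"))        # ['MPU']
```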
The implementation process of the hardware accelerator processing neural network model is described below with reference to a specific application example:
referring to fig. 6 and fig. 7, fig. 6 shows an implementation flow diagram of a hardware accelerator processing a neural network model provided by an embodiment of the present application, and fig. 7 shows a processing flow diagram of a neural network model provided by an embodiment of the present application. In fig. 7, the neural network model includes: at least one convolutional layer and at least one pooling layer are illustrated as examples.
And the hardware accelerator loads the neural network model file, analyzes the neural network model file to obtain an analysis result, and determines the model information of the neural network model based on the analysis result. Here, the model information of the neural network model includes:
the topological structure of the neural network model represents that the kth layer of the neural network model is a convolutional layer, the kth +1 layer is a pooling layer, and k is a positive integer;
input data of each computation layer of the neural network model, where the input data of the convolutional layer is characterized by a first feature map 1 of size 7 × 7 × 8, the input data of the pooling layer is characterized by a second feature map 2 of size 4 × 4 × 16, and the output data of the pooling layer is characterized by a third feature map 3 of size 2 × 2 × 16;
the weight parameters of the neural network model, which characterize that the convolutional layer includes 16 filters with a size of 4 × 4 × 8 and a sliding step of 1, and that the filter of the pooling layer has a size of 2 × 2 × 16.
Here, since each filter in the k-th layer (convolutional layer) is a 4 × 4 × 8 three-dimensional weight, when the convolution operation is performed, an input block of the same size is taken from the first feature map 1 of size 7 × 7 × 8 and multiplied and accumulated at the corresponding positions, that is, a dot product operation of 4 × 4 × 8 is performed to obtain one output point 21; each filter slides over the first feature map according to the corresponding sliding step and performs the convolution operation to obtain a corresponding convolution result (a plurality of output points 21); all the convolution results of the 16 filters with the first feature map constitute the second feature map 2 of size 4 × 4 × 16.
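To make the shapes concrete, the following plain-numpy sketch (random weights and assumed variable names) reproduces the arithmetic of this example: a 7 × 7 × 8 input, sixteen 4 × 4 × 8 filters with stride 1, and a 4 × 4 × 16 output in which every output point is one 4 × 4 × 8 dot product:

```python
import numpy as np

rng = np.random.default_rng(0)
feature_map_1 = rng.random((7, 7, 8))      # first feature map, 7 x 7 x 8
filters = rng.random((16, 4, 4, 8))        # 16 filters, each 4 x 4 x 8, stride 1

out_h = 7 - 4 + 1                          # = 4
feature_map_2 = np.zeros((out_h, out_h, 16))
for y in range(out_h):
    for x in range(out_h):
        patch = feature_map_1[y:y + 4, x:x + 4, :]          # 4 x 4 x 8 input block
        for f in range(16):
            # One output point = one 4 x 4 x 8 multiply-accumulate (dot product).
            feature_map_2[y, x, f] = np.sum(patch * filters[f])

print(feature_map_2.shape)                 # (4, 4, 16), the second feature map
```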
On the convolutional layer, the second feature map having a size of 4 × 4 × 16 obtained by the convolution operation is processed by using an activation function (e.g., sigmoid function) to obtain a processing result, and the processing result is stored.
The (k+1)-th layer (pooling layer) reads the second feature map 2 of size 4 × 4 × 16 output by the k-th layer from the corresponding storage resource, down-samples the second feature map 2 with a filter of size 2 × 2 × 16 to obtain the third feature map 3 of size 2 × 2 × 16, and stores the third feature map 3. In the pooling layer, one 2 × 2 maximum pooling operation yields one output point 31, and the third feature map 3 is composed of a plurality of output points 31.
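The 2 × 2 down-sampling of the pooling layer can be sketched in the same way (again plain numpy with assumed names):

```python
import numpy as np

rng = np.random.default_rng(1)
feature_map_2 = rng.random((4, 4, 16))     # output of the k-th layer (convolutional layer)

# 2 x 2 max pooling with stride 2: each 2 x 2 window per channel yields one output point.
feature_map_3 = np.zeros((2, 2, 16))
for y in range(2):
    for x in range(2):
        window = feature_map_2[2 * y:2 * y + 2, 2 * x:2 * x + 2, :]
        feature_map_3[y, x, :] = window.max(axis=(0, 1))

print(feature_map_3.shape)                 # (2, 2, 16), the third feature map
```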
And the hardware accelerator compiles the determined model information to obtain a compilation result, and determines corresponding configuration information when the hardware accelerator executes the operation of each calculation layer in at least one calculation layer based on the compilation result and the hardware architecture of the hardware accelerator. Here, the computation layer corresponds to the convolution layer and the pooling layer. The process for the hardware accelerator to determine configuration information for the convolutional and pooling layers is as follows:
since the k-th layer (convolutional layer) needs to input a first characteristic diagram 1 with a size of 7 × 7 × 8 and perform convolution operation on the first characteristic diagram, it is necessary to configure a digital-to-analog converter DAC, an analog-to-digital converter ADC, a sample-and-hold S + H device, and a shift-and-add S + a unit for the convolutional layer to convert the first characteristic diagram into a corresponding analog signal, to perform convolution operation on the converted analog signal using a memristor array, and to perform analog-to-digital conversion processing, shift-and-add processing, and the like on a result output by the memristor array, thereby outputting data of a corresponding second characteristic diagram.
Since there are 16 filters in the k-th layer (convolutional layer), and each filter is a 4 × 4 × 8 three-dimensional weight, the number of columns of the memristor array performing the convolution operation is 16, and the number of rows is 4 × 4 × 8 = 128. That is, a memristor array of size 128 × 16 can compute one output point 21 on the second feature map 2 output by the convolutional layer. Here, on a memristor array of size 128 × 16, each memristor cell stores one weight value, each column stores all the weight values (4 × 4 × 8 = 128) of one filter, and different columns store the weight values of different filters.
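The mapping just described, in which each column of a 128 × 16 array stores one flattened 4 × 4 × 8 filter, can be sketched as follows (assumed names, random weights); a single matrix-vector product then yields the 16 channel values of one output point of the second feature map:

```python
import numpy as np

rng = np.random.default_rng(2)
filters = rng.random((16, 4, 4, 8))                # 16 filters, each 4 x 4 x 8

# Each column of the crossbar stores all 128 weights of one filter.
crossbar = filters.reshape(16, 128).T              # shape (128, 16)

patch = rng.random((4, 4, 8))                      # one 4 x 4 x 8 input block
inputs = patch.reshape(128)                        # 128 values fed through the DACs

# Analog dot product per column: 16 outputs, one per filter, i.e. one output
# point of the 4 x 4 x 16 second feature map across all 16 channels.
outputs = inputs @ crossbar
print(outputs.shape)                               # (16,)
```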
Since the activation function is required to process the second characteristic diagram output by the convolutional layer and having a size of 4 × 4 × 16, to obtain a processing result, and to store the processing result, the hardware accelerator is further required to configure a corresponding activation function unit SU and a memristor array for storing the processing result of the activation function unit for the convolutional layer.
Since the k +1 th layer (pooling layer) needs to obtain the third feature map 3 of 2 × 2 × 16 by downsampling the second feature map 2 of 4 × 4 × 16 in size of the k-th layer output using a filter of 2 × 2 × 16, the hardware accelerator needs to configure the largest pooling unit MPU and a memristor array for storing the third feature map of 2 × 2 × 16 for the pooling layer.
In the case that the hardware architecture of the hardware accelerator characterizes that the hardware accelerator includes 16 memristor slices and each memristor slice includes 8 memristor arrays of size 16 × 16, the 8 memristor arrays of size 16 × 16 in one memristor slice form a memristor array of size 128 × 16. Therefore, 1 memristor slice can satisfy the memristor array of size 128 × 16 required for performing the convolution operation of the k-th layer. Here, when performing the convolution operation of the k-th layer, the hardware accelerator may configure all the memristor arrays in one of the 16 memristor slices to the convolutional layer for the convolution operation.
In practical application, the hardware accelerator can configure the memristor slice at the top left corner of the hardware accelerator as a computing resource. Since the convolution operation needs to be performed on the first feature map, a digital-to-analog converter is configured for each row of the 128 × 16 memristor array configured for the convolutional layer, a sample-and-hold unit and an analog-to-digital converter are configured in sequence for each column of the memristor array, and a shift-and-add unit and an activation function unit electrically connected with the shift-and-add unit are configured. That is, the functional units configured for the convolutional layer include: 128 digital-to-analog converters, 16 sample-and-hold units, 16 analog-to-digital converters, 1 shift-and-add unit, and 1 activation function unit.
In order to reduce data latency, a corresponding number of memristor arrays are configured as cache resources from the memristor slices near the configured computing resource according to the cache requirement of the convolutional layer, and then a corresponding number of memristor arrays are configured as storage resources from the memristor slices at the remaining positions according to the storage requirement of the convolutional layer.
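One possible way to realize this proximity-first allocation is sketched below; the slice coordinates, the Manhattan-distance metric, and the function name are assumptions made for illustration, not the patent's prescribed algorithm:

```python
def allocate_arrays(slices, compute_slice, cache_needed, storage_needed):
    """slices: dict of slice_id -> ((x, y) position, free array count)."""
    cx, cy = slices[compute_slice][0]
    # Other slices ordered by Manhattan distance to the computing slice.
    order = sorted(
        (s for s in slices if s != compute_slice),
        key=lambda s: abs(slices[s][0][0] - cx) + abs(slices[s][0][1] - cy),
    )
    plan = {"cache": [], "storage": []}
    for kind, needed in (("cache", cache_needed), ("storage", storage_needed)):
        for s in order:
            if needed == 0:
                break
            pos, free = slices[s]
            take = min(free, needed)
            if take:
                plan[kind].append((s, take))
                slices[s] = (pos, free - take)  # mark these arrays as used
                needed -= take
    return plan

slices = {"S0": ((0, 0), 0), "S1": ((0, 1), 8), "S2": ((3, 3), 8)}
print(allocate_arrays(slices, "S0", cache_needed=2, storage_needed=8))
# cache is taken from the adjacent slice S1; storage spills over into S2
```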
When executing the operation of a computation layer, the hardware accelerator loads the input data of the computation layer and performs hardware resource scheduling on the hardware accelerator according to the corresponding configuration information, so as to execute the related operation and obtain the operation result of the computation layer. Wherein:
When the convolution operation of the k-th layer is performed, the hardware accelerator maps the weight parameters of the 16 filters of size 4 × 4 × 8 to the corresponding positions on all the memristor arrays in the memristor slice configured for the k-th layer. For each filter, according to the corresponding sliding step, 128 corresponding input data are read from the storage addresses used for storing the first feature map 1 of size 7 × 7 × 8 and fed into the digital-to-analog converters electrically connected to the rows of the eight 16 × 16 memristor arrays; the digital-to-analog converters convert the 128 input data into 128 analog signals, which are input into the corresponding rows of the memristor arrays. Based on the weight values stored in the corresponding memristor cells, the memristor arrays perform the 4 × 4 × 8 dot-product operation on the 128 analog signals to obtain first processing results corresponding to the 128 analog signals, and input the first processing results into the sample-and-hold units to obtain second processing results corresponding to the 128 analog signals; the corresponding sample-and-hold units input the second processing results into the corresponding analog-to-digital converters for analog-to-digital conversion to obtain third processing results; and the corresponding analog-to-digital converters input the third processing results into the shift-and-add unit configured for the convolutional layer for processing, so as to obtain the output result corresponding to the 128 input data. The output result here corresponds to the data of one output point 21 in the second feature map 2. The shift-and-add unit may write the output result corresponding to the 128 input data into a memristor array configured as a cache resource. In this way, after each filter has slid over the entire first feature map according to the corresponding sliding step, the second feature map corresponding to that filter is obtained. The second feature map is composed of a plurality of output points 21.
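Functionally, each sliding-window position corresponds to one crossbar evaluation: the 4 × 4 × 8 window is flattened into 128 values and multiplied by the 128 × 16 weight matrix, yielding for each of the 16 filters one output point of its second feature map. The NumPy sketch below reproduces only this numerical behaviour (no digital-to-analog conversion or analog non-idealities are modeled, and the names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
feature_map_1 = rng.standard_normal((7, 7, 8))    # first feature map
weights = rng.standard_normal((128, 16))          # 16 flattened 4 x 4 x 8 filters

def convolve_on_crossbar(fmap, weight_matrix, kernel=4, stride=1):
    h, w, _ = fmap.shape
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    out = np.empty((out_h, out_w, weight_matrix.shape[1]))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i * stride:i * stride + kernel,
                          j * stride:j * stride + kernel, :]
            # 128-element input vector times the 128 x 16 weight matrix
            out[i, j] = window.reshape(-1) @ weight_matrix
    return out

feature_map_2 = convolve_on_crossbar(feature_map_1, weights)
print(feature_map_2.shape)  # (4, 4, 16)
```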
The hardware accelerator may control the activation function unit configured for the convolutional layer to read the second feature maps from the storage addresses of the cache resource used for storing the data constituting the output points of the second feature maps; the activation function unit processes the 16 second feature maps to obtain the processing result output by the convolutional layer, and writes the processing result output by the convolutional layer into a memristor array configured as a storage resource.
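As a small illustration (ReLU is chosen only as an example, since the patent does not fix the activation function, and the dictionary stands in for the memristor arrays configured as storage resources):

```python
import numpy as np

def activation_unit(feature_map_2):
    return np.maximum(feature_map_2, 0.0)  # element-wise ReLU over 4 x 4 x 16

storage = {}  # stand-in for memristor arrays configured as storage resources
storage["conv_output"] = activation_unit(
    np.random.default_rng(1).standard_normal((4, 4, 16))
)
print(storage["conv_output"].shape)  # (4, 4, 16)
```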
When the downsampling operation of the (k+1)-th layer is performed, the hardware accelerator may configure the memristor arrays in all the memristor slices as storage resources; of course, the storage resources may also be configured according to actual requirements. The hardware accelerator reads the processing result output by the convolutional layer from the storage addresses of the memristor array used for storing that result, inputs it into the maximum pooling unit configured for the pooling layer, and performs the downsampling operation to obtain the feature map output by the pooling layer, which is then written into the corresponding memristor array serving as a storage resource. The feature map output by the pooling layer is used as the input of the next computation layer adjacent to the pooling layer.
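The downsampling itself is an ordinary 2 × 2 max pooling with stride 2; a minimal NumPy sketch of what the maximum pooling unit computes (names are illustrative):

```python
import numpy as np

def max_pool_2x2(fmap):
    """Downsample an H x W x C map with a 2 x 2 window and stride 2."""
    h, w, c = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

feature_map_3 = max_pool_2x2(np.arange(4 * 4 * 16, dtype=float).reshape(4, 4, 16))
print(feature_map_3.shape)  # (2, 2, 16), the third feature map
```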
It should be noted that, when the neural network model further includes a fully-connected layer, since a fully-connected layer can be treated as a convolutional layer, the process of configuring hardware resources for the fully-connected layer is similar to that of configuring hardware resources for the convolutional layer, and is not described again here.
To implement the method of the embodiments of the present application, a hardware accelerator is provided in the embodiments of the present application, where a hardware architecture of the hardware accelerator includes at least one memristor slice; each of the at least one memristor slice includes at least one memristor array;
when the hardware accelerator operates the set calculation model, the hardware accelerator is used for:
determining model information of a set calculation model; the set calculation model comprises at least one computation layer;
determining, based on the model information and the hardware architecture, configuration information corresponding to the hardware accelerator when performing the operation of each of the at least one computation layer; the configuration information includes at least a working mode of each of the at least one memristor slice; the working mode characterizes the number of memristor arrays in the memristor slice correspondingly configured as each type of hardware resource; the hardware resources include at least one of: computing resources, storage resources, and cache resources;
and when the operation of each computation layer in the at least one computation layer is executed, performing hardware resource scheduling on the hardware accelerator according to the corresponding configuration information.
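Putting the above together, the accelerator's run-time flow is: derive the model information, derive per-layer configuration information from that information and the hardware architecture, then schedule resources and execute layer by layer. A high-level sketch follows; all class and method names, and the placeholder array counts, are illustrative assumptions rather than an interface defined by the patent:

```python
class HardwareAccelerator:
    def __init__(self, hardware_architecture):
        self.arch = hardware_architecture  # e.g. slices, arrays per slice, array size

    def determine_model_info(self, model):
        # Topology, weight parameters and input data of each computation layer.
        return {"layers": model["layers"]}

    def determine_layer_config(self, layer_info):
        # Working mode per slice: how many arrays act as computing / storage / cache
        # resources (the numbers here are placeholders, not derived values).
        return {"layer": layer_info["name"], "compute": 8, "storage": 4, "cache": 2}

    def schedule_resources(self, config):
        print("configuring memristor slices for", config["layer"], config)

    def execute_layer(self, layer_info):
        print("executing", layer_info["name"])

    def run(self, model):
        for layer_info in self.determine_model_info(model)["layers"]:
            config = self.determine_layer_config(layer_info)
            self.schedule_resources(config)
            self.execute_layer(layer_info)

HardwareAccelerator({"slices": 16}).run(
    {"layers": [{"name": "conv_k"}, {"name": "pool_k_plus_1"}]}
)
```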
In an embodiment, the model information includes a topology and weight parameters of each of the at least one computing layer;
when determining configuration information corresponding to the hardware accelerator in executing the operation of each of the at least one computation layer, the hardware accelerator is configured to:
determining at least one type of characteristic parameters corresponding to the computing layer based on the topological structure and the weight parameters related to the computing layer; each type of characteristic parameter in the at least one type of characteristic parameter correspondingly represents the requirement of the computing layer on the hardware resource of the corresponding type;
determining configuration information corresponding to a computing layer according to each type of characteristic parameters in the at least one type of characteristic parameters and the hardware architecture; wherein the configuration information at least comprises:
a first number; the first number characterizes the number of memristor arrays in the memristor slice configured as computing resources;
a second number; the second number characterizes the number of memristor arrays in the memristor slice configured as storage resources;
a third number; the third number characterizes the number of memristor arrays in the memristor slice configured as cache resources.
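A per-layer configuration record therefore carries at least these three counts; the following sketch uses illustrative field names and example values, none of which are prescribed by the patent:

```python
from dataclasses import dataclass

@dataclass
class LayerConfig:
    compute_arrays: int   # first number: arrays configured as computing resources
    storage_arrays: int   # second number: arrays configured as storage resources
    cache_arrays: int     # third number: arrays configured as cache resources

conv_k_config = LayerConfig(compute_arrays=8, storage_arrays=4, cache_arrays=2)
print(conv_k_config)
```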
In one embodiment, the set calculation model is a neural network model; and when determining the configuration information corresponding to the computing layer according to each type of characteristic parameter in the at least one type of characteristic parameter and the hardware architecture, the hardware accelerator is used for:
corresponding to a convolutional layer of the neural network model, determining the first number based on the filter-related information of the convolutional layer and the size information of the memristor arrays in the memristor slices; the filter-related information includes the number of filters and the size information of the filters.
In an embodiment, the model information comprises a topology, weight parameters and input data for each of the at least one computing layer; when determining configuration information corresponding to the hardware accelerator in executing the operation of each of the at least one computation layer, the hardware accelerator is further configured to:
determining the operation logic of the computing layer based on the model information related to the computing layer;
determining a corresponding data mapping mode of the hardware accelerator when executing the operation of the calculation layer of the set calculation model based on the operation logic of the calculation layer, the hardware architecture and the working mode of each memristor slice in the at least one memristor slice corresponding to the calculation layer; wherein
the data mapping manner characterizes a conversion logic between input data and output data of the computation layer.
In an embodiment, each of the at least one memristor slice further includes a shared unit library electrically connected with all the memristor arrays in the memristor slice; the functional units in the shared unit library are commonly used by all the memristor arrays in the memristor slice;
when the hardware accelerator operates the set calculation model, the hardware accelerator is further configured to:
determining a functional unit corresponding to the computing layer from the shared unit library based on the model information related to the computing layer;
determining configuration information corresponding to a computing layer based on a functional unit corresponding to the computing layer and the hardware architecture; the hardware resources further comprise functional units, and the working mode in the configuration information further represents the number of corresponding configurations of each functional unit in the memristor slice.
In one embodiment, the set calculation model is a neural network model, and the neural network model comprises a convolutional layer, a pooling layer and a fully-connected layer;
the method for determining the functional units needing to be called when the calculation of the calculation layer is executed from the shared unit library based on the model information related to the calculation layer comprises the following steps:
corresponding to the convolutional layer or the fully-connected layer of the set calculation model, the functional units which are determined from the shared unit library and need to be called when the computation of the convolutional layer or the fully-connected layer is executed at least comprise: a digital-to-analog converter, an analog-to-digital converter, a shift-and-add unit and an activation unit;
corresponding to the pooling layer of the set calculation model, the functional units which are determined from the shared unit library and need to be called when the operation of the pooling layer is executed comprise: a maximum pooling unit.
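This layer-type-to-unit mapping can be expressed as a simple lookup; the dictionary below is only an illustration of the selection rule described above (the unit names follow the patent's abbreviations, but the data structure itself is an assumption):

```python
SHARED_UNIT_LIBRARY = {
    "convolutional": ["DAC", "ADC", "shift_and_add_unit", "activation_unit"],
    "fully_connected": ["DAC", "ADC", "shift_and_add_unit", "activation_unit"],
    "pooling": ["max_pooling_unit"],
}

def units_for_layer(layer_type):
    """Functional units to call from the shared unit library for a given layer type."""
    return SHARED_UNIT_LIBRARY[layer_type]

print(units_for_layer("pooling"))  # ['max_pooling_unit']
```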
It should be noted that the hardware accelerator and the resource scheduling method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.
In order to implement the method of the embodiment of the application, the embodiment of the application further provides an electronic device. Fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application, and as shown in fig. 8, the electronic device includes:
a communication interface 1 capable of information interaction with other devices such as electronic devices and the like;
and a hardware accelerator 2, connected with the communication interface 1 to realize information interaction with other devices, and configured, when running a computer program, to execute the resource scheduling method provided by one or more of the foregoing technical solutions; the computer program is stored in the memory 3.
In practice, of course, the various components in the electronic device are coupled together by the bus system 4. It will be appreciated that the bus system 4 is used to enable connection communication between these components. The bus system 4 comprises, in addition to a data bus, a power bus, a control bus and a status signal bus. For the sake of clarity, however, the various buses are labeled as bus system 4 in fig. 8.
The memory 3 in the embodiment of the present application is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program for operating on an electronic device.
It will be appreciated that the memory 3 may be volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a magnetic disk memory or a magnetic tape memory. The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 3 described in the embodiments of the present application is intended to comprise, without being limited to, these and any other suitable types of memory.
The method disclosed by the embodiment of the present application can be applied to the hardware accelerator 2, or implemented by the hardware accelerator 2. The hardware accelerator 2 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be implemented by integrated logic circuits of hardware in the hardware accelerator 2. The hardware accelerator 2 described above may be a memristor array based accelerator. The hardware accelerator 2 may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware accelerator, or implemented by a combination of hardware and software modules in the hardware accelerator. The software module may be located in a storage medium located in the memory 3, and the hardware accelerator 2 reads the program in the memory 3 and performs the steps of the foregoing method in conjunction with its hardware.
When the hardware accelerator 2 executes the program, the corresponding processes in the methods according to the embodiments of the present application are implemented, and for brevity, are not described herein again.
In an exemplary embodiment, the present application further provides a storage medium, specifically a computer-readable storage medium, for example, a memory 3 storing a computer program, which can be executed by the hardware accelerator 2 to perform the steps of the foregoing method. The computer readable storage medium may be Memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface Memory, optical disk, or CD-ROM.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing module, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The technical means described in the embodiments of the present application may be arbitrarily combined without conflict.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A resource scheduling method is characterized by being applied to a hardware accelerator; a hardware architecture of the hardware accelerator includes at least one memristor slice; each of the at least one memristor slice includes at least one memristor array;
the resource scheduling method comprises the following steps:
determining model information of a set calculation model; the set calculation model comprises at least one computation layer;
determining, based on the model information and the hardware architecture, configuration information corresponding to the hardware accelerator when performing the operation of each of the at least one computation layer; the configuration information includes at least a working mode of each of the at least one memristor slice; the working mode characterizes the number of memristor arrays in the memristor slice correspondingly configured as each type of hardware resource; the hardware resources include at least one of: computing resources, storage resources, and cache resources;
and when the operation of each computation layer in the at least one computation layer is executed, performing hardware resource scheduling on the hardware accelerator according to the corresponding configuration information.
2. The method according to claim 1, wherein the model information comprises a topology and weight parameters of each of the at least one computation layer;
when determining the configuration information corresponding to the hardware accelerator when executing the operation of each computation layer of the at least one computation layer, the resource scheduling method includes:
determining at least one type of characteristic parameters corresponding to the computing layer based on the topological structure and the weight parameters related to the computing layer; each type of characteristic parameter in the at least one type of characteristic parameter correspondingly represents the requirement of the computing layer on the hardware resource of the corresponding type;
determining configuration information corresponding to a computing layer according to each type of characteristic parameters in the at least one type of characteristic parameters and the hardware architecture; wherein the configuration information at least comprises:
a first number; the first number characterizes the number of memristor arrays in the memristor slice configured as computing resources;
a second number; the second number characterizes the number of memristor arrays in the memristor slice configured as storage resources;
a third number; the third number characterizes the number of memristor arrays in the memristor slice configured as cache resources.
3. The resource scheduling method according to claim 2, wherein the set calculation model is a neural network model; when determining the configuration information corresponding to the computing layer according to each type of feature parameter in the at least one type of feature parameter and the hardware architecture, the resource scheduling method includes:
determining the first number based on filter-related information and size information of the memristor arrays in the memristor slices, corresponding to a convolutional layer or a fully-connected layer of the neural network model; the filter-related information includes the number of filters and size information of the filters.
4. The method according to claim 1, wherein the model information comprises a topology, weight parameters and input data of each of the at least one computation layer;
when determining the configuration information corresponding to the hardware accelerator when executing the operation of each computation layer of the at least one computation layer, the resource scheduling method further includes:
determining the operation logic of the computing layer based on the model information related to the computing layer;
determining a corresponding data mapping mode of the hardware accelerator when executing the operation of the calculation layer of the set calculation model based on the operation logic of the calculation layer, the hardware architecture and the working mode of each memristor slice in the at least one memristor slice corresponding to the calculation layer; wherein the content of the first and second substances,
the data mapping manner characterizes a conversion logic between input data and output data of the computation layer.
5. The resource scheduling method of any of claims 1-4, wherein each of the at least one memristor slice further comprises a shared unit library electrically connected with all memristor arrays in the memristor slice; the shared unit library comprises at least one functional unit; the at least one functional unit is commonly used by all memristor arrays in the memristor slice;
when determining the configuration information corresponding to the hardware accelerator when executing the operation of each computation layer of the at least one computation layer, the resource scheduling method further includes:
and determining functional units which need to be called when the calculation of the calculation layer is executed from the shared unit library based on the model information related to the calculation layer.
6. The resource scheduling method according to claim 5, wherein the set calculation model is a neural network model, and the neural network model comprises a convolutional layer, a pooling layer and a fully-connected layer;
the method for determining the functional units needing to be called when the calculation of the calculation layer is executed from the shared unit library based on the model information related to the calculation layer comprises the following steps:
corresponding to the convolutional layer or the fully-connected layer of the set calculation model, the functional units which are determined from the shared unit library and need to be called when the computation of the convolutional layer or the fully-connected layer is executed at least comprise: a digital-to-analog converter, an analog-to-digital converter, a shift-and-add unit and an activation unit;
corresponding to the pooling layer of the set calculation model, the functional units which are determined from the shared unit library and need to be called when the operation of the pooling layer is executed comprise: a maximum pooling unit.
7. A hardware accelerator, wherein a hardware architecture of the hardware accelerator comprises at least one memristor slice; each of the at least one memristor slice includes at least one memristor array;
when the hardware accelerator operates the set calculation model, the hardware accelerator is used for:
determining model information of a set calculation model; the set calculation model comprises at least one computation layer;
determining, based on the model information and the hardware architecture, configuration information corresponding to the hardware accelerator when performing the operation of each of the at least one computation layer; the configuration information includes at least a working mode of each of the at least one memristor slice; the working mode characterizes the number of memristor arrays in the memristor slice correspondingly configured as each type of hardware resource; the hardware resources include at least one of: computing resources, storage resources, and cache resources;
and when the operation of each computation layer in the at least one computation layer is executed, performing hardware resource scheduling on the hardware accelerator according to the corresponding configuration information.
8. The hardware accelerator of claim 7, wherein each of the at least one memristor slice further comprises a shared unit library electrically connected with all memristor arrays in the memristor slice; functional units in the shared unit library are commonly used by all memristor arrays in the memristor slice;
when the hardware accelerator operates the set calculation model, the hardware accelerator is further configured to:
determining a functional unit corresponding to the computing layer from the shared unit library based on the model information related to the computing layer;
determining configuration information corresponding to a computing layer based on a functional unit corresponding to the computing layer and the hardware architecture; the hardware resources further comprise functional units, and the working mode in the configuration information further represents the number of corresponding configurations of each functional unit in the memristor slice.
9. An electronic device, comprising: a hardware accelerator and a memory for storing a computer program capable of running on the hardware accelerator,
wherein the hardware accelerator is configured to execute the steps of the resource scheduling method according to any one of claims 1 to 6 when running the computer program.
10. A storage medium having stored thereon a computer program, characterized in that the computer program, when being executed by a hardware accelerator, implements the steps of the resource scheduling method of any one of claims 1 to 6.
CN202010812916.2A 2020-08-13 2020-08-13 Resource scheduling method, hardware accelerator and electronic equipment Pending CN111949405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010812916.2A CN111949405A (en) 2020-08-13 2020-08-13 Resource scheduling method, hardware accelerator and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010812916.2A CN111949405A (en) 2020-08-13 2020-08-13 Resource scheduling method, hardware accelerator and electronic equipment

Publications (1)

Publication Number Publication Date
CN111949405A true CN111949405A (en) 2020-11-17

Family

ID=73341844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010812916.2A Pending CN111949405A (en) 2020-08-13 2020-08-13 Resource scheduling method, hardware accelerator and electronic equipment

Country Status (1)

Country Link
CN (1) CN111949405A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016018221A1 (en) * 2014-07-28 2016-02-04 Hewlett-Packard Development Company, L.P. Adjusting switching parameters of a memristor array
CN107103113A (en) * 2017-03-23 2017-08-29 中国科学院计算技术研究所 Towards the Automation Design method, device and the optimization method of neural network processor
CN110807519A (en) * 2019-11-07 2020-02-18 清华大学 Memristor-based neural network parallel acceleration method, processor and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836814A (en) * 2021-03-02 2021-05-25 清华大学 Storage and computation integrated processor, processing system and method for deploying algorithm model
WO2022183759A1 (en) * 2021-03-02 2022-09-09 清华大学 Storage and calculation integrated processor, processing system and processing device, and algorithm model deployment method
WO2023000587A1 (en) * 2021-07-21 2023-01-26 清华大学 Computing apparatus and robustness processing method therefor

Similar Documents

Publication Publication Date Title
CN111338601B (en) Circuit for in-memory multiply and accumulate operation and method thereof
US11501141B2 (en) Shifting architecture for data reuse in a neural network
US10942673B2 (en) Data processing using resistive memory arrays
US20160196488A1 (en) Neural network computing device, system and method
CN110222818B (en) Multi-bank row-column interleaving read-write method for convolutional neural network data storage
WO2017127086A1 (en) Analog sub-matrix computing from input matrixes
CN111611197B (en) Operation control method and device of software-definable storage and calculation integrated chip
WO2020238843A1 (en) Neural network computing device and method, and computing device
WO2020172951A1 (en) Software-definable computing-in-memory chip and software definition method therefor
CN111478703B (en) Memristor cross array-based processing circuit and output current compensation method
CN111949405A (en) Resource scheduling method, hardware accelerator and electronic equipment
CN209766043U (en) Storage and calculation integrated chip and storage unit array structure
WO2023045160A1 (en) Data processing apparatus and data processing method
CN113517007B (en) Flowing water processing method and system and memristor array
CN109902821B (en) Data processing method and device and related components
CN112988082B (en) Chip system for AI calculation based on NVM and operation method thereof
CN116050492A (en) Expansion unit
JP2024516514A (en) Memory mapping of activations for implementing convolutional neural networks
CN114004344A (en) Neural network circuit
KR20210014897A (en) Matrix operator and matrix operation method for artificial neural network
US20240143987A1 (en) Integrated circuit configured to execute an artificial neural network
CN113642724B (en) CNN accelerator for high bandwidth storage
US20230043170A1 (en) Memory device for performing convolution operation
CN111832717B (en) Chip and processing device for convolution calculation
US20240192758A1 (en) Computation processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination